bootstack.data.FileSourceConfig
- class bootstack.data.FileSourceConfig(file_format='auto', encoding='utf-8', delimiter=None, quotechar='"', skip_rows=0, header_row=0, has_header=True, json_lines=False, json_records_key=None, xml_record_tag=None, hdf5_key=None, column_renames=None, column_types=None, column_transforms=None, columns_to_load=None, default_values=None, row_filter=None, row_transform=None, chunk_size=10000, progress_callback=None)
Bases: object
Configuration for file parsing and the per-record transformation pipeline.
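Example
    from bootstack.data import FileSourceConfig

    config = FileSourceConfig(
        column_renames={'emp_id': 'id'},  # rename column 'emp_id' to 'id'
        column_types={'age': int},        # convert 'age' values to int
    )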
- column_renames: Dict[str, str] | None = None
Mapping from each existing column name to its replacement.
- column_transforms: Dict[str, Callable[[Any], Any]] | None = None
Mapping from a column name to a transform applied to its values.
- column_types: Dict[str, Type] | None = None
Mapping from a column name to the target type to convert its values to.
- default_values: Dict[str, Any] | None = None
Mapping from a column name to a fill value for missing or null entries.
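The four column mappings above can be illustrated on a single record. This is a minimal sketch in plain Python, not bootstack's implementation: the helper `apply_column_options` is hypothetical, and both the order of the steps (rename, fill defaults, convert types, transform) and the use of post-rename names as keys are assumptions that the reference text does not confirm.

```python
from typing import Any, Callable, Dict, Optional


def apply_column_options(
    row: Dict[str, Any],
    column_renames: Optional[Dict[str, str]] = None,
    column_types: Optional[Dict[str, type]] = None,
    column_transforms: Optional[Dict[str, Callable[[Any], Any]]] = None,
    default_values: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical per-record pipeline; step order is an assumption."""
    renames = column_renames or {}
    # 1. Rename: columns without an entry keep their original name.
    out = {renames.get(name, name): value for name, value in row.items()}
    # 2. Defaults: fill columns that are missing or hold None.
    for name, fill in (default_values or {}).items():
        if out.get(name) is None:
            out[name] = fill
    # 3. Types: convert non-null values by calling the target type.
    for name, target in (column_types or {}).items():
        if out.get(name) is not None:
            out[name] = target(out[name])
    # 4. Transforms: apply the callable to each listed column that is present.
    for name, transform in (column_transforms or {}).items():
        if name in out:
            out[name] = transform(out[name])
    return out


row = {'emp_id': '17', 'age': '42', 'name': ' Ada ', 'dept': None}
result = apply_column_options(
    row,
    column_renames={'emp_id': 'id'},
    column_types={'age': int},
    column_transforms={'name': str.strip},
    default_values={'dept': 'unassigned'},
)
print(result)
```

Under these assumptions, 'emp_id' becomes 'id', 'age' becomes an int, 'name' is stripped of whitespace, and the null 'dept' is filled with 'unassigned'.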
- file_format: Literal['auto', 'csv', 'tsv', 'json', 'jsonl', 'ndjson', 'xml', 'parquet', 'feather', 'hdf5'] = 'auto'
Format override. When set to 'auto', the format is detected from the file extension.
- json_records_key: str | None = None
Key whose value is the list of records in a JSON object (e.g. 'data' for {'data': [...]}). None means the document is a top-level array, or a single object that is treated as one record.
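The three document shapes described for json_records_key can be sketched with the standard json module. The function `extract_records` is a hypothetical stand-in written for illustration only, and it may differ from how bootstack handles edge cases such as a missing key.

```python
import json
from typing import Any, Dict, List, Optional


def extract_records(text: str, json_records_key: Optional[str] = None) -> List[Dict[str, Any]]:
    """Hypothetical helper mirroring the documented json_records_key behavior."""
    payload = json.loads(text)
    if json_records_key is not None:
        # The records live under a key of the top-level object.
        return list(payload[json_records_key])
    if isinstance(payload, list):
        # Top-level array: each element is a record.
        return payload
    # A single object is treated as one record.
    return [payload]


wrapped = extract_records('{"data": [{"id": 1}, {"id": 2}]}', json_records_key='data')
array = extract_records('[{"id": 1}]')
single = extract_records('{"id": 1}')
```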
- progress_callback: Callable[[int], None] | None = None
Function (count) called after each ingested chunk with the running total of rows loaded so far.
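The sketch below shows what a callback with this signature would receive. The loader loop `ingest_in_chunks` is hypothetical and only imitates the documented contract (one call per chunk, passing the running total); how bootstack treats a final partial chunk is an assumption here.

```python
from typing import Any, Callable, Iterable, List, Optional


def ingest_in_chunks(
    rows: Iterable[Any],
    chunk_size: int,
    progress_callback: Optional[Callable[[int], None]] = None,
) -> int:
    """Hypothetical loader loop: reports the running row total after each chunk."""
    total = 0
    chunk: List[Any] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            total += len(chunk)
            chunk = []
            if progress_callback is not None:
                progress_callback(total)
    if chunk:
        # Assumption: a final partial chunk also triggers the callback.
        total += len(chunk)
        if progress_callback is not None:
            progress_callback(total)
    return total


seen: List[int] = []
total = ingest_in_chunks(range(25), chunk_size=10, progress_callback=seen.append)
print(seen)  # [10, 20, 25]
```

Any callable that accepts one int fits, so a bound method such as `seen.append` or a progress bar's update function can be passed directly.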
- row_filter: Callable[[Dict[str, Any]], bool] | None = None
Function (row_dict) -> bool used to filter rows during load.
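A predicate with this signature might look like the following. The reference text does not say which return value keeps a row; the sketch assumes the conventional reading that True keeps it, and it emulates the filtering with a list comprehension instead of calling bootstack. The name `is_adult` and the sample rows are illustrative only.

```python
from typing import Any, Dict, List


def is_adult(row: Dict[str, Any]) -> bool:
    """Example predicate: True for rows with a known age of 18 or more."""
    age = row.get('age')
    return age is not None and age >= 18


rows: List[Dict[str, Any]] = [
    {'id': 1, 'age': 34},
    {'id': 2, 'age': 12},
    {'id': 3, 'age': None},
]

# Assumption: rows for which the predicate returns True are kept.
kept = [row for row in rows if is_adult(row)]
print(kept)  # [{'id': 1, 'age': 34}]
```

Guarding against a missing or null value inside the predicate avoids a TypeError on the comparison, which matters if the filter runs before default_values are applied.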