Data Sources#

A data source is the bridge between your records and the data-bound widgets — ListView and DataTable. It owns the records and serves them a page at a time, so the same widget works whether your data lives in memory, a SQLite database, or a file on disk. (Tree isn’t data-source-backed — it holds its own nodes in memory — but shares the same record and data bag model.)

You often don’t touch a data source at all — pass items= / rows= and the widget builds one for you. Reach for an explicit source when you want to share data between widgets, back it with a database, or load it from a file.

In-memory data#

MemoryDataSource holds a list of record dicts. Create it and load rows with load(), then hand it to a widget:

import bootstack as bs  # assumed: the `bs` alias used throughout these docs
from bootstack.data import MemoryDataSource, SqliteDataSource, FileDataSource

records = [
    {"name": "Ada", "role": "Engineer"},
    {"name": "Linus", "role": "Maintainer"}
]

ds = MemoryDataSource().load(records)
bs.ListView(data_source=ds)
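
To make "serves them a page at a time" concrete, here is a minimal plain-Python sketch of paging over a list of record dicts. It does not use bootstack; `page_of` and its `page_size` parameter are hypothetical names chosen for illustration, and the real source may page differently.

```python
from typing import Any

Record = dict[str, Any]


def page_of(records: list[Record], page: int, page_size: int = 2) -> list[Record]:
    """Return the zero-based `page` of `records`, at most `page_size` long."""
    start = page * page_size
    return records[start:start + page_size]


people = [
    {"name": "Ada", "role": "Engineer"},
    {"name": "Linus", "role": "Maintainer"},
    {"name": "Grace", "role": "Engineer"},
]

first_page = page_of(people, 0)   # the first two records
last_page = page_of(people, 1)    # the remaining record
```

A widget only ever asks for the slice it is about to draw, which is why the same widget can sit on top of a list, a database, or a file.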

SQLite-backed data#

SqliteDataSource keeps rows in an SQLite database (in-memory by default, or at a file path). It is the default source a DataTable builds when you pass rows=. Supply your own to back the table with a database file:

ds = SqliteDataSource("app.db")
ds.load(records)
bs.DataTable(data_source=ds)

File-backed data#

FileDataSource reads a file and streams it, a chunk at a time, into a SQLite working store, so even a multi-million-row file loads with bounded memory. After load() it is a SqliteDataSource: paging, filtering, sorting, and CRUD are all fast SQL. Parsing and transforms are configured with a FileSourceConfig; without one, loading looks like this:

ds = FileDataSource("people.csv")
ds.load()
bs.DataTable(data_source=ds)
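
The chunked ingest described above can be sketched with the standard library alone. This is an illustration of the general technique (read a few rows, insert them, repeat), not bootstack's implementation; `ingest_csv`, the `people` table name, and the tiny `chunk_size` are assumptions made for the example.

```python
import csv
import io
import sqlite3
from itertools import islice


def ingest_csv(lines, conn, chunk_size=2):
    """Stream CSV rows into a `people` table, `chunk_size` rows at a time."""
    reader = csv.DictReader(lines)
    columns = reader.fieldnames or []
    # Column names come from a trusted header here; values are always bound.
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE people ({col_list})")
    total = 0
    while True:
        # Only one chunk of rows is held in memory at any moment.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        conn.executemany(
            f"INSERT INTO people ({col_list}) VALUES ({placeholders})",
            [tuple(row[c] for c in columns) for row in chunk],
        )
        total += len(chunk)
    conn.commit()
    return total


sample = io.StringIO("name,role\nAda,Engineer\nLinus,Maintainer\nGrace,Engineer\n")
conn = sqlite3.connect(":memory:")
ingested = ingest_csv(sample, conn)

# Once ingested, filtering and sorting are ordinary SQL.
engineers = conn.execute(
    "SELECT name FROM people WHERE role = ? ORDER BY name", ("Engineer",)
).fetchall()
```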

The original file is read-only input — edits live in the working store and are never written back. To save changes, export to a new file (export_csv or the DataTable export menu); reload() re-ingests from the file.

The working store can be one of three kinds, selected with the cache argument:

temporary on disk (default) — bounded memory, removed on close() (and automatically, as a safety net, when the source is dropped or at exit).
cache="people.db" — a persistent store: edits survive restarts, and re-opening skips re-ingest while the cache is newer than the source file.
cache=":memory:" — in-memory: compact, but RAM-bound.
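
The "skip re-ingest while the cache is newer than the source file" rule for a persistent cache can be pictured as a modification-time comparison. The sketch below is one plausible way to express that check with the standard library; `cache_is_fresh` is a hypothetical helper, and bootstack may use a different test.

```python
import os
import tempfile


def cache_is_fresh(source_path, cache_path):
    """True when the cache exists and was modified after the source file."""
    if not os.path.exists(cache_path):
        return False
    return os.path.getmtime(cache_path) > os.path.getmtime(source_path)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "people.csv")
    cache = os.path.join(tmp, "people.db")

    with open(source, "w", encoding="utf-8"):
        pass
    before_first_load = cache_is_fresh(source, cache)   # no cache file yet

    with open(cache, "w", encoding="utf-8"):
        pass
    os.utime(source, (1000, 1000))   # pin modification times for the demo
    os.utime(cache, (2000, 2000))
    after_ingest = cache_is_fresh(source, cache)        # cache is newer

    os.utime(source, (3000, 3000))   # the source file was edited later
    after_source_edit = cache_is_fresh(source, cache)   # cache is stale
```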

Close the store when done — explicitly or with a with block:

with FileDataSource("people.csv", cache="people.db") as ds:
    ds.load()
    first = ds.page(0)

Formats. CSV, TSV, JSON, JSONL/NDJSON, and XML are built in. Columnar and scientific formats are available through optional extras — Parquet and Feather (pip install bootstack[parquet]) and HDF5 (pip install bootstack[hdf5]); each is a streaming reader, and a clear error tells you to install the extra if it is missing.

JSON comes in two shapes: a top-level array of objects (.json), or JSONL/NDJSON — one object per line (.jsonl / .ndjson), which streams a record at a time and is the right choice for large data. When the records are nested under a key (an API response like {"data": [...]}), point at it with json_records_key="data".

from bootstack.data import FileDataSource, FileSourceConfig  # FileSourceConfig location assumed

FileDataSource("export.ndjson")                                  # streamed
FileDataSource("api.json", FileSourceConfig(json_records_key="data"))
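
The difference between the two JSON shapes is easy to see in plain Python. The sketch below is illustrative only: `iter_jsonl` and `records_from_json` are hypothetical helpers, and `records_key` merely stands in for the role `json_records_key` plays.

```python
import io
import json


def iter_jsonl(lines):
    """Yield one record per non-empty line; nothing else is held in memory."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def records_from_json(text, records_key=None):
    """Parse a whole JSON document, optionally descending into one key."""
    payload = json.loads(text)
    if records_key is not None:
        payload = payload[records_key]
    return list(payload)


streamed = list(iter_jsonl(io.StringIO('{"name": "Ada"}\n\n{"name": "Linus"}\n')))
nested = records_from_json('{"data": [{"name": "Ada"}], "total": 1}', records_key="data")
```

A top-level array has to be parsed as a single document, while the line-delimited form can be consumed one record at a time, which is why it suits large data.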

Carrying extra data#

A record can hold more than the widget shows. The columns of a DataTable or the template of a ListView are a view over the record — fields you don’t display are still carried through, and event handlers get the whole record back, not a stripped-down shadow:

rows = [
    {"id": 1, "name": "Ada", "role": "Engineer",
     "tags": ["math", "logic"], "profile": {"era": 1840}},
]
table = bs.DataTable(rows=rows, columns=["name", "role"])  # tags/profile hidden

table.on_row_click(lambda e: print(e.record["tags"]))      # → ['math', 'logic']

This works the same on every source, but what a field may hold depends on where the records live:

In-memory (MemoryDataSource and the default ListView source) holds anything, including live Python objects, by reference. The field you put in is the object you get back.
SQLite-backed (SqliteDataSource, and FileDataSource — which ingests the file into a SQLite store). Scalar fields (text, numbers, booleans) become real columns you can filter and sort on. Non-scalar fields (lists, dicts) are carried as JSON automatically and merged back transparently on read — so records still read flat and complete. Because they ride a JSON blob, bagged fields are preserved but not queryable via where / order (keep anything you need to filter on as a scalar field). Values must be JSON-serializable; handing a live object to a SQLite-backed source raises SerializationError — use an in-memory source for those.
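
One way to picture the SQLite behavior just described: scalar fields go into real columns, everything else is packed into a single JSON text column and merged back on read. The sketch below uses only `sqlite3` and `json`; the `_bag` column name and the helper functions are hypothetical, and the real schema may differ.

```python
import json
import sqlite3

SCALARS = (str, int, float, bool, type(None))


def split_record(record):
    """Separate scalar fields from the rest, which is serialized as JSON."""
    scalars = {k: v for k, v in record.items() if isinstance(v, SCALARS)}
    bag = {k: v for k, v in record.items() if not isinstance(v, SCALARS)}
    return scalars, json.dumps(bag)   # json.dumps raises TypeError on live objects


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, role TEXT, _bag TEXT)")


def insert(record):
    scalars, bag_json = split_record(record)
    conn.execute(
        "INSERT INTO people VALUES (?, ?, ?, ?)",
        (scalars["id"], scalars["name"], scalars["role"], bag_json),
    )


def fetch(record_id):
    row = conn.execute(
        "SELECT id, name, role, _bag FROM people WHERE id = ?", (record_id,)
    ).fetchone()
    record = {"id": row[0], "name": row[1], "role": row[2]}
    record.update(json.loads(row[3]))   # merge the bagged fields back in
    return record


original = {"id": 1, "name": "Ada", "role": "Engineer",
            "tags": ["math", "logic"], "profile": {"era": 1840}}
insert(original)
restored = fetch(1)

# A live object cannot be serialized, so it is rejected up front.
try:
    split_record({"id": 2, "handle": object()})
    rejected = False
except TypeError:
    rejected = True
```

The sketch surfaces the failure as a plain `TypeError`; the source above reports it as SerializationError.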

Filtering and sorting#

Build a filter condition with col and apply it with where(). Sort with order() — a leading - sorts descending. Both return the source, so they chain, and both behave the same whether the data lives in memory, SQLite, or a file:

from bootstack.data import col

ds.where(col("age") >= 25)
ds.where(col("department").is_in(["Sales", "Engineering"]))
ds.where(col("name").contains("ada"))
ds.order("-salary", "name")           # salary descending, then name ascending

ds.where(None)                        # clear the filter
ds.order()                            # clear the sort
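
The ordering rule (several keys, a leading - for descending) can be reproduced on a plain list to check your expectations. This is a sketch of the semantics only; `order_records` is a hypothetical helper, not part of bootstack.

```python
def order_records(records, *keys):
    """Sort by several keys; a leading '-' means descending for that key."""
    result = list(records)
    # Apply keys from last to first. Python's sort is stable, so the
    # earlier keys end up taking priority over the later ones.
    for key in reversed(keys):
        descending = key.startswith("-")
        field = key[1:] if descending else key
        result.sort(key=lambda record: record[field], reverse=descending)
    return result


staff = [
    {"name": "Linus", "salary": 100},
    {"name": "Ada", "salary": 100},
    {"name": "Grace", "salary": 120},
]

ranked = order_records(staff, "-salary", "name")
ranked_names = [person["name"] for person in ranked]
```

Grace comes first on salary, and the tie between Ada and Linus is broken by name.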

A column supports the comparison operators (==, !=, <, <=, >, >=), text matching (contains, startswith, endswith — case-insensitive), is_in(values), and is_null() / is_not_null().

Combining conditions#

To require several conditions at once, use all_of (every condition must hold) or any_of (at least one). They read top-to-bottom and need no parentheses:

from bootstack.data import all_of, any_of, col

ds.where(all_of(col("status") == "active", col("name").contains("ada")))
ds.where(any_of(col("dept") == "Sales", col("dept") == "Engineering"))

For a complex filter, build the pieces as named conditions and pass the result in — it reads far better than one long expression:

active = col("status") == "active"
senior = col("level").is_in(["senior", "staff"])
ds.where(all_of(active, senior))

The operators & (and), | (or), and ~ (not) also combine conditions. They are terser, but mind Python’s precedence — & / | bind tighter than the comparisons, so each comparison needs its own parentheses:

ds.where((col("status") == "active") & (col("age") >= 25))
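
The precedence trap is plain Python behavior and can be seen with ordinary integers, no bootstack involved. The variable names below are made up for the demonstration.

```python
status_code = 1
age = 30

# Without parentheses Python parses this as  status_code == (1 & age) >= 25,
# a chained comparison against the bitwise result, not two separate tests.
unparenthesized = status_code == 1 & age >= 25

# With parentheses each comparison is evaluated first, then combined.
parenthesized = (status_code == 1) & (age >= 25)
```

Here `1 & 30` is `0`, so the unparenthesized form quietly compares against the wrong value and comes out false even though both intended tests are true.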

Conditions never interpolate values into SQL — SQLite binds them as parameters — so a filter built from user input cannot inject SQL.
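
To see why bound parameters rule out injection, here is a small sketch of a condition builder that produces an SQL fragment plus a separate parameter list. `Column` and `Condition` are hypothetical stand-ins written for this example; they are not bootstack's classes and only cover a few operators.

```python
import sqlite3


class Condition:
    """An SQL fragment with `?` placeholders and the values to bind."""

    def __init__(self, sql, params):
        self.sql = sql
        self.params = list(params)

    def __and__(self, other):
        return Condition(f"({self.sql}) AND ({other.sql})", self.params + other.params)

    def __or__(self, other):
        return Condition(f"({self.sql}) OR ({other.sql})", self.params + other.params)

    def __invert__(self):
        return Condition(f"NOT ({self.sql})", self.params)


class Column:
    def __init__(self, name):
        self.name = name

    def _compare(self, operator, value):
        # The value never becomes part of the SQL text.
        return Condition(f'"{self.name}" {operator} ?', [value])

    def __eq__(self, value):
        return self._compare("=", value)

    def __ge__(self, value):
        return self._compare(">=", value)

    def contains(self, text):
        # SQLite's LIKE is case-insensitive for ASCII by default.
        return Condition(f'"{self.name}" LIKE ?', [f"%{text}%"])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, status TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Ada", "active", 36), ("Linus", "inactive", 54), ("Grace", "active", 19)],
)


def select_names(condition):
    rows = conn.execute(
        f"SELECT name FROM people WHERE {condition.sql} ORDER BY name",
        condition.params,
    )
    return [name for (name,) in rows]


active_adults = select_names((Column("status") == "active") & (Column("age") >= 25))
by_text = select_names(Column("name").contains("ada"))
hostile = select_names(Column("name") == "x' OR '1'='1")   # treated as a literal
```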

Note

The data widgets drive filtering and sorting through their own UI (column headers, the search bar, column filters). Call where() / order() yourself when you share a source between widgets or filter programmatically — bound widgets refresh automatically (see Observing changes).

Observing changes#

A source broadcasts its changes, so a widget bound to one stays in sync without a manual refresh. Mutate the source directly, even from a background thread, and any bound DataTable or ListView updates itself:

ds = MemoryDataSource().load(initial_rows)
bs.ListView(data_source=ds)

# Later — from a poll loop, a websocket, any thread:
ds.insert(new_row)        # the list refreshes on its own

The update is marshaled onto the UI thread for you, and a burst of mutations in one turn is coalesced into a single refresh.
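
Marshaling and coalescing can be pictured as a thread-safe queue that the UI thread drains once per turn. The sketch below is a generic illustration of that pattern under stated assumptions; `RefreshCoalescer` and `run_turn` are hypothetical names, and bootstack's actual mechanism is not shown here.

```python
import queue
import threading


class RefreshCoalescer:
    """Collect change notices from any thread; refresh once per UI turn."""

    def __init__(self, refresh):
        self._refresh = refresh
        self._pending = queue.Queue()   # thread-safe hand-off to the UI thread

    def notify(self, change):
        """Safe to call from any thread."""
        self._pending.put(change)

    def run_turn(self):
        """Call on the UI thread: drain everything, refresh at most once."""
        changes = []
        while True:
            try:
                changes.append(self._pending.get_nowait())
            except queue.Empty:
                break
        if changes:
            self._refresh(changes)
        return len(changes)


refreshes = []
coalescer = RefreshCoalescer(refreshes.append)

workers = [
    threading.Thread(target=coalescer.notify, args=(f"insert-{i}",))
    for i in range(3)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

drained = coalescer.run_turn()      # three notices, one refresh
idle = coalescer.run_turn()         # nothing pending, no refresh
```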

Use on_change to react yourself — for example, to drive a dashboard tile from the row count. With no argument it returns a Stream you can map / debounce and listen to; with a handler it subscribes directly and returns a cancellable subscription. The handler receives a DataChangeEvent:

ds.on_change().map(lambda e: ds.count).listen(badge.set_value)

sub = ds.on_change(lambda e: print("changed:", e.kind))
sub.cancel()

observe goes a step further: declare a where / order query once and get a live result set — the matching rows now, and a fresh set whenever a relevant change lands. It is the “observable query” pattern, ideal for a small derived view or a metric:

ds.observe(col("status") == "active", "-created").listen(
    lambda rows: gauge.set_value(len(rows))
)

Note

observe re-runs the whole query and re-emits the full result set on every relevant change, so keep it to small derived sets (metrics, a short list, a side panel). Large or virtualized views such as DataTable and ListView should bind to the source directly instead; they already listen via on_change and refetch only their visible window.
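
The cost described in this note is easier to see in a toy version of the observable-query pattern. `TinySource` below is a hypothetical stand-in written for illustration; it re-runs the filter on every change (a simplification of "relevant change") and hands the listener the full result each time.

```python
class TinySource:
    """A toy source: a list of rows plus change listeners."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._listeners = []

    def on_change(self, handler):
        self._listeners.append(handler)

    def insert(self, row):
        self._rows.append(row)
        for handler in self._listeners:
            handler("insert")

    def observe(self, predicate, listener):
        def emit(_kind=None):
            # The whole query is re-run and the full result re-emitted.
            listener([row for row in self._rows if predicate(row)])

        emit()                  # the matching rows right now
        self.on_change(emit)    # and again after every change


source = TinySource([{"name": "Ada", "status": "active"}])
active_counts = []
source.observe(
    lambda row: row["status"] == "active",
    lambda rows: active_counts.append(len(rows)),
)

source.insert({"name": "Grace", "status": "active"})
source.insert({"name": "Linus", "status": "inactive"})
```

Every emission scans all rows, which is fine for a gauge or a short list and wasteful for a view with many thousands of rows.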

Exporting#

Write a source’s records to a file with save() — the format is chosen by the path extension, and records stream out so a large export stays at flat memory. The active where / order view is respected, so you export what the source currently shows:

ds.save("people.csv")                      # CSV
ds.save("people.jsonl")                    # JSON Lines — a record per line
ds.save("active.json", selected_only=True) # only the selected rows

Built-in formats are CSV, TSV, JSON, JSONL, and XML; Parquet, Feather, and HDF5 come with the optional extras (pip install bootstack[parquet] / bootstack[hdf5]). JSON and JSONL preserve nested structure (lists, dicts); the flat text formats stringify non-scalar fields.
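
The difference between structure-preserving and flat formats shows up as soon as a record carries a list. The sketch below uses the standard `json` and `csv` modules; writing the non-scalar cell as JSON text is an assumption made for the example, and bootstack may stringify such fields differently.

```python
import csv
import io
import json

SCALARS = (str, int, float, bool, type(None))
record = {"name": "Ada", "tags": ["math", "logic"]}

# JSON Lines keeps the list as a list.
from_jsonl = json.loads(json.dumps(record))

# CSV has only text cells, so the list must be turned into a string first.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "tags"])
writer.writeheader()
writer.writerow(
    {key: value if isinstance(value, SCALARS) else json.dumps(value)
     for key, value in record.items()}
)
from_csv = next(csv.DictReader(io.StringIO(buffer.getvalue())))
```

Reading the CSV back gives a string in the `tags` cell, so choose JSON or JSONL when nested fields need to survive a round trip.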

Reading and writing go through symmetric registries, so read_records and save round-trip, and you can teach both a new format:

from bootstack.data import read_records, register_writer

ds.save("dump.jsonl")
rows = list(read_records("dump.jsonl"))    # same records back

@register_writer(".ndjson.gz")             # add your own format
def write_gzipped_jsonl(path, records, config=None):
    ...
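
The idea of symmetric registries keyed by extension can be sketched in a few lines. Everything below is a standalone illustration: `READERS`, `WRITERS`, `register`, and `lookup` are hypothetical names, and the longest-suffix match is one plausible way to handle compound extensions, not a description of bootstack's lookup.

```python
import gzip
import json
import os
import tempfile

READERS = {}
WRITERS = {}


def register(registry, extension):
    """Decorator factory: file the function under `extension` in `registry`."""
    def decorator(func):
        registry[extension] = func
        return func
    return decorator


def lookup(registry, path):
    """Pick the handler whose extension is the longest suffix of `path`."""
    matches = [ext for ext in registry if path.endswith(ext)]
    if not matches:
        raise ValueError(f"no handler registered for {path!r}")
    return registry[max(matches, key=len)]


@register(WRITERS, ".jsonl")
def write_jsonl(path, records):
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


@register(READERS, ".jsonl")
def read_jsonl(path):
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            yield json.loads(line)


@register(WRITERS, ".jsonl.gz")
def write_gzipped_jsonl(path, records):
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


@register(READERS, ".jsonl.gz")
def read_gzipped_jsonl(path):
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            yield json.loads(line)


records = [{"name": "Ada", "tags": ["math", "logic"]}, {"name": "Linus", "tags": []}]
round_trips = {}
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("dump.jsonl", "dump.jsonl.gz"):
        path = os.path.join(tmp, filename)
        lookup(WRITERS, path)(path, records)                        # write
        round_trips[filename] = list(lookup(READERS, path)(path))   # read back
```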

The DataTable export menu is built on these same writers — see its export_formats option.

Writing your own source#

Any object that satisfies DataSourceProtocol can back a data widget. The easiest way to build one is to subclass BaseDataSource, which supplies the shared paging and utility logic and leaves you to implement the storage-specific methods (load, page, CRUD):

from bootstack.data import BaseDataSource

class ApiDataSource(BaseDataSource):
    def load(self, records): ...
    def page(self, page=None): ...

Records are plain dicts: Record is dict[str, Primitive], where Primitive is the set of scalar values a cell may hold (str, int, float, bool, None). Non-scalar fields are carried through the data bag, covered next.

Honoring the data bag#

The data bag is a contract, not a mechanism, so participating takes almost nothing:

Return complete records from page / page_slice / get — including fields the widget doesn’t display. Don’t strip anything.
Declare any bookkeeping keys you add (an internal id, a selection flag) by overriding _internal_fields(). The inherited _public_record / _record_id then hide them and surface id for you.

That’s the whole contract. How a field survives is your backend’s concern: an in-memory or document store (e.g. MongoDB’s BSON) holds nested values and live objects natively, so it honors the contract for free. A store with scalar-only columns has to serialize non-scalar fields itself — that is exactly what SqliteDataSource does with a hidden JSON column, and why it raises SerializationError for values it can’t serialize. None of that machinery is required of a custom source; only sources that genuinely serialize need it.

API reference#

The complete reference — every method, the col expression API, and the reader/writer registries — lives in Data. At a glance:

`MemoryDataSource`	In-memory data manager with pagination, filtering, sorting, and CRUD operations.
`SqliteDataSource`	SQLite-backed data manager with pagination, filtering, sorting, and CRUD operations.
`FileDataSource`	A `SqliteDataSource` whose data is streamed in from a file.
`FileSourceConfig`	Configuration for file parsing and the per-record transformation pipeline.
`DataSourceProtocol`	Protocol defining the interface for data source implementations.
`BaseDataSource`	Abstract base class for data source implementations.
`col`	Reference a column by name for use in `where()` / `order()`.
`any_of`	Combine conditions with OR.
`all_of`	Combine conditions with AND.
`read_records`	Stream records from a file, choosing the reader by format or extension.
`write_records`	Write records to a file, choosing the writer by `format` or extension.