Data Sources#
A data source is the bridge between your records and the data-bound widgets — ListView and DataTable. It owns the records and serves them a page at a time, so the same widget works whether your data lives in memory, a SQLite database, or a file on disk. (Tree isn’t data-source-backed — it holds its own nodes in memory — but shares the same record and data bag model.)
You often don’t touch a data source at all — pass items= / rows= and the
widget builds one for you. Reach for an explicit source when you want to share
data between widgets, back it with a database, or load it from a file.
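To make the "owns the records and serves them a page at a time" idea concrete, here is a minimal sketch in plain Python. `TinyPagedSource` and its `page_size` parameter are hypothetical stand-ins written for illustration; they are not the library's classes, and the real sources do considerably more.

```python
from typing import Any

Record = dict[str, Any]


class TinyPagedSource:
    """Hypothetical stand-in for a data source: owns records, serves pages."""

    def __init__(self, page_size: int = 2) -> None:
        self.page_size = page_size
        self._records: list[Record] = []

    def load(self, records: list[Record]) -> "TinyPagedSource":
        # Copy so later changes to the caller's list do not leak in.
        self._records = list(records)
        return self  # returning self allows TinyPagedSource().load(...)

    def page(self, index: int) -> list[Record]:
        # Slice out only the requested window of records.
        start = index * self.page_size
        return self._records[start:start + self.page_size]


source = TinyPagedSource(page_size=2).load(
    [{"name": "Ada"}, {"name": "Linus"}, {"name": "Grace"}]
)
first_page = source.page(0)
second_page = source.page(1)
```

A widget that only ever asks for `page(n)` does not need to know where the records are stored, which is why the same widget can sit on top of memory, SQLite, or a file.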
In-memory data#
MemoryDataSource holds a list of record dicts. Create it and load rows with
load(), then hand it to a widget:
from bootstack.data import MemoryDataSource, SqliteDataSource, FileDataSource
records = [
    {"name": "Ada", "role": "Engineer"},
    {"name": "Linus", "role": "Maintainer"},
]
ds = MemoryDataSource().load(records)
bs.ListView(data_source=ds)
SQLite-backed data#
SqliteDataSource keeps rows in a SQLite database (in-memory by default, or
at a file path). It is the source a DataTable builds by default
when you pass rows=. Supply your own instead
to back the table with a database file:
ds = SqliteDataSource("app.db")
ds.load(records)
bs.DataTable(data_source=ds)
File-backed data#
FileDataSource reads a file and streams it — a chunk at a time — into a
SQLite working store, so even a multi-million-row file loads with bounded
memory. After load() it is a SqliteDataSource: paging, filtering,
sorting, and CRUD are all fast SQL. Configure parsing and transforms with a
FileSourceConfig:
ds = FileDataSource("people.csv")
ds.load()
bs.DataTable(data_source=ds)
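The chunked ingest described above can be sketched with the standard library alone. This is an illustration of the general technique (read a bounded chunk, insert it, repeat), not the library's implementation; `ingest_csv`, the `records` table name, and `chunk_size` are assumptions made for the example.

```python
import csv
import io
import itertools
import sqlite3


def ingest_csv(lines, conn, table="records", chunk_size=1000):
    """Stream CSV rows into SQLite, holding at most chunk_size rows in memory."""
    reader = csv.reader(lines)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    total = 0
    while True:
        # islice pulls the next chunk lazily; the rest of the file stays unread.
        chunk = list(itertools.islice(reader, chunk_size))
        if not chunk:
            break
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', chunk)
        total += len(chunk)
    conn.commit()
    return total


csv_text = "name,role\nAda,Engineer\nLinus,Maintainer\nGrace,Admiral\n"
conn = sqlite3.connect(":memory:")
row_count = ingest_csv(io.StringIO(csv_text), conn, chunk_size=2)

# Once ingested, paging is an ordinary LIMIT/OFFSET query.
page = conn.execute(
    'SELECT name, role FROM "records" ORDER BY name LIMIT 2 OFFSET 0'
).fetchall()
```

Memory use is governed by `chunk_size` rather than by the size of the file, which is the property the working store relies on.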
The original file is read-only input — edits live in the working store and
are never written back. To save changes, export to a new file
(export_csv or the
DataTable export menu);
reload() re-ingests from the file.
The working store is, by choice:
- Temporary on disk (default): bounded memory, removed on close() (and automatically, as a safety net, when the source is dropped or at exit).
- cache="people.db" gives a persistent store: edits survive restarts, and re-opening skips re-ingest while the cache is newer than the source file.
- cache=":memory:" keeps it in memory: compact, but RAM-bound.
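One plausible reading of "the cache is newer than the source file" is a comparison of modification times. The sketch below shows that check in isolation; `cache_is_fresh` is a hypothetical helper, and how the library actually decides freshness is not documented here.

```python
import os
import tempfile


def cache_is_fresh(source_path: str, cache_path: str) -> bool:
    """True when the cache exists and was modified after the source file."""
    if not os.path.exists(cache_path):
        return False
    return os.path.getmtime(cache_path) > os.path.getmtime(source_path)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "people.csv")
    cache = os.path.join(tmp, "people.db")
    for path in (source, cache):
        with open(path, "w") as handle:
            handle.write("placeholder")

    os.utime(source, (1_000, 1_000))  # source last modified at t=1000
    os.utime(cache, (2_000, 2_000))   # cache written later
    fresh_before_edit = cache_is_fresh(source, cache)

    os.utime(source, (3_000, 3_000))  # source edited after the cache was built
    fresh_after_edit = cache_is_fresh(source, cache)

    missing_cache = cache_is_fresh(source, os.path.join(tmp, "none.db"))
```

Under this assumption, editing the source file after the cache was built makes the cache stale, so the next open would re-ingest.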
Close the store when done — explicitly or with a with block:
with FileDataSource("people.csv", cache="people.db") as ds:
    ds.load()
    first = ds.page(0)
Formats. CSV, TSV, JSON, JSONL/NDJSON, and XML are built in. Columnar and
scientific formats are available through optional extras — Parquet and Feather
(pip install bootstack[parquet]) and HDF5 (pip install bootstack[hdf5]);
each is a streaming reader, and a clear error tells you to install the extra if
it is missing.
JSON comes in two shapes: a top-level array of objects (.json), or
JSONL/NDJSON — one object per line (.jsonl / .ndjson), which streams a
record at a time and is the right choice for large data. When the records are
nested under a key (an API response like {"data": [...]}), point at it with
json_records_key="data".
FileDataSource("export.ndjson") # streamed
FileDataSource("api.json", FileSourceConfig(json_records_key="data"))
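The difference between the two JSON shapes is easiest to see in plain Python. The helpers below are hypothetical and use only the standard `json` module; they illustrate why JSONL can be read a record at a time while a `.json` document has to be parsed as a whole before the records under a key can be reached.

```python
import io
import json


def iter_jsonl(lines):
    """Yield one record per non-empty line; only one line is parsed at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def records_from_json(text, records_key=None):
    """Parse a whole JSON document and return its list of records."""
    document = json.loads(text)
    return document[records_key] if records_key is not None else document


jsonl_text = '{"name": "Ada"}\n{"name": "Linus"}\n'
api_text = '{"data": [{"name": "Ada"}, {"name": "Linus"}], "total": 2}'

streamed = list(iter_jsonl(io.StringIO(jsonl_text)))
nested = records_from_json(api_text, records_key="data")
```

Both calls produce the same records; only the JSONL path avoids holding the full document in memory.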
Carrying extra data#
A record can hold more than the widget shows. The columns of a DataTable or
the template of a ListView are a view over the record — fields you don’t
display are still carried through, and event handlers get the whole record back,
not a stripped-down shadow:
rows = [
    {"id": 1, "name": "Ada", "role": "Engineer",
     "tags": ["math", "logic"], "profile": {"era": 1840}},
]
table = bs.DataTable(rows=rows, columns=["name", "role"]) # tags/profile hidden
table.on_row_click(lambda e: print(e.record["tags"])) # → ['math', 'logic']
This works the same on every source, but what a field may hold depends on where the records live:
- In-memory (MemoryDataSource and the default ListView source) holds anything, including live Python objects, by reference. The field you put in is the object you get back.
- SQLite-backed (SqliteDataSource, and FileDataSource, which ingests the file into a SQLite store). Scalar fields (text, numbers, booleans) become real columns you can filter and sort on. Non-scalar fields (lists, dicts) are carried as JSON automatically and merged back transparently on read, so records still read flat and complete. Because they ride in a JSON blob, bagged fields are preserved but not queryable via where / order (keep anything you need to filter on as a scalar field). Values must be JSON-serializable; handing a live object to a SQLite-backed source raises SerializationError, so use an in-memory source for those.
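The SQLite-backed behavior can be approximated with `sqlite3` and `json` from the standard library. This is a sketch of the general pattern, not the library's code: the `_bag` column name and the `split_record` helper are assumptions, and the standard library signals an unserializable value with `TypeError` rather than `SerializationError`.

```python
import json
import sqlite3

SCALARS = (str, int, float, bool, type(None))


def split_record(record):
    """Separate scalar fields (real columns) from non-scalar fields (the bag)."""
    scalars = {k: v for k, v in record.items() if isinstance(v, SCALARS)}
    bag = {k: v for k, v in record.items() if not isinstance(v, SCALARS)}
    return scalars, bag


conn = sqlite3.connect(":memory:")
# "_bag" is a hypothetical name for the hidden JSON column.
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, role TEXT, _bag TEXT)")

record = {"id": 1, "name": "Ada", "role": "Engineer",
          "tags": ["math", "logic"], "profile": {"era": 1840}}
scalars, bag = split_record(record)
conn.execute(
    "INSERT INTO people (id, name, role, _bag) VALUES (?, ?, ?, ?)",
    (scalars["id"], scalars["name"], scalars["role"], json.dumps(bag)),
)

# Scalar columns are queryable; the bag rides along untouched.
row = conn.execute(
    "SELECT id, name, role, _bag FROM people WHERE role = ?", ("Engineer",)
).fetchone()
# Merge the bag back so the record reads flat and complete again.
restored = {"id": row[0], "name": row[1], "role": row[2], **json.loads(row[3])}

# A live object is not JSON-serializable, so json.dumps raises TypeError.
try:
    json.dumps({"handle": object()})
    live_object_rejected = False
except TypeError:
    live_object_rejected = True
```

The query filters on `role`, a real column, while `tags` and `profile` come back intact without ever being visible to SQL.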
Filtering and sorting#
Build a filter condition with col and apply it with where(). Sort
with order() — a leading - sorts descending. Both return the source, so
they chain, and both behave the same whether the data lives in memory, SQLite,
or a file:
from bootstack.data import col
ds.where(col("age") >= 25)
ds.where(col("department").is_in(["Sales", "Engineering"]))
ds.where(col("name").contains("ada"))
ds.order("-salary", "name") # salary descending, then name ascending
ds.where(None) # clear the filter
ds.order() # clear the sort
A column supports the comparison operators (==, !=, <, <=,
>, >=), text matching (contains, startswith, endswith —
case-insensitive), is_in(values), and is_null() / is_not_null().
Combining conditions#
To require several conditions at once, use all_of (every condition must
hold) or any_of (at least one). They read top-to-bottom and need no
parentheses:
from bootstack.data import all_of, any_of, col
ds.where(all_of(col("status") == "active", col("name").contains("ada")))
ds.where(any_of(col("dept") == "Sales", col("dept") == "Engineering"))
For a complex filter, build the pieces as named conditions and pass the result in — it reads far better than one long expression:
active = col("status") == "active"
senior = col("level").is_in(["senior", "staff"])
ds.where(all_of(active, senior))
The operators & (and), | (or), and ~ (not) also combine conditions.
They are terser, but mind Python’s precedence — & / | bind tighter than
the comparisons, so each comparison needs its own parentheses:
ds.where((col("status") == "active") & (col("age") >= 25))
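The precedence rule can be checked with Python's own parser, independent of any library. The snippet below parses both spellings with the standard `ast` module and inspects the resulting trees; the variable names `status` and `age` are placeholders that are never evaluated.

```python
import ast

unparenthesized = ast.parse("status == 'active' & age >= 25", mode="eval").body
parenthesized = ast.parse("(status == 'active') & (age >= 25)", mode="eval").body

# Without parentheses Python sees one chained comparison whose middle operand
# is the bitwise expression ('active' & age).
middle_operand = unparenthesized.comparators[0]

# With parentheses the top-level node is the & operation, as intended.
top_level_op = parenthesized.op
```

In other words, the unparenthesized form means `status == ('active' & age) >= 25`, which is not the filter anyone intended.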
Conditions never interpolate values into SQL — SQLite binds them as parameters — so a filter built from user input cannot inject SQL.
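To illustrate why bound parameters rule out injection, here is a toy condition builder written from scratch with `sqlite3`. `Condition`, `Column`, and `combine_all` are hypothetical names for this sketch and are not the library's classes; the point is only that values travel separately from the SQL text.

```python
import sqlite3


class Condition:
    """A SQL fragment plus the values to bind to its ? placeholders."""

    def __init__(self, sql, params):
        self.sql = sql
        self.params = list(params)


class Column:
    """Toy column reference whose comparisons build Condition objects."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Condition(f'"{self.name}" = ?', [value])

    def __ge__(self, value):
        return Condition(f'"{self.name}" >= ?', [value])


def combine_all(*conditions):
    # AND the fragments together and concatenate their parameters in order.
    sql = " AND ".join(f"({c.sql})" for c in conditions)
    params = [p for c in conditions for p in c.params]
    return Condition(sql, params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, status TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Ada", "active", 36), ("Linus", "inactive", 54)],
)

hostile = "active' OR '1'='1"  # would match every row if spliced into SQL text
safe = combine_all(Column("status") == hostile, Column("age") >= 25)
rows_for_hostile = conn.execute(
    f"SELECT name FROM people WHERE {safe.sql}", safe.params
).fetchall()

normal = combine_all(Column("status") == "active", Column("age") >= 25)
rows_for_normal = conn.execute(
    f"SELECT name FROM people WHERE {normal.sql}", normal.params
).fetchall()
```

The hostile string is compared as an ordinary value, matches no row, and never becomes part of the statement.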
Note
The data widgets drive filtering and sorting through their own UI (column
headers, the search bar, column filters). Call where() / order()
yourself when you share a source between widgets or filter programmatically —
bound widgets refresh automatically (see Observing changes).
Observing changes#
A source broadcasts its changes, so a widget bound to one stays in sync without
a manual refresh. Mutate the source directly, even from a background thread,
and any bound DataTable or ListView updates itself:
ds = MemoryDataSource().load(initial_rows)
bs.ListView(data_source=ds)
# Later — from a poll loop, a websocket, any thread:
ds.insert(new_row) # the list refreshes on its own
The update is marshaled onto the UI thread for you, and a burst of mutations in one turn is coalesced into a single refresh.
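Coalescing can be pictured as a dirty flag that the UI loop checks once per turn. The sketch below shows only that part; `CoalescingNotifier` and its `flush` hook are hypothetical, and the thread marshaling the library performs is deliberately left out.

```python
class CoalescingNotifier:
    """Collapse many change notifications into one refresh per UI turn."""

    def __init__(self, refresh):
        self._refresh = refresh
        self._dirty = False

    def notify(self):
        # Called on every mutation; only records that something changed.
        self._dirty = True

    def flush(self):
        # Called once per UI turn (a hypothetical hook in the event loop).
        if self._dirty:
            self._dirty = False
            self._refresh()


refresh_calls = []
notifier = CoalescingNotifier(lambda: refresh_calls.append("refresh"))

for _ in range(100):  # a burst of 100 mutations in one turn
    notifier.notify()
notifier.flush()      # end of turn: exactly one refresh
notifier.flush()      # nothing changed since, so no second refresh
```

However many mutations arrive within a turn, the widget repaints once.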
Use on_change to react yourself — for example, to drive a dashboard tile
from the row count. With no argument it returns a Stream you can map / debounce and listen to; with a
handler it subscribes directly and returns a cancellable subscription. The
handler receives a DataChangeEvent:
ds.on_change().map(lambda e: ds.count).listen(badge.set_value)
sub = ds.on_change(lambda e: print("changed:", e.kind))
sub.cancel()
observe goes a step further: declare a where / order query once and
get a live result set — the matching rows now, and a fresh set whenever a
relevant change lands. It is the “observable query” pattern, ideal for a small
derived view or a metric:
ds.observe(col("status") == "active", "-created").listen(
    lambda rows: gauge.set_value(len(rows))
)
Note
observe re-runs the whole query and re-emits the full result set on every
relevant change, so keep it to small derived sets (metrics, a short list, a
side panel). Large or virtualized views (DataTable, ListView) should
bind to the source directly instead; they already listen via on_change
and refetch only their visible window.
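The cost described in the note follows directly from how an observable query works. Here is a toy version in plain Python; `ObservableList` and its `observe(predicate, listener)` signature are assumptions for illustration and differ from the library's `observe`, and this toy re-runs on every change rather than only on relevant ones.

```python
class ObservableList:
    """Toy source: holds rows and re-runs registered queries on every change."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._queries = []

    def observe(self, predicate, listener):
        self._queries.append((predicate, listener))
        listener(self._matching(predicate))  # emit the current result at once

    def insert(self, row):
        self._rows.append(row)
        for predicate, listener in self._queries:
            listener(self._matching(predicate))  # full re-run, full re-emit

    def _matching(self, predicate):
        # Scans every row, which is why this suits small result sets only.
        return [row for row in self._rows if predicate(row)]


active_counts = []
source = ObservableList([{"name": "Ada", "status": "active"}])
source.observe(
    lambda row: row["status"] == "active",
    lambda rows: active_counts.append(len(rows)),
)
source.insert({"name": "Linus", "status": "inactive"})
source.insert({"name": "Grace", "status": "active"})
```

Each emission carries the complete matching set, so the work per change grows with the size of the result.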
Exporting#
Write a source’s records to a file with save() — the format is chosen by the
path extension, and records stream out so a large export stays at flat memory.
The active where / order view is respected, so you export what the source
currently shows:
ds.save("people.csv") # CSV
ds.save("people.jsonl") # JSON Lines — a record per line
ds.save("active.json", selected_only=True) # only the selected rows
Built-in formats are CSV, TSV, JSON, JSONL, and XML; Parquet, Feather, and HDF5
come with the optional extras (pip install bootstack[parquet] /
bootstack[hdf5]). JSON and JSONL preserve nested structure (lists, dicts);
the flat text formats stringify non-scalar fields.
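The difference between structure-preserving and flat formats shows up with the standard `json` and `csv` modules. In this sketch the non-scalar field is flattened to JSON text before it goes into the CSV cell; that choice is an assumption, since the exact string form the library writes is not specified here.

```python
import csv
import io
import json

records = [{"name": "Ada", "tags": ["math", "logic"]}]

# JSON Lines: nested values survive as real lists and dicts.
jsonl_line = json.dumps(records[0])
from_jsonl = json.loads(jsonl_line)

# CSV: every cell is text, so a list has to be flattened to a string first.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "tags"])
writer.writeheader()
for record in records:
    writer.writerow({"name": record["name"], "tags": json.dumps(record["tags"])})
from_csv = next(csv.DictReader(io.StringIO(buffer.getvalue())))
```

Reading the CSV back yields a string where the list used to be, so prefer JSON or JSONL when nested fields need to round-trip.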
Reading and writing go through symmetric registries, so read_records and
save round-trip, and you can teach both a new format:
from bootstack.data import read_records, register_writer
ds.save("dump.jsonl")
rows = list(read_records("dump.jsonl")) # same records back
@register_writer(".ndjson.gz")  # add your own format
def write_gzipped_jsonl(path, records, config=None):
    ...
The DataTable export menu is
built on these same writers — see its export_formats option.
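A suffix-keyed registry of this kind can be sketched in a few lines. Everything below is a toy written for illustration: the `WRITERS` dict, the local `register_writer` and `save` functions, and the longest-suffix lookup are assumptions about how such a registry might work, not the library's implementation.

```python
import gzip
import json
import os
import tempfile

WRITERS = {}  # maps a file suffix to a writer function


def register_writer(suffix):
    """Decorator that records a writer under the given suffix."""
    def decorator(func):
        WRITERS[suffix] = func
        return func
    return decorator


@register_writer(".jsonl")
def write_jsonl(path, records, config=None):
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:  # one record at a time keeps memory flat
            handle.write(json.dumps(record) + "\n")


@register_writer(".ndjson.gz")
def write_gzipped_jsonl(path, records, config=None):
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def save(path, records):
    # Prefer the longest matching suffix so ".ndjson.gz" is matched as a whole.
    for suffix in sorted(WRITERS, key=len, reverse=True):
        if path.endswith(suffix):
            return WRITERS[suffix](path, records)
    raise ValueError(f"no writer registered for {path!r}")


records = [{"name": "Ada", "tags": ["math", "logic"]}, {"name": "Linus", "tags": []}]
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "dump.ndjson.gz")
    save(target, iter(records))  # an iterator works: records are streamed
    with gzip.open(target, "rt", encoding="utf-8") as handle:
        round_tripped = [json.loads(line) for line in handle]
```

Because the writer consumes an iterator, it never needs the full record list in memory, and a new format is one decorated function away.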
Writing your own source#
Any object that satisfies DataSourceProtocol can back a data widget. The easiest way to
build one is to subclass BaseDataSource,
which supplies the shared paging and utility logic and leaves you to implement
the storage-specific methods (load, page, CRUD):
from bootstack.data import BaseDataSource
class ApiDataSource(BaseDataSource):
    def load(self, records): ...
    def page(self, page=None): ...
Records are plain dicts — Record is
dict[str, Primitive], and Primitive is
the set of values a cell may hold (str, int, float, bool,
None).
Honoring the data bag#
The data bag is a contract, not a mechanism, so participating takes almost nothing:
- Return complete records from page / page_slice / get, including fields the widget doesn’t display. Don’t strip anything.
- Declare any bookkeeping keys you add (an internal id, a selection flag) by overriding _internal_fields(). The inherited _public_record / _record_id then hide them and surface id for you.
That’s the whole contract. How a field survives is your backend’s concern:
an in-memory or document store (e.g. MongoDB’s BSON) holds nested values and live
objects natively, so it honors the contract for free. A store with scalar-only
columns has to serialize non-scalar fields itself — that is exactly what
SqliteDataSource does with a hidden JSON column, and why it raises
SerializationError for values it
can’t serialize. None of that machinery is required of a custom source; only
sources that genuinely serialize need it.
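The two rules of the contract can be sketched without the library. `TinySource` below is a standalone toy, not a subclass of BaseDataSource; the `_rowid` and `_selected` keys are hypothetical bookkeeping fields, and its `_public_record` is a simplified guess at the inherited helper (it hides internal keys but does not reproduce the id handling).

```python
class TinySource:
    """Toy source that tags each stored record with bookkeeping keys."""

    def __init__(self):
        self._rows = []

    def _internal_fields(self):
        # Keys this source adds for itself and must not leak to widgets.
        return {"_rowid", "_selected"}

    def load(self, records):
        self._rows = [
            {**record, "_rowid": index, "_selected": False}
            for index, record in enumerate(records)
        ]
        return self

    def _public_record(self, row):
        # Strip bookkeeping keys only; every user field passes through.
        hidden = self._internal_fields()
        return {key: value for key, value in row.items() if key not in hidden}

    def page(self, index=0, size=10):
        start = index * size
        return [self._public_record(row) for row in self._rows[start:start + size]]


original = {"id": 7, "name": "Ada", "tags": ["math", "logic"]}
visible = TinySource().load([original]).page()[0]
```

The record that comes out of `page()` equals the record that went in: undisplayed fields such as `tags` survive, and the bookkeeping keys stay private.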
See also#
API reference#
The complete reference — every method, the col expression API, and the
reader/writer registries — lives in Data. At a glance:
MemoryDataSource | In-memory data manager with pagination, filtering, sorting, and CRUD operations.
SqliteDataSource | SQLite-backed data manager with pagination, filtering, sorting, and CRUD operations.
FileDataSource | A
FileSourceConfig | Configuration for file parsing and the per-record transformation pipeline.
DataSourceProtocol | Protocol defining the interface for data source implementations.
BaseDataSource | Abstract base class for datasource implementations.
col | Reference a column by name for use in
any_of | Combine conditions with OR.
all_of | Combine conditions with AND.
read_records | Stream records from a file, choosing the reader by format or extension.
Write records to a file, choosing the writer by |