# Storage layer
Storage is where Beside’s local-first promise becomes concrete. The storage
plugin owns the bytes on disk: raw events, screenshot/audio assets, derived
frames, embeddings, sessions, meetings, day events, memory chunks, and the
namespaced records produced by capture hooks. Nothing here ever phones home —
your `~/.beside/` directory is the entire product state, on your disk, in
formats you can read with standard tools.
The default plugin (`local`) ships with the runtime and is what almost every
install uses. The interface (`IStorage`) is large because the surface is real:
SQLite + asset directories give you a queryable knowledge base and a vector
store without dragging in Postgres, Pinecone, or anyone else’s cloud.
## local — SQLite + asset directories
```yaml
storage:
  plugin: local
  local:
    path: ~/.beside
    max_size_gb: 50
    retention_days: 365
    vacuum:
      compress_after_minutes: 60
      compress_quality: 40
      thumbnail_after_days: 30
      thumbnail_max_dim: 480
      delete_after_days: 180
      tick_interval_min: 15
      batch_size: 50
```
What lives where:

- `~/.beside/raw/events.sqlite` — a single SQLite database with WAL, embedded FTS5 search, vector tables for embeddings, and indexes for sessions, meetings, day events, memory chunks, and hook records.
- `~/.beside/raw/frames/<day>/<id>.webp` — screenshot assets, written once and tiered later by the vacuum (compressed → thumbnail → deleted).
- `~/.beside/raw/audio/{inbox,processed,failed}/` — audio chunks moving through the transcription worker.
- `~/.beside/index/` — the wiki the index strategy emits.
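The `raw/frames/<day>/<id>.webp` layout can be sketched as a small path helper. This is illustrative only — the `frameAssetPath` name and the `YYYY-MM-DD` day format are assumptions, not the runtime’s actual code:

```typescript
// Build the relative asset path for a screenshot frame, following the
// raw/frames/<day>/<id>.webp layout described above.
// Hypothetical helper: the runtime's real path builder may differ.
function frameAssetPath(capturedAt: Date, frameId: string): string {
  const day = capturedAt.toISOString().slice(0, 10); // e.g. "2024-06-01"
  return `raw/frames/${day}/${frameId}.webp`;
}

console.log(frameAssetPath(new Date("2024-06-01T09:30:00Z"), "f123"));
// → raw/frames/2024-06-01/f123.webp
```

Keeping the path a plain relative string is what lets the vacuum re-tier the bytes later without rewriting any rows.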
## What the `IStorage` surface gives you

`IStorage` is a thick interface on purpose: it lets index strategies, hooks,
and exports stay simple. The headline operations:
- **Events** — append-only writes, plus replayable reads with `from`, `to`, `types`, `apps`, paginated and checkpointable per index strategy (`since_checkpoint`, `markIndexed`).
- **Assets** — `writeAsset`/`readAsset` are the only ways anything writes to the asset directory, so the vacuum can move bytes around safely.
- **Frames** — `upsertFrame`, `searchFrames`, `getFrameContext(before, after)`, `getJournal(day)`, OCR + accessibility task queues, perceptual-hash dedupe.
- **Embeddings** — `searchFrameEmbeddings`, `upsertFrameEmbeddings`, `findExistingFrameEmbeddings`, plus the same trio for memory chunks.
- **Sessions / meetings / day events** — assignment, listing, summary writeback, and full-bin clears for reindexing.
- **Memory chunks** — `replaceMemoryChunks`, `upsertMemoryChunks`, `searchMemoryChunks`, `searchMemoryChunkEmbeddings` — the queryable surface the MCP exporter exposes to agents.
- **Vacuum** — tier transitions for assets, deletes for frames orphaned by retention, `runMaintenance` for SQLite VACUUM/ANALYZE, optional WAL checkpoints.
- **Hook records** — `hookPut`/`hookGet`/`hookList`/`hookClear`, scoped per `hookId` so plugins can never read each other’s data.
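The hook-record isolation can be pictured with a minimal in-memory sketch. Only the `hookPut`/`hookGet`/`hookList`/`hookClear` names come from the interface above; the `MemoryHookStore` class is a hypothetical stand-in, not the real implementation:

```typescript
// Minimal in-memory sketch of hookId-scoped records: every operation
// goes through a per-hook bucket, so one hook can never see another's keys.
class MemoryHookStore {
  private records = new Map<string, Map<string, unknown>>();

  private bucket(hookId: string): Map<string, unknown> {
    let b = this.records.get(hookId);
    if (!b) { b = new Map(); this.records.set(hookId, b); }
    return b;
  }

  hookPut(hookId: string, key: string, value: unknown): void {
    this.bucket(hookId).set(key, value);
  }
  hookGet(hookId: string, key: string): unknown {
    return this.bucket(hookId).get(key); // lookups never cross hook boundaries
  }
  hookList(hookId: string): string[] {
    return [...this.bucket(hookId).keys()];
  }
  hookClear(hookId: string): void {
    this.records.delete(hookId);
  }
}

const store = new MemoryHookStore();
store.hookPut("clipboard", "last", { text: "hello" });
store.hookPut("browser", "last", { url: "https://example.com" });
// Same key, different hookId: each hook sees only its own record.
console.log(store.hookGet("clipboard", "last")); // → { text: "hello" }
```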
## The vacuum
Storage isn’t just "write everything forever". A background storage vacuum keeps the asset directory tractable.
- After `compress_after_minutes` (default 60), originals are re-encoded at `compress_quality: 40` — same dimensions, smaller bytes.
- After `thumbnail_after_days` (default 30), assets are downscaled to `thumbnail_max_dim: 480` so older days still preview but cost almost nothing.
- After `delete_after_days` (default 180), the asset bytes are removed but the metadata (timestamp, app, window title, OCR text, embedding) stays — so the wiki can still cite a frame that no longer has its picture.
The vacuum loop ticks every `tick_interval_min` minutes and only processes
`batch_size` frames per tick, so it never spikes CPU.
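The tier transitions above reduce to a pure function of asset age. A minimal sketch, using the config defaults from this page — the `tierFor` helper and the tier names are assumptions for illustration:

```typescript
// Which storage tier a frame asset belongs in, given its age.
// Thresholds mirror the defaults above (60 min / 30 days / 180 days);
// the function and tier names are illustrative, not the runtime's code.
type AssetTier = "original" | "compressed" | "thumbnail" | "deleted";

function tierFor(
  ageMinutes: number,
  cfg = { compressAfterMinutes: 60, thumbnailAfterDays: 30, deleteAfterDays: 180 },
): AssetTier {
  const ageDays = ageMinutes / (60 * 24);
  if (ageDays >= cfg.deleteAfterDays) return "deleted";
  if (ageDays >= cfg.thumbnailAfterDays) return "thumbnail";
  if (ageMinutes >= cfg.compressAfterMinutes) return "compressed";
  return "original";
}

console.log(tierFor(30));            // → original  (under an hour old)
console.log(tierFor(120));           // → compressed
console.log(tierFor(45 * 24 * 60));  // → thumbnail (45 days)
console.log(tierFor(200 * 24 * 60)); // → deleted   (200 days)
```

Because the checks run from oldest tier to newest, an asset that has crossed several thresholds at once still lands in the right final tier on a single tick.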
## Why a typed storage layer matters
Most "memory" tools blur the line between application logic and storage. Beside
deliberately enforces it via `IStorage`, which buys two things:

- You can replace storage without breaking anything else. Want a network-mounted SQLite, a Postgres-backed store, or an in-memory test double? Implement `IStorage`, register it as a plugin, and point `config.yaml` at it. The orchestrator, hooks, index strategy, and exports don’t change.
- The wiki and the database stay in sync. Pages live on disk, but every page also references a `MemoryChunk` row with content hashes, embeddings, and source frame IDs. That’s what makes hybrid keyword + vector retrieval feel honest — the agent gets a citation, not a vibe.
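One way to picture that page-to-row linkage: hash the on-disk page body and compare it with the hash stored on the chunk row. The `MemoryChunkRow` field names and the `pageInSync` helper are assumptions for illustration; only the content-hash idea comes from the text above:

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of the linkage: a wiki page on disk plus the
// MemoryChunk row it references. Field names are illustrative.
interface MemoryChunkRow {
  id: string;
  contentHash: string;      // sha256 of the chunk's source text
  sourceFrameIds: string[]; // the frames this chunk cites
}

function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// True when the on-disk page body still matches the row it cites.
function pageInSync(pageBody: string, row: MemoryChunkRow): boolean {
  return sha256(pageBody) === row.contentHash;
}

const body = "## Standup notes\nDiscussed the storage vacuum.";
const row: MemoryChunkRow = {
  id: "chunk-1",
  contentHash: sha256(body),
  sourceFrameIds: ["f123"],
};
console.log(pageInSync(body, row));             // → true
console.log(pageInSync(body + " edited", row)); // → false (page drifted)
```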
## Writing a custom storage plugin
The minimum viable `IStorage` is large but mechanical. Most people don’t need
to write one — but if you do, the trick is to lean on `replaceMemoryChunks`
and `upsertFrameEmbeddings` for the bulk operations and treat the per-frame
methods as the source of truth.
```typescript
const factory: PluginFactory<IStorage> = async ({ dataDir, config, logger }) => {
  // open your database, create tables, return an IStorage implementation
};

export default factory;
```
Important constraints if you write one:

- `write` must be append-only and durable — the runtime treats raw events as the truth.
- `markIndexed(strategy, eventIds)` must be idempotent — strategies replay on restart.
- Asset paths must be stable strings relative to your storage root — frames carry them around for the life of the install.
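The idempotency constraint can be sketched with a toy in-memory checkpoint table. This illustrates the contract only; the real schema and the internals of `markIndexed` are not specified here, and the `IndexCheckpoints` class is hypothetical:

```typescript
// Toy in-memory checkpoint table: marking the same events twice for the
// same strategy is a no-op, so index strategies can safely replay on restart.
class IndexCheckpoints {
  private indexed = new Map<string, Set<string>>();

  markIndexed(strategy: string, eventIds: string[]): void {
    let seen = this.indexed.get(strategy);
    if (!seen) { seen = new Set(); this.indexed.set(strategy, seen); }
    for (const id of eventIds) seen.add(id); // Set semantics make replays idempotent
  }

  countIndexed(strategy: string): number {
    return this.indexed.get(strategy)?.size ?? 0;
  }
}

const cp = new IndexCheckpoints();
cp.markIndexed("wiki", ["e1", "e2"]);
cp.markIndexed("wiki", ["e1", "e2"]); // replay after a crash: no double-count
console.log(cp.countIndexed("wiki")); // → 2
```

In a real SQLite-backed implementation the same property would typically come from something like `INSERT OR IGNORE` on a `(strategy, event_id)` primary key rather than an in-memory set.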
Because storage is the bottom of the stack, it’s also the layer that defines
Beside’s posture toward your data: one directory you can back up, encrypt with
FileVault, sync with whatever you already trust, or wipe with `rm -rf`. Capture
hook records are namespaced and isolated per hook, the wiki on disk is plain
Markdown, and the vacuum keeps disk pressure flat over months of always-on
use. There is no "Beside cloud" to opt out of, because there isn’t one.
