The store & external-writer contract
The SQLite store
musefs-db/src/schema.rs defines the schema as an ordered list of migrations
(MIGRATIONS: the MIGRATION_V1 baseline plus MIGRATION_V2, which adds the
scanner-owned fingerprint/content_hash columns); user_version records the
schema version (2).
The store is the interface external tools write to — the beets and Picard
plugins under contrib/ write tags and art here out-of-band.
- The baseline schema (
MIGRATION_V1): the core tables —tracks(one row per backing file: path, format, audio byte range, size/nanosecond-mtime/ctime stamps,content_version),tags(multi-value key/value rows ordered byordinal, with an optionalvalue_blobfor binary tags),art(content-addressed, deduplicated image blobs),track_art(per-track art links with picture type and ordering), andstructural_blocks(read-only, derived-from-file FLACSTREAMINFO/SEEKTABLEmetadata, not part of the editable contract). Deleting a track cascades to itstagsandtrack_artrows. Triggers bump the owning track'scontent_version/updated_aton anytags/track_artedit;CHECKconstraints enforce the contract invariants below at commit time. A bounded, self-pruningtrack_changesring (capacity 8192,CHANGELOG_CAP) fed by triggers ontracksgives O(changed) refresh — every metadata edit funnels through anUPDATEon the tracks row, relying on SQLite's nested trigger activation (on by default). Freshness-superset triggers makecontent_versioncover every DB-knowable input to synthesized bytes:art_reject_content_update(art is content-addressed and immutable),art_ad(a deleted art row bumps referencing tracks so an orphan rebuilds to a clean serve-time error),tracks_geometry_au(scanner-owned geometry changes), andstructural_blocks_ai/_ad.
The external-writer contract
Ownership. External tools get full read/write on tags, art, and
track_art. The scanner owns the structural columns of tracks (id,
backing_path, format, audio_offset, audio_length, backing_size,
backing_mtime_ns, backing_ctime_ns, content_version, updated_at) and
all of structural_blocks: those are derived from probing the file, and
external tools must run musefs scan rather than compute them.
tracks.fingerprint and tracks.content_hash are also scanner-owned,
read-only-derived columns — like structural_blocks, they are never part of
the editable tag contract and external tools never write them.
fingerprint is a SHA-256 over the probe's parsed output (deterministic per
file, excludes filesystem stamps such as mtime/ctime), computed in the
parallel probe worker at zero extra I/O. content_hash is a full-file
SHA-256, stored as 64-char hex; it is computed only at the full checksum
tier (--checksum=full), which requires an eager whole-file read. Neither
column is UNIQUE by design — duplicate-content tracks legitimately share
both values. On a normal scan, when a probed file's path is not yet in the
store and its fingerprint matches exactly one orphaned row (a row whose
backing_path no longer exists on disk), the scanner retargets that row to
the new path in place, preserving its id, tags, and art rather than
orphaning them. This is how musefs recovers from a backing-library move or
reorganization: run musefs scan after moving files, and existing store rows
follow their backing files to the new locations.
What the store enforces. SQLite CHECK constraints reject the
malformed shapes at commit, so an external writer cannot persist them:
- an unknown
formatstring, or a negative length/offset/size/version; - an
audio_offset + audio_lengthrunning past the storedbacking_size; - a binary tag row whose
valueis non-empty; - an
art.byte_lenthat disagrees with its blob, or asha256of the wrong length; - a
picture_typeoutside0..=20; - a
tags.keyover 256 chars ortags.valueover 256 KiB; tags.keymust be non-empty and contain no ASCII control characters (a DBCHECKenforces this, rejecting violating writes — with one blind spot: an embedded NUL terminates SQLite'slength()/GLOB, so a key likea\0bslips theCHECK. The scanner's own floor drops it before insert, and the Vorbis path rejects it on synthesis). Additionally, only keys within the Vorbis field-name grammar (ASCII0x20–0x7D, excluding=) survive FLAC/Ogg synthesis — others are dropped and logged. MP3/M4A custom keys may use the wider set (e.g.=,:, spaces, non-ASCII).- a
value_bloboverMAX_BINARY_TAG_BYTES; - an
art.mimeover 255 chars orbyte_lenoverMAX_ART_BYTES; - a
track_art.descriptionover 1 KiB; - a
structural_blocksrow with an unknownkind, negativeordinal, orbodyover the FLAC 24-bit block limit.
Schema identity. On open, musefs also validates schema identity: a
sqlite_master comparison against a freshly-migrated reference plus PRAGMA foreign_key_check, rejecting anything that is not the canonical latest schema
with a message telling the user to run musefs scan. A store whose
user_version is newer than this binary's latest migration (a future or
third-party tool bumped the schema) is refused up front with a distinct
"store is newer than this binary" error rather than silently treated as
already-migrated — an older binary must not risk misreading a newer contract.
Art is immutable once written. art rows are content-addressed by
sha256; a trigger rejects any in-place UPDATE of an art row's
content columns (data, sha256, mime, byte_len, width, height) with
RAISE(ABORT) — a multi-row UPDATE art touching any content column aborts the
whole statement. To change a track's art, insert a new content-addressed row
and relink it via track_art (which bumps content_version); do not mutate an
existing row. Deleting an art row still referenced by track_art (possible
only with foreign_keys OFF) bumps every referencing track so the mount serves
a clean EIO on the now-orphaned reference instead of stale bytes.
What musefs defends at serve time. CHECKs cannot catch a scanner-owned
field mutated to a well-formed value that no longer matches the real file
on disk: backing_size or backing_mtime_ns/backing_ctime_ns that drift
from the actual file's stat, or audio bounds that fit the stored
backing_size but overrun the file once it has shrunk. musefs re-stats the
backing file on every resolve and treats such rows as untrusted input,
degrading to a controlled
BackingChanged/layout error, never undefined behavior. The store's
CHECK rejects art over MAX_ART_BYTES (16 MiB − 64 KiB) at write time;
resolve also re-checks it (ArtTooLarge, all formats) to backstop a writer
that disables check enforcement, and the scanner's ingest-time drop is
tracked in #284.
Referential gaps are treated the same way: a track_art row whose art_id
has no matching art row (an orphan an external writer can produce with FK
enforcement disabled) fails the serve with EIO rather than silently dropping
the art.
Merge vs. replace. An external writer may merge rather than fully
replace text tags — overwriting only the keys it manages and leaving the rest
of the scan-seeded set in place — provided it tracks its own managed-key set
out of band (the beets plugin uses a beets flexattr; the store is not the
place for plugin state). musefs renders tags outside its native VOCAB
(musefs-format/src/tagmap.rs) by passthrough (Vorbis uppercased, mp3
TXXX, mp4 freeform), so such tags appear but are not guaranteed
byte-identical to a given tagger's own per-format encoding. A merge matches
the keys it manages case-insensitively, so a writer's canonical
(lowercase) key replaces a scan-seeded row stored under the backing file's
native case (e.g. Vorbis LABEL) instead of coexisting with it — Vorbis keys
render case-insensitively, so two such rows would otherwise duplicate.
Path layout offload. External tools can also offload path layout
entirely: a plugin evaluates its own (arbitrarily complex) path logic, writes
the resulting relative path into a custom text tag — e.g. INSERT INTO tags (track_id, key, value, ordinal) VALUES (?, 'beets_path', 'Pink Floyd/Animals/01 Pigs', 0) — and the user mounts with --template '$!{beets_path}'. Because the field map is just the (lowercased) tag keys,
any number of such tags (beets_path, lidarr_path, …) can back different
concurrent mounts. The path field keeps embedded / as directory separators
but sanitizes each segment and drops empty/./.. segments, so a
misbehaving writer cannot inject traversal or empty components into the tree.
The shared Python library. contrib/python-musefs/ encodes this contract
for plugin authors, including a generated copy of the schema
(musefs_common/schema.py, regenerated from schema.rs by a drift-guarded
test — see CONTRIBUTING). Its tag/art replace operations
each wrap their DELETE+INSERT in a SQLite savepoint, so they are
individually atomic and the "caller owns the transaction" guarantee holds even
on an autocommit connection. The Lidarr integration
uses the same shared library from a Custom Script workflow. Its Lidarr
destination tree is only a tracking aid, made of symlinks by default; musefs
remains the consumer-facing filesystem.
CI proves this contract end to end in the contract job (see
CONTRIBUTING): a Python writer's tags/art, layered on a
scanned track, are synthesized by the Rust serve path and read back by an
independent reader.
External writers prune in one of two ways depending on how they own files.
For in-place writers (e.g. the beets plugin), existence-based pruning — dropping
the row of a removed backing file — is a deliberate act owned by musefs scan --revalidate; the plugin never prunes on its own (it exposes the revalidate
scan via beet musefs --revalidate). The prune_missing helper in
musefs_common implements the same by-existence delete for writers that prefer
to own pruning themselves. Link-tree writers (e.g. the Lidarr integration) never
delete the backing files they point at, so they prune by identity instead: a
source-reported album/artist deletion removes the rows carrying the matching
MusicBrainz id.
Connections are mode-typed (Db<ReadWrite> / Db<ReadOnly>), opened in WAL
mode with a busy timeout. The serve path uses a DbPool whose per-thread
variant hands each reader thread its own connection — WAL reads never contend.