Freshness, tree & scanning
Freshness: two version counters
Two distinct counters drive correctness; they answer different questions.
content_version (per-track column) answers "did this track's served
bytes change?". The DB triggers increment it on any input the database can see that changes
synthesized bytes: tag and track_art edits, art-row deletes that orphan a
reference, scanner-owned geometry changes (format, audio bounds, backing
size/nanosecond-mtime), and FLAC structural-block changes. It is
therefore a superset key — the one input it cannot cover is an on-disk backing
change with no DB write, which resolve (and, since #279, a size-cache
getattr hit) catches by re-statting the backing file and degrading to
BackingChanged. The scanner stamps the backing file's (size, mtime_ns, ctime_ns) tuple from the probed file descriptor using a pre/post fstat
sandwich: if the file's metadata changes between the two stats, the entry is
dropped. ctime defeats an mtime-forging writer (e.g. touch -m). The
HeaderCache (reader.rs) — a byte-budgeted concurrent cache (64 MiB
default) of resolved layouts — keys each entry on it: a hit with a stale
content_version rebuilds the layout. Independently of the cache, every
resolve re-stats the backing file and errors with BackingChanged if its
size, mtime, or ctime drifted from the scanned values, so a silently replaced
backing file is never spliced at stale offsets. The per-handle read path
re-stats the held descriptor on every read too, so this guarantee holds on the
hot path and not only through resolve().
data_version (PRAGMA data_version, whole-DB) answers "did anyone
commit anything?". Musefs::poll_refresh compares it to the last seen
value; on a change it consults the track_changes ring and applies an
incremental, O(changed) rebuild: only the affected tracks' tree entries
are re-rendered, exactly the removed tracks' cache entries are dropped, and
the inodes whose content_version rose are reported to the FUSE layer. If
the mount slept past the ring's capacity (or the ring was truncated), it
falls back to a full tree rebuild — correct by construction, and a bulk
change wants one anyway. The new version stamp is committed only after a
successful rebuild; failures arm a retry backoff.
The FUSE layer fires poll_refresh on metadata ops (lookup, readdir,
…) off the dispatch thread, so external edits appear without remounting.
Polling is debounced (--poll-interval-ms) and rebuilds are single-flighted:
a metadata-op storm costs at most one rebuild per interval. When mounted with
--keep-cache, the changed-inode notifications drive kernel page-cache
invalidation (inval_inode), so a re-tagged file never serves stale cached
bytes.
Virtual tree
VirtualTree::build (musefs-core/src/tree.rs) materializes an inode → node
mapping from rendered paths. Paths come from beets-style templates
(template.rs): $field / ${field} substitutions (with ${a|b} fallback
chains) over the track's tag fields, each resolving through per-field fallbacks
and then a global default_fallback; [...] conditional sections suppress
their literals when every field they reference is empty. With skip_on_missing
set (CLI --skip-on-missing), an unresolved top-level field instead drops the
track from the mount: render_one returns None, so the track enters neither
the snapshot nor the tree, and the incremental refresh path reclassifies a track
that loses (or regains) such a field as a removal (or addition). Plain values are
sanitized to a single path component ('/' and control characters become '_',
components equal to . or .. are dropped, and any component is truncated to
255 bytes on a UTF-8 boundary so it stays within NAME_MAX),
while a $!{field} path field keeps '/' as directory separators (sanitizing
each segment and dropping empty/./.. segments) so a precomputed multi-level
path expands into real directories. Path collisions are resolved
deterministically by appending (k) before the extension
(disambiguate). mapping.rs bridges DB tag rows to the format layer's
inputs and to template fields — ordering and multi-value semantics live
there.
Inodes are stable across rebuilds: a persistent path→inode allocator
(InodeAllocator) reuses an unchanged rendered path's inode and never
recycles a retired one, so a descriptor held open across a refresh keeps
resolving to the same node and a stale FUSE handle can never alias a
different file. On case-insensitive mounts the key is case-folded, so a
survivor keeps its inode even when an unrelated deletion flips a merged
directory's display casing (#305). A path that vanished degrades to
ENOENT, bounded by the entry/attr TTL. (Retired paths are pruned once they outnumber live ones,
bounding the allocator at twice the live tree; a path that returns after a
prune gets a fresh inode.)
Scanning
scan_directory (musefs-core/src/scan.rs) ingests a backing directory:
collect supported audio files, probe each (format detection → audio
offset/length, tags, pictures, structural blocks) on a parallel probe
pipeline feeding a single DB writer, committing in batches. Probing reads
are bounded — the scanner never slurps whole files — and ingestion caps
per-item sizes (MAX_ART_BYTES, MAX_BINARY_TAG_BYTES) so a crafted file
cannot balloon the store. An over-cap picture or binary tag is dropped and
logged (RUST_LOG=warn) rather than vanishing silently, so a track that
appears to have lost its cover art has an explanation in the logs; a
supported-extension file that fails to parse, or errors mid-probe, is
likewise logged with the reason and counted failed.
Symlinks are not followed by default: a symlinked file or directory is
logged (RUST_LOG=info/warn) and skipped, which keeps the walk immune to
directory-symlink cycles. Passing --follow-symlinks resolves them — symlinked
audio files and directories are scanned — guarded by a visited (dev, ino) set
so symlink cycles terminate, and by a second file-level (dev, ino) set so a
file reached via both a real path and a symlink is ingested once rather than
upserting its canonical track row twice. Because that set keys on (dev, ino),
multiple hardlinks to the same inode are likewise collapsed to a single track
under --follow-symlinks. Broken symlinks are logged and skipped without
aborting the scan. The root argument is always followed regardless of the
flag; only links encountered during recursion are gated.
revalidate is the maintenance pass (scan --revalidate): re-probe only
files whose (size, mtime_ns, ctime_ns) freshness stamp changed — a
ctime-only move (e.g. a forged-mtime in-place rewrite) is still re-probed
(skipping unchanged files preserves external tag edits in the DB),
delete tracks under the scanned root whose
backing file is gone, and garbage-collect now-unreferenced art. Pruning is
scoped to the scanned root, so revalidating one library root never removes
tracks belonging to another. Because a track is keyed by its canonical
backing path, a file scanned via --follow-symlinks whose real target lives
outside the scanned root falls outside the prune scope: if that target later
disappears, its stale row is not pruned by revalidating this root.
The contrib ecosystem
External writers live under contrib/: python-musefs is the shared
store-contract library (schema-version check, tag/art writes, sha256 art
content-addressing, the musefs scan shell-out); the
beets plugin, the
Picard plugin, and the
Lidarr integration (a Custom Script workflow)
build host-specific tag mapping on top of it. Each one's README covers its own
setup and behavior;
CONTRIBUTING covers their test suites and the
generated-schema/vendoring mechanics.