Introduction
A read-only FUSE filesystem that presents a re-tagged, reorganized view of your music library — without modifying or duplicating a single byte of the original audio. Fix tags, art, and folder structure in a SQLite store; the mount shows a clean library while your files stay exactly as they are.
What it's for
- A clean view of a messy library. Your files keep their on-disk chaos; the mount presents one consistent, template-driven tree for players and media managers.
- Tag editing without touching files. Edit the SQLite store (directly, or via the beets plugin, Picard plugin, or Lidarr integration) and the mounted view updates live — no remount, no rewrite, no re-rip anxiety.
- Lossless-by-construction experimentation. Change your tags, try a different organization scheme, new cover art — the originals are physically read-only to the mount. Backing up a current library is as easy as copying the db file.
- Hash-stable by construction. The mount never rewrites a byte, so each backing file's checksum is exactly what it was the day it arrived — anything verified by hash keeps verifying, and anything you're seeding keeps seeding, however aggressively you retag and reorganize the view on top.
Note: This project was built with AI. The general workflow was to use the superpowers skills to provide a framework. Claude Opus was used to write plans and specs which were then implemented by another model, primarily MiMo v2.5.
One of my goals in building this project was to "vibe code" something that was decisively not slop. I believe I've realized that objective and I hope that you take the project on its merits.
If you disagree, please let me know! I'd love to know where I came up short so I can improve things.
Status
All five formats ship with embedded cover art and binary-tag preservation.
The serve path has been through a performance/concurrency hardening pass for
real-world player and media-manager access against large libraries on
HDD/SSD/NFS, and the parsers are continuously fuzzed. beets, Picard, and
Lidarr plugins ship in contrib/. See the
CHANGELOG for history.
Deeper reading: the architecture reference for how it works, the contributor guide for the development workflow.
Quick start
cargo install musefs # compiles from source — needs a Rust toolchain,
# libfuse3-dev and pkg-config; prebuilt binaries
# and container images: see Installing
musefs scan ~/Music --db library.db # ingest your library
mkdir -p ~/mnt/music
musefs mount ~/mnt/music --db library.db \
--template '$albumartist/$album/$title'
# mount blocks until unmounted: fusermount3 -u ~/mnt/music (or Ctrl-C)
~/mnt/music now serves your library as
Album Artist/Album/Title.flac — with each file's metadata generated fresh
from the database, spliced in front of your original, untouched audio.
Installation
Three ways to get musefs: a prebuilt binary (no toolchain needed), building from source, or a container image. Whichever you pick, mounting needs a 64-bit FUSE-capable OS (Linux, FreeBSD, macOS) — see Platform support.
Important: Linux and FreeBSD are E2E tested. I don't have anything running macOS to test on; if you run musefs on a Mac, let me know if it works, and especially if it doesn't!
At present AMD64, AARCH64, and RISC-V 64 are supported. If you'd like 32-bit support please open an issue.
Prebuilt binaries
Each tagged release attaches static/portable Linux binaries for six targets:
| Target | libc | Notes |
|---|---|---|
| x86_64-unknown-linux-gnu | glibc | Pinned to glibc 2.17; runs on essentially any current distro. |
| aarch64-unknown-linux-gnu | glibc | glibc 2.17 floor, ARM64. |
| x86_64-unknown-linux-musl | musl | Fully static; runs on Alpine / scratch containers. |
| aarch64-unknown-linux-musl | musl | Fully static, ARM64. |
| riscv64gc-unknown-linux-gnu | glibc | glibc 2.27 floor, RISC-V 64. |
| riscv64gc-unknown-linux-musl | musl | Fully static, RISC-V 64. |
The *-musl build is statically linked, so it runs on any Linux host of
that architecture regardless of libc — glibc distros (Debian/Ubuntu/Fedora)
included, not just Alpine/musl. For mixed or containerized deployments it is the
simplest choice: one binary you can drop onto a glibc host and an Alpine image
alike.
Download the tarball for your target from the latest release, verify it, and extract:
sha256sum -c musefs-<version>-<target>.tar.gz.sha256
tar -xzf musefs-<version>-<target>.tar.gz # yields ./musefs
Runtime requirements: the binaries mount via FUSE's fusermount3 helper, so
the target needs the FUSE userspace tools and /dev/fuse:
- Debian/Ubuntu: apt-get install fuse3
- Alpine: apk add fuse3
No glibc/libfuse install is needed for the musl binaries beyond fuse3.
Note: On Ubuntu 24.04+ (libfuse ≥ 3.17) the fusermount3 AppArmor profile only permits unprivileged mounts under whitelisted prefixes ($HOME/**, /mnt, /media, /tmp, …). Mounting elsewhere fails with fusermount3: mount failed: Permission denied; see Mounting for the whitelist and the fix.
Building from source
cargo install musefs compiles the latest release; building needs a stable
Rust toolchain (2024 edition) plus the FUSE headers (libfuse3-dev) and
pkg-config. To install the latest development version instead:
cargo install --git https://github.com/Sohex/musefs musefs
The same fuse3 runtime requirement as the prebuilt binaries applies.
The binary uses jemalloc as its global allocator by default (it bounds
resident memory for the long-lived mount daemon under heavy concurrent reads).
Distribution packagers or anyone debugging memory with valgrind/heaptrack can
build against the system allocator instead with
cargo build -p musefs --no-default-features (or cargo install musefs --no-default-features).
Platform support
| Platform | FUSE | Kernel passthrough (StructureOnly) | Notes |
|---|---|---|---|
| Linux | Yes (/dev/fuse + fusermount3, from the fuse3 package) | Yes (6.9+, falls back to daemon serving otherwise) | Full support. |
| FreeBSD | Yes (pure-rust /dev/fuse backend; fusefs kernel module, no libfuse) | No | Full FUSE support. |
| macOS (FUSE-T) | Best-effort | No | Compiles and runs unit tests with macos-no-mount; mounted e2e is not yet validated. |
On platforms without kernel passthrough, --mode structure-only still serves
the original bytes, just through the daemon instead of the kernel.
Filename case-folding is platform-aware: --case-insensitive <true|false>
defaults to true on macOS and false on Linux/FreeBSD. When enabled,
filenames are compared case-insensitively — case-variant directories merge into
one (first-seen casing wins) and case-variant files get a numeric suffix (e.g.
Song (2)); case-insensitive mounts refresh via a full rebuild rather than the
incremental fast path.
Running in containers
Container images
Each tagged release also publishes multi-arch images to the GitHub Container Registry:
| Image | libc | Platforms |
|---|---|---|
| ghcr.io/sohex/musefs:<version>, ghcr.io/sohex/musefs:latest | glibc | amd64, arm64, riscv64 |
| ghcr.io/sohex/musefs:<version>-musl, ghcr.io/sohex/musefs:musl | musl | amd64, arm64, riscv64 |
docker pull selects the CPU architecture automatically. Use the -musl /
:musl tags when slotting musefs into an Alpine-based stack; the default
(glibc) tags suit everything else. Floating :latest / :musl track the most
recent stable release only — prereleases publish only version-pinned tags.
Running musefs on the host is the simplest, best-supported option — it is an ordinary FUSE daemon and the image exists mainly to colocate musefs with containerized media managers (e.g. Lidarr). If you do containerize, mind the gotchas below.
Required flags
musefs mounts via FUSE, so the container needs /dev/fuse and the matching
capability:
docker run --rm \
--device /dev/fuse --cap-add SYS_ADMIN --security-opt apparmor=unconfined \
-v /path/to/library:/library:ro \
-v /path/to/store:/store \
ghcr.io/sohex/musefs:latest scan /library --db /store/musefs.db
Without --device /dev/fuse --cap-add SYS_ADMIN --security-opt apparmor=unconfined
the mount cannot be established.
Note: The apparmor flag may or may not be necessary depending on how your system is configured.
Note that CAP_SYS_ADMIN is a broadly privileged capability — it grants far more
than FUSE mounting (mounting arbitrary filesystems, and more). It is unavoidable
for an in-container FUSE mount — even rootless Podman cannot drop it; without
--cap-add SYS_ADMIN the mount fails with fusermount3: mount failed: Permission denied. Under rootless Podman the capability is confined to the container's user
namespace rather than the host, so its blast radius is smaller, but it is still
required. Running musefs on the host needs no such capability at all.
Runs as a non-root user
The images run as a dedicated unprivileged user (default uid/gid 1000), not
root — musefs mounts via the setuid fusermount3 helper and needs no root of
its own. Consequences for the commands above:
- The bind-mounted store volume must be writable by that uid. Either chown 1000:1000 /path/to/store on the host, or add --user $(id -u):$(id -g) to run as your own uid. The library volume is mounted :ro, so its ownership does not matter.
- To bake an image whose user matches your host account (so no chown or --user is needed), build from source with --build-arg MUSEFS_UID=$(id -u) --build-arg MUSEFS_GID=$(id -g).
- The images include user_allow_other in /etc/fuse.conf, so a non-root --allow-other/--owner/--group mount (needed to share the mount across containers or users, below) passes musefs's pre-flight check. See Ownership and permissions.
The mount-visibility gotcha (read this before sharing the mount)
A FUSE mount made inside a container lives in that container's mount namespace.
By default neither the host nor other containers can see it, so pointing a second
container (your media manager) at musefs's output does not work out of the box.
To share the mount you propagate it between containers through a host directory:
musefs binds that directory with rshared and mounts itself there, and the
consumer binds the same directory with rslave so the mount propagates in. The
host directory must itself be a shared mount.
# A host directory both containers bind to, marked shared so mounts propagate.
mkdir -p /srv/musefs-mnt
mount --bind /srv/musefs-mnt /srv/musefs-mnt
mount --make-rshared /srv/musefs-mnt
# A named volume for the store, writable by the image's unprivileged user.
podman volume create musefs-store
# musefs container: bind rshared, mount musefs there with --allow-other.
podman run -d --name musefs \
--device /dev/fuse --cap-add SYS_ADMIN --security-opt apparmor=unconfined \
-v /path/to/library:/library:ro -v musefs-store:/store \
--mount type=bind,source=/srv/musefs-mnt,destination=/mnt/musefs,bind-propagation=rshared \
ghcr.io/sohex/musefs:latest mount /mnt/musefs --db /store/musefs.db --allow-other
# consumer container: bind the same host path rslave; the mount propagates in.
podman run -d --name player \
--mount type=bind,source=/srv/musefs-mnt,destination=/music,bind-propagation=rslave \
ghcr.io/sohex/yourmediamanager:latest
Use a named volume (or an already-writable host path) for the store: a bind from
a root-owned host directory is read-only to the image's unprivileged user and
musefs aborts before mounting. --allow-other is required because the consumer
container runs as a different uid than the musefs container; without it the
consumer gets Permission denied on the mount. See
Ownership and permissions.
Note: Some hardened kernels block cross-uid access to an unprivileged user's FUSE mount even with --allow-other, for example when the fuse module's allow_sys_admin_access parameter is N, or unprivileged user namespaces are restricted. If the consumer still gets Permission denied, set /sys/module/fuse/parameters/allow_sys_admin_access to Y, or run musefs and the consumer under the same uid.
Both the glibc and musl images carry the fuse3 userspace tools; pick :musl
if your other containers are Alpine-based, otherwise the default tags are fine.
Sharing a host mount into a container
Running musefs on the host instead of in a container is simpler and needs no
CAP_SYS_ADMIN. Mark the mount point as shared and mount musefs there with
--allow-other, then bind it into the consumer container with rslave so the
host's musefs mount propagates in:
# On the host: mark the mount point shared, then mount musefs with --allow-other.
mkdir -p /srv/musefs-mnt
mount --bind /srv/musefs-mnt /srv/musefs-mnt
mount --make-rshared /srv/musefs-mnt
musefs mount /srv/musefs-mnt --db /store/musefs.db --allow-other &
podman run -d \
--mount type=bind,source=/srv/musefs-mnt,destination=/music,bind-propagation=rslave \
ghcr.io/sohex/yourmediamanager:latest
# the container reads the re-tagged view at /music, byte-for-byte live
rslave is what keeps this working across restarts: a plain bind only captures
whatever is mounted when the container starts, so it shows an empty directory if
musefs mounts later and a stale view after a musefs restart.
Scanning
musefs --version (or -V) prints the build version; --help on the root or
any subcommand lists its flags.
Scan
musefs scan /path/to/music --db library.db # ingest (dirs recurse)
musefs scan /path/to/music --db library.db --revalidate
scan probes each audio file (FLAC, MP3, M4A/M4B, Ogg, WAV), recording its
audio byte range, tags, and embedded art in the store. It takes one or more
files or directories, and --jobs N controls probe parallelism.
--follow-symlinks walks symlinked files and directories (off by default, so
symlinks are logged and skipped). --quiet
(-q) suppresses the per-target summary for scripting; scan failures still
surface on stderr (raise detail with -v/-vv, or RUST_LOG=info).
scan and scan --revalidate show a live progress indicator: on an interactive
terminal, a discovery spinner followed by a determinate bar (position, percent,
ETA, current file); on a non-interactive stderr (piped or logged), throttled
ingested N/M (P%) lines. --quiet (-q) suppresses the progress indicator
and the per-target summary. Each summary line ends with the elapsed time.
The per-target summary reads scanned N: … skipped X, failed Y. skipped
counts every file that isn't a supported audio format — cover art, .cue /
.log / .nfo sidecars, and anything else non-audio — so a large skipped
number (hundreds or thousands on a big library) is expected, not an error.
A per-extension breakdown of the skip count is logged at end of scan (e.g.
skipped 42: jpg=20, cue=10, log=8, <none>=4), so you can tell expected
sidecars from anything genuinely unexpected. failed is the one
to watch: those are audio files musefs recognised by extension but could not
parse. Format dispatch is by extension only —
there is no content sniffing and no fallback to another parser, so a file
whose contents don't match its extension (e.g. a FLAC named .mp3) is handed
to the wrong parser, fails, and is counted here rather than retried. Renaming
files across formats makes them vanish from the mount; fix the extension and
rescan.
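The per-extension breakdown lends itself to a mechanical check. The following is a minimal sketch, assuming the log line has exactly the shape of the example above (skipped N: ext=count, ...); unexpected_skips is a hypothetical helper, not part of musefs, and the list of "expected" extensions is whatever you consider normal sidecars.

```shell
# Hypothetical helper: print skipped extensions that are NOT in an expected list.
# Assumes the breakdown line looks like the example in the text:
#   skipped 42: jpg=20, cue=10, log=8, <none>=4
unexpected_skips() {
    line=$1        # the breakdown line
    expected=$2    # space-separated extensions you treat as normal sidecars
    printf '%s\n' "$line" |
        sed 's/^.*skipped [0-9]*: *//' |
        tr ',' '\n' |
        sed 's/^ *//' |
        while IFS='=' read -r ext count; do
            case " $expected " in
                *" $ext "*) ;;                          # expected, ignore
                *) printf '%s=%s\n' "$ext" "$count" ;;  # report the rest
            esac
        done
}

# Example: treat jpg, cue and log as expected; anything else is printed.
unexpected_skips 'skipped 42: jpg=20, cue=10, log=8, <none>=4' 'jpg cue log'
```

With the sample line this leaves only the extensionless files to look at.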
If any file fails (failed Y with Y > 0), scan exits 2 even though the
batch otherwise completes and the parseable files are ingested — so a pipeline
like musefs scan … && musefs mount … stops on a partial or total ingest
failure rather than mounting an incomplete library. A successful scan exits 0;
a hard error (a missing target, an unreadable DB) still exits 1. The exit code
is the only machine-detectable signal; per-file failures otherwise surface only
on stderr.
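Because the exit code is the only machine-readable signal, a wrapper script may want to branch on it explicitly. A small sketch of the three documented statuses; describe_scan_exit is a hypothetical helper name and the messages are illustrative wording, not musefs output.

```shell
# Map a scan exit status to the meaning documented above.
describe_scan_exit() {
    case $1 in
        0) echo "ok: no file failed to parse" ;;
        1) echo "error: hard failure (missing target, unreadable DB)" ;;
        2) echo "partial: some or all audio files failed to parse" ;;
        *) echo "unknown exit status: $1" ;;
    esac
}

# Usage (not run here):
#   musefs scan ~/Music --db library.db
#   status=$?
#   describe_scan_exit "$status"
#   [ "$status" -eq 0 ] && musefs mount ~/mnt/music --db library.db
```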
--revalidate is the maintenance pass: it skips unchanged files —
preserving any tag edits you made in the store — prunes tracks whose
backing file is gone, and garbage-collects orphaned art.
Content checksums and move re-identification
--checksum=none|fingerprint|full (env MUSEFS_CHECKSUM, default
fingerprint) controls what content checksums scan computes and stores.
- none: no checksums (legacy behavior).
- fingerprint: compute a cheap fingerprint for each file, derived from the probe's parsed output (tags, audio bounds, embedded art). This is the default: it rides the existing probe at essentially no extra I/O cost and is sufficient for routine move detection.
- full: fingerprint plus an eager full-file SHA-256. Use this when you want collision-proof retargeting or a forensic content identity for every file.
Two flags govern how a fingerprint match is confirmed before retargeting a moved file:
- --fast (env MUSEFS_FAST): fingerprint match is always sufficient; never reads the full file even when a stored content_hash exists.
- --strict (env MUSEFS_STRICT): require a full-hash match; if the matched candidate has no stored content_hash, refuse the retarget and insert a fresh row instead.
The default (neither flag) auto-escalates: full-hash the new file when the candidate already has a content_hash, and trust the fingerprint alone when it does not.
--fast and --strict are mutually exclusive.
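The confirmation rule can be read as a small decision table. The sketch below restates it as a shell function; it is an illustration of the prose above, not musefs's own code, and confirm_step plus its output labels are hypothetical names.

```shell
# $1: "fast", "strict", or "default" (neither flag given)
# $2: "yes" if the matched candidate row has a stored content_hash
confirm_step() {
    mode=$1
    has_hash=$2
    case $mode in
        fast)
            echo "fingerprint-only" ;;
        strict)
            if [ "$has_hash" = yes ]; then echo "full-hash"
            else echo "refuse-and-insert"; fi ;;
        *)
            # Default: escalate to a full hash only when one is stored.
            if [ "$has_hash" = yes ]; then echo "full-hash"
            else echo "fingerprint-only"; fi ;;
    esac
}

confirm_step strict no
```

The only combination that refuses a retarget is --strict against a candidate with no stored hash.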
Move re-identification workflow. After moving or reorganizing your backing
library, run a normal musefs scan on the new locations. For each file not
already in the store, the scanner looks up rows whose fingerprint matches and
whose old path is gone, and retargets the unique match in place — its id,
tags, and art are preserved. Move recovery only applies to rows that were
fingerprinted before the move (rows scanned under --checksum=none have no
fingerprint and cannot be retargeted until a later fingerprint-tier pass).
Run scan after a move and ideally before any revalidate — revalidate
still prunes tracks whose backing file is gone, so it will remove un-retargeted
rows if run first.
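The ordering matters enough that it may be worth scripting. A minimal sketch, assuming the moved files now live under a single directory: rescan_after_move is a hypothetical wrapper, and musefs_cmd is a variable of this sketch only (musefs does not read it) that lets you substitute echo to preview the commands.

```shell
# Scan the new location first; revalidate only if that scan succeeded,
# so moved tracks are retargeted before anything is pruned.
rescan_after_move() {
    new_root=$1
    db=$2
    ${musefs_cmd:-musefs} scan "$new_root" --db "$db" || return $?
    ${musefs_cmd:-musefs} scan "$new_root" --db "$db" --revalidate
}

# Preview (prints the two commands instead of running musefs):
#   musefs_cmd=echo rescan_after_move /new/music library.db
```

A non-zero scan status (including exit 2 for per-file failures) stops the sketch before the revalidate pass, which is the conservative choice.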
Mounting & path templates
Mount
musefs mount /path/to/mountpoint --db library.db \
--template '$albumartist/$album/$title' \
--default-fallback Unknown \
--fallback albumartist='Unknown Artist' \
--mode synthesis # or: structure-only
mount blocks until the filesystem is unmounted (fusermount3 -u, or
Ctrl-C).
mount never creates the store: unlike scan, it requires a populated DB to already exist and exits non-zero otherwise. Interactively this is invisible (the scan → mount quick start always seeds it first), but it bites automation: a mount started at boot before anything has scanned hard-fails (and crash-loops under Restart=). Seed the store with an initial scan, or order the mount after it; see contrib/systemd.
Mounting at an arbitrary path may be denied by AppArmor. On distros that ship an AppArmor profile for fusermount3 (Ubuntu 24.04+ / libfuse ≥ 3.17), unprivileged FUSE mounts are only allowed when the mountpoint is under a whitelisted prefix: the shipped profile permits $HOME/**, /mnt, /media, /tmp, /cvmfs, $XDG_RUNTIME_DIR, plus flatpak dirs. Mounting elsewhere (e.g. a data volume at /data/...) fails with fusermount3: mount failed: Permission denied, and the kernel audit log shows apparmor="DENIED" operation="mount" … profile="fusermount3". The mountpoint's own ownership is irrelevant, because AppArmor rejects the mount() syscall first. Fix it by mounting under a permitted prefix, or by whitelisting your prefix in /etc/apparmor.d/local/fusermount3 (the shipped profile ends with include if exists <local/fusermount3>).
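A quick pre-flight check can catch this before the mount fails. The sketch below tests an absolute mountpoint against the prefixes listed above; under_permitted_prefix is a hypothetical helper, the "equal to or below the prefix" matching is an assumption (the real profile's glob semantics may differ), and flatpak directories are not covered.

```shell
# Return success if $1 is equal to or below one of the listed prefixes.
under_permitted_prefix() {
    path=$1
    for prefix in "${HOME:-}" /mnt /media /tmp /cvmfs "${XDG_RUNTIME_DIR:-}"; do
        [ -n "$prefix" ] || continue      # skip unset HOME / XDG_RUNTIME_DIR
        case $path in
            "$prefix"|"$prefix"/*) return 0 ;;
        esac
    done
    return 1
}

# Usage:
#   under_permitted_prefix /data/music || echo "may be denied by AppArmor"
```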
Two modes:
- synthesis (default): files carry metadata freshly generated from the store, spliced ahead of the original audio bytes.
- structure-only: files are served byte-for-byte as they are on disk; only the directory tree is virtual.
Edit tags or art in the database while mounted (another scan, a
beets/Picard/Lidarr sync, raw SQL) and the view refreshes automatically.
Run musefs <command> --help for the full flag list.
Path templates
Paths come from a beets-style template (matched case-insensitively; any tag key in the store works):
- $field / ${field}: substitute a tag field (e.g. $artist, $album, $title, $tracknumber, $date, $genre).
- ${albumartist|artist}: fallback chain; the first present field wins, before the --default-fallback value (default Unknown) is used.
- A missing field resolves in order: the field's value, then a per-field fallback from --fallback FIELD=VALUE (repeatable, e.g. --fallback albumartist='Unknown Artist'), then --default-fallback. Per-field fallbacks let one field default differently from the rest.
- --skip-on-missing: drop a track from the mount entirely when a top-level template field stays unresolved, instead of substituting --default-fallback. Per-field --fallback chains and [ … ] sections are unaffected (a field resolved via its fallback counts as present, and section fields stay optional). Handy when an external tool tags only some tracks, e.g. --template '$!{beets_path}' --skip-on-missing hides tracks beets left without a beets_path (such as deduplicated albums).
- [ … ]: conditional section; the bracketed text is emitted only when at least one field inside it is present. So $album[ - CD $disc] yields Album - CD 2, or just Album on a single-disc release. Write $[ / $] for literal brackets.
- $!{field}: path field; the value's / are kept as directory separators (each segment sanitized; empty/./.. dropped). Lets an external tool precompute a whole relative path into one tag and mount it as --template '$!{beets_path}'.
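The resolution order for a single field amounts to "first present value wins". A minimal sketch of that rule follows; first_present is a hypothetical helper, and treating an empty string as "missing" is an assumption of the sketch rather than documented musefs behavior.

```shell
# Print the first non-empty argument; fail if every argument is empty.
first_present() {
    for candidate in "$@"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Order for one field: tag value, per-field fallback, default fallback.
albumartist=""                        # tag missing on this track
per_field_fallback="Unknown Artist"   # as if --fallback albumartist='Unknown Artist'
default_fallback="Unknown"            # as if --default-fallback Unknown
first_present "$albumartist" "$per_field_fallback" "$default_fallback"
# prints: Unknown Artist
```

A ${albumartist|artist} chain follows the same pattern, with the second field's value taking the place of the per-field fallback.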
Anything else is literal. Name collisions get a deterministic (2), (3), …
suffix. Every rendered component is capped at 255 bytes (NAME_MAX, truncated on
a UTF-8 boundary, extension preserved), and a plain field whose value is
exactly . or .. is dropped rather than creating an unusable directory. The
default template is $albumartist/$album/$title.
Brackets and braces must be balanced: an unclosed [ section or an
unterminated ${ / $!{ field is rejected at mount time with an error naming
the problem, rather than silently folding the rest of the template into the
open construct. To check a template before committing to a mount, add
--dry-run: it validates the template, prints a sample of the paths the mount
would expose along with the total file and directory counts, then exits without
mounting.
Tuning & metrics
Tuning
The defaults are sensible for most setups, including the two measured storage wins —
daemon-level backing read-ahead (--read-ahead-budget-mib, the single biggest win for
NFS/remote) and keeping the kernel page cache across opens (--keep-cache, on by default,
~3× faster reopen on HDD/NFS). The kernel-level read-ahead / background knobs have little
measurable effect (see the storage-tunables benchmarks for the methodology
and numbers).
| Flag | Default | What it does |
|---|---|---|
| --poll-interval-ms | 1000 | Debounce window for detecting external DB edits. |
| --read-ahead-budget-mib | 64 | Per-mount RAM budget (MiB) for backing read-ahead: the daemon coalesces a stream's small FUSE reads into one large positioned read, so the backing client can pipeline/parallelize them. The biggest lever for slow/high-latency backing: ~5–6× single-stream throughput over a 200 ms-RTT NFS mount; neutral on local disk. Shared across all active streams with LRU eviction; 0 disables it. |
| --read-ahead-prefetch | disabled | Advanced: add background prefetch threads on top of read amplification. Off by default: benchmarks found amplification alone delivers the entire read-ahead win, while the threads add ~10% overhead with no measured benefit. Enable only when profiling a backend where a single large read does not self-pipeline. |
| --keep-cache <true\|false> | true | Keep the kernel page cache across opens. On by default, as one of the two measured storage wins: repeat opens of a file are served from cache instead of re-read over slow storage (~3× faster reopen on HDD/NFS in our benches). External re-tags auto-invalidate the affected files, so cached bytes never go stale. Disable with --keep-cache false (e.g. on a memory-constrained host where the page cache is contended). |
| --attr-ttl-ms | 1000 | How long the kernel may trust cached entry/attr lookups. Higher values cut lookup/getattr traffic, which helps metadata-heavy clients (library scanners) over high-latency backing, but they also bound how fast external edits become visible. |
| --max-readahead-kib | 512 | Kernel read-ahead window (clamped to the kernel maximum). Distinct from --read-ahead-budget-mib (the daemon-level read-ahead, which is the effective one): this kernel knob does not speed up musefs streaming, since reads reach the daemon in fixed FUSE-sized chunks regardless. On HDD, values well above the default can even hurt. Leave at the default unless your own profiling shows otherwise. |
| --max-background | 64 | Max outstanding background (read-ahead/async) requests the kernel keeps in flight. Does not bound foreground reads (those scale with client concurrency), so it has little effect on read throughput; left for completeness. |
Filename case-folding (--case-insensitive) is platform behaviour rather than
a performance knob — see Platform support.
Metrics
musefs mount optionally exposes runtime telemetry through a synthetic
.musefs-metrics/ directory at the mount root:
musefs mount /mnt/music --db library.db --expose-metrics # or: MUSEFS_EXPOSE_METRICS=1
cat /mnt/music/.musefs-metrics/metrics
# HELP musefs_uptime_seconds Seconds since the mount started.
# TYPE musefs_uptime_seconds gauge
musefs_uptime_seconds 60
# HELP musefs_handles_open Open file handles in the core slab.
# TYPE musefs_handles_open gauge
musefs_handles_open 3
# HELP musefs_cache_header_hits_total Raw header-cache key hits; a hit may still trigger a content-version rebuild.
# TYPE musefs_cache_header_hits_total counter
musefs_cache_header_hits_total 100
--expose-metrics (default off) is a runtime flag that gates the virtual
file; it is unrelated to the compile-time metrics cargo feature, which adds
syscall counters (opens, preads, etc.) to the output. The jemalloc allocator
stats require a build with the jemalloc feature, which is the default.
The metrics file advertises st_size == 0 (like /proc), so use an
EOF-aware reader — cat, head -c, or the Prometheus textfile collector —
not a stat-and-read-by-size approach.
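For scripting, a single value can be pulled out of the metrics text with awk, which reads to EOF and so is unaffected by the zero st_size. A minimal sketch, assuming unlabelled metrics in the "name value" form shown above; metric_value is a hypothetical helper.

```shell
# Print the value of one metric from Prometheus-style text on stdin.
# "# HELP" / "# TYPE" lines never match because their first field is "#".
# Exits non-zero if the metric is absent.
metric_value() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END { exit !found }'
}

# Usage (path from the example above, not run here):
#   metric_value musefs_handles_open < /mnt/music/.musefs-metrics/metrics
```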
Maintenance
Compacting the store (musefs vacuum)
The SQLite store only grows as you use it: deleting tracks (beets/Lidarr prunes), garbage-collecting orphaned art, and the schema migration all leave free pages behind that are not automatically reclaimed. Because embedded art is stored inline (up to ~16 MiB per image), a library that has churned art can carry significant dead space.
musefs vacuum compacts the store and reports how much it reclaimed:
musefs vacuum --db library.db # or: MUSEFS_DB=library.db musefs vacuum
vacuumed library.db: 412.7 MiB → 318.2 MiB (reclaimed 94.5 MiB)
It runs SQLite's VACUUM followed by a WAL checkpoint, rewriting the database
into a compact form.
Run it while unmounted
VACUUM needs a write lock on the store and rewrites the whole file. Run it when
nothing else is using the database — no mount, no scan. If the store is in use,
the command fails with an actionable error rather than fighting for the lock:
error: the store is in use — unmount the filesystem or stop any scan before vacuuming
Notes
- Full rewrite. Each run rewrites the entire database and transiently needs free disk space roughly equal to the store size (it builds a complete copy before swapping). Running it again on an already-compact store is safe and reports (already compact).
- May upgrade the schema. Like every musefs command that opens the store for writing, vacuum migrates an older store to the current schema version before compacting.
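Since a run transiently needs about one store's worth of free space, a rough pre-check is easy to script. This is a sketch using only POSIX tools; has_room_for_vacuum is a hypothetical helper, and it ignores any WAL file sitting next to the store, so treat a pass as a hint rather than a guarantee.

```shell
# Succeed if the filesystem holding the store has at least the store's
# own size available. Returns 2 if the store cannot be read.
has_room_for_vacuum() {
    db=$1
    store_bytes=$(wc -c < "$db") || return 2
    # df -P prints one header line, then one data line; field 4 is free KiB.
    free_kib=$(df -Pk "$(dirname "$db")" | awk 'NR == 2 { print $4 }')
    # Compare in KiB, rounding the store size up.
    [ "$free_kib" -ge $(( (store_bytes + 1023) / 1024 )) ]
}

# Usage (not run here):
#   has_room_for_vacuum library.db && musefs vacuum --db library.db
```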
Ownership, permissions & config
Ownership and permissions
By default the mount presents the launching process's uid/gid and read-only
permission bits (555 dirs, 444 files), and is reachable only by the user who
performed the mount (and root).
To present a different owner — e.g. a media-server service account — and let that
account actually reach the mount, pass --owner/--group (or --allow-other).
Either makes musefs mount with allow_other and default_permissions: other
users can traverse the mount, and the kernel enforces the presented owner/mode
bits instead of ignoring them.
| Flag | Default | What it does |
|---|---|---|
| --owner <NAME\|UID> | process uid | User presented as the owner of every entry. Accepts a username or a numeric uid. Implies --allow-other. |
| --group <NAME\|GID> | process gid | Group presented for every entry. Accepts a group name or a numeric gid. Implies --allow-other. |
| --allow-other | off | Mount with allow_other + default_permissions so accounts other than the mounting user can reach the mount and the owner/mode bits are enforced. Implied by --owner/--group. |
| --file-mode <OCTAL> | 444 | Permission bits for regular files, in octal. The mount is read-only, so any write bits you set are only advertised; writes still fail with EROFS. |
| --dir-mode <OCTAL> | 555 | Permission bits for directories, in octal. |
The default 444/555 bits are world-readable, so any account can read once
allow_other is on. To restrict the mount to the presented owner/group, drop the
world bits (e.g. --file-mode 440 --dir-mode 550) — only then does
--owner/--group gate access rather than merely label it.
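Whether a given mode still leaves the mount open to everyone comes down to the lowest octal digit. A tiny sketch of that check; world_accessible is a hypothetical helper and expects the mode as plain octal digits, as passed to --file-mode or --dir-mode.

```shell
# Succeed if an octal mode grants any permission to "other" users.
world_accessible() {
    # The leading 0 makes the shell read the digits as octal.
    [ $(( 0$1 & 7 )) -ne 0 ]
}

world_accessible 444 && echo "444: readable by any account once allow_other is on"
world_accessible 440 || echo "440: restricted to the presented owner/group"
```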
Non-root mounts need user_allow_other. When you are not root, libfuse
refuses an allow_other mount unless /etc/fuse.conf contains a line
user_allow_other. musefs checks this before mounting and fails with an
explanatory error if it is missing; add the line to /etc/fuse.conf, or run
musefs as root. (This is libfuse/system policy, not a musefs restriction.) The
published container images already include this line, so non-root allow_other
mounts work out of the box there.
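The same condition can be checked by hand before starting a non-root mount. The sketch below approximates the check described above and is not musefs's own code; has_user_allow_other is a hypothetical helper that looks for an uncommented user_allow_other line.

```shell
# Succeed if the given fuse.conf-style file (default /etc/fuse.conf)
# contains an active, uncommented user_allow_other line.
has_user_allow_other() {
    conf=${1:-/etc/fuse.conf}
    [ -r "$conf" ] &&
        grep -Eq '^[[:space:]]*user_allow_other[[:space:]]*$' "$conf"
}

# Usage:
#   has_user_allow_other || echo "add user_allow_other to /etc/fuse.conf"
```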
--allow-other grants other users — but not root. A FUSE mount made with
allow_other (not allow_root) is reachable by other unprivileged users, yet
root specifically cannot traverse or stat it when it is owned by another
user. This surprises root-run tooling (Ansible, boot scripts):
- mountpoint -q <mnt> / stat <mnt> run as root report it as not a mountpoint: they try to stat through the mount and get EACCES. Detect the mount from root with findmnt <mnt> or /proc/mounts instead, which read the mount table rather than the filesystem.
- Don't have root manage the mountpoint directory while it is mounted: a root task that re-asserts the directory (e.g. Ansible file: state=directory) fails with EACCES/EEXIST on every run after the first. Create the directory before mounting, or run such tasks as the mounting user.
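For root-run scripts, reading the mount table directly avoids the EACCES trap. A minimal sketch follows; in_mount_table is a hypothetical helper, and it does not decode the \040-style escapes that /proc/mounts uses for spaces in paths.

```shell
# Succeed if $1 appears as a mount target in the mount table.
# $2 is the table to read (defaults to /proc/mounts).
in_mount_table() {
    awk -v target="$1" '
        $2 == target { found = 1 }
        END { exit !found }' "${2:-/proc/mounts}"
}

# Usage (as root, instead of mountpoint -q):
#   in_mount_table /srv/musefs-mnt && echo "musefs is mounted"
```

findmnt <mnt> gives the same answer where util-linux is available; the awk form only needs /proc.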
Configuring with environment variables
Every scalar mount and scan flag can also be set with a MUSEFS_*
environment variable — uppercase the long flag and turn dashes into
underscores (e.g. --poll-interval-ms → MUSEFS_POLL_INTERVAL_MS, the
mount mountpoint → MUSEFS_MOUNTPOINT). An explicit flag always overrides
its env var, which overrides the default. Boolean flags (e.g.
MUSEFS_KEEP_CACHE, MUSEFS_REVALIDATE, MUSEFS_FOLLOW_SYMLINKS,
MUSEFS_QUIET, MUSEFS_ALLOW_OTHER, MUSEFS_CASE_INSENSITIVE,
MUSEFS_EXPOSE_METRICS, MUSEFS_FAST, MUSEFS_STRICT) accept a
case-insensitive boolish value — true/false, yes/no, on/off,
1/0 — and reject anything else. The repeatable --fallback and the
scan targets are command-line only. See
contrib/systemd/musefs.conf.example
for a commented example covering the common settings.
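The naming rule and the boolish values are mechanical enough to sketch. Both functions below are hypothetical helpers that restate the rules in this section; they are not how musefs itself parses its environment.

```shell
# Derive the variable name for a long flag: strip the leading dashes,
# uppercase, and turn dashes into underscores.
flag_to_env() {
    name=${1#--}
    printf 'MUSEFS_%s\n' "$(printf '%s' "$name" | tr 'a-z-' 'A-Z_')"
}

# Normalise a boolish value to true/false; fail on anything else.
parse_boolish() {
    case $(printf '%s' "$1" | tr 'A-Z' 'a-z') in
        true|yes|on|1)  echo true ;;
        false|no|off|0) echo false ;;
        *) return 1 ;;
    esac
}

flag_to_env --poll-interval-ms   # prints: MUSEFS_POLL_INTERVAL_MS
parse_boolish YES                # prints: true
```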
These variables are read the same way no matter how musefs is launched:
exported into the shell before running the binary directly
(MUSEFS_DB=… musefs mount), set via a systemd EnvironmentFile= or
Environment= directive, or passed into a container with -e/--env-file.
The configuration surface is identical across all three; the sections below
just show the per-deployment wiring.
Running as a systemd user service
To run musefs on the host at login, drop-in units live in
contrib/systemd/: a musefs.service mount daemon, an
optional musefs-scan.timer for periodic re-scans, and a commented
musefs.conf.example holding every MUSEFS_* setting. Copy the units to
~/.config/systemd/user/, copy the config to ~/.config/musefs/musefs.conf,
edit MUSEFS_MOUNTPOINT and MUSEFS_DB, then
systemctl --user enable --now musefs.service. See
the systemd integration guide for the full
walkthrough and the PATH / linger gotchas.
FAQ
Does musefs ever write to my audio files? No. The mount is read-only and the scanner only reads. The served files are assembled on the fly: generated metadata plus positioned reads of your originals. Nothing is ever copied or rewritten.
Where do my edited tags live?
In the SQLite store (--db). Edit it with the
beets or Picard
plugins, the Lidarr integration, or with plain
SQL — the schema is a documented, stable contract
(see the SQLite store).
Do edits show up without remounting?
Yes. The mount polls the database (debounced) and picks up external commits automatically, with stable inodes across refreshes, so even files held open keep working.
Can I write through the mount?
No, and it's not planned. Out-of-band editing against the store is the design: it's what guarantees your originals can never be corrupted.
Is it fast enough for a big library on a NAS?
That's the design target: synthesized headers are cached, blocking reads run
on a worker pool so a slow disk never stalls the filesystem, and read-ahead,
cache TTLs, and poll intervals are all tunable. In
structure-only mode on kernel 6.9+, reads can bypass the daemon entirely
via FUSE passthrough (needs CAP_SYS_ADMIN).
A file in the mount won't open / reads error — why?
The most common cause is a backing file that changed since its last scan
(musefs refuses to serve a file whose size, mtime, or ctime drifted, rather
than splice at stale offsets). Run musefs scan --revalidate to re-probe it.
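The drift check described above amounts to comparing a recorded stat triple against the file's current one. A minimal sketch, assuming the store keeps size, mtime, and ctime at nanosecond resolution (the ScanStamp type and the function names are hypothetical, not musefs's schema):

```python
import os
from typing import NamedTuple


class ScanStamp(NamedTuple):
    """What a scan would record about a backing file (hypothetical shape)."""
    size: int
    mtime_ns: int
    ctime_ns: int


def stamp(path: str) -> ScanStamp:
    """Capture the current size/mtime/ctime of a file."""
    st = os.stat(path)
    return ScanStamp(st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def has_drifted(path: str, recorded: ScanStamp) -> bool:
    """True when any of the three recorded values no longer matches the file."""
    return stamp(path) != recorded
```

A file that has drifted would be refused until a re-scan records a fresh stamp, which is the role musefs scan --revalidate plays in the answer above.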
Supported formats
musefs synthesizes fresh metadata for each supported container while serving the original audio bytes verbatim. Each format has its own page for the exact synthesis behavior and lossy edges.
| Format | Extensions | What is synthesized |
|---|---|---|
| FLAC | .flac | Regenerates the metadata blocks; preserves STREAMINFO/SEEKTABLE bit-exact |
| MP3 | .mp3 | Regenerates the ID3v2.4 tag; audio frames (incl. Xing/LAME) untouched |
| M4A | .m4a, .m4b | Rebuilds the moov atom, patching chunk offsets; mdat served verbatim |
| Ogg | .ogg, .oga, .opus | Regenerates header pages; audio pages verbatim, only page seq/CRC patched in place |
| WAV | .wav | Regenerates the RIFF front (LIST/INFO + embedded ID3v2); data payload verbatim |
FLAC
How musefs scans and synthesizes native FLAC files (.flac). FLAC inside an
Ogg container is a different beast — see Ogg. For the segment
model these layouts plug into, see
the segment model.
What round-trips
- All text tags. Canonical keys (title, artist, albumartist, date, tracknumber, …) map to their conventional Vorbis field names via the shared vocabulary (musefs-format/src/tagmap.rs); any other field round-trips verbatim by its own name. Multi-value fields keep their order. User-defined keys that are not legal Vorbis field names (empty, containing =, control characters, or non-ASCII bytes, i.e. outside ASCII 0x20–0x7D minus =) are dropped on synthesis and logged; they cannot round-trip by name.
- Binary metadata blocks. APPLICATION and CUESHEET blocks are captured at scan time as binary tags (an APPLICATION payload includes its 4-byte application id) and re-emitted on synthesis, streamed from the DB rather than held in memory.
- Embedded pictures. Each PICTURE block round-trips with its MIME type, picture type, description, and dimensions; image bytes are stored content-addressed and streamed at read time.
- Structural blocks. STREAMINFO and SEEKTABLE are preserved bit-exact. They are captured into the read-only structural_blocks store at scan time (external tools must not edit them) and re-emitted on synthesis.
Lossy edges
- PADDING blocks are dropped: the synthesized file carries no padding.
- Metadata blocks of unknown/reserved types are dropped at scan time.
- A PICTURE block whose picture type falls outside the standard 0–20 range is clamped to 0 (Other) at scan time, matching the store's track_art.picture_type CHECK. This shared PICTURE parser also serves FLAC-in-Ogg, so the same clamp applies there.
- The VORBIS_COMMENT vendor string is replaced with musefs's own.
- Vorbis field names are case-insensitive by spec; musefs re-emits canonical keys under their conventional uppercase names and upper-cases unknown field names. A field stored as MixedCase comes back as MIXEDCASE: same field to a conforming reader, different bytes.
How synthesis works
flac::synthesize_layout (musefs-format/src/flac.rs) builds the layout in
this order — an inline metadata region, DB-streamed payloads, then the
untouched audio:
offset 0
┌──────────────────────────────────────────────┐ ┐
│ █ "fLaC" marker (Inline) │ │
│ █ STREAMINFO / SEEKTABLE, bit-exact (Inline) │ │ generated
│ █ VORBIS_COMMENT rebuilt from DB (Inline) │ │ metadata
│ ▒ APPLICATION / CUESHEET bodies (BinaryTag) │ │ region
│ █ PICTURE framing + ▒ image bytes (ArtImage) │ │
├──────────────────────────────────────────────┤ ┘
│ ░ audio frames, verbatim (BackingAudio) │
└──────────────────────────────────────────────┘
EOF █ inline-generated ▒ DB-streamed ░ untouched backing
- Inline: the fLaC marker plus the preserved structural blocks (STREAMINFO, SEEKTABLE, sorted by block type) and a VORBIS_COMMENT block regenerated entirely from the DB tag rows.
- BinaryTag: one segment per stored APPLICATION/CUESHEET block, streamed from the DB at read time.
- ArtImage: one PICTURE block per linked art row; the block framing is inline, the image bytes stream from the blob store.
- BackingAudio: the original audio frames, served by positioned reads at the stored audio_offset/audio_length.
Structural blocks normally come from the structural_blocks store. A
database scanned before that store existed has no rows there; synthesis then
falls back to re-reading the file's front for every preserved block
(carrying APPLICATION/CUESHEET inline and suppressing the streamed
binary tags so nothing is emitted twice). A re-scan upgrades the track to
the streamed path.
Quirks & invariants
- The audio frames are never touched: the backing segment starts exactly at the scanned audio offset, and the byte-identical-audio property is asserted by musefs-format/tests/proptest_flac.rs and the mutagen interop suite (musefs-core/tests/interop_emit.rs).
- Synthesis re-parses its own inline output in tests (flac_tag_roundtrip_is_stable): the regenerated front must be a valid FLAC metadata region whose computed audio boundary equals the layout's header length.
- Block-body sizes are bounded at parse time (MAX_BLOCK_BODY); a crafted file cannot force a huge allocation.
- The parser now rejects (at scan and synthesis) any FLAC whose metadata does not begin with exactly one 34-byte STREAMINFO block; a crafted store providing malformed structural rows fails synthesis with a controlled error rather than emitting decoder-rejected output.
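To make the STREAMINFO invariant and the "computed audio boundary" concrete, here is a sketch of walking a FLAC metadata region. It relies only on the FLAC block-header layout (one byte holding the last-block flag and a 7-bit type, then a 24-bit big-endian length); the function name and error messages are illustrative, not musefs's.

```python
def flac_audio_offset(front: bytes) -> int:
    """Walk the metadata blocks and return the offset of the first audio frame.

    Raises ValueError unless the region starts with the fLaC marker followed by
    exactly one STREAMINFO block (type 0, 34 bytes) in first position.
    """
    if front[:4] != b"fLaC":
        raise ValueError("missing fLaC marker")
    pos, index = 4, 0
    while True:
        if pos + 4 > len(front):
            raise ValueError("truncated block header")
        is_last = bool(front[pos] & 0x80)    # high bit: last metadata block
        block_type = front[pos] & 0x7F       # low 7 bits: block type
        length = int.from_bytes(front[pos + 1:pos + 4], "big")
        if index == 0 and (block_type != 0 or length != 34):
            raise ValueError("first block is not a 34-byte STREAMINFO")
        if index > 0 and block_type == 0:
            raise ValueError("more than one STREAMINFO block")
        pos += 4 + length
        if pos > len(front):
            raise ValueError("block body runs past the buffer")
        index += 1
        if is_last:
            return pos
```

For a region holding STREAMINFO plus a 3-byte final block, the returned boundary is 4 + (4 + 34) + (4 + 3) = 49, which is the kind of value the round-trip test compares against the layout's header length.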
MP3
How musefs scans and synthesizes MP3 files (.mp3) and their ID3v2 metadata.
For the segment model these layouts plug into, see
the segment model. The ID3v2 builder
described here is shared with WAV's embedded id3 chunk — see
WAV.
What round-trips
- Canonical text tags (title, artist, albumartist, date, tracknumber, …) map to their standard ID3v2 text frames (TIT2, TPE1, TPE2, TDRC, TRCK, …) via the shared vocabulary (musefs-format/src/tagmap.rs). NUL-separated multi-value frames yield one tag row per value and are re-emitted NUL-separated in a single frame.
- Vocabulary TXXX keys (ReplayGain fields, MusicBrainz album/artist ids) round-trip through TXXX frames with their fixed, exact-case descriptions (e.g. MusicBrainz Album Id).
- Unmapped standard text frames round-trip keyed by their own frame id: a TSSE (or a legacy v2.3 TYER) comes back as the same frame inside the synthesized tag.
- Other user-defined keys round-trip as TXXX frames keyed by their own description, original casing preserved.
- Comments and lyrics (COMM/USLT): one tag row per frame. A frame with a placeholder language (XXX/und/empty) and no descriptor folds to the shared comment/lyrics key; one carrying a real language or descriptor is keyed id3:COMM:<lang>:<desc> / id3:USLT:<lang>:<desc> so per-language or description-keyed frames stay distinct, and both fields are restored on synthesis.
- Ratings and play counts: a POPM frame is promoted at scan time to rating (the raw 0–255 byte) and playcount (omitted when 0) text tags, and rebuilt as a POPM frame on synthesis.
- MusicBrainz track id: a UFID frame with the http://musicbrainz.org owner is promoted to musicbrainz_trackid and rebuilt with the same owner.
- Opaque binary frames, byte-exact: PRIV, GEOB, SYLT, MCDI, URL (W***) frames, non-MusicBrainz UFIDs, and unknown frames are captured verbatim (frame id + raw body) and re-emitted streamed from the DB (BinaryTag segments), never held in memory.
- Embedded pictures (APIC): MIME type, picture type, and description round-trip; image bytes are stored content-addressed and streamed.
Lossy edges
- The synthesized tag is always ID3v2.4, regardless of the source tag's version (v2.2/v2.3 tags are parsed but never re-emitted as such).
- A COMM/USLT frame folded to the shared comment/lyrics key (placeholder language, no descriptor) is re-emitted with language XXX and an empty descriptor, so a source und placeholder comes back as XXX. Frames carrying a real language or descriptor are preserved (see above).
- POPM: the owner ("email to user") field is dropped by design. Multiple POPM frames collapse to one (first rating wins, last parseable play count wins); counters above u32::MAX clamp to 4 bytes.
- ID3v1 is not read. A file whose only tag is ID3v1 scans with no tags (populate the DB via beets/Picard instead). A trailing ID3v1 tag is also excluded from the audio region, so the synthesized file does not carry it.
- The audio locator validates the ID3v2 major version (2–4) and rejects synchsafe size bytes with the high bit set, producing a controlled Malformed error rather than mask-decoding an invalid offset. Tags using unsynchronisation or an extended header still scan, because their declared size already covers the audio boundary.
- Scan-time tag extraction is skipped entirely (by a deliberate denial-of-service guard, see below) for tags using unsynchronisation, an extended header, non-zero frame flags (compression/encryption), malformed synchsafe size fields, or containing CHAP/CTOC chapter frames. Such files still mount and serve; they just contribute no scanned tags.
- ID3v2.2 binary frames are not extracted (3-char ids; text and art still parse). APIC width/height are not recorded at scan time.
- An APIC picture type outside the standard 0–20 range (the id3 crate's Undefined(u8) variant can exceed 20) is clamped to 0 (Other) at scan time, matching the store's track_art.picture_type CHECK.
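The audio-locator rule above (validate the major version, reject synchsafe bytes with the high bit set) can be sketched as follows. This follows the ID3v2 header layout (3-byte "ID3", major, revision, flags, 4 synchsafe size bytes); the function names are hypothetical and the optional ID3v2.4 footer is ignored for brevity.

```python
def decode_synchsafe(size_bytes: bytes) -> int:
    """Decode a 4-byte synchsafe integer (7 payload bits per byte).

    A byte with its high bit set is invalid; it is rejected rather than masked.
    """
    if len(size_bytes) != 4:
        raise ValueError("synchsafe size must be 4 bytes")
    if any(b & 0x80 for b in size_bytes):
        raise ValueError("malformed synchsafe size (high bit set)")
    value = 0
    for b in size_bytes:
        value = (value << 7) | b
    return value


def id3v2_audio_start(header: bytes) -> int:
    """Return the offset just past a leading ID3v2 tag, given its 10-byte header."""
    if len(header) < 10 or header[:3] != b"ID3":
        raise ValueError("no ID3v2 header")
    major = header[3]
    if not 2 <= major <= 4:
        raise ValueError(f"unsupported ID3v2 major version {major}")
    return 10 + decode_synchsafe(header[6:10])
```

Rejecting instead of masking matters because a masked decode would silently yield a plausible but wrong audio offset.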
How synthesis works
mp3::synthesize_layout (musefs-format/src/mp3.rs) emits a fresh ID3v2.4
tag followed by the untouched audio:
offset 0
┌──────────────────────────────────────────────┐ ┐
│ █ ID3v2.4 header (10 bytes) (Inline) │ │
│ █ text / TXXX / COMM / USLT frames (Inline) │ │ generated
│ █ rebuilt POPM / UFID frames (Inline) │ │ ID3v2.4
│ █ frame header + ▒ opaque body (BinaryTag) │ │ tag
│ █ APIC framing + ▒ image bytes (ArtImage) │ │
├──────────────────────────────────────────────┤ ┘
│ ░ MPEG audio incl. Xing/LAME, (BackingAudio) │
│ ░ verbatim │
└──────────────────────────────────────────────┘
EOF █ inline-generated ▒ DB-streamed ░ untouched backing
- Inline: the 10-byte tag header, all text/TXXX/COMM/USLT frames, and the rebuilt POPM/UFID frames. Frame sizes are synchsafe-bounded; oversized frames fail synthesis rather than emit a corrupt tag.
- Per picture: inline APIC framing + an ArtImage segment streaming the image bytes.
- Per opaque binary frame: an inline frame header + a BinaryTag segment streaming the body from the DB (empty payloads are skipped because they would fail layout validation).
- BackingAudio: the audio region located at scan time, i.e. everything after the leading ID3v2 tag and before a trailing ID3v1 tag, anchored by an MPEG frame-sync check. The Xing/LAME info frame is an MPEG frame, so it travels with the audio untouched.
Quirks & invariants
- The OOM guard (id3v2_alloc_safe): the id3 parser crate eagerly allocates a frame's declared size (v2.3 sizes are plain 32-bit, up to 4 GiB), so musefs validates every frame bound itself before handing a buffer to the crate, and refuses tags it cannot validate. Found and locked in by the mp3 fuzz target; the conservative skips listed under "Lossy edges" are this guard.
- Byte-identical audio and tag round-trip stability are asserted by musefs-format/tests/proptest_mp3.rs and the mutagen interop suite (musefs-core/tests/interop_emit.rs).
M4A
How musefs scans and synthesizes MP4-container audio (.m4a, .m4b). Only
unfragmented files with exactly one track, and that track audio (soun), are
accepted; anything else is skipped at scan time. For the segment model these
layouts plug into, see the segment model.
What round-trips
- Canonical text tags map to their standard ilst atoms (©nam, ©ART, aART, ©alb, ©day, …) via the shared vocabulary (musefs-format/src/tagmap.rs).
- Vocabulary freeform keys (ReplayGain fields, MusicBrainz album/artist ids, ISRC, COPYRIGHT, …) round-trip through ---- freeform atoms under the com.apple.iTunes mean, matched case-insensitively.
- Other text freeform atoms round-trip keyed by their verbatim name, original casing preserved.
- Track and disc numbers, with totals: the binary trkn/disk atoms are decoded to tracknumber/discnumber as "N" or "N/M" (the "N of M" total, matching ID3 TRCK/TPOS) and rebuilt as binary atoms with the total filled in.
- Integer atoms: tmpo/cpil/pgap map to the canonical bpm/compilation/gapless keys (shared with ID3 TBPM/TCMP and Vorbis) and are rebuilt as type-21 integer atoms.
- Multi-value atoms: every data sub-box of an atom is read (the iTunes multiple-data convention), so a multi-valued atom round-trips all its values, not just the first.
- Opaque binary freeform atoms, byte-exact: a ---- atom whose payload is binary-typed is captured verbatim under the key ----:<mean>:<name> (so the mean survives) and re-emitted streamed from the DB (BinaryTag segment).
- Cover art: every data child of a covr atom (the iTunes multiple-artwork convention) is ingested; synthesis emits one covr atom with one data child per stored art row, in order, image bytes streamed.
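As an illustration of the track/disc bullet, the sketch below decodes and rebuilds the binary value of a trkn or disk atom. It assumes the layout commonly written by iTunes-style taggers (two reserved bytes, a 16-bit number, a 16-bit total, and for trkn two trailing reserved bytes); the helper names are hypothetical and musefs's own decoder may accept more variants.

```python
import struct


def decode_pair(payload: bytes) -> str:
    """Decode a trkn/disk data payload to "N" or "N/M"."""
    if len(payload) < 6:
        raise ValueError("trkn/disk payload too short")
    _reserved, number, total = struct.unpack(">3H", payload[:6])
    return f"{number}/{total}" if total else str(number)


def encode_pair(text: str, trailing_pad: bool) -> bytes:
    """Rebuild the binary payload; trailing_pad=True gives the 8-byte trkn form."""
    number, _, total = text.partition("/")
    body = struct.pack(">3H", 0, int(number), int(total or 0))
    return body + (b"\x00\x00" if trailing_pad else b"")
```

Under these assumptions, "3/12" encodes to an 8-byte trkn payload and decodes back to "3/12", while a bare "7" leaves the total field at zero.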
Lossy edges
- A text freeform atom under a mean other than com.apple.iTunes is re-emitted with the com.apple.iTunes mean (the scan keys text freeform by name only). Binary freeform atoms keep their mean via the ----:<mean>:<name> key.
- Binary ilst atoms outside the handled set (trkn/disk, the tmpo/cpil/pgap integer atoms, and ---- freeform) are dropped at scan time, since they are not re-emitted on synthesis.
- covr ingestion accepts only JPEG (type 13) and PNG (type 14) artwork; other type codes are skipped. MP4 has no picture-type or description fields: scanned art becomes "front cover" with an empty description, and any non-PNG stored art is emitted with the JPEG type code.
- A covr image or binary ---- value larger than its size cap is skipped at scan time, before the image is materialized out of a potentially large moov, and logged (a warn line on stderr) so the lossy drop is explained rather than silent.
How synthesis works
mp4::synthesize_layout (musefs-format/src/mp4.rs) regenerates the moov
box and serves [ftyp][regenerated moov][mdat header][mdat payload]:
offset 0
┌──────────────────────────────────────────────┐ ┐
│ █ ftyp, copied verbatim (Inline) │ │
│ █ moov: kept structural children, (Inline) │ │ regenerated
│ █ stco/co64 offset values += Δ │ │ front
│ █ fresh udta/meta/ilst framing (Inline) │ │
│ █ ---- framing + ▒ freeform body (BinaryTag) │ │
│ █ covr framing + ▒ image bytes (ArtImage) │ │
│ █ mdat header (Inline) │ │
├──────────────────────────────────────────────┤ ┘
│ ░ mdat payload, verbatim (BackingAudio) │
└──────────────────────────────────────────────┘
EOF █ inline-generated ▒ DB-streamed ░ untouched backing
Δ = new mdat payload offset − old
- The scan keeps moov's structural children and drops its old udta. A fresh udta/meta/ilst is built from the DB: inline box framing, with each opaque ---- value and each cover image spliced in as streamed BinaryTag/ArtImage segments. Every enclosing box size accounts for the streamed lengths, so the spliced bytes land exactly where the sizes say.
- The mdat payload is served verbatim (BackingAudio), merely relocated: every chunk offset in stco (32-bit) or co64 (64-bit) shifts by one constant delta. Only offset values are patched, never box sizes, so the new moov size is computable before the delta, with no circular dependency. A 32-bit stco offset that would overflow fails synthesis rather than corrupt.
- A moov that sits after mdat (common for faststart-less files) is handled by a streaming reader that skips the mdat payload, so the potentially hundreds-of-MB payload is never read at resolve time.
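The constant-delta patch is easy to show on a single stco table. A minimal sketch, assuming the standard stco body layout (4 bytes of version/flags, a 32-bit entry count, then 32-bit big-endian offsets); patch_stco is a hypothetical name, not the function in mp4.rs.

```python
import struct


def patch_stco(body: bytes, delta: int) -> bytes:
    """Shift every 32-bit chunk offset by delta without changing the body length."""
    (count,) = struct.unpack_from(">I", body, 4)
    if len(body) != 8 + 4 * count:
        raise ValueError("stco entry count does not match body length")
    patched = []
    for offset in struct.unpack_from(f">{count}I", body, 8):
        shifted = offset + delta
        if not 0 <= shifted <= 0xFFFFFFFF:
            # Fail rather than wrap: a wrapped offset would point at the wrong bytes.
            raise OverflowError("chunk offset does not fit in 32 bits")
        patched.append(shifted)
    return body[:8] + struct.pack(f">{count}I", *patched)
```

Because the output has the same length as the input, the enclosing box sizes are unaffected, which is why the new moov size can be computed before the delta is known.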
Quirks & invariants
- The structural metadata read at resolve time is capped (MAX_MP4_METADATA_BYTES, 256 MiB); a file declaring more is refused with a controlled error instead of ballooning memory.
- MP4 box sizes are 32-bit: oversized synthesized metadata (e.g. enormous art) fails with TooLarge at the format boundary rather than emitting a truncated size field.
- Byte-identical audio and structural validity are asserted by musefs-format/tests/proptest_mp4.rs, an offset-patching oracle test (mp4_oracle.rs), and the mutagen interop suite (musefs-core/tests/interop_emit.rs).
Ogg (Opus / Vorbis / FLAC-in-Ogg)
How musefs scans and synthesizes Ogg files (.ogg, .oga, .opus) carrying
an Opus, Vorbis, or FLAC logical bitstream. Multiplexed and chained Ogg is
detected and skipped at scan time: within the header region every page must
share the first page's serial, and only the first page may carry
beginning-of-stream. For the segment model these layouts plug into, see
the segment model. Native FLAC files
are covered by FLAC.
The Ogg invariant
Original Ogg packet payload bytes are preserved during synthesis; page sequence numbers and CRCs may be patched intentionally. Synthesis regenerates the logical bitstream's header pages (to embed fresh tags and art), which changes the header page count; the audio pages that follow are served verbatim except that each page header's sequence number is shifted by a constant delta and its CRC recomputed in place. The served audio byte length is unchanged — renumbering patches, never recopies.
Verified by musefs-format/tests/proptest_ogg.rs (crate feature fuzzing),
read_at integration tests comparing source and synthesized audio payloads
(musefs-core/src/reader.rs test modules), and the mutagen interop suite
(musefs-core/tests/interop_emit.rs).
What round-trips
- All text tags. VorbisComments are rebuilt from the DB through the same builder as FLAC: canonical keys map to their conventional field names via the shared vocabulary (musefs-format/src/tagmap.rs); any other field round-trips verbatim by its own name, in order, multi-values included. User-defined keys outside the Vorbis field-name grammar (empty, containing =, control characters, or non-ASCII, i.e. outside ASCII 0x20–0x7D minus =) are dropped on synthesis and logged.
- Embedded pictures, with MIME type, picture type, description, and dimensions, in both art encodings (see below).
- Codec headers. The identification packet (OpusHead, Vorbis identification, the OggFLAC STREAMINFO carrier) and any trailing header packets (e.g. the Vorbis setup packet) are preserved; only the comment metadata is regenerated.
Lossy edges
- The VorbisComment vendor string is replaced with musefs's own.
- Vorbis field names are case-insensitive by spec; canonical keys come back under their conventional uppercase names and unknown field names are upper-cased on synthesis.
- Ogg carries no binary-tag slot: only text comments and pictures exist, so there is nothing else to preserve.
- Embedded pictures are parsed through FLAC's PICTURE block reader, so a picture type outside the standard 0–20 range is clamped to 0 (Other) at scan time, matching the store's track_art.picture_type CHECK.
- Embedded picture descriptions are right-padded with up to two trailing spaces. The FLAC PICTURE block is built with its description padded so the prefix length (32 + mime.len() + description.len(), i.e. everything before the image bytes) is a multiple of 3 (picture_prefix, musefs-format/src/ogg/mod.rs), which is what makes base64(prefix ++ image) == base64(prefix) ++ base64(image) and lets the image's base64 be served as an independent, incrementally-streamable substring (the art split described under "How synthesis works"). Padding the description is the safe place to do it, because the MIME type must stay a valid type. So a synthesized picture's description can differ from the original by up to two trailing spaces; this applies to Opus/Vorbis and OggFLAC alike, since both build the block body the same way.
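The padding rule is easy to check by building a PICTURE prefix by hand. The sketch below follows the FLAC PICTURE field order (type, MIME length and bytes, description length and bytes, width, height, depth, colour count, image length, all 32-bit big-endian); the function here is an illustrative stand-in for picture_prefix, not its actual signature.

```python
import base64
import struct


def padded_picture_prefix(picture_type: int, mime: str, description: str,
                          image_len: int, width: int = 0, height: int = 0,
                          depth: int = 0, colors: int = 0) -> bytes:
    """Everything in a FLAC PICTURE block body that precedes the image bytes,
    with the description space-padded so the total length is a multiple of 3."""
    mime_b = mime.encode("ascii")
    desc_b = description.encode("utf-8")
    desc_b += b" " * (-(32 + len(mime_b) + len(desc_b)) % 3)  # 0, 1, or 2 spaces
    return (struct.pack(">II", picture_type, len(mime_b)) + mime_b
            + struct.pack(">I", len(desc_b)) + desc_b
            + struct.pack(">5I", width, height, depth, colors, image_len))


image = b"\xff\xd8\xff\xe0abc"
prefix = padded_picture_prefix(3, "image/jpeg", "cover", len(image))
# A prefix whose length is a multiple of 3 encodes without '=' padding, so the
# two base64 strings concatenate into the encoding of the whole block body.
whole = base64.b64encode(prefix + image)
split = base64.b64encode(prefix) + base64.b64encode(image)
```

Here 32 + 10 + 5 = 47, so one space is appended and the prefix is 48 bytes; whole and split are then identical byte strings.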
How synthesis works
ogg::synthesize_layout (musefs-format/src/ogg/mod.rs) produces:
offset 0
┌──────────────────────────────────────────────┐ ┐
│ █ identification page, preserved (Inline) │ │ regenerated
│ █ comment page(s) rebuilt from DB (Inline) │ │ header
│ ▒ art windows, base64/raw (OggArtSlice) │ │ pages
│ █ trailing header pages, preserved (Inline) │ │ (repaginated)
├──────────────────────────────────────────────┤ ┘
│ ░ audio pages: payload verbatim, (OggAudio) │
│ ░ page seq += Δ, CRC repatched in place │
└──────────────────────────────────────────────┘
EOF █ inline-generated ▒ DB-streamed
░ backing pages (headers patched in place, payload untouched)
Δ = synthesized header page count − original
- Inline: the regenerated header pages: the preserved identification packet, a comment packet rebuilt from the DB, and the preserved trailing header packets, repaginated with correct CRCs.
- The art split. Opus and Vorbis embed art as base64 METADATA_BLOCK_PICTURE comments (the decoded bytes are a FLAC PICTURE block body): each image is an OggArtSlice run, a window of base64(image) encoded incrementally at read time from the blob store, never materialized whole. Artwork is streamed at synthesis time: page CRCs are computed from page-bounded ArtSource windows, and the full image and its base64 copy are never materialized. FLAC-in-Ogg instead carries one native FLAC PICTURE block packet per image (raw OggArtSlice runs, no base64); the last metadata packet's last-block flag and packet 0's 16-bit following-packet count are recomputed to match. Art exceeding MAX_ART_BYTES (16 MiB − 64 KiB) is rejected by the store's CHECK, with a resolve-time cap backstopping a writer that disables check enforcement.
- OggAudio: one compact segment covering all original audio pages, with the page-count delta to apply to every sequence number.
At read time there is no in-memory page index: the page containing a
requested offset is found by a bounded backward scan (CRC-validated), then
pages are walked forward with each header patched algebraically and payload
bytes served by exact positioned reads. A one-page memo on the resolved file
short-circuits the scan for sequential reads. A page walk that overruns the
scanned audio bounds is a hard Malformed error — corrupt or misaligned
data is refused, not served. Synthesized page sequence numbers wrap modulo
2³² (matching Ogg's u32 sequence field), so files whose audio pages have
very high sequence numbers serve correctly rather than failing the read.
The forward page-walk reads (serve_ogg_window) flow through the shared backing
read-ahead buffer (BackingReader, see
backing read-ahead) just like PCM
BackingAudio reads, so a sequential Ogg stream amortizes backing latency the
same way. The read-ahead cache holds raw backing bytes keyed by absolute
offset, so it is orthogonal to header patching: the algebraic CRC/sequence
rewrite happens on the bytes after they are read, and the cache never sees a
patched page. (The backward find_page_start scan and its CRC check stay on the
raw fd — they are short, non-sequential probes that the forward-streaming window
would not help.)
CRC patching: the linear-CRC trick
This is the neatest thing in the Ogg path. Every Ogg page carries a CRC-32 over
its entire contents — header and payload, with the 4-byte CRC field
treated as zero during the computation (musefs-format/src/ogg/crc.rs).
Renumbering shifts every audio page's sequence number by Δ, which changes 4
header bytes (offsets 18..22). Naively, repairing the CRC means re-checksumming
the whole page — including the up-to-64 KB payload that musefs has gone out of
its way never to pull into memory.
It doesn't have to. The Ogg CRC uses init 0, no input/output reflection, and no
final XOR, which makes it linear over GF(2): for two equal-length messages,
crc32(A ⊕ B) == crc32(A) ⊕ crc32(B). Take A = the original page and B = a
delta page the same length as the original but all zeros except bytes 18..22,
which hold old_seq ⊕ new_seq. Then A ⊕ B is exactly the renumbered page, so:
new_crc = old_crc ⊕ crc32(DELTA)
and the payload — identical in A and A ⊕ B — cancels out entirely. The
patched CRC depends only on the old CRC (already in the header) and the 4-byte
sequence delta. The payload is never read.
Computing crc32(DELTA) also avoids walking the page. The 18 leading zero bytes
leave the running CRC at 0 (TABLE[0] = 0, so each step is a no-op), so the
computation starts directly from the 4-byte seq delta, then only has to "advance
the CRC over" the trailing zeros (the rest of the header plus the whole payload
length, read straight from the segment table). That advance is crc_shift_zeros
— the CRC-32 of appending n zero bytes. Appending one zero byte is a fixed
linear map on the 32-bit CRC state, so appending n of them is that 32×32 GF(2)
matrix raised to the n-th power by repeated squaring: O(log n), independent
of page size. Small, typical pages take a cheaper per-byte loop; only a huge
single packet laced into max-size pages crosses the matrix threshold.
The net effect is that patch_page_header_algebraic
(musefs-format/src/ogg/page.rs) repairs each served audio page's header from
just its 27 + seg_count header bytes, in work bounded independent of payload
size — and the audio payload stays untouched on disk, spliced in verbatim by
positioned reads. That is what lets the Ogg invariant
("renumbering patches, never recopies") hold at serve time without a per-page
in-memory index.
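The identity new_crc = old_crc ⊕ crc32(DELTA) can be checked end to end in a few lines. The sketch below uses the Ogg CRC parameters (polynomial 0x04C11DB7, init 0, no reflection, no final XOR) and a plain per-byte loop in place of the matrix-power crc_shift_zeros; all function names are illustrative, and build_page produces only a minimal single-segment page for the demonstration.

```python
POLY = 0x04C11DB7


def _make_table():
    table = []
    for i in range(256):
        r = i << 24
        for _ in range(8):
            r = ((r << 1) ^ POLY) if r & 0x80000000 else (r << 1)
            r &= 0xFFFFFFFF
        table.append(r)
    return table


TABLE = _make_table()


def ogg_crc(data: bytes, crc: int = 0) -> int:
    """Unreflected CRC-32 with init 0 and no final XOR, as used by Ogg pages."""
    for byte in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ TABLE[(crc >> 24) ^ byte]
    return crc


def shift_zeros(crc: int, n: int) -> int:
    """Advance the CRC over n zero bytes (per-byte stand-in for crc_shift_zeros)."""
    for _ in range(n):
        crc = ((crc << 8) & 0xFFFFFFFF) ^ TABLE[crc >> 24]
    return crc


def patched_crc(old_crc: int, old_seq: int, new_seq: int, page_len: int) -> int:
    """CRC of the renumbered page, computed without reading the payload."""
    delta = (old_seq ^ new_seq).to_bytes(4, "little")  # bytes 18..22 of DELTA
    # The 18 leading zeros leave the state at 0, so start at the delta bytes,
    # then advance over the page_len - 22 trailing zeros.
    return old_crc ^ shift_zeros(ogg_crc(delta), page_len - 22)


def build_page(seq: int, payload: bytes) -> bytes:
    """Minimal one-segment Ogg page with a correct CRC (payload under 255 bytes)."""
    header = (b"OggS" + bytes(2) + bytes(8) + (1).to_bytes(4, "little")
              + seq.to_bytes(4, "little") + bytes(4) + bytes([1, len(payload)]))
    page = bytearray(header + payload)
    page[22:26] = ogg_crc(page).to_bytes(4, "little")
    return bytes(page)


old_page = build_page(5, b"audio payload")
new_seq = (5 + 3) & 0xFFFFFFFF  # sequence numbers wrap modulo 2**32
old_crc = int.from_bytes(old_page[22:26], "little")
fast = patched_crc(old_crc, 5, new_seq, len(old_page))
slow = int.from_bytes(build_page(new_seq, b"audio payload")[22:26], "little")
```

fast and slow agree, although patched_crc never touched the payload bytes; only the old CRC, the two sequence numbers, and the page length went in.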
Quirks & invariants
- Page and header sizes are bounded at parse and serve time (MAX_OGG_PAGE_BYTES, MAX_OGG_HEADER_BYTES in musefs-core/src/ogg_index.rs); a crafted file cannot force unbounded allocation. The ogg, ogg_page, b64, and vorbiscomment fuzz targets hammer these paths.
- The incremental base64 encoder is windowed by output offset: any byte range of the encoded form can be produced from the corresponding slice of raw image bytes (musefs-format/src/ogg/b64.rs).
- The serve path's determinism does not depend on the memo: a content change rebuilds the resolved file and starts with a fresh, empty memo.
WAV
How musefs scans and synthesizes RIFF/WAVE files (.wav). WAV has no single
native tag standard, so musefs writes metadata twice: a broad-compatibility
LIST/INFO chunk and a full-fidelity embedded id3 chunk. For the
segment model these layouts plug into, see
the segment model. The ID3v2 tag inside
the id3 chunk is built by the same code as MP3's — MP3's
round-trip and lossy-edge rules apply to it wholesale.
What round-trips
- All text tags, via the embedded id3 chunk (full ID3v2.4, exactly as for MP3: canonical frames, TXXX extension slot, frame-id passthrough).
- The INFO subset, twice. Seven canonical keys also get a native LIST/INFO subchunk for ID3-unaware readers: title→INAM, artist→IART, album→IPRD, date→ICRD, genre→IGNR, comment→ICMT, tracknumber→ITRK.
- Binary ID3 frames and promoted tags (POPM→rating/playcount, MusicBrainz UFID→musicbrainz_trackid, opaque PRIV/GEOB/… byte-exact): classification identical to MP3, only the chunk extraction differs.
- Embedded pictures: APIC frames inside the id3 chunk, MIME + picture type + description preserved, image bytes streamed.
- Structural chunks: fmt (required) and fact (when present) are preserved from the original front.
At scan time, tags are merged per field from both surfaces with id3 taking
precedence and INFO filling gaps; only chunk headers are walked — the
data payload is never read.
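To show what the seven-field INFO subset looks like on disk, here is a sketch that builds a LIST/INFO chunk from a tag mapping. The key-to-id table is the one listed above; the NUL terminator, UTF-8 encoding, and helper names are assumptions for illustration, and musefs's own writer may differ on those points.

```python
import struct

# The seven canonical keys and their INFO subchunk ids, as listed above.
INFO_IDS = {
    "title": b"INAM", "artist": b"IART", "album": b"IPRD", "date": b"ICRD",
    "genre": b"IGNR", "comment": b"ICMT", "tracknumber": b"ITRK",
}


def riff_chunk(chunk_id: bytes, body: bytes) -> bytes:
    """RIFF chunk: 4-byte id, little-endian size, body, pad byte if size is odd."""
    pad = b"\x00" if len(body) % 2 else b""
    return chunk_id + struct.pack("<I", len(body)) + body + pad


def list_info(tags: dict) -> bytes:
    """Build a LIST/INFO chunk; keys outside the seven-field subset are ignored."""
    subchunks = b"".join(
        riff_chunk(INFO_IDS[key], value.encode("utf-8") + b"\x00")  # assumed NUL-terminated
        for key, value in tags.items() if key in INFO_IDS
    )
    return riff_chunk(b"LIST", b"INFO" + subchunks)
```

Every size is derived from data already in hand, which mirrors the point that chunk sizes are known up front rather than patched in afterwards.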
Lossy edges
- Non-structural chunks are dropped. The synthesized front carries only fmt, fact, the new LIST/INFO, and the new id3 chunk: cue points (cue), broadcast-wave metadata (bext), sampler loops (smpl), and any other chunk from the original front are not reproduced.
- The INFO chunk carries only the seven-field vocabulary above; readers that understand only INFO see just those fields. Everything still rides in the id3 chunk.
- All of MP3's ID3 lossy edges apply to the id3 chunk: ID3v2.4-only output, placeholder-language COMM/USLT reset to XXX, POPM owner dropped, ID3v1 ignored, the OOM-guard skips (the authoritative list lives in MP3's lossy edges).
- Tags trailing a very large data payload are not seen. When the data payload pushes any LIST/INFO or id3 chunk beyond the scan probe ceiling (64 MiB), the file is still ingested (the data chunk header gives the audio bounds without reading the payload), but those trailing tag chunks are not read at scan time. Front-positioned metadata is unaffected.
How synthesis works
wav::synthesize_layout (musefs-format/src/wav.rs) regenerates the entire
RIFF front, then serves the untouched payload:
offset 0
┌──────────────────────────────────────────────┐ ┐
│ █ RIFF/WAVE framing (Inline) │ │
│ █ fmt (+ fact), preserved (Inline) │ │ regenerated
│ █ LIST/INFO chunk (7-field subset) (Inline) │ │ RIFF front
│ █ id3 chunk: ID3v2.4 text frames (Inline) │ │ (metadata
│ █ frame header + ▒ opaque body (BinaryTag) │ │ written
│ █ APIC framing + ▒ image bytes (ArtImage) │ │ twice)
├──────────────────────────────────────────────┤ ┘
│ ░ data chunk payload, verbatim (BackingAudio) │
└──────────────────────────────────────────────┘
EOF █ inline-generated ▒ DB-streamed ░ untouched backing
- Inline: RIFF/WAVE framing, the preserved fmt (and fact) chunks, the rebuilt LIST/INFO chunk, and the embedded id3 chunk's text frames. Every chunk length is known up front, so the RIFF size and each chunk size field are byte-exact, with no placeholder sizes.
- Inside the id3 chunk: APIC framing inline with ArtImage segments streaming image bytes, and BinaryTag segments streaming opaque ID3 frame bodies, exactly as in MP3 synthesis.
- BackingAudio: the original data chunk payload, served verbatim by positioned reads.
RIFF form-size enforcement
Every RIFF/WAVE file declares a form size at bytes 4..8 (riff_size).
The form covers bytes 8 through 8 + riff_size and must encompass all
top-level chunks (fmt , data, LIST, id3 , …). musefs enforces
this at parse time:
- riff_wave_start parses the RIFF size and returns form_end = 8 + riff_size.
- locate_audio and locate_audio_at_ceiling reject any file where form_end exceeds the physical file or where the data chunk payload extends past form_end.
- Streaming or concatenated WAVs that write riff_size = 0 or 0xFFFFFFFF are rejected, but only incidentally: there is no explicit sentinel check. riff_size = 0 yields form_end = 8, which is smaller than any file carrying a data payload, and 0xFFFFFFFF yields a form_end larger than any real file, so both fall foul of the bounds checks above. Detecting and honouring those sentinels explicitly is a deferred follow-up.
Quirks & invariants
- A file must have both a fmt chunk and a data chunk to scan; the declared data size must lie within the file.
- The ID3-in-WAV path inherits MP3's allocation-bomb guard (id3v2_alloc_safe): a crafted id3 chunk cannot OOM the scanner; this exact vector was found by the wav fuzz target.
- Byte-identical audio and front re-parseability are asserted by musefs-format/tests/proptest_wav.rs and the mutagen interop suite (musefs-core/tests/interop_emit.rs).
Integrations
External tools write tags and art into the musefs SQLite store; a live mount reflects their edits without copying audio. Each integration has its own page.
- beets: the musefs beets plugin
- Picard: the MusicBrainz Picard plugin
- Lidarr: Custom Script integration
- systemd: running musefs as a user/system service
- python-musefs: the shared store-contract library behind the plugins
The plugin packages have their own changelog at
contrib/CHANGELOG.md.
beets-musefs
A beets plugin that syncs your beets metadata (tags + cover art) into a musefs SQLite store, so a live musefs mount shows a re-tagged view of your library without rewriting any audio.
How it fits together
- The plugin owns the tags (and cover art, when beets has it) of each track, keyed by the file's canonical real path.
- The structural columns (audio offsets, size, mtime) can only come from musefs probing the file, so the plugin runs musefs scan for you (via the bin config) before syncing; it never tries to compute those itself.
- beet musefs scans the library and then syncs; the import/write hooks scan just the touched file and then sync. musefs's auto-refresh shows changes live: no remount, and no separate scan step.
Install
pip install beets-musefs
This pulls in the shared python-musefs runtime
library from PyPI automatically — both packages are published, so no
working-tree install is needed.
Use via pluginpath (no package install)
The plugin itself doesn't need to be installed — point beets at the plugin's
beetsplug directory and it loads at runtime. It still needs the shared
python-musefs runtime library importable, so install that first:
pip install python-musefs
beets adds pluginpath entries directly to the beetsplug package path, so it
must be the beetsplug dir itself (not its parent). In your beets config.yaml:
pluginpath: /path/to/musefs/contrib/beets/beetsplug
plugins: musefs
musefs:
  db: ~/musefs.db     # path to the musefs SQLite store (required)
  bin: musefs         # musefs executable for auto-scan; use a full path if
                      # not on $PATH, e.g. /path/to/musefs/target/release/musefs
  # autoscan: yes     # default; runs `musefs scan` for you. Set `no` to
  #                   # manage scanning yourself (hooks then best-effort).
  # fields:           # optional: map extra beets fields to musefs keys
  #   comments: comment
Development install (from a checkout)
To hack on the plugin or run the test suite against your working tree, install both packages editable from the repo so imports resolve to the local source:
pip install -e contrib/python-musefs # shared library
pip install -e "contrib/beets[test]" # plugin + test deps
Workflow (test drive)
# Sync beets metadata into the store. Auto-scans the library first (creating the
# DB if needed) — no separate `musefs scan` step.
beet musefs # everything
beet musefs albumartist:"Boards of Canada" # a subset (scans just those files)
beet musefs -n # dry run: report counts, write nothing
beet musefs --revalidate # also prune rows whose backing file is gone
# Mount the re-tagged view.
musefs mount ~/mnt --db ~/musefs.db \
--template '$albumartist/$album/$tracknumber - $title'
# ...or mirror your beets library layout exactly, via the computed beets_path tag.
musefs mount ~/mnt --db ~/musefs.db --template '$!{beets_path}'
Imports and tag write-backs auto-sync via event hooks: beet import and
beet modify -w … record the touched items and reconcile them once the command
finishes — when each file's path is final (beets has no move event, and a write
fires before its move). The reconcile scans the new path and writes its tags,
but it never prunes — pruning is a deliberate act (see below). A move
therefore leaves the old path's row behind until you run beet musefs --revalidate. A metadata-only beet modify (no -w) doesn't fire a hook —
re-run beet musefs. With autoscan: no, run musefs scan yourself first; the
hooks then skip gracefully if the DB is missing.
Never writing to your backing audio files
If your backing files must stay byte-for-byte untouched — you're seeding them as a torrent, the library is immutable, or you simply want beets to drive the musefs view without ever rewriting a tag — configure beets to never write to disk:
import:
  copy: no
  move: no
  write: no
write: no is enough on its own: every stock beets plugin gates its file writes
on import.write. The musefs plugin reads canonical metadata from the beets
database, not from the files, and musefs scan ingests/synthesizes embedded
art itself — so write: no loses nothing in the musefs view.
A few plugins ignore that gate or are redundant in this mode:
- scrub: deletes all tags from files directly via mutagen, ignoring import.write; its auto-import hook would wipe tags from your backing files. Don't enable it.
- embedart: embeds cover art into the audio files. Redundant: musefs already presents embedded art in the virtual files (scan ingestion plus the plugin's overlay of the album's artpath).
- zero: only acts during a file write, so it is inert with write: no (nothing to do, but nothing to worry about either).
Notes
- Field coverage: every tag beets writes to a file (its _media_tag_fields) is synced under canonical musefs keys: ReplayGain, MusicBrainz IDs, comment, lyrics, grouping, isrc, multi-valued artists, and any custom field. Read-only file facts (bitrate, length, …) are never written as tags.
- Merge, not replace: beets' values win for the fields it manages; any other tag already embedded in the file is preserved in the view.
- Deletions stick: the plugin records the keys it manages per track in a musefs_managed beets flexattr (stored in the beets DB only, never in your audio files or the musefs store). Remove a tag in beets and it is removed from the view and stays gone across re-scans.
- --restore-backing (or restore_backing: yes): when you remove a tag in beets, let the file's original embedded value reappear instead of disappearing.
- Caveat: sticky deletion relies on autoscan: yes (the default), which re-derives the file's embedded tags before each sync. With autoscan: no, a deletion only takes effect after your next manual musefs scan.
- Cover art: taken from the album's artpath (beets' external cover file). beets art wins when present; otherwise any art musefs scan ingested from embedded pictures is preserved.
- Computed path (beets_path): each sync also writes a beets_path text tag holding the track's beets library-relative path (from your paths: config, via item.destination), with the file extension removed; musefs re-appends it. Mount with --template '$!{beets_path}' (the $!{} path field keeps / as directory separators) to mirror your beets layout, including layouts musefs's own template engine can't express. Set write_path: no in the musefs: config to skip it. Do not add an extension in a template that consumes beets_path. See the computed-tag workflow in the architecture overview.
- Pruning is a deliberate act. The plugin never prunes on its own. Pruning track rows whose backing file is gone from disk (renames/moves/deletes) is owned entirely by musefs scan --revalidate, reachable from beets as beet musefs --revalidate (which forwards the flag to the auto-scan). Plain beet musefs and the passive end-of-command reconcile (beet import / beet modify -w) only sync, so a transient backing-storage loss (an unmounted network share, an offline drive, a momentary realpath divergence) can never mass-delete plugin metadata. Run beet musefs --revalidate (or musefs scan --revalidate) while the library is available to clear stale rows left by a move or an on-disk delete.
- Removals are not auto-pruned. beet remove / beet remove -d does not prune the store; run beet musefs --revalidate afterwards to drop the rows whose backing file is now gone. A bare beet remove (which keeps the file on disk) leaves a servable row in place even then, because musefs can still serve those bytes.
- Orphaned art: replacing art can orphan old blobs; musefs scan --revalidate garbage-collects them.
- Schema version: the plugin refuses to run if the DB's user_version differs from the version it targets; rebuild after upgrading musefs.
Tests
The tests live under tests/ and use a local virtualenv with beets + pytest.
cd contrib/beets
uv venv # create .venv (once)
source .venv/bin/activate
uv pip install -e ../python-musefs # shared library (editable, from the working tree)
uv pip install -r requirements.txt # beets + pytest
python -m pytest # unit + integration (no Rust binary)
python -m pytest -m musefs_bin # path-matching gate vs the real `musefs` binary
python -m pytest -m e2e # full beets -> mount -> playback end-to-end
The musefs_bin gate shells out to the real musefs binary, so build it first
from the repo root (cargo build) and run it against a fresh build. The e2e
tier additionally needs ffmpeg and /dev/fuse + fusermount3: it generates
audio, imports it with beets, retags, syncs, mounts via FUSE, and verifies the
mount's tags and byte-identical audio (including a move-reconcile case). Both
tiers are deselected from the default run and skip cleanly if their tools are
absent.
musefs-picard
A MusicBrainz Picard plugin that syncs your Picard metadata (tags + front cover) into a musefs SQLite store, so a live musefs mount shows a re-tagged view of your library without rewriting any audio.
How it fits together
Picard has no way to redirect its Save to a database, so this plugin adds a context-menu action instead: match/edit as usual, then right-click your selection → "Sync to musefs" instead of pressing Save. The plugin:
- runs musefs scan on each selected file to create/refresh its track row and structural columns (the offsets only musefs can compute), then
- writes Picard's tags and front cover into the store, keyed by the file's canonical real path.
musefs's auto-refresh surfaces the change at the mount with no remount. The audio file is never saved by Picard.
Install (local / development)
Picard loads "folder plugins" from its plugins directory. Copy (or symlink) the
musefs/ folder there:
- Linux: ~/.config/MusicBrainz/Picard/plugins/
- macOS: ~/Library/Preferences/MusicBrainz/Picard/plugins/
- Windows: %APPDATA%\MusicBrainz\Picard\plugins\
cp -r contrib/picard/musefs ~/.config/MusicBrainz/Picard/plugins/
The musefs/_common/ subfolder is the vendored python-musefs library, copied
in so the plugin folder is self-contained (Picard does not install plugin
dependencies). It is committed; you don't need to do anything to use it. If you
change the shared library, re-run python contrib/python-musefs/vendor_to_picard.py
and commit the refreshed copy — CI's drift guard enforces it.
Then enable musefs sync in Options → Plugins, and configure it in Options → musefs sync:
- musefs DB path: path to the musefs SQLite store (required).
- musefs binary: the musefs executable (PATH name or full path), used to auto-create rows. Default musefs.
- Run musefs scan before syncing: autoscan toggle (default on). With it off, run musefs scan yourself first or the sync errors on a missing DB.
- Extra field map: optional key=value list mapping additional or custom Picard tag names to musefs store keys (applied verbatim, last-wins, on top of the automatic full-tag-set sync), e.g. mymood=mood.
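The "last-wins" behaviour of the extra field map can be illustrated with a short parser. This is a sketch only: the function name parse_field_map is hypothetical, and the entry separator (commas or newlines here) is an assumption, since the text above only says the setting is a key=value list.

```python
def parse_field_map(text: str) -> dict[str, str]:
    """Parse 'picard_tag=musefs_key' entries; later entries win."""
    mapping: dict[str, str] = {}
    # ASSUMPTION: entries are separated by commas or newlines.
    for entry in text.replace(",", "\n").splitlines():
        entry = entry.strip()
        if not entry or "=" not in entry:
            continue  # skip blanks and malformed entries
        source, _, target = entry.partition("=")
        source, target = source.strip(), target.strip()
        if source and target:
            mapping[source] = target  # last-wins on duplicate source names
    return mapping
```

For example, parse_field_map("mymood=mood") maps the custom Picard tag mymood to the store key mood, and a repeated source name keeps only its final target.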
MUSEFS_DB and MUSEFS_BIN environment variables override the DB/binary
settings (handy for testing).
Workflow
- musefs mount ~/mnt --db ~/musefs.db --template '$albumartist/$album/$tracknumber - $title'
- In Picard, match/cluster an album as usual.
- Right-click the album/files → Sync to musefs.
- Browse ~/mnt: the files show Picard's tags and cover, audio byte-identical.
Notes
- Front cover only: the first front-cover image Picard holds is synced. Picard art wins when present; otherwise any art musefs scan ingested from the file's embedded picture is preserved. Re-syncing a file with no Picard art lets the embedded picture re-seed when autoscan is on (musefs scan re-reads the file); with autoscan off, existing art is left untouched.
- Tags are fully replaced with Picard's view on every sync.
- Field coverage: every populated Picard tag is synced under its canonical musefs (on-disk) key: all MusicBrainz IDs, sort and performer/credit fields, movement, totals, and any custom field; multi-values expand and per-role performers fold to Name (Role). Picard's hidden ~ internals (length, rating, …) are never written.
- Orphaned art: replacing art can orphan old blobs; musefs scan --revalidate garbage-collects them.
- Schema version: the plugin refuses to run if the DB's user_version differs from the version it targets; rebuild the store after upgrading musefs.
Tests
cd contrib/picard
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python -m pytest # unit + integration (no Picard, no Rust binary)
python -m pytest -m musefs_bin # path-matching gate vs the real `musefs` binary
The musefs_bin gate shells out to the real musefs binary, so build it first
from the repo root (cargo build). It is deselected from the default run and
skips cleanly if the binary is absent.
Real-Picard (pytest-qt) tests
The adapter (musefs/__init__.py) is exercised against a real Picard + PyQt5
install, headless. Picard isn't a clean pip wheel, so use the distro package and
bind a uv venv to the system Python it targets:
sudo apt-get install -y picard # Picard at /usr/lib/picard + system PyQt5
uv venv --system-site-packages --python "$(which python3)" # match apt Picard's C-ext interpreter
uv pip install -e 'contrib/picard[test]' # test extra includes pytest-qt
PYTHONPATH=/usr/lib/picard QT_QPA_PLATFORM=offscreen \
.venv/bin/python -m pytest contrib/picard/tests -v
These tests importorskip("picard"), so on a machine without Picard they skip
cleanly and only the Qt-free _core tests run.
Manual smoke test (full GUI round-trip)
- cargo build and create a store: musefs scan /path/to/album --db /tmp/m.db.
- Copy the plugin into Picard's plugins dir; enable it; set DB path /tmp/m.db.
- Load the album in Picard, change a tag (e.g. title), add a front cover.
- Right-click → Sync to musefs; confirm the status bar / log reports synced=N.
- musefs mount /tmp/mnt --db /tmp/m.db and verify the mounted file carries the new tag and cover, with byte-identical audio.
lidarr-musefs
A Lidarr integration that syncs Lidarr's metadata into a musefs SQLite store, so a live musefs mount shows a re-tagged view of your library without Lidarr ever copying, moving, or rewriting backing audio bytes.
Lidarr stays the downloader, matcher, and metadata source; its destination tree becomes a placeholder of symlinks that exists only so Lidarr can track files. Point Navidrome, Plex, Jellyfin, or other consumers at the musefs mount instead.
How it fits together
The package installs two console scripts that plug into Lidarr's hooks:
- musefs-lidarr-import (Import Using Script): replaces Lidarr's own copy/move when it imports a download. It creates the destination entry as a symlink (or hardlink) to the downloaded file and fails closed: it never falls back to copying bytes.
- musefs-lidarr-sync (Custom Script notification): fires after an import or rename. It queries Lidarr's API for the affected tracks' metadata (title, artist/albumartist, album, track/disc numbers, release date, MusicBrainz ids, genres) plus each album's cover art, runs musefs scan on the files to create/refresh their track rows (the structural columns only musefs can compute), and writes the tags and art into the store. Transient API failures (network errors, timeouts, 5xx) are retried with backoff so a blip or a Lidarr restart mid-import doesn't silently drop the sync.
musefs's auto-refresh surfaces each sync at the mount with no remount. Both
scripts build on the shared python-musefs
store-contract library.
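The retry-with-backoff behaviour described for the sync script can be sketched generically. This is not the plugin's code: with_retries and TransientError are hypothetical names, and the attempt count and delays are illustrative values rather than the script's actual settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (network error, timeout, 5xx)."""


def with_retries(call: Callable[[], T], *, attempts: int = 4,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure instead of dropping it
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Passing sleep as a parameter keeps the helper testable without real waiting; non-transient errors are deliberately not caught and propagate on the first attempt.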
Install
Install the package — with its python-musefs dependency — into the
environment Lidarr uses to run custom scripts, so both scripts are on
Lidarr's PATH:
pip install lidarr-musefs
This pulls in the shared python-musefs dependency
from PyPI automatically. To install from a checkout instead (e.g. for
development), install both editable so imports resolve to the local source:
pip install -e contrib/python-musefs
pip install -e contrib/lidarr
You also need the musefs binary reachable by the sync script (see
MUSEFS_BIN below) and a musefs store/mount of your own — see the
main README.
Required Lidarr settings
- Settings -> Media Management -> Import Using Script: enabled.
- Import Script Path: musefs-lidarr-import.
- Metadata Provider -> Write Audio Tags: Never.
- File Date: None.
- Linux permission management: disabled.
Do not rely on Lidarr's built-in "Use Hardlinks instead of Copy" for this
workflow. Lidarr uses a hardlink-or-copy transfer mode internally, so a hardlink
failure can copy bytes. musefs-lidarr-import creates the destination entry
itself and fails closed.
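"Fails closed" means there is no copy branch to fall into. A minimal sketch of that shape, assuming a hypothetical helper named link_fail_closed (the real import script's internals are not shown in this document):

```python
import os


def link_fail_closed(source: str, dest: str, mode: str = "symlink") -> None:
    """Create dest as a link to source; raise rather than ever copying bytes."""
    if mode == "symlink":
        os.symlink(os.path.realpath(source), dest)
    elif mode == "hardlink":
        os.link(source, dest)
    else:
        raise ValueError(f"unknown link mode: {mode!r}")
    # No except/fallback branch on purpose: an OSError (cross-device hardlink,
    # existing destination, permissions) propagates to the caller.
```

The two modes mirror the MUSEFS_LIDARR_LINK_MODE values; any failure surfaces as an exception instead of silently degrading to a byte copy.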
musefs-lidarr-sync --doctor verifies these settings over the API (see
Doctor).
Lidarr Custom Script
Configure a Custom Script notification (Settings -> Connect):
- On Release Import: enabled.
- On Rename: enabled.
- On Album Delete: enabled.
- On Artist Delete: enabled.
- Path: musefs-lidarr-sync.
Test events exit successfully without touching files or the database.
TrackRetag events are skipped with a warning because they fire after Lidarr
writes tags.
Environment
Both scripts are configured through environment variables, set in the environment Lidarr launches scripts with.
Import script:
MUSEFS_LIDARR_LINK_MODE=symlink # default; use hardlink only if symlinks are unsuitable
Sync script:
MUSEFS_DB=/path/to/musefs.db # the musefs SQLite store (required)
MUSEFS_BIN=musefs # musefs executable; full path if not on PATH
MUSEFS_LIDARR_URL=http://localhost:8686
MUSEFS_LIDARR_API_KEY=your-api-key
MUSEFS_LIDARR_AUTOSCAN=1 # default; runs `musefs scan` before each sync
API keys are redacted from logs and errors.
Manual backfill
To sync every track file Lidarr already knows about (e.g. on first setup):
musefs-lidarr-sync --all
Manual backfill requires MUSEFS_LIDARR_URL and MUSEFS_LIDARR_API_KEY. It
runs the doctor preflight first (skip with --skip-lidarr-preflight), then
queries all Lidarr artists and syncs their known track files into the musefs
DB.
Migrating an existing Lidarr library
The forward path above (new import → import script symlink → sync) works cleanly on a fresh import. Re-homing a pre-existing Lidarr library onto the musefs symlink tree runs into several Lidarr behaviors; this is the working order (observed on Lidarr v1, lsio image). None of it is a musefs bug — these are Lidarr quirks an integrator only hits here.
- Reassign the artists to the new (musefs) root folder.
- Clear the stale trackfile records before re-importing. If the artists' existing trackfiles still reference the old root, re-import fails with NotParentException (/old/root/... is not a child of /new/root): Lidarr's RemoveExistingTrackFiles chokes computing the relative path. Delete the stale trackfile records first.
  - The empty-root deletion guard: Lidarr blocks trackfile deletion while the new root folder is empty ("Artist's root folder is empty", a mass-deletion safety guard), which is a chicken-and-egg problem because the symlinks do not exist yet. Drop a placeholder file in the root until the first symlinks land, then remove it.
  - Batch the bulk delete: DELETE /api/v1/trackfile/bulk returns 500 on large batches (~200 ids); send ~25 ids per call.
- Re-import. The import script creates the destination symlinks.
- Backfill the store:
musefs-lidarr-sync --all.
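Batching the bulk delete comes down to slicing the id list before each call. The helper below is a hypothetical sketch that only does the slicing; the HTTP request itself (and its body shape) is left out because this document does not specify it.

```python
from typing import Iterator, Sequence


def batched_ids(ids: Sequence[int], size: int = 25) -> Iterator[list[int]]:
    """Yield consecutive slices of at most `size` ids."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])
```

Each yielded batch would then be sent as one DELETE /api/v1/trackfile/bulk call, keeping every request well under the size that was observed to fail.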
Point musefs scan at the backing directory, not the symlink tree. The
default (--follow-symlinks off) is exactly right here: the store should key
off the real files, while Lidarr's symlink tree is just its own tracking view.
Doctor
To verify your Lidarr settings are musefs-safe:
musefs-lidarr-sync --doctor
The doctor checks Lidarr's API for:
- writeAudioTags = no
- fileDate = none
- setPermissionsLinux = false
If MUSEFS_LIDARR_URL and MUSEFS_LIDARR_API_KEY are not configured, doctor
and sync fail because the integration cannot verify safe settings or build
complete per-track metadata.
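The doctor's comparison can be pictured as a check of a settings mapping against the three safe values listed above. This is a sketch under assumptions: unsafe_settings is a hypothetical name, and how Lidarr's API actually encodes these values (strings, enums, which endpoint) is not specified here, so the dictionary shape is illustrative.

```python
# Safe values as listed in the Doctor section; the encoding is an assumption.
EXPECTED = {
    "writeAudioTags": "no",
    "fileDate": "none",
    "setPermissionsLinux": False,
}


def unsafe_settings(settings: dict) -> list[str]:
    """Return one message per setting that differs from the musefs-safe value."""
    problems = []
    for name, wanted in EXPECTED.items():
        actual = settings.get(name)  # a missing setting counts as unsafe
        if actual != wanted:
            problems.append(f"{name} = {actual!r}, expected {wanted!r}")
    return problems
```

An empty result means all three settings match; anything else is a list of human-readable mismatches a caller could print before refusing to sync.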
--doctor is a runtime / post-deploy check, not an offline one: it makes a
live Lidarr API call, so it needs MUSEFS_LIDARR_URL + MUSEFS_LIDARR_API_KEY
and a reachable Lidarr instance. Run it after deployment, not at container
build time — offline it fails with connection-refused even when the
toolchain itself is wired up correctly. There is no offline "are the binary and
plugins installed/wired" check; to confirm installation at build time, test that
the musefs-lidarr-import / musefs-lidarr-sync scripts and the musefs
binary are importable/on PATH.
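A build-time check of that kind can be as small as a PATH lookup. The sketch below uses the standard library's shutil.which; missing_executables is a hypothetical helper, not something the package ships.

```python
import shutil

# The two console scripts and the binary named in the text above.
REQUIRED = ("musefs-lidarr-import", "musefs-lidarr-sync", "musefs")


def missing_executables(names=REQUIRED) -> list[str]:
    """Return the names that shutil.which cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]
```

A container build could fail when the returned list is non-empty, leaving the live --doctor check for after deployment.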
Smoke test
- Build and install musefs.
- Install python-musefs and lidarr-musefs into the environment Lidarr uses for custom scripts.
- Configure Import Using Script and Custom Script as described above.
- Import a small album.
- Confirm Lidarr's destination entry is a symlink by default.
- Run musefs mount /tmp/mnt --db "$MUSEFS_DB".
- Confirm the mount shows Lidarr metadata.
- Confirm the source file's bytes and mtime did not change.
Notes
- Tags are fully replaced with Lidarr's view on every sync (scanner-written binary tags always survive; see the external-writer contract).
- Cover art: each album's Lidarr cover is fetched and written as the front cover, replacing the track's art rows on every sync (an over-cap or unreachable cover is skipped, leaving any scanner-ingested art in place).
- Schema version: the sync refuses to run if the DB's user_version differs from the version it targets; rebuild the store after upgrading musefs.
- Deletions prune by MusicBrainz id, scoped to rows this plugin owns. On an Album/Artist delete, the sync removes the matching store rows (musicbrainz_albumid / musicbrainz_artistid) so the mount stops presenting them. The backing audio is never touched: pruning only drops the store rows, not the files Lidarr keeps in the backing directory. A delete event for a release with no MusicBrainz id cannot be mapped and is logged and skipped.
- Ownership marker. Every track the sync writes is stamped with a musefs_lidarr_managed=1 tag, and a delete only removes rows carrying that marker. Without it, a musicbrainz_albumid the scanner seeded from a file's own native tags is indistinguishable from one Lidarr wrote, so an unrelated Lidarr delete could drop an unmanaged track's metadata. The marker is a normal text tag, so it does appear in served files (e.g. as a MUSEFS_LIDARR_MANAGED Vorbis comment / a TXXX frame / an iTunes freeform atom). A track imported under an older plugin version (before the marker existed) is treated as unmanaged and is left in place on delete; re-sync it to stamp the marker.
- CI coverage: a fast smoke (real Lidarr exec path + mocked API) gates PRs, and a full real-instance download-client import e2e gates the Python releases; see the Python plugins guide.
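The ownership-marker rule is easiest to see on a toy model. The sketch below keeps tags in plain dictionaries instead of the SQLite store, and rows_to_prune is a hypothetical name; only the tag keys musicbrainz_albumid and musefs_lidarr_managed come from the notes above.

```python
MARKER = "musefs_lidarr_managed"


def rows_to_prune(tracks: dict[str, dict[str, str]], album_id: str) -> list[str]:
    """Return paths whose tags match album_id AND carry the ownership marker."""
    return sorted(
        path
        for path, tags in tracks.items()
        if tags.get("musicbrainz_albumid") == album_id and tags.get(MARKER) == "1"
    )
```

A track whose album id was seeded by the scanner from the file's own tags has no marker, so it is left alone even when its id matches the deleted album.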
Tests
cd contrib/lidarr
python -m venv .venv && source .venv/bin/activate
pip install -e ../python-musefs # shared library (editable, from the working tree)
pip install -e ".[test]"
python -m pytest # unit + integration (no Rust binary)
python -m pytest -m musefs_bin # path-matching gate vs the real `musefs` binary
The musefs_bin gate shells out to the real musefs binary, so build it first
from the repo root (cargo build). It is deselected from the default run and
skips cleanly if the binary is absent.
Running musefs as a systemd user service
These units run musefs on the host (the recommended deployment) under your own
user account — no root, no CAP_SYS_ADMIN.
Files
- musefs.service: the mount daemon (musefs mount); blocks until stopped.
- musefs-scan.service + musefs-scan.timer: optional periodic musefs scan --revalidate.
- musefs.conf.example: every MUSEFS_* setting, commented with defaults.
Install
mkdir -p ~/.config/systemd/user ~/.config/musefs
cp musefs.service musefs-scan.service musefs-scan.timer ~/.config/systemd/user/
cp musefs.conf.example ~/.config/musefs/musefs.conf
$EDITOR ~/.config/musefs/musefs.conf # set MUSEFS_MOUNTPOINT and MUSEFS_DB
systemctl --user daemon-reload
systemctl --user enable --now musefs.service
Enable the periodic re-scan too (edit the library path in
musefs-scan.service first):
systemctl --user enable --now musefs-scan.timer
Hardening
These units run under the --user manager, which constrains what systemd
sandboxing is possible. The two units differ sharply:
- musefs-scan.service is fully sandboxed. The scanner creates no FUSE mount, so it takes a strong sandbox (ProtectSystem=true, SystemCallFilter=, plus namespace and seccomp restrictions). ProtectSystem=true (not strict) keeps system directories read-only while leaving your library and MUSEFS_DB writable, so a custom DB location needs no ReadWritePaths= edit. A few directives that require capability-bounding-set drops (CapabilityBoundingSet=, PrivateDevices, ProtectKernelModules, ProtectKernelLogs, ProtectClock) are omitted: the unprivileged user manager cannot apply them, and the process is already capability-less, so nothing is lost. Inspect with systemd-analyze --user security musefs-scan.service.
- musefs.service is intentionally not sandboxed, and cannot be. musefs mounts via the setuid fusermount3 helper. NoNewPrivileges=true (and nearly every other systemd hardening directive, since installing a seccomp filter for an unprivileged process forces the kernel no_new_privs flag) disables the setuid escalation, and the mount then fails with fusermount3: mount failed: Operation not permitted. The unit comment explains this in full.
Notes
- The store must exist before the mount starts. musefs mount never creates the DB: it requires a populated store and exits non-zero otherwise, so a mount unit that starts before anything has scanned hard-fails and (with Restart=on-failure) crash-loops. Seed the store with an initial musefs scan before enable --now musefs.service. If you generate the store from another unit, order this one after it with a drop-in (systemctl --user edit musefs):

  [Unit]
  After=musefs-initial-scan.service
  Requires=musefs-initial-scan.service

  (The musefs-scan.timer is a periodic re-scan, not the initial seed.)
- Binary location. The --user manager does not inherit your shell's PATH. The units set PATH for a cargo install binary in ~/.cargo/bin; if musefs is elsewhere, edit the Environment=PATH= line (or make ExecStart an absolute path).
- %h vs ~. Unit files expand %h to your home directory; the musefs.conf EnvironmentFile does not expand %h or ~. Use absolute paths there, and never paste ~/... into a unit directive (it is taken literally).
- Settings. musefs.conf.example is a commented example of the common MUSEFS_* mount/scan variables (every scalar mount/scan flag has a MUSEFS_* form: uppercase the long flag, dashes to underscores). Explicit flags override env vars; --fallback and scan targets are command-line only (set them in ExecStart).
- Inline overrides. Prefer systemctl --user edit musefs to add Environment= lines in a drop-in; it survives reinstalls.
- Headless servers. A --user timer only fires while your user manager runs. For a daily scan when you are not logged in: loginctl enable-linger <user>.
- Logs. journalctl --user -u musefs -f.
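The naming rule in the Settings note (uppercase the long flag, dashes to underscores) is mechanical enough to write down. flag_to_env is a hypothetical helper for illustration; "--some-flag" below is a made-up flag, not a real musefs option.

```python
def flag_to_env(flag: str) -> str:
    """Map a long flag such as '--db' to its MUSEFS_* environment variable name."""
    if not flag.startswith("--") or len(flag) == 2:
        raise ValueError(f"not a long flag: {flag!r}")
    return "MUSEFS_" + flag[2:].upper().replace("-", "_")
```

So --db corresponds to MUSEFS_DB, and a hypothetical --some-flag would correspond to MUSEFS_SOME_FLAG. The rule only applies to scalar flags; --fallback and scan targets stay command-line only.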
python-musefs
The shared store-contract library behind the beets,
Picard, and Lidarr musefs
plugins. It is the single source of truth for how a plugin writes the musefs
SQLite store: the schema-version check, the tags / art / track_art
writes, sha256 art content-addressing, the realpath_key path normalization,
the musefs scan shell-out (run_scan), and the per-file sync write-loop
(Record / sync_files).
Field mapping stays in each plugin — beets expands multi-valued
genres/composers into one tag each, Picard takes the first value — so this
library deliberately does not own it.
Writing a plugin
A plugin turns host metadata (a beets item, a Picard track, a Lidarr release) into musefs store writes. This library owns every store-touching step except the field mapping: you supply the per-file tag and art values, and it handles the schema check, the scan shell-out, content-addressing, and the write loop.
The write flow
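The canonical order is connect → check_schema_version → run_scan → build
Records → sync_files → commit → prune_missing. On a brand-new store, run_scan
moves to the front, because connect has nothing to open until the scan has
created the file. The caller owns the transaction: nothing here commits for you.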
from musefs_common import (
    SCAN_TIMEOUT_SECONDS,
    ArtImage,
    Record,
    check_schema_version,
    connect,
    prune_missing,
    realpath_key,
    run_scan,
    sync_files,
)


def sync(db_path, files, *, musefs_bin="musefs"):
    # `run_scan` creates the DB if absent and fills the structural columns a
    # plugin cannot compute (format, audio offset/length, backing size/mtime).
    # On a brand-new store it must precede `connect`, which has nothing to open
    # until the scan has created the file.
    run_scan(musefs_bin, db_path, files, timeout=SCAN_TIMEOUT_SECONDS)
    conn = connect(db_path)
    try:
        check_schema_version(conn)  # raises SchemaMismatch on a version skew
        # `host_metadata` stands for your host adapter's own lookup; it is not
        # part of musefs_common.
        records = [
            Record(
                key=realpath_key(path),  # MUST equal the scanned row's backing_path
                pairs=[("artist", artist), ("title", title)],
                art=[ArtImage(data=cover, mime="image/jpeg")] if cover else None,
            )
            for path, artist, title, cover in host_metadata(files)
        ]
        stats = sync_files(conn, records)  # full-replace of plugin text tags
        conn.commit()  # the caller commits
        prune_missing(conn)  # drop rows whose backing file vanished
        conn.commit()
        return stats
    finally:
        conn.close()
For a dry run, pass dry_run=True to sync_files and conn.rollback() instead
of committing — SyncStats still reports what would change.
run_scan raises ScanError (kind ∈ {"not_found", "timeout", "failed"})
and check_schema_version raises SchemaMismatch; a host adapter formats its
own user-facing message from the exception attributes (see the beets plugin's
_scan_user_error).
The Record shape
One Record per file is your primary output. Its fields:
| field | type | meaning |
|---|---|---|
| key | str | The file's identity in the store. Must be realpath_key(path): the canonicalized absolute path the scanner stored as backing_path. A key that matches no scanned row is silently counted in SyncStats.skipped, not written. |
| pairs | list[tuple[str, str]] | Ordered (tag_key, value) text tags. Duplicate keys are allowed and get contiguous ordinals (multi-valued tags). |
| art | list[ArtImage] \| None | Embedded pictures, already resolved to bytes. None/[] leaves existing art untouched. |
| delete_keys | list[str] \| None | Merge mode only: keys to clear without rewriting (see below). Ignored in replace mode. |
ArtImage(data, mime, picture_type=3, description="") is one picture: data is
raw bytes, picture_type is the ID3/FLAC type (3 = front cover). Images larger
than MAX_ART_BYTES are dropped and counted in SyncStats.skipped_art.
If every record lands in skipped, the keys and the scan target disagree —
both must canonicalize the same way, so scan the real files (not a symlink
farm) and build keys with realpath_key.
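Why a symlink farm and the real files must agree on one key can be shown with the standard library alone. This sketch uses os.path.realpath as a stand-in for realpath_key (whose exact normalization may do more), and canonical_key and demo are hypothetical names.

```python
import os
import tempfile


def canonical_key(path: str) -> str:
    """Stand-in for realpath_key: resolve symlinks to an absolute real path."""
    return os.path.realpath(path)


def demo() -> tuple[str, str]:
    """Return the canonical keys of a real file and of a symlink pointing at it."""
    with tempfile.TemporaryDirectory() as root:
        real = os.path.join(root, "backing.flac")
        link = os.path.join(root, "farm.flac")
        open(real, "wb").close()
        os.symlink(real, link)
        return canonical_key(real), canonical_key(link)
```

Both values returned by demo() are the real file's path, which is why keys built through canonicalization line up with rows scanned from the backing files, while raw symlink paths would not.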
Merge vs. replace, and sticky deletes
sync_files(..., merge=False) (the default) replaces every plugin-owned
text tag on each track: it clears all value_blob IS NULL rows and rewrites
them from record.pairs. Scanner-written binary tags always survive.
sync_files(..., merge=True) merges: only the keys named in record.pairs
and record.delete_keys are touched; other scan-seeded text tags stay. Use
merge when your plugin owns a subset of the tags and must not clobber the
rest. The store does not remember which keys you manage — you track your
managed-key set out of band (the contract is explicit that the store is not the
place for plugin state).
Merge-mode key matching is case-insensitive (lower(key) = lower(?)): Vorbis
keys render case-insensitively, so a scan that seeds a tag in the file's native
case (e.g. LABEL) is correctly replaced when your plugin canonicalizes to
lowercase (label), rather than leaving the original row behind as a duplicate.
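The case-insensitive match can be tried against an in-memory SQLite table. The table below is a simplified, hypothetical stand-in for the real store (track_id is an assumed column; key, value, and value_blob are named in this document), and merge_tag is not the library's function.

```python
import sqlite3


def merge_tag(conn: sqlite3.Connection, track_id: int, key: str, value: str) -> None:
    """Replace any text row whose key matches case-insensitively, then insert."""
    conn.execute(
        "DELETE FROM tags WHERE track_id = ? AND value_blob IS NULL "
        "AND lower(key) = lower(?)",
        (track_id, key),
    )
    conn.execute(
        "INSERT INTO tags (track_id, key, value) VALUES (?, ?, ?)",
        (track_id, key, value),
    )


def demo() -> list[tuple[str, str]]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified table; the real musefs schema has more columns.
    conn.execute(
        "CREATE TABLE tags (track_id INTEGER, key TEXT, value TEXT, value_blob BLOB)"
    )
    conn.execute("INSERT INTO tags (track_id, key, value) VALUES (1, 'LABEL', 'Warp')")
    merge_tag(conn, 1, "label", "Warp Records")
    rows = conn.execute("SELECT key, value FROM tags ORDER BY key").fetchall()
    conn.close()
    return rows
```

After the merge, the scan-seeded LABEL row is gone and a single lowercase label row remains, rather than two rows that would render as duplicates in a Vorbis comment block.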
When the user removes a tag in the host, merge mode needs to delete the
now-orphaned store row. The beets plugin solves this with an accumulating
managed-key set (the musefs_managed pattern), worth copying:
- Persist, per file, the set of keys you have ever written (beets uses a flexattr; any per-file host metadata works).
- On each sync, delete_keys = previous_managed − keys_written_now, and the new persisted set is previous_managed ∪ keys_written_now.
- A key you stop writing becomes a tombstone: it keeps getting deleted on every sync until you write it again. Persist the managed set only after the store commit succeeds, so a failed sync doesn't lose the record of what you owe.
See contrib/beets/beetsplug/_core.py (build_records / persist_managed) for
the reference implementation.
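The set arithmetic behind the managed-key pattern fits in one function. This is an illustrative sketch with a hypothetical name (plan_managed), not the beets plugin's build_records / persist_managed code.

```python
def plan_managed(
    previous_managed: set[str], keys_written_now: set[str]
) -> tuple[set[str], set[str]]:
    """Return (delete_keys, new_managed) for one file's merge-mode sync."""
    delete_keys = previous_managed - keys_written_now  # keys owed a delete
    new_managed = previous_managed | keys_written_now  # the set only ever grows
    return delete_keys, new_managed
```

Because the managed set accumulates, a key that was written once and then dropped keeps showing up in delete_keys on every later sync, which is the tombstone behaviour; it leaves delete_keys again only when the plugin writes it.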
Store invariants you must respect
The full external-writer contract is in the external-writer contract. The rules that bite plugin authors:
- Write only tags, art, and track_art. The scanner owns the structural columns of tracks and all of structural_blocks; never compute them, run musefs scan (i.e. run_scan) instead. CHECK constraints reject malformed structural shapes at commit, so you cannot persist them anyway.
- Binary tags survive a sync. merge_tags / replace_tags scope their deletes to text rows (value_blob IS NULL), so the write loop never wipes scanner-written binary tags. You may write binary tags yourself too: a binary row carries its payload in value_blob and must leave value empty (the only CHECK on the row).
- Content-address art through upsert_art (sha256 de-dup) rather than inserting art rows by hand; sync_files does this for you.
- Art rows are immutable. A trigger rejects in-place updates of an art row's content columns (data, sha256, mime, byte_len, width, height). To change a track's art, insert a new content-addressed row via upsert_art and relink it via replace_track_art.
- Path layout is just a tag. To drive a reorganized mount, write your computed relative path into a custom tag (e.g. beets_path) and mount with --template '$!{beets_path}'. musefs sanitizes each path segment, so a writer cannot inject traversal.
API reference
Everything in __all__, imported from the top-level musefs_common package.
Connection & schema
- connect(db_path) → sqlite3.Connection: open with a 5s busy timeout and foreign_keys = ON.
- check_schema_version(conn): raise SchemaMismatch unless the store's user_version equals EXPECTED_USER_VERSION.
Scanning
- run_scan(binary, db_path, target, *, timeout=None): shell out to musefs scan; target is one path or an iterable, all scanned under one process. Creates the DB if absent. Raises ScanError.
Building records
- Record(key, pairs=[], art=None, delete_keys=None): one file's sync inputs (see The Record shape).
- ArtImage(data, mime, picture_type=3, description=""): one embedded picture.
- realpath_key(path): canonical path string matching the scanner's backing_path; accepts str/bytes, returns str.
Writing
- sync_files(conn, records, *, dry_run=False, stats=None, merge=False) → SyncStats: the write loop; caller owns the transaction. Pass stats to accumulate into a caller-seeded instance.
- sync_one(conn, record, stats, *, dry_run=False, merge=False): sync a single record into a caller-supplied SyncStats.
- SyncStats: synced/skipped/art_linked/skipped_art/skipped_invalid counters, plus .summary(). A record whose tags or art violate a store CHECK constraint is rolled back and skipped (not raised), bumping skipped_invalid and appending (record.key, message) to the invalid list, so one malformed record never aborts the batch.
Lower-level store helpers (called for you by sync_files; use directly only
for a custom write loop)
- track_id_for_path(conn, key) → track id or None.
- merge_tags(conn, track_id, managed_pairs, delete_keys): per-key replace of plugin-managed text tags, leaving unmanaged text rows intact.
- replace_tags(conn, track_id, pairs): replace all plugin-owned text tags.
- upsert_art(conn, data, mime) → art id: content-address data by sha256, inserting only if new.
- replace_track_art(conn, track_id, arts): replace a track's track_art rows; arts is [(art_id, picture_type, description), …].
- sniff_mime(data, path): image mime from magic bytes, falling back to file extension.
- prune_missing(conn, track_ids=None) → count: delete tracks whose backing file no longer exists (every track, or just track_ids).
- delete_tracks(conn, track_ids) → count: unconditionally delete the given track rows (intent-based, unlike prune_missing's on-disk existence check); their tags and track_art rows cascade away.
Reading
- track_ids_for_paths(conn, keys) → {key: id}: bulk backing_path → track id; keys with no matching row are omitted. Chunked under SQLite's parameter cap, so arbitrarily large lookups are safe (the bulk track_id_for_path).
- track_ids_by_tag(conn, key, value) → [id, …]: track ids whose plugin-owned text tag (key, value) matches (scanner-written binary tags never match); maps a source's "I deleted this album/artist" signal back to the rows it tagged.
- tags_for_track(conn, track_id) → [TagRow, …]: ordered by key then ordinal, covering both plugin-owned text tags and scanner-written binary tags.
- TagRow(key, value, value_blob): one read-back tag row. Text tags have value_blob is None; binary tags have value == "" and value_blob bytes.
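To illustrate what "chunked under SQLite's parameter cap" means, here is a sketch of a bulk lookup that splits its IN list into batches. It is not the library's implementation: the function name, the chunk size of 500, and the two-column toy table are assumptions chosen to stay below the 999-parameter default of older SQLite builds.

```python
import sqlite3

CHUNK = 500  # assumed batch size, below the 999-parameter default of older SQLite


def track_ids_for_paths_sketch(conn, keys):
    """Map backing_path -> track id in batches; unmatched keys are omitted."""
    found = {}
    keys = list(keys)
    for start in range(0, len(keys), CHUNK):
        chunk = keys[start:start + CHUNK]
        placeholders = ",".join("?" * len(chunk))
        query = (
            "SELECT backing_path, id FROM tracks "
            f"WHERE backing_path IN ({placeholders})"
        )
        for path, track_id in conn.execute(query, chunk):
            found[path] = track_id
    return found


# Toy table for illustration only, not the real musefs DDL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, backing_path TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO tracks (backing_path) VALUES (?)",
    [(f"/music/{i}.flac",) for i in range(1200)],
)
wanted = [f"/music/{i}.flac" for i in range(1200)] + ["/music/missing.flac"]
result = track_ids_for_paths_sketch(conn, wanted)
```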
Constants
- EXPECTED_USER_VERSION: schema user_version this library targets.
- MAX_ART_BYTES: per-image art cap; larger images are skipped.
- SCAN_TIMEOUT_SECONDS: default wall-clock cap for one run_scan.
Exceptions
- SchemaMismatch(found): schema-version skew; .found is the DB's version.
- ScanError(kind, *, binary, target, …): a musefs scan failure; .kind ∈ {"not_found", "timeout", "failed"}, with context attributes for messaging.
Consumers
- beets depends on this package via pip (contrib/beets/pyproject.toml).
- Picard cannot pip-install plugin dependencies, so the package is vendored into contrib/picard/musefs/_common/ by vendor_to_picard.py. After any change here, re-run:
  python contrib/python-musefs/vendor_to_picard.py
  The Picard test tests/test_vendor_sync.py fails if the committed copy drifts.
- Lidarr depends on this package via pip (contrib/lidarr/pyproject.toml).
Schema coupling
musefs_common/schema.py (SCHEMA_SQL, USER_VERSION) is generated from
the Rust migrations in musefs-db/src/schema.rs — do not edit it by hand.
EXPECTED_USER_VERSION (in constants.py) derives from it. When the Rust
schema bumps, regenerate and re-vendor:
MUSEFS_REGEN_SCHEMA_PY=1 cargo test -p musefs-db schema_py
python contrib/python-musefs/vendor_to_picard.py
A musefs-db unit test fails if the generated file drifts. This is all
independent of the package's own __version__ (its release SemVer).
Tests
cd contrib/python-musefs
python -m venv .venv && source .venv/bin/activate
pip install -e ".[test]"
python -m pytest -v
ruff check . && ruff format --check .
Architecture overview
This is the technical reference for musefs internals: how a virtual file is assembled, how the workspace is layered, what the SQLite store guarantees, and how external edits become visible without a remount. For usage, see the User Guide; for the development workflow, see Contributing; for per-format behavior, see the format docs.
Design overview
musefs is a read-only passthrough FUSE filesystem with one cardinal invariant: original audio bytes are never copied or modified. A served file is not a transcoded or rewritten copy — it is assembled on the fly by splicing a freshly generated metadata region in front of positioned reads of the untouched backing file. The SQLite store is the source of truth for tags, art, and each file's audio byte range; the backing directory is the source of truth for the audio itself.
Crate layout
A strict layered Cargo workspace; dependencies point one way only:
musefs-db ─┐ SQLite store: schema/migrations, tracks/tags/art access
musefs-format┘← (db) format byte-surgery: metadata synthesis + RegionLayout
↑
musefs-core ← (db, format) orchestration: virtual tree, resolution, scanning, refresh
↑
musefs-fuse ← (core) thin FUSE adapter (fuser)
↑
musefs-cli ← (core, fuse, db) clap commands library (scan/mount logic)
musefs ← (cli) thin binary entrypoint; published as `musefs`
musefs-core is the integration layer — cross-cutting logic belongs there.
musefs-fuse, musefs-cli, and the musefs binary crate are deliberately
thin; the FUSE adapter's job is translating kernel requests into core calls
(and dispatching blocking reads onto a worker pool with per-thread reusable
buffers, so a slow backing read never stalls the FUSE dispatch thread).
The workspace also carries musefs-latencyfs, a dev/bench-only crate
(publish = false): a latency-injecting passthrough FUSE filesystem used by
the benchmarks harness to simulate slow backing stores. It
is not part of the shipping dependency graph (core uses it only as a
dev-dependency).
The serving model
The segment model
A synthesized virtual file is described by a RegionLayout
(musefs-format/src/layout.rs): an ordered list of Segments whose lengths
sum to the served file size. Six variants:
- Inline(Vec<u8>): generated framing/text bytes (an ID3v2 tag, FLAC metadata blocks, a RIFF front), fully materialized at resolve time.
- ArtImage { art_id, len }: embedded cover art; only the length lives in the layout. Image bytes stream from the DB blob in chunks at read time and are never buffered whole. This invariant also holds for Ogg synthesis, where page CRCs are computed from page-bounded ArtSource windows (previously the documented exception).
- BackingAudio { offset, len }: a run of the original file's audio frames, served by positioned reads (read_exact_at) against the backing file.
- OggAudio { offset, len, seq_delta }: original Ogg audio pages served with each page's sequence number shifted by seq_delta and its CRC recomputed in place (a resized header changes the page count). The byte length is unchanged: renumbering patches, never recopies.
- OggArtSlice { art_id, offset, len, base64, art_total }: a window of an embedded picture served lazily from the blob store; when base64, the window is base64-encoded incrementally at read time.
- BinaryTag { payload_id, len }: an opaque binary tag payload (e.g. an ID3 PRIV frame body or a FLAC APPLICATION block body) streamed from the DB at read time.
read_at (musefs-core/src/reader.rs) serves a byte range by walking the
segments and splicing: inline bytes are copied, art and binary-tag payloads
are read from the DB in chunks, backing audio comes from positioned reads of
the original file, and Ogg pages are renumbered and CRC-patched in flight.
This is how the cardinal invariant holds end to end. Layouts that stream any
payload from the DB by rowid — binary tags and art (ArtImage /
OggArtSlice) — are flagged (RegionLayout::streams_db_rowid) so the reader
wraps those reads in a single WAL snapshot with a content_version recheck.
A concurrent retag (delete + reinsert reusing a freed rowid) cannot interleave
bytes from two generations of a tag or splice the wrong image. Both the
per-handle fast path and the stateless no-fh fallback apply the guard, and the
fallback re-validates its freshly opened backing fd against the resolved
stamp.
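The segment walk can be modeled in Python. This is a simplified illustration, not the Rust read_at: it covers only two of the six variants, uses an in-memory io.BytesIO with seek/read in place of positioned reads, and the class and field names are stand-ins.

```python
import io
from dataclasses import dataclass


@dataclass
class Inline:
    data: bytes  # generated metadata bytes, fully materialized

    @property
    def length(self):
        return len(self.data)


@dataclass
class BackingAudio:
    offset: int  # where the audio run starts in the backing file
    length: int


def read_at(segments, backing, offset, size):
    """Serve virtual bytes [offset, offset + size) by walking the segments."""
    out = bytearray()
    pos = 0  # virtual offset of the current segment's first byte
    for seg in segments:
        seg_end = pos + seg.length
        if seg_end > offset and pos < offset + size:
            lo = max(offset, pos) - pos             # start within this segment
            hi = min(offset + size, seg_end) - pos  # end within this segment
            if isinstance(seg, Inline):
                out += seg.data[lo:hi]
            else:
                # Stand-in for a positioned read of the untouched backing file.
                backing.seek(seg.offset + lo)
                out += backing.read(hi - lo)
        pos = seg_end
    return bytes(out)


# The backing file keeps its old 6-byte tag; the served file swaps in a new one.
backing = io.BytesIO(b"OLDTAG" + b"AUDIOFRAMES")
segments = [Inline(b"NEWTAG!!"), BackingAudio(offset=6, length=11)]
whole = read_at(segments, backing, 0, 19)
window = read_at(segments, backing, 5, 6)  # straddles the splice point
```

The backing buffer is never written to, which is the point of the model: the served view changes while the original bytes stay put.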
Backing read-ahead
Every backing read — BackingAudio splices and the serve_ogg_window page walk
alike — flows through a single BackingReader::read_exact_at
(musefs-core/src/readahead.rs). It caches raw backing-file bytes keyed by
absolute backing offset in a per-handle adaptive window: a sequential miss reads
one large pread (geometric growth up to a per-stream cap) instead of the
≤256 KiB FUSE chunk, so a high-latency backing client (NFS, remote) can pipeline
the RPCs behind one syscall; a seek resets the window to the floor. All handles
draw from one process-wide RAM budget (--read-ahead-budget-mib, default 64) with
deadlock-free try_lock LRU eviction. Keying on the absolute backing offset (not
the synthesized output) makes the cache retag-immune, and serving still flows
through the per-read validate_opened_backing re-stat, so the cardinal
audio-bytes invariant and freshness semantics are untouched. An optional Phase-2
background-prefetch layer (--read-ahead-prefetch) exists but is off by default —
read amplification carries the whole win (see
the backing read-ahead benchmarks).
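The window policy described above (geometric growth on sequential misses, reset to the floor on a seek) can be sketched as follows. The doubling factor, the floor and cap values, and the class name are assumptions for illustration; the text only states that growth is geometric and capped per stream.

```python
FLOOR = 256 * 1024     # assumed starting window size
CAP = 8 * 1024 * 1024  # assumed per-stream cap


class AdaptiveWindow:
    """Hypothetical model of a per-handle read-ahead window."""

    def __init__(self, floor=FLOOR, cap=CAP):
        self.floor = floor
        self.cap = cap
        self.size = floor
        self.next_offset = None  # where a sequential read would continue

    def on_miss(self, offset):
        """Return how many bytes to read for a cache miss at offset."""
        if self.next_offset is not None and offset == self.next_offset:
            # Sequential miss: grow geometrically, bounded by the cap.
            self.size = min(self.size * 2, self.cap)
        else:
            # First read or a seek: fall back to the floor.
            self.size = self.floor
        self.next_offset = offset + self.size
        return self.size


window = AdaptiveWindow(floor=4, cap=16)
sizes = [
    window.on_miss(0),     # first read
    window.on_miss(4),     # sequential
    window.on_miss(12),    # sequential
    window.on_miss(28),    # sequential, already at the cap
    window.on_miss(1000),  # seek
]
```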
How each format builds its layout differs enough to warrant its own document: FLAC, MP3, M4A, Ogg, WAV.
Mount modes
musefs_core::Mode selects one of two behaviors at mount time:
- Synthesis (default): the metadata region is generated from the DB and spliced ahead of the backing audio, as above. Resolve-time validation guards the stored audio bounds: if audio_offset + audio_length runs past the backing file's current length, the row no longer matches the file and the resolve fails with a controlled BackingChanged error.
- StructureOnly: pure passthrough. The layout is a single whole-file BackingAudio segment, so the original bytes are served verbatim under the templated tree. Stored audio bounds are irrelevant (the whole file is served) and are not validated in this mode.
In StructureOnly mode, on kernels with FUSE passthrough (6.9+) and a daemon
holding CAP_SYS_ADMIN (kernel-gated: run as root or
setcap cap_sys_admin=ep the binary), each open registers the backing fd
with the kernel and reads bypass the daemon entirely. The capability check is
performed at mount time and its absence pre-announced; if registration fails
at runtime anyway, passthrough is disabled for the rest of the session
(later opens skip the doomed ioctl) and reads fall back to the daemon
silently. Freshness for a passthrough handle is open-time-only — it is a
plain POSIX fd onto the backing file. In Synthesis mode no single fd
represents the spliced bytes, so passthrough never applies.
Synthetic telemetry namespace
When --expose-metrics is on, the root directory gains a synthetic
.musefs-metrics/ entry backed by reserved inodes at u64::MAX - 1 (dir) and
u64::MAX - 2 (file) — the same "top of the u64 space" trick the Spotlight
marker uses, since InodeAllocator starts at 2 and only increments. The
directory and file are disjoint from the macOS Spotlight marker at u64::MAX.
The metrics file is /proc-style: it advertises st_size == 0 and is served
via FOPEN_DIRECT_IO, so readers must read to EOF rather than trusting the
stated size. Content is rendered at open time from a snapshot of
CoreTelemetry (header/size caches, read-ahead budget/charge, virtual-tree
footprint, refresh health), FuseTelemetry (uptime, read/dir-handle gates,
worker pool, passthrough state), and optional jemalloc/syscall counters
(including read-ahead hit/miss) — see
musefs-core/src/telemetry.rs for the full
metric list. This namespace deliberately bypasses the virtual tree
(VirtualTree) and the RegionLayout / segment model: it is injected into
root-directory readdir and resolved by direct inode checks, so the cardinal
audio path is untouched.
The store & external-writer contract
The SQLite store
musefs-db/src/schema.rs defines the schema as an ordered list of migrations
(MIGRATIONS: the MIGRATION_V1 baseline plus MIGRATION_V2, which adds the
scanner-owned fingerprint/content_hash columns); user_version records the
schema version (2).
The store is the interface external tools write to — the beets and Picard
plugins under contrib/ write tags and art here out-of-band.
- The baseline schema (MIGRATION_V1) defines the core tables: tracks (one row per backing file: path, format, audio byte range, size/nanosecond-mtime/ctime stamps, content_version), tags (multi-value key/value rows ordered by ordinal, with an optional value_blob for binary tags), art (content-addressed, deduplicated image blobs), track_art (per-track art links with picture type and ordering), and structural_blocks (read-only, derived-from-file FLAC STREAMINFO/SEEKTABLE metadata, not part of the editable contract).
- Deleting a track cascades to its tags and track_art rows. Triggers bump the owning track's content_version/updated_at on any tags/track_art edit; CHECK constraints enforce the contract invariants below at commit time.
- A bounded, self-pruning track_changes ring (capacity 8192, CHANGELOG_CAP) fed by triggers on tracks gives O(changed) refresh: every metadata edit funnels through an UPDATE on the tracks row, relying on SQLite's nested trigger activation (on by default).
- Freshness-superset triggers make content_version cover every DB-knowable input to synthesized bytes: art_reject_content_update (art is content-addressed and immutable), art_ad (a deleted art row bumps referencing tracks so an orphan rebuilds to a clean serve-time error), tracks_geometry_au (scanner-owned geometry changes), and structural_blocks_ai/_ad.
The external-writer contract
Ownership. External tools get full read/write on tags, art, and
track_art. The scanner owns the structural columns of tracks (id,
backing_path, format, audio_offset, audio_length, backing_size,
backing_mtime_ns, backing_ctime_ns, content_version, updated_at) and
all of structural_blocks: those are derived from probing the file, and
external tools must run musefs scan rather than compute them.
tracks.fingerprint and tracks.content_hash are also scanner-owned,
read-only-derived columns — like structural_blocks, they are never part of
the editable tag contract and external tools never write them.
fingerprint is a SHA-256 over the probe's parsed output (deterministic per
file, excludes filesystem stamps such as mtime/ctime), computed in the
parallel probe worker at zero extra I/O. content_hash is a full-file
SHA-256, stored as 64-char hex; it is computed only at the full checksum
tier (--checksum=full), which requires an eager whole-file read. Neither
column is UNIQUE by design — duplicate-content tracks legitimately share
both values. On a normal scan, when a probed file's path is not yet in the
store and its fingerprint matches exactly one orphaned row (a row whose
backing_path no longer exists on disk), the scanner retargets that row to
the new path in place, preserving its id, tags, and art rather than
orphaning them. This is how musefs recovers from a backing-library move or
reorganization: run musefs scan after moving files, and existing store rows
follow their backing files to the new locations.
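The retarget rule ("matches exactly one orphaned row") can be sketched as a pure function. This illustrates the decision only, not the scanner's code; pick_retarget_row and the tuple shape are hypothetical, and the existence check is injectable so the example does not touch the disk.

```python
import os


def pick_retarget_row(rows, fingerprint, path_exists=os.path.exists):
    """Return the track id to retarget, or None.

    rows is an iterable of (track_id, backing_path, fingerprint) tuples.
    """
    orphans = [
        track_id
        for track_id, backing_path, row_fp in rows
        if row_fp == fingerprint and not path_exists(backing_path)
    ]
    # Zero matches: nothing to follow. Several matches: ambiguous, so do nothing.
    return orphans[0] if len(orphans) == 1 else None


on_disk = {"/new/b.flac"}
rows = [
    (1, "/old/a.flac", "fp-a"),   # moved away: orphaned
    (2, "/new/b.flac", "fp-b"),   # still present: not an orphan
    (3, "/old/c1.flac", "fp-c"),  # duplicate content, both orphaned
    (4, "/old/c2.flac", "fp-c"),
]
moved = pick_retarget_row(rows, "fp-a", on_disk.__contains__)
present = pick_retarget_row(rows, "fp-b", on_disk.__contains__)
ambiguous = pick_retarget_row(rows, "fp-c", on_disk.__contains__)
```

Because the columns are not UNIQUE, duplicate-content tracks can share a fingerprint, which is why the ambiguous case has to be handled.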
What the store enforces. SQLite CHECK constraints reject the
malformed shapes at commit, so an external writer cannot persist them:
- an unknown format string, or a negative length/offset/size/version;
- an audio_offset + audio_length running past the stored backing_size;
- a binary tag row whose value is non-empty;
- an art.byte_len that disagrees with its blob, or a sha256 of the wrong length;
- a picture_type outside 0..=20;
- a tags.key over 256 chars or tags.value over 256 KiB;
- an empty tags.key, or one containing ASCII control characters. There is one blind spot: an embedded NUL terminates SQLite's length()/GLOB, so a key like a\0b slips the CHECK. The scanner's own floor drops it before insert, and the Vorbis path rejects it on synthesis. Additionally, only keys within the Vorbis field-name grammar (ASCII 0x20–0x7D, excluding =) survive FLAC/Ogg synthesis; others are dropped and logged. MP3/M4A custom keys may use the wider set (e.g. =, :, spaces, non-ASCII);
- a value_blob over MAX_BINARY_TAG_BYTES;
- an art.mime over 255 chars or byte_len over MAX_ART_BYTES;
- a track_art.description over 1 KiB;
- a structural_blocks row with an unknown kind, negative ordinal, or body over the FLAC 24-bit block limit.
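To see how such rules behave from a writer's side, here is a sketch with a toy tags table carrying two of the constraints listed above. The DDL is an assumption written for this example, not the schema in schema.rs. In Python's sqlite3 module a CHECK violation surfaces as sqlite3.IntegrityError on the offending statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy DDL for illustration only, not the real musefs schema.
conn.execute(
    """
    CREATE TABLE tags (
        key        TEXT NOT NULL CHECK (length(key) BETWEEN 1 AND 256),
        value      TEXT NOT NULL DEFAULT '',
        value_blob BLOB,
        CHECK (value_blob IS NULL OR value = '')
    )
    """
)


def try_insert(key, value, blob):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO tags (key, value, value_blob) VALUES (?, ?, ?)",
            (key, value, blob),
        )
        return True
    except sqlite3.IntegrityError:
        return False


ok_text = try_insert("artist", "Pink Floyd", None)
ok_binary = try_insert("PRIV", "", b"\x00\x01")            # binary row, empty value
bad_binary = try_insert("PRIV", "not empty", b"\x00\x01")  # binary row with a value
bad_key = try_insert("", "x", None)                        # empty key
```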
Schema identity. On open, musefs also validates schema identity: a
sqlite_master comparison against a freshly-migrated reference plus PRAGMA foreign_key_check, rejecting anything that is not the canonical latest schema
with a message telling the user to run musefs scan. A store whose
user_version is newer than this binary's latest migration (a future or
third-party tool bumped the schema) is refused up front with a distinct
"store is newer than this binary" error rather than silently treated as
already-migrated — an older binary must not risk misreading a newer contract.
Art is immutable once written. art rows are content-addressed by
sha256; a trigger rejects any in-place UPDATE of an art row's
content columns (data, sha256, mime, byte_len, width, height) with
RAISE(ABORT) — a multi-row UPDATE art touching any content column aborts the
whole statement. To change a track's art, insert a new content-addressed row
and relink it via track_art (which bumps content_version); do not mutate an
existing row. Deleting an art row still referenced by track_art (possible
only with foreign_keys OFF) bumps every referencing track so the mount serves
a clean EIO on the now-orphaned reference instead of stale bytes.
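A sketch of the immutability rule under stated assumptions: the art table below is a reduced toy, the trigger body is written for this example (only the trigger name art_reject_content_update comes from the text), and upsert_art_sketch is a hypothetical stand-in for content-addressed insertion.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy DDL for illustration only; the real art table has more columns.
conn.executescript(
    """
    CREATE TABLE art (id INTEGER PRIMARY KEY, sha256 TEXT UNIQUE, mime TEXT, data BLOB);
    CREATE TRIGGER art_reject_content_update
    BEFORE UPDATE OF data, sha256, mime ON art
    BEGIN
        SELECT RAISE(ABORT, 'art rows are immutable');
    END;
    """
)


def upsert_art_sketch(conn, data, mime):
    """Insert data keyed by its sha256, or return the existing row's id."""
    digest = hashlib.sha256(data).hexdigest()
    row = conn.execute("SELECT id FROM art WHERE sha256 = ?", (digest,)).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO art (sha256, mime, data) VALUES (?, ?, ?)", (digest, mime, data)
    )
    return cur.lastrowid


first = upsert_art_sketch(conn, b"cover-v1", "image/png")
again = upsert_art_sketch(conn, b"cover-v1", "image/png")  # de-duplicated

try:
    conn.execute("UPDATE art SET data = ? WHERE id = ?", (b"cover-v2", first))
    update_rejected = False
except sqlite3.DatabaseError:
    update_rejected = True  # the trigger aborts the statement

# The supported route: insert a new content-addressed row, then relink track_art.
replacement = upsert_art_sketch(conn, b"cover-v2", "image/png")
```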
What musefs defends at serve time. CHECKs cannot catch a scanner-owned
field mutated to a well-formed value that no longer matches the real file
on disk: backing_size or backing_mtime_ns/backing_ctime_ns that drift
from the actual file's stat, or audio bounds that fit the stored
backing_size but overrun the file once it has shrunk. musefs re-stats the
backing file on every resolve and treats such rows as untrusted input,
degrading to a controlled
BackingChanged/layout error, never undefined behavior. The store's
CHECK rejects art over MAX_ART_BYTES (16 MiB − 64 KiB) at write time;
resolve also re-checks it (ArtTooLarge, all formats) to backstop a writer
that disables check enforcement, and the scanner's ingest-time drop is
tracked in #284.
Referential gaps are treated the same way: a track_art row whose art_id
has no matching art row (an orphan an external writer can produce with FK
enforcement disabled) fails the serve with EIO rather than silently dropping
the art.
Merge vs. replace. An external writer may merge rather than fully
replace text tags — overwriting only the keys it manages and leaving the rest
of the scan-seeded set in place — provided it tracks its own managed-key set
out of band (the beets plugin uses a beets flexattr; the store is not the
place for plugin state). musefs renders tags outside its native VOCAB
(musefs-format/src/tagmap.rs) by passthrough (Vorbis uppercased, mp3
TXXX, mp4 freeform), so such tags appear but are not guaranteed
byte-identical to a given tagger's own per-format encoding. A merge matches
the keys it manages case-insensitively, so a writer's canonical
(lowercase) key replaces a scan-seeded row stored under the backing file's
native case (e.g. Vorbis LABEL) instead of coexisting with it — Vorbis keys
render case-insensitively, so two such rows would otherwise duplicate.
Path layout offload. External tools can also offload path layout
entirely: a plugin evaluates its own (arbitrarily complex) path logic, writes
the resulting relative path into a custom text tag — e.g. INSERT INTO tags (track_id, key, value, ordinal) VALUES (?, 'beets_path', 'Pink Floyd/Animals/01 Pigs', 0) — and the user mounts with --template '$!{beets_path}'. Because the field map is just the (lowercased) tag keys,
any number of such tags (beets_path, lidarr_path, …) can back different
concurrent mounts. The path field keeps embedded / as directory separators
but sanitizes each segment and drops empty/./.. segments, so a
misbehaving writer cannot inject traversal or empty components into the tree.
The shared Python library. contrib/python-musefs/ encodes this contract
for plugin authors, including a generated copy of the schema
(musefs_common/schema.py, regenerated from schema.rs by a drift-guarded
test — see CONTRIBUTING). Its tag/art replace operations
each wrap their DELETE+INSERT in a SQLite savepoint, so they are
individually atomic and the "caller owns the transaction" guarantee holds even
on an autocommit connection. The Lidarr integration
uses the same shared library from a Custom Script workflow. Its Lidarr
destination tree is only a tracking aid, made of symlinks by default; musefs
remains the consumer-facing filesystem.
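The savepoint-wrapped DELETE+INSERT mentioned above can be sketched like this. It is an illustration of the technique rather than the library's code: replace_tags_sketch, the savepoint name, and the reduced tags table are assumptions. The connection is opened in autocommit mode to show that the replace is still atomic there.

```python
import sqlite3


def replace_tags_sketch(conn, track_id, pairs):
    """Replace a track's text tags atomically inside a savepoint."""
    conn.execute("SAVEPOINT replace_tags")
    try:
        conn.execute(
            "DELETE FROM tags WHERE track_id = ? AND value_blob IS NULL", (track_id,)
        )
        conn.executemany(
            "INSERT INTO tags (track_id, key, value) VALUES (?, ?, ?)",
            [(track_id, key, value) for key, value in pairs],
        )
    except sqlite3.Error:
        # Undo the DELETE and any partial INSERTs, then close the savepoint.
        conn.execute("ROLLBACK TO replace_tags")
        conn.execute("RELEASE replace_tags")
        raise
    conn.execute("RELEASE replace_tags")


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
# Toy table for illustration only, not the real musefs DDL.
conn.execute(
    "CREATE TABLE tags (track_id INTEGER, key TEXT NOT NULL, "
    "value TEXT NOT NULL, value_blob BLOB)"
)
conn.execute("INSERT INTO tags (track_id, key, value) VALUES (1, 'artist', 'Old Name')")

try:
    # The second pair violates NOT NULL, so the whole replace must roll back.
    replace_tags_sketch(conn, 1, [("artist", "New Name"), ("album", None)])
except sqlite3.IntegrityError:
    pass
after_failure = conn.execute("SELECT key, value FROM tags").fetchall()

replace_tags_sketch(conn, 1, [("artist", "New Name")])
after_success = conn.execute("SELECT key, value FROM tags").fetchall()
```

When the caller already has a transaction open, the savepoint nests inside it and the caller still decides when to commit.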
CI proves this contract end to end in the contract job (see
CONTRIBUTING): a Python writer's tags/art, layered on a
scanned track, are synthesized by the Rust serve path and read back by an
independent reader.
External writers prune in one of two ways depending on how they own files.
For in-place writers (e.g. the beets plugin), existence-based pruning — dropping
the row of a removed backing file — is a deliberate act owned by musefs scan --revalidate; the plugin never prunes on its own (it exposes the revalidate
scan via beet musefs --revalidate). The prune_missing helper in
musefs_common implements the same by-existence delete for writers that prefer
to own pruning themselves. Link-tree writers (e.g. the Lidarr integration) never
delete the backing files they point at, so they prune by identity instead: a
source-reported album/artist deletion removes the rows carrying the matching
MusicBrainz id.
Connections are mode-typed (Db<ReadWrite> / Db<ReadOnly>), opened in WAL
mode with a busy timeout. The serve path uses a DbPool whose per-thread
variant hands each reader thread its own connection — WAL reads never contend.
Freshness, tree & scanning
Freshness: two version counters
Two distinct counters drive correctness; they answer different questions.
content_version (per-track column) answers "did this track's served
bytes change?". The DB triggers increment it on any input the database can see that changes
synthesized bytes: tag and track_art edits, art-row deletes that orphan a
reference, scanner-owned geometry changes (format, audio bounds, backing
size/nanosecond-mtime), and FLAC structural-block changes. It is
therefore a superset key — the one input it cannot cover is an on-disk backing
change with no DB write, which resolve (and, since #279, a size-cache
getattr hit) catches by re-statting the backing file and degrading to
BackingChanged. The scanner stamps the backing file's (size, mtime_ns, ctime_ns) tuple from the probed file descriptor using a pre/post fstat
sandwich: if the file's metadata changes between the two stats, the entry is
dropped. ctime defeats an mtime-forging writer (e.g. touch -m). The
HeaderCache (reader.rs), a byte-budgeted concurrent cache (64 MiB
default) of resolved layouts, keys each entry on content_version: a hit with a stale
content_version rebuilds the layout. Independently of the cache, every
resolve re-stats the backing file and errors with BackingChanged if its
size, mtime, or ctime drifted from the scanned values, so a silently replaced
backing file is never spliced at stale offsets. The per-handle read path
re-stats the held descriptor on every read too, so this guarantee holds on the
hot path and not only through resolve().
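The stamp comparison can be sketched with os.stat. This is a simplified model: it stats by path rather than through a held descriptor, and BackingChanged here is a local stand-in for the error named in the text.

```python
import os
import tempfile


class BackingChanged(Exception):
    """Local stand-in for the controlled error named in the text."""


def stamp(path):
    """Return the (size, mtime_ns, ctime_ns) freshness tuple for path."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def validate_backing(path, scanned_stamp):
    """Raise BackingChanged if the file no longer matches its scanned stamp."""
    if stamp(path) != scanned_stamp:
        raise BackingChanged(path)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "track.flac")
    with open(path, "wb") as f:
        f.write(b"fLaC" + b"\0" * 100)
    scanned = stamp(path)
    validate_backing(path, scanned)  # unchanged file: passes silently

    with open(path, "ab") as f:
        f.write(b"extra")            # the file grows behind the store's back
    try:
        validate_backing(path, scanned)
        drift_detected = False
    except BackingChanged:
        drift_detected = True
```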
data_version (PRAGMA data_version, whole-DB) answers "did anyone
commit anything?". Musefs::poll_refresh compares it to the last seen
value; on a change it consults the track_changes ring and applies an
incremental, O(changed) rebuild: only the affected tracks' tree entries
are re-rendered, exactly the removed tracks' cache entries are dropped, and
the inodes whose content_version rose are reported to the FUSE layer. If
the mount slept past the ring's capacity (or the ring was truncated), it
falls back to a full tree rebuild — correct by construction, and a bulk
change wants one anyway. The new version stamp is committed only after a
successful rebuild; failures arm a retry backoff.
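PRAGMA data_version is a standard SQLite pragma whose value, as seen by one connection, changes when a different connection commits. A minimal sketch of the polling comparison, with a throwaway database and a toy table standing in for the store:

```python
import os
import sqlite3
import tempfile


def data_version(conn):
    """Read SQLite's whole-database change counter for this connection."""
    return conn.execute("PRAGMA data_version").fetchone()[0]


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "store.db")
    writer = sqlite3.connect(db_path)  # stands in for an external tagger
    writer.execute("CREATE TABLE tags (key TEXT, value TEXT)")
    writer.commit()

    mount = sqlite3.connect(db_path)   # stands in for the mount's connection
    last_seen = data_version(mount)
    unchanged = data_version(mount) == last_seen  # no commit yet: no refresh

    writer.execute("INSERT INTO tags VALUES ('artist', 'Pink Floyd')")
    writer.commit()

    needs_refresh = data_version(mount) != last_seen
    mount.close()
    writer.close()
```

The counter says only that something was committed; deciding which tracks changed is the job of the track_changes ring described above.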
The FUSE layer fires poll_refresh on metadata ops (lookup, readdir,
…) off the dispatch thread, so external edits appear without remounting.
Polling is debounced (--poll-interval-ms) and rebuilds are single-flighted:
a metadata-op storm costs at most one rebuild per interval. When mounted with
--keep-cache, the changed-inode notifications drive kernel page-cache
invalidation (inval_inode), so a re-tagged file never serves stale cached
bytes.
Virtual tree
VirtualTree::build (musefs-core/src/tree.rs) materializes an inode → node
mapping from rendered paths. Paths come from beets-style templates
(template.rs): $field / ${field} substitutions (with ${a|b} fallback
chains) over the track's tag fields, each resolving through per-field fallbacks
and then a global default_fallback; [...] conditional sections suppress
their literals when every field they reference is empty. With skip_on_missing
set (CLI --skip-on-missing), an unresolved top-level field instead drops the
track from the mount: render_one returns None, so the track enters neither
the snapshot nor the tree, and the incremental refresh path reclassifies a track
that loses (or regains) such a field as a removal (or addition). Plain values are
sanitized to a single path component ('/' and control characters become '_',
components equal to . or .. are dropped, and any component is truncated to
255 bytes on a UTF-8 boundary so it stays within NAME_MAX),
while a $!{field} path field keeps '/' as directory separators (sanitizing
each segment and dropping empty/./.. segments) so a precomputed multi-level
path expands into real directories. Path collisions are resolved
deterministically by appending (k) before the extension
(disambiguate). mapping.rs bridges DB tag rows to the format layer's
inputs and to template fields — ordering and multi-value semantics live
there.
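The sanitizing and collision rules can be sketched as below. This is an approximation, not template.rs: the exact control-character set, the " (k)" suffix spacing, and numbering from 1 are assumptions, and the function names (other than disambiguate, which the text mentions) are hypothetical.

```python
import posixpath


def truncate_utf8(name, limit=255):
    """Cut name to at most limit bytes without splitting a UTF-8 sequence."""
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name
    # errors="ignore" drops a trailing partial character left by the cut.
    return raw[:limit].decode("utf-8", errors="ignore")


def sanitize_component(value):
    """Plain field: '/' and control characters become '_' (assumed set: < 0x20, 0x7F)."""
    cleaned = "".join(
        "_" if ch == "/" or ord(ch) < 0x20 or ord(ch) == 0x7F else ch for ch in value
    )
    return truncate_utf8(cleaned)


def sanitize_path_field(value):
    """Path field: keep '/' as separators, drop empty, '.' and '..' segments."""
    parts = [
        sanitize_component(segment)
        for segment in value.split("/")
        if segment not in ("", ".", "..")
    ]
    return "/".join(parts)


def disambiguate(paths):
    """Append ' (k)' before the extension on repeated rendered paths."""
    seen = {}
    out = []
    for path in paths:
        count = seen.get(path, 0)
        seen[path] = count + 1
        if count == 0:
            out.append(path)
        else:
            stem, ext = posixpath.splitext(path)
            out.append(f"{stem} ({count}){ext}")
    return out


plain = sanitize_component("AC/DC")
nested = sanitize_path_field("../Pink Floyd//Animals/./01 Pigs")
unique = disambiguate(["A/x.flac", "A/x.flac", "A/y.flac"])
short = truncate_utf8("é" * 200)  # 400 bytes of two-byte characters
```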
Inodes are stable across rebuilds: a persistent path→inode allocator
(InodeAllocator) reuses an unchanged rendered path's inode and never
recycles a retired one, so a descriptor held open across a refresh keeps
resolving to the same node and a stale FUSE handle can never alias a
different file. On case-insensitive mounts the key is case-folded, so a
survivor keeps its inode even when an unrelated deletion flips a merged
directory's display casing (#305). A path that vanished degrades to
ENOENT, bounded by the entry/attr TTL. (Retired paths are pruned once they outnumber live ones,
bounding the allocator at twice the live tree; a path that returns after a
prune gets a fresh inode.)
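A toy model of the allocator's guarantees: stable inodes for unchanged paths, no reuse of retired numbers, and pruning once retired paths outnumber live ones. The class and method names are hypothetical and case-folding is left out.

```python
class InodeAllocatorSketch:
    """Hypothetical model of a persistent path -> inode allocator."""

    def __init__(self, first=2):
        self.next_inode = first  # allocation starts at 2 and only increments
        self.by_path = {}

    def inode_for(self, path):
        """Return the path's inode, allocating a new one on first sight."""
        if path not in self.by_path:
            self.by_path[path] = self.next_inode
            self.next_inode += 1  # retired numbers are never handed out again
        return self.by_path[path]

    def prune_retired(self, live_paths):
        """Forget retired paths once they outnumber live ones."""
        live = set(live_paths)
        retired = [p for p in self.by_path if p not in live]
        if len(retired) > len(live):
            for path in retired:
                del self.by_path[path]


alloc = InodeAllocatorSketch()
a = alloc.inode_for("Artist/a.flac")
b = alloc.inode_for("Artist/b.flac")
a_again = alloc.inode_for("Artist/a.flac")  # same path, same inode
alloc.inode_for("Artist/c.flac")
alloc.inode_for("Artist/d.flac")

# Only a.flac is still live: three retired paths outnumber one live path.
alloc.prune_retired({"Artist/a.flac"})
a_after = alloc.inode_for("Artist/a.flac")  # survivor keeps its inode
b_after = alloc.inode_for("Artist/b.flac")  # returns after a prune: fresh inode
```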
Scanning
scan_directory (musefs-core/src/scan.rs) ingests a backing directory:
collect supported audio files, probe each (format detection → audio
offset/length, tags, pictures, structural blocks) on a parallel probe
pipeline feeding a single DB writer, committing in batches. Probing reads
are bounded — the scanner never slurps whole files — and ingestion caps
per-item sizes (MAX_ART_BYTES, MAX_BINARY_TAG_BYTES) so a crafted file
cannot balloon the store. An over-cap picture or binary tag is dropped and
logged (RUST_LOG=warn) rather than vanishing silently, so a track that
appears to have lost its cover art has an explanation in the logs; a
supported-extension file that fails to parse, or errors mid-probe, is
likewise logged with the reason and counted failed.
Symlinks are not followed by default: a symlinked file or directory is
logged (RUST_LOG=info/warn) and skipped, which keeps the walk immune to
directory-symlink cycles. Passing --follow-symlinks resolves them — symlinked
audio files and directories are scanned — guarded by a visited (dev, ino) set
so symlink cycles terminate, and by a second file-level (dev, ino) set so a
file reached via both a real path and a symlink is ingested once rather than
upserting its canonical track row twice. Because that set keys on (dev, ino),
multiple hardlinks to the same inode are likewise collapsed to a single track
under --follow-symlinks. Broken symlinks are logged and skipped without
aborting the scan. The root argument is always followed regardless of the
flag; only links encountered during recursion are gated.
revalidate is the maintenance pass (scan --revalidate). It does three things:
re-probes only files whose (size, mtime_ns, ctime_ns) freshness stamp changed,
deletes tracks under the scanned root whose backing file is gone, and
garbage-collects now-unreferenced art. A ctime-only change (e.g. a forged-mtime
in-place rewrite) still triggers a re-probe, while skipping unchanged files
preserves external tag edits in the DB. Pruning is
scoped to the scanned root, so revalidating one library root never removes
tracks belonging to another. Because a track is keyed by its canonical
backing path, a file scanned via --follow-symlinks whose real target lives
outside the scanned root falls outside the prune scope: if that target later
disappears, its stale row is not pruned by revalidating this root.
The contrib ecosystem
External writers live under contrib/: python-musefs is the shared
store-contract library (schema-version check, tag/art writes, sha256 art
content-addressing, the musefs scan shell-out); the
beets plugin, the
Picard plugin, and the
Lidarr integration (a Custom Script workflow)
build host-specific tag mapping on top of it. Each one's README covers its own
setup and behavior;
CONTRIBUTING covers their test suites and the
generated-schema/vendoring mechanics.
Contributing
The working manual for building, testing, and landing a change. For what the pieces are, read the architecture overview first; for per-format behavior, the format docs.
Map of this document:
- Getting set up: prerequisites and the pre-commit hook.
- Build & test: everyday commands, the FUSE e2e suite, the FreeBSD VM harness.
- Test tiers beyond cargo test: property tests, fuzzing, interop, the contract round trip, fault injection, mutation testing, sanitizers, coverage.
- Code conventions: errors, integer casts, lints, unsafe, layering.
- Adding a format: the four-step recipe.
- Python plugins (contrib): per-suite commands and the gotchas.
- Releasing: the Python (py-v*) and Rust (v*) release flows.
- PRs & commits: conventions and the before-you-push checklist.
Getting set up
Prerequisites:
- Rust — stable (edition 2024) with rustfmt and clippy.
- FUSE (to mount, or to run the FUSE end-to-end tests) — Linux with /dev/fuse and libfuse (libfuse3-dev/libfuse3 plus pkg-config), or FreeBSD with /dev/fuse and the fusefs kernel module (no libfuse — see FreeBSD e2e for the in-tree VM harness).
- Python 3 with ruff and pytest — only for the Python plugin suites.
- shellcheck and yamllint — optional; the pre-commit hook's shell and YAML lint legs each skip with a notice if not installed.
Enable the repo's pre-commit hook once per clone:
git config core.hooksPath .githooks
The hook (.githooks/pre-commit) runs, in order: cargo fmt --all --check,
cargo clippy --all-targets -- -D warnings, the full workspace test
suite (cargo test --workspace), a conditional cargo-mutants anchor-drift
guard (only when .cargo/mutants.toml, scripts/check_mutant_anchors.py, or
a musefs-core/musefs-format source file is staged), shellcheck over
every tracked shell script, yamllint (relaxed .yamllint) over
every tracked YAML file, and ruff check + ruff format --check over
contrib/beets/, contrib/picard/, contrib/lidarr/,
contrib/python-musefs/, scripts/, and tests/interop/. A few consequences
worth internalizing:
- A commit with red tests is always rejected — there is no "commit-now-fix-later" workflow here.
- Python-only changes hit the hook too: the ruff gate lints exactly the union of paths the CI jobs lint, so a commit can't pass the hook yet fail CI lint.
- The cargo gate (fmt/clippy/test) is skipped when every staged path is under docs/ or is a Markdown file, so a docs-only commit stays fast.
- The shellcheck/yamllint legs fire only when a shell or YAML file is staged, and skip with a notice when the tool is absent; when they do run they lint all tracked files of that type, so a sibling file can't drift unnoticed.
- The mutant-anchor guard fires only when the mutants config, its check script, or a musefs-core/musefs-format source file is staged, and skips with a notice when cargo-mutants is absent (CI re-checks it regardless). It re-validates that the .cargo/mutants.toml exclude_re anchors still point at their intended file:line:col after a line-shifting edit.
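The docs-only rule for the cargo gate can be pictured as a predicate over the staged paths. The sketch below is an illustration only: the real hook is a shell script, and both the function name cargo_gate_skipped and the choice to recognise Markdown by a .md suffix are assumptions.

```python
from pathlib import PurePosixPath


def cargo_gate_skipped(staged: list) -> bool:
    """True when every staged path is under docs/ or is a Markdown file."""

    def is_doc(path: str) -> bool:
        p = PurePosixPath(path)
        # Assumption: "Markdown file" means a .md suffix.
        return p.parts[:1] == ("docs",) or p.suffix.lower() == ".md"

    # Assumption: an empty staging area is not treated as docs-only.
    return bool(staged) and all(is_doc(p) for p in staged)


print(cargo_gate_skipped(["docs/FLAC.md", "README.md"]))        # True
print(cargo_gate_skipped(["docs/FLAC.md", "musefs/src/main.rs"]))  # False
```

A single non-doc path in the staged set is enough to run the full fmt/clippy/test gate.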
Build & test
cargo build # build the workspace
cargo test # all crates (excludes FUSE e2e)
cargo test -p musefs-core # one crate
cargo test -p musefs-core read_at # tests matching a substring
cargo clippy --all-targets # lint (policy: see below)
cargo fmt # format
The musefs binary enables the default-on jemalloc feature (jemalloc global
allocator + background purge thread). Build the system-allocator variant with
cargo build -p musefs --no-default-features — used for the RSS comparison
(scripts/rss-churn-bench.sh) and by packagers that forbid vendored C libs.
The FUSE end-to-end tests perform real mounts and are #[ignore]d:
cargo test -p musefs-fuse -- --ignored # needs /dev/fuse + libfuse
The kernel-passthrough e2e additionally needs CAP_SYS_ADMIN. Don't run
cargo under sudo — build first, then run the prebuilt test binary with sudo
(find it in target/debug/deps/):
cargo test -p musefs-fuse --no-run
sudo target/debug/deps/<e2e_test_binary> --ignored <passthrough_test_name>
- Read-consistency harness (musefs-fuse/tests/read_consistency.rs): a seeded, reproducible randomized pread/mmap sweep compares live-mount reads against an in-memory oracle (the seed is printed on failure to reproduce). The hermetic FLAC tests — whole-file mmap fidelity and the read-only write-refusal matrix — always run; the multi-format breadth sweep generates fixtures with ffmpeg and skips any format whose codec is unavailable.
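To make the idea of a seeded sweep against an oracle concrete, here is a small Python sketch that performs randomized positioned reads on a file and compares each one with the same slice of an in-memory copy. It is not the Rust harness: pread_sweep and its parameters are hypothetical, it covers only pread (not mmap), and it assumes a Unix platform where os.pread is available.

```python
import os
import random


def pread_sweep(path: str, oracle: bytes, seed: int, rounds: int = 200) -> None:
    """Compare random positioned reads of `path` against the in-memory `oracle`."""
    rng = random.Random(seed)  # same seed, same sequence of (offset, length)
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(rounds):
            offset = rng.randrange(0, len(oracle) + 16)  # may start past EOF
            length = rng.randrange(0, 4096)
            got = os.pread(fd, length, offset)
            want = oracle[offset:offset + length]
            if got != want:
                # Report the seed so the failing sequence can be replayed.
                raise AssertionError(
                    f"mismatch: seed={seed} offset={offset} length={length}"
                )
    finally:
        os.close(fd)
```

Carrying the seed in the failure message is what makes a randomized failure reproducible: rerunning with that seed replays the identical read sequence.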
FreeBSD e2e
The FUSE e2e suite also runs on FreeBSD, via the scripts in
scripts/freebsd-vm/. They are the single source of
truth — CI and local runs invoke the same scripts, so they can't drift:
- run-local.sh — host-side orchestrator: creates and boots a FreeBSD VM under qemu/KVM and runs the suite in it. All artifacts go under the gitignored .scratch/freebsd/.
- provision.sh — in-guest: installs git, ffmpeg, and the current stable Rust toolchain via rustup (FreeBSD's packaged rust lags and is too old for some deps), and loads the fusefs kernel module. Run by run-local.sh and CI.
- run-e2e.sh — in-guest: cargo test --workspace then the --ignored FUSE e2e suite (guards that ffmpeg is present so the decode/encode tests don't silently skip).
- serial-run.py — drives the VM over its serial console (the console driver used by run-local.sh).
CI. The freebsd job in .github/workflows/ci.yml
runs these in a vmactions/freebsd-vm VM. It is expensive (a full in-VM build),
so it does not run on every PR — only when the FUSE/mount surface or its
harness changed (musefs/, musefs-fuse/, scripts/freebsd-vm/, Cargo.lock,
ci.yml) or on a release tag (v*).
Local run (one command):
sh scripts/freebsd-vm/run-local.sh
Host prerequisites (Debian/Ubuntu packages in parens): qemu-system-x86_64 +
qemu-img (qemu-system-x86, qemu-utils), xorriso (xorriso), curl +
xz (curl, xz-utils), python3. /dev/kvm for acceleration (it runs
without it, just far slower); ~6 GB free under .scratch/.
What it does, end to end:
- Downloads the official FreeBSD-<rel>-amd64-BASIC-CLOUDINIT-ufs.qcow2 image into .scratch/freebsd/ (cached; downloaded once). That image directs its console to the serial line, which is what lets the harness drive it.
- Creates a fresh overlay disk from the cached base each run (cheap reset).
- Boots the VM headless and logs in as root over the serial console (the image has an empty root password — no SSH, no keys, no cloud-init).
- Serves this repo over a throwaway HTTP server on qemu's user-net gateway (10.0.2.2); the guest fetches and unpacks it.
- Runs provision.sh + run-e2e.sh over the console and propagates the exit code, then powers the VM off.
Tunable via env: FREEBSD_REL (default 14.3-RELEASE), VM_MEM, VM_SMP,
VM_DISK, HTTP_PORT, RUN_TIMEOUT.
To drive your own VM instead, boot any FreeBSD image and, from the repo root
inside it as root, run sh scripts/freebsd-vm/provision.sh then
sh scripts/freebsd-vm/run-e2e.sh.
Notes:
- FreeBSD uses fuser's pure-rust /dev/fuse backend — no libfuse package; only the fusefs kernel module and base-system mount_fusefs(8) are needed.
- Kernel FUSE passthrough (StructureOnly) is Linux-only; on FreeBSD it falls back to daemon serving.
macOS support is best-effort: CI builds there with fuser's macos-no-mount
feature, and the platform-specific logic is unit-tested. Mounted e2e on
macOS/FUSE-T is not yet validated.
Test tiers
Test tiers beyond cargo test
Property tests
proptest invariants — panic-freedom, the byte-identical-audio guarantee,
tag round-trip stability — live in musefs-format/tests/proptest_*.rs and
musefs-core/tests/proptest_read_fidelity.rs. The format-layer suites are
gated on the fuzzing feature, which musefs-format's self-dev-dependency
enables for all of its own test builds — so a plain
cargo test -p musefs-format runs them.
Coverage-guided fuzzing
The fuzz/ crate is excluded from the workspace: workspace-wide build,
test, and clippy do not compile it, so a format-layer signature change can
break fuzz targets without anything failing locally — CI's fuzz smoke job
(cargo +nightly fuzz build) is what catches it. Check locally before
pushing a format-layer API change:
cargo install cargo-fuzz # one-time; needs nightly
cargo +nightly fuzz build # what the CI smoke job runs
cargo +nightly fuzz run <target> # flac|mp3|mp4|ogg|wav|ogg_page|b64|vorbiscomment|serve
cargo +nightly fuzz coverage <target> # confirm coverage reaches the parser
cargo run --manifest-path fuzz/Cargo.toml --bin generate_seeds # (re)build seeds
Fuzz crash regressions
When you fix a fuzz-found crash:
- Drop the reproducer bytes into fuzz/regressions/<target>/ (one file per reproducer). The per-PR fuzz smoke job's replay step runs every committed reproducer with cargo +nightly fuzz run <target> <files> -- -runs=0 — a deterministic single pass that fails the build if any known input panics again. This is separate from fuzz/corpus/, which cargo fuzz cmin minimizes (and would prune reproducers from).
- Where the crash exposed a real logic/behavior defect, also add a focused behavioral test for that logic in the owning crate's suite (the pre-commit hook gates it). The byte replay proves the exact input no longer panics; the behavioral test documents and locks in the fix. They are not interchangeable.
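For reproducing the replay step by hand, the command line can be assembled from the committed reproducers. The helper below is a hypothetical convenience, not a script from the repository; it only builds the argument list in the form quoted above and leaves running it to the caller.

```python
from pathlib import Path
from typing import List, Optional


def replay_command(regressions_root: Path, target: str) -> Optional[List[str]]:
    """Build the single-pass replay command for one fuzz target, or None if empty."""
    target_dir = regressions_root / target
    if not target_dir.is_dir():
        return None
    files = sorted(p for p in target_dir.iterdir() if p.is_file())
    if not files:
        return None  # nothing committed for this target
    # -runs=0 makes libFuzzer execute the given inputs once and exit.
    return ["cargo", "+nightly", "fuzz", "run", target,
            *[str(p) for p in files], "--", "-runs=0"]
```

Passing the result to subprocess.run(cmd, check=True) from the repo root would fail loudly if any known input panics again.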
Coverage notes: the per-format targets also drive the bounded/ceiling probers
(*_bounded, locate_audio_at_ceiling, read_structure_from) and assert a
differential oracle against the full-buffer parse. The serve target fuzzes the
read-time serve path (read_at_with_file over adversarial layouts, including
serve_ogg_window/OggArtSlice) and is scheduled-only (built per-PR, not
smoke-run) because it builds a DB + temp backing file per input. The serve
target also exercises hostile DB rows (negative/oversized geometry,
invalid formats, orphaned/oversized art, stale binary-tag handles, content-version
mismatch) via the musefs-db fuzzing-gated with_raw_conn, plus binary-tag
streaming and distinct Opus/Vorbis/OggFLAC fixtures.
Independent-reader interop (mutagen)
Asserts that an independent ecosystem reader sees the tags musefs synthesizes, across all five formats:
pip install -r tests/interop/requirements.txt
MUSEFS_INTEROP_DIR=/tmp/i cargo test -p musefs-core --test interop_emit -- --ignored emit_interop_fixtures
MUSEFS_INTEROP_DIR=/tmp/i python -m pytest tests/interop
External-writer contract round trip
CI's contract job mandatorily proves the Python -> Rust DB contract: it builds
the binary, runs each binary-only plugin's musefs_bin tier with
MUSEFS_REQUIRE_BIN=1 (a missing binary fails instead of skipping), and runs the
round-trip harness. The harness is the single source of truth, run locally with:
pip install -r tests/contract/requirements.txt pytest && pip install -e contrib/python-musefs
bash scripts/contract-roundtrip.sh
It scans real ffmpeg-generated audio (so musefs scan owns the track geometry),
writes tags/art through musefs_common.store, synthesizes the served bytes via
cargo test --test contract_emit, and asserts with mutagen that the Python tags
and art survived. Picard's musefs_bin tier runs in the picard job (it needs
the system-Picard environment).
Failure-path fault injection
The reader and DB error paths are exercised under simulated runtime faults.
musefs_core::metrics::set_backing_fault(BackingFault::{Eio,ShortRead})
(behind the metrics feature) installs a process-global fault at the positioned
backing-read site, cleared by the returned RAII guard. Because it is global, the
tests run in their own metrics-gated binaries.
cargo test -p musefs-core --features metrics --test reader_faults
cargo test -p musefs-core --test backing_changed_fault # real file mutation
cargo test -p musefs-core --test db_corruption_fault # byte-corrupt DB
cargo test -p musefs-fuse --features metrics -- --ignored # EIO through the mount (needs /dev/fuse)
BackingChanged (re-validated in HeaderCache::resolve) and DB corruption are
driven by real conditions, not the seam. ENOSPC/read-only faults are write-path
concerns and are out of scope for the read-time suite.
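The shape of that seam, a process-global fault that a guard clears on exit, can be mimicked in a few lines of Python. This is an analogue of the pattern only, not the Rust API: backing_fault, read_backing, and the string fault kinds are all hypothetical.

```python
import contextlib
import errno

_FAULT = None  # process-global: every reader in this process observes it


@contextlib.contextmanager
def backing_fault(kind: str):
    """Install a fault for the duration of the block, then clear it (guard-like)."""
    global _FAULT
    _FAULT = kind
    try:
        yield
    finally:
        _FAULT = None


def read_backing(data: bytes, offset: int, length: int) -> bytes:
    """Positioned read that honours the currently installed fault, if any."""
    if _FAULT == "eio":
        raise OSError(errno.EIO, "injected EIO")
    chunk = data[offset:offset + length]
    if _FAULT == "short_read":
        return chunk[: len(chunk) // 2]  # deliver fewer bytes than requested
    return chunk
```

Because the fault lives in one global slot, any test running concurrently in the same process would see it too, which is the reason the suite above isolates these tests in their own binaries.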
Mutation testing
scripts/mutants.sh wraps cargo-mutants for the logic-bearing crates;
.cargo/mutants.toml permanently excludes the thin glue crates
(musefs-fuse, musefs-cli, musefs) and feature-gated instrumentation.
musefs-latencyfs carries real logic and has its own leg (it needs
/dev/fuse to kill its mutants).
The CI parity check for a branch is the in-diff gate — mutate only the lines your branch changed:
git diff "$(git merge-base main HEAD)...HEAD" -- '*.rs' > mutants.diff
grep -q '^@@ ' mutants.diff # IMPORTANT: an empty diff mutates nothing and exits 0 — a silent false pass
cargo mutants --in-diff mutants.diff -j2 --exclude 'musefs-latencyfs/**' --output /tmp/mutants-out/in-diff
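The empty-diff check is easy to get wrong in a wrapper script, so here is the same test expressed in Python. It is a sketch equivalent to the grep line above; diff_has_hunks is a hypothetical name.

```python
import re

_HUNK = re.compile(r"^@@ ", re.MULTILINE)


def diff_has_hunks(diff_text: str) -> bool:
    """Mirror of grep -q '^@@ ': True when at least one hunk header is present."""
    return _HUNK.search(diff_text) is not None


sample = "--- a/x.rs\n+++ b/x.rs\n@@ -1,2 +1,3 @@\n fn f() {}\n+fn g() {}\n"
print(diff_has_hunks(sample))  # True
print(diff_has_hunks(""))      # False
```

A wrapper should stop with a non-zero status when this returns False, since running the mutation gate on an empty diff reports success without testing anything.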
Sharp edges:
- Check the exit status directly. Don't pipe the run through tail/grep — that masks the exit code.
- Scratch space and memory. cargo-mutants copies the source tree into a scratch dir under TMPDIR/MUTANTS_TMP (which must be outside the repo). For a small in-diff mutant set, the default tmpfs /tmp is fine — and faster. For large sets (a full-crate campaign), some mutants are allocation bombs (e.g. a constant-return on a parser position helper spins a collect-loop) that can OOM the host before the test timeout fires: put TMPDIR on real disk and run inside a memory-capped cgroup, e.g.
  mkdir -p ~/.cache/musefs-mutants-tmp
  TMPDIR="$HOME/.cache/musefs-mutants-tmp" systemd-run --user --scope --collect \
    -p MemoryMax=10G -p MemorySwapMax=0 \
    cargo mutants --in-diff mutants.diff -j2 --exclude 'musefs-latencyfs/**' --output /tmp/mutants-out/in-diff
  scripts/mutants.sh also supports sharding (MUTANTS_SHARD=i/n, used by CI to split the long musefs-format leg), though a sharded local workflow hasn't been built out.
- Known-unkillable mutant classes get a documented exclude_re in .cargo/mutants.toml, not test contortions. Note that cargo-mutants mutates const initializer expressions too — a constant is not a hiding place for arithmetic the gate flags.
- exclude_re entries are guarded against drift. A few exclusions must pin a specific file:line:col: (the operator+function alone isn't unique in the function); those coordinates rot silently when cargo fmt shifts the code, and a stale anchor can re-point onto a killable mutant — a silent false pass. scripts/check_mutant_anchors.py prevents that: it lists the full unfiltered mutant set (cargo mutants --no-config --list --json) and re-validates every exclude_re entry. It runs in the per-PR in-diff job (.github/workflows/mutants.yml) and its unit tests run in CI's python-musefs job. Run it locally with:
  cargo mutants --no-config --list --json > /tmp/mutants-list.json
  python3 scripts/check_mutant_anchors.py --mutants-json /tmp/mutants-list.json
  Each entry carries a machine-checked # guard: comment on the line directly above it:
  - file:line:col anchors — # guard: op="<" fn="probe_file" rows=3. The guard asserts the matched mutants all share that operator and function, occupy one site, and number exactly rows (use fn="" for a const-level site with no enclosing function). A narrowing entry (one that embeds a replacement to leave same-site siblings killable) sets rows to that subset's size.
  - description anchors — # guard: count=N (default 1) asserts the entry matches mutants spanning exactly N distinct sites; this is what catches a newly-added killable sibling silently joining the match set. A bare single-site description entry needs no tag.
  When the guard fails: a found none message means a line:col anchor drifted — re-anchor it to the current coordinates from the listing and re-confirm the mutant there is still genuinely equivalent (a reformat can change surrounding logic, not just line numbers). A count/rows mismatch means a sibling appeared or disappeared — investigate before bumping the number.
  Pure cargo fmt/line-shift drift can often be repaired automatically with python3 scripts/check_mutant_anchors.py --fix, which re-points an anchor to its current coordinates by operator+function. It only does so when the mapping is unambiguous — every same-operator site in the function is anchored, so the positional match is exact. An anchor that pins one of several same-operator sites (the usual reason it is a file:line:col anchor rather than a description) cannot be derived from the tag alone, so --fix leaves it for manual re-anchoring and reports can't auto-derive the coordinate; it also declines when a site was added or removed. Always eyeball the resulting diff before committing.
  Every new file:line:col exclusion needs a # guard: tag (the guard rejects an untagged one), and exclude_re patterns must stay within the Rust-regex/Python-re shared subset the guard allows (\. \d + | ^ ( ) *, no inline (?...) groups).
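The two guard forms reduce to a handful of set checks. The sketch below restates them in Python under stated assumptions: Mutant is a simplified, hypothetical record (the real JSON listing from cargo-mutants has a richer shape), and apart from found none, the problem strings are invented for illustration. It is not the logic of scripts/check_mutant_anchors.py.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mutant:
    """Hypothetical, simplified view of one listed mutant."""
    op: str
    fn: str
    line: int
    col: int


def check_anchor(matched: List[Mutant], op: str, fn: str, rows: int) -> List[str]:
    """file:line:col form: same operator and function, one site, exactly `rows` mutants."""
    if not matched:
        return ["found none"]
    problems = []
    if any(m.op != op or m.fn != fn for m in matched):
        problems.append("operator/function mismatch")
    if len({(m.line, m.col) for m in matched}) != 1:
        problems.append("more than one site matched")
    if len(matched) != rows:
        problems.append(f"rows mismatch: expected {rows}, got {len(matched)}")
    return problems


def check_description(matched: List[Mutant], count: int = 1) -> List[str]:
    """description form: matched mutants span exactly `count` distinct sites."""
    sites = {(m.line, m.col) for m in matched}
    if len(sites) != count:
        return [f"count mismatch: expected {count}, got {len(sites)}"]
    return []
```

An empty problem list means the exclusion still points where its guard tag says it should; a new sibling mutant at another site shows up as a count mismatch rather than passing silently.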
Performance regression gating
cargo test -p musefs-core --features metrics includes
tests/perf_counters.rs: golden assertions on deterministic work counters
(preads, pread_bytes, scan_bytes_read, art/binary-tag chunks) for the
read/serve and ingest paths, plus a tree.rs unit test pinning the refresh
rebuild count as size-invariant. These are a hard gate — a legitimate change to
read/ingest/refresh work must update the golden numbers in the same PR. They run
on every non-doc PR via CI's check job. Constant-factor (wall-clock) changes
are surfaced separately by the warn-only perf-ab job (below).
The A/B benchmark runs only when musefs-core/src/** or musefs-format/src/**
change. The perf-bench matrix job benches the base and PR commits in parallel
on separate runners (one ref each), then the perf-ab job downloads both
exported baselines and posts a critcmp delta as a sticky PR comment. It is
warn-only and not a required check — GH runner noise (now including
cross-runner variance) makes wall-clock unfit for hard gating. Reproduce locally
on one machine with scripts/perf-ab.sh <base-sha> out.md.
Concurrency + sanitizers
Concurrent-reader coverage exists at two levels:
cargo test -p musefs-core --test concurrent_reads # core: HeaderCache + WAL reads (default suite)
cargo test -p musefs-fuse --test concurrent_reads -- --ignored # mount: DbPool::PerThread (needs /dev/fuse)
CI runs the core test under AddressSanitizer as a required gate (asan job)
and both tests under ThreadSanitizer as a non-required best-effort signal
(tsan job, continue-on-error). TSan cannot instrument the system C libraries
(libfuse, libsqlite3), so it is a signal, not a gate. ASan is ABI-compatible with
an uninstrumented std, but TSan is not — so the TSan command needs -Zbuild-std
(and the rust-src component) to rebuild std with the sanitizer. Reproduce
locally with:
rustup toolchain install nightly
rustup component add rust-src --toolchain nightly # for TSan's -Zbuild-std
RUSTFLAGS="-Zsanitizer=address" ASAN_OPTIONS="detect_leaks=0" \
cargo +nightly test -p musefs-core --test concurrent_reads --target x86_64-unknown-linux-gnu
RUSTFLAGS="-Zsanitizer=thread" TSAN_OPTIONS="halt_on_error=0" \
cargo +nightly test -p musefs-core -Zbuild-std --test concurrent_reads --target x86_64-unknown-linux-gnu
Coverage
cargo install cargo-llvm-cov
cargo llvm-cov --workspace --exclude musefs-fuse --exclude musefs-latencyfs --open
cargo llvm-cov --workspace --exclude musefs-fuse --exclude musefs-latencyfs --lcov --output-path lcov.info
musefs-fuse and musefs-latencyfs are excluded because these FUSE crates'
tests need a real mount; their behavior is covered by the separate e2e CI
job rather than llvm-cov. The CI e2e job also runs the binary-level
cargo test -p musefs -- --ignored and
cargo test -p musefs-latencyfs -- --ignored suites so they cannot silently
rot (they require /dev/fuse + fusermount3). CI (coverage.yml) runs this on every push/PR and
uploads to Codecov (CODECOV_TOKEN repo secret).
Conventions & adding a format
Code conventions
- Errors. Each crate has its own error.rs with a thiserror enum; musefs-core wraps lower layers in CoreError; the CLI is the only anyhow consumer. Internal error paths never discard diagnostics: no Result<_, ()>, no .map_err(|_| …) that drops a source — each variant carries its source (#[from]) or a static reason naming the broken invariant.
- Integer conversions. The four clippy cast lints are deny-via-CI. Widenings use From; u64 -> usize only via the sanctioned usize_from helpers (musefs_db::convert, re-exported by core; musefs-format and musefs-latencyfs carry crate-local siblings — the workspace is declared 64-bit-only); genuine narrowings use try_from (? for input-dependent values, .expect for structurally bounded ones, .unwrap in tests); deliberate bit-truncation keeps as under a reasoned #[expect]. Non-negative DB row fields are unsigned; rusqlite's checked conversions (feature fallible_uint) validate at the row boundary.
- Lint policy. clippy::pedantic minus a few intentional/noisy groups, defined in the root Cargo.toml under [workspace.lints]. The hook and CI deny all warnings.
- Unsafe code. unsafe_code = "deny" is set for the workspace members in the root Cargo.toml ([workspace.lints.rust]); the standalone fuzz/ crate is outside the workspace and is not covered. A genuinely-necessary unsafe is opted in per-site with #[expect(unsafe_code, reason = "...")] — never a bare unsafe block and never by relaxing the workspace lint, so every unsafe is greppable and review-visible. Prefer a safe crate (e.g. rustix for syscalls) over hand-rolled FFI.
- Layering. Keep musefs-fuse, musefs-cli, and the musefs binary thin; cross-cutting logic belongs in musefs-core (see the crate layout).
- Hidden API consumers. benches/ directories and each crate's tests/ are compiled only by --all-targets: after an API change, compile-check with cargo clippy --all-targets, not cargo build.
Adding a format
- Implement probe + synthesize_layout in musefs-format (mirror an existing module — flac.rs, mp3.rs, mp4.rs, ogg/, wav.rs), returning a RegionLayout.
- Add the variant to musefs-db's Format enum, then wire it into the match track.format arms in reader::HeaderCache::resolve (musefs-core/src/reader.rs) and into scan.rs (extension list, probe dispatch).
- Extend the test surface: a fuzz_check::fixtures::<fmt>() minimal file, a fuzz/fuzz_targets/<fmt>.rs target with a seed in generate_seeds, a musefs-format/tests/proptest_<fmt>.rs, and a manifest row in musefs-core/tests/interop_emit.rs.
- Write docs/<FMT>.md (follow the shape of the existing five).
Python plugins
Python plugins (contrib)
The four packages share one drift-guarded contract; see the contrib ecosystem for the layout and the integration pages for plugin-specific setup.
# python-musefs: self-contained
cd contrib/python-musefs && python -m pytest && ruff check . && ruff format --check .
# beets: install the local python-musefs first so the suite tests the working
# tree, not the PyPI release (see the beets integration page for the venv flow)
cd contrib/beets && pip install -e ../python-musefs && pip install -e ".[test]" && python -m pytest tests
# picard: no install needed (vendored + pythonpath=".")
cd contrib/picard && python -m pytest tests
# lidarr: install the local python-musefs first so the suite tests the working
# tree, not the PyPI release (see the lidarr integration page for the env flow)
cd contrib/lidarr && pip install -e ../python-musefs && pip install -e ".[test]" && python -m pytest tests
Gotchas that have bitten before:
- On PEP 668 "externally managed" systems, bare pip install fails — use a venv for the beets suite.
- The real-Picard tests importorskip Picard and Qt: without an importable Picard (e.g. the system package on PYTHONPATH), they silently skip. When touching the Picard plugin, make sure they actually ran.
- The Lidarr integration is gated by two automated tiers, both deterministic and network-free (Lidarr's metadata server is mocked too):
  - PR check — .github/workflows/lidarr-smoke.yml (scripts/lidarr-smoke.sh): a fast smoke that proves the Custom Script exec path on a real Lidarr (its Test event) and runs the content leg (musefs-lidarr-sync tag-writes, musefs-lidarr-import symlink, served-mount tags, unchanged bytes) against a local mock Lidarr API. Runs on PRs touching the Lidarr surface.
  - Release gate — .github/workflows/lidarr-e2e.yml (scripts/lidarr-e2e/run-e2e.sh): the full real-instance e2e. A real Lidarr, driven by local metadata/indexer/qBittorrent mocks, performs a genuine download-client import of a real CC0 album as a NewDownload, firing OnReleaseImport, which execs the real musefs scripts; the served mount is then asserted to carry Lidarr-supplied metadata the backing file lacked, bytes unchanged. This gates the Python py-v* publish and closes what used to be the manual download-client gap. The vendored CC0 fixture is scripts/lidarr-e2e/fixtures/.
- musefs_common/schema.py is generated from musefs-db/src/schema.rs. After a schema change: MUSEFS_REGEN_SCHEMA_PY=1 cargo test -p musefs-db schema_py, then re-vendor Picard's copy with python contrib/python-musefs/vendor_to_picard.py. Drift is enforced by a musefs-db unit test and the Picard vendor-sync test.
- MAX_ART_BYTES in contrib/python-musefs/src/musefs_common/constants.py is hand-mirrored from musefs-core/src/scan.rs — update both sides together.
Releasing
Releasing the Python packages
The contrib/ Python packages (python-musefs, beets-musefs,
lidarr-musefs, and the unpublished musefs-picard) share a single version,
decoupled from the Rust crates and released on a py-v* tag. musefs-picard
tracks the version but is not uploaded to PyPI (Picard has its own plugin
registry; the shared library is vendored into it).
One-time setup (before the first release). Trusted Publishing fails until
the publisher exists on PyPI. For each of python-musefs, beets-musefs, and
lidarr-musefs:
- Create/reserve the project on PyPI.
- Add a GitHub Actions trusted publisher pointing at: owner/repo Sohex/musefs, workflow release-python.yml, environment pypi.
Also create a GitHub environment named pypi in the repo settings (it gates the
publish job).
Cutting a release:
- Choose the new version X.Y.Z and run python scripts/bump_python_version.py X.Y.Z. This rewrites every contrib/*/pyproject.toml version, the __version__ strings, the python-musefs>= dependency floors, and re-vendors python-musefs into the Picard plugin.
- Review git diff — it should touch only the version/floor lines and the Picard vendored _common/ copy.
- Promote the ## [Unreleased] section of contrib/CHANGELOG.md to ## [X.Y.Z] - <date>.
- Commit, then tag and push:
  git commit -am "release: python packages X.Y.Z"
  git tag py-vX.Y.Z
  git push origin HEAD --tags
- release-python.yml runs the version gate, the four Python test suites, then publishes python-musefs, beets-musefs, and lidarr-musefs to PyPI (in that order).
Releasing the Rust crates and binaries
The Rust workspace publishes to crates.io and ships prebuilt cross-compiled
binaries on a v* tag, decoupled from the Python py-v* flow. release.yml
runs one ordered graph — gate → build → smoke → publish → release-assets —
and is the source of truth; this checklist is the human side.
Pre-flight.
- Working tree clean, on the commit you intend to release.
- Confirm main is green (CI + coverage). The tag push triggers a fresh ci.yml and coverage.yml run, and the release gate job waits for ci-ok and coverage-ok to be green on the tagged commit before anything builds or publishes — a red tree blocks the release automatically.
- CARGO_REGISTRY_TOKEN is present in repo secrets.
- Smoke-build every cross target so jemalloc-sys is known to compile under zig before tagging (the release matrix builds with the jemalloc feature on):
  for t in x86_64-unknown-linux-gnu.2.17 aarch64-unknown-linux-gnu.2.17 \
      x86_64-unknown-linux-musl aarch64-unknown-linux-musl; do
    cargo zigbuild --release -p musefs --target "$t"
  done
  If a target cannot build jemalloc-sys, add --no-default-features to that matrix entry's cargo zigbuild in release.yml, rather than blocking the release. The Docker images COPY the binary this step produces (they don't run cargo), so the matching container inherits the opt-out automatically.
Version bump (do this in one commit before tagging).
- Pick the new version X.Y.Z.
- Bump the workspace version in Cargo.toml.
- Bump every internal musefs-* path-dependency constraint that pins the old version (e.g. musefs-db = { version = "X.Y.Z", path = "..." }) — a stale internal floor fails the publish.
- Promote the ## [Unreleased] section of CHANGELOG.md to ## [X.Y.Z] - <date>.
- Dry-run package each crate: cargo package -p <crate> --locked for each of musefs-db musefs-format musefs-core musefs-fuse musefs-cli musefs. This catches packaging errors but not the cross-crate index-propagation problem (it resolves siblings via path deps); that is handled in-workflow (next section).
- Commit, e.g. git commit -am "release: vX.Y.Z".
Tag and push.
git tag vX.Y.Z
git push origin HEAD --tags
The tag push starts both CI and release.yml. The gate job blocks publishing
until ci-ok + coverage-ok are green on the tagged tree (45-minute timeout,
covering the full matrix including the FreeBSD VM e2e).
What release.yml does.
- gate — verifies the tag matches the workspace version and waits for the required CI checks to pass on the tagged commit (fails closed on a failed check or timeout).
- build — cross-compiles the four target binaries.
- smoke — runs the binary smoke on each target (host + Alpine).
- publish — publishes crates in dependency order. For each crate it skips the publish if name@version already resolves from the crates.io index, then waits for that version to appear before publishing the next dependent crate (index-propagation; #163). The skip makes a whole-workflow re-run after a partial failure safe.
- release-assets — creates/updates the GitHub Release and uploads the binary tarballs + checksums (only after crates publishing succeeds).
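The gate's tag check and the resumable publish loop can be sketched as follows. This is an illustration of the control flow described above, not the workflow's implementation: the function names are hypothetical, and the three callables stand in for whatever release.yml actually uses to query the index, publish, and wait.

```python
from typing import Callable, Iterable, List


def tag_matches_version(tag: str, version: str, prefix: str = "v") -> bool:
    """Gate check: the pushed tag must be exactly prefix + workspace version."""
    return tag.startswith(prefix) and tag[len(prefix):] == version


def publish_in_order(
    crates: Iterable[str],
    version: str,
    in_index: Callable[[str, str], bool],
    publish: Callable[[str, str], None],
    wait_until_indexed: Callable[[str, str], None],
) -> List[str]:
    """Publish crates in dependency order, skipping any already in the index."""
    published = []
    for name in crates:
        if in_index(name, version):
            continue  # an earlier run already published this one
        publish(name, version)
        # Dependents resolve from the index, so wait before moving on.
        wait_until_indexed(name, version)
        published.append(name)
    return published
```

Because the loop consults the index before every publish, running it a second time after a mid-sequence failure publishes only what is still missing.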
Retry / rollback.
- crates.io is yank-only — a published version cannot be un-published.
- A partial failure (e.g. crate 3 of 6 published, then a transient error) is recovered by re-running the workflow: the publish loop skips the crates already in the index and resumes, then runs release-assets. No manual cleanup of the published crates is needed.
- GitHub asset upload is idempotent (gh release upload --clobber), so re-runs re-upload safely.
Post-release verification.
- cargo install musefs (or cargo install musefs --version X.Y.Z) from a clean machine/container.
- Download a release tarball and verify its checksum: sha256sum -c musefs-X.Y.Z-<triple>.tar.gz.sha256.
- Confirm all four target tarballs + .sha256 files are attached to the GitHub Release.
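On a machine without sha256sum, the checksum step can be approximated in Python. The sketch assumes the .sha256 file uses the usual sha256sum line format (hex digest, whitespace, file name) and that the tarball sits next to it; verify_sha256_file is a hypothetical helper, not part of the release tooling.

```python
import hashlib
from pathlib import Path


def verify_sha256_file(checksum_file: Path) -> bool:
    """Check every '<hex digest>  <name>' line against the file beside it."""
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        h = hashlib.sha256()
        with open(checksum_file.parent / name, "rb") as f:
            # Hash in 1 MiB blocks so large tarballs are not loaded whole.
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        if h.hexdigest() != digest.lower():
            return False
    return True
```

It returns True only when every listed file hashes to the recorded digest, which matches what a passing sha256sum -c run reports.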
Lidarr gate at a v1.0.0 milestone. The Lidarr real-instance e2e
(lidarr-e2e.yml) gates the Python py-v* release, not this Rust flow. When a
v1.0.0 milestone bundles both, ensure the Python release (and therefore its
Lidarr e2e gate) is also run.
PRs & commits
- Conventional-style subjects (fix(format): …, docs: …, ci: …), scoped and imperative.
- main is protected by required status checks: the ci-ok and coverage-ok aggregator jobs must pass. CI also runs the fuzz smoke build, the in-diff mutation gate, and a security audit on PRs. Docs-only changes skip the expensive jobs at the job level — the aggregators still report.
- Benchmark results, when a change warrants them, are recorded in Benchmarks.
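For anyone who wants to lint subjects locally, a pattern along these lines captures the shape shown in the first bullet. It is an assumption about what "conventional-style" covers here: the project does not publish a regex, the set of accepted types is left open, and parse_subject is a hypothetical helper.

```python
import re
from typing import Optional

# type, optional (scope), then ": " and a non-empty summary
SUBJECT = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?: (?P<summary>\S.*)$"
)


def parse_subject(subject: str) -> Optional[dict]:
    """Return the type/scope/summary parts, or None if the shape does not match."""
    m = SUBJECT.match(subject)
    return m.groupdict() if m else None


print(parse_subject("fix(format): reject truncated header"))
print(parse_subject("Fixed some stuff"))  # None
```

The pattern checks shape only; whether the summary is imperative is left to review.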
Before you push
The pre-commit hook already gates fmt, clippy, the workspace tests, and the Python/shell/YAML lints on every commit. What it does not run — check the ones your change triggers:
- Logic changes → the in-diff mutation gate. It is CI parity, not optional polish.
- Format-layer API changes → cargo +nightly fuzz build; the fuzz/ crate is outside the workspace, so nothing else compiles it (coverage-guided fuzzing).
- musefs-db schema changes → regenerate and re-vendor the Python schema mirror (Python plugins).
- Picard plugin changes → make sure the real-Picard tests actually ran rather than silently skipped (gotchas).
- FUSE/mount-surface changes → run the --ignored e2e suite locally (Build & test); the FreeBSD CI leg only runs on PRs that touch that surface.
Benchmarks
Every optimization pass re-measured apples-to-apples on one box as a
PR-isolated before/after pair, plus a cumulative 16caba4→main summary. This
file is performance only — correctness gates (byte-identical proptests, FUSE
e2e, in-diff mutation) live in CI and the contributor guide, not
here.
Read it in three layers:
- Results at a glance — the cumulative per-subsystem delta and a one-line headline per pass.
- Methodology — machine, before/after definition, the overlay rule, run conventions, storage placement. Written once; every detail section assumes it.
- Per-pass detail — one section per pass: what changed, the before/after table, the reproduce command, and the "why" where it matters.
Results at a glance
Cumulative — 16caba4 → current main (e02223e)
Composed from the per-pass isolated deltas below, anchored to current-main
absolutes. Non-isolating: a same-harness run at both ends is infeasible (API
drift means neither the 16caba4-era harness nor the main harness compiles at
the other commit), so these compose the chain of passes that touched each
subsystem rather than a single end-to-end measurement. See
Cumulative detail for the absolutes and the per-pass
composition.
| Subsystem | Headline metric | 16caba4-era | current main | Δ | Dominant pass |
|---|---|---|---|---|---|
| Ingest | fsync count (durable) | 403 | 0 | eliminated | SP1 |
| | cold scan, ci flac | 32 206 ms | 47 ms | ~685× | SP1 |
| Refresh | refresh-1 @ 20 000 tracks | 173 ms | 1 ms | ~173× | #69 |
| Serve | sequential_read/flac | 929 µs | 569 µs | −38.8% | SP3 + PR3 |
| | cold_first_read/ogg | 14.96 ms | 1.51 ms | −89.9% | SP4 |
| | concurrent m16+walker | 8.20 ms | 4.15 ms | −49.4% | SP3 + PR3 |
Per-pass headlines
Each headline is the pass's single largest statistically-significant delta on its deployment-representative tier.
| Pass | Commit | Headline (this box) |
|---|---|---|
| SP1 — ingestion scalability | ccbbfaa | durable cold scan ~1150–3600× faster; fsync storm 403→0 |
| SP2 — incremental tree refresh | ed5f380 | 5 000-track refresh-1 1.4× (32→23 ms) |
| SP3 — read/serve residuals | e8d56bd | sequential_read −8 to −13% (flac/mp3/m4a/m4a-last) |
| SP4 — storage-aware Ogg serving | a62453b | ogg cold-read −88%, seek −94% |
| #69 — refresh O(changed) | e7ae912 | refresh-1 @ 20 000 ~170× (173→1 ms) |
| #114 — root fan-out lookup | 0881b31 | root fan-out @ 20 000 ~5× (5→1 ms) |
| PR2 — scan pair (#67/#68) | 2d4faf3 | −128 B/file scan I/O (flac/ogg/wav); wall within noise |
| PR3 — serve-path copies (#70) | 32be8f0 | sequential_read −7 to −11% (m4a-last/ogg/wav); concurrent −19% |
| #136 — HeaderCache quick_cache | 2e6674e | within noise (marginal m4a/ogg sequential) |
| #112 — StructureOnly passthrough | faec017 | passthrough dd 3.36× (2.5→8.4 GB/s) |
One direction inverted vs the historical file: SP1 §4 (compute-isolated, on RAM) is now faster after the change, not slower. The old file recorded SP1 as ~1.9× slower on RAM-backed tmpfs (the "honest cost" of the pipeline); on this 8-core box the parallel pipeline wins even on RAM (~1.4×), at higher peak RSS. See SP1 §4.
CI regression gating
BENCHMARKS.md records hand-run absolute numbers; CI guards against regressions
in three lanes:
- Counter gate (every non-doc PR, hard). perf_counters.rs + tree.rs golden work-counter assertions under --features metrics. Catches algorithmic regressions (extra copy, whole-file slurp, O(N) tree rebuild).
- A/B wall-clock (warn-only, core src PRs). The perf-bench matrix job benches the base and PR commits in parallel on separate runners; the perf-ab job then diffs the two exported baselines and posts a critcmp delta as a PR comment. Never blocks.
- Release record. The benchmarks job runs the full bench suite at the ci tier on a tag and uploads the numbers as an artifact for curation here.
The fsync-storm (403→0) signal needs a real FUSE mount and lives only in the
release lane / the #[ignore] bench_scan_under_latency, not the per-PR gate.
The release artifact is named benchmark-snapshot-<tag>; download it from the
tag's workflow run. The job runs on a GitHub-hosted ubuntu-latest runner,
not the dedicated box the rest of this file uses, so its wall-times are
runner-relative and are not folded into the per-pass tables — only the
portable signals (bytes_read, pread/fsync counts, refresh flatness) are
cross-comparable. Each release's snapshot is recorded verbatim under
Release CI snapshots.
Release CI snapshots
Per-tag records from the benchmarks release job
(CI regression gating §3), run on a GitHub-hosted
ubuntu-latest runner at the ci tier. This is not the dedicated box the
per-pass tables use, so the wall-times here are
runner-relative and are a point-in-time record per release — not comparable to
those sections. The portable signals (scan_bytes_read, pread/fsync
counts, refresh flatness) are comparable, and are the no-regression check.
v1.1.0 — f865afc, single run
No regression vs the curated tables: scan_bytes_read is unchanged from
PR2 (flac/ogg/wav = 845 000 / 847 400 / 828 000 B — the
−128 B/file ID3v1 gating still holds; mp3 847 200 B unchanged; m4a uses the
seek-reader, 0 B), and single-track refresh stays flat with library size.
read_throughput (Criterion, median estimate, µs):
| bench | flac | mp3 | m4a | m4a-last | ogg | wav |
|---|---|---|---|---|---|---|
| sequential_read | 416 | 418 | 417 | 419 | 524† | 422 |
| cold_first_read | 798 | 778 | 805 | 804 | 911 | 792 |
| seek_read | 368 | 353 | 372 | 370 | 582 | 365 |
concurrent_read_walk/m8_plus_walker: 931 µs.
† sequential_read/ogg collected only 10k iterations with 19% outliers
(Criterion low-sample warning) — treat as noisy.
bench_ingest — ci tier (200 tracks × 4 KiB), runner tmpfs:
| format | scan (ms) | revalidate (ms) | scan_bytes_read (B) | RSS (KiB) |
|---|---|---|---|---|
| flac | 31 | 1 | 845 000 | 7100 |
| mp3 | 82 | 1 | 847 200 | 7164 |
| m4a | 86 | 1 | 0 | 7180 |
| m4a-last | 87 | 1 | 0 | 7184 |
| ogg | 83 | 1 | 847 400 | 7184 |
| wav | 83 | 1 | 828 000 | 7184 |
bench_refresh — ci tier, single-track re-tag:
| library size | refresh-1 (ms) | root-fanout-1 (ms) |
|---|---|---|
| 100 | 0 | 0 |
| 1000 | 1 | 1 |
| 5000 | 12 | 3 |
| 20000 | 7 | 9 |
refresh-1 vs refresh-N (200-track ci, same instance): refresh-1 0 ms, refresh-N (100 touched) 6 ms. The 5000 > 20000 inversion is single-run noise on the shared runner; the dedicated-box #69 sweep is the clean flat signal.
Methodology
Machine
| Component | Details |
|---|---|
| CPU | 8 cores |
| RAM | 32 GB (31 GiB) |
| Durable storage (/data) | btrfs, 2-device span (sda3+sdb3), rotational; Data: single, Metadata: RAID1; zstd:1. No SSD on this box. |
| RAM storage (/dev/shm) | tmpfs |
| Toolchain | rustc 1.96.0 · release builds |
| Kernel | Linux 7.0 (FUSE passthrough requires ≥6.9 + CAP_SYS_ADMIN) |
Before / after definition
History is squash-merged (linear), so each pass is one commit:
- after = the pass's own squash-merge commit.
- before = its parent, <after>^: PR-isolated, not current main. This preserves attribution (each delta is exactly what that PR changed) and avoids harness drift from later passes.
The overlay rule
Two passes (SP2, SP4) report a bench that did not yet exist at their
before commit. For those, the after-commit's harness file is checked out onto
the before checkout (git checkout <after> -- <bench_file>) so the old code is
measured with the new harness. Overlay use is called out in each affected
section.
Run conventions
- bench_ingest / bench_refresh (ignored tests, cargo test --release … -- --ignored): 3 runs, median reported (spread noted where it matters). bench_ingest needs --features metrics.
- read_throughput (Criterion bench): Criterion's own sampling; before side saved with --save-baseline, after side compared with --baseline. Reported Δ is Criterion's change estimate.
- Wall times on /data are box-relative (rotational disk); where a portable signal exists (fsync count, bytes_read, pread count) it is the primary number.
Storage placement
- Durable rows run on /data (rotational btrfs). bench_ingest honors MUSEFS_BENCH_DIR.
- RAM rows run on /dev/shm (tmpfs). bench_ingest honors MUSEFS_BENCH_DIR=/dev/shm/…; bench_refresh and read_throughput ignore it and follow TMPDIR=/dev/shm.
Per-pass detail
SP1 — Ingestion scalability
ccbbfaa^ → ccbbfaa. bench_ingest, --features metrics. No overlay.
What changed: whole-file fs::read slurp + per-file commits at
synchronous=FULL → bounded probing reads + parallel-probe/single-writer
pipeline + per-batch transactions at synchronous=NORMAL (WAL retained).
1. Durable small files — the fsync/batching win
ci tier (200 tracks × 4 KiB, no embedded art), corpus + DB on /data. Not
compute-bound — the before path is dominated by per-file fsync latency.
| format | before scan (ms) | after scan (ms) | speedup |
|---|---|---|---|
| flac | 32 206 | 21 | 1534× |
| mp3 | 16 124 | 14 | 1152× |
| m4a | 30 089 | 19 | 1584× |
| m4a-last | 39 592 | 11 | 3599× |
| ogg | 16 153 | 14 | 1154× |
| wav | 15 574 | 12 | 1298× |
2. Durable large files — bounded reads + batching
bandwidth tier (1000 tracks × 30 MiB FLAC + art ≈ 30 GiB), on /data, 1 run.
| metric | before (slurp) | after (bounded) | Δ |
|---|---|---|---|
| scan wall (ms) | 378 041 | 15 228 | 24.8× faster |
| revalidate (ms) | 243 | 14 | 17.4× |
| peak RSS (KiB) | 98 636 | 132 436 | 0.74× (more) |
The after path reads only a ~1 MiB metadata window per file instead of slurping each 30 MiB file in full.
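The bounded-read idea can be sketched with the standard library alone. This is a minimal illustration, not the project's scanner: read_probe_window and PROBE_WINDOW are hypothetical names, and the 1 MiB cap is simply the approximate figure quoted above.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical cap; the text above only says "~1 MiB".
const PROBE_WINDOW: u64 = 1 << 20;

/// Read at most PROBE_WINDOW bytes from the front of `path`.
/// The "before" behaviour would be `std::fs::read(path)`, which pulls in
/// the whole file regardless of how much of it holds metadata.
fn read_probe_window(path: &Path) -> io::Result<Vec<u8>> {
    let file = File::open(path)?;
    let mut buf = Vec::new();
    // `take` caps the reader, so a 30 MiB file costs the same I/O as a 1 MiB one.
    file.take(PROBE_WINDOW).read_to_end(&mut buf)?;
    Ok(buf)
}
```

Calling read_probe_window on a 30 MiB file returns a 1 MiB buffer, while a file smaller than the window is returned whole, which is why bounded reads do not help the 38 KiB corpus in §4.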
3. fsync count — the mechanism
ci tier (200 FLAC) scanned through the passthrough latency-FS (ssd profile),
which counts fsyncs at the FUSE layer. Wall is box-relative (rotational /data);
the fsync count is the portable signal.
| config | fsyncs | scan wall (ms, box-relative) |
|---|---|---|
| before (synchronous=FULL, per-file commits) | 403 | 79 |
| after (synchronous=NORMAL, batched commits) | 0 | 21 |
The 403→0 collapse is the root cause of §1's durable speedups.
4. Compute-isolated (RAM) — the trade, now a win on this box
large-compute tier (100k tracks × ~38 KiB FLAC) on /dev/shm (RAM), where
fsync is free — so the §1/§3 batching win is neutralized and only raw compute
remains. bytes_read ≈ 3.92 GiB both sides (the 38 KiB files are below the 1 MiB
window, so bounded reads don't help).
| config | before scan (ms) | after scan (ms) | revalidate before→after (ms) | peak RSS before→after (KiB) |
|---|---|---|---|---|
| default jobs | 31 241 | 22 295 | 2239 → 1278 | 27 904 → 96 084 |
| --jobs 1 | 31 111 | 23 565 | 2255 → 1283 | 28 024 → 92 200 |
Finding — direction inverted vs the historical file. The old file (6-core EPYC) recorded SP1 as ~1.9× slower on RAM — the deliberate "honest cost" of the pipeline where there is no fsync win to amortize. On this 8-core box the parallel pipeline is ~1.4× faster even on RAM (the extra cores outweigh the per-file coordination), at the cost of ~3.4× peak RSS (96 MB vs 28 MB). The trade has shifted from "small RAM loss" to "RAM win for more memory" on wider hardware.
# durable §1/§2: MUSEFS_BENCH_DIR on /data ; RAM §4: MUSEFS_BENCH_DIR on /dev/shm
MUSEFS_BENCH_TIER=ci MUSEFS_BENCH_DIR=/data/bench \
cargo test --release -p musefs-core --features metrics --test bench_ingest \
-- --ignored --nocapture bench_cold_scan_and_revalidate
# §3 fsync count:
MUSEFS_BENCH_LATENCY_PROFILE=ssd MUSEFS_BENCH_TIER=ci MUSEFS_BENCH_FORMAT_MIX=flac \
cargo test --release -p musefs-core --features metrics --test bench_ingest \
bench_scan_under_latency -- --ignored --nocapture
SP2 — Incremental tree refresh
ed5f380^ → ed5f380. bench_refresh, RAM (TMPDIR=/dev/shm).
Overlay: the bench_refresh_one_across_library_sizes sweep didn't exist at
ed5f380^, so the after-commit harness is overlaid on the before checkout.
What changed: replace the O(N) VirtualTree::build_with full reconstruction
with apply_changes (in-place im-backed tree mutation) — only nodes whose id
appears in the changed/added/removed sets are touched.
ci tier, FLAC, single-track re-tag, 3 runs (median):
| library size | before (ms) | after (ms) | speedup |
|---|---|---|---|
| 100 | 0 | 0 | n/a (sub-granularity) |
| 1000 | 5 | 6 | 0.83× (noise tier) |
| 5000 | 32 | 23 | 1.39× |
Why (Stage A → Stage B): at Stage A the rebuild already rendered
incrementally (only the changed track re-rendered, O(changed)), but the
subsequent VirtualTree::build_with reconstructed the whole tree from scratch
(O(N)) — the remaining linear cost. Stage B's apply_changes removes that full
reconstruction; the residual slope (still ~23 ms at 5000) is the lighter O(N)
render-key scan + HashMap rebuild that feeds apply_changes, not a full tree
rebuild. The speedup grows with library size because diff cost is proportional to
changes, not total entries. (Corpus is single-album, so build_with time is
slightly optimistic vs a real multi-album library.)
cargo test -p musefs-core --release --test bench_refresh \
bench_refresh_one_across_library_sizes -- --ignored --nocapture
SP3 — Read/serve residuals
e8d56bd^ → e8d56bd. Criterion read_throughput, RAM. No overlay.
What changed: (1) read_segments writes each BackingAudio run directly into
the output buffer's reserved tail (no throwaway vec![0u8; n] + copy); (2)
handles: Mutex<HashMap> → lock-free sharded_slab::Slab; (3) size_cache: Mutex<HashMap> → dashmap::DashMap.
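Change (1) can be illustrated in isolation. The sketch below contrasts a throwaway-buffer append with a write into the output vector's resized tail; both function names are hypothetical and the generic reader stands in for the backing file, so treat it as an illustration of the pattern rather than the project's read_segments.

```rust
use std::io::{self, Read};

/// "Before" shape: allocate a temporary, fill it, then copy it over.
fn append_run_via_temp<R: Read>(src: &mut R, out: &mut Vec<u8>, n: usize) -> io::Result<()> {
    let mut tmp = vec![0u8; n];
    src.read_exact(&mut tmp)?;
    out.extend_from_slice(&tmp);
    Ok(())
}

/// "After" shape: grow the output once and read straight into its tail.
fn append_run<R: Read>(src: &mut R, out: &mut Vec<u8>, n: usize) -> io::Result<()> {
    let old_len = out.len();
    out.resize(old_len + n, 0);
    if let Err(e) = src.read_exact(&mut out[old_len..]) {
        // Roll back so a failed read never leaves padding in the output.
        out.truncate(old_len);
        return Err(e);
    }
    Ok(())
}
```

Both variants produce the same bytes; the second saves one allocation and one copy per spliced run.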
sequential_read — per-format (4 MiB files, 128 KiB reads)
| format | before (µs) | after (µs) | time Δ | thrpt Δ |
|---|---|---|---|---|
| flac | 929.1 | 839.6 | −7.9% | +8.6% |
| mp3 | 940.2 | 824.8 | −13.1% | +15.1% |
| m4a | 939.8 | 824.2 | −10.8% | +12.2% |
| m4a-last | 938.0 | 842.6 | −10.3% | +11.4% |
| ogg | 966.8 | 1049.4 | +6.3% | −5.9% |
| wav | 935.4 | 912.3 | −2.5% | +2.5% |
The metadata-light formats improve 8–13% from dropping the per-splice alloc+copy. ogg +6.3% is a low-iteration sampling anomaly (Criterion warned "Unable to complete 100 samples in 5.0s" — only 5050 iterations vs 10k for other formats).
concurrent_read_walk/m16_plus_walker
16 reader threads + one metadata walker sharing one Arc<Musefs> (includes thread
spawn/join):
| | before (ms) | after (ms) | Δ |
|---|---|---|---|
| m16_plus_walker | 8.20 | 9.48 | +15.7% |
This high-variance burst metric regressed on this run — attributable to thread spawn/join overhead in the contention path rather than the read path itself; it is not a sequential-read regression. (The old file recorded this bench as parity/improved; it swings run-to-run.)
cargo bench -p musefs-core --bench read_throughput -- sequential_read concurrent_read_walk
SP4 — Storage-aware Ogg serving
a62453b^ → a62453b. Criterion read_throughput + latency-injected read.
Overlay: cold_first_read/seek_read were added by SP4, so the after-commit
bench is overlaid on the before checkout.
What changed: replace the eager whole-region Ogg page index with a stateless
per-request backwards-scan: find_page_start locates the containing page from a
~65 KB window (CRC-validated entry guard), serve_ogg_window patches each page
header algebraically (crc_shift_zeros, no payload I/O), and a one-entry
last_page memo short-circuits the scan + CRC guard when the next request lands
inside the already-located page.
sequential_read — warm repeat-read (no page-index amortization to win)
| format | before (µs) | after (µs) | Δ |
|---|---|---|---|
| flac | 856.2 | 880.5 | +2.8% |
| mp3 | 847.7 | 894.5 | +5.5% |
| m4a | 862.5 | 816.9 | −5.3% |
| m4a-last | 872.7 | 831.6 | −4.7% |
| ogg | 1037.9 | 1048.2 | +1.0% |
| wav | 892.6 | 840.8 | −5.8% |
cold_first_read / seek_read — the Ogg win
| bench | format | before | after | Δ |
|---|---|---|---|---|
| cold_first_read | ogg | 14.956 ms | 1.799 ms | −88.0% |
| seek_read | ogg | 13.541 ms | 827 µs | −93.9% |
Non-ogg cold/seek stay within ±7% (no page index involved). The wins come from
never building the whole-file index up front — the old code reads the entire
prefix to serve even one chunk near EOF; SP4 scans ~65 KB backward, then the memo
carries the validated page forward. sequential_read/ogg is flat (+1.0%) because
it reads the full file linearly regardless — the win is cold-start and seek.
Latency-injected reads (bench_read_under_latency, nfs-hdd) — AFTER only
This bench was introduced by SP4; no before baseline exists.
| label | format | tier | storage | wall (ms) | opens | preads |
|---|---|---|---|---|---|---|
| read_whole_cold | ogg | ci | nfs-hdd | 28 | 1 | 0 |
| read_seek_cold | ogg | ci | nfs-hdd | 28 | 1 | 0 |
preads=0: the backwards-scan reads are served from the layout's inline/generated
segments without reaching the backing file. Near-equal whole/seek wall time
indicates per-file open+resolve latency dominates under nfs-hdd; the local
cold/seek benches above are the clean signal.
Why crc_shift_zeros is a hybrid
patch_page_header_algebraic advances the CRC past a page's payload via
crc_shift_zeros. The per-step loop is O(n) and dominated linear sequential_read
on max-size 65 KB pages; a GF(2) matrix-power method is O(log n) but carries a
fixed ~32-matmul cost, so it is slower for the small pages real Opus/Vorbis
streams carry. The evolution across implementations (ogg benches):
| ogg bench | linear crc | +matrix | +matrix +memo-amortized guard (shipped) |
|---|---|---|---|
| sequential_read | 17.6 ms | 6.40 ms | 0.93 ms |
| cold_first_read | ~17 ms | 7.42 ms | 1.61 ms |
| seek_read | — | 821 µs | 829 µs |
Shipped as a hybrid: per-step loop below n=16384, matrix at/above; a differential test covers both paths + the boundary.
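The two paths and the crossover can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the shipped code: it uses the Ogg CRC parameters (polynomial 0x04C11DB7, MSB-first, initial value 0, no final XOR), the function names are hypothetical, and only the 16384 threshold comes from the text above.

```rust
const POLY: u32 = 0x04C1_1DB7;
/// Crossover quoted in the text; everything else here is a sketch.
const MATRIX_THRESHOLD: usize = 16_384;

/// Feed one byte into the MSB-first CRC register.
fn crc_byte(mut crc: u32, byte: u8) -> u32 {
    crc ^= (byte as u32) << 24;
    for _ in 0..8 {
        crc = if crc & 0x8000_0000 != 0 { (crc << 1) ^ POLY } else { crc << 1 };
    }
    crc
}

/// Plain CRC over a byte slice, used here only as a reference.
fn ogg_crc(crc: u32, data: &[u8]) -> u32 {
    data.iter().fold(crc, |c, &b| crc_byte(c, b))
}

/// Per-step path: one zero byte at a time, O(n).
fn shift_zeros_linear(crc: u32, n: usize) -> u32 {
    (0..n).fold(crc, |c, _| crc_byte(c, 0))
}

/// A GF(2) operator on the 32-bit register: column i is the image of bit i.
type Mat = [u32; 32];

fn mat_apply(m: &Mat, mut v: u32) -> u32 {
    let mut out = 0;
    let mut i = 0;
    while v != 0 {
        if v & 1 != 0 {
            out ^= m[i];
        }
        v >>= 1;
        i += 1;
    }
    out
}

fn mat_mul(a: &Mat, b: &Mat) -> Mat {
    std::array::from_fn(|i| mat_apply(a, b[i]))
}

/// Matrix path: square-and-multiply the one-zero-byte operator,
/// O(log n) squarings at a fixed cost per squaring.
fn shift_zeros_matrix(crc: u32, mut n: usize) -> u32 {
    let mut step: Mat = std::array::from_fn(|i| crc_byte(1u32 << i, 0));
    let mut v = crc;
    while n != 0 {
        if n & 1 != 0 {
            v = mat_apply(&step, v);
        }
        step = mat_mul(&step, &step);
        n >>= 1;
    }
    v
}

/// Hybrid dispatch: loop below the threshold, matrix at or above it.
fn crc_shift_zeros_sketch(crc: u32, n: usize) -> u32 {
    if n < MATRIX_THRESHOLD {
        shift_zeros_linear(crc, n)
    } else {
        shift_zeros_matrix(crc, n)
    }
}
```

Because the register update for a zero byte is linear over GF(2), both paths must agree for every n, which is the property a differential test around the boundary would check.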
cargo bench -p musefs-core --bench read_throughput -- cold_first_read seek_read sequential_read
MUSEFS_BENCH_LATENCY_PROFILE=nfs-hdd cargo test --release -p musefs-core \
--features metrics --test bench_ingest bench_read_under_latency -- --ignored --nocapture
#69 — Refresh O(changed)
e7ae912^ → e7ae912. bench_refresh, RAM. No overlay.
What changed: changelog-driven change detection (changelog_since +
render_keys_for on just the changed ids) replaces the O(N) render-key scan, and
collision-gated apply_changes dirtying stops the old parent chain from being
rebuilt unconditionally. Refresh-1 cost becomes O(changed).
Single-track refresh vs library size (3 runs, median)
A single-track re-tag moves the track out of its shared album dir — the structural worst case for a flat corpus (one artist / one album, N siblings).
| library size | before — full rebuild (ms) | after — O(changed) (ms) | factor |
|---|---|---|---|
| 100 | 0 | 0 | — |
| 1000 | 6 | 0 | ∞ (sub-ms) |
| 5000 | 33 | 0 | ∞ (sub-ms) |
| 20000 | 173 | 1 | ~170× |
The after sweep is flat: refresh-1 @ 20 000 is within 1 ms of @ 100, against a linear ~170 ms slope before.
One-vs-many (same Musefs instance, 200-track ci tier)
| label | wall (ms) |
|---|---|
| refresh-1 | 0 |
| refresh-N (100 touched) | 4 |
refresh-N scales with the touched set, not the library.
# before (apply the 4-point sweep edit first):
sed -i 's/\[100usize, 1000, 5000\]/[100usize, 1000, 5000, 20000]/' musefs-core/tests/bench_refresh.rs
cargo test -p musefs-core --release --test bench_refresh \
bench_refresh_one_across_library_sizes -- --ignored --nocapture
cargo test -p musefs-core --release --test bench_refresh \
bench_refresh_one_vs_many -- --ignored --nocapture
#114 — Rendered child lookup (root fan-out)
0881b31^ → 0881b31. bench_refresh, RAM. Overlay: the
bench_refresh_root_fanout_one_across_library_sizes bench was added by #114, so
its harness is overlaid on the before checkout.
What changed: a rendered-name child index turns the root sibling scan in
deepest_existing_ancestor into an indexed miss. The corpus uses N top-level
artist directories; the timed update retags one track to fallback Unknown/…,
exercising an absent rendered-name lookup at root.
| library size (top-level artists) | before (ms) | after (ms) |
|---|---|---|
| 100 | 0 | 0 |
| 1000 | 0 | 0 |
| 5000 | 2 | 0 |
| 20000 | 5 | 1 |
~5× at the 20 000-artist fan-out; ≤5 k is already ≤2 ms on both sides.
cargo test -p musefs-core --release --test bench_refresh \
bench_refresh_root_fanout_one_across_library_sizes -- --ignored --nocapture
PR2 — Scan pair (#67/#68)
2d4faf3^ → 2d4faf3. bench_ingest, --features metrics, RAM, 3 runs.
No overlay.
What changed: (#67) gate the 128-byte ID3v1 tail read to .mp3 files — only
MP3 consumes the frame; (#68) ingest_bulk drains the owned Unit batch by
value, moving picture payloads into the DB structs instead of cloning.
Wall time — ci tier (200 tracks × 4 KiB, no art), median of 3
| format | before (ms) | after (ms) |
|---|---|---|
| flac | 29 | 30 |
| mp3 | 21 | 23 |
| m4a | 27 | 26 |
| m4a-last | 32 | 26 |
| ogg | 22 | 24 |
| wav | 21 | 24 |
Wall time is within run-to-run noise — at ci tier (4 KiB files, no embedded art) there is no picture payload to move, so #68's win doesn't show here. It appears on art-bearing corpora (the bandwidth tier / real libraries) where the clone was O(art-size) per file.
Scan I/O — the #67 signal (scan_bytes_read)
| format | before (B) | after (B) | Δ total | Δ per file |
|---|---|---|---|---|
| flac | 870 600 | 845 000 | −25 600 | −128 B |
| mp3 | 847 200 | 847 200 | 0 | 0 (tail still read) |
| m4a | 0 | 0 | 0 | n/a (seek-reader path) |
| m4a-last | 0 | 0 | 0 | n/a |
| ogg | 873 000 | 847 400 | −25 600 | −128 B |
| wav | 853 600 | 828 000 | −25 600 | −128 B |
Non-MP3 formats drop exactly the 128-byte ID3v1 tail per file (−25 600 B over the 200-track corpus). MP3 is unchanged; M4A uses the seek-reader, not the front-anchored probe path.
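The #67 gate can be sketched as below. It assumes the gate keys on the file extension, which is one reading of "to .mp3 files" above; the function names are hypothetical. The ID3v1 layout itself (a 128-byte trailer beginning with the ASCII bytes TAG) is standard.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

const ID3V1_LEN: u64 = 128;

/// Assumption: the gate is a case-insensitive extension check.
fn wants_id3v1_tail(path: &Path) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .is_some_and(|e| e.eq_ignore_ascii_case("mp3"))
}

/// Returns the trailing ID3v1 block for MP3 files that carry one.
/// Non-MP3 paths return early without opening the file, which is where
/// the 128 bytes per file are saved.
fn read_id3v1_tail(path: &Path) -> io::Result<Option<[u8; 128]>> {
    if !wants_id3v1_tail(path) {
        return Ok(None);
    }
    let mut file = File::open(path)?;
    if file.metadata()?.len() < ID3V1_LEN {
        return Ok(None);
    }
    file.seek(SeekFrom::End(-(ID3V1_LEN as i64)))?;
    let mut tail = [0u8; 128];
    file.read_exact(&mut tail)?;
    Ok(tail.starts_with(b"TAG").then_some(tail))
}
```

Over a 200-file corpus the early return accounts for 200 × 128 = 25 600 bytes, matching the per-format totals in the table.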
MUSEFS_BENCH_TIER=ci MUSEFS_BENCH_DIR=/dev/shm/bench \
cargo test -p musefs-core --release --features metrics --test bench_ingest \
-- --ignored --nocapture bench_cold_scan_and_revalidate
PR3 — Serve-path copies (#70)
32be8f0^ → 32be8f0. Criterion read_throughput, RAM. No overlay.
What changed: four stacked serve-path copy eliminations — DB chunk readers
fill the caller's &mut [u8]; read_segments writes ArtImage/BinaryTag/raw
OggArtSlice arms into the output buffer's resized tail; Musefs::read_into
serves into a caller buffer; and the FUSE layer reuses a per-worker thread-local
scratch buffer. None touches synthesis or layout (served audio stays
byte-identical).
sequential_read
| format | before (µs) | after (µs) | Δ | verdict |
|---|---|---|---|---|
| flac | 939.8 | 924.8 | −2.1% | noise |
| mp3 | 917.2 | 884.1 | −3.1% | noise |
| m4a | 904.1 | 877.6 | −3.7% | noise |
| m4a-last | 909.8 | 860.3 | −7.4% | improved |
| ogg | 1080.4 | 963.4 | −9.1% | improved |
| wav | 925.6 | 815.7 | −11.1% | improved |
cold_first_read / seek_read / concurrent
| bench | before | after | Δ | verdict |
|---|---|---|---|---|
| cold_first_read/flac | 1.652 ms | 1.557 ms | −5.8% | improved |
| cold_first_read/mp3 | 1.590 ms | 1.678 ms | +5.5% | regressed (within 10%) |
| cold_first_read/ogg | 1.781 ms | 1.694 ms | −4.9% | improved |
| seek_read (all) | — | — | within ±2.7% | held |
| concurrent_read_walk/m16 | 9.490 ms | 7.642 ms | −19.5% | improved |
No format breaches the >10% rise gate. The concurrent burst metric improves 19% here (it is high-variance and swings run-to-run; see SP3).
cargo bench -p musefs-core --bench read_throughput -- \
sequential_read concurrent_read_walk cold_first_read seek_read
#136 — HeaderCache → quick_cache
2e6674e^ → 2e6674e. Criterion read_throughput, RAM. No overlay.
What changed: an S3-FIFO byte-weighted quick_cache replaces the hand-rolled
16-shard Mutex LRU — the serve path's last shared std lock is gone.
At a glance: within noise. No workload regresses outside noise; the only movers are marginal sequential_read improvements on the metadata-light formats.
| bench | before | after | Δ | verdict |
|---|---|---|---|---|
| sequential_read/m4a | 851.1 µs | 794.7 µs | −6.6% | improved |
| sequential_read/m4a-last | 855.2 µs | 798.3 µs | −6.7% | improved |
| sequential_read/ogg | 1.043 ms | 962.9 µs | −7.7% | improved |
| sequential_read/flac,mp3,wav | — | — | within noise | held |
| cold_first_read (all) | — | — | within noise / −3.6% m4a | held |
| seek_read (all) | — | — | within noise | held |
| concurrent_read_walk/m16 | 5.557 ms | 5.451 ms | −1.9% | held |
cargo bench -p musefs-core --bench read_throughput
#112 — StructureOnly kernel passthrough
0881b31 → faec017. Bespoke dd harness (committed:
benches/passthrough_dd.sh), sudo (passthrough
needs CAP_SYS_ADMIN).
What changed: the backing fd is registered at open (FUSE passthrough, kernel ≥6.9); the kernel serves StructureOnly reads directly from the backing inode, bypassing the daemon round-trip.
512 MiB WAV backing on /dev/shm (RAM-cached, isolates FUSE-path overhead),
dd bs=1M sequential read, fresh mount per binary, 3 runs each:
| | run 1 | run 2 | run 3 | median |
|---|---|---|---|---|
| before (daemon reads) | 2.5 GB/s | 2.5 GB/s | 2.7 GB/s | 2.5 GB/s |
| after (passthrough) | 8.4 GB/s | 8.3 GB/s | 8.9 GB/s | 8.4 GB/s |
3.36× on this RAM-cached sequential workload: the before path round-trips
every ~128 KiB chunk through the daemon (wakeup + positioned read + copy back via
/dev/fuse); the after path reads straight from the backing inode's page cache.
sudo benches/passthrough_dd.sh target/release/musefs /dev/shm/pt 512
Cumulative detail
16caba4 → current main (e02223e). Derived, non-isolating — composed from
the per-pass isolated deltas above, anchored to current-main absolutes. A
same-harness end-to-end run is infeasible: MountConfig.case_insensitive and
scan_directory_with/ScanOptions/revalidate_with don't exist at 16caba4
(so main's harnesses can't compile there), and the 16caba4-era harness omits
the now-required case_insensitive field (so it can't compile on main either).
The deltas below name the contributing passes and the dominant one; unrelated
speedups are not multiplied into a single headline.
Current-main absolutes (1 run, native harness)
Ingest — ci tier, /data, bench_ingest:
| format | scan (ms) | revalidate (ms) | RSS (KiB) |
|---|---|---|---|
| flac | 47 | 2 | 6900 |
| mp3 | 25 | 2 | 6944 |
| m4a | 55 | 2 | 6956 |
| m4a-last | 39 | 3 | 6980 |
| ogg | 20 | 2 | 6980 |
| wav | 25 | 3 | 6984 |
Refresh — RAM, bench_refresh_one_across_library_sizes: refresh-1 @ 100 / 1000
/ 5000 = 0 ms; @ 20 000 = 1 ms.
Serve — RAM, read_throughput (Criterion median): sequential_read flac 569 µs
· mp3 563 µs · m4a 566 µs · m4a-last 568 µs · ogg 737 µs · wav 598 µs;
cold_first_read ogg 1.507 ms; seek_read ogg 806 µs; concurrent m16+walker 4.15 ms.
Composed per-subsystem deltas
Ingest = SP1 ∘ PR2. Dominated by SP1's durable-fsync elimination; PR2 is the −128 B/file + move-not-clone refinement.
| metric | pre-SP1 | current main | Δ |
|---|---|---|---|
| fsync count (latencyfs) | 403 | 0 | eliminated |
| scan_wall (ci flac) | 32 206 ms | 47 ms | ~685× |
| scan_wall (bandwidth flac) | 378 041 ms | ~15 228 ms† | ~24.8× |
| scan_bytes_read (ci flac) | 870 600 B | 845 000 B | −128 B/file |
† Bandwidth tier not re-measured at main; figure is SP1's after number.
Refresh = SP2 ∘ #69 ∘ #114. The O(N)→flat journey; dominant pass is #69 (changelog-driven O(changed) rebuild), with #114 shaving the 20 k root fan-out on top.
| metric | pre-SP2 | current main | Δ |
|---|---|---|---|
| refresh-1 @ 1000 | 5 ms | 0 ms | ∞ (sub-ms) |
| refresh-1 @ 5000 | 32 ms | 0 ms | ∞ (sub-ms) |
| refresh-1 @ 20000 | 173 ms | 1 ms | ~173× |
Serve = SP3 ∘ SP4 ∘ PR3 ∘ #136. SP3 + PR3 drive the cross-format sequential/cold/seek wins (alloc elimination + copy reduction); SP4 owns the ogg cold/seek collapse.
| metric | pre-SP3 | current main | Δ |
|---|---|---|---|
| sequential_read/flac | 929 µs | 569 µs | −38.8% |
| sequential_read/mp3 | 940 µs | 563 µs | −40.1% |
| sequential_read/m4a | 940 µs | 566 µs | −39.8% |
| sequential_read/ogg | 967 µs | 737 µs | −23.8% |
| sequential_read/wav | 935 µs | 598 µs | −36.1% |
| cold_first_read/ogg | 14.96 ms | 1.51 ms | −89.9% |
| seek_read/ogg | 13.54 ms | 806 µs | −94.0% |
| concurrent m16+walker | 8.20 ms | 4.15 ms | −49.4% |
Criterion's own change: lines compare against the previous on-machine baseline
(itself already optimized); the absolutes above are the reliable end-to-end
signal.
Storage tunables
A proposed --storage-profile {ssd,hdd,nfs} preset would have bumped
--max-readahead-kib and --max-background (and enabled --keep-cache) per medium,
on the premise that "larger read-ahead hides HDD/NFS latency." Measured against real
storage, that premise does not hold — only --keep-cache shows a benefit — so the
preset was dropped and these flags keep their defaults. This section records the
evidence.
Methodology
Unlike the optimization passes above (tmpfs, in-process Criterion), these run through a real kernel mount with a real reader, because the tunables are kernel↔FUSE negotiation parameters invisible to an in-process driver:
- Backing: real RAID-1 HDD (/home, /dev/md127) and a btrfs HDD span (/data, /dev/sda3); for NFS, a loopback NFSv4.2 export (exportfs + mount -t nfs localhost:…) whose backing is tmpfs (isolates the RPC tax) or HDD (RPC + seeks).
- Latency: tc qdisc add dev lo root netem delay <X>ms adds X per packet → ≈2X RTT per NFS RPC. Tested at 8 ms, 50 ms, and 200 ms RTT (the last ≈ a trans-Pacific server).
- Cold reads: sync; echo 3 > /proc/sys/vm/drop_caches before each measured read; without it the page cache serves repeats and hides all backing latency.
- Mode: synthesis, not structure-only. Structure-only triggers kernel FUSE passthrough when the process is privileged (these run as root), which serves the backing fd directly and bypasses the daemon read path, and with it every tunable that acts on that path. Synthesis splices BackingAudio reads through the daemon, the real serving path.
- Why not the injected MUSEFS_FAULT_*_US model: it cannot show a read-ahead effect. FUSE delivers reads to the daemon in fixed ≤256 KiB chunks (max_pages, already pinned at the kernel's 1 MiB ceiling by fuser's 16 MiB default max_write), so the per-pread count, and thus any per-pread injected latency total, is independent of max_readahead.
Reproduce: benches/storage_tunables_bench.sh (needs /dev/fuse, root, and for the
NFS rows nfs-kernel-server + tc). HDD numbers are noisy (±10–15%); the trends, not
the digits, are the signal.
--max-readahead-kib — no benefit anywhere; hurts on HDD
Cold single-stream sequential throughput (MB/s), synthesis:
| readahead KiB | HDD /home (RAID1) | HDD /data (btrfs) | NFS 8 ms | NFS-on-HDD 50 ms | NFS-on-HDD 200 ms |
|---|---|---|---|---|---|
| 512 (default) | 248 | 127 | 30.8 | 4.7 | 1.3 |
| 2048 | 191 | 72 | 30.6 | 4.9 | 1.3 |
| 4096 | 153 | 84 | 30.5 | 4.9 | 1.3 |
| 32 (probe) | 237 | 75 | — | — | — |
(File sizes differ per column — 512 MiB local, 96 MiB at 50 ms, 48 MiB at 200 ms — so compare within a column, not across. The 200 ms column ≈ a trans-Pacific server: flat to the last digit.)
The window size barely moves throughput, and on HDD values ≥2048 KiB are among the slowest (peak is ~128–512 KiB). The reason is visible on NFS: 512 MiB ÷ 256 KiB × 8 ms ≈ 16 s ≈ the observed 31 MB/s — a single stream is served serially, one ≤256 KiB read at a time, each paying the full RTT, with no prefetch overlap that a larger window could exploit.
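The back-of-envelope figure above can be written out as a small model. It assumes strictly serial reads with one full RTT each and ignores transfer and seek time, so it is an upper bound rather than a prediction; the function names are illustrative.

```rust
const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;

/// Number of chunked reads needed to cover the file.
fn serial_read_count(file_bytes: u64, chunk_bytes: u64) -> u64 {
    file_bytes.div_ceil(chunk_bytes)
}

/// Wall time when every chunk pays one full round trip and nothing overlaps.
fn serial_read_secs(file_bytes: u64, chunk_bytes: u64, rtt_ms: f64) -> f64 {
    serial_read_count(file_bytes, chunk_bytes) as f64 * rtt_ms / 1000.0
}

/// Resulting throughput in MiB/s.
fn throughput_mib_per_s(file_bytes: u64, secs: f64) -> f64 {
    file_bytes as f64 / MIB as f64 / secs
}
```

For the 8 ms column, serial_read_secs(512 * MIB, 256 * KIB, 8.0) is 2048 reads × 8 ms = 16.384 s, or 31.25 MiB/s. The same model gives about 5.0 MiB/s for the 96 MiB file at 50 ms and about 1.25 MiB/s for the 48 MiB file at 200 ms, close to the measured 4.7–4.9 and 1.3 MB/s, which is why the read-ahead window does not move those columns.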
--max-background — no effect on read throughput
Wall time (s) for N concurrent cold streams over distinct tracks:
| max_background | HDD /home (16 streams) | NFS 8 ms (16) | NFS-on-HDD 50 ms (80) | NFS-on-HDD 200 ms (24) |
|---|---|---|---|---|
| 64 (default) | 4.55 | 5.16 | 177.8 | 238.5 |
| 128 | 5.05 | 5.18 | 175.7 | 237.4 |
64 ≈ 128 even with 80 > 64 streams. Expected: musefs's FuseConfig notes
max_background caps background work and that "foreground reads are bounded only by
client concurrency, not by this." The concurrent reads here are foreground.
(Concurrency does hide latency — 16 NFS streams reach ~10× single-stream aggregate —
but that is client parallelism, which max_background does not gate.)
--keep-cache — the one real win (~3×)
Cold read then immediate reopen (no cache drop between); reopen_s is the signal:
| keep_cache | HDD reopen (s) | NFS 8 ms reopen (s) | NFS-on-HDD 50 ms reopen (s) |
|---|---|---|---|
| false | 0.224 | 0.207 | 0.039 |
| true | 0.062 | 0.060 | 0.014 |
With --keep-cache the kernel retains the page cache across opens, so a re-opened file
is served from RAM instead of re-fetched over slow storage — ~3× faster reopen,
consistent across HDD and NFS. This is the only tunable worth changing for slow backing
(relevant for players/scanners that re-open files), and it needs no preset. It is on by
default as of #432 (inode invalidation on retag keeps it consistent); disable with
--keep-cache false on memory-constrained hosts.
Conclusions
- Drop the --storage-profile preset. Of the four knobs it would have set, three (max_readahead, max_background, and by extension a per-medium combination of them) show no benefit; max_readahead ≥2048 KiB actively hurts on HDD. The only justified change (enable --keep-cache on HDD/NFS) does not need an abstraction.
- Single-stream latency hiding: addressed in #255 (next section). The serialized read path measured above (512 MiB ÷ 256 KiB × RTT) is exactly what backing read-ahead now fixes.
Backing read-ahead (#255)
Each --max-readahead-kib row above exposed the real bottleneck: a single stream is
served one ≤256 KiB FUSE chunk at a time, each paying the full backing RTT, so a
200 ms-RTT NFS mount tops out at ~1.3 MB/s regardless of the kernel read-ahead window.
The fix is read amplification in the daemon — BackingReader coalesces a stream's
small reads into one large positioned pread (geometric window growth, global RAM budget
with LRU eviction), so the backing client can pipeline/parallelize the RPCs behind one
syscall. A background-prefetch-threads layer ("Phase 2") was also built but is off by
default (see below).
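The ~1.3 MB/s ceiling follows from paying one full round trip per chunk. A small worked sketch of that arithmetic, assuming transfer time is negligible next to the RTT (the function names are hypothetical; the 256 KiB chunk, 200 ms RTT, and 512 MiB read size come from the text):

```python
KIB = 1024
MIB = 1024 * KIB

def serialized_ceiling_mb_s(chunk_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when each chunk waits one full round trip."""
    return chunk_bytes / rtt_s / 1e6

def serialized_read_time_s(file_bytes: int, chunk_bytes: int, rtt_s: float) -> float:
    """Time to read a file at one chunk per round trip (transfer time ignored)."""
    chunks = -(-file_bytes // chunk_bytes)  # ceiling division
    return chunks * rtt_s

ceiling = serialized_ceiling_mb_s(256 * KIB, 0.200)
total = serialized_read_time_s(512 * MIB, 256 * KIB, 0.200)
print(f"{ceiling:.2f} MB/s ceiling, {total:.1f} s for 512 MiB")
```

Because the bound depends only on chunk size and RTT, widening the kernel read-ahead window cannot lift it; issuing fewer, larger backing reads can.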
Methodology
Two harnesses. Real kernel mount (benches/storage_tunables_bench.sh): a real reader
(dd) over a real FUSE mount, cold (drop_caches) each sample, median of 3. Local backing
on a btrfs HDD; NFS via a loopback NFSv4.2 export plus tc netem for RTT. The corpus
is real FLAC (MUSEFS_BENCH_CORPUS_SRC) — a /dev/zero corpus on a compressing fs
(btrfs compress=zstd) collapses to a cached extent and never touches the platter, which
silently inverts the HDD numbers; real already-compressed audio is incompressible.
In-process (musefs-core/tests/bench_ingest.rs::bench_read_under_latency): the core read
path over musefs-latencyfs (per-op injected latency), isolating the daemon from the kernel
FUSE layer. off = --read-ahead-budget-mib 0; phase1 = the default (amplification only);
phase1+2 = --read-ahead-prefetch.
Single-stream cold throughput (MB/s)
| backing | off | phase 1 (default) | phase 1+2 | passthrough |
|---|---|---|---|---|
| local HDD (btrfs, real FLAC) | ~60 | ~62 | ~60 | ~58 |
| NFS, tmpfs-backed, 200 ms RTT | 1.2 | 7.4 | 6.8 | 9.8 |
On NFS read-ahead is a ~6× single-stream win (1.2 → 7.4 MB/s, 75 % of the kernel-passthrough
ceiling). On a real local HDD all four configs sit within run-to-run noise (~±15 %) — read-ahead
is neutral, not a regression. (An earlier /dev/zero corpus showed a spurious −35 %; it was
the zstd-compression artifact above, not read-ahead.)
Concurrent streams (8 × distinct tracks, aggregate MB/s, NFS 200 ms RTT)
| off | phase 1 (default) | phase 1+2 | passthrough |
|---|---|---|---|
| 1.6 | 13.6 | 12.1 | 16.3 |
In-process, per-op latency (16 MiB Ogg whole read; wall ms / backing preads)
| profile | off | phase 1 (default) |
|---|---|---|
| ssd (80 µs/op) | 45 ms / 774 preads | 26 ms / 32 preads |
| nfs-ssd (600 µs/op) | 138 ms / 774 | 112 ms / 32 |
Amplification collapses 774 backing round-trips to 32; the win scales with per-op latency and is already material at SSD speeds (1.7×).
Phase 2 is off by default
Background prefetch threads (Phase 2) never beat amplification alone and cost a consistent
~10 %: single-stream NFS 6.8 vs 7.4, concurrent NFS 12.1 vs 13.6, neutral on HDD. A single large
pread already lets the NFS client pipeline its RPCs, so the threads add coordination overhead
without overlap to exploit. Phase 2 is therefore opt-in (--read-ahead-prefetch), retained for
hypothetical backends where one large read does not self-pipeline.
Defaults: read-ahead on at --read-ahead-budget-mib 64, Phase-1 amplification only. Set
0 to disable on local-disk-only setups (no benefit there, though no harm either).
Internal window cap on HDD (#433)
The amplification window doubles per sequential read up to WINDOW_ABS_CAP (8 MiB,
musefs-core/src/readahead.rs). The #256 sweep above measured the kernel max_readahead
knob — where ≥2048 KiB hurts on HDD — but never this daemon-internal cap, so #433 asked whether
8 MiB is too large for spinning media.
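To make the growth rule concrete, here is a minimal sketch of a window that doubles on each sequential read until it reaches the cap. Only the doubling and the 8 MiB WINDOW_ABS_CAP come from the text; the 256 KiB starting window and the function name are assumptions for illustration, not the actual readahead.rs logic.

```python
MIB = 1024 * 1024
WINDOW_ABS_CAP = 8 * MIB  # cap stated in the text

def window_sizes(total_bytes: int, start: int = 256 * 1024,
                 cap: int = WINDOW_ABS_CAP) -> list[int]:
    """Sizes of the backing preads for one sequential whole-file read.

    `start` is a hypothetical initial window; the real value may differ.
    """
    sizes, window, offset = [], start, 0
    while offset < total_bytes:
        size = min(window, total_bytes - offset)
        sizes.append(size)
        offset += size
        window = min(window * 2, cap)  # geometric growth, clamped at the cap
    return sizes

sizes = window_sizes(270 * MIB)
print(len(sizes), "preads; largest", max(sizes) // MIB, "MiB")
```

Under these assumptions the window reaches the cap after a handful of reads, so for a ~270 MiB file nearly all backing reads are issued at the cap size. That is why the cap, rather than the starting window, is the value worth sweeping.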
Methodology. WINDOW_ABS_CAP is a compile-time const, so the sweep builds one release binary
per value (benches/storage_tunables_bench.sh window-cap, which patches the const in place and
restores it after each build). Cold (drop_caches) single-stream reads of the same ~270 MiB real
FLAC, synthesis mount, default flags (amplification on, prefetch off), real backing on a btrfs
HDD (/data, 4389-track corpus). Reproduce:
WINDOW_CAP_MIB="1 2 4 8 16" MUSEFS_BENCH_CORPUS_SRC=<music-tree> MUSEFS_BENCH_CORPUS_MAX_MIB=300 \
benches/storage_tunables_bench.sh window-cap <hdd-backing-dir>
Result: no measurable cap effect — the medium's noise dominates. Median MB/s (and the within-cap min–max over 7 cold samples) overlap across every cap, and the apparent ordering is an artifact of measurement order, not the cap: throughput drifts down through each run, so whatever runs first looks fastest. Sweeping the caps in the reverse order reverses the "trend".
| cap (MiB) | ascending sweep, median (min–max) | descending sweep, median |
|---|---|---|
| 1 | 104 (65–132) | 46 |
| 2 | 85 (54–142) | 58 |
| 4 | 83 (51–104) | 61 |
| 8 | 61 (54–72) | 61 |
| 16 | 68 (54–94) | 86 |
The current default (8 MiB) lands at ~61 MB/s in both orderings; the first-measured cap is fastest in both (104 for cap 1 ascending, 86 for cap 16 descending). The within-cap spread (≈50–140 MB/s) dwarfs every between-cap median gap. This corroborates the #255 finding above that backing read-ahead is neutral on local HDD.
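The ordering artifact can be read straight off the medians. A short sketch using the table values (the helper name is illustrative):

```python
# Median MB/s per cap (MiB), copied from the table above.
ascending = {1: 104, 2: 85, 4: 83, 8: 61, 16: 68}
descending = {1: 46, 2: 58, 4: 61, 8: 61, 16: 86}

def fastest(medians: dict[int, int]) -> int:
    """Cap (MiB) with the highest median throughput."""
    return max(medians, key=medians.get)

# The ascending sweep measures cap 1 first; the descending sweep measures cap 16 first.
print("fastest cap, ascending sweep:", fastest(ascending))
print("fastest cap, descending sweep:", fastest(descending))

# How much the same cap disagrees with itself between the two orderings.
order_gap = {cap: abs(ascending[cap] - descending[cap]) for cap in ascending}
print("largest same-cap gap between sweeps:", max(order_gap.values()), "MB/s")
```

The "best" cap flips from 1 MiB to 16 MiB when the sweep order flips, and a single cap can differ from itself by more than any cap-to-cap gap within one sweep. That is the signature of measurement-order drift rather than a cap effect.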
Decision: keep 8 MiB, no runtime knob. There is no HDD gain to capture, and the cap exists for
the proven case — the ~6× single-stream amplification win on high-RTT NFS/remote, where coalescing
into one large pread lets the client pipeline RPCs. Lowering the cap to chase an unmeasurable HDD
effect would regress that win.
Global allocator — steady-state RSS (#360)
Long-lived high-churn FUSE load fragments glibc malloc, growing daemon RSS over
days without a true leak. The musefs binary now defaults to the jemalloc
global allocator with a background purge thread. Measured with
scripts/rss-churn-bench.sh (Linux; median VmRSS over the flattened tail —
steady state, not peak).
Parameters: WORKERS=8 (nproc), FILES=500, CYCLES=200, WARMUP=20, no
REFRESH_CMD. DB = a freshly-scanned 4427-track store on tmpfs (/tmp); backing
audio on /data (HDD). Concurrent cat-to-/dev/null churn drives the
open/read/release handle-table and read-synthesis allocation path.
| Allocator | Steady-state RSS |
|---|---|
| system malloc | ~74.7 MiB (76496 KiB) |
| jemalloc | ~28.7 MiB (29368 KiB) |
Decision: SHIP jemalloc. Steady-state RSS is ~62% lower (jemalloc ≤ system malloc, the §4 ship rule). Under identical churn glibc retained ~46 MiB of dirty pages that jemalloc's decay + background purge return to the OS — the #360 fragmentation failure mode, reproduced and fixed. The gap is far outside run-to-run noise, so no within-noise tie-break was needed.
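The headline figures in the decision follow from the two measured values. A quick check, with variable names chosen for illustration:

```python
KIB_PER_MIB = 1024

# Steady-state VmRSS medians from the table above, in KiB.
system_malloc_kib = 76496
jemalloc_kib = 29368

saved_mib = (system_malloc_kib - jemalloc_kib) / KIB_PER_MIB
reduction_pct = 100 * (1 - jemalloc_kib / system_malloc_kib)
print(f"{saved_mib:.1f} MiB less resident memory, {reduction_pct:.0f}% lower RSS")
```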
Scan fingerprint overhead (#464)
Bench: cargo bench -p musefs-core --bench fingerprint_overhead
Corpus: 200 minimal FLAC files (~200 B metadata + 4 KiB audio) in tempfile::tempdir() (TMPDIR =
tmpfs/RAM). Single-threaded scan (jobs: 1). Criterion, 20 samples.
| Tier | Median (ms) | µs/file |
|---|---|---|
| None | 47.0 | 235 |
| Fingerprint | 107.6 | 538 |
Delta: +60.6 ms / 200 files = +303 µs/file overhead (+129%).
Interpretation: The 129% overhead on this synthetic RAM-backed bench exceeds the plan's ≤15%
threshold. The overhead is dominated by the extra UPDATE tracks SET fingerprint = … SQLite
execution per file inside the batch transaction — not by SHA-256 hashing cost (SHA-256 of a few
hundred bytes is sub-microsecond). On a real HDD-backed library the probe I/O (tens-of-ms per file)
is the bottleneck, making both the hash and the DB write negligible. See plan Task E2 step 3 decision
note: the decision to keep SHA-256 and add the length CHECK was escalated to the controller because
the raw percentage exceeded the stated threshold, even though the absolute overhead (303 µs/file) is
operationally negligible at disk I/O rates.
Scan fingerprint overhead — SSD latency profile (#464)
Bench: bench_scan_under_latency in musefs-core/tests/bench_ingest.rs (MUSEFS_BENCH_LATENCY_PROFILE=ssd).
Corpus: 200 minimal FLAC files (~200 B metadata + 4 KiB audio) on a musefs-latencyfs SSD-latency
FUSE mount. Default thread count (jobs: 0). 3 runs each, median reported.
| Tier | Median (ms) | µs/file |
|---|---|---|
| None | 222 | 1110 |
| Fingerprint | 241 | 1205 |
Delta: +19 ms / 200 files = +95 µs/file overhead (+8.6%).
Interpretation: Under an SSD latency profile the I/O dominates and the fingerprint overhead drops to +8.6% (+95 µs/file), well within the plan's ≤15% threshold. The RAM bench's +129% (+303 µs/file) was an artifact of RAM eliminating the I/O that would normally dwarf the extra SHA-256 hash and DB write. At real SSD rates the fingerprint cost is operationally negligible.
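Both deltas and the threshold comparison can be reproduced from the medians in the two tables. A minimal sketch; the function and dictionary names are illustrative, while the 200-file corpus and the ≤15% threshold come from the text:

```python
THRESHOLD_PCT = 15.0  # the plan's stated ceiling
FILES = 200

def overhead(baseline_ms: float, fingerprint_ms: float, files: int = FILES):
    """Return (extra microseconds per file, relative overhead in percent)."""
    delta_ms = fingerprint_ms - baseline_ms
    return delta_ms * 1000 / files, 100 * delta_ms / baseline_ms

# (None tier median, Fingerprint tier median) in milliseconds.
benches = {
    "RAM (tmpfs)": (47.0, 107.6),
    "SSD latency profile": (222.0, 241.0),
}
for name, (base, fp) in benches.items():
    us_per_file, pct = overhead(base, fp)
    verdict = "within" if pct <= THRESHOLD_PCT else "over"
    print(f"{name}: +{us_per_file:.0f} us/file, +{pct:.1f}% ({verdict} threshold)")
```

The relative overhead is a ratio against the baseline, so it shrinks as backing I/O makes the baseline slower. The RAM bench therefore overstates the percentage, because its baseline has almost no I/O in it.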
Release notes
Curated, upgrade-focused notes for each release. For the exhaustive,
per-change list see the Changelog; for the external-writer
contrib/ packages (which version independently) see the
contrib changelog.
v1.1.0
A feature-and-hardening release on top of the v1.0.0 stable line. No CLI flags or store columns were removed, but the on-disk schema steps to version 2 and a few defaults change observable behavior — read Upgrading from v1.0.0 before you update an existing store.
Highlights
- Runtime telemetry. An opt-in --expose-metrics (env MUSEFS_EXPOSE_METRICS) surfaces a synthetic .musefs-metrics/ directory at the mount root whose metrics file renders Prometheus-format counters for getattr/read/open activity, backing read-ahead behavior, and (with the jemalloc build) allocator stats. Off by default. See Tuning & metrics.
- Scan progress indicator. scan and scan --revalidate render a live progress bar on an interactive terminal and fall back to periodic ingested N/M (P%) lines when output is redirected. A new --quiet/-q suppresses it.
- --skip-on-missing template flag. Opt-in (env MUSEFS_SKIP_ON_MISSING): drops a track from the mount when a top-level template field stays unresolved, instead of substituting --default-fallback. The motivating case is --template '$!{beets_path}' --skip-on-missing, hiding tracks beets left without a beets_path rather than collapsing them into an Unknown bucket.
- --read-ahead-prefetch flag. Opt-in background prefetch threads layered on read amplification, default off. Benchmarks found amplification alone delivers the read-ahead win, so enable this only when profiling a backend where a single large read does not self-pipeline.
- riscv64 release platform. Prebuilt riscv64gc-unknown-linux-{gnu,musl} binaries and linux/riscv64 Docker images now ship with each tagged release. Container bases moved to current stable (Debian trixie, Alpine 3.23).
- statfs reply. The mount now reports a synthetic non-zero capacity with ample free space, so df no longer shows a 0-byte filesystem and capacity-checking importers (Lidarr et al.) no longer balk.
- Per-extension skip breakdown. End-of-scan summary breaks the skipped count down by lowercased extension (e.g. skipped 42: jpg=20, cue=10, log=8) so a large skip count is diagnosable. Log-only; the counters are unchanged.
- musefs vacuum. A maintenance command that compacts the SQLite store (reclaiming the free pages that prunes, orphan-art GC, and the migration leave behind) and reports the space reclaimed. Run it while unmounted. See Maintenance.
Plus a substantial round of correctness and robustness fixes across the read fast path (rowid-reuse consistency for art segments), the MP4/QuickTime metadata walk, ID3 synthesis, and the prune/delete paths — see the Changelog for the full list.
Upgrading from v1.0.0
1. Back up your store. The schema migration below is one-way. While no scan
or external writer is touching the database, copy musefs.db (and its -wal /
-shm sidecars if present). A v1.0.0 binary has no guard against a newer store
and may misread one that has been migrated, so keep the backup if you might roll
back. From v1.1.0 onward a binary instead refuses to open a store whose
schema is newer than it understands, with a clear error.
2. Automatic schema migration (user_version 1 → 2). The first time a
v1.1.0 binary opens the store — for example musefs scan — it migrates in a
single transaction. The migration:
- Adds scanner-owned tracks.fingerprint and tracks.content_hash columns (nullable SHA-256 hex, non-unique by design) plus a fingerprint index. They start NULL and are populated on the next scan; external writers do not set them.
- Rebuilds the tags table so the 256 KiB value cap counts bytes rather than characters (the v1 CHECK was up to ~4× looser for multibyte text). Any row that was already over the byte cap is dropped in the rebuild. This only reaches genuinely pathological data: a single tag value larger than 256 KiB of bytes, which a real library never has. Such rows were already unreadable under the byte-counting read guard anyway, so in practice no store is affected.
The migration applies automatically the first time a v1.1.0 binary opens the
store, but you should still run musefs scan --db <store> once after upgrading:
that is what populates the new fingerprint / content_hash columns, which the
scanner's content-identity refind logic relies on. Then remount. See
The SQLite store for the full schema contract.
3. Behavior changes to check.
- scan exit code. scan / scan --revalidate now exit 2 when any file fails to parse or ingest (previously always 0 on a non-fatal run). A clean scan still exits 0; a hard error still exits 1. Pipelines that key off the exit status (e.g. musefs scan … && musefs mount …) will now correctly stop on a partial-ingest failure; update any script that assumed 0.
- --fallback keys are case-insensitive. A per-field --fallback AlbumArtist=… (or any non-lowercase key) is now matched against the template field instead of silently never applying. If you worked around the old bug by lowercasing keys, no change is needed; uppercase keys now take effect.
- df on the mount now shows a synthetic capacity instead of zeros.
- Extended attributes (getxattr/setxattr/…) now return ENOTSUP explicitly on the read-only mount; the caller-visible result is unchanged, but the per-probe [Not Implemented] warning is gone.
4. External writers (beets, Picard, Lidarr, python-musefs) version
independently and need no change for this upgrade: the new fingerprint /
content_hash columns are scanner-owned and nullable, so the external-writer
contract is unchanged. Update those packages on their own cadence.
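The byte-versus-character distinction behind the tags rebuild in step 2 can be seen in SQLite itself: length() on a TEXT value counts characters, while length() on the same value cast to BLOB counts bytes. The table name and the tiny 8-byte cap below are hypothetical stand-ins to keep the example short; they are not the actual musefs schema or its CHECK expression.

```python
import sqlite3

CAP = 8  # hypothetical cap for illustration; the real cap is 256 KiB

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_tags ("
    "  value TEXT CHECK (length(CAST(value AS BLOB)) <= %d)"
    ")" % CAP
)

value = "\u00e9" * 5  # five 'é' characters, two bytes each in UTF-8
chars, nbytes = conn.execute(
    "SELECT length(?), length(CAST(? AS BLOB))", (value, value)
).fetchone()
print("characters:", chars, "bytes:", nbytes)

try:
    conn.execute("INSERT INTO demo_tags VALUES (?)", (value,))
    accepted = True
except sqlite3.IntegrityError:
    accepted = False  # the byte-counting CHECK rejects the row
print("accepted:", accepted)
```

A character-counting check would have accepted this value (5 ≤ 8) even though it occupies 10 bytes. That is the looseness the rebuilt CHECK removes.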
Earlier releases
For v1.0.0 and earlier, see the Changelog.
Changelog
All notable changes to this project are documented here. The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
The contrib/ Python packages have their own decoupled version and changelog: see the contrib changelog.
For curated, upgrade-focused notes (highlights and per-version migration steps), see the Release notes.
Unreleased
1.1.0 - 2026-06-17
Added
- Runtime telemetry (.musefs-metrics): an opt-in --expose-metrics flag (env MUSEFS_EXPOSE_METRICS) surfaces a synthetic .musefs-metrics file at the mount root rendering Prometheus-format counters: getattr/read/open activity, backing read-ahead behavior, and (when built with jemalloc) allocator stats. Off by default; the file is absent unless enabled. See the README Metrics section (#394).
- Scan progress indicator: scan and scan --revalidate render a live progress bar (indicatif) with an elapsed-time summary on an interactive terminal, falling back to periodic ingested N/M (P%) log lines when output is non-interactive. A new --quiet/-q flag suppresses it (#406).
- --skip-on-missing template flag: an opt-in --skip-on-missing (env MUSEFS_SKIP_ON_MISSING) drops a track from the mount when a top-level template field stays unresolved, instead of substituting --default-fallback. Per-field --fallback chains and [...] optional sections are unaffected (a field resolved via its fallback counts as present). The motivating case is --template '$!{beets_path}' --skip-on-missing, which hides tracks beets left without a beets_path rather than collapsing them into an Unknown bucket (#408).
- --read-ahead-prefetch flag: opt-in background prefetch threads layered on top of read amplification, default off. Benchmarks found amplification alone delivers the entire read-ahead win, while the threads add ~10% overhead with no measured benefit. Enable only when profiling a backend where a single large read does not self-pipeline (#255).
- riscv64 release platform: prebuilt riscv64gc-unknown-linux-{gnu,musl} binaries and linux/riscv64 Docker images now ship with each tagged release. Container bases bumped to current stable: glibc Debian bookworm → trixie (bookworm has no riscv64 image), musl Alpine 3.20 → 3.23 (3.20 is end-of-life).
- statfs reply: the mount now reports a non-zero synthetic capacity with ample free space instead of fuser's all-zero default, so df no longer shows a 0-byte filesystem and capacity-checking importers (Lidarr et al.) don't balk (#368).
- Per-extension skip breakdown: at end of scan, a summary line breaks the skipped count down by lowercased extension (e.g. skipped 42: jpg=20, cue=10, log=8, <none>=4), logged at warn so it shows by default, so a large skip count is diagnosable (expected sidecars versus genuinely unexpected files). Log-only; the ScanStats struct and CLI summary are unchanged (#341).
- musefs vacuum command: compact the SQLite store, reclaiming free pages left by prunes, orphan-art GC, and the schema migration. Runs VACUUM + a WAL checkpoint and reports the space reclaimed; run it while unmounted (#566).
Fixed
- Art/serve rowid-reuse consistency: the read fast path's WAL-snapshot + content_version guard, previously gated only on binary-tag layouts, now covers all DB-rowid segments (art ArtImage/OggArtSlice too) via RegionLayout::streams_db_rowid, and the stateless no-fh read fallback now applies the same snapshot/recheck and re-validates its freshly opened backing fd against the resolved stamp. A concurrent external retag + gc_orphan_art + reinsert can no longer splice a wrong image or stale tag bytes mid-read (the audio-bytes invariant was never affected) (#502, #503).
- Per-field --fallback case-insensitivity: fallback keys are now ASCII lowercased to match template field names, so --fallback AlbumArtist=… (any uppercase) is honored instead of silently never matching (#504).
- Tag value byte cap: both the schema CHECK (rebuilt in the MIGRATION_V2 upgrade) and the read-time tags.value guard now count bytes, not UTF-8 characters, so the 256 KiB materialized-memory bound is exact rather than up to ~4× looser for multibyte text. The upgrade drops any pre-existing over-cap rows (already unreadable under the byte-counting reader guard) (#505).
- Embedded NUL in ID3 metadata: synthesized ID3 frames now reject a DB-sourced tag key, tag value, art mime, or art description containing an embedded NUL instead of emitting a frame a downstream parser would misread (#506).
- Orphan-art GC NULL safety: gc_orphan_art uses NOT EXISTS rather than NOT IN (subquery), so a NULL art_id can no longer silently turn the GC into a no-op (#507).
- Mount usability: mount now warns when the mountpoint is non-empty (its contents are shadowed for the mount's lifetime), and a permission-denied mount (e.g. an AppArmor-restricted prefix) prints actionable guidance instead of a bare "Permission denied" (#508, #509).
- Silent mp4 oversize drops: oversized embedded covr cover art and binary freeform (----) values in .m4a/.m4b files are skipped in the format layer before materialization (to avoid building a large image out of a large moov), which previously dropped them with nothing in the logs. The scan now emits a warn line for each, matching the logging the other formats already had (#343, follow-up to #284).
- xattr log noise: getxattr/listxattr/setxattr/removexattr now reply ENOTSUP explicitly (read-only filesystem, no extended attributes) instead of falling through to fuser's default, which logged a [Not Implemented] warn on every xattr probe (ls -l, indexers, backup tools). The caller-visible result is unchanged (#364).
- MP4 path-to-ilst leniency: the walk to moov/udta/meta/ilst now uses the same lenient box scan as the metadata extractors, so a single malformed or truncated sibling box anywhere on the path no longer suppresses an otherwise well-formed ilst and silently drops every tag and cover. The audio/structure path stays strict (#542).
- QuickTime bare meta atoms: the meta parser only consumes the 4-byte FullBox version/flags prefix when it is actually present (a zero word), so a QuickTime-style bare meta, which has no such prefix, is read instead of landing mid-header and dropping all tags and art (#543).
- scan exit code on ingest failure: scan / scan --revalidate now exit 2 when any file fails to parse/ingest (failed > 0), instead of always exiting 0. A pipeline such as musefs scan … && musefs mount … can now detect a partial or total ingest failure; a clean scan still exits 0 and a hard error still exits 1 (#554).
- Release smoke audio-bytes check: scripts/smoke-binary.sh (the per-arch release gate) now compares the served file's encoded audio stream against the untouched backing file, asserting the cardinal byte-identical-audio invariant rather than only checking the fLaC magic, so a target-specific positioned-read or offset regression in a cross-compiled binary is caught (#547).
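The NOT IN hazard behind the orphan-art GC fix is standard SQL three-valued logic: if the subquery yields any NULL, a NOT IN comparison is never true for non-matching rows, so nothing is selected. A minimal sketch with a hypothetical two-table layout (the table and column names are illustrative, not the musefs schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE art (id INTEGER PRIMARY KEY);
    CREATE TABLE track_art (art_id INTEGER);  -- nullable reference
    INSERT INTO art (id) VALUES (1), (2);
    INSERT INTO track_art (art_id) VALUES (1), (NULL);
""")

# Art row 2 is unreferenced, but the NULL makes "2 NOT IN (1, NULL)" unknown.
not_in = conn.execute(
    "SELECT id FROM art WHERE id NOT IN (SELECT art_id FROM track_art)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs are harmless.
not_exists = conn.execute(
    "SELECT id FROM art a WHERE NOT EXISTS "
    "(SELECT 1 FROM track_art t WHERE t.art_id = a.id)"
).fetchall()

print("NOT IN finds:", not_in)
print("NOT EXISTS finds:", not_exists)
```

With NOT IN the orphan is never found, so a garbage collector built on it would quietly delete nothing; NOT EXISTS finds the orphan regardless of NULLs.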
1.0.0 - 2026-06-12
First stable release.
Added
- Lidarr integration: a new contrib/lidarr/ package that drives symlink-based placeholder imports and syncs Lidarr metadata into the musefs SQLite store.
- FUSE mount-access controls: new --allow-other, --owner, and --group flags mount with allow_other + default_permissions so accounts other than the mounting user can reach the view and the presented owner/group/mode bits are enforced; --owner/--group imply --allow-other. A non-root allow_other mount is pre-flight checked against /etc/fuse.conf user_allow_other and fails early with guidance if it is missing. See the README Ownership and permissions section (#293, #294).
- Hardened deployment assets: the container image runs as a dedicated unprivileged user with a build-arg-configurable UID/GID, and the musefs-scan.service systemd unit ships a strong sandbox (the FUSE-mounting musefs.service deliberately cannot be sandboxed). See the systemd hardening notes (#317, #318, #319).
- crates.io distribution: the musefs binary is published to crates.io as of this release and installable with cargo install musefs. A new thin musefs wrapper crate owns the binary (musefs-cli is now a library crate), and a tag-triggered release workflow publishes all crates in dependency order.
- Fuzzing & property tests: coverage-guided cargo-fuzz targets for every format parser (FLAC, MP3, MP4, Ogg, WAV), the byte-level primitives (Ogg page parsing, base64 windowing, VorbisComment), and the serve path; the latter drives the full synthesis pipeline over hostile DB rows and binary tags via a fuzzing-gated Db::with_raw_conn. Plus proptest invariants (panic-freedom, the byte-identical audio guarantee, and tag round-trip), an end-to-end read-fidelity property, and a mutagen interop test asserting an independent reader sees the tags we synthesize.
Changed
- mount --db now requires an existing store. Mounting against a missing database path is rejected before any FUSE setup instead of silently creating and migrating an empty store, so a mistyped --db fails loudly rather than mounting an empty view. scan --db still creates the store if absent (#309).
Fixed
- Scanner no longer drops files and embedded art silently: embedded cover art over MAX_ART_BYTES (and binary tags over MAX_BINARY_TAG_BYTES) were filtered out at ingest with no log line, so a track whose art exceeded the cap appeared to simply have none, indistinguishable from a scan bug. The drop is now logged (RUST_LOG=warn). Likewise, a supported-extension file that fails to parse or errors mid-probe was counted failed with the underlying error discarded; the reason is now logged. Note: oversized art in .m4a/.m4b files is dropped earlier, inside the format layer, and is not yet logged (#284, #343).
- Lidarr custom-script env var casing: Lidarr stores custom-script environment variables in a .NET StringDictionary, which lowercases every key, so a Linux script actually receives lidarr_sourcepath/lidarr_eventtype rather than the PascalCase names Lidarr's docs list. The integration read the PascalCase names, so with a real Lidarr every import failed and every event parsed as unsupported. Lidarr env vars are now resolved case-insensitively. Found by the issue #141 real-instance smoke run.
- VorbisComment parse OOM (DoS): a crafted comment block declaring a huge entry count made Vec::with_capacity attempt a multi-gigabyte allocation; the pre-allocation is now bounded by the readable byte count. Found by the new vorbiscomment fuzz target.
- MP4 box-bounds integer overflow: an untrusted 64-bit extended box size made the box-bounds check (pos + total) overflow usize: a panic in debug and a silent wrap in release that accepted a bogus box length. The addition is now checked. Found by the mp4 fuzz target.
- ID3v2 parsing unbounded allocation (DoS): the id3 crate eagerly allocates a frame's declared size (ID3v2.3 frame sizes are plain 32-bit, up to 4 GiB), so a crafted tag could exhaust memory at scan time, via an MP3 or a WAV embedded id3 chunk. Parsing is now gated on validated ID3v2 frame bounds and an ID3v2 tag at offset 0 (the id3 reader scans forward). Found by the mp3 and wav fuzz targets.
- Scan counters now match their documented contract: musefs scan reports every non-audio file (any unsupported or missing extension: .jpg, .cue, .log, .nfo, cover art, etc.) as skipped, and supported-extension files that fail to parse (e.g. a corrupt .flac) as failed. Previously malformed files were miscounted as skipped and unsupported files were not counted at all, so expect skipped to be larger than before on a real library (#301).
- Symlink scans no longer double-count: with --follow-symlinks, a file reached via both its real path and a symlink is ingested and counted once instead of inflating scanned; multiple hardlinks to the same inode are likewise collapsed to a single track (#302).
- Stable inodes on case-insensitive mounts: the inode allocator is now keyed on the case-folded path in case-insensitive mode, so an unrelated deletion that flips a merged directory's display casing no longer reassigns a survivor's inode (#305).
- Lidarr autoscan now honors the scan timeout: an import/release-triggered autoscan applies the shared 120s scan timeout, matching the beets and Picard integrations, so a wedged musefs scan fails with a controlled timeout instead of blocking the custom-script process indefinitely (#312).
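The VorbisComment fix above is an instance of a general parsing rule: never size work from a declared count without first checking it against the bytes actually available. The sketch below shows that rule in Python for a simplified comment list (a little-endian u32 count followed by u32-length-prefixed entries). It is an illustrative stand-in, not the project's Rust parser, and it rejects an implausible count outright, whereas the fix described above clamps the pre-allocation.

```python
import struct

def parse_comment_list(buf: bytes) -> list[bytes]:
    """Parse a u32-LE count followed by u32-LE length-prefixed entries."""
    if len(buf) < 4:
        raise ValueError("truncated count")
    (declared,) = struct.unpack_from("<I", buf, 0)
    pos = 4
    # Every entry needs at least a 4-byte length prefix, so the readable bytes
    # bound how many entries can really exist, whatever the header claims.
    plausible = (len(buf) - pos) // 4
    if declared > plausible:
        raise ValueError(f"declared {declared} entries, at most {plausible} fit")
    entries = []
    for _ in range(declared):
        if len(buf) - pos < 4:
            raise ValueError("truncated entry length")
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        if n > len(buf) - pos:
            raise ValueError("entry overruns buffer")
        entries.append(buf[pos:pos + n])
        pos += n
    return entries

good = struct.pack("<I", 1) + struct.pack("<I", 5) + b"A=foo"
hostile = struct.pack("<I", 0xFFFFFFFF) + b"\x00" * 8  # claims ~4 billion entries
print(parse_comment_list(good))
```

Passing the hostile buffer raises ValueError before any per-entry work happens, because 12 bytes cannot hold four billion length prefixes.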
0.2.0 - 2026-05-27
First public release.
Added
- Formats: synthesis for M4A/M4B (MP4), Ogg (Opus, Vorbis, FLAC-in-Ogg), and WAV, alongside the existing FLAC and MP3 — metadata generated on the fly from the SQLite store and spliced in front of byte-identical backing audio.
- Arbitrary tag support: a single canonical tag vocabulary maps common fields to each format's native slot (ID3 frame / MP4 atom / Vorbis field); any other tag round-trips through the format's extension slot (ID3 TXXX, MP4 ---- freeform, raw Vorbis field). User-defined key casing is preserved.
- beets plugin (contrib/beets/): syncs beets' canonical tags and cover art into the store keyed by each file's real path, with no remount and no audio rewrite.
- Performance, concurrency & caching pass: worker-pool offload of blocking reads, lock-free virtual-tree swap, per-handle I/O, a bounded LRU header-layout cache, debounced single-flighted refresh with stable inodes, kernel/mount tuning flags, bounded-memory MP4 resolves, and opt-in --keep-cache with auto-invalidation.
Notes
- Read-only mount; tag edits happen out-of-band against the SQLite store and are picked up automatically (PRAGMA data_version polling). See the README Supported formats section and the per-format docs for round-trip limitations.
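The polling mentioned in the note relies on a documented SQLite property: PRAGMA data_version returns a different value on a connection once some other connection has committed a change. A minimal sketch of that behavior, using a throwaway database and hypothetical connection and table names; how musefs schedules its polling is not shown here.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "store.db")  # throwaway demo database

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE tags (key TEXT, value TEXT)")
writer.commit()

watcher = sqlite3.connect(path)

def data_version(conn: sqlite3.Connection) -> int:
    """Value that changes when another connection commits a modification."""
    cur = conn.execute("PRAGMA data_version")
    try:
        return cur.fetchone()[0]
    finally:
        cur.close()  # release the statement so it holds no read lock

before = data_version(watcher)
unchanged = data_version(watcher)  # no writes in between: same value

writer.execute("INSERT INTO tags VALUES ('artist', 'Example')")
writer.commit()  # an external retag, from the watcher's point of view

after = data_version(watcher)
print("stable while idle:", before == unchanged, "| changed after write:", after != before)
```

A poller only needs to compare the current value with the last one it saw; a difference means another writer touched the store and the view should be refreshed.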
0.1.0
- Initial MVP (FLAC and MP3 synthesis, virtual tree with beets-style templates, synthesis/structure-only mount modes, auto-refresh, scan / scan --revalidate). Never published publicly; superseded by 0.2.0.
Security
Security Policy
Supported versions
Security fixes target the latest release (see CHANGELOG.md); there are no maintained backport branches.
Reporting a vulnerability
Please report vulnerabilities privately via GitHub's security advisory form: github.com/Sohex/musefs/security/advisories/new. Do not open a public issue for an undisclosed vulnerability.
You can expect an acknowledgment within a few days. Confirmed issues are fixed as a priority, the fix is noted in the changelog, and you will be credited in the advisory unless you prefer otherwise.
What counts
musefs's primary threat surface is parsing untrusted media files: the scanner probes arbitrary bytes at scan time, and the serve path re-parses file fronts at resolve/read time. Anything a crafted file can do beyond "fail to scan with a controlled error" is in scope — memory unsafety, panics reachable from file contents, unbounded allocation, and hangs. Parser denial-of-service findings are real vulnerabilities here, not mere robustness bugs: several (a VorbisComment pre-allocation OOM, an MP4 box-bounds overflow, an ID3v2 allocation bomb) have been found by the project's fuzz targets and fixed — see CHANGELOG.md. Those fuzz and property suites run continuously (CONTRIBUTING.md); a fuzz reproducer is the ideal report attachment.
Also in scope: anything that lets a crafted database (the mount trusts its
--db only as far as the documented contract) or a hostile local writer
violate the read-only guarantee on backing files.