Introduction

A read-only FUSE filesystem that presents a re-tagged, reorganized view of your music library — without modifying or duplicating a single byte of the original audio. Fix tags, art, and folder structure in a SQLite store; the mount shows a clean library while your files stay exactly as they are.

What it's for

  • A clean view of a messy library. Your files keep their on-disk chaos; the mount presents one consistent, template-driven tree for players and media managers.
  • Tag editing without touching files. Edit the SQLite store (directly, or via the beets plugin, Picard plugin, or Lidarr integration) and the mounted view updates live — no remount, no rewrite, no re-rip anxiety.
  • Lossless-by-construction experimentation. Change your tags, try a different organization scheme, or swap in new cover art; the originals are physically read-only to the mount. Backing up the current library is as easy as copying the db file.
  • Hash-stable by construction. The mount never rewrites a byte, so each backing file's checksum is exactly what it was the day it arrived — anything verified by hash keeps verifying, and anything you're seeding keeps seeding, however aggressively you retag and reorganize the view on top.

Note: This project was built with AI. The general workflow was to use the superpowers skills to provide a framework. Claude Opus was used to write plans and specs which were then implemented by another model, primarily MiMo v2.5.

One of my goals in building this project was to "vibe code" something that was decisively not slop. I believe I've realized that objective and I hope that you take the project on its merits.

If you disagree, please let me know! I'd love to know where I came up short so I can improve things.

Status

All five formats ship with embedded cover art and binary-tag preservation. The serve path has been through a performance/concurrency hardening pass for real-world player and media-manager access against large libraries on HDD/SSD/NFS, and the parsers are continuously fuzzed. beets, Picard, and Lidarr plugins ship in contrib/. See the CHANGELOG for history.

Deeper reading: the architecture reference for how it works, the contributor guide for the development workflow.

Quick start

cargo install musefs    # compiles from source — needs a Rust toolchain,
                        # libfuse3-dev and pkg-config; prebuilt binaries
                        # and container images: see Installing

musefs scan ~/Music --db library.db        # ingest your library
mkdir -p ~/mnt/music
musefs mount ~/mnt/music --db library.db \
    --template '$albumartist/$album/$title'
# mount blocks until unmounted: fusermount3 -u ~/mnt/music (or Ctrl-C)

~/mnt/music now serves your library as Album Artist/Album/Title.flac — with each file's metadata generated fresh from the database, spliced in front of your original, untouched audio.

Installation

Three ways to get musefs: a prebuilt binary (no toolchain needed), building from source, or a container image. Whichever you pick, mounting needs a 64-bit FUSE-capable OS (Linux, FreeBSD, macOS) — see Platform support.

Important: Linux and FreeBSD are E2E tested. I don't have anything running macOS to test on, so if you run musefs on a Mac, please let me know whether it works, and especially if it doesn't!

At present AMD64, AARCH64, and RISC-V 64 are supported. If you'd like 32-bit support please open an issue.

Prebuilt binaries

Each tagged release attaches static/portable Linux binaries for six targets:

| Target | libc | Notes |
|---|---|---|
| x86_64-unknown-linux-gnu | glibc | Pinned to glibc 2.17; runs on essentially any current distro. |
| aarch64-unknown-linux-gnu | glibc | glibc 2.17 floor, ARM64. |
| x86_64-unknown-linux-musl | musl | Fully static; runs on Alpine / scratch containers. |
| aarch64-unknown-linux-musl | musl | Fully static, ARM64. |
| riscv64gc-unknown-linux-gnu | glibc | glibc 2.27 floor, RISC-V 64. |
| riscv64gc-unknown-linux-musl | musl | Fully static, RISC-V 64. |

The *-musl build is statically linked, so it runs on any Linux host of that architecture regardless of libc — glibc distros (Debian/Ubuntu/Fedora) included, not just Alpine/musl. For mixed or containerized deployments it is the simplest choice: one binary you can drop onto a glibc host and an Alpine image alike.

Download the tarball for your target from the latest release, verify it, and extract:

sha256sum -c musefs-<version>-<target>.tar.gz.sha256
tar -xzf musefs-<version>-<target>.tar.gz   # yields ./musefs

Runtime requirements: the binaries mount via FUSE's fusermount3 helper, so the target needs the FUSE userspace tools and /dev/fuse:

  • Debian/Ubuntu: apt-get install fuse3
  • Alpine: apk add fuse3

No glibc/libfuse install is needed for the musl binaries beyond fuse3.

Note: On Ubuntu 24.04+ (libfuse ≥ 3.17) the fusermount3 AppArmor profile only permits unprivileged mounts under whitelisted prefixes ($HOME/**, /mnt, /media, /tmp, …). Mounting elsewhere fails with fusermount3: mount failed: Permission denied — see Mounting for the whitelist and the fix.

Building from source

cargo install musefs compiles the latest release; building needs a stable Rust toolchain (2024 edition) plus the FUSE headers (libfuse3-dev) and pkg-config. To install the latest development version instead:

cargo install --git https://github.com/Sohex/musefs musefs

The same fuse3 runtime requirement as the prebuilt binaries applies.

The binary uses jemalloc as its global allocator by default (it bounds resident memory for the long-lived mount daemon under heavy concurrent reads). Distribution packagers or anyone debugging memory with valgrind/heaptrack can build against the system allocator instead with cargo build -p musefs --no-default-features (or cargo install musefs --no-default-features).

Platform support

| Platform | FUSE | Kernel passthrough (StructureOnly) | Notes |
|---|---|---|---|
| Linux | Yes (/dev/fuse + fusermount3, from the fuse3 package) | Yes (6.9+, falls back to daemon serving otherwise) | Full support. |
| FreeBSD | Yes (pure-rust /dev/fuse backend; fusefs kernel module, no libfuse) | No | Full FUSE support. |
| macOS (FUSE-T) | Best-effort | No | Compiles and runs unit tests with macos-no-mount; mounted e2e is not yet validated. |

On platforms without kernel passthrough, --mode structure-only still serves the original bytes, just through the daemon instead of the kernel.

Filename case-folding is platform-aware: --case-insensitive <true|false> defaults to true on macOS and false on Linux/FreeBSD. When enabled, filenames are compared case-insensitively — case-variant directories merge into one (first-seen casing wins) and case-variant files get a numeric suffix (e.g. Song (2)); case-insensitive mounts refresh via a full rebuild rather than the incremental fast path.

Running in containers

Container images

Each tagged release also publishes multi-arch images to the GitHub Container Registry:

| Image | libc | Platforms |
|---|---|---|
| ghcr.io/sohex/musefs:<version>, ghcr.io/sohex/musefs:latest | glibc | amd64, arm64, riscv64 |
| ghcr.io/sohex/musefs:<version>-musl, ghcr.io/sohex/musefs:musl | musl | amd64, arm64, riscv64 |

docker pull selects the CPU architecture automatically. Use the -musl / :musl tags when slotting musefs into an Alpine-based stack; the default (glibc) tags suit everything else. Floating :latest / :musl track the most recent stable release only — prereleases publish only version-pinned tags.

Running musefs on the host is the simplest, best-supported option — it is an ordinary FUSE daemon and the image exists mainly to colocate musefs with containerized media managers (e.g. Lidarr). If you do containerize, mind the gotchas below.

Required flags

musefs mounts via FUSE, so the container needs /dev/fuse and the matching capability:

docker run --rm \
  --device /dev/fuse --cap-add SYS_ADMIN --security-opt apparmor=unconfined \
  -v /path/to/library:/library:ro \
  -v /path/to/store:/store \
  ghcr.io/sohex/musefs:latest scan /library --db /store/musefs.db

Without --device /dev/fuse --cap-add SYS_ADMIN --security-opt apparmor=unconfined the mount cannot be established.

Note: The apparmor flag may or may not be necessary depending on how your system is configured.

Note that CAP_SYS_ADMIN is a broadly privileged capability — it grants far more than FUSE mounting (mounting arbitrary filesystems, and more). It is unavoidable for an in-container FUSE mount — even rootless Podman cannot drop it; without --cap-add SYS_ADMIN the mount fails with fusermount3: mount failed: Permission denied. Under rootless Podman the capability is confined to the container's user namespace rather than the host, so its blast radius is smaller, but it is still required. Running musefs on the host needs no such capability at all.

Runs as a non-root user

The images run as a dedicated unprivileged user (default uid/gid 1000), not root — musefs mounts via the setuid fusermount3 helper and needs no root of its own. Consequences for the commands above:

  • The bind-mounted store volume must be writable by that uid. Either chown 1000:1000 /path/to/store on the host, or add --user $(id -u):$(id -g) to run as your own uid. The library volume is mounted :ro, so its ownership does not matter.
  • To bake an image whose user matches your host account (so no chown or --user is needed), build from source with --build-arg MUSEFS_UID=$(id -u) --build-arg MUSEFS_GID=$(id -g).
  • The images include user_allow_other in /etc/fuse.conf, so a non-root --allow-other / --owner / --group mount (needed to share the mount across containers or users, below) passes musefs's pre-flight check. See Ownership and permissions.

The mount-visibility gotcha (read this before sharing the mount)

A FUSE mount made inside a container lives in that container's mount namespace. By default neither the host nor other containers can see it, so pointing a second container (your media manager) at musefs's output does not work out of the box. To share the mount you propagate it between containers through a host directory: musefs binds that directory with rshared and mounts itself there, and the consumer binds the same directory with rslave so the mount propagates in. The host directory must itself be a shared mount.

# A host directory both containers bind to, marked shared so mounts propagate.
mkdir -p /srv/musefs-mnt
mount --bind /srv/musefs-mnt /srv/musefs-mnt
mount --make-rshared /srv/musefs-mnt

# A named volume for the store, writable by the image's unprivileged user.
podman volume create musefs-store

# musefs container: bind rshared, mount musefs there with --allow-other.
podman run -d --name musefs \
  --device /dev/fuse --cap-add SYS_ADMIN --security-opt apparmor=unconfined \
  -v /path/to/library:/library:ro -v musefs-store:/store \
  --mount type=bind,source=/srv/musefs-mnt,destination=/mnt/musefs,bind-propagation=rshared \
  ghcr.io/sohex/musefs:latest mount /mnt/musefs --db /store/musefs.db --allow-other

# consumer container: bind the same host path rslave; the mount propagates in.
podman run -d --name player \
  --mount type=bind,source=/srv/musefs-mnt,destination=/music,bind-propagation=rslave \
  ghcr.io/sohex/yourmediamanager:latest

Use a named volume (or an already-writable host path) for the store: a bind from a root-owned host directory is read-only to the image's unprivileged user and musefs aborts before mounting. --allow-other is required because the consumer container runs as a different uid than the musefs container; without it the consumer gets Permission denied on the mount. See Ownership and permissions.

Note: Some hardened kernels block cross-uid access to an unprivileged user's FUSE mount even with --allow-other — for example when the fuse module's allow_sys_admin_access parameter is N, or unprivileged user namespaces are restricted. If the consumer still gets Permission denied, set /sys/module/fuse/parameters/allow_sys_admin_access to Y, or run musefs and the consumer under the same uid.

Both the glibc and musl images carry the fuse3 userspace tools; pick :musl if your other containers are Alpine-based, otherwise the default tags are fine.
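The recipe above only works if the host directory really is a shared mount. A minimal sketch of a check, assuming a Linux host with /proc/self/mountinfo; is_shared_mount is a hypothetical helper, not something musefs ships:

```shell
# Hypothetical helper: succeed if the given mountpoint is listed in mountinfo
# with a "shared:N" optional field (i.e. it propagates mounts to peers).
is_shared_mount() {
    # $1: mountpoint; $2: mountinfo file (defaults to /proc/self/mountinfo)
    awk -v mnt="$1" '
        $5 == mnt {
            shared = 0   # for stacked mounts, the last matching entry wins
            # optional fields start at column 7 and end at the "-" separator
            for (i = 7; i <= NF && $i != "-"; i++)
                if ($i ~ /^shared:/) shared = 1
        }
        END { exit !shared }
    ' "${2:-/proc/self/mountinfo}"
}

# Usage:
#   is_shared_mount /srv/musefs-mnt || echo "not shared: run mount --make-rshared first" >&2
```

On systems with a recent util-linux, findmnt -no PROPAGATION /srv/musefs-mnt should report the same information without parsing the file by hand.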

Sharing a host mount into a container

Running musefs on the host instead of in a container is simpler and needs no CAP_SYS_ADMIN. Mark the mount point as shared and mount musefs there with --allow-other, then bind it into the consumer container with rslave so the host's musefs mount propagates in:

# On the host: mark the mount point shared, then mount musefs with --allow-other.
mkdir -p /srv/musefs-mnt
mount --bind /srv/musefs-mnt /srv/musefs-mnt
mount --make-rshared /srv/musefs-mnt
musefs mount /srv/musefs-mnt --db /store/musefs.db --allow-other &

podman run -d \
  --mount type=bind,source=/srv/musefs-mnt,destination=/music,bind-propagation=rslave \
  ghcr.io/sohex/yourmediamanager:latest
# the container reads the re-tagged view at /music, byte-for-byte live

rslave is what keeps this working across restarts: a plain bind only captures whatever is mounted when the container starts, so it shows an empty directory if musefs mounts later and a stale view after a musefs restart.

Scanning

musefs --version (or -V) prints the build version; --help on the root or any subcommand lists its flags.

Scan

musefs scan /path/to/music --db library.db            # ingest (dirs recurse)
musefs scan /path/to/music --db library.db --revalidate

scan probes each audio file (FLAC, MP3, M4A/M4B, Ogg, WAV), recording its audio byte range, tags, and embedded art in the store. It takes one or more files or directories, and --jobs N controls probe parallelism. --follow-symlinks walks symlinked files and directories (off by default, so symlinks are logged and skipped). --quiet (-q) suppresses the per-target summary for scripting; scan failures still surface on stderr (raise detail with -v/-vv, or RUST_LOG=info).

scan and scan --revalidate show a live progress indicator: on an interactive terminal, a discovery spinner followed by a determinate bar (position, percent, ETA, current file); on a non-interactive stderr (piped or logged), throttled ingested N/M (P%) lines. --quiet (-q) suppresses the progress indicator and the per-target summary. Each summary line ends with the elapsed time.

The per-target summary reads scanned N: … skipped X, failed Y. skipped counts every file that isn't a supported audio format — cover art, .cue / .log / .nfo sidecars, and anything else non-audio — so a large skipped number (hundreds or thousands on a big library) is expected, not an error. A per-extension breakdown of the skip count is logged at end of scan (e.g. skipped 42: jpg=20, cue=10, log=8, <none>=4), so you can tell expected sidecars from anything genuinely unexpected. failed is the one to watch: those are audio files musefs recognised by extension but could not parse. Format dispatch is by extension only — there is no content sniffing and no fallback to another parser, so a file whose contents don't match its extension (e.g. a FLAC named .mp3) is handed to the wrong parser, fails, and is counted here rather than retried. Renaming files across formats makes them vanish from the mount; fix the extension and rescan.
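To separate expected sidecars from surprises, the per-extension breakdown can be filtered against a list of extensions you already know about. This is a rough sketch: unexpected_skips is a hypothetical helper, and it assumes the log line looks like the example above (any prefix, then skipped N: ext=count, ...).

```shell
# Hypothetical helper: print the ext=count pairs that are NOT in the expected list.
unexpected_skips() {
    # $1: breakdown line from the scan log
    # $2: space-separated list of extensions you expect to be skipped
    printf '%s\n' "$1" |
        sed 's/^.*skipped [0-9]*: *//' |   # drop everything up to the breakdown
        tr ',' '\n' |                      # one ext=count pair per line
        while IFS='=' read -r ext count; do
            ext=$(printf '%s' "$ext" | tr -d ' ')
            case " $2 " in
                *" $ext "*) ;;                          # expected, ignore
                *) printf '%s=%s\n' "$ext" "$count" ;;  # worth a look
            esac
        done
}

# Usage:
#   unexpected_skips 'skipped 42: jpg=20, cue=10, log=8, <none>=4' 'jpg cue log'
```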

If any file fails (failed Y with Y > 0), scan exits 2 even though the batch otherwise completes and the parseable files are ingested — so a pipeline like musefs scan … && musefs mount … stops on a partial or total ingest failure rather than mounting an incomplete library. A successful scan exits 0; a hard error (a missing target, an unreadable DB) still exits 1. The exit code is the only machine-detectable signal; per-file failures otherwise surface only on stderr.
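A script that wants to treat a partial ingest differently from a hard error can branch on the exit status. The sketch below uses a hypothetical helper, classify_scan_status, and relies only on the exit codes described above (0, 1, 2):

```shell
# Hypothetical helper: turn a scan exit status into a label a script can branch on.
classify_scan_status() {
    case "$1" in
        0) echo "ok" ;;
        2) echo "partial" ;;      # some files failed to parse; the rest were ingested
        *) echo "hard-error" ;;   # e.g. exit 1: missing target, unreadable DB
    esac
}

# Usage (commented out so the sketch can be sourced without musefs installed):
#   musefs scan ~/Music --db library.db; status=$?
#   case "$(classify_scan_status "$status")" in
#       ok)      musefs mount ~/mnt/music --db library.db ;;
#       partial) echo "some files failed to parse; check stderr" >&2 ;;
#       *)       exit "$status" ;;
#   esac
```

Whether to mount anyway after a partial ingest is a policy choice; the plain && pipeline is the conservative default.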

--revalidate is the maintenance pass: it skips unchanged files — preserving any tag edits you made in the store — prunes tracks whose backing file is gone, and garbage-collects orphaned art.

Content checksums and move re-identification

--checksum=none|fingerprint|full (env MUSEFS_CHECKSUM, default fingerprint) controls what content checksums scan computes and stores.

  • none — no checksums (legacy behavior).
  • fingerprint — compute a cheap fingerprint for each file, derived from the probe's parsed output (tags, audio bounds, embedded art). This is the default: it rides the existing probe at essentially no extra I/O cost and is sufficient for routine move detection.
  • full — fingerprint plus an eager full-file SHA-256. Use this when you want collision-proof retargeting or a forensic content identity for every file.

Two flags govern how a fingerprint match is confirmed before retargeting a moved file:

  • --fast (env MUSEFS_FAST) — fingerprint match is always sufficient; never reads the full file even when a stored content_hash exists.
  • --strict (env MUSEFS_STRICT) — require a full-hash match; if the matched candidate has no stored content_hash, refuse the retarget and insert a fresh row instead. The default (neither flag) auto-escalates: full-hash the new file when the candidate already has a content_hash, and trust the fingerprint alone when it does not.

--fast and --strict are mutually exclusive.
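The three confirmation policies can be read as a small decision table. The sketch below only restates the rules above in shell form; retarget_action and its labels are hypothetical and say nothing about how musefs implements the logic internally.

```shell
# Hypothetical decision table for confirming a fingerprint match.
retarget_action() {
    mode=$1       # fast | strict | default (neither flag given)
    has_hash=$2   # yes | no: does the matched row already store a content_hash?
    case "$mode:$has_hash" in
        fast:yes|fast:no) echo "trust-fingerprint" ;;   # never reads the full file
        strict:yes)       echo "verify-full-hash" ;;
        strict:no)        echo "insert-fresh-row" ;;    # refuse the retarget
        default:yes)      echo "verify-full-hash" ;;    # auto-escalate
        default:no)       echo "trust-fingerprint" ;;
        *)                echo "invalid"; return 1 ;;
    esac
}

# Usage:
#   retarget_action strict no
```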

Move re-identification workflow. After moving or reorganizing your backing library, run a normal musefs scan on the new locations. For each file not already in the store, the scanner looks up rows whose fingerprint matches and whose old path is gone, and retargets the unique match in place, preserving its id, tags, and art. Move recovery only applies to rows that were fingerprinted before the move (rows scanned under --checksum=none have no fingerprint and cannot be retargeted until a later fingerprint-tier pass). Run scan after a move, and ideally before any revalidate: revalidate still prunes tracks whose backing file is gone, so it will remove un-retargeted rows if run first.

Mounting & path templates

Mount

musefs mount /path/to/mountpoint --db library.db \
    --template '$albumartist/$album/$title' \
    --default-fallback Unknown \
    --fallback albumartist='Unknown Artist' \
    --mode synthesis        # or: structure-only

mount blocks until the filesystem is unmounted (fusermount3 -u, or Ctrl-C).

mount never creates the store: unlike scan, it requires a populated DB to already exist and exits non-zero otherwise. Interactively this is invisible (the scan-then-mount quick start always seeds it first), but it bites automation: a mount started at boot before anything has scanned hard-fails (and crash-loops under Restart=). Seed the store with an initial scan, or order the mount after it; see contrib/systemd.

Mounting at an arbitrary path may be denied by AppArmor. On distros that ship an AppArmor profile for fusermount3 (Ubuntu 24.04+ / libfuse ≥ 3.17), unprivileged FUSE mounts are only allowed when the mountpoint is under a whitelisted prefix — the shipped profile permits $HOME/**, /mnt, /media, /tmp, /cvmfs, $XDG_RUNTIME_DIR, plus flatpak dirs. Mounting elsewhere (e.g. a data volume at /data/...) fails with fusermount3: mount failed: Permission denied, and the kernel audit log shows apparmor="DENIED" operation="mount" … profile="fusermount3". The mountpoint's own ownership is irrelevant — AppArmor rejects the mount() syscall first. Fix it by mounting under a permitted prefix, or by whitelisting your prefix in /etc/apparmor.d/local/fusermount3 (the shipped profile ends with include if exists <local/fusermount3>).
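To confirm that a failed mount was an AppArmor denial rather than a plain permissions problem, the kernel log can be filtered for the three fields quoted above. A small sketch; fusermount3_denials is a hypothetical helper that reads log lines on stdin:

```shell
# Hypothetical helper: keep only AppArmor mount denials for the fusermount3 profile.
# Three separate greps, so the order of the fields in the audit line does not matter.
fusermount3_denials() {
    grep 'apparmor="DENIED"' |
        grep 'operation="mount"' |
        grep 'profile="fusermount3"'
}

# Usage (reading the kernel log usually needs root or membership in a log group):
#   journalctl -k | fusermount3_denials
#   dmesg | fusermount3_denials
```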

Two modes:

  • synthesis (default) — files carry metadata freshly generated from the store, spliced ahead of the original audio bytes.
  • structure-only — files are served byte-for-byte as they are on disk; only the directory tree is virtual.

Edit tags or art in the database while mounted (another scan, a beets/Picard/Lidarr sync, raw SQL) and the view refreshes automatically.

Run musefs <command> --help for the full flag list.

Path templates

Paths come from a beets-style template (matched case-insensitively; any tag key in the store works):

  • $field / ${field} — substitute a tag field (e.g. $artist, $album, $title, $tracknumber, $date, $genre).
  • ${albumartist|artist}: fallback chain. The first present field wins, before the --default-fallback value (default Unknown) is used.
  • A missing field resolves in order: the field's value, then a per-field fallback from --fallback FIELD=VALUE (repeatable, e.g. --fallback albumartist='Unknown Artist'), then --default-fallback. Per-field fallbacks let one field default differently from the rest.
  • --skip-on-missing — drop a track from the mount entirely when a top-level template field stays unresolved, instead of substituting --default-fallback. Per-field --fallback chains and [ … ] sections are unaffected (a field resolved via its fallback counts as present, and section fields stay optional). Handy when an external tool tags only some tracks, e.g. --template '$!{beets_path}' --skip-on-missing hides tracks beets left without a beets_path (such as deduplicated albums).
  • [ … ]: conditional section. The bracketed text is emitted only when at least one field inside it is present. So $album[ - CD $disc] yields Album - CD 2, or just Album on a single-disc release. Write $[ / $] for literal brackets.
  • $!{field}: path field. The value's / are kept as directory separators (each segment sanitized; empty/./.. dropped). Lets an external tool precompute a whole relative path into one tag and mount it as --template '$!{beets_path}'.

Anything else is literal. Name collisions get a deterministic (2), (3), … suffix. Every rendered component is capped at 255 bytes (NAME_MAX, truncated on a UTF-8 boundary, extension preserved), and a plain field whose value is exactly . or .. is dropped rather than creating an unusable directory. The default template is $albumartist/$album/$title.

Brackets and braces must be balanced: an unclosed [ section or an unterminated ${ / $!{ field is rejected at mount time with an error naming the problem, rather than silently folding the rest of the template into the open construct. To check a template before committing to a mount, add --dry-run: it validates the template, prints a sample of the paths the mount would expose along with the total file and directory counts, then exits without mounting.
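The resolution order for a missing field (the field's value, then its per-field --fallback, then --default-fallback) amounts to "first non-empty candidate wins". The sketch below illustrates just that order; first_present is a hypothetical helper, and treating an empty string as "missing" is an assumption made for the example, not a statement about musefs internals.

```shell
# Hypothetical helper: print the first non-empty argument, fail if there is none.
first_present() {
    for candidate in "$@"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage: field value, then the per-field fallback, then the default fallback.
#   first_present "$albumartist" "Unknown Artist" "Unknown"
```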

Tuning & metrics

Tuning

The defaults are sensible for most setups, including the two measured storage wins — daemon-level backing read-ahead (--read-ahead-budget-mib, the single biggest win for NFS/remote) and keeping the kernel page cache across opens (--keep-cache, on by default, ~3× faster reopen on HDD/NFS). The kernel-level read-ahead / background knobs have little measurable effect (see the storage-tunables benchmarks for the methodology and numbers).

| Flag | Default | What it does |
|---|---|---|
| --poll-interval-ms | 1000 | Debounce window for detecting external DB edits. |
| --read-ahead-budget-mib | 64 | Per-mount RAM budget (MiB) for backing read-ahead: the daemon coalesces a stream's small FUSE reads into one large positioned read, so the backing client can pipeline/parallelize them. The biggest lever for slow/high-latency backing: ~5–6× single-stream throughput over a 200 ms-RTT NFS mount; neutral on local disk. Shared across all active streams with LRU eviction; 0 disables it. |
| --read-ahead-prefetch | disabled | Advanced: add background prefetch threads on top of read amplification. Off by default, because benchmarks found amplification alone delivers the entire read-ahead win, while the threads add ~10% overhead with no measured benefit. Enable only when profiling a backend where a single large read does not self-pipeline. |
| --keep-cache <true\|false> | true | Keep the kernel page cache across opens. On by default, as it is a measured storage win: repeat opens of a file are served from cache instead of re-read over slow storage (~3× faster reopen on HDD/NFS in our benches). External re-tags auto-invalidate the affected files, so cached bytes never go stale. Disable with --keep-cache false (e.g. on a memory-constrained host where the page cache is contended). |
| --attr-ttl-ms | 1000 | How long the kernel may trust cached entry/attr lookups. Higher values cut lookup/getattr traffic (useful for metadata-heavy clients such as library scanners over high-latency backing) but bound how fast external edits become visible. |
| --max-readahead-kib | 512 | Kernel read-ahead window (clamped to the kernel maximum). Distinct from --read-ahead-budget-mib (the daemon-level read-ahead, which is the effective one): this kernel knob does not speed up musefs streaming, since reads reach the daemon in fixed FUSE-sized chunks regardless. On HDD, values well above the default can even hurt. Leave at the default unless your own profiling shows otherwise. |
| --max-background | 64 | Max outstanding background (read-ahead/async) requests the kernel keeps in flight. Does not bound foreground reads (those scale with client concurrency), so it has little effect on read throughput; left for completeness. |

Filename case-folding (--case-insensitive) is platform behaviour rather than a performance knob — see Platform support.

Metrics

musefs mount optionally exposes runtime telemetry through a synthetic .musefs-metrics/ directory at the mount root:

musefs mount /mnt/music --db library.db --expose-metrics   # or: MUSEFS_EXPOSE_METRICS=1
cat /mnt/music/.musefs-metrics/metrics
# HELP musefs_uptime_seconds Seconds since the mount started.
# TYPE musefs_uptime_seconds gauge
musefs_uptime_seconds 60
# HELP musefs_handles_open Open file handles in the core slab.
# TYPE musefs_handles_open gauge
musefs_handles_open 3
# HELP musefs_cache_header_hits_total Raw header-cache key hits; a hit may still trigger a content-version rebuild.
# TYPE musefs_cache_header_hits_total counter
musefs_cache_header_hits_total 100

--expose-metrics (default off) is a runtime flag that gates the virtual file; it is unrelated to the compile-time metrics cargo feature, which adds syscall counters (opens, preads, etc.) to the output. The jemalloc allocator stats require a build with the jemalloc feature, which is the default.

The metrics file advertises st_size == 0 (like /proc), so use an EOF-aware reader — cat, head -c, or the Prometheus textfile collector — not a stat-and-read-by-size approach.
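For scripts that need a single value, reading the file as a stream and filtering with awk sidesteps the zero st_size entirely. A minimal sketch; metric_value is a hypothetical helper, and it only handles unlabelled metrics of the form shown in the sample above:

```shell
# Hypothetical helper: print the value of one unlabelled metric from Prometheus
# text on stdin. Reads until EOF, so the advertised size of the file is irrelevant.
metric_value() {
    # $1: metric name; exits non-zero if the metric is absent
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END { exit !found }
    '
}

# Usage:
#   metric_value musefs_handles_open < /mnt/music/.musefs-metrics/metrics
```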

Maintenance

Compacting the store (musefs vacuum)

The SQLite store only grows as you use it: deleting tracks (beets/Lidarr prunes), garbage-collecting orphaned art, and the schema migration all leave free pages behind that are not automatically reclaimed. Because embedded art is stored inline (up to ~16 MiB per image), a library that has churned art can carry significant dead space.

musefs vacuum compacts the store and reports how much it reclaimed:

musefs vacuum --db library.db        # or: MUSEFS_DB=library.db musefs vacuum
vacuumed library.db: 412.7 MiB → 318.2 MiB (reclaimed 94.5 MiB)

It runs SQLite's VACUUM followed by a WAL checkpoint, rewriting the database into a compact form.

Run it while unmounted

VACUUM needs a write lock on the store and rewrites the whole file. Run it when nothing else is using the database — no mount, no scan. If the store is in use, the command fails with an actionable error rather than fighting for the lock:

error: the store is in use — unmount the filesystem or stop any scan before vacuuming

Notes

  • Full rewrite. Each run rewrites the entire database and transiently needs free disk space roughly equal to the store size (it builds a complete copy before swapping). Running it again on an already-compact store is safe and reports (already compact).
  • May upgrade the schema. Like every musefs command that opens the store for writing, vacuum migrates an older store to the current schema version before compacting.
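Since a run transiently needs roughly the store's size in free space, a quick pre-flight check can save a failed vacuum. This is a sketch under the assumption that the temporary copy lands on the same filesystem as the store; has_room_for_vacuum is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the filesystem holding the store has at least
# as much free space as the store's current size.
has_room_for_vacuum() {
    db=$1
    # store size in KiB, rounded up
    size_kib=$(( ($(wc -c < "$db") + 1023) / 1024 ))
    # available KiB on the filesystem holding the store (POSIX df output format)
    free_kib=$(df -Pk "$(dirname "$db")" | awk 'NR == 2 { print $4 }')
    [ "$free_kib" -ge "$size_kib" ]
}

# Usage:
#   has_room_for_vacuum library.db && musefs vacuum --db library.db
```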

Ownership, permissions & config

Ownership and permissions

By default the mount presents the launching process's uid/gid and read-only permission bits (555 dirs, 444 files), and is reachable only by the user who performed the mount (and root).

To present a different owner — e.g. a media-server service account — and let that account actually reach the mount, pass --owner/--group (or --allow-other). Either makes musefs mount with allow_other and default_permissions: other users can traverse the mount, and the kernel enforces the presented owner/mode bits instead of ignoring them.

| Flag | Default | What it does |
|---|---|---|
| --owner <NAME\|UID> | process uid | User presented as the owner of every entry. Accepts a username or a numeric uid. Implies --allow-other. |
| --group <NAME\|GID> | process gid | Group presented for every entry. Accepts a group name or a numeric gid. Implies --allow-other. |
| --allow-other | off | Mount with allow_other + default_permissions so accounts other than the mounting user can reach the mount and the owner/mode bits are enforced. Implied by --owner/--group. |
| --file-mode <OCTAL> | 444 | Permission bits for regular files, in octal. The mount is read-only, so write bits are advertised but writes still fail with EROFS. |
| --dir-mode <OCTAL> | 555 | Permission bits for directories, in octal. |

The default 444/555 bits are world-readable, so any account can read once allow_other is on. To restrict the mount to the presented owner/group, drop the world bits (e.g. --file-mode 440 --dir-mode 550) — only then does --owner/--group gate access rather than merely label it.
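Whether a mode still grants access to everyone comes down to its last octal digit. A tiny sketch for sanity-checking a mode before passing it to --file-mode or --dir-mode; has_world_bits is a hypothetical helper:

```shell
# Hypothetical helper: succeed if an octal mode has any "other" (world) bits set.
has_world_bits() {
    # the leading 0 makes the shell read the argument as octal
    [ "$(( 0$1 & 7 ))" -ne 0 ]
}

# Usage:
#   has_world_bits 444 && echo "444 is world-readable"
#   has_world_bits 440 || echo "440 is restricted to owner and group"
```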

Non-root mounts need user_allow_other. When you are not root, libfuse refuses an allow_other mount unless /etc/fuse.conf contains a line user_allow_other. musefs checks this before mounting and fails with an explanatory error if it is missing; add the line to /etc/fuse.conf, or run musefs as root. (This is libfuse/system policy, not a musefs restriction.) The published container images already include this line, so non-root allow_other mounts work out of the box there.
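Provisioning scripts can run the same kind of check before attempting the mount. The following is a sketch of such a check, not musefs's own pre-flight code; has_user_allow_other is a hypothetical helper that ignores commented-out lines:

```shell
# Hypothetical helper: succeed if the fuse config has an active user_allow_other line.
has_user_allow_other() {
    conf=${1:-/etc/fuse.conf}
    [ -r "$conf" ] &&
        grep -Eq '^[[:space:]]*user_allow_other[[:space:]]*$' "$conf"
}

# Usage:
#   has_user_allow_other || echo "add user_allow_other to /etc/fuse.conf" >&2
```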

--allow-other grants other users — but not root. A FUSE mount made with allow_other (not allow_root) is reachable by other unprivileged users, yet root specifically cannot traverse or stat it when it is owned by another user. This surprises root-run tooling (Ansible, boot scripts):

  • mountpoint -q <mnt> / stat <mnt> run as root report it as not a mountpoint — they try to stat through the mount and get EACCES. Detect the mount from root with findmnt <mnt> or /proc/mounts instead, which read the mount table rather than the filesystem.
  • Don't have root manage the mountpoint directory while it is mounted: a root task that re-asserts the directory (e.g. Ansible file: state=directory) fails with EACCES/EEXIST on every run after the first. Create the directory before mounting, or run such tasks as the mounting user.
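For root-run tooling, a mount-table lookup avoids touching the filesystem at all. The sketch below checks /proc/mounts directly; in_mount_table is a hypothetical helper, and it assumes a mountpoint path without spaces or a trailing slash (the kernel escapes spaces as \040 in that file, which findmnt handles for you).

```shell
# Hypothetical helper: succeed if the path appears as a mountpoint in the mount table.
# It reads the table only, so it works even when root cannot stat the mount itself.
in_mount_table() {
    # $1: mountpoint; $2: mount table file (defaults to /proc/mounts)
    awk -v mnt="$1" '
        $2 == mnt { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Usage:
#   in_mount_table /srv/musefs-mnt && echo "mounted"
```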

Configuring with environment variables

Every scalar mount and scan flag can also be set with a MUSEFS_* environment variable: uppercase the long flag and turn dashes into underscores (e.g. --poll-interval-ms → MUSEFS_POLL_INTERVAL_MS, the mount mountpoint → MUSEFS_MOUNTPOINT). An explicit flag always overrides its env var, which overrides the default. Boolean flags (e.g. MUSEFS_KEEP_CACHE, MUSEFS_REVALIDATE, MUSEFS_FOLLOW_SYMLINKS, MUSEFS_QUIET, MUSEFS_ALLOW_OTHER, MUSEFS_CASE_INSENSITIVE, MUSEFS_EXPOSE_METRICS, MUSEFS_FAST, MUSEFS_STRICT) accept a case-insensitive boolish value (true/false, yes/no, on/off, 1/0) and reject anything else. The repeatable --fallback and the scan targets are command-line only. See contrib/systemd/musefs.conf.example for a commented example covering the common settings.

These variables are read the same way no matter how musefs is launched: exported into the shell before running the binary directly (MUSEFS_DB=… musefs mount), set via a systemd EnvironmentFile= or Environment= directive, or passed into a container with -e/--env-file. The configuration surface is identical across all three; the sections below just show the per-deployment wiring.
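The flag-to-variable naming rule and the boolish values are mechanical enough to sketch in a few lines. Both helpers below (flag_to_env, parse_boolish) are hypothetical and only mirror the rules described in this section; they are handy when generating an env file from a list of flags.

```shell
# Hypothetical helper: --some-long-flag -> MUSEFS_SOME_LONG_FLAG
flag_to_env() {
    # strip the leading "--", then uppercase letters and turn dashes into underscores
    printf 'MUSEFS_%s\n' "$(printf '%s' "${1#--}" | tr 'a-z-' 'A-Z_')"
}

# Hypothetical helper: normalise a boolish value to true/false, fail on anything else.
parse_boolish() {
    case "$(printf '%s' "$1" | tr 'A-Z' 'a-z')" in
        true|yes|on|1)  echo true ;;
        false|no|off|0) echo false ;;
        *)              return 1 ;;
    esac
}

# Usage:
#   flag_to_env --poll-interval-ms
#   parse_boolish YES
```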

Running as a systemd user service

To run musefs on the host at login, drop-in units live in contrib/systemd/: a musefs.service mount daemon, an optional musefs-scan.timer for periodic re-scans, and a commented musefs.conf.example holding every MUSEFS_* setting. Copy the units to ~/.config/systemd/user/, copy the config to ~/.config/musefs/musefs.conf, edit MUSEFS_MOUNTPOINT and MUSEFS_DB, then systemctl --user enable --now musefs.service. See the systemd integration guide for the full walkthrough and the PATH / linger gotchas.

FAQ

Does musefs ever write to my audio files? No. The mount is read-only and the scanner only reads. The served files are assembled on the fly: generated metadata plus positioned reads of your originals. Nothing is ever copied or rewritten.

Where do my edited tags live? In the SQLite store (--db). Edit it with the beets or Picard plugins, the Lidarr integration, or with plain SQL — the schema is a documented, stable contract (see the SQLite store).

Do edits show up without remounting? Yes. The mount polls the database (debounced) and picks up external commits automatically, with stable inodes across refreshes — even files held open keep working.

Can I write through the mount? No — and it's not planned. Out-of-band editing against the store is the design: it's what guarantees your originals can never be corrupted.

Is it fast enough for a big library on a NAS? That's the design target: synthesized headers are cached, blocking reads run on a worker pool so a slow disk never stalls the filesystem, and read-ahead, cache TTLs, and poll intervals are all tunable. In structure-only mode on kernel 6.9+, reads can bypass the daemon entirely via FUSE passthrough (needs CAP_SYS_ADMIN).

A file in the mount won't open / reads error — why? The most common cause is a backing file that changed since its last scan (musefs refuses to serve a file whose size, mtime, or ctime drifted, rather than splice at stale offsets). Run musefs scan --revalidate to re-probe it.

Supported formats

musefs synthesizes fresh metadata for each supported container while serving the original audio bytes verbatim. Each format has its own page for the exact synthesis behavior and lossy edges.

Format  Extensions         What is synthesized
------  -----------------  ----------------------------------------------------------------------------------
FLAC    .flac              Regenerates the metadata blocks; preserves STREAMINFO/SEEKTABLE bit-exact
MP3     .mp3               Regenerates the ID3v2.4 tag; audio frames (incl. Xing/LAME) untouched
M4A     .m4a, .m4b         Rebuilds the moov atom, patching chunk offsets; mdat served verbatim
Ogg     .ogg, .oga, .opus  Regenerates header pages; audio pages verbatim, only page seq/CRC patched in place
WAV     .wav               Regenerates the RIFF front (LIST/INFO + embedded ID3v2); data payload verbatim

FLAC

How musefs scans and synthesizes native FLAC files (.flac). FLAC inside an Ogg container is a different beast — see Ogg. For the segment model these layouts plug into, see the segment model.

What round-trips

  • All text tags. Canonical keys (title, artist, albumartist, date, tracknumber, …) map to their conventional Vorbis field names via the shared vocabulary (musefs-format/src/tagmap.rs); any other field round-trips verbatim by its own name. Multi-value fields keep their order. User-defined keys that are not legal Vorbis field names (empty, containing =, control characters, or non-ASCII bytes; i.e. anything outside ASCII 0x20–0x7D minus =) are dropped on synthesis and logged; they cannot round-trip by name.
  • Binary metadata blocks. APPLICATION and CUESHEET blocks are captured at scan time as binary tags (an APPLICATION payload includes its 4-byte application id) and re-emitted on synthesis, streamed from the DB rather than held in memory.
  • Embedded pictures. Each PICTURE block round-trips with its MIME type, picture type, description, and dimensions; image bytes are stored content-addressed and streamed at read time.
  • Structural blocks. STREAMINFO and SEEKTABLE are preserved bit-exact. They are captured into the read-only structural_blocks store at scan time (external tools must not edit them) and re-emitted on synthesis.

Lossy edges

  • PADDING blocks are dropped — the synthesized file carries no padding.
  • Metadata blocks of unknown/reserved types are dropped at scan time.
  • A PICTURE block whose picture type falls outside the standard 0–20 range is clamped to 0 (Other) at scan time, matching the store's track_art.picture_type CHECK. This shared PICTURE parser also serves FLAC-in-Ogg, so the same clamp applies there.
  • The VORBIS_COMMENT vendor string is replaced with musefs's own.
  • Vorbis field names are case-insensitive by spec; musefs re-emits canonical keys under their conventional uppercase names and upper-cases unknown field names. A field stored as MixedCase comes back as MIXEDCASE — same field to a conforming reader, different bytes.
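
The field-name rules in this section (the legal character range and the upper-casing on re-emit) can be sketched as below. This is a Python illustration with hypothetical function names; it does not model the canonical-key vocabulary in tagmap.rs, only the handling of user-defined names.

```python
from typing import Optional


def is_legal_field_name(name: str) -> bool:
    """True if every character is ASCII 0x20..0x7D and none is '='."""
    if not name:
        return False
    return all(0x20 <= ord(ch) <= 0x7D and ch != "=" for ch in name)


def emitted_field_name(name: str) -> Optional[str]:
    """Name a user-defined field comes back under, or None if it is dropped."""
    if not is_legal_field_name(name):
        return None  # dropped on synthesis (and logged, per the text above)
    # Legal names are pure ASCII, so upper() only touches a-z.
    return name.upper()
```

So emitted_field_name("MixedCase") yields "MIXEDCASE", while a name containing = or a tab character yields None.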

How synthesis works

flac::synthesize_layout (musefs-format/src/flac.rs) builds the layout in this order — an inline metadata region, DB-streamed payloads, then the untouched audio:

 offset 0
 ┌──────────────────────────────────────────────┐ ┐
 │ █ "fLaC" marker                      (Inline) │ │
 │ █ STREAMINFO / SEEKTABLE, bit-exact  (Inline) │ │ generated
 │ █ VORBIS_COMMENT rebuilt from DB     (Inline) │ │ metadata
 │ ▒ APPLICATION / CUESHEET bodies   (BinaryTag) │ │ region
 │ █ PICTURE framing + ▒ image bytes  (ArtImage) │ │
 ├──────────────────────────────────────────────┤ ┘
 │ ░ audio frames, verbatim       (BackingAudio) │
 └──────────────────────────────────────────────┘
 EOF     █ inline-generated   ▒ DB-streamed   ░ untouched backing
  1. Inline — the fLaC marker plus the preserved structural blocks (STREAMINFO, SEEKTABLE, sorted by block type) and a VORBIS_COMMENT block regenerated entirely from the DB tag rows.
  2. BinaryTag — one segment per stored APPLICATION/CUESHEET block, streamed from the DB at read time.
  3. ArtImage — one PICTURE block per linked art row; the block framing is inline, the image bytes stream from the blob store.
  4. BackingAudio — the original audio frames, served by positioned reads at the stored audio_offset/audio_length.
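
One way to picture the ordered layout above is as a list of typed segments with lengths, where a read at a given file offset is routed to the segment that covers it. The Python sketch below is an assumption-laden illustration: the Segment class and locate function are hypothetical, and only the four segment kinds come from the list above.

```python
from bisect import bisect_right
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    kind: str    # "Inline", "BinaryTag", "ArtImage" or "BackingAudio"
    length: int  # number of bytes this segment contributes to the served file


def locate(segments: List[Segment], offset: int) -> Tuple[int, int]:
    """Return (segment index, offset inside that segment) for a file offset."""
    ends = list(accumulate(seg.length for seg in segments))
    if offset < 0 or not ends or offset >= ends[-1]:
        raise ValueError("offset outside the synthesized file")
    index = bisect_right(ends, offset)  # first segment ending after offset
    start = ends[index] - segments[index].length
    return index, offset - start
```

With segments of lengths 4, 10 and 6, offset 4 resolves to the start of the second segment and offset 19 to byte 5 of the third.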

Structural blocks normally come from the structural_blocks store. A database scanned before that store existed has no rows there; synthesis then falls back to re-reading the file's front for every preserved block (carrying APPLICATION/CUESHEET inline and suppressing the streamed binary tags so nothing is emitted twice). A re-scan upgrades the track to the streamed path.

Quirks & invariants

  • The audio frames are never touched: the backing segment starts exactly at the scanned audio offset, and the byte-identical-audio property is asserted by musefs-format/tests/proptest_flac.rs and the mutagen interop suite (musefs-core/tests/interop_emit.rs).
  • Synthesis re-parses its own inline output in tests (flac_tag_roundtrip_is_stable): the regenerated front must be a valid FLAC metadata region whose computed audio boundary equals the layout's header length.
  • Block-body sizes are bounded at parse time (MAX_BLOCK_BODY); a crafted file cannot force a huge allocation.
  • The parser rejects (at scan and synthesis) any FLAC whose metadata does not begin with exactly one 34-byte STREAMINFO block; a crafted store providing malformed structural rows fails synthesis with a controlled error rather than emitting decoder-rejected output.

MP3

How musefs scans and synthesizes MP3 files (.mp3) and their ID3v2 metadata. For the segment model these layouts plug into, see the segment model. The ID3v2 builder described here is shared with WAV's embedded id3 chunk — see WAV.

What round-trips

  • Canonical text tags (title, artist, albumartist, date, tracknumber, …) map to their standard ID3v2 text frames (TIT2, TPE1, TPE2, TDRC, TRCK, …) via the shared vocabulary (musefs-format/src/tagmap.rs). NUL-separated multi-value frames yield one tag row per value and are re-emitted NUL-separated in a single frame.
  • Vocabulary TXXX keys (ReplayGain fields, MusicBrainz album/artist ids) round-trip through TXXX frames with their fixed, exact-case descriptions (e.g. MusicBrainz Album Id).
  • Unmapped standard text frames round-trip keyed by their own frame id: a TSSE (or a legacy v2.3 TYER) comes back as the same frame inside the synthesized tag.
  • Other user-defined keys round-trip as TXXX frames keyed by their own description, original casing preserved.
  • Comments and lyrics (COMM/USLT): one tag row per frame. A frame with a placeholder language (XXX/und/empty) and no descriptor folds to the shared comment/lyrics key; one carrying a real language or descriptor is keyed id3:COMM:<lang>:<desc> / id3:USLT:<lang>:<desc> so per-language or description-keyed frames stay distinct, and both fields are restored on synthesis.
  • Ratings and play counts: a POPM frame is promoted at scan time to rating (the raw 0–255 byte) and playcount (omitted when 0) text tags, and rebuilt as a POPM frame on synthesis.
  • MusicBrainz track id: a UFID frame with the http://musicbrainz.org owner is promoted to musicbrainz_trackid and rebuilt with the same owner.
  • Opaque binary frames, byte-exact: PRIV, GEOB, SYLT, MCDI, URL (W***) frames, non-MusicBrainz UFIDs, and unknown frames are captured verbatim (frame id + raw body) and re-emitted streamed from the DB (BinaryTag segments) — never held in memory.
  • Embedded pictures (APIC): MIME type, picture type, and description round-trip; image bytes are stored content-addressed and streamed.

Lossy edges

  • The synthesized tag is always ID3v2.4, regardless of the source tag's version (v2.2/v2.3 tags are parsed but never re-emitted as such).
  • A COMM/USLT frame folded to the shared comment/lyrics key (placeholder language, no descriptor) is re-emitted with language XXX and an empty descriptor, so a source und placeholder comes back as XXX. Frames carrying a real language or descriptor are preserved (see above).
  • POPM: the owner ("email to user") field is dropped by design. Multiple POPM frames collapse to one (first rating wins, last parseable play count wins); counters above u32::MAX clamp to 4 bytes.
  • ID3v1 is not read. A file whose only tag is ID3v1 scans with no tags (populate the DB via beets/Picard instead). A trailing ID3v1 tag is also excluded from the audio region, so the synthesized file does not carry it.
  • The audio locator validates the ID3v2 major version (2–4) and rejects synchsafe size bytes with the high bit set, producing a controlled Malformed error rather than mask-decoding an invalid offset. Tags using unsynchronisation or an extended header still scan — their declared size already covers the audio boundary.
  • Scan-time tag extraction is skipped entirely — by a deliberate denial-of-service guard, see below — for tags using unsynchronisation, an extended header, non-zero frame flags (compression/encryption), malformed synchsafe size fields, or containing CHAP/CTOC chapter frames. Such files still mount and serve; they just contribute no scanned tags.
  • ID3v2.2 binary frames are not extracted (3-char ids; text and art still parse). APIC width/height are not recorded at scan time.
  • An APIC picture type outside the standard 0–20 range (the id3 crate's Undefined(u8) variant can exceed 20) is clamped to 0 (Other) at scan time, matching the store's track_art.picture_type CHECK.
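
The audio-locator checks described in this list (major version 2 to 4, no synchsafe byte with its high bit set) can be sketched as follows. This is a Python illustration under stated assumptions: the Malformed class and both function names are stand-ins, and the v2.4 footer handling follows the ID3v2.4 header layout rather than anything this page says about musefs.

```python
class Malformed(ValueError):
    """Stand-in for the controlled Malformed error mentioned above."""


def decode_synchsafe(raw: bytes) -> int:
    """Decode a 4-byte synchsafe integer, rejecting bytes with the high bit set."""
    if len(raw) != 4:
        raise Malformed("synchsafe size must be 4 bytes")
    if any(byte & 0x80 for byte in raw):
        raise Malformed("synchsafe byte has its high bit set")
    return (raw[0] << 21) | (raw[1] << 14) | (raw[2] << 7) | raw[3]


def id3v2_tag_end(header: bytes) -> int:
    """Offset of the first byte after a leading ID3v2 tag."""
    if len(header) < 10 or header[:3] != b"ID3":
        raise Malformed("no ID3v2 header")
    major = header[3]
    if not 2 <= major <= 4:
        raise Malformed(f"unsupported ID3v2 major version {major}")
    size = decode_synchsafe(header[6:10])
    # ID3v2.4 may append a 10-byte footer, flagged by bit 4 of the flags byte.
    footer = 10 if major == 4 and header[5] & 0x10 else 0
    return 10 + size + footer
```

A header declaring size bytes 00 00 02 01 decodes to 257, so the tag ends at offset 267; a size byte of 0x80 raises instead of being masked.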

How synthesis works

mp3::synthesize_layout (musefs-format/src/mp3.rs) emits a fresh ID3v2.4 tag followed by the untouched audio:

 offset 0
 ┌──────────────────────────────────────────────┐ ┐
 │ █ ID3v2.4 header (10 bytes)          (Inline) │ │
 │ █ text / TXXX / COMM / USLT frames   (Inline) │ │ generated
 │ █ rebuilt POPM / UFID frames         (Inline) │ │ ID3v2.4
 │ █ frame header + ▒ opaque body    (BinaryTag) │ │ tag
 │ █ APIC framing + ▒ image bytes     (ArtImage) │ │
 ├──────────────────────────────────────────────┤ ┘
 │ ░ MPEG audio incl. Xing/LAME,  (BackingAudio) │
 │ ░ verbatim                                    │
 └──────────────────────────────────────────────┘
 EOF     █ inline-generated   ▒ DB-streamed   ░ untouched backing
  1. Inline — the 10-byte tag header, all text/TXXX/COMM/USLT frames, and the rebuilt POPM/UFID frames. Frame sizes are synchsafe-bounded; oversized frames fail synthesis rather than emit a corrupt tag.
  2. Per picture: inline APIC framing + an ArtImage segment streaming the image bytes.
  3. Per opaque binary frame: an inline frame header + a BinaryTag segment streaming the body from the DB (empty payloads are skipped — they would fail layout validation).
  4. BackingAudio — the audio region located at scan time: everything after the leading ID3v2 tag and before a trailing ID3v1 tag, anchored by an MPEG frame-sync check. The Xing/LAME info frame is an MPEG frame, so it travels with the audio untouched.

Quirks & invariants

  • The OOM guard (id3v2_alloc_safe): the id3 parser crate eagerly allocates a frame's declared size (v2.3 sizes are plain 32-bit — up to 4 GiB), so musefs validates every frame bound itself before handing a buffer to the crate, and refuses tags it cannot validate. Found and locked in by the mp3 fuzz target; the conservative skips listed under "Lossy edges" are this guard.
  • Byte-identical audio and tag round-trip stability are asserted by musefs-format/tests/proptest_mp3.rs and the mutagen interop suite (musefs-core/tests/interop_emit.rs).

M4A

How musefs scans and synthesizes MP4-container audio (.m4a, .m4b). Only unfragmented files with exactly one track, and that track audio (soun), are accepted; anything else is skipped at scan time. For the segment model these layouts plug into, see the segment model.

What round-trips

  • Canonical text tags map to their standard ilst atoms (©nam, ©ART, aART, ©alb, ©day, …) via the shared vocabulary (musefs-format/src/tagmap.rs).
  • Vocabulary freeform keys (ReplayGain fields, MusicBrainz album/artist ids, ISRC, COPYRIGHT, …) round-trip through ---- freeform atoms under the com.apple.iTunes mean, matched case-insensitively.
  • Other text freeform atoms round-trip keyed by their verbatim name, original casing preserved.
  • Track and disc numbers, with totals: the binary trkn/disk atoms are decoded to tracknumber/discnumber as "N" or "N/M" (the "N of M" total, matching ID3 TRCK/TPOS) and rebuilt as binary atoms with the total filled in.
  • Integer atoms: tmpo/cpil/pgap map to the canonical bpm/compilation/gapless keys (shared with ID3 TBPM/TCMP and Vorbis) and are rebuilt as type-21 integer atoms.
  • Multi-value atoms: every data sub-box of an atom is read (the iTunes multiple-data convention), so a multi-valued atom round-trips all its values, not just the first.
  • Opaque binary freeform atoms, byte-exact: a ---- atom whose payload is binary-typed is captured verbatim under the key ----:<mean>:<name> (so the mean survives) and re-emitted streamed from the DB (BinaryTag segment).
  • Cover art: every data child of a covr atom (the iTunes multiple-artwork convention) is ingested; synthesis emits one covr atom with one data child per stored art row, in order, image bytes streamed.
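
The trkn/disk round-trip above can be sketched as below. This is a Python illustration with hypothetical function names, and it assumes the usual iTunes payload layout: big-endian 16-bit fields, two reserved bytes, then number and total (trkn carries two more trailing reserved bytes, disk does not).

```python
import struct


def decode_number_pair(payload: bytes) -> str:
    """Decode a trkn/disk data payload to "N" or "N/M"."""
    if len(payload) < 6:
        raise ValueError("payload too short for a number pair")
    _reserved, number, total = struct.unpack(">HHH", payload[:6])
    return f"{number}/{total}" if total else str(number)


def encode_trkn(text: str) -> bytes:
    """Rebuild the 8-byte trkn payload from "N" or "N/M"."""
    number, _, total = text.partition("/")
    return struct.pack(">HHHH", 0, int(number), int(total or 0), 0)
```

Encoding "3/12" and decoding the result returns "3/12"; a bare "7" round-trips with the total field left at zero.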

Lossy edges

  • A text freeform atom under a mean other than com.apple.iTunes is re-emitted with the com.apple.iTunes mean (the scan keys text freeform by name only). Binary freeform atoms keep their mean via the ----:<mean>:<name> key.
  • Binary ilst atoms outside the handled set (trkn/disk, the tmpo/cpil/pgap integer atoms, and ---- freeform) are dropped at scan time, since they are not re-emitted on synthesis.
  • covr ingestion accepts only JPEG (type 13) and PNG (type 14) artwork; other type codes are skipped. MP4 has no picture-type or description fields: scanned art becomes "front cover" with an empty description, and any non-PNG stored art is emitted with the JPEG type code.
  • A covr image or binary ---- value larger than its size cap is skipped at scan time — before the image is materialized out of a potentially large moov — and logged (a warn line on stderr) so the lossy drop is explained rather than silent.

How synthesis works

mp4::synthesize_layout (musefs-format/src/mp4.rs) regenerates the moov box and serves [ftyp][regenerated moov][mdat header][mdat payload]:

 offset 0
 ┌──────────────────────────────────────────────┐ ┐
 │ █ ftyp, copied verbatim              (Inline) │ │
 │ █ moov: kept structural children,    (Inline) │ │ regenerated
 │ █   stco/co64 offset values += Δ              │ │ front
 │ █ fresh udta/meta/ilst framing       (Inline) │ │
 │ █ ---- framing + ▒ freeform body  (BinaryTag) │ │
 │ █ covr framing + ▒ image bytes     (ArtImage) │ │
 │ █ mdat header                        (Inline) │ │
 ├──────────────────────────────────────────────┤ ┘
 │ ░ mdat payload, verbatim       (BackingAudio) │
 └──────────────────────────────────────────────┘
 EOF     █ inline-generated   ▒ DB-streamed   ░ untouched backing
         Δ = new mdat payload offset − old
  1. The scan keeps moov's structural children and drops its old udta. A fresh udta/meta/ilst is built from the DB: inline box framing, with each opaque ---- value and each cover image spliced in as streamed BinaryTag/ArtImage segments. Every enclosing box size accounts for the streamed lengths, so the spliced bytes land exactly where the sizes say.
  2. The mdat payload is served verbatim (BackingAudio), merely relocated: every chunk offset in stco (32-bit) or co64 (64-bit) shifts by one constant delta. Only offset values are patched, never box sizes, so the new moov size is computable before the delta — no circular dependency. A 32-bit stco offset that would overflow fails synthesis rather than corrupt.
  3. A moov that sits after mdat (common for faststart-less files) is handled by a streaming reader that skips the mdat payload — the potentially hundreds-of-MB payload is never read at resolve time.
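
The constant-delta offset patch in step 2 can be sketched for the 32-bit stco case as below. This is a Python illustration: the function name is hypothetical, and it operates only on the entry table (the bytes after the box's version/flags and entry count), which the caller is assumed to have sliced out.

```python
import struct


def shift_stco_entries(entries: bytes, delta: int) -> bytes:
    """Shift every big-endian 32-bit chunk offset by delta, refusing overflow."""
    if len(entries) % 4:
        raise ValueError("stco entry table must be a multiple of 4 bytes")
    count = len(entries) // 4
    patched = []
    for offset in struct.unpack(f">{count}I", entries):
        moved = offset + delta
        if not 0 <= moved <= 0xFFFFFFFF:
            # Fail instead of wrapping: a wrapped offset would corrupt the file.
            raise OverflowError("chunk offset does not fit in 32 bits")
        patched.append(moved)
    return struct.pack(f">{count}I", *patched)
```

Because only the values change and never the table length, the size of the enclosing moov is the same before and after patching, which is the property that removes the circular dependency.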

Quirks & invariants

  • The structural metadata read at resolve time is capped (MAX_MP4_METADATA_BYTES, 256 MiB); a file declaring more is refused with a controlled error instead of ballooning memory.
  • MP4 box sizes are 32-bit: oversized synthesized metadata (e.g. enormous art) fails with TooLarge at the format boundary rather than emitting a truncated size field.
  • Byte-identical audio and structural validity are asserted by musefs-format/tests/proptest_mp4.rs, an offset-patching oracle test (mp4_oracle.rs), and the mutagen interop suite (musefs-core/tests/interop_emit.rs).

Ogg (Opus / Vorbis / FLAC-in-Ogg)

How musefs scans and synthesizes Ogg files (.ogg, .oga, .opus) carrying an Opus, Vorbis, or FLAC logical bitstream. Multiplexed and chained Ogg is detected and skipped at scan time: within the header region every page must share the first page's serial, and only the first page may carry beginning-of-stream. For the segment model these layouts plug into, see the segment model. Native FLAC files are covered by FLAC.

The Ogg invariant

Original Ogg packet payload bytes are preserved during synthesis; page sequence numbers and CRCs may be patched intentionally. Synthesis regenerates the logical bitstream's header pages (to embed fresh tags and art), which changes the header page count; the audio pages that follow are served verbatim except that each page header's sequence number is shifted by a constant delta and its CRC recomputed in place. The served audio byte length is unchanged — renumbering patches, never recopies.

Verified by musefs-format/tests/proptest_ogg.rs (crate feature fuzzing), read_at integration tests comparing source and synthesized audio payloads (musefs-core/src/reader.rs test modules), and the mutagen interop suite (musefs-core/tests/interop_emit.rs).

What round-trips

  • All text tags. VorbisComments are rebuilt from the DB through the same builder as FLAC: canonical keys map to their conventional field names via the shared vocabulary (musefs-format/src/tagmap.rs); any other field round-trips verbatim by its own name, in order, multi-values included. User-defined keys outside the Vorbis field-name grammar (empty, containing =, control characters, or non-ASCII; i.e. anything outside ASCII 0x20–0x7D minus =) are dropped on synthesis and logged.
  • Embedded pictures, with MIME type, picture type, description, and dimensions — in both art encodings (see below).
  • Codec headers. The identification packet (OpusHead, Vorbis identification, the OggFLAC STREAMINFO carrier) and any trailing header packets (e.g. the Vorbis setup packet) are preserved; only the comment metadata is regenerated.

Lossy edges

  • The VorbisComment vendor string is replaced with musefs's own.
  • Vorbis field names are case-insensitive by spec; canonical keys come back under their conventional uppercase names and unknown field names are upper-cased on synthesis.
  • Ogg carries no binary-tag slot: only text comments and pictures exist, so there is nothing else to preserve.
  • Embedded pictures are parsed through FLAC's PICTURE block reader, so a picture type outside the standard 0–20 range is clamped to 0 (Other) at scan time, matching the store's track_art.picture_type CHECK.
  • Embedded picture descriptions are right-padded with up to two trailing spaces. The FLAC PICTURE block is built with its description padded so the prefix length (32 + mime.len() + description.len(), i.e. everything before the image bytes) is a multiple of 3 (picture_prefix, musefs-format/src/ogg/mod.rs). That alignment is what makes base64(prefix ++ image) == base64(prefix) ++ base64(image) and lets the image's base64 be served as an independent, incrementally-streamable substring (the art split described under "How synthesis works" below). Padding the description is the safe place to do it, because the MIME type must stay a valid type. So a synthesized picture's description can differ from the original by up to two trailing spaces; this applies to Opus/Vorbis and OggFLAC alike, since both build the block body the same way.
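
The padding rule in the last bullet can be checked with a short Python sketch. It is an illustration, not the project's picture_prefix: the function below is a hypothetical stand-in that follows the standard FLAC PICTURE field order (type, MIME, description, width, height, depth, colours, data length).

```python
import base64
import struct


def padded_picture_prefix(picture_type: int, mime: str, description: str,
                          width: int, height: int, depth: int, colors: int,
                          image_len: int) -> bytes:
    """Build everything before the image bytes, padded to a multiple of 3."""
    mime_b = mime.encode("ascii")
    desc_b = description.encode("utf-8")
    # Fixed fields take 32 bytes; pad the description with 0-2 spaces.
    pad = -(32 + len(mime_b) + len(desc_b)) % 3
    desc_b += b" " * pad
    return (struct.pack(">II", picture_type, len(mime_b)) + mime_b
            + struct.pack(">I", len(desc_b)) + desc_b
            + struct.pack(">IIIII", width, height, depth, colors, image_len))
```

With a prefix whose length is a multiple of 3, base64 emits no padding characters for it, so base64.b64encode(prefix + image) equals base64.b64encode(prefix) + base64.b64encode(image) for any image bytes.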

How synthesis works

ogg::synthesize_layout (musefs-format/src/ogg/mod.rs) produces:

 offset 0
 ┌──────────────────────────────────────────────┐ ┐
 │ █ identification page, preserved     (Inline) │ │ regenerated
 │ █ comment page(s) rebuilt from DB    (Inline) │ │ header
 │ ▒   art windows, base64/raw    (OggArtSlice)  │ │ pages
 │ █ trailing header pages, preserved   (Inline) │ │ (repaginated)
 ├──────────────────────────────────────────────┤ ┘
 │ ░ audio pages: payload verbatim,   (OggAudio) │
 │ ░ page seq += Δ, CRC repatched in place       │
 └──────────────────────────────────────────────┘
 EOF     █ inline-generated   ▒ DB-streamed
         ░ backing pages (headers patched in place, payload untouched)
         Δ = synthesized header page count − original
  1. Inline — the regenerated header pages: the preserved identification packet, a comment packet rebuilt from the DB, and the preserved trailing header packets, repaginated with correct CRCs.
  2. The art split. Opus and Vorbis embed art as base64 METADATA_BLOCK_PICTURE comments (the decoded bytes are a FLAC PICTURE block body): each image is an OggArtSlice run, a window of base64(image) encoded incrementally at read time from the blob store. Artwork is streamed at synthesis time as well: page CRCs are computed from page-bounded ArtSource windows, so neither the full image nor its base64 copy is ever materialized. FLAC-in-Ogg instead carries one native FLAC PICTURE block packet per image (raw OggArtSlice runs, no base64); the last metadata packet's last-block flag and packet 0's 16-bit following-packet count are recomputed to match. Art exceeding MAX_ART_BYTES (16 MiB − 64 KiB) is rejected by the store's CHECK, with a resolve-time cap backstopping a writer that disables check enforcement.
  3. OggAudio — one compact segment covering all original audio pages, with the page-count delta to apply to every sequence number.

At read time there is no in-memory page index: the page containing a requested offset is found by a bounded backward scan (CRC-validated), then pages are walked forward with each header patched algebraically and payload bytes served by exact positioned reads. A one-page memo on the resolved file short-circuits the scan for sequential reads. A page walk that overruns the scanned audio bounds is a hard Malformed error — corrupt or misaligned data is refused, not served. Synthesized page sequence numbers wrap modulo 2³² (matching Ogg's u32 sequence field), so files whose audio pages have very high sequence numbers serve correctly rather than failing the read.

The forward page-walk reads (serve_ogg_window) flow through the shared backing read-ahead buffer (BackingReader, see backing read-ahead) just like PCM BackingAudio reads, so a sequential Ogg stream amortizes backing latency the same way. The read-ahead cache holds raw backing bytes keyed by absolute offset, so it is orthogonal to header patching: the algebraic CRC/sequence rewrite happens on the bytes after they are read, and the cache never sees a patched page. (The backward find_page_start scan and its CRC check stay on the raw fd — they are short, non-sequential probes that the forward-streaming window would not help.)

CRC patching: the linear-CRC trick

This is the neatest thing in the Ogg path. Every Ogg page carries a CRC-32 over its entire contents — header and payload, with the 4-byte CRC field treated as zero during the computation (musefs-format/src/ogg/crc.rs). Renumbering shifts every audio page's sequence number by Δ, which changes 4 header bytes (offsets 18..22). Naively, repairing the CRC means re-checksumming the whole page — including the up-to-64 KB payload that musefs has gone out of its way never to pull into memory.

It doesn't have to. The Ogg CRC uses init 0, no input/output reflection, and no final XOR, which makes it linear over GF(2): for two equal-length messages, crc32(A ⊕ B) == crc32(A) ⊕ crc32(B). Take A = the original page and B = a delta page the same length as the original but all zeros except bytes 18..22, which hold old_seq ⊕ new_seq. Then A ⊕ B is exactly the renumbered page, so:

new_crc = old_crc ⊕ crc32(DELTA)

and the payload — identical in A and A ⊕ B — cancels out entirely. The patched CRC depends only on the old CRC (already in the header) and the 4-byte sequence delta. The payload is never read.
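
The identity can be checked directly with a small Python sketch. This is an illustration, not the code in ogg/crc.rs: it uses a bit-at-a-time CRC with the standard Ogg polynomial 0x04C11DB7, and renumbered_crc is a hypothetical helper that still builds the full-length delta page (the next paragraph removes that cost).

```python
import struct

POLY = 0x04C11DB7
MASK = 0xFFFFFFFF


def ogg_crc(data: bytes) -> int:
    """Ogg CRC-32: init 0, MSB-first, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
    return crc


def renumbered_crc(old_crc: int, old_seq: int, new_seq: int, page_len: int) -> int:
    """new_crc = old_crc XOR crc32(DELTA), without looking at the payload."""
    delta_page = bytearray(page_len)  # all zeros ...
    delta_page[18:22] = struct.pack("<I", old_seq ^ new_seq)  # ... except the seq bytes
    return old_crc ^ ogg_crc(bytes(delta_page))
```

Only the old CRC, the two sequence numbers and the page length go in; the result matches a full re-checksum of the renumbered page.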

Computing crc32(DELTA) also avoids walking the page. The 18 leading zero bytes leave the running CRC at 0 (TABLE[0] = 0, so each step is a no-op), so the computation starts directly from the 4-byte seq delta, then only has to "advance the CRC over" the trailing zeros (the rest of the header plus the whole payload length, read straight from the segment table). That advance is crc_shift_zeros — the CRC-32 of appending n zero bytes. Appending one zero byte is a fixed linear map on the 32-bit CRC state, so appending n of them is that 32×32 GF(2) matrix raised to the n-th power by repeated squaring: O(log n), independent of page size. Small, typical pages take a cheaper per-byte loop; only a huge single packet laced into max-size pages crosses the matrix threshold.
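
The zero-advance step can be sketched in Python as a 32×32 matrix over GF(2), stored as 32 column integers and raised to the n-th power by repeated squaring. The names below (shift_zeros, delta_crc) are hypothetical stand-ins for the idea, and the sketch always takes the matrix path rather than switching to a per-byte loop for small n.

```python
import struct

POLY = 0x04C11DB7
MASK = 0xFFFFFFFF


def feed_byte(crc: int, byte: int) -> int:
    """Advance the Ogg CRC state over one input byte."""
    crc ^= byte << 24
    for _ in range(8):
        crc = ((crc << 1) ^ POLY) & MASK if crc & 0x80000000 else (crc << 1) & MASK
    return crc


def ogg_crc(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc = feed_byte(crc, byte)
    return crc


def apply(matrix: list, vec: int) -> int:
    """Multiply a GF(2) matrix (list of 32 column ints) by a 32-bit vector."""
    out, bit = 0, 0
    while vec:
        if vec & 1:
            out ^= matrix[bit]
        vec >>= 1
        bit += 1
    return out


def shift_zeros(crc: int, n: int) -> int:
    """CRC state after appending n zero bytes, in O(log n) matrix steps."""
    matrix = [feed_byte(1 << bit, 0) for bit in range(32)]  # one zero byte
    while n:
        if n & 1:
            crc = apply(matrix, crc)
        matrix = [apply(matrix, col) for col in matrix]  # square the map
        n >>= 1
    return crc


def delta_crc(old_seq: int, new_seq: int, trailing_len: int) -> int:
    """crc32(DELTA): start at the 4 seq bytes, then advance over trailing zeros."""
    return shift_zeros(ogg_crc(struct.pack("<I", old_seq ^ new_seq)), trailing_len)
```

Here trailing_len is the page length minus 22, that is, everything after the sequence field.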

The net effect is that patch_page_header_algebraic (musefs-format/src/ogg/page.rs) repairs each served audio page's header from just its 27 + seg_count header bytes, in work bounded independent of payload size — and the audio payload stays untouched on disk, spliced in verbatim by positioned reads. That is what lets the Ogg invariant ("renumbering patches, never recopies") hold at serve time without a per-page in-memory index.

Quirks & invariants

  • Page and header sizes are bounded at parse and serve time (MAX_OGG_PAGE_BYTES, MAX_OGG_HEADER_BYTES in musefs-core/src/ogg_index.rs); a crafted file cannot force unbounded allocation. The ogg, ogg_page, b64, and vorbiscomment fuzz targets hammer these paths.
  • The incremental base64 encoder is windowed by output offset: any byte range of the encoded form can be produced from the corresponding slice of raw image bytes (musefs-format/src/ogg/b64.rs).
  • The serve path's determinism does not depend on the memo: a content change rebuilds the resolved file and starts with a fresh, empty memo.
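
The windowing property of the base64 encoder can be sketched as follows. This is a Python illustration with a hypothetical function name; it takes the raw image as an in-memory bytes object for brevity, where a real serve path would read only the needed slice from the blob store.

```python
import base64


def b64_window(raw: bytes, out_off: int, out_len: int) -> bytes:
    """Return encoded[out_off:out_off + out_len] without encoding all of raw."""
    first_group = out_off // 4                    # 4 output chars per group
    last_group = -(-(out_off + out_len) // 4)     # ceiling division
    # Each output group maps to exactly 3 raw bytes, so slice on that grid.
    chunk = raw[first_group * 3:last_group * 3]
    encoded = base64.b64encode(chunk)
    skip = out_off - first_group * 4
    return encoded[skip:skip + out_len]
```

Because the raw slice always starts on a 3-byte boundary, padding characters can only appear at the true end of the image, exactly where the full encoding has them.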

WAV

How musefs scans and synthesizes RIFF/WAVE files (.wav). WAV has no single native tag standard, so musefs writes metadata twice: a broad-compatibility LIST/INFO chunk and a full-fidelity embedded id3 chunk. For the segment model these layouts plug into, see the segment model. The ID3v2 tag inside the id3 chunk is built by the same code as MP3's — MP3's round-trip and lossy-edge rules apply to it wholesale.

What round-trips

  • All text tags, via the embedded id3 chunk (full ID3v2.4, exactly as for MP3: canonical frames, TXXX extension slot, frame-id passthrough).
  • The INFO subset, twice. Seven canonical keys also get a native LIST/INFO subchunk for ID3-unaware readers: title → INAM, artist → IART, album → IPRD, date → ICRD, genre → IGNR, comment → ICMT, tracknumber → ITRK.
  • Binary ID3 frames and promoted tags (POPM → rating/playcount, MusicBrainz UFID → musicbrainz_trackid, opaque PRIV/GEOB/… byte-exact): classification is identical to MP3; only the chunk extraction differs.
  • Embedded pictures: APIC frames inside the id3 chunk, MIME + picture type + description preserved, image bytes streamed.
  • Structural chunks: fmt (required) and fact (when present) are preserved from the original front.

At scan time, tags are merged per field from both surfaces with id3 taking precedence and INFO filling gaps; only chunk headers are walked — the data payload is never read.
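
The per-field merge and the seven-key INFO subset can be sketched as below. This is a Python illustration: the function names and the dict-of-lists tag shape are assumptions, and only the key-to-subchunk mapping comes from the list above.

```python
from typing import Dict, List

Tags = Dict[str, List[str]]

INFO_CHUNK_IDS = {
    "title": "INAM", "artist": "IART", "album": "IPRD", "date": "ICRD",
    "genre": "IGNR", "comment": "ICMT", "tracknumber": "ITRK",
}


def merge_scanned_tags(id3_tags: Tags, info_tags: Tags) -> Tags:
    """id3 wins per field; INFO only fills fields that id3 lacks."""
    merged = dict(info_tags)
    merged.update(id3_tags)
    return merged


def info_subset(tags: Tags) -> Tags:
    """Fields that also get a LIST/INFO subchunk, keyed by subchunk id."""
    return {INFO_CHUNK_IDS[key]: values
            for key, values in tags.items() if key in INFO_CHUNK_IDS}
```

A title present in both surfaces takes the id3 value, while a genre present only in INFO survives the merge.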

Lossy edges

  • Non-structural chunks are dropped. The synthesized front carries only fmt , fact, the new LIST/INFO, and the new id3 chunk: cue points (cue ), broadcast-wave metadata (bext), sampler loops (smpl), and any other chunk from the original front are not reproduced.
  • The INFO chunk carries only the seven-field vocabulary above; readers that understand only INFO see just those fields. Everything still rides in the id3 chunk.
  • All of MP3's ID3 lossy edges apply to the id3 chunk: ID3v2.4-only output, placeholder-language COMM/USLT reset to XXX, POPM owner dropped, ID3v1 ignored, the OOM-guard skips (the authoritative list lives in MP3's lossy edges).
  • Tags trailing a very large data payload are not seen. When the data payload pushes any LIST/INFO or id3 chunk beyond the scan probe ceiling (64 MiB), the file is still ingested — the data chunk header gives the audio bounds without reading the payload — but those trailing tag chunks are not read at scan time. Front-positioned metadata is unaffected.

How synthesis works

wav::synthesize_layout (musefs-format/src/wav.rs) regenerates the entire RIFF front, then serves the untouched payload:

 offset 0
 ┌──────────────────────────────────────────────┐ ┐
 │ █ RIFF/WAVE framing                  (Inline) │ │
 │ █ fmt  (+ fact), preserved           (Inline) │ │ regenerated
 │ █ LIST/INFO chunk (7-field subset)   (Inline) │ │ RIFF front
 │ █ id3  chunk: ID3v2.4 text frames    (Inline) │ │ (metadata
 │ █   frame header + ▒ opaque body  (BinaryTag) │ │  written
 │ █   APIC framing + ▒ image bytes   (ArtImage) │ │  twice)
 ├──────────────────────────────────────────────┤ ┘
 │ ░ data chunk payload, verbatim (BackingAudio) │
 └──────────────────────────────────────────────┘
 EOF     █ inline-generated   ▒ DB-streamed   ░ untouched backing
  1. Inline: RIFF/WAVE framing, the preserved fmt (and fact) chunks, the rebuilt LIST/INFO chunk, and the embedded id3 chunk's text frames. Every chunk length is known up front, so the RIFF size and each chunk size field are byte-exact, with no placeholder sizes.
  2. Inside the id3 chunk: APIC framing inline with ArtImage segments streaming image bytes, and BinaryTag segments streaming opaque ID3 frame bodies, exactly as in MP3 synthesis.
  3. BackingAudio — the original data chunk payload, served verbatim by positioned reads.

RIFF form-size enforcement

Every RIFF/WAVE file declares a form size at bytes 4..8 (riff_size). The form covers bytes 8 through 8 + riff_size and must encompass all top-level chunks (fmt , data, LIST, id3 , …). musefs enforces this at parse time:

  • riff_wave_start parses the RIFF size and returns form_end = 8 + riff_size.
  • locate_audio and locate_audio_at_ceiling reject any file where form_end exceeds the physical file or where the data chunk payload extends past form_end.
  • Streaming or concatenated WAVs that write riff_size = 0 or 0xFFFFFFFF are rejected, but only incidentally: there is no explicit sentinel check. riff_size = 0 yields form_end = 8, which is smaller than any file carrying a data payload, and 0xFFFFFFFF yields a form_end larger than any real file — both fall foul of the bounds checks above. Detecting and honouring those sentinels explicitly is a deferred follow-up.
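
The bounds checks above, including the incidental rejection of the 0 and 0xFFFFFFFF sentinels, can be sketched as below. This is a Python illustration: Malformed, form_end_from_header and check_form_bounds are hypothetical names, not the signatures of riff_wave_start or locate_audio.

```python
import struct


class Malformed(ValueError):
    """Stand-in for a controlled parse error."""


def form_end_from_header(header: bytes) -> int:
    """Parse the 12-byte RIFF/WAVE header and return form_end = 8 + riff_size."""
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise Malformed("not a RIFF/WAVE file")
    (riff_size,) = struct.unpack("<I", header[4:8])
    return 8 + riff_size


def check_form_bounds(form_end: int, file_len: int,
                      data_offset: int, data_len: int) -> None:
    """Reject a form that overruns the file or a payload that overruns the form."""
    if form_end > file_len:
        raise Malformed("RIFF form extends past the physical file")
    if data_offset + data_len > form_end:
        raise Malformed("data payload extends past the RIFF form")
```

With riff_size = 0 the form ends at byte 8, so any data payload fails the second check; with 0xFFFFFFFF the form end exceeds any real file and fails the first.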

Quirks & invariants

  • A file must contain both a fmt chunk and a data chunk to be scanned, and the declared data size must lie within the file.
  • The ID3-in-WAV path inherits MP3's allocation-bomb guard (id3v2_alloc_safe): a crafted id3 chunk cannot OOM the scanner — this exact vector was found by the wav fuzz target.
  • Byte-identical audio and front re-parseability are asserted by musefs-format/tests/proptest_wav.rs and the mutagen interop suite (musefs-core/tests/interop_emit.rs).

Integrations

External tools write tags and art into the musefs SQLite store; a live mount reflects their edits without copying audio. Each integration has its own page.

  • beets — the musefs beets plugin
  • Picard — the MusicBrainz Picard plugin
  • Lidarr — Custom Script integration
  • systemd — running musefs as a user/system service
  • python-musefs — the shared store-contract library behind the plugins

The plugin packages have their own changelog at contrib/CHANGELOG.md.

beets-musefs

A beets plugin that syncs your beets metadata (tags + cover art) into a musefs SQLite store, so a live musefs mount shows a re-tagged view of your library without rewriting any audio.

How it fits together

  • The plugin owns the tags (and cover art, when beets has it) of each track, keyed by the file's canonical real path.
  • The structural columns (audio offsets, size, mtime) can only come from musefs probing the file, so the plugin runs musefs scan for you (via the bin config) before syncing — it never tries to compute those itself.
  • beet musefs scans the library and then syncs; the import/write hooks scan just the touched file and then sync. musefs's auto-refresh shows changes live — no remount, and no separate scan step.

Install

pip install beets-musefs

This pulls in the shared python-musefs runtime library from PyPI automatically — both packages are published, so no working-tree install is needed.

Use via pluginpath (no package install)

The plugin itself doesn't need to be installed — point beets at the plugin's beetsplug directory and it loads at runtime. It still needs the shared python-musefs runtime library importable, so install that first:

pip install python-musefs

beets adds pluginpath entries directly to the beetsplug package path, so it must be the beetsplug dir itself (not its parent). In your beets config.yaml:

pluginpath: /path/to/musefs/contrib/beets/beetsplug
plugins: musefs
musefs:
  db: ~/musefs.db          # path to the musefs SQLite store (required)
  bin: musefs              # musefs executable for auto-scan; use a full path if
                           # not on $PATH, e.g. /path/to/musefs/target/release/musefs
  # autoscan: yes          # default; runs `musefs scan` for you. Set `no` to
  #                        # manage scanning yourself (hooks then best-effort).
  # fields:                # optional: map extra beets fields to musefs keys
  #   comments: comment

Development install (from a checkout)

To hack on the plugin or run the test suite against your working tree, install both packages editable from the repo so imports resolve to the local source:

pip install -e contrib/python-musefs       # shared library
pip install -e "contrib/beets[test]"       # plugin + test deps

Workflow (test drive)

# Sync beets metadata into the store. Auto-scans the library first (creating the
# DB if needed) — no separate `musefs scan` step.
beet musefs                      # everything
beet musefs albumartist:"Boards of Canada"   # a subset (scans just those files)
beet musefs -n                   # dry run: report counts, write nothing
beet musefs --revalidate         # also prune rows whose backing file is gone

# Mount the re-tagged view.
musefs mount ~/mnt --db ~/musefs.db \
    --template '$albumartist/$album/$tracknumber - $title'

# ...or mirror your beets library layout exactly, via the computed beets_path tag.
musefs mount ~/mnt --db ~/musefs.db --template '$!{beets_path}'

Imports and tag write-backs auto-sync via event hooks: beet import and beet modify -w … record the touched items and reconcile them once the command finishes — when each file's path is final (beets has no move event, and a write fires before its move). The reconcile scans the new path and writes its tags, but it never prunes — pruning is a deliberate act (see below). A move therefore leaves the old path's row behind until you run beet musefs --revalidate. A metadata-only beet modify (no -w) doesn't fire a hook — re-run beet musefs. With autoscan: no, run musefs scan yourself first; the hooks then skip gracefully if the DB is missing.

Never writing to your backing audio files

If your backing files must stay byte-for-byte untouched — you're seeding them as a torrent, the library is immutable, or you simply want beets to drive the musefs view without ever rewriting a tag — configure beets to never write to disk:

import:
    copy: no
    move: no
    write: no

write: no is enough on its own for core beets and for most stock plugins, which gate their file writes on import.write (the exceptions are listed below). The musefs plugin reads canonical metadata from the beets database, not from the files, and musefs scan ingests/synthesizes embedded art itself — so write: no loses nothing in the musefs view.

A few plugins ignore that gate or are redundant in this mode:

  • scrub — deletes all tags from files directly via mutagen, ignoring import.write; its auto-import hook would wipe tags from your backing files. Don't enable it.
  • embedart — embeds cover art into the audio files. Redundant: musefs already presents embedded art in the virtual files (scan ingestion plus the plugin's overlay of the album's artpath).
  • zero — only acts during a file write, so it is inert with write: no (nothing to do, but nothing to worry about either).

Notes

  • Field coverage: every tag beets writes to a file (its _media_tag_fields) is synced — ReplayGain, MusicBrainz IDs, comment, lyrics, grouping, isrc, multi-valued artists, and any custom field — under canonical musefs keys. Read-only file facts (bitrate, length, …) are never written as tags.
  • Merge, not replace: beets' values win for the fields it manages; any other tag already embedded in the file is preserved in the view.
  • Deletions stick: the plugin records the keys it manages per track in a musefs_managed beets flexattr (stored in the beets DB only — never in your audio files or the musefs store). Remove a tag in beets and it is removed from the view and stays gone across re-scans.
  • --restore-backing (or restore_backing: yes): when you remove a tag in beets, let the file's original embedded value reappear instead of disappearing.
  • Caveat: sticky deletion relies on autoscan: yes (the default), which re-derives the file's embedded tags before each sync. With autoscan: no, a deletion only takes effect after your next manual musefs scan.
  • Cover art: taken from the album's artpath (beets' external cover file). beets art wins when present; otherwise any art musefs scan ingested from embedded pictures is preserved.
  • Computed path (beets_path): each sync also writes a beets_path text tag holding the track's beets library-relative path (from your paths: config, via item.destination), with the file extension removed — musefs re-appends it. Mount with --template '$!{beets_path}' (the $!{} path field keeps / as directory separators) to mirror your beets layout, including layouts musefs's own template engine can't express. Set write_path: no in the musefs: config to skip it. Do not add an extension in a template that consumes beets_path. See the computed-tag workflow in the architecture overview.
  • Pruning is a deliberate act. The plugin never prunes on its own. Pruning track rows whose backing file is gone from disk (renames/moves/deletes) is owned entirely by musefs scan --revalidate, reachable from beets as beet musefs --revalidate (which forwards the flag to the auto-scan). Plain beet musefs and the passive end-of-command reconcile (beet import / beet modify -w) only sync, so a transient backing-storage loss — an unmounted network share, an offline drive, a momentary realpath divergence — can never mass-delete plugin metadata. Run beet musefs --revalidate (or musefs scan --revalidate) while the library is available to clear stale rows left by a move or an on-disk delete.
  • Removals are not auto-pruned. beet remove / beet remove -d does not prune the store; run beet musefs --revalidate afterwards to drop the rows whose backing file is now gone. A bare beet remove (which keeps the file on disk) leaves a servable row in place even then — musefs can still serve those bytes.
  • Orphaned art: replacing art can orphan old blobs; musefs scan --revalidate garbage-collects them.
  • Schema version: the plugin refuses to run if the DB's user_version differs from the version it targets — rebuild the store after upgrading musefs.
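
The schema-version refusal in the last note can be pictured with plain sqlite3. This is only a sketch: the real check lives in the shared python-musefs library, and EXPECTED_USER_VERSION, SchemaSkew, and require_user_version are hypothetical names with a made-up version number.

```python
import sqlite3

EXPECTED_USER_VERSION = 7  # hypothetical value; the real target version is defined by the plugin


class SchemaSkew(Exception):
    """Hypothetical stand-in for the library's schema-mismatch error."""


def require_user_version(conn: sqlite3.Connection, expected: int = EXPECTED_USER_VERSION) -> None:
    """Raise if the store's PRAGMA user_version differs from the targeted version."""
    (found,) = conn.execute("PRAGMA user_version").fetchone()
    if found != expected:
        raise SchemaSkew(f"store user_version is {found}, plugin targets {expected}")
```

Checking before any write means a version skew stops the sync without touching the store.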

Tests

The tests live under tests/ and use a local virtualenv with beets + pytest.

cd contrib/beets
uv venv                                   # create .venv (once)
source .venv/bin/activate
uv pip install -e ../python-musefs        # shared library (editable, from the working tree)
uv pip install -r requirements.txt        # beets + pytest

python -m pytest                          # unit + integration (no Rust binary)
python -m pytest -m musefs_bin            # path-matching gate vs the real `musefs` binary
python -m pytest -m e2e                   # full beets -> mount -> playback end-to-end

The musefs_bin gate shells out to the real musefs binary, so build it first from the repo root (cargo build) and run it against a fresh build. The e2e tier additionally needs ffmpeg and /dev/fuse + fusermount3: it generates audio, imports it with beets, retags, syncs, mounts via FUSE, and verifies the mount's tags and byte-identical audio (including a move-reconcile case). Both tiers are deselected from the default run and skip cleanly if their tools are absent.

musefs-picard

A MusicBrainz Picard plugin that syncs your Picard metadata (tags + front cover) into a musefs SQLite store, so a live musefs mount shows a re-tagged view of your library without rewriting any audio.

How it fits together

Picard has no way to redirect its Save to a database, so this plugin adds a context-menu action instead: match/edit as usual, then right-click your selection → "Sync to musefs" instead of pressing Save. The plugin:

  1. runs musefs scan on each selected file to create/refresh its track row and structural columns (the offsets only musefs can compute), then
  2. writes Picard's tags and front cover into the store, keyed by the file's canonical real path.

musefs's auto-refresh surfaces the change at the mount with no remount. The audio file is never saved by Picard.

Install (local / development)

Picard loads "folder plugins" from its plugins directory. Copy (or symlink) the musefs/ folder there:

  • Linux: ~/.config/MusicBrainz/Picard/plugins/
  • macOS: ~/Library/Preferences/MusicBrainz/Picard/plugins/
  • Windows: %APPDATA%\MusicBrainz\Picard\plugins\

cp -r contrib/picard/musefs ~/.config/MusicBrainz/Picard/plugins/

The musefs/_common/ subfolder is the vendored python-musefs library, copied in so the plugin folder is self-contained (Picard does not install plugin dependencies). It is committed; you don't need to do anything to use it. If you change the shared library, re-run python contrib/python-musefs/vendor_to_picard.py and commit the refreshed copy — CI's drift guard enforces it.

Then enable musefs sync in Options → Plugins, and configure it in Options → musefs sync:

  • musefs DB path — path to the musefs SQLite store (required).
  • musefs binary — the musefs executable (PATH name or full path), used to auto-create rows. Default musefs.
  • Run musefs scan before syncing — autoscan toggle (default on). With it off, run musefs scan yourself first or the sync errors on a missing DB.
  • Extra field map — optional key=value list mapping additional or custom Picard tag names to musefs store keys (applied verbatim, last-wins, on top of the automatic full-tag-set sync), e.g. mymood=mood.
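
As a rough picture of the Extra field map option, the sketch below parses a key=value list into a mapping. The separator handling (one pair per line or comma-separated) and the reading of "last-wins" as "a later entry for the same Picard tag overrides an earlier one" are assumptions, and parse_field_map is a hypothetical name, not the plugin's function.

```python
def parse_field_map(text: str) -> dict[str, str]:
    """Parse 'picard_tag=musefs_key' entries; later entries override earlier ones."""
    mapping: dict[str, str] = {}
    # Assumption: entries may be separated by newlines or commas.
    for entry in text.replace(",", "\n").splitlines():
        source, sep, target = entry.partition("=")
        source, target = source.strip(), target.strip()
        if not sep or not source or not target:
            continue  # skip blank or malformed entries
        mapping[source] = target  # applied verbatim: no case folding or renaming
    return mapping
```

For the example in the text, parse_field_map("mymood=mood") maps the Picard tag mymood to the store key mood.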

MUSEFS_DB and MUSEFS_BIN environment variables override the DB/binary settings (handy for testing).

Workflow

  1. musefs mount ~/mnt --db ~/musefs.db --template '$albumartist/$album/$tracknumber - $title'
  2. In Picard, match/cluster an album as usual.
  3. Right-click the album/files → Sync to musefs.
  4. Browse ~/mnt — the files show Picard's tags and cover, audio byte-identical.

Notes

  • Front cover only: the first front-cover image Picard holds is synced. Picard art wins when present; otherwise any art musefs scan ingested from the file's embedded picture is preserved. Re-syncing a file with no Picard art lets the embedded picture re-seed when autoscan is on (musefs scan re-reads the file); with autoscan off, existing art is left untouched.
  • Tags are fully replaced with Picard's view on every sync.
  • Field coverage: every populated Picard tag is synced under its canonical musefs (on-disk) key — all MusicBrainz IDs, sort and performer/credit fields, movement, totals, and any custom field; multi-values expand and per-role performers fold to Name (Role). Picard's hidden ~ internals (length, rating, …) are never written.
  • Orphaned art: replacing art can orphan old blobs; musefs scan --revalidate garbage-collects them.
  • Schema version: the plugin refuses to run if the DB's user_version differs from the version it targets — rebuild the store after upgrading musefs.

Tests

cd contrib/picard
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

python -m pytest                 # unit + integration (no Picard, no Rust binary)
python -m pytest -m musefs_bin   # path-matching gate vs the real `musefs` binary

The musefs_bin gate shells out to the real musefs binary, so build it first from the repo root (cargo build). It is deselected from the default run and skips cleanly if the binary is absent.

Real-Picard (pytest-qt) tests

The adapter (musefs/__init__.py) is exercised against a real Picard + PyQt5 install, headless. Picard isn't a clean pip wheel, so use the distro package and bind a uv venv to the system Python it targets:

sudo apt-get install -y picard                              # Picard at /usr/lib/picard + system PyQt5
uv venv --system-site-packages --python "$(which python3)"  # match apt Picard's C-ext interpreter
uv pip install -e 'contrib/picard[test]'                    # test extra includes pytest-qt
PYTHONPATH=/usr/lib/picard QT_QPA_PLATFORM=offscreen \
  .venv/bin/python -m pytest contrib/picard/tests -v

These tests importorskip("picard"), so on a machine without Picard they skip cleanly and only the Qt-free _core tests run.

Manual smoke test (full GUI round-trip)

  1. Run cargo build, then create a store: musefs scan /path/to/album --db /tmp/m.db.
  2. Copy the plugin into Picard's plugins dir; enable it; set DB path /tmp/m.db.
  3. Load the album in Picard, change a tag (e.g. title), add a front cover.
  4. Right-click → Sync to musefs; confirm the status bar / log reports synced=N.
  5. musefs mount /tmp/mnt --db /tmp/m.db and verify the mounted file carries the new tag and cover, with byte-identical audio.

lidarr-musefs

A Lidarr integration that syncs Lidarr's metadata into a musefs SQLite store, so a live musefs mount shows a re-tagged view of your library without Lidarr ever copying, moving, or rewriting backing audio bytes.

Lidarr stays the downloader, matcher, and metadata source; its destination tree becomes a placeholder of symlinks that exists only so Lidarr can track files. Point Navidrome, Plex, Jellyfin, or other consumers at the musefs mount instead.

How it fits together

The package installs two console scripts that plug into Lidarr's hooks:

  • musefs-lidarr-import (Import Using Script) — replaces Lidarr's own copy/move when it imports a download: it creates the destination entry as a symlink (or hardlink) to the downloaded file and fails closed — it never falls back to copying bytes.
  • musefs-lidarr-sync (Custom Script notification) — fires after an import or rename: it queries Lidarr's API for the affected tracks' metadata (title, artist/albumartist, album, track/disc numbers, release date, MusicBrainz ids, genres) plus each album's cover art, runs musefs scan on the files to create/refresh their track rows (the structural columns only musefs can compute), and writes the tags and art into the store. Transient API failures (network errors, timeouts, 5xx) are retried with backoff so a blip or a Lidarr restart mid-import doesn't silently drop the sync.

musefs's auto-refresh surfaces each sync at the mount with no remount. Both scripts build on the shared python-musefs store-contract library.
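
The retry-with-backoff behavior described for musefs-lidarr-sync can be sketched generically as below. The attempt count, the delays, and the names TransientError and call_with_backoff are all assumptions for illustration; the source does not state the script's actual retry parameters.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (network error, timeout, 5xx)."""


def call_with_backoff(fn, *, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure rather than hide it
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

Passing sleep as a parameter keeps the helper testable without real waiting.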

Install

Install the package — with its python-musefs dependency — into the environment Lidarr uses to run custom scripts, so both scripts are on Lidarr's PATH:

pip install lidarr-musefs

This pulls in the shared python-musefs dependency from PyPI automatically. To install from a checkout instead (e.g. for development), install both editable so imports resolve to the local source:

pip install -e contrib/python-musefs
pip install -e contrib/lidarr

You also need the musefs binary reachable by the sync script (see MUSEFS_BIN below) and a musefs store/mount of your own — see the main README.

Required Lidarr settings

  • Settings -> Media Management -> Import Using Script: enabled.
  • Import Script Path: musefs-lidarr-import.
  • Metadata Provider -> Write Audio Tags: Never.
  • File Date: None.
  • Linux permission management: disabled.

Do not rely on Lidarr's built-in "Use Hardlinks instead of Copy" for this workflow. Lidarr uses a hardlink-or-copy transfer mode internally, so a hardlink failure can copy bytes. musefs-lidarr-import creates the destination entry itself and fails closed.
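
What "fails closed" means can be shown in a few lines. This is a sketch of the idea only, not the import script's code: link_fail_closed is a hypothetical name, and the mode strings mirror the MUSEFS_LIDARR_LINK_MODE values.

```python
import os


def link_fail_closed(source: str, dest: str, mode: str = "symlink") -> None:
    """Create dest as a link to source, or raise. There is no copy fallback."""
    if mode == "symlink":
        os.symlink(os.path.realpath(source), dest)
    elif mode == "hardlink":
        os.link(source, dest)
    else:
        raise ValueError(f"unknown link mode: {mode!r}")
    # Deliberately no try/except around the link calls: any OSError
    # (cross-device hardlink, existing destination, permissions) propagates
    # to the caller instead of degrading into a byte copy.
```

The design choice is the absence of a fallback branch: a failed link is an error the operator sees, never a silent duplicate of the audio.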

musefs-lidarr-sync --doctor verifies these settings over the API (see Doctor).

Lidarr Custom Script

Configure a Custom Script notification (Settings -> Connect):

  • On Release Import: enabled.
  • On Rename: enabled.
  • On Album Delete: enabled.
  • On Artist Delete: enabled.
  • Path: musefs-lidarr-sync.

Test events exit successfully without touching files or the database. TrackRetag events are skipped with a warning because they fire after Lidarr writes tags.

Environment

Both scripts are configured through environment variables, set in the environment Lidarr launches scripts with.

Import script:

MUSEFS_LIDARR_LINK_MODE=symlink   # default; use hardlink only if symlinks are unsuitable

Sync script:

MUSEFS_DB=/path/to/musefs.db      # the musefs SQLite store (required)
MUSEFS_BIN=musefs                 # musefs executable; full path if not on PATH
MUSEFS_LIDARR_URL=http://localhost:8686
MUSEFS_LIDARR_API_KEY=your-api-key
MUSEFS_LIDARR_AUTOSCAN=1          # default; runs `musefs scan` before each sync

API keys are redacted from logs and errors.

Manual backfill

To sync every track file Lidarr already knows about (e.g. on first setup):

musefs-lidarr-sync --all

Manual backfill requires MUSEFS_LIDARR_URL and MUSEFS_LIDARR_API_KEY. It runs the doctor preflight first (skip with --skip-lidarr-preflight), then queries all Lidarr artists and syncs their known track files into the musefs DB.

Migrating an existing Lidarr library

The forward path above (new import → import script symlink → sync) works cleanly on a fresh import. Re-homing a pre-existing Lidarr library onto the musefs symlink tree runs into several Lidarr behaviors; this is the working order (observed on Lidarr v1, lsio image). None of it is a musefs bug — these are Lidarr quirks an integrator only hits here.

  1. Reassign the artists to the new (musefs) root folder.
  2. Clear the stale trackfile records before re-importing. If the artists' existing trackfiles still reference the old root, re-import fails with NotParentException (/old/root/... is not a child of /new/root) — Lidarr's RemoveExistingTrackFiles chokes computing the relative path. Delete the stale trackfile records first.
    • The empty-root deletion guard: Lidarr blocks trackfile deletion while the new root folder is empty ("Artist's root folder is empty", a mass-deletion safety guard) — a chicken-and-egg with the symlinks not existing yet. Drop a placeholder file in the root until the first symlinks land, then remove it.
    • Batch the bulk delete: DELETE /api/v1/trackfile/bulk returns 500 on large batches (~200 ids); send ~25 ids per call.
  3. Re-import. The import script creates the destination symlinks.
  4. Backfill the store: musefs-lidarr-sync --all.
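
The batching advice in step 2 can be sketched as follows. Only the chunking is shown; send is a hypothetical callable that would issue one DELETE /api/v1/trackfile/bulk request per batch, and its request body is deliberately left out because the source does not specify it.

```python
from typing import Callable, Iterator, Sequence


def batches(ids: Sequence[int], size: int = 25) -> Iterator[list[int]]:
    """Yield consecutive slices of at most `size` ids."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def bulk_delete(ids: Sequence[int], send: Callable[[list[int]], None], size: int = 25) -> int:
    """Call send() once per batch and return the number of calls made."""
    calls = 0
    for batch in batches(ids, size):
        send(batch)  # hypothetical: one bulk-delete request for this batch
        calls += 1
    return calls
```

For 60 stale trackfile ids this makes three calls of 25, 25, and 10 ids.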

Point musefs scan at the backing directory, not the symlink tree. The default (--follow-symlinks off) is exactly right here: the store should key off the real files, while Lidarr's symlink tree is just its own tracking view.

Doctor

To verify your Lidarr settings are musefs-safe:

musefs-lidarr-sync --doctor

The doctor checks Lidarr's API for:

  • writeAudioTags = no
  • fileDate = none
  • setPermissionsLinux = false
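
The comparison the doctor performs can be sketched as below. It assumes the three settings have already been fetched from Lidarr and merged into one dict, and that setPermissionsLinux arrives as a JSON boolean; unsafe_settings and SAFE_SETTINGS are hypothetical names.

```python
SAFE_SETTINGS = {
    "writeAudioTags": "no",
    "fileDate": "none",
    "setPermissionsLinux": False,
}


def unsafe_settings(current: dict) -> dict:
    """Return {name: (found, wanted)} for every setting that differs from the safe value."""
    return {
        name: (current.get(name), wanted)
        for name, wanted in SAFE_SETTINGS.items()
        if current.get(name) != wanted
    }
```

An empty result means the settings are musefs-safe; a missing setting is reported as (None, wanted) rather than assumed safe.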

If MUSEFS_LIDARR_URL and MUSEFS_LIDARR_API_KEY are not configured, doctor and sync fail because the integration cannot verify safe settings or build complete per-track metadata.

--doctor is a runtime / post-deploy check, not an offline one: it makes a live Lidarr API call, so it needs MUSEFS_LIDARR_URL + MUSEFS_LIDARR_API_KEY and a reachable Lidarr instance. Run it after deployment, not at container build time — offline it fails with connection-refused even when the toolchain itself is wired up correctly. There is no offline "are the binary and plugins installed/wired" check; to confirm installation at build time, test that the musefs-lidarr-import / musefs-lidarr-sync scripts and the musefs binary are importable/on PATH.
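
A build-time installation check along the lines suggested above could look like this. It is a sketch using only the standard library; missing_tools and REQUIRED_TOOLS are hypothetical names, and the check only confirms the executables are on PATH, not that they work.

```python
import shutil

REQUIRED_TOOLS = ("musefs-lidarr-import", "musefs-lidarr-sync", "musefs")


def missing_tools(names=REQUIRED_TOOLS) -> list[str]:
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]
```

A container build could fail when missing_tools() is non-empty, and leave --doctor for after deployment when Lidarr is reachable.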

Smoke test

  1. Build and install musefs.
  2. Install python-musefs and lidarr-musefs into the environment Lidarr uses for custom scripts.
  3. Configure Import Using Script and Custom Script as described above.
  4. Import a small album.
  5. Confirm Lidarr's destination entry is a symlink by default.
  6. Run musefs mount /tmp/mnt --db "$MUSEFS_DB".
  7. Confirm the mount shows Lidarr metadata.
  8. Confirm the source file's bytes and mtime did not change.

Notes

  • Tags are fully replaced with Lidarr's view on every sync (scanner-written binary tags always survive — see the external-writer contract).
  • Cover art: each album's Lidarr cover is fetched and written as the front cover, replacing the track's art rows on every sync (an over-cap or unreachable cover is skipped, leaving any scanner-ingested art in place).
  • Schema version: the sync refuses to run if the DB's user_version differs from the version it targets — rebuild the store after upgrading musefs.
  • Deletions prune by MusicBrainz id, scoped to rows this plugin owns. On an Album/Artist delete, the sync removes the matching store rows (musicbrainz_albumid / musicbrainz_artistid) so the mount stops presenting them. The backing audio is never touched — pruning only drops the store rows, not the files Lidarr keeps in the backing directory. A delete event for a release with no MusicBrainz id cannot be mapped and is logged and skipped.
  • Ownership marker. Every track the sync writes is stamped with a musefs_lidarr_managed=1 tag, and a delete only removes rows carrying that marker. Without it, a musicbrainz_albumid the scanner seeded from a file's own native tags is indistinguishable from one Lidarr wrote, so an unrelated Lidarr delete could drop an unmanaged track's metadata. The marker is a normal text tag, so it does appear in served files (e.g. as a MUSEFS_LIDARR_MANAGED Vorbis comment / a TXXX frame / an iTunes freeform atom). A track imported under an older plugin version (before the marker existed) is treated as unmanaged and is left in place on delete — re-sync it to stamp the marker.
  • CI coverage: a fast smoke (real Lidarr exec path + mocked API) gates PRs, and a full real-instance download-client import e2e gates the Python releases — see the Python plugins guide.
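
The ownership-marker rule from the notes above can be modeled in memory. This sketch uses plain dicts instead of the SQLite store, so it says nothing about the real schema or queries; rows_to_prune is a hypothetical name and the paths are invented.

```python
MARKER = "musefs_lidarr_managed"


def rows_to_prune(tracks: dict[str, dict[str, str]], album_id: str) -> list[str]:
    """Return keys of tracks that match the album id AND carry the ownership marker."""
    if not album_id:
        return []  # a delete without a MusicBrainz id cannot be mapped: prune nothing
    return sorted(
        key
        for key, tags in tracks.items()
        if tags.get("musicbrainz_albumid") == album_id and tags.get(MARKER) == "1"
    )
```

A track whose musicbrainz_albumid was seeded by the scanner has no marker, so it stays in place even when its album id matches the delete event.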

Tests

cd contrib/lidarr
python -m venv .venv && source .venv/bin/activate
pip install -e ../python-musefs    # shared library (editable, from the working tree)
pip install -e ".[test]"

python -m pytest                   # unit + integration (no Rust binary)
python -m pytest -m musefs_bin     # path-matching gate vs the real `musefs` binary

The musefs_bin gate shells out to the real musefs binary, so build it first from the repo root (cargo build). It is deselected from the default run and skips cleanly if the binary is absent.

Running musefs as a systemd user service

These units run musefs on the host (the recommended deployment) under your own user account — no root, no CAP_SYS_ADMIN.

Files

  • musefs.service — the mount daemon (musefs mount); blocks until stopped.
  • musefs-scan.service + musefs-scan.timer — optional periodic musefs scan --revalidate.
  • musefs.conf.example — every MUSEFS_* setting, commented with defaults.

Install

mkdir -p ~/.config/systemd/user ~/.config/musefs
cp musefs.service musefs-scan.service musefs-scan.timer ~/.config/systemd/user/
cp musefs.conf.example ~/.config/musefs/musefs.conf
$EDITOR ~/.config/musefs/musefs.conf   # set MUSEFS_MOUNTPOINT and MUSEFS_DB
systemctl --user daemon-reload
systemctl --user enable --now musefs.service

Enable the periodic re-scan too (edit the library path in musefs-scan.service first):

systemctl --user enable --now musefs-scan.timer

Hardening

These units run under the --user manager, which constrains what systemd sandboxing is possible. The two units differ sharply:

  • musefs-scan.service is fully sandboxed. The scanner creates no FUSE mount, so it takes a strong sandbox (ProtectSystem=true, SystemCallFilter=, plus namespace and seccomp restrictions). ProtectSystem=true (not strict) keeps system directories read-only while leaving your library and MUSEFS_DB writable, so a custom DB location needs no ReadWritePaths= edit. A few directives that require capability-bounding-set drops (CapabilityBoundingSet=, PrivateDevices, ProtectKernelModules, ProtectKernelLogs, ProtectClock) are omitted: the unprivileged user manager cannot apply them, and the process is already capability-less, so nothing is lost. Inspect with systemd-analyze --user security musefs-scan.service.

  • musefs.service is intentionally not sandboxed, and cannot be. musefs mounts via the setuid fusermount3 helper. NoNewPrivileges=true disables that setuid escalation, and so does nearly every other systemd hardening directive, because installing a seccomp filter for an unprivileged process forces the kernel no_new_privs flag. The mount then fails with fusermount3: mount failed: Operation not permitted. The unit comment explains this in full.

Notes

  • The store must exist before the mount starts. musefs mount never creates the DB — it requires a populated store and exits non-zero otherwise, so a mount unit that starts before anything has scanned hard-fails and (with Restart=on-failure) crash-loops. Seed the store with an initial musefs scan before enable --now musefs.service. If you generate the store from another unit, order this one after it with a drop-in (systemctl --user edit musefs):

    [Unit]
    After=musefs-initial-scan.service
    Requires=musefs-initial-scan.service
    

    (The musefs-scan.timer is a periodic re-scan, not the initial seed.)

  • Binary location. The --user manager does not inherit your shell's PATH. The units set PATH for a cargo install binary in ~/.cargo/bin; if musefs is elsewhere, edit the Environment=PATH= line (or make ExecStart an absolute path).

  • %h vs ~. Unit files expand %h to your home directory; the musefs.conf EnvironmentFile does not expand %h or ~ — use absolute paths there, and never paste ~/... into a unit directive (it is taken literally).

  • Settings. musefs.conf.example is a commented example of the common MUSEFS_* mount/scan variables (every scalar mount/scan flag has a MUSEFS_* form — uppercase the long flag, dashes to underscores). Explicit flags override env vars; --fallback and scan targets are command-line only (set them in ExecStart).

  • Inline overrides. Prefer systemctl --user edit musefs to add Environment= lines in a drop-in; it survives reinstalls.

  • Headless servers. A --user timer only fires while your user manager runs. For a daily scan when you are not logged in: loginctl enable-linger <user>.

  • Logs. journalctl --user -u musefs -f.
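
The naming rule from the Settings note (uppercase the long flag, dashes to underscores) is mechanical enough to sketch. env_name is a hypothetical helper, and the outputs are what the stated rule implies rather than a list confirmed against musefs.conf.example.

```python
def env_name(long_flag: str) -> str:
    """Derive the MUSEFS_* variable name implied by a long flag such as --db."""
    name = long_flag.lstrip("-")
    if not name:
        raise ValueError("empty flag")
    return "MUSEFS_" + name.upper().replace("-", "_")
```

Under this rule --db becomes MUSEFS_DB, and a dashed flag such as --follow-symlinks would become MUSEFS_FOLLOW_SYMLINKS.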

python-musefs

The shared store-contract library behind the beets, Picard, and Lidarr musefs plugins. It is the single source of truth for how a plugin writes the musefs SQLite store: the schema-version check, the tags / art / track_art writes, sha256 art content-addressing, the realpath_key path normalization, the musefs scan shell-out (run_scan), and the per-file sync write-loop (Record / sync_files).

Field mapping stays in each plugin — beets expands multi-valued genres/composers into one tag each, Picard takes the first value — so this library deliberately does not own it.

Writing a plugin

A plugin turns host metadata (a beets item, a Picard track, a Lidarr release) into musefs store writes. This library owns every store-touching step except the field mapping: you supply the per-file tag and art values, and it handles the schema check, the scan shell-out, content-addressing, and the write loop.

The write flow

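The canonical order is connect → check_schema_version → run_scan → build Records → sync_files → commit → prune_missing. On a brand-new store run_scan has to come first, as in the example below, because connect has nothing to open until the scan has created the file. The caller owns the transaction — nothing here commits for you.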

from musefs_common import (
    SCAN_TIMEOUT_SECONDS,
    ArtImage,
    Record,
    check_schema_version,
    connect,
    prune_missing,
    realpath_key,
    run_scan,
    sync_files,
)


def sync(db_path, files, *, musefs_bin="musefs"):
    # `run_scan` creates the DB if absent and fills the structural columns a
    # plugin cannot compute (format, audio offset/length, backing size/mtime).
    # On a brand-new store it must precede `connect`, which has nothing to open
    # until the scan has created the file.
    run_scan(musefs_bin, db_path, files, timeout=SCAN_TIMEOUT_SECONDS)

    conn = connect(db_path)
    try:
        check_schema_version(conn)  # raises SchemaMismatch on a version skew

        records = [
            Record(
                key=realpath_key(path),  # MUST equal the scanned row's backing_path
                pairs=[("artist", artist), ("title", title)],
                art=[ArtImage(data=cover, mime="image/jpeg")] if cover else None,
            )
            for path, artist, title, cover in host_metadata(files)
        ]

        stats = sync_files(conn, records)  # full-replace of plugin text tags
        conn.commit()  # the caller commits

        prune_missing(conn)  # drop rows whose backing file vanished
        conn.commit()
        return stats
    finally:
        conn.close()

For a dry run, pass dry_run=True to sync_files and conn.rollback() instead of committing — SyncStats still reports what would change.

run_scan raises ScanError (kind ∈ {"not_found", "timeout", "failed"}) and check_schema_version raises SchemaMismatch; a host adapter formats its own user-facing message from the exception attributes (see the beets plugin's _scan_user_error).

The Record shape

One Record per file is your primary output. Its fields:

  • key (type str): The file's identity in the store. Must be realpath_key(path) — the canonicalized absolute path the scanner stored as backing_path. A key that matches no scanned row is silently counted in SyncStats.skipped, not written.
  • pairs (type list[tuple[str, str]]): Ordered (tag_key, value) text tags. Duplicate keys are allowed and get contiguous ordinals (multi-valued tags).
  • art (type list[ArtImage] | None): Embedded pictures, already resolved to bytes. None/[] leaves existing art untouched.
  • delete_keys (type list[str] | None): Merge mode only: keys to clear without rewriting (see below). Ignored in replace mode.

ArtImage(data, mime, picture_type=3, description="") is one picture: data is raw bytes, picture_type is the ID3/FLAC type (3 = front cover). Images larger than MAX_ART_BYTES are dropped and counted in SyncStats.skipped_art.

If every record lands in skipped, the keys and the scan target disagree — both must canonicalize the same way, so scan the real files (not a symlink farm) and build keys with realpath_key.
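The key mismatch described above can be reproduced with the standard library alone. This is a minimal sketch that assumes realpath_key behaves like os.path.realpath (the text suggests this but does not confirm it); canonical_key is a hypothetical stand-in, not the library function.

```python
import os
import tempfile


def canonical_key(path: str) -> str:
    """Hypothetical stand-in for realpath_key: absolute path with symlinks resolved."""
    return os.path.realpath(path)


def demo() -> tuple[str, str]:
    with tempfile.TemporaryDirectory() as root:
        # The real file, as the scanner would see it.
        real = os.path.join(root, "library", "song.flac")
        os.makedirs(os.path.dirname(real))
        open(real, "wb").close()
        # A symlink farm pointing at the same file.
        farm = os.path.join(root, "farm")
        os.makedirs(farm)
        link = os.path.join(farm, "song.flac")
        os.symlink(real, link)
        return canonical_key(real), canonical_key(link)


real_key, link_key = demo()
```

Both paths resolve to the same string, which is why a key built from the link still matches a row scanned from the real file, while a raw (uncanonicalized) link path would not.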

Merge vs. replace, and sticky deletes

sync_files(..., merge=False) (the default) replaces every plugin-owned text tag on each track: it clears all value_blob IS NULL rows and rewrites them from record.pairs. Scanner-written binary tags always survive.

sync_files(..., merge=True) merges: only the keys named in record.pairs and record.delete_keys are touched; other scan-seeded text tags stay. Use merge when your plugin owns a subset of the tags and must not clobber the rest. The store does not remember which keys you manage — you track your managed-key set out of band (the contract is explicit that the store is not the place for plugin state).

Merge-mode key matching is case-insensitive (lower(key) = lower(?)): Vorbis keys render case-insensitively, so a scan that seeds a tag in the file's native case (e.g. LABEL) is correctly replaced when your plugin canonicalizes to lowercase (label), rather than leaving the original row behind as a duplicate.
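A minimal sketch of the case-insensitive match, run against a toy tags table. The table definition and the merge_text_tag helper are illustrative assumptions; only the lower(key) = lower(?) predicate and the value_blob IS NULL scoping come from the text above.

```python
import sqlite3


def merge_text_tag(conn, track_id, key, value):
    """Replace the text rows for `key`, whatever case they were stored in."""
    conn.execute(
        "DELETE FROM tags WHERE track_id = ? AND value_blob IS NULL "
        "AND lower(key) = lower(?)",
        (track_id, key),
    )
    conn.execute(
        "INSERT INTO tags (track_id, key, value, ordinal) VALUES (?, ?, ?, 0)",
        (track_id, key, value),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (track_id INTEGER, key TEXT, value TEXT, "
    "value_blob BLOB, ordinal INTEGER)"
)
# Scan-seeded rows in the file's native case.
conn.execute("INSERT INTO tags VALUES (1, 'LABEL', 'Old Label', NULL, 0)")
conn.execute("INSERT INTO tags VALUES (1, 'ARTIST', 'Someone', NULL, 0)")

# The plugin writes its canonical lowercase key.
merge_text_tag(conn, 1, "label", "New Label")
rows = conn.execute("SELECT key, value FROM tags ORDER BY key").fetchall()
```

The LABEL row is replaced instead of being left behind as a duplicate, and the unmanaged ARTIST row is untouched.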

When the user removes a tag in the host, merge mode needs to delete the now-orphaned store row. The beets plugin solves this with an accumulating managed-key set (the musefs_managed pattern), worth copying:

  • Persist, per file, the set of keys you have ever written (beets uses a flexattr; any per-file host metadata works).
  • On each sync, delete_keys = previous_managed − keys_written_now, and the new persisted set is previous_managed ∪ keys_written_now.
  • A key you stop writing becomes a tombstone: it keeps getting deleted on every sync until you write it again. Persist the managed set only after the store commit succeeds, so a failed sync doesn't lose the record of what you owe.
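The set arithmetic in the bullets above, as a small sketch. plan_sync is a hypothetical name for illustration; the beets reference implementation may structure this differently.

```python
def plan_sync(
    previous_managed: set[str], keys_written_now: set[str]
) -> tuple[set[str], set[str]]:
    """Return (delete_keys, next_managed) for one file."""
    delete_keys = previous_managed - keys_written_now
    next_managed = previous_managed | keys_written_now
    return delete_keys, next_managed


managed = {"artist", "title", "label"}

# The user removed "label" in the host, so this sync no longer writes it.
delete_keys, managed_after = plan_sync(managed, {"artist", "title"})

# A later sync still does not write "label": the tombstone keeps firing.
delete_again, _ = plan_sync(managed_after, {"artist", "title"})
```

Because the persisted set only ever grows, a key that was written once is deleted on every later sync that omits it. Persist managed_after only after the store commit succeeds.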

See contrib/beets/beetsplug/_core.py (build_records / persist_managed) for the reference implementation.

Store invariants you must respect

The full contract is spelled out under "The external-writer contract" in the architecture overview. The rules that bite plugin authors:

  • Write only tags, art, and track_art. The scanner owns the structural columns of tracks and all of structural_blocks; never compute them — run musefs scan (i.e. run_scan). CHECK constraints reject malformed structural shapes at commit, so you cannot persist them anyway.
  • Binary tags survive a sync. merge_tags / replace_tags scope their deletes to text rows (value_blob IS NULL), so the write loop never wipes scanner-written binary tags. You may write binary tags yourself too — a binary row carries its payload in value_blob and must leave value empty (the only CHECK on the row).
  • Content-address art through upsert_art (sha256 de-dup) rather than inserting art rows by hand; sync_files does this for you.
  • Art rows are immutable. A trigger rejects in-place updates of an art row's content columns (data, sha256, mime, byte_len, width, height). To change a track's art, insert a new content-addressed row via upsert_art and relink it via replace_track_art.
  • Path layout is just a tag. To drive a reorganized mount, write your computed relative path into a custom tag (e.g. beets_path) and mount with --template '$!{beets_path}'. musefs sanitizes each path segment, so a writer cannot inject traversal.
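To illustrate what content-addressing by sha256 means for art rows, here is a sketch against a toy art table. upsert_art_sketch and the table layout (including storing the digest as hex) are assumptions for illustration, not the library's upsert_art or the real schema.

```python
import hashlib
import sqlite3


def upsert_art_sketch(conn, data: bytes, mime: str) -> int:
    """Insert the image only if its sha256 is new; return the art row id."""
    digest = hashlib.sha256(data).hexdigest()
    row = conn.execute("SELECT id FROM art WHERE sha256 = ?", (digest,)).fetchone()
    if row is not None:
        return row[0]  # same bytes already stored: reuse the row
    cur = conn.execute(
        "INSERT INTO art (sha256, mime, byte_len, data) VALUES (?, ?, ?, ?)",
        (digest, mime, len(data), data),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE art (id INTEGER PRIMARY KEY, sha256 TEXT UNIQUE, "
    "mime TEXT, byte_len INTEGER, data BLOB)"
)
first = upsert_art_sketch(conn, b"cover-bytes", "image/jpeg")
again = upsert_art_sketch(conn, b"cover-bytes", "image/jpeg")
other = upsert_art_sketch(conn, b"other-bytes", "image/png")
art_count = conn.execute("SELECT count(*) FROM art").fetchone()[0]
```

An album's worth of tracks sharing one cover therefore links to a single art row, which is the de-dup the rule above relies on.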

API reference

Everything in __all__, imported from the top-level musefs_common package.

Connection & schema

  • connect(db_path) → sqlite3.Connection — open with a 5s busy timeout and foreign_keys = ON.
  • check_schema_version(conn) — raise SchemaMismatch unless the store's user_version equals EXPECTED_USER_VERSION.

Scanning

  • run_scan(binary, db_path, target, *, timeout=None) — shell out to musefs scan; target is one path or an iterable, all scanned under one process. Creates the DB if absent. Raises ScanError.

Building records

  • Record(key, pairs=[], art=None, delete_keys=None) — one file's sync inputs (see The Record shape).
  • ArtImage(data, mime, picture_type=3, description="") — one embedded picture.
  • realpath_key(path) — canonical path string matching the scanner's backing_path; accepts str/bytes, returns str.

Writing

  • sync_files(conn, records, *, dry_run=False, stats=None, merge=False) → SyncStats — the write loop; caller owns the transaction. Pass stats to accumulate into a caller-seeded instance.
  • sync_one(conn, record, stats, *, dry_run=False, merge=False) — sync a single record into a caller-supplied SyncStats.
  • SyncStats — synced / skipped / art_linked / skipped_art / skipped_invalid counters, plus .summary(). A record whose tags or art violate a store CHECK constraint is rolled back and skipped (not raised), bumping skipped_invalid and appending (record.key, message) to the invalid list — one malformed record never aborts the batch.

Lower-level store helpers (called for you by sync_files; use directly only for a custom write loop)

  • track_id_for_path(conn, key) → track id or None.
  • merge_tags(conn, track_id, managed_pairs, delete_keys) — per-key replace of plugin-managed text tags, leaving unmanaged text rows intact.
  • replace_tags(conn, track_id, pairs) — replace all plugin-owned text tags.
  • upsert_art(conn, data, mime) → art id — content-address data by sha256, inserting only if new.
  • replace_track_art(conn, track_id, arts) — replace a track's track_art rows; arts is [(art_id, picture_type, description), …].
  • sniff_mime(data, path) — image mime from magic bytes, falling back to file extension.
  • prune_missing(conn, track_ids=None) → count — delete tracks whose backing file no longer exists (every track, or just track_ids).
  • delete_tracks(conn, track_ids) → count — unconditionally delete the given track rows (intent-based, unlike prune_missing's on-disk existence check); their tags and track_art rows cascade away.

Reading

  • track_ids_for_paths(conn, keys) → {key: id} — bulk backing_path → track id; keys with no matching row are omitted. Chunked under SQLite's parameter cap, so arbitrarily large lookups are safe (the bulk track_id_for_path).
  • track_ids_by_tag(conn, key, value) → [id, …] — track ids whose plugin-owned text tag (key, value) matches (scanner-written binary tags never match); maps a source's "I deleted this album/artist" signal back to the rows it tagged.
  • tags_for_track(conn, track_id) → [TagRow, …] ordered by key then ordinal, covering both plugin-owned text tags and scanner-written binary tags.
  • TagRow(key, value, value_blob) — one read-back tag row. Text tags have value_blob is None; binary tags have value == "" and value_blob bytes.
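The chunking that track_ids_for_paths is described as doing can be sketched as follows. ids_for_paths, the CHUNK value, and the toy tracks table are assumptions for illustration; the library's actual chunk size is not stated here. The value 500 is chosen to stay under the 999-parameter default of older SQLite builds.

```python
import sqlite3

CHUNK = 500  # hypothetical; below SQLite's historical 999 bound-parameter default


def ids_for_paths(conn, keys):
    """Bulk backing_path -> id lookup; keys with no row are simply absent."""
    keys = list(keys)
    found = {}
    for start in range(0, len(keys), CHUNK):
        chunk = keys[start:start + CHUNK]
        marks = ",".join("?" * len(chunk))  # one placeholder per key
        query = f"SELECT backing_path, id FROM tracks WHERE backing_path IN ({marks})"
        for path, track_id in conn.execute(query, chunk):
            found[path] = track_id
    return found


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, backing_path TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO tracks (id, backing_path) VALUES (?, ?)",
    [(i + 1, f"/music/{i}.flac") for i in range(2000)],
)
wanted = [f"/music/{i}.flac" for i in range(2000)] + ["/music/missing.flac"]
found = ids_for_paths(conn, wanted)
```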

Constants

  • EXPECTED_USER_VERSION — schema user_version this library targets.
  • MAX_ART_BYTES — per-image art cap; larger images are skipped.
  • SCAN_TIMEOUT_SECONDS — default wall-clock cap for one run_scan.

Exceptions

  • SchemaMismatch(found) — schema-version skew; .found is the DB's version.
  • ScanError(kind, *, binary, target, …) — a musefs scan failure; .kind ∈ {"not_found", "timeout", "failed"}, with context attributes for messaging.

Consumers

  • beets depends on this package via pip (contrib/beets/pyproject.toml).

  • Picard cannot pip-install plugin dependencies, so the package is vendored into contrib/picard/musefs/_common/ by vendor_to_picard.py. After any change here, re-run:

    python contrib/python-musefs/vendor_to_picard.py
    

    The Picard test tests/test_vendor_sync.py fails if the committed copy drifts.

  • Lidarr depends on this package via pip (contrib/lidarr/pyproject.toml).

Schema coupling

musefs_common/schema.py (SCHEMA_SQL, USER_VERSION) is generated from the Rust migrations in musefs-db/src/schema.rs — do not edit it by hand. EXPECTED_USER_VERSION (in constants.py) derives from it. When the Rust schema bumps, regenerate and re-vendor:

MUSEFS_REGEN_SCHEMA_PY=1 cargo test -p musefs-db schema_py
python contrib/python-musefs/vendor_to_picard.py

A musefs-db unit test fails if the generated file drifts. This is all independent of the package's own __version__ (its release SemVer).

Tests

cd contrib/python-musefs
python -m venv .venv && source .venv/bin/activate
pip install -e ".[test]"
python -m pytest -v
ruff check . && ruff format --check .

Architecture overview

This is the technical reference for musefs internals: how a virtual file is assembled, how the workspace is layered, what the SQLite store guarantees, and how external edits become visible without a remount. For usage, see the User Guide; for the development workflow, see Contributing; for per-format behavior, see the format docs.

Design overview

musefs is a read-only passthrough FUSE filesystem with one cardinal invariant: original audio bytes are never copied or modified. A served file is not a transcoded or rewritten copy — it is assembled on the fly by splicing a freshly generated metadata region in front of positioned reads of the untouched backing file. The SQLite store is the source of truth for tags, art, and each file's audio byte range; the backing directory is the source of truth for the audio itself.

Crate layout

A strict layered Cargo workspace; dependencies point one way only:

musefs-db   ─┐                 SQLite store: schema/migrations, tracks/tags/art access
musefs-format┘← (db)           format byte-surgery: metadata synthesis + RegionLayout
        ↑
musefs-core ← (db, format)     orchestration: virtual tree, resolution, scanning, refresh
        ↑
musefs-fuse ← (core)           thin FUSE adapter (fuser)
        ↑
musefs-cli  ← (core, fuse, db) clap commands library (scan/mount logic)
musefs      ← (cli)            thin binary entrypoint; published as `musefs`

musefs-core is the integration layer — cross-cutting logic belongs there. musefs-fuse, musefs-cli, and the musefs binary crate are deliberately thin; the FUSE adapter's job is translating kernel requests into core calls (and dispatching blocking reads onto a worker pool with per-thread reusable buffers, so a slow backing read never stalls the FUSE dispatch thread).

The workspace also carries musefs-latencyfs, a dev/bench-only crate (publish = false): a latency-injecting passthrough FUSE filesystem used by the benchmarks harness to simulate slow backing stores. It is not part of the shipping dependency graph (core uses it only as a dev-dependency).

The serving model

The segment model

A synthesized virtual file is described by a RegionLayout (musefs-format/src/layout.rs): an ordered list of Segments whose lengths sum to the served file size. Six variants:

  • Inline(Vec<u8>) — generated framing/text bytes (an ID3v2 tag, FLAC metadata blocks, a RIFF front), fully materialized at resolve time.
  • ArtImage { art_id, len } — embedded cover art; only the length lives in the layout. Image bytes stream from the DB blob in chunks at read time and are never buffered whole. This invariant also holds for Ogg synthesis, where page CRCs are computed from page-bounded ArtSource windows (previously the documented exception).
  • BackingAudio { offset, len } — a run of the original file's audio frames, served by positioned reads (read_exact_at) against the backing file.
  • OggAudio { offset, len, seq_delta } — original Ogg audio pages served with each page's sequence number shifted by seq_delta and its CRC recomputed in place (a resized header changes the page count). The byte length is unchanged — renumbering patches, never recopies.
  • OggArtSlice { art_id, offset, len, base64, art_total } — a window of an embedded picture served lazily from the blob store; when base64, the window is base64-encoded incrementally at read time.
  • BinaryTag { payload_id, len } — an opaque binary tag payload (e.g. an ID3 PRIV frame body or a FLAC APPLICATION block body) streamed from the DB at read time.
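To make the OggAudio variant concrete, the sketch below renumbers one Ogg page and recomputes its checksum. The field offsets (sequence number at byte 18, CRC at byte 22) and the CRC parameters (polynomial 0x04C11DB7, zero initial value, no reflection) follow the Ogg page format. The helper names are hypothetical, and the real serve path may patch the CRC differently from this whole-page recompute.

```python
import struct


def ogg_crc(data: bytes) -> int:
    """Ogg page checksum: CRC-32, poly 0x04C11DB7, init 0, unreflected."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            crc = ((crc << 1) ^ 0x04C11DB7) if crc & 0x80000000 else (crc << 1)
            crc &= 0xFFFFFFFF
    return crc


def renumber_page(page: bytes, seq_delta: int) -> bytes:
    """Shift the page sequence number and rewrite the CRC; length is unchanged."""
    buf = bytearray(page)
    (seq,) = struct.unpack_from("<I", buf, 18)
    struct.pack_into("<I", buf, 18, (seq + seq_delta) & 0xFFFFFFFF)
    struct.pack_into("<I", buf, 22, 0)  # the CRC is computed with its field zeroed
    struct.pack_into("<I", buf, 22, ogg_crc(bytes(buf)))
    return bytes(buf)


def crc_is_valid(page: bytes) -> bool:
    buf = bytearray(page)
    (stored,) = struct.unpack_from("<I", buf, 22)
    struct.pack_into("<I", buf, 22, 0)
    return stored == ogg_crc(bytes(buf))


def make_page(seq: int, payload: bytes) -> bytes:
    """Build a tiny single-segment page for the demo (payload under 255 bytes)."""
    header = (
        b"OggS"
        + bytes([0, 0])                    # version, header type
        + struct.pack("<qII", 0, 1, seq)   # granule position, serial, sequence
        + bytes(4)                         # CRC placeholder
        + bytes([1, len(payload)])         # one segment and its lacing value
    )
    return renumber_page(header + payload, 0)


page = make_page(3, b"audio")
shifted = renumber_page(page, 2)
(new_seq,) = struct.unpack_from("<I", shifted, 18)
```

Only eight header bytes differ between page and shifted; the payload is byte-identical, which is the "patches, never recopies" property.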

read_at (musefs-core/src/reader.rs) serves a byte range by walking the segments and splicing: inline bytes are copied, art and binary-tag payloads are read from the DB in chunks, backing audio comes from positioned reads of the original file, and Ogg pages are renumbered and CRC-patched in flight. This is how the cardinal invariant holds end to end. Layouts that stream any payload from the DB by rowid — binary tags and art (ArtImage / OggArtSlice) — are flagged (RegionLayout::streams_db_rowid) so the reader wraps those reads in a single WAL snapshot with a content_version recheck. A concurrent retag (delete + reinsert reusing a freed rowid) cannot interleave bytes from two generations of a tag or splice the wrong image. Both the per-handle fast path and the stateless no-fh fallback apply the guard, and the fallback re-validates its freshly opened backing fd against the resolved stamp.
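The splice can be modeled in a few lines. This is a simplified Python model with only two segment kinds, not the Rust implementation; the class names mirror the variants above, but the read_at signature and the length field are assumptions made for the sketch.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Inline:
    data: bytes

    @property
    def length(self) -> int:
        return len(self.data)


@dataclass
class BackingAudio:
    offset: int
    length: int


def read_at(segments, backing_fd: int, pos: int, size: int) -> bytes:
    """Serve bytes [pos, pos + size) of the virtual file by walking segments."""
    out = bytearray()
    end = pos + size
    seg_start = 0
    for seg in segments:
        seg_end = seg_start + seg.length
        lo, hi = max(pos, seg_start), min(end, seg_end)
        if lo < hi:  # the request overlaps this segment
            rel = lo - seg_start
            if isinstance(seg, Inline):
                out += seg.data[rel:rel + (hi - lo)]
            else:
                # Positioned read against the untouched backing file.
                out += os.pread(backing_fd, hi - lo, seg.offset + rel)
        seg_start = seg_end
    return bytes(out)


with tempfile.TemporaryFile() as backing:
    backing.write(b"OLDTAG" + b"AUDIOFRAMES")  # original tag + audio
    backing.flush()
    layout = [Inline(b"NEWHEADER"), BackingAudio(offset=6, length=11)]
    whole = read_at(layout, backing.fileno(), 0, 20)
    window = read_at(layout, backing.fileno(), 5, 8)  # spans both segments
```

The backing file is only ever read, and the old tag bytes at offsets 0 to 5 are never served.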

Backing read-ahead

Every backing read — BackingAudio splices and the serve_ogg_window page walk alike — flows through a single BackingReader::read_exact_at (musefs-core/src/readahead.rs). It caches raw backing-file bytes keyed by absolute backing offset in a per-handle adaptive window: a sequential miss reads one large pread (geometric growth up to a per-stream cap) instead of the ≤256 KiB FUSE chunk, so a high-latency backing client (NFS, remote) can pipeline the RPCs behind one syscall; a seek resets the window to the floor. All handles draw from one process-wide RAM budget (--read-ahead-budget-mib, default 64) with deadlock-free try_lock LRU eviction. Keying on the absolute backing offset (not the synthesized output) makes the cache retag-immune, and serving still flows through the per-read validate_opened_backing re-stat, so the cardinal audio-bytes invariant and freshness semantics are untouched. An optional Phase-2 background-prefetch layer (--read-ahead-prefetch) exists but is off by default — read amplification carries the whole win (see the backing read-ahead benchmarks).
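A sketch of the window-sizing policy only (no caching, budget, or eviction). The doubling factor and the floor and cap values are assumptions picked for the demo; the text above says only that growth is geometric up to a per-stream cap and that a seek resets to the floor.

```python
class AdaptiveWindow:
    """Decide how many bytes to pread on a cache miss."""

    def __init__(self, floor: int, cap: int):
        self.floor = floor
        self.cap = cap
        self.size = floor
        self.next_offset = None  # where a sequential reader would miss next

    def on_miss(self, offset: int) -> int:
        if offset == self.next_offset:
            # Sequential miss: grow geometrically, bounded by the cap.
            self.size = min(self.size * 2, self.cap)
        else:
            # First read or a seek: fall back to the floor.
            self.size = self.floor
        self.next_offset = offset + self.size
        return self.size


window = AdaptiveWindow(floor=4, cap=16)
sizes = [window.on_miss(off) for off in (0, 4, 12, 28, 100)]
```

The first four misses are sequential and grow the read to the cap; the jump to offset 100 is a seek and drops back to the floor.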

How each format builds its layout differs enough to warrant its own document: FLAC, MP3, M4A, Ogg, WAV.

Mount modes

musefs_core::Mode selects one of two behaviors at mount time:

  • Synthesis (default) — the metadata region is generated from the DB and spliced ahead of the backing audio, as above. Resolve-time validation guards the stored audio bounds: if audio_offset + audio_length runs past the backing file's current length, the row no longer matches the file and the resolve fails with a controlled BackingChanged error.
  • StructureOnly — pure passthrough: the layout is a single whole-file BackingAudio segment, so the original bytes are served verbatim under the templated tree. Stored audio bounds are irrelevant (the whole file is served) and are not validated in this mode.

In StructureOnly mode, on kernels with FUSE passthrough (6.9+) and a daemon holding CAP_SYS_ADMIN (kernel-gated: run as root or setcap cap_sys_admin=ep the binary), each open registers the backing fd with the kernel and reads bypass the daemon entirely. The capability check is performed at mount time and its absence pre-announced; if registration fails at runtime anyway, passthrough is disabled for the rest of the session (later opens skip the doomed ioctl) and reads fall back to the daemon silently. Freshness for a passthrough handle is open-time-only — it is a plain POSIX fd onto the backing file. In Synthesis mode no single fd represents the spliced bytes, so passthrough never applies.

Synthetic telemetry namespace

When --expose-metrics is on, the root directory gains a synthetic .musefs-metrics/ entry backed by reserved inodes at u64::MAX - 1 (dir) and u64::MAX - 2 (file) — the same "top of the u64 space" trick the Spotlight marker uses, since InodeAllocator starts at 2 and only increments. The directory and file are disjoint from the macOS Spotlight marker at u64::MAX.

The metrics file is /proc-style: it advertises st_size == 0 and is served via FOPEN_DIRECT_IO, so readers must read to EOF rather than trusting the stated size. Content is rendered at open time from a snapshot of CoreTelemetry (header/size caches, read-ahead budget/charge, virtual-tree footprint, refresh health), FuseTelemetry (uptime, read/dir-handle gates, worker pool, passthrough state), and optional jemalloc/syscall counters (including read-ahead hit/miss) — see musefs-core/src/telemetry.rs for the full metric list. This namespace deliberately bypasses the virtual tree (VirtualTree) and the RegionLayout / segment model: it is injected into root-directory readdir and resolved by direct inode checks, so the cardinal audio path is untouched.

The store & external-writer contract

The SQLite store

musefs-db/src/schema.rs defines the schema as an ordered list of migrations (MIGRATIONS: the MIGRATION_V1 baseline plus MIGRATION_V2, which adds the scanner-owned fingerprint/content_hash columns); user_version records the schema version (2). The store is the interface external tools write to — the beets and Picard plugins under contrib/ write tags and art here out-of-band.

  • The baseline schema (MIGRATION_V1): the core tables — tracks (one row per backing file: path, format, audio byte range, size/nanosecond-mtime/ctime stamps, content_version), tags (multi-value key/value rows ordered by ordinal, with an optional value_blob for binary tags), art (content-addressed, deduplicated image blobs), track_art (per-track art links with picture type and ordering), and structural_blocks (read-only, derived-from-file FLAC STREAMINFO/SEEKTABLE metadata, not part of the editable contract). Deleting a track cascades to its tags and track_art rows. Triggers bump the owning track's content_version/updated_at on any tags/track_art edit; CHECK constraints enforce the contract invariants below at commit time. A bounded, self-pruning track_changes ring (capacity 8192, CHANGELOG_CAP) fed by triggers on tracks gives O(changed) refresh — every metadata edit funnels through an UPDATE on the tracks row, relying on SQLite's nested trigger activation (on by default). Freshness-superset triggers make content_version cover every DB-knowable input to synthesized bytes: art_reject_content_update (art is content-addressed and immutable), art_ad (a deleted art row bumps referencing tracks so an orphan rebuilds to a clean serve-time error), tracks_geometry_au (scanner-owned geometry changes), and structural_blocks_ai/_ad.
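The trigger funnel and the self-pruning ring can be demonstrated on a toy schema. The trigger names tags_ai and tracks_au, the table shapes, and the capacity of 4 are assumptions for the demo (the real ring holds 8192 entries); this is not the schema from schema.rs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tracks (
        id INTEGER PRIMARY KEY,
        content_version INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE tags (track_id INTEGER, key TEXT, value TEXT);
    CREATE TABLE track_changes (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        track_id INTEGER
    );

    -- A tag edit funnels into an UPDATE on the owning tracks row.
    CREATE TRIGGER tags_ai AFTER INSERT ON tags BEGIN
        UPDATE tracks SET content_version = content_version + 1
        WHERE id = NEW.track_id;
    END;

    -- That UPDATE fires a second (nested) trigger that feeds the ring
    -- and prunes it to the newest 4 entries.
    CREATE TRIGGER tracks_au AFTER UPDATE ON tracks BEGIN
        INSERT INTO track_changes (track_id) VALUES (NEW.id);
        DELETE FROM track_changes
        WHERE seq <= (SELECT max(seq) FROM track_changes) - 4;
    END;

    INSERT INTO tracks (id) VALUES (1);
    """
)

for i in range(6):
    conn.execute("INSERT INTO tags VALUES (1, 'comment', ?)", (f"v{i}",))

version = conn.execute(
    "SELECT content_version FROM tracks WHERE id = 1"
).fetchone()[0]
ring = [row[0] for row in conn.execute("SELECT seq FROM track_changes ORDER BY seq")]
```

Six edits bump the version six times, while the ring keeps only the four most recent change records.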

The external-writer contract

Ownership. External tools get full read/write on tags, art, and track_art. The scanner owns the structural columns of tracks (id, backing_path, format, audio_offset, audio_length, backing_size, backing_mtime_ns, backing_ctime_ns, content_version, updated_at) and all of structural_blocks: those are derived from probing the file, and external tools must run musefs scan rather than compute them.

tracks.fingerprint and tracks.content_hash are also scanner-owned, read-only-derived columns — like structural_blocks, they are never part of the editable tag contract and external tools never write them. fingerprint is a SHA-256 over the probe's parsed output (deterministic per file, excludes filesystem stamps such as mtime/ctime), computed in the parallel probe worker at zero extra I/O. content_hash is a full-file SHA-256, stored as 64-char hex; it is computed only at the full checksum tier (--checksum=full), which requires an eager whole-file read. Neither column is UNIQUE by design — duplicate-content tracks legitimately share both values. On a normal scan, when a probed file's path is not yet in the store and its fingerprint matches exactly one orphaned row (a row whose backing_path no longer exists on disk), the scanner retargets that row to the new path in place, preserving its id, tags, and art rather than orphaning them. This is how musefs recovers from a backing-library move or reorganization: run musefs scan after moving files, and existing store rows follow their backing files to the new locations.
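The "exactly one orphaned row" rule can be sketched as a pure function. pick_retarget and the row tuple shape are hypothetical; the scanner's real logic lives in Rust and may apply further conditions.

```python
def pick_retarget(rows, fingerprint, exists):
    """Return the track id to retarget, or None.

    rows: iterable of (track_id, backing_path, fingerprint).
    exists: callable telling whether a backing path is still on disk.
    """
    orphans = [
        track_id
        for track_id, path, fp in rows
        if fp == fingerprint and not exists(path)
    ]
    # Retarget only when the match is unambiguous.
    return orphans[0] if len(orphans) == 1 else None


on_disk = {"/new/b.flac"}
exists = on_disk.__contains__

moved = pick_retarget([(1, "/old/a.flac", "fp-a")], "fp-a", exists)
ambiguous = pick_retarget(
    [(1, "/old/a.flac", "fp-a"), (2, "/old/copy.flac", "fp-a")], "fp-a", exists
)
still_present = pick_retarget([(3, "/new/b.flac", "fp-b")], "fp-b", exists)
```

Two orphans with the same fingerprint are left alone, since duplicate-content tracks legitimately share a fingerprint and there is no way to tell which one moved.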

What the store enforces. SQLite CHECK constraints reject the malformed shapes at commit, so an external writer cannot persist them:

  • an unknown format string, or a negative length/offset/size/version;
  • an audio_offset + audio_length running past the stored backing_size;
  • a binary tag row whose value is non-empty;
  • an art.byte_len that disagrees with its blob, or a sha256 of the wrong length;
  • a picture_type outside 0..=20;
  • a tags.key over 256 chars or tags.value over 256 KiB;
  • an empty tags.key, or one containing ASCII control characters. The CHECK has one blind spot: an embedded NUL terminates SQLite's length()/GLOB, so a key like a\0b slips past it; the scanner's own floor drops such a key before insert, and the Vorbis path rejects it on synthesis. Additionally, only keys within the Vorbis field-name grammar (ASCII 0x20–0x7D, excluding =) survive FLAC/Ogg synthesis; others are dropped and logged. MP3/M4A custom keys may use the wider set (e.g. =, :, spaces, non-ASCII);
  • a value_blob over MAX_BINARY_TAG_BYTES;
  • an art.mime over 255 chars or byte_len over MAX_ART_BYTES;
  • a track_art.description over 1 KiB;
  • a structural_blocks row with an unknown kind, negative ordinal, or body over the FLAC 24-bit block limit.
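From a Python writer's point of view, a rejected shape surfaces as sqlite3.IntegrityError on the offending statement. The toy table below carries two CHECKs modeled on the list above (binary rows must leave value empty, and keys are 1 to 256 characters); the exact constraint text in the real schema is not shown here, so treat these expressions as assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tags (
        track_id   INTEGER NOT NULL,
        key        TEXT NOT NULL,
        value      TEXT NOT NULL DEFAULT '',
        value_blob BLOB,
        ordinal    INTEGER NOT NULL,
        CHECK (value_blob IS NULL OR value = ''),
        CHECK (length(key) BETWEEN 1 AND 256)
    )
    """
)


def try_insert(key, value, blob) -> bool:
    """Return True if the row was accepted, False if a CHECK rejected it."""
    try:
        conn.execute(
            "INSERT INTO tags (track_id, key, value, value_blob, ordinal) "
            "VALUES (1, ?, ?, ?, 0)",
            (key, value, blob),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("priv", "", b"\x00\x01"),      # well-formed binary tag
    try_insert("priv", "oops", b"\x00\x01"),  # binary row with non-empty value
    try_insert("", "x", None),                # empty key
]
```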

Schema identity. On open, musefs also validates schema identity: a sqlite_master comparison against a freshly-migrated reference plus PRAGMA foreign_key_check, rejecting anything that is not the canonical latest schema with a message telling the user to run musefs scan. A store whose user_version is newer than this binary's latest migration (a future or third-party tool bumped the schema) is refused up front with a distinct "store is newer than this binary" error rather than silently treated as already-migrated — an older binary must not risk misreading a newer contract.

Art is immutable once written. art rows are content-addressed by sha256; a trigger rejects any in-place UPDATE of an art row's content columns (data, sha256, mime, byte_len, width, height) with RAISE(ABORT) — a multi-row UPDATE art touching any content column aborts the whole statement. To change a track's art, insert a new content-addressed row and relink it via track_art (which bumps content_version); do not mutate an existing row. Deleting an art row still referenced by track_art (possible only with foreign_keys OFF) bumps every referencing track so the mount serves a clean EIO on the now-orphaned reference instead of stale bytes.
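A sketch of how a trigger can enforce this. The trigger name art_reject_content_update appears elsewhere in this document, but the trigger body, the message, and the reduced column list are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE art (
        id INTEGER PRIMARY KEY, sha256 TEXT, mime TEXT, data BLOB
    );
    CREATE TRIGGER art_reject_content_update
    BEFORE UPDATE OF data, sha256, mime ON art
    BEGIN
        SELECT RAISE(ABORT, 'art rows are immutable');
    END;
    INSERT INTO art (sha256, mime, data) VALUES ('aa', 'image/jpeg', x'ffd8');
    """
)

try:
    conn.execute("UPDATE art SET data = x'00' WHERE id = 1")
    rejected = False
except sqlite3.DatabaseError:
    rejected = True  # the whole UPDATE statement was aborted

data_after = conn.execute("SELECT data FROM art WHERE id = 1").fetchone()[0]
```

The stored bytes are unchanged after the failed update, so the only way to change a track's art is a new row plus a track_art relink.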

What musefs defends at serve time. CHECKs cannot catch a scanner-owned field mutated to a well-formed value that no longer matches the real file on disk: backing_size or backing_mtime_ns/backing_ctime_ns that drift from the actual file's stat, or audio bounds that fit the stored backing_size but overrun the file once it has shrunk. musefs re-stats the backing file on every resolve and treats such rows as untrusted input, degrading to a controlled BackingChanged/layout error, never undefined behavior. The store's CHECK rejects art over MAX_ART_BYTES (16 MiB − 64 KiB) at write time; resolve also re-checks it (ArtTooLarge, all formats) to backstop a writer that disables check enforcement, and the scanner's ingest-time drop is tracked in #284. Referential gaps are treated the same way: a track_art row whose art_id has no matching art row (an orphan an external writer can produce with FK enforcement disabled) fails the serve with EIO rather than silently dropping the art.

Merge vs. replace. An external writer may merge rather than fully replace text tags — overwriting only the keys it manages and leaving the rest of the scan-seeded set in place — provided it tracks its own managed-key set out of band (the beets plugin uses a beets flexattr; the store is not the place for plugin state). musefs renders tags outside its native VOCAB (musefs-format/src/tagmap.rs) by passthrough (Vorbis uppercased, mp3 TXXX, mp4 freeform), so such tags appear but are not guaranteed byte-identical to a given tagger's own per-format encoding. A merge matches the keys it manages case-insensitively, so a writer's canonical (lowercase) key replaces a scan-seeded row stored under the backing file's native case (e.g. Vorbis LABEL) instead of coexisting with it — Vorbis keys render case-insensitively, so two such rows would otherwise duplicate.

Path layout offload. External tools can also offload path layout entirely: a plugin evaluates its own (arbitrarily complex) path logic, writes the resulting relative path into a custom text tag — e.g. INSERT INTO tags (track_id, key, value, ordinal) VALUES (?, 'beets_path', 'Pink Floyd/Animals/01 Pigs', 0) — and the user mounts with --template '$!{beets_path}'. Because the field map is just the (lowercased) tag keys, any number of such tags (beets_path, lidarr_path, …) can back different concurrent mounts. The path field keeps embedded / as directory separators but sanitizes each segment and drops empty/./.. segments, so a misbehaving writer cannot inject traversal or empty components into the tree.
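The segment handling for a path field can be sketched like this. Only the rules stated above are modeled (split on /, drop empty, . and .. segments, sanitize each remaining segment); replacing control characters with '_' is the one sanitization shown, and the function names are hypothetical.

```python
def sanitize_segment(segment: str) -> str:
    """Replace ASCII control characters with '_' (one part of sanitization)."""
    return "".join(
        "_" if ord(ch) < 0x20 or ord(ch) == 0x7F else ch for ch in segment
    )


def expand_path_field(value: str) -> list[str]:
    """Turn a precomputed relative path into safe directory components."""
    components = []
    for raw in value.split("/"):
        if raw in ("", ".", ".."):
            continue  # no empty components and no traversal
        components.append(sanitize_segment(raw))
    return components


normal = expand_path_field("Pink Floyd/Animals/01 Pigs")
hostile = expand_path_field("../../etc//./passwd\x00")
```

A hostile value collapses to ordinary components under the mount root, so the worst a misbehaving writer can do is produce an oddly named directory.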

The shared Python library. contrib/python-musefs/ encodes this contract for plugin authors, including a generated copy of the schema (musefs_common/schema.py, regenerated from schema.rs by a drift-guarded test — see CONTRIBUTING). Its tag/art replace operations each wrap their DELETE+INSERT in a SQLite savepoint, so they are individually atomic and the "caller owns the transaction" guarantee holds even on an autocommit connection. The Lidarr integration uses the same shared library from a Custom Script workflow. Its Lidarr destination tree is only a tracking aid, made of symlinks by default; musefs remains the consumer-facing filesystem.
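The savepoint pattern looks roughly like the sketch below, shown on an autocommit connection where there is no enclosing transaction to lean on. replace_text_tags, the savepoint name, the toy table, and the simplified ordinal assignment are assumptions, not the library's replace_tags.

```python
import sqlite3


def replace_text_tags(conn, track_id, pairs):
    """DELETE + INSERT as one atomic unit, even without an outer transaction."""
    conn.execute("SAVEPOINT replace_tags")
    try:
        conn.execute(
            "DELETE FROM tags WHERE track_id = ? AND value_blob IS NULL",
            (track_id,),
        )
        for ordinal, (key, value) in enumerate(pairs):  # ordinals simplified
            conn.execute(
                "INSERT INTO tags (track_id, key, value, ordinal) "
                "VALUES (?, ?, ?, ?)",
                (track_id, key, value, ordinal),
            )
    except Exception:
        conn.execute("ROLLBACK TO replace_tags")  # undo the DELETE as well
        conn.execute("RELEASE replace_tags")
        raise
    conn.execute("RELEASE replace_tags")


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute(
    "CREATE TABLE tags (track_id INTEGER, key TEXT CHECK (key <> ''), "
    "value TEXT, value_blob BLOB, ordinal INTEGER)"
)
conn.execute("INSERT INTO tags VALUES (1, 'artist', 'Old', NULL, 0)")

try:
    replace_text_tags(conn, 1, [("artist", "New"), ("", "bad")])
except sqlite3.IntegrityError:
    pass  # the second pair violates the CHECK
after_failure = conn.execute("SELECT key, value FROM tags").fetchall()

replace_text_tags(conn, 1, [("artist", "New")])
after_success = conn.execute("SELECT key, value FROM tags").fetchall()
```

Without the savepoint, the failed call on an autocommit connection would have committed the DELETE and left the track with no text tags.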

CI proves this contract end to end in the contract job (see CONTRIBUTING): a Python writer's tags/art, layered on a scanned track, are synthesized by the Rust serve path and read back by an independent reader.

External writers prune in one of two ways depending on how they own files. For in-place writers (e.g. the beets plugin), existence-based pruning — dropping the row of a removed backing file — is a deliberate act owned by musefs scan --revalidate; the plugin never prunes on its own (it exposes the revalidate scan via beet musefs --revalidate). The prune_missing helper in musefs_common implements the same by-existence delete for writers that prefer to own pruning themselves. Link-tree writers (e.g. the Lidarr integration) never delete the backing files they point at, so they prune by identity instead: a source-reported album/artist deletion removes the rows carrying the matching MusicBrainz id.

Connections are mode-typed (Db<ReadWrite> / Db<ReadOnly>), opened in WAL mode with a busy timeout. The serve path uses a DbPool whose per-thread variant hands each reader thread its own connection — WAL reads never contend.

Freshness, tree & scanning

Freshness: two version counters

Two distinct counters drive correctness; they answer different questions.

content_version (per-track column) answers "did this track's served bytes change?". The DB triggers increment it on any input the database can see that changes synthesized bytes: tag and track_art edits, art-row deletes that orphan a reference, scanner-owned geometry changes (format, audio bounds, backing size/nanosecond-mtime), and FLAC structural-block changes. It is therefore a superset key — the one input it cannot cover is an on-disk backing change with no DB write, which resolve (and, since #279, a size-cache getattr hit) catches by re-statting the backing file and degrading to BackingChanged. The scanner stamps the backing file's (size, mtime_ns, ctime_ns) tuple from the probed file descriptor using a pre/post fstat sandwich: if the file's metadata changes between the two stats, the entry is dropped. ctime defeats an mtime-forging writer (e.g. touch -m). The HeaderCache (reader.rs) — a byte-budgeted concurrent cache (64 MiB default) of resolved layouts — keys each entry on it: a hit with a stale content_version rebuilds the layout. Independently of the cache, every resolve re-stats the backing file and errors with BackingChanged if its size, mtime, or ctime drifted from the scanned values, so a silently replaced backing file is never spliced at stale offsets. The per-handle read path re-stats the held descriptor on every read too, so this guarantee holds on the hot path and not only through resolve().
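The pre/post fstat sandwich can be sketched with os.fstat. The function names and the probe callback are hypothetical; the stamp tuple matches the (size, mtime_ns, ctime_ns) triple named above.

```python
import os
import tempfile


def stamp(fd: int) -> tuple[int, int, int]:
    st = os.fstat(fd)
    return (st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def probe_with_sandwich(fd: int, probe):
    """Run `probe(fd)`; return (stamp, result), or None if the file moved."""
    before = stamp(fd)
    result = probe(fd)
    after = stamp(fd)
    if before != after:
        return None  # metadata changed mid-probe: drop the entry
    return before, result


fd, path = tempfile.mkstemp()
os.write(fd, b"audio")

# A quiet probe: nothing touches the file between the two stats.
stable = probe_with_sandwich(fd, lambda f: os.pread(f, 5, 0))


def racing_probe(f):
    # Simulate a concurrent writer appending while the probe runs.
    with open(path, "ab") as writer:
        writer.write(b"!")
    return os.pread(f, 5, 0)


raced = probe_with_sandwich(fd, racing_probe)
os.close(fd)
os.unlink(path)
```

Taking both stats from the same open descriptor means the stamp describes the bytes that were actually probed, not whatever file currently sits at the path.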

data_version (PRAGMA data_version, whole-DB) answers "did anyone commit anything?". Musefs::poll_refresh compares it to the last seen value; on a change it consults the track_changes ring and applies an incremental, O(changed) rebuild: only the affected tracks' tree entries are re-rendered, exactly the removed tracks' cache entries are dropped, and the inodes whose content_version rose are reported to the FUSE layer. If the mount slept past the ring's capacity (or the ring was truncated), it falls back to a full tree rebuild — correct by construction, and a bulk change wants one anyway. The new version stamp is committed only after a successful rebuild; failures arm a retry backoff.
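PRAGMA data_version is a standard SQLite pragma: its value, as seen by one connection, changes when a different connection commits. The sketch below shows that signal with two plain sqlite3 connections; the writer and mount names are just labels for the demo, and the toy table is not the real schema.

```python
import os
import sqlite3
import tempfile


def data_version(conn) -> int:
    return conn.execute("PRAGMA data_version").fetchone()[0]


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "store.db")

    writer = sqlite3.connect(db_path)  # stands in for a plugin
    writer.execute("CREATE TABLE tags (key TEXT)")
    writer.commit()

    mount = sqlite3.connect(db_path)   # stands in for the mounted daemon
    last_seen = data_version(mount)
    unchanged = data_version(mount) == last_seen  # nobody committed yet

    writer.execute("INSERT INTO tags VALUES ('artist')")
    writer.commit()
    changed = data_version(mount) != last_seen    # another connection committed

    writer.close()
    mount.close()
```

The pragma says only that something was committed, not what; that is why the poll then consults the track_changes ring to find the affected tracks.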

The FUSE layer fires poll_refresh on metadata ops (lookup, readdir, …) off the dispatch thread, so external edits appear without remounting. Polling is debounced (--poll-interval-ms) and rebuilds are single-flighted: a metadata-op storm costs at most one rebuild per interval. When mounted with --keep-cache, the changed-inode notifications drive kernel page-cache invalidation (inval_inode), so a re-tagged file never serves stale cached bytes.

Virtual tree

VirtualTree::build (musefs-core/src/tree.rs) materializes an inode → node mapping from rendered paths. Paths come from beets-style templates (template.rs): $field / ${field} substitutions (with ${a|b} fallback chains) over the track's tag fields, each resolving through per-field fallbacks and then a global default_fallback; [...] conditional sections suppress their literals when every field they reference is empty. With skip_on_missing set (CLI --skip-on-missing), an unresolved top-level field instead drops the track from the mount: render_one returns None, so the track enters neither the snapshot nor the tree, and the incremental refresh path reclassifies a track that loses (or regains) such a field as a removal (or addition). Plain values are sanitized to a single path component ('/' and control characters become '_', components equal to . or .. are dropped, and any component is truncated to 255 bytes on a UTF-8 boundary so it stays within NAME_MAX), while a $!{field} path field keeps '/' as directory separators (sanitizing each segment and dropping empty/./.. segments) so a precomputed multi-level path expands into real directories. Path collisions are resolved deterministically by appending (k) before the extension (disambiguate). mapping.rs bridges DB tag rows to the format layer's inputs and to template fields — ordering and multi-value semantics live there.
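Two of the steps above, truncation on a UTF-8 boundary and collision disambiguation, are easy to get subtly wrong, so here is a sketch of each. The exact suffix format (a space before "(k)", and numbering from 1 with the first occurrence left bare) is an assumption, as are the function signatures; the sketch also ignores the case where a generated name collides with another real path.

```python
import os


def truncate_utf8(name: str, limit: int = 255) -> str:
    """Cut to at most `limit` bytes without splitting a multi-byte character."""
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name
    # errors="ignore" drops the dangling partial character at the cut.
    return raw[:limit].decode("utf-8", errors="ignore")


def disambiguate(paths: list[str]) -> list[str]:
    """Give repeated rendered paths a deterministic (k) suffix."""
    seen: dict[str, int] = {}
    out = []
    for path in paths:
        k = seen.get(path, 0)
        seen[path] = k + 1
        if k == 0:
            out.append(path)
        else:
            stem, ext = os.path.splitext(path)
            out.append(f"{stem} ({k}){ext}")
    return out


short = truncate_utf8("é" * 200)  # 400 bytes before truncation
unique = disambiguate(["a/x.flac", "a/x.flac", "a/y.flac", "a/x.flac"])
```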

Inodes are stable across rebuilds: a persistent path→inode allocator (InodeAllocator) reuses an unchanged rendered path's inode and never recycles a retired one, so a descriptor held open across a refresh keeps resolving to the same node and a stale FUSE handle can never alias a different file. On case-insensitive mounts the key is case-folded, so a survivor keeps its inode even when an unrelated deletion flips a merged directory's display casing (#305). A path that vanished degrades to ENOENT, bounded by the entry/attr TTL. (Retired paths are pruned once they outnumber live ones, bounding the allocator at twice the live tree; a path that returns after a prune gets a fresh inode.)
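A sketch of the allocator's core property: the same path always gets the same inode and numbers are never handed out twice. InodeAllocatorSketch is a simplified model that omits case-folding and the pruning of retired paths; starting at 2 follows the description of InodeAllocator elsewhere in this document.

```python
class InodeAllocatorSketch:
    """Persistent path -> inode map that survives tree rebuilds."""

    def __init__(self):
        self.next_ino = 2           # inode numbers only ever increase
        self.by_path = {}           # includes retired paths (no pruning here)

    def assign(self, live_paths):
        """Return {path: inode} for one rebuild of the tree."""
        live = {}
        for path in live_paths:
            ino = self.by_path.get(path)
            if ino is None:
                ino = self.next_ino
                self.next_ino += 1
                self.by_path[path] = ino
            live[path] = ino
        return live


alloc = InodeAllocatorSketch()
first = alloc.assign(["a.flac", "b.flac"])
second = alloc.assign(["b.flac", "c.flac"])            # a.flac vanished
third = alloc.assign(["a.flac", "b.flac", "c.flac"])   # a.flac came back
```

b.flac keeps inode 3 through every rebuild, and c.flac gets a new number instead of inheriting the one a.flac left behind, so an open handle can never end up pointing at a different file.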

Scanning

scan_directory (musefs-core/src/scan.rs) ingests a backing directory: collect supported audio files, probe each (format detection → audio offset/length, tags, pictures, structural blocks) on a parallel probe pipeline feeding a single DB writer, committing in batches. Probing reads are bounded — the scanner never slurps whole files — and ingestion caps per-item sizes (MAX_ART_BYTES, MAX_BINARY_TAG_BYTES) so a crafted file cannot balloon the store. An over-cap picture or binary tag is dropped and logged (RUST_LOG=warn) rather than vanishing silently, so a track that appears to have lost its cover art has an explanation in the logs; a supported-extension file that fails to parse, or errors mid-probe, is likewise logged with the reason and counted failed.

Symlinks are not followed by default: a symlinked file or directory is logged (RUST_LOG=info/warn) and skipped, which keeps the walk immune to directory-symlink cycles. Passing --follow-symlinks resolves them — symlinked audio files and directories are scanned — guarded by a visited (dev, ino) set so symlink cycles terminate, and by a second file-level (dev, ino) set so a file reached via both a real path and a symlink is ingested once rather than upserting its canonical track row twice. Because that set keys on (dev, ino), multiple hardlinks to the same inode are likewise collapsed to a single track under --follow-symlinks. Broken symlinks are logged and skipped without aborting the scan. The root argument is always followed regardless of the flag; only links encountered during recursion are gated.
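
The (dev, ino) guard can be sketched as a small visited set. This is an illustration for Unix targets only, and the Visited type and its method names are hypothetical; the scanner keeps two such sets (one for directories, one for files), which this sketch does not model.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical visited set keyed on (device, inode).
#[derive(Default)]
struct Visited {
    seen: HashSet<(u64, u64)>,
}

impl Visited {
    /// Returns true only the first time a (dev, ino) pair is seen.
    fn first_visit(&mut self, dev: u64, ino: u64) -> bool {
        self.seen.insert((dev, ino))
    }

    /// Resolve `path` through any symlinks and record the target's identity.
    /// `fs::metadata` follows symlinks and errors on a broken one, which the
    /// caller can log and skip without aborting the walk.
    fn first_visit_path(&mut self, path: &Path) -> io::Result<bool> {
        let meta = fs::metadata(path)?;
        Ok(self.first_visit(meta.dev(), meta.ino()))
    }
}
```

Two paths that resolve to the same inode, whether through a symlink or a hardlink, produce the same key, which is why hardlinks collapse to one track in this mode.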

revalidate is the maintenance pass (scan --revalidate). It re-probes only files whose (size, mtime_ns, ctime_ns) freshness stamp changed, deletes tracks under the scanned root whose backing file is gone, and garbage-collects now-unreferenced art. A ctime-only change (e.g. a forged-mtime in-place rewrite) still triggers a re-probe, while skipping unchanged files preserves external tag edits in the DB. Pruning is scoped to the scanned root, so revalidating one library root never removes tracks belonging to another. Because a track is keyed by its canonical backing path, a file scanned via --follow-symlinks whose real target lives outside the scanned root falls outside the prune scope: if that target later disappears, its stale row is not pruned by revalidating this root.
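
The two decisions revalidate makes per track can be sketched as pure functions. The Stamp struct and both function names are hypothetical; only the three stamp fields and the root-scoped pruning rule come from the description above.

```rust
use std::path::Path;

/// Hypothetical freshness stamp mirroring (size, mtime_ns, ctime_ns).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Stamp {
    size: u64,
    mtime_ns: i64,
    ctime_ns: i64,
}

/// Any difference in the stamp triggers a re-probe, including a ctime-only change.
fn needs_reprobe(stored: Stamp, current: Stamp) -> bool {
    stored != current
}

/// A track is eligible for pruning only if its canonical path is under `root`.
/// `Path::starts_with` compares whole components, so `/music2` is not under `/music`.
fn in_prune_scope(root: &Path, canonical: &Path) -> bool {
    canonical.starts_with(root)
}
```

A symlink target stored under its canonical path outside the root fails in_prune_scope, which matches the stale-row caveat above.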

The contrib ecosystem

External writers live under contrib/: python-musefs is the shared store-contract library (schema-version check, tag/art writes, sha256 art content-addressing, the musefs scan shell-out); the beets plugin, the Picard plugin, and the Lidarr integration (a Custom Script workflow) build host-specific tag mapping on top of it. Each one's README covers its own setup and behavior; CONTRIBUTING covers their test suites and the generated-schema/vendoring mechanics.

Getting set up

The working manual for building, testing, and landing a change. For what the pieces are, read the architecture overview first; for per-format behavior, the format docs.

Prerequisites:

  • Rust — stable (edition 2024) with rustfmt and clippy.
  • FUSE (to mount, or to run the FUSE end-to-end tests) — Linux with /dev/fuse and libfuse (libfuse3-dev / libfuse3 plus pkg-config), or FreeBSD with /dev/fuse and the fusefs kernel module (no libfuse — see FreeBSD e2e for the in-tree VM harness).
  • Python 3 with ruff and pytest — only for the Python plugin suites.
  • shellcheck and yamllint — optional; the pre-commit hook's shell and YAML lint legs each skip with a notice if not installed.

Enable the repo's pre-commit hook once per clone:

git config core.hooksPath .githooks

The hook (.githooks/pre-commit) runs, in order: cargo fmt --all --check, cargo clippy --all-targets -- -D warnings, the full workspace test suite (cargo test --workspace), a conditional cargo-mutants anchor-drift guard (only when .cargo/mutants.toml, scripts/check_mutant_anchors.py, or a musefs-core/musefs-format source file is staged), shellcheck over every tracked shell script, yamllint (relaxed .yamllint) over every tracked YAML file, and ruff check + ruff format --check over contrib/beets/, contrib/picard/, contrib/lidarr/, contrib/python-musefs/, scripts/, and tests/interop/. A few consequences worth internalizing:

  • A commit with red tests is always rejected — there is no "commit-now-fix-later" workflow here.
  • Python-only changes hit the hook too: the ruff gate lints exactly the union of paths the CI jobs lint, so a commit can't pass the hook yet fail CI lint.
  • The cargo gate (fmt/clippy/test) is skipped when every staged path is under docs/ or is a Markdown file, so a docs-only commit stays fast.
  • The shellcheck/yamllint legs fire only when a shell or YAML file is staged, and skip with a notice when the tool is absent; when they do run they lint all tracked files of that type, so a sibling file can't drift unnoticed.
  • The mutant-anchor guard fires only when the mutants config, its check script, or a musefs-core/musefs-format source file is staged, and skips with a notice when cargo-mutants is absent (CI re-checks it regardless). It re-validates that the .cargo/mutants.toml exclude_re anchors still point at their intended file:line:col after a line-shifting edit.
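
The gating rules in the list above amount to predicates over the staged paths. The hook itself is a shell script; the Rust below is only a model of its decisions, and it assumes that "Markdown file" means a .md suffix and that "source file" means a .rs file under the two crates.

```rust
/// True when a staged path counts as documentation for the cargo-gate skip.
/// Assumption: Markdown is detected by the `.md` suffix.
fn is_docs_only(path: &str) -> bool {
    path.starts_with("docs/") || path.ends_with(".md")
}

/// The fmt/clippy/test gate runs unless every staged path is docs-only.
/// Note: with nothing staged, `all` is vacuously true and the gate is skipped.
fn cargo_gate_runs(staged: &[&str]) -> bool {
    !staged.iter().all(|p| is_docs_only(p))
}

/// The mutant-anchor guard runs when its config, its script, or a
/// musefs-core / musefs-format source file is staged.
/// Assumption: "source file" means a `.rs` file under those crates.
fn mutant_guard_runs(staged: &[&str]) -> bool {
    staged.iter().any(|p| {
        *p == ".cargo/mutants.toml"
            || *p == "scripts/check_mutant_anchors.py"
            || ((p.starts_with("musefs-core/") || p.starts_with("musefs-format/"))
                && p.ends_with(".rs"))
    })
}
```

Modeled this way, a commit that stages README.md together with one .rs file still runs the full cargo gate, because the skip requires every path to be docs-only.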

Build & test

cargo build                              # build the workspace
cargo test                               # all crates (excludes FUSE e2e)
cargo test -p musefs-core                # one crate
cargo test -p musefs-core read_at        # tests matching a substring
cargo clippy --all-targets               # lint (policy: see below)
cargo fmt                                # format

The musefs binary enables the default-on jemalloc feature (jemalloc global allocator + background purge thread). Build the system-allocator variant with cargo build -p musefs --no-default-features — used for the RSS comparison (scripts/rss-churn-bench.sh) and by packagers that forbid vendored C libs.

The FUSE end-to-end tests perform real mounts and are #[ignore]d:

cargo test -p musefs-fuse -- --ignored   # needs /dev/fuse + libfuse

The kernel-passthrough e2e additionally needs CAP_SYS_ADMIN. Don't run cargo under sudo — build first, then run the prebuilt test binary with sudo (find it in target/debug/deps/):

cargo test -p musefs-fuse --no-run
sudo target/debug/deps/<e2e_test_binary> --ignored <passthrough_test_name>

  • Read-consistency harness (musefs-fuse/tests/read_consistency.rs): a seeded, reproducible randomized pread/mmap sweep compares live-mount reads against an in-memory oracle (the seed is printed on failure so the run can be reproduced). The hermetic FLAC tests (whole-file mmap fidelity and the read-only write-refusal matrix) always run; the multi-format breadth sweep generates fixtures with ffmpeg and skips any format whose codec is unavailable.
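
The idea of a seeded sweep against an oracle can be sketched without a mount. This is not the harness in read_consistency.rs: the PRNG (xorshift64*), the sweep function, and the reader closure are all illustrative assumptions, and a real run would read through the mounted file instead of a closure.

```rust
/// Small deterministic PRNG (xorshift64*), so a failing seed replays exactly.
struct Rng(u64);

impl Rng {
    fn new(seed: u64) -> Self {
        Rng(seed.max(1)) // the state must be non-zero
    }

    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    fn below(&mut self, bound: u64) -> u64 {
        self.next() % bound
    }
}

/// Compare `read_at(offset, len)` against the in-memory `oracle` for
/// `rounds` random windows. The seed is embedded in the error message.
fn sweep<F>(seed: u64, oracle: &[u8], rounds: usize, mut read_at: F) -> Result<(), String>
where
    F: FnMut(usize, usize) -> Vec<u8>,
{
    let mut rng = Rng::new(seed);
    let len = u64::try_from(oracle.len()).expect("usize fits in u64");
    for round in 0..rounds {
        // Offsets may land exactly at EOF; lengths may run past it.
        let off = usize::try_from(rng.below(len + 1)).expect("bounded by oracle length");
        let want = usize::try_from(rng.below(4096)).expect("below 4096");
        let end = off.saturating_add(want).min(oracle.len());
        let got = read_at(off, want);
        if got.as_slice() != &oracle[off..end] {
            return Err(format!("seed {seed}: mismatch at round {round}, offset {off}, len {want}"));
        }
    }
    Ok(())
}
```

Re-running with the seed from a failure message replays the same sequence of offsets and lengths, which is what makes a randomized failure debuggable.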

FreeBSD e2e

The FUSE e2e suite also runs on FreeBSD, via the scripts in scripts/freebsd-vm/. They are the single source of truth — CI and local runs invoke the same scripts, so they can't drift:

  • run-local.sh — host-side orchestrator: creates and boots a FreeBSD VM under qemu/KVM and runs the suite in it. All artifacts go under the gitignored .scratch/freebsd/.
  • provision.sh — in-guest: installs git, ffmpeg, and the current stable Rust toolchain via rustup (FreeBSD's packaged rust lags and is too old for some deps), and loads the fusefs kernel module. Run by run-local.sh and CI.
  • run-e2e.sh — in-guest: cargo test --workspace then the --ignored FUSE e2e suite (guards that ffmpeg is present so the decode/encode tests don't silently skip).
  • serial-run.py — drives the VM over its serial console (the console driver used by run-local.sh).

CI. The freebsd job in .github/workflows/ci.yml runs these in a vmactions/freebsd-vm VM. It is expensive (a full in-VM build), so it does not run on every PR — only when the FUSE/mount surface or its harness changed (musefs/, musefs-fuse/, scripts/freebsd-vm/, Cargo.lock, ci.yml) or on a release tag (v*).

Local run (one command):

sh scripts/freebsd-vm/run-local.sh

Host prerequisites (Debian/Ubuntu packages in parens): qemu-system-x86_64 + qemu-img (qemu-system-x86, qemu-utils), xorriso (xorriso), curl + xz (curl, xz-utils), python3. /dev/kvm for acceleration (it runs without it, just far slower); ~6 GB free under .scratch/.

What it does, end to end:

  1. Downloads the official FreeBSD-<rel>-amd64-BASIC-CLOUDINIT-ufs.qcow2 image into .scratch/freebsd/ (cached; downloaded once). That image directs its console to the serial line, which is what lets the harness drive it.
  2. Creates a fresh overlay disk from the cached base each run (cheap reset).
  3. Boots the VM headless and logs in as root over the serial console (the image has an empty root password — no SSH, no keys, no cloud-init).
  4. Serves this repo over a throwaway HTTP server on qemu's user-net gateway (10.0.2.2); the guest fetches and unpacks it.
  5. Runs provision.sh + run-e2e.sh over the console and propagates the exit code, then powers the VM off.

Tunable via env: FREEBSD_REL (default 14.3-RELEASE), VM_MEM, VM_SMP, VM_DISK, HTTP_PORT, RUN_TIMEOUT.

To drive your own VM instead, boot any FreeBSD image and, from the repo root inside it as root, run sh scripts/freebsd-vm/provision.sh then sh scripts/freebsd-vm/run-e2e.sh.

Notes:

  • FreeBSD uses fuser's pure-rust /dev/fuse backend — no libfuse package; only the fusefs kernel module and base-system mount_fusefs(8) are needed.
  • Kernel FUSE passthrough (StructureOnly) is Linux-only; on FreeBSD it falls back to daemon serving.

macOS support is best-effort: CI builds there with fuser's macos-no-mount feature, and the platform-specific logic is unit-tested. Mounted e2e on macOS/FUSE-T is not yet validated.

Test tiers

Test tiers beyond cargo test

Property tests

proptest invariants — panic-freedom, the byte-identical-audio guarantee, tag round-trip stability — live in musefs-format/tests/proptest_*.rs and musefs-core/tests/proptest_read_fidelity.rs. The format-layer suites are gated on the fuzzing feature, which musefs-format's self-dev-dependency enables for all of its own test builds — so a plain cargo test -p musefs-format runs them.

Coverage-guided fuzzing

The fuzz/ crate is excluded from the workspace: workspace-wide build, test, and clippy do not compile it, so a format-layer signature change can break fuzz targets without anything failing locally — CI's fuzz smoke job (cargo +nightly fuzz build) is what catches it. Check locally before pushing a format-layer API change:

cargo install cargo-fuzz                          # one-time; needs nightly
cargo +nightly fuzz build                         # what the CI smoke job runs
cargo +nightly fuzz run <target>                  # flac|mp3|mp4|ogg|wav|ogg_page|b64|vorbiscomment|serve
cargo +nightly fuzz coverage <target>             # confirm coverage reaches the parser
cargo run --manifest-path fuzz/Cargo.toml --bin generate_seeds   # (re)build seeds

Fuzz crash regressions

When you fix a fuzz-found crash:

  1. Drop the reproducer bytes into fuzz/regressions/<target>/ (one file per reproducer). The per-PR fuzz smoke job's replay step runs every committed reproducer with cargo +nightly fuzz run <target> <files> -- -runs=0 — a deterministic single pass that fails the build if any known input panics again. This is separate from fuzz/corpus/, which cargo fuzz cmin minimizes (and would prune reproducers from).
  2. Where the crash exposed a real logic/behavior defect, also add a focused behavioral test for that logic in the owning crate's suite (the pre-commit hook gates it). The byte replay proves the exact input no longer panics; the behavioral test documents and locks in the fix. They are not interchangeable.

Coverage notes: the per-format targets also drive the bounded/ceiling probers (*_bounded, locate_audio_at_ceiling, read_structure_from) and assert a differential oracle against the full-buffer parse. The serve target fuzzes the read-time serve path (read_at_with_file over adversarial layouts, including serve_ogg_window/OggArtSlice) and is scheduled-only (built per-PR, not smoke-run) because it builds a DB + temp backing file per input. The serve target also exercises hostile DB rows (negative/oversized geometry, invalid formats, orphaned/oversized art, stale binary-tag handles, content-version mismatch) via the musefs-db fuzzing-gated with_raw_conn, plus binary-tag streaming and distinct Opus/Vorbis/OggFLAC fixtures.

Independent-reader interop (mutagen)

Asserts that an independent ecosystem reader sees the tags musefs synthesizes, across all five formats:

pip install -r tests/interop/requirements.txt
MUSEFS_INTEROP_DIR=/tmp/i cargo test -p musefs-core --test interop_emit -- --ignored emit_interop_fixtures
MUSEFS_INTEROP_DIR=/tmp/i python -m pytest tests/interop

External-writer contract round trip

CI's contract job mandatorily proves the Python -> Rust DB contract: it builds the binary, runs each binary-only plugin's musefs_bin tier with MUSEFS_REQUIRE_BIN=1 (a missing binary fails instead of skipping), and runs the round-trip harness. The harness is the single source of truth, run locally with:

pip install -r tests/contract/requirements.txt pytest && pip install -e contrib/python-musefs
bash scripts/contract-roundtrip.sh

It scans real ffmpeg-generated audio (so musefs scan owns the track geometry), writes tags/art through musefs_common.store, synthesizes the served bytes via cargo test --test contract_emit, and asserts with mutagen that the Python tags and art survived. Picard's musefs_bin tier runs in the picard job (it needs the system-Picard environment).

Failure-path fault injection

The reader and DB error paths are exercised under simulated runtime faults. musefs_core::metrics::set_backing_fault(BackingFault::{Eio,ShortRead}) (behind the metrics feature) installs a process-global fault at the positioned backing-read site, cleared by the returned RAII guard. Because it is global, the tests run in their own metrics-gated binaries.
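
A process-global fault with an RAII guard can be sketched as follows. The names BackingFault and set_backing_fault mirror the ones mentioned above, but everything else (the atomic, the guard type, and read_backing) is an assumption made for illustration and is not the musefs_core::metrics implementation.

```rust
use std::io;
use std::sync::atomic::{AtomicU8, Ordering};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BackingFault {
    Eio,
    ShortRead,
}

/// 0 = no fault, 1 = Eio, 2 = ShortRead. Process-global by design.
static FAULT: AtomicU8 = AtomicU8::new(0);

/// Clears the installed fault when dropped.
#[must_use = "the fault is cleared as soon as the guard is dropped"]
struct FaultGuard;

impl Drop for FaultGuard {
    fn drop(&mut self) {
        FAULT.store(0, Ordering::SeqCst);
    }
}

fn set_backing_fault(fault: BackingFault) -> FaultGuard {
    let code = match fault {
        BackingFault::Eio => 1,
        BackingFault::ShortRead => 2,
    };
    FAULT.store(code, Ordering::SeqCst);
    FaultGuard
}

/// Hypothetical read site that consults the global fault before copying.
fn read_backing(src: &[u8], buf: &mut [u8]) -> io::Result<usize> {
    let n = buf.len().min(src.len());
    match FAULT.load(Ordering::SeqCst) {
        1 => Err(io::Error::new(io::ErrorKind::Other, "injected EIO")),
        2 => {
            let half = n / 2; // deliver fewer bytes than requested
            buf[..half].copy_from_slice(&src[..half]);
            Ok(half)
        }
        _ => {
            buf[..n].copy_from_slice(&src[..n]);
            Ok(n)
        }
    }
}
```

Since the fault lives in a static, two tests in the same process would interfere with each other, which is the reason given above for running these tests in their own binaries.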

cargo test -p musefs-core --features metrics --test reader_faults
cargo test -p musefs-core --test backing_changed_fault   # real file mutation
cargo test -p musefs-core --test db_corruption_fault      # byte-corrupt DB
cargo test -p musefs-fuse --features metrics -- --ignored # EIO through the mount (needs /dev/fuse)

BackingChanged (re-validated in HeaderCache::resolve) and DB corruption are driven by real conditions, not the seam. ENOSPC/read-only faults are write-path concerns and are out of scope for the read-time suite.

Mutation testing

scripts/mutants.sh wraps cargo-mutants for the logic-bearing crates; .cargo/mutants.toml permanently excludes the thin glue crates (musefs-fuse, musefs-cli, musefs) and feature-gated instrumentation. musefs-latencyfs carries real logic and has its own leg (it needs /dev/fuse to kill its mutants).

The CI parity check for a branch is the in-diff gate — mutate only the lines your branch changed:

git diff "$(git merge-base main HEAD)...HEAD" -- '*.rs' > mutants.diff
grep -q '^@@ ' mutants.diff   # IMPORTANT: an empty diff mutates nothing and exits 0 — a silent false pass
cargo mutants --in-diff mutants.diff -j2 --exclude 'musefs-latencyfs/**' --output /tmp/mutants-out/in-diff

Sharp edges:

  • Check the exit status directly. Don't pipe the run through tail/grep — that masks the exit code.

  • Scratch space and memory. cargo-mutants copies the source tree into a scratch dir under TMPDIR/MUTANTS_TMP (which must be outside the repo). For a small in-diff mutant set, the default tmpfs /tmp is fine — and faster. For large sets (a full-crate campaign), some mutants are allocation bombs (e.g. a constant-return on a parser position helper spins a collect-loop) that can OOM the host before the test timeout fires: put TMPDIR on real disk and run inside a memory-capped cgroup, e.g.

    mkdir -p ~/.cache/musefs-mutants-tmp
    TMPDIR="$HOME/.cache/musefs-mutants-tmp" systemd-run --user --scope --collect \
        -p MemoryMax=10G -p MemorySwapMax=0 \
        cargo mutants --in-diff mutants.diff -j2 --exclude 'musefs-latencyfs/**' --output /tmp/mutants-out/in-diff
    

    scripts/mutants.sh also supports sharding (MUTANTS_SHARD=i/n, used by CI to split the long musefs-format leg), though a sharded local workflow hasn't been built out.

  • Known-unkillable mutant classes get a documented exclude_re in .cargo/mutants.toml, not test contortions. Note that cargo-mutants mutates const initializer expressions too — a constant is not a hiding place for arithmetic the gate flags.

  • exclude_re entries are guarded against drift. A few exclusions must pin a specific file:line:col: (the operator+function alone isn't unique in the function); those coordinates rot silently when cargo fmt shifts the code, and a stale anchor can re-point onto a killable mutant — a silent false pass. scripts/check_mutant_anchors.py prevents that: it lists the full unfiltered mutant set (cargo mutants --no-config --list --json) and re-validates every exclude_re entry. It runs in the per-PR in-diff job (.github/workflows/mutants.yml) and its unit tests run in CI's python-musefs job. Run it locally with:

    cargo mutants --no-config --list --json > /tmp/mutants-list.json
    python3 scripts/check_mutant_anchors.py --mutants-json /tmp/mutants-list.json
    

    Each entry carries a machine-checked # guard: comment on the line directly above it:

    • file:line:col anchors: # guard: op="<" fn="probe_file" rows=3. The guard asserts the matched mutants all share that operator and function, occupy one site, and number exactly rows (use fn="" for a const-level site with no enclosing function). A narrowing entry (one that embeds a replacement to leave same-site siblings killable) sets rows to that subset's size.
    • description anchors: # guard: count=N (default 1) asserts the entry matches mutants spanning exactly N distinct sites; this is what catches a newly-added killable sibling silently joining the match set. A bare single-site description entry needs no tag.

    When the guard fails: a found none message means a line:col anchor drifted — re-anchor it to the current coordinates from the listing and re-confirm the mutant there is still genuinely equivalent (a reformat can change surrounding logic, not just line numbers). A count/rows mismatch means a sibling appeared or disappeared — investigate before bumping the number. Pure cargo fmt/line-shift drift can often be repaired automatically with python3 scripts/check_mutant_anchors.py --fix, which re-points an anchor to its current coordinates by operator+function. It only does so when the mapping is unambiguous — every same-operator site in the function is anchored, so the positional match is exact. An anchor that pins one of several same-operator sites (the usual reason it is a file:line:col anchor rather than a description) cannot be derived from the tag alone, so --fix leaves it for manual re-anchoring and reports can't auto-derive the coordinate; it also declines when a site was added or removed. Always eyeball the resulting diff before committing. Every new file:line:col exclusion needs a # guard: tag (the guard rejects an untagged one), and exclude_re patterns must stay within the Rust-regex/Python-re shared subset the guard allows (\. \d + | ^ ( ) *, no inline (?...) groups).

Performance regression gating

cargo test -p musefs-core --features metrics includes tests/perf_counters.rs: golden assertions on deterministic work counters (preads, pread_bytes, scan_bytes_read, art/binary-tag chunks) for the read/serve and ingest paths, plus a tree.rs unit test pinning the refresh rebuild count as size-invariant. These are a hard gate — a legitimate change to read/ingest/refresh work must update the golden numbers in the same PR. They run on every non-doc PR via CI's check job. Constant-factor (wall-clock) changes are surfaced separately by the warn-only perf-ab job (below).
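
A golden assertion on work counters looks roughly like the sketch below. The Counters struct and the serve function are hypothetical; only the counter names preads and pread_bytes come from the text, and the chunked copy is a stand-in for the real read path.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical work counters, named after two of the counters listed above.
#[derive(Default)]
struct Counters {
    preads: AtomicU64,
    pread_bytes: AtomicU64,
}

/// Stand-in read path: copies `src` in `chunk`-sized reads and counts the work.
/// `chunk` must be non-zero (`chunks` panics on zero).
fn serve(src: &[u8], chunk: usize, counters: &Counters) -> Vec<u8> {
    let mut out = Vec::with_capacity(src.len());
    for piece in src.chunks(chunk) {
        let bytes = u64::try_from(piece.len()).expect("usize fits in u64");
        counters.preads.fetch_add(1, Ordering::Relaxed);
        counters.pread_bytes.fetch_add(bytes, Ordering::Relaxed);
        out.extend_from_slice(piece);
    }
    out
}
```

For a 10,000-byte source and a 4,096-byte chunk, the golden values are 3 reads and 10,000 bytes. Unlike wall-clock time, these numbers do not vary with runner noise, which is what makes them usable as a hard gate.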

The A/B benchmark runs only when musefs-core/src/** or musefs-format/src/** change. The perf-bench matrix job benches the base and PR commits in parallel on separate runners (one ref each), then the perf-ab job downloads both exported baselines and posts a critcmp delta as a sticky PR comment. It is warn-only and not a required check: GitHub runner noise (now including cross-runner variance) makes wall-clock time unfit for hard gating. Reproduce locally on one machine with scripts/perf-ab.sh <base-sha> out.md.

Concurrency + sanitizers

Concurrent-reader coverage exists at two levels:

cargo test -p musefs-core --test concurrent_reads          # core: HeaderCache + WAL reads (default suite)
cargo test -p musefs-fuse --test concurrent_reads -- --ignored  # mount: DbPool::PerThread (needs /dev/fuse)

CI runs the core test under AddressSanitizer as a required gate (asan job) and both tests under ThreadSanitizer as a non-required best-effort signal (tsan job, continue-on-error). TSan cannot instrument the system C libraries (libfuse, libsqlite3), so it is a signal, not a gate. ASan is ABI-compatible with an uninstrumented std, but TSan is not — so the TSan command needs -Zbuild-std (and the rust-src component) to rebuild std with the sanitizer. Reproduce locally with:

rustup toolchain install nightly
rustup component add rust-src --toolchain nightly   # for TSan's -Zbuild-std
RUSTFLAGS="-Zsanitizer=address" ASAN_OPTIONS="detect_leaks=0" \
  cargo +nightly test -p musefs-core --test concurrent_reads --target x86_64-unknown-linux-gnu
RUSTFLAGS="-Zsanitizer=thread" TSAN_OPTIONS="halt_on_error=0" \
  cargo +nightly test -p musefs-core -Zbuild-std --test concurrent_reads --target x86_64-unknown-linux-gnu

Coverage

cargo install cargo-llvm-cov
cargo llvm-cov --workspace --exclude musefs-fuse --exclude musefs-latencyfs --open
cargo llvm-cov --workspace --exclude musefs-fuse --exclude musefs-latencyfs --lcov --output-path lcov.info

musefs-fuse and musefs-latencyfs are excluded because these FUSE crates' tests need a real mount; their behavior is covered by the separate e2e CI job rather than llvm-cov. The CI e2e job also runs the binary-level cargo test -p musefs -- --ignored and cargo test -p musefs-latencyfs -- --ignored suites so they cannot silently rot (they require /dev/fuse + fusermount3). CI (coverage.yml) runs this on every push/PR and uploads to Codecov (CODECOV_TOKEN repo secret).

Conventions & adding a format

Code conventions

  • Errors. Each crate has its own error.rs with a thiserror enum; musefs-core wraps lower layers in CoreError; the CLI is the only anyhow consumer. Internal error paths never discard diagnostics: no Result<_, ()>, no .map_err(|_| …) that drops a source — each variant carries its source (#[from]) or a static reason naming the broken invariant.
  • Integer conversions. The four clippy cast lints are deny-via-CI. Widenings use From; u64 -> usize only via the sanctioned usize_from helpers (musefs_db::convert, re-exported by core; musefs-format and musefs-latencyfs carry crate-local siblings — the workspace is declared 64-bit-only); genuine narrowings use try_from (? for input-dependent values, .expect for structurally bounded ones, .unwrap in tests); deliberate bit-truncation keeps as under a reasoned #[expect]. Non-negative DB row fields are unsigned; rusqlite's checked conversions (feature fallible_uint) validate at the row boundary.
  • Lint policy. clippy::pedantic minus a few intentional/noisy groups, defined in the root Cargo.toml under [workspace.lints]. The hook and CI deny all warnings.
  • Unsafe code. unsafe_code = "deny" is set for the workspace members in the root Cargo.toml ([workspace.lints.rust]); the standalone fuzz/ crate is outside the workspace and is not covered. A genuinely-necessary unsafe is opted in per-site with #[expect(unsafe_code, reason = "...")] — never a bare unsafe block and never by relaxing the workspace lint, so every unsafe is greppable and review-visible. Prefer a safe crate (e.g. rustix for syscalls) over hand-rolled FFI.
  • Layering. Keep musefs-fuse, musefs-cli, and the musefs binary thin; cross-cutting logic belongs in musefs-core (see the crate layout).
  • Hidden API consumers. benches/ directories and each crate's tests/ are compiled only by --all-targets: after an API change, compile-check with cargo clippy --all-targets, not cargo build.
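
The error and integer-conversion conventions can be shown together in a std-only sketch. The repo uses thiserror; the hand-written impls below are a stand-in so the example has no dependencies, and FormatError, block_len, and this usize_from body are hypothetical illustrations rather than the project's code.

```rust
use std::error::Error;
use std::fmt;
use std::num::TryFromIntError;

/// Each variant carries either its source or a static reason.
#[derive(Debug)]
enum FormatError {
    LengthOverflow(TryFromIntError),
    Invariant(&'static str),
}

impl fmt::Display for FormatError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FormatError::LengthOverflow(_) => write!(f, "block length does not fit in u32"),
            FormatError::Invariant(reason) => write!(f, "broken invariant: {reason}"),
        }
    }
}

impl Error for FormatError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            FormatError::LengthOverflow(e) => Some(e),
            FormatError::Invariant(_) => None,
        }
    }
}

/// Plays the role of a `#[from]` attribute: `?` keeps the source attached.
impl From<TryFromIntError> for FormatError {
    fn from(e: TryFromIntError) -> Self {
        FormatError::LengthOverflow(e)
    }
}

/// u64 -> usize through one sanctioned helper (assumes a 64-bit target).
fn usize_from(v: u64) -> usize {
    usize::try_from(v).expect("workspace is 64-bit-only, so u64 fits in usize")
}

/// Input-dependent narrowing uses `try_from` with `?`; widening uses `From`.
fn block_len(raw: u64) -> Result<u32, FormatError> {
    if raw == 0 {
        return Err(FormatError::Invariant("block length must be non-zero"));
    }
    Ok(u32::try_from(raw)?)
}

fn total_bytes(len: u32, count: u32) -> u64 {
    u64::from(len) * u64::from(count)
}
```

The point of the From impl is that no call site needs a map_err closure, so there is no place for a source error to be dropped by accident.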

Adding a format

  1. Implement probe + synthesize_layout in musefs-format (mirror an existing module — flac.rs, mp3.rs, mp4.rs, ogg/, wav.rs), returning a RegionLayout.
  2. Add the variant to musefs-db's Format enum, then wire it into the match track.format arms in reader::HeaderCache::resolve (musefs-core/src/reader.rs) and into scan.rs (extension list, probe dispatch).
  3. Extend the test surface: a fuzz_check::fixtures::<fmt>() minimal file, a fuzz/fuzz_targets/<fmt>.rs target with a seed in generate_seeds, a musefs-format/tests/proptest_<fmt>.rs, and a manifest row in musefs-core/tests/interop_emit.rs.
  4. Write docs/<FMT>.md (follow the shape of the existing five).
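
Step 2 relies on exhaustive matching: once a variant is added, every match without a wildcard arm stops compiling until it is wired in. The sketch below assumes the enum's variant names and the extension list, neither of which is spelled out here; only the module names come from step 1.

```rust
/// Hypothetical mirror of the Format enum; variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Format {
    Flac,
    Mp3,
    Mp4,
    Ogg,
    Wav,
}

impl Format {
    /// Extension dispatch, as the scanner might do it.
    /// Assumption: this extension list is illustrative, not the real one.
    fn from_extension(ext: &str) -> Option<Self> {
        match ext.to_ascii_lowercase().as_str() {
            "flac" => Some(Format::Flac),
            "mp3" => Some(Format::Mp3),
            "m4a" | "mp4" => Some(Format::Mp4),
            "ogg" | "oga" | "opus" => Some(Format::Ogg),
            "wav" => Some(Format::Wav),
            _ => None,
        }
    }

    /// No wildcard arm: adding a variant makes this match a compile error
    /// until the new format is handled.
    fn module(self) -> &'static str {
        match self {
            Format::Flac => "flac.rs",
            Format::Mp3 => "mp3.rs",
            Format::Mp4 => "mp4.rs",
            Format::Ogg => "ogg/",
            Format::Wav => "wav.rs",
        }
    }
}
```

The string-keyed from_extension needs a wildcard and therefore will not fail to compile, which is why the extension list is called out as a separate item to update.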

Python plugins

Python plugins (contrib)

The four packages share one drift-guarded contract; see the contrib ecosystem for the layout and the integration pages for plugin-specific setup.

# python-musefs: self-contained
cd contrib/python-musefs && python -m pytest && ruff check . && ruff format --check .

# beets: install the local python-musefs first so the suite tests the working
# tree, not the PyPI release (see the beets integration page for the venv flow)
cd contrib/beets && pip install -e ../python-musefs && pip install -e ".[test]" && python -m pytest tests

# picard: no install needed (vendored + pythonpath=".")
cd contrib/picard && python -m pytest tests

# lidarr: install the local python-musefs first so the suite tests the working
# tree, not the PyPI release (see the lidarr integration page for the env flow)
cd contrib/lidarr && pip install -e ../python-musefs && pip install -e ".[test]" && python -m pytest tests

Gotchas that have bitten before:

  • On PEP 668 "externally managed" systems, bare pip install fails — use a venv for the beets suite.
  • The real-Picard tests importorskip Picard and Qt: without an importable Picard (e.g. the system package on PYTHONPATH), they silently skip. When touching the Picard plugin, make sure they actually ran.
  • The Lidarr integration is gated by two automated tiers, both deterministic and network-free (Lidarr's metadata server is mocked too):
    • PR check — .github/workflows/lidarr-smoke.yml (scripts/lidarr-smoke.sh): a fast smoke that proves the Custom Script exec path on a real Lidarr (its Test event) and runs the content leg (musefs-lidarr-sync tag-writes, musefs-lidarr-import symlink, served-mount tags, unchanged bytes) against a local mock Lidarr API. Runs on PRs touching the Lidarr surface.
    • Release gate — .github/workflows/lidarr-e2e.yml (scripts/lidarr-e2e/run-e2e.sh): the full real-instance e2e. A real Lidarr, driven by local metadata/indexer/qBittorrent mocks, performs a genuine download-client import of a real CC0 album as a NewDownload, firing OnReleaseImport, which execs the real musefs scripts; the served mount is then asserted to carry Lidarr-supplied metadata the backing file lacked, bytes unchanged. This gates the Python py-v* publish and closes what used to be the manual download-client gap. The vendored CC0 fixture is scripts/lidarr-e2e/fixtures/.
  • musefs_common/schema.py is generated from musefs-db/src/schema.rs. After a schema change: MUSEFS_REGEN_SCHEMA_PY=1 cargo test -p musefs-db schema_py, then re-vendor Picard's copy with python contrib/python-musefs/vendor_to_picard.py. Drift is enforced by a musefs-db unit test and the Picard vendor-sync test.
  • MAX_ART_BYTES in contrib/python-musefs/src/musefs_common/constants.py is hand-mirrored from musefs-core/src/scan.rs — update both sides together.

Releasing

Releasing the Python packages

The contrib/ Python packages (python-musefs, beets-musefs, lidarr-musefs, and the unpublished musefs-picard) share a single version, decoupled from the Rust crates and released on a py-v* tag. musefs-picard tracks the version but is not uploaded to PyPI (Picard has its own plugin registry; the shared library is vendored into it).

One-time setup (before the first release). Trusted Publishing fails until the publisher exists on PyPI. For each of python-musefs, beets-musefs, and lidarr-musefs:

  1. Create/reserve the project on PyPI.
  2. Add a GitHub Actions trusted publisher pointing at: owner/repo Sohex/musefs, workflow release-python.yml, environment pypi.

Also create a GitHub environment named pypi in the repo settings (it gates the publish job).

Cutting a release:

  1. Choose the new version X.Y.Z and run python scripts/bump_python_version.py X.Y.Z. This rewrites every contrib/*/pyproject.toml version, the __version__ strings, the python-musefs>= dependency floors, and re-vendors python-musefs into the Picard plugin.
  2. Review git diff — it should touch only the version/floor lines and the Picard vendored _common/ copy.
  3. Promote the ## [Unreleased] section of contrib/CHANGELOG.md to ## [X.Y.Z] - <date>.
  4. Commit, then tag and push:
    git commit -am "release: python packages X.Y.Z"
    git tag py-vX.Y.Z
    git push origin HEAD --tags
    
  5. release-python.yml runs the version gate, the four Python test suites, then publishes python-musefs, beets-musefs, and lidarr-musefs to PyPI (in that order).

Releasing the Rust crates and binaries

The Rust workspace publishes to crates.io and ships prebuilt cross-compiled binaries on a v* tag, decoupled from the Python py-v* flow. release.yml runs one ordered graph — gate → build → smoke → publish → release-assets — and is the source of truth; this checklist is the human side.

Pre-flight.

  1. Working tree clean, on the commit you intend to release.

  2. Confirm main is green (CI + coverage). The tag push triggers a fresh ci.yml and coverage.yml run, and the release gate job waits for ci-ok and coverage-ok to be green on the tagged commit before anything builds or publishes — a red tree blocks the release automatically.

  3. CARGO_REGISTRY_TOKEN is present in repo secrets.

  4. Smoke-build every cross target so jemalloc-sys is known to compile under zig before tagging (the release matrix builds with the jemalloc feature on):

    for t in x86_64-unknown-linux-gnu.2.17 aarch64-unknown-linux-gnu.2.17 \
             x86_64-unknown-linux-musl aarch64-unknown-linux-musl; do
      cargo zigbuild --release -p musefs --target "$t"
    done
    

    If a target cannot build jemalloc-sys, add --no-default-features to that matrix entry's cargo zigbuild in release.yml, rather than blocking the release. The Docker images COPY the binary this step produces (they don't run cargo), so the matching container inherits the opt-out automatically.

Version bump (do this in one commit before tagging).

  1. Pick the new version X.Y.Z.
  2. Bump the workspace version in Cargo.toml.
  3. Bump every internal musefs-* path-dependency constraint that pins the old version (e.g. musefs-db = { version = "X.Y.Z", path = "..." }) — a stale internal floor fails the publish.
  4. Promote the ## [Unreleased] section of CHANGELOG.md to ## [X.Y.Z] - <date>.
  5. Dry-run package each crate: cargo package -p <crate> --locked for each of musefs-db musefs-format musefs-core musefs-fuse musefs-cli musefs. This catches packaging errors but not the cross-crate index-propagation problem (it resolves siblings via path deps); that is handled in-workflow (next section).
  6. Commit, e.g. git commit -am "release: vX.Y.Z".

Tag and push.

git tag vX.Y.Z
git push origin HEAD --tags

The tag push starts both CI and release.yml. The gate job blocks publishing until ci-ok + coverage-ok are green on the tagged tree (45-minute timeout, covering the full matrix including the FreeBSD VM e2e).

What release.yml does.

  1. gate — verifies the tag matches the workspace version and waits for the required CI checks to pass on the tagged commit (fails closed on a failed check or timeout).
  2. build — cross-compiles the four target binaries.
  3. smoke — runs the binary smoke on each target (host + Alpine).
  4. publish — publishes crates in dependency order. For each crate it skips the publish if name@version already resolves from the crates.io index, then waits for that version to appear before publishing the next dependent crate (index-propagation; #163). The skip makes a whole-workflow re-run after a partial failure safe.
  5. release-assets — creates/updates the GitHub Release and uploads the binary tarballs + checksums (only after crates publishing succeeds).
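
The tag check in the gate step and the skip logic in the publish step are small enough to model directly. Both functions are hypothetical; release.yml is the source of truth, and the crate order used in the usage note is assumed to be the dependency order.

```rust
use std::collections::HashSet;

/// The gate passes only when the tag is exactly `v` plus the workspace version.
fn tag_matches_version(tag: &str, workspace_version: &str) -> bool {
    tag.strip_prefix('v') == Some(workspace_version)
}

/// Given the publish order and the crates already resolvable from the index,
/// return what a re-run still has to publish, preserving order.
fn crates_to_publish<'a>(order: &[&'a str], in_index: &HashSet<&str>) -> Vec<&'a str> {
    order
        .iter()
        .copied()
        .filter(|name| !in_index.contains(*name))
        .collect()
}
```

With the order musefs-db, musefs-format, musefs-core, musefs-fuse, musefs-cli, musefs and the first three already in the index, a re-run resumes at musefs-fuse. That is the property that makes a whole-workflow retry safe after a partial failure.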

Retry / rollback.

  • crates.io is yank-only — a published version cannot be un-published.
  • A partial failure (e.g. crate 3 of 6 published, then a transient error) is recovered by re-running the workflow: the publish loop skips the crates already in the index and resumes, then runs release-assets. No manual cleanup of the published crates is needed.
  • GitHub asset upload is idempotent (gh release upload --clobber), so re-runs re-upload safely.
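
The skip-and-resume behaviour described above can be sketched as a pure function. This is an illustration only, not the workflow's actual script: `plan_publish` is a hypothetical helper, the in-memory `published` set stands in for the real crates.io index lookup, and the crate order is assumed from the packaging list in the version-bump steps.

```rust
use std::collections::HashSet;

/// Crate order assumed from the dry-run packaging list (an assumption,
/// not read from release.yml).
const PUBLISH_ORDER: [&str; 6] = [
    "musefs-db",
    "musefs-format",
    "musefs-core",
    "musefs-fuse",
    "musefs-cli",
    "musefs",
];

/// Hypothetical helper: the crates a (re-)run still has to publish,
/// in the given order, skipping anything already present in the index.
fn plan_publish<'a>(order: &[&'a str], published: &HashSet<&str>) -> Vec<&'a str> {
    order
        .iter()
        .copied()
        .filter(|name| !published.contains(name))
        .collect()
}
```

With three of the six crates already in the index, a re-run plans only the remaining three; with all six present it plans nothing, which is why a whole-workflow re-run is safe.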

Post-release verification.

  1. cargo install musefs (or cargo install musefs --version X.Y.Z) from a clean machine/container.
  2. Download a release tarball and verify its checksum: sha256sum -c musefs-X.Y.Z-<triple>.tar.gz.sha256.
  3. Confirm all four target tarballs + .sha256 files are attached to the GitHub Release.

Lidarr gate at a v1.0.0 milestone. The Lidarr real-instance e2e (lidarr-e2e.yml) gates the Python py-v* release, not this Rust flow. When a v1.0.0 milestone bundles both, ensure the Python release (and therefore its Lidarr e2e gate) is also run.

PRs & commits

  • Conventional-style subjects (fix(format): …, docs: …, ci: …), scoped and imperative.
  • main is protected by required status checks: the ci-ok and coverage-ok aggregator jobs must pass. CI also runs the fuzz smoke build, the in-diff mutation gate, and a security audit on PRs. Docs-only changes skip the expensive jobs at the job level — the aggregators still report.
  • Benchmark results, when a change warrants them, are recorded in Benchmarks.

Before you push

The pre-commit hook already gates fmt, clippy, the workspace tests, and the Python/shell/YAML lints on every commit. What it does not run — check the ones your change triggers:

  • Logic changes → the in-diff mutation gate. It is CI parity, not optional polish.
  • Format-layer API changes → cargo +nightly fuzz build; the fuzz/ crate is outside the workspace, so nothing else compiles it (coverage-guided fuzzing).
  • musefs-db schema changes → regenerate and re-vendor the Python schema mirror (Python plugins).
  • Picard plugin changes → make sure the real-Picard tests actually ran rather than silently skipped (gotchas).
  • FUSE/mount-surface changes → run the --ignored e2e suite locally (Build & test); the FreeBSD CI leg only runs on PRs that touch that surface.

Benchmarks

Every optimization pass is re-measured apples-to-apples on one box as a PR-isolated before/after pair, plus a cumulative 16caba4 → main summary. This file is performance only: correctness gates (byte-identical proptests, FUSE e2e, in-diff mutation) live in CI and the contributor guide, not here.

Read it in three layers:

  1. Results at a glance — the cumulative per-subsystem delta and a one-line headline per pass.
  2. Methodology — machine, before/after definition, the overlay rule, run conventions, storage placement. Written once; every detail section assumes it.
  3. Per-pass detail — one section per pass: what changed, the before/after table, the reproduce command, and the "why" where it matters.

Results at a glance

Cumulative — 16caba4 → current main (e02223e)

Composed from the per-pass isolated deltas below, anchored to current-main absolutes. Non-isolating: a same-harness run at both ends is infeasible (API drift means neither the 16caba4-era harness nor the main harness compiles at the other commit), so these compose the chain of passes that touched each subsystem rather than a single end-to-end measurement. See Cumulative detail for the absolutes and the per-pass composition.

| Subsystem | Headline metric | 16caba4-era | current main | Δ | Dominant pass |
|---|---|---|---|---|---|
| Ingest | fsync count (durable) | 403 | 0 | eliminated | SP1 |
| | cold scan, ci flac | 32 206 ms | 47 ms | ~685× | SP1 |
| Refresh | refresh-1 @ 20 000 tracks | 173 ms | 1 ms | ~173× | #69 |
| Serve | sequential_read/flac | 929 µs | 569 µs | −38.8% | SP3 + PR3 |
| | cold_first_read/ogg | 14.96 ms | 1.51 ms | −89.9% | SP4 |
| | concurrent m16+walker | 8.20 ms | 4.15 ms | −49.4% | SP3 + PR3 |

Per-pass headlines

Each headline is the pass's single largest statistically-significant delta on its deployment-representative tier.

| Pass | Commit | Headline (this box) |
|---|---|---|
| SP1 (ingestion scalability) | ccbbfaa | durable cold scan ~1150–3600× faster; fsync storm 403→0 |
| SP2 (incremental tree refresh) | ed5f380 | 5 000-track refresh-1 1.4× (32→23 ms) |
| SP3 (read/serve residuals) | e8d56bd | sequential_read −8 to −13% (flac/mp3/m4a/m4a-last) |
| SP4 (storage-aware Ogg serving) | a62453b | ogg cold-read −88%, seek −94% |
| #69 (refresh O(changed)) | e7ae912 | refresh-1 @ 20 000 ~170× (173→1 ms) |
| #114 (root fan-out lookup) | 0881b31 | root fan-out @ 20 000 ~5× (5→1 ms) |
| PR2 (scan pair, #67/#68) | 2d4faf3 | −128 B/file scan I/O (flac/ogg/wav); wall within noise |
| PR3 (serve-path copies, #70) | 32be8f0 | sequential_read −7 to −11% (m4a-last/ogg/wav); concurrent −19% |
| #136 (HeaderCache → quick_cache) | 2e6674e | within noise (marginal m4a/ogg sequential) |
| #112 (StructureOnly passthrough) | faec017 | passthrough dd 3.36× (2.5→8.4 GB/s) |

One direction inverted vs the historical file: SP1 §4 (compute-isolated, on RAM) is now faster after the change, not slower. The old file recorded SP1 as ~1.9× slower on RAM-backed tmpfs (the "honest cost" of the pipeline); on this 8-core box the parallel pipeline wins even on RAM (~1.4×), at higher peak RSS. See SP1 §4.


CI regression gating

BENCHMARKS.md records hand-run absolute numbers; CI guards against regressions in three lanes:

  1. Counter gate (every non-doc PR, hard). perf_counters.rs + tree.rs golden work-counter assertions under --features metrics. Catches algorithmic regressions (extra copy, whole-file slurp, O(N) tree rebuild).
  2. A/B wall-clock (warn-only, core src PRs). The perf-bench matrix job benches the base and PR commits in parallel on separate runners; the perf-ab job then diffs the two exported baselines and posts a critcmp delta as a PR comment. Never blocks.
  3. Release record. The benchmarks job runs the full bench suite at the ci tier on a tag and uploads the numbers as an artifact for curation here.

The fsync-storm (403→0) signal needs a real FUSE mount and lives only in the release lane / the #[ignore] bench_scan_under_latency, not the per-PR gate.

The release artifact is named benchmark-snapshot-<tag>; download it from the tag's workflow run. The job runs on a GitHub-hosted ubuntu-latest runner, not the dedicated box the rest of this file uses, so its wall-times are runner-relative and are not folded into the per-pass tables — only the portable signals (bytes_read, pread/fsync counts, refresh flatness) are cross-comparable. Each release's snapshot is recorded verbatim under Release CI snapshots.

Release CI snapshots

Per-tag records from the benchmarks release job (CI regression gating §3), run on a GitHub-hosted ubuntu-latest runner at the ci tier. This is not the dedicated box the per-pass tables use, so the wall-times here are runner-relative and are a point-in-time record per release — not comparable to those sections. The portable signals (scan_bytes_read, pread/fsync counts, refresh flatness) are comparable, and are the no-regression check.

v1.1.0 — f865afc, single run

No regression vs the curated tables: scan_bytes_read is unchanged from PR2 (flac/ogg/wav = 845 000 / 847 400 / 828 000 B — the −128 B/file ID3v1 gating still holds; mp3 847 200 B unchanged; m4a uses the seek-reader, 0 B), and single-track refresh stays flat with library size.

read_throughput (Criterion, median estimate, µs):

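| bench | flac | mp3 | m4a | m4a-last | ogg | wav |
|---|---|---|---|---|---|---|
| sequential_read | 416 | 418 | 417 | 419 | 524† | 422 |
| cold_first_read | 798 | 778 | 805 | 804 | 911 | 792 |
| seek_read | 368 | 353 | 372 | 370 | 582 | 365 |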

concurrent_read_walk/m8_plus_walker: 931 µs.

† sequential_read/ogg collected only 10k iterations with 19% outliers (Criterion low-sample warning); treat as noisy.

bench_ingest — ci tier (200 tracks × 4 KiB), runner tmpfs:

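| format | scan (ms) | revalidate (ms) | scan_bytes_read (B) | RSS (KiB) |
|---|---|---|---|---|
| flac | 31 | 1 | 845 000 | 7100 |
| mp3 | 82 | 1 | 847 200 | 7164 |
| m4a | 86 | 1 | 0 | 7180 |
| m4a-last | 87 | 1 | 0 | 7184 |
| ogg | 83 | 1 | 847 400 | 7184 |
| wav | 83 | 1 | 828 000 | 7184 |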

bench_refresh — ci tier, single-track re-tag:

| library size | refresh-1 (ms) | root-fanout-1 (ms) |
|---|---|---|
| 100 | 0 | 0 |
| 1000 | 1 | 1 |
| 5000 | 12 | 3 |
| 20000 | 7 | 9 |

refresh-1 vs refresh-N (200-track ci, same instance): refresh-1 0 ms, refresh-N (100 touched) 6 ms. The 5000 > 20000 inversion is single-run noise on the shared runner; the dedicated-box #69 sweep is the clean flat signal.

Methodology

Machine

| Component | Value |
|---|---|
| CPU | 8 cores |
| RAM | 32 GB (31 GiB) |
| Durable storage (/data) | btrfs, 2-device span (sda3+sdb3), rotational; Data: single, Metadata: RAID1; zstd:1. No SSD on this box. |
| RAM storage (/dev/shm) | tmpfs |
| Toolchain | rustc 1.96.0 · release builds |
| Kernel | Linux 7.0 (FUSE passthrough requires ≥6.9 + CAP_SYS_ADMIN) |

Before / after definition

History is squash-merged (linear), so each pass is one commit:

  • after = the pass's own squash-merge commit.
  • before = its parent, <after>^. It is PR-isolated, not current main: this preserves attribution (each delta is exactly what that PR changed) and avoids harness drift from later passes.

The overlay rule

Two passes (SP2, SP4) report a bench that did not yet exist at their before commit. For those, the after-commit's harness file is checked out onto the before checkout (git checkout <after> -- <bench_file>) so the old code is measured with the new harness. Overlay use is called out in each affected section.

Run conventions

  • bench_ingest / bench_refresh (ignored tests, cargo test --release … -- --ignored): 3 runs, median reported (spread noted where it matters). bench_ingest needs --features metrics.
  • read_throughput (Criterion bench): Criterion's own sampling; before side saved with --save-baseline, after side compared with --baseline. Reported Δ is Criterion's change estimate.
  • Wall times on /data are box-relative (rotational disk); where a portable signal exists (fsync count, bytes_read, pread count) it is the primary number.

Storage placement

  • Durable rows run on /data (rotational btrfs). bench_ingest honors MUSEFS_BENCH_DIR.
  • RAM rows run on /dev/shm (tmpfs). bench_ingest honors MUSEFS_BENCH_DIR=/dev/shm/…; bench_refresh and read_throughput ignore it and follow TMPDIR=/dev/shm.

Per-pass detail

SP1 — Ingestion scalability

ccbbfaa^ → ccbbfaa. bench_ingest, --features metrics. No overlay.

What changed: whole-file fs::read slurp + per-file commits at synchronous=FULL → bounded probing reads + parallel-probe/single-writer pipeline + per-batch transactions at synchronous=NORMAL (WAL retained).

1. Durable small files — the fsync/batching win

ci tier (200 tracks × 4 KiB, no embedded art), corpus + DB on /data. Not compute-bound — the before path is dominated by per-file fsync latency.

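| format | before scan (ms) | after scan (ms) | speedup |
|---|---|---|---|
| flac | 32 206 | 21 | 1534× |
| mp3 | 16 124 | 14 | 1152× |
| m4a | 30 089 | 19 | 1584× |
| m4a-last | 39 592 | 11 | 3599× |
| ogg | 16 153 | 14 | 1154× |
| wav | 15 574 | 12 | 1298× |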

2. Durable large files — bounded reads + batching

bandwidth tier (1000 tracks × 30 MiB FLAC + art ≈ 30 GiB), on /data, 1 run.

| metric | before (slurp) | after (bounded) | Δ |
|---|---|---|---|
| scan wall (ms) | 378 041 | 15 228 | 24.8× faster |
| revalidate (ms) | 243 | 14 | 17.4× |
| peak RSS (KiB) | 98 636 | 132 436 | 0.74× (more) |

The after path reads only a ~1 MiB metadata window per file instead of slurping each 30 MiB file in full.
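
The bytes-read difference between the two paths follows from a simple cap. The sketch below is a back-of-the-envelope model, not the scanner's code: `probe_bytes` and `corpus_bytes` are hypothetical names, and the "~1 MiB" window is treated as exactly 1 MiB.

```rust
const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;

/// Bytes read from one file when probing is capped at `window` bytes.
/// A file smaller than the window is read in full either way.
fn probe_bytes(file_len: u64, window: u64) -> u64 {
    file_len.min(window)
}

/// Total bytes read over `files` equally sized files.
/// `window = None` models the whole-file slurp of the before path.
fn corpus_bytes(files: u64, file_len: u64, window: Option<u64>) -> u64 {
    let per_file = match window {
        Some(w) => probe_bytes(file_len, w),
        None => file_len,
    };
    files * per_file
}
```

Under these assumptions the bandwidth tier drops from 1000 × 30 MiB to 1000 × 1 MiB of scan reads, while a corpus of files already smaller than the window reads the same bytes on both paths.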

3. fsync count — the mechanism

ci tier (200 FLAC) scanned through the passthrough latency-FS (ssd profile), which counts fsyncs at the FUSE layer. Wall is box-relative (rotational /data); the fsync count is the portable signal.

| config | fsyncs | scan wall (ms, box-relative) |
|---|---|---|
| before (synchronous=FULL, per-file commits) | 403 | 79 |
| after (synchronous=NORMAL, batched commits) | 0 | 21 |

The 403→0 collapse is the root cause of §1's durable speedups.

4. Compute-isolated (RAM) — the trade, now a win on this box

large-compute tier (100k tracks × ~38 KiB FLAC) on /dev/shm (RAM), where fsync is free — so the §1/§3 batching win is neutralized and only raw compute remains. bytes_read ≈ 3.92 GiB both sides (the 38 KiB files are below the 1 MiB window, so bounded reads don't help).

| config | before scan (ms) | after scan (ms) | revalidate before→after (ms) | peak RSS before→after (KiB) |
|---|---|---|---|---|
| default jobs | 31 241 | 22 295 | 2239 → 1278 | 27 904 → 96 084 |
| --jobs 1 | 31 111 | 23 565 | 2255 → 1283 | 28 024 → 92 200 |

Finding — direction inverted vs the historical file. The old file (6-core EPYC) recorded SP1 as ~1.9× slower on RAM — the deliberate "honest cost" of the pipeline where there is no fsync win to amortize. On this 8-core box the parallel pipeline is ~1.4× faster even on RAM (the extra cores outweigh the per-file coordination), at the cost of ~3.4× peak RSS (96 MB vs 28 MB). The trade has shifted from "small RAM loss" to "RAM win for more memory" on wider hardware.

# durable §1/§2: MUSEFS_BENCH_DIR on /data ; RAM §4: MUSEFS_BENCH_DIR on /dev/shm
MUSEFS_BENCH_TIER=ci MUSEFS_BENCH_DIR=/data/bench \
  cargo test --release -p musefs-core --features metrics --test bench_ingest \
  -- --ignored --nocapture bench_cold_scan_and_revalidate
# §3 fsync count:
MUSEFS_BENCH_LATENCY_PROFILE=ssd MUSEFS_BENCH_TIER=ci MUSEFS_BENCH_FORMAT_MIX=flac \
  cargo test --release -p musefs-core --features metrics --test bench_ingest \
  bench_scan_under_latency -- --ignored --nocapture

SP2 — Incremental tree refresh

ed5f380^ → ed5f380. bench_refresh, RAM (TMPDIR=/dev/shm). Overlay: the bench_refresh_one_across_library_sizes sweep didn't exist at ed5f380^, so the after-commit harness is overlaid on the before checkout.

What changed: replace the O(N) VirtualTree::build_with full reconstruction with apply_changes (in-place im-backed tree mutation) — only nodes whose id appears in the changed/added/removed sets are touched.

ci tier, FLAC, single-track re-tag, 3 runs (median):

| library size | before (ms) | after (ms) | speedup |
|---|---|---|---|
| 100 | 0 | 0 | n/a (sub-granularity) |
| 1000 | 5 | 6 | 0.83× (noise tier) |
| 5000 | 32 | 23 | 1.39× |

Why (Stage A → Stage B): at Stage A the rebuild already rendered incrementally (only the changed track re-rendered, O(changed)), but the subsequent VirtualTree::build_with reconstructed the whole tree from scratch (O(N)) — the remaining linear cost. Stage B's apply_changes removes that full reconstruction; the residual slope (still ~23 ms at 5000) is the lighter O(N) render-key scan + HashMap rebuild that feeds apply_changes, not a full tree rebuild. The speedup grows with library size because diff cost is proportional to changes, not total entries. (Corpus is single-album, so build_with time is slightly optimistic vs a real multi-album library.)

cargo test -p musefs-core --release --test bench_refresh \
  bench_refresh_one_across_library_sizes -- --ignored --nocapture

SP3 — Read/serve residuals

e8d56bd^ → e8d56bd. Criterion read_throughput, RAM. No overlay.

What changed: (1) read_segments writes each BackingAudio run directly into the output buffer's reserved tail (no throwaway vec![0u8; n] + copy); (2) handles: Mutex<HashMap> → lock-free sharded_slab::Slab; (3) size_cache: Mutex<HashMap> → dashmap::DashMap.

sequential_read — per-format (4 MiB files, 128 KiB reads)

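| format | before (µs) | after (µs) | time Δ | thrpt Δ |
|---|---|---|---|---|
| flac | 929.1 | 839.6 | −7.9% | +8.6% |
| mp3 | 940.2 | 824.8 | −13.1% | +15.1% |
| m4a | 939.8 | 824.2 | −10.8% | +12.2% |
| m4a-last | 938.0 | 842.6 | −10.3% | +11.4% |
| ogg | 966.8 | 1049.4 | +6.3% | −5.9% |
| wav | 935.4 | 912.3 | −2.5% | +2.5% |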

The metadata-light formats improve 8–13% from dropping the per-splice alloc+copy. ogg +6.3% is a low-iteration sampling anomaly (Criterion warned "Unable to complete 100 samples in 5.0s" — only 5050 iterations vs 10k for other formats).

concurrent_read_walk/m16_plus_walker

16 reader threads + one metadata walker sharing one Arc<Musefs> (includes thread spawn/join):

| bench | before (ms) | after (ms) | Δ |
|---|---|---|---|
| m16_plus_walker | 8.20 | 9.48 | +15.7% |

This high-variance burst metric regressed on this run — attributable to thread spawn/join overhead in the contention path rather than the read path itself; it is not a sequential-read regression. (The old file recorded this bench as parity/improved; it swings run-to-run.)

cargo bench -p musefs-core --bench read_throughput -- sequential_read concurrent_read_walk

SP4 — Storage-aware Ogg serving

a62453b^ → a62453b. Criterion read_throughput + latency-injected read. Overlay: cold_first_read/seek_read were added by SP4, so the after-commit bench is overlaid on the before checkout.

What changed: replace the eager whole-region Ogg page index with a stateless per-request backwards-scan: find_page_start locates the containing page from a ~65 KB window (CRC-validated entry guard), serve_ogg_window patches each page header algebraically (crc_shift_zeros, no payload I/O), and a one-entry last_page memo short-circuits the scan + CRC guard when the next request lands inside the already-located page.
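
The backwards scan can be illustrated with a minimal sketch. It is not the project's find_page_start: the signature is an assumption, and the CRC-validated entry guard and the last_page memo are left out. It relies only on the fact that every Ogg page header begins with the capture pattern "OggS".

```rust
/// Ogg capture pattern that starts every page header.
const CAPTURE: &[u8; 4] = b"OggS";

/// Hypothetical sketch: offset of the last capture pattern that starts at or
/// before `pos` inside `window`, or None if the window contains none.
/// A real implementation must also validate the candidate page (e.g. by CRC),
/// because "OggS" can occur by chance inside payload bytes.
fn find_page_start(window: &[u8], pos: usize) -> Option<usize> {
    if window.len() < CAPTURE.len() {
        return None;
    }
    let last_start = window.len() - CAPTURE.len();
    (0..=pos.min(last_start))
        .rev()
        .find(|&i| window[i..i + CAPTURE.len()] == CAPTURE[..])
}
```

The scan touches only the bytes between the request position and the nearest preceding header, which is why its cost is bounded by the window size rather than by the distance from the start of the file.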

sequential_read — warm repeat-read (no page-index amortization to win)

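| format | before (µs) | after (µs) | Δ |
|---|---|---|---|
| flac | 856.2 | 880.5 | +2.8% |
| mp3 | 847.7 | 894.5 | +5.5% |
| m4a | 862.5 | 816.9 | −5.3% |
| m4a-last | 872.7 | 831.6 | −4.7% |
| ogg | 1037.9 | 1048.2 | +1.0% |
| wav | 892.6 | 840.8 | −5.8% |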

cold_first_read / seek_read — the Ogg win

| bench | format | before | after | Δ |
|---|---|---|---|---|
| cold_first_read | ogg | 14.956 ms | 1.799 ms | −88.0% |
| seek_read | ogg | 13.541 ms | 827 µs | −93.9% |

Non-ogg cold/seek stay within ±7% (no page index involved). The wins come from never building the whole-file index up front — the old code reads the entire prefix to serve even one chunk near EOF; SP4 scans ~65 KB backward, then the memo carries the validated page forward. sequential_read/ogg is flat (+1.0%) because it reads the full file linearly regardless — the win is cold-start and seek.

Latency-injected reads (bench_read_under_latency, nfs-hdd) — AFTER only

This bench was introduced by SP4; no before baseline exists.

| label | format | tier | storage | wall (ms) | opens | preads |
|---|---|---|---|---|---|---|
| read_whole_cold | ogg | ci | nfs-hdd | 28 | 1 | 0 |
| read_seek_cold | ogg | ci | nfs-hdd | 28 | 1 | 0 |

preads=0: the backwards-scan reads are served from the layout's inline/generated segments without reaching the backing file. Near-equal whole/seek wall time indicates per-file open+resolve latency dominates under nfs-hdd; the local cold/seek benches above are the clean signal.

Why crc_shift_zeros is a hybrid

patch_page_header_algebraic advances the CRC past a page's payload via crc_shift_zeros. The per-step loop is O(n) and dominated linear sequential_read on max-size 65 KB pages; a GF(2) matrix-power method is O(log n) but carries a fixed ~32-matmul cost, so it is slower for the small pages real Opus/Vorbis streams carry. The evolution across implementations (ogg benches):

| ogg bench | linear crc | +matrix | +matrix +memo-amortized guard (shipped) |
|---|---|---|---|
| sequential_read | 17.6 ms | 6.40 ms | 0.93 ms |
| cold_first_read | ~17 ms | 7.42 ms | 1.61 ms |
| seek_read | | 821 µs | 829 µs |

Shipped as a hybrid: per-step loop below n=16384, matrix at/above; a differential test covers both paths + the boundary.
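
To make the trade concrete, here is a self-contained sketch of both ways to advance an Ogg CRC past n zero bytes. It is not the project's crc_shift_zeros: the function names are hypothetical, n is assumed to count bytes, and only the 16384 threshold is taken from the text above. The Ogg CRC itself is the standard one (polynomial 0x04C11DB7, MSB-first, initial value 0, no final XOR).

```rust
const POLY: u32 = 0x04C1_1DB7;
const MATRIX_THRESHOLD: usize = 16384; // threshold from the text; unit (bytes) assumed

/// Advance the CRC register past a single zero bit.
fn step_zero_bit(crc: u32) -> u32 {
    if crc & 0x8000_0000 != 0 { (crc << 1) ^ POLY } else { crc << 1 }
}

/// O(n): one step per zero bit.
fn shift_zeros_linear(mut crc: u32, n_bytes: usize) -> u32 {
    for _ in 0..n_bytes * 8 {
        crc = step_zero_bit(crc);
    }
    crc
}

/// A 32x32 matrix over GF(2); column j is the image of bit j.
type Mat = [u32; 32];

/// Multiply matrix by vector: XOR the columns selected by the set bits of `v`.
fn mat_apply(m: &Mat, mut v: u32) -> u32 {
    let mut out = 0;
    let mut j = 0;
    while v != 0 {
        if v & 1 != 0 {
            out ^= m[j];
        }
        v >>= 1;
        j += 1;
    }
    out
}

/// Square a matrix: each column of the result is M applied to a column of M.
fn mat_square(m: &Mat) -> Mat {
    let mut out = [0u32; 32];
    for j in 0..32 {
        out[j] = mat_apply(m, m[j]);
    }
    out
}

/// O(log n): exponentiation by squaring of the one-zero-bit step matrix.
fn shift_zeros_matrix(mut crc: u32, n_bytes: usize) -> u32 {
    let mut m: Mat = [0; 32];
    for j in 0..32 {
        m[j] = step_zero_bit(1u32 << j);
    }
    let mut bits = (n_bytes as u64) * 8;
    while bits != 0 {
        if bits & 1 != 0 {
            crc = mat_apply(&m, crc);
        }
        m = mat_square(&m);
        bits >>= 1;
    }
    crc
}

/// Hybrid dispatch: loop for short runs, matrix power for long ones.
fn shift_zeros_hybrid(crc: u32, n_bytes: usize) -> u32 {
    if n_bytes < MATRIX_THRESHOLD {
        shift_zeros_linear(crc, n_bytes)
    } else {
        shift_zeros_matrix(crc, n_bytes)
    }
}
```

Both paths must agree for every input because the zero-bit step is linear over GF(2); comparing them on either side of the threshold is the same idea as the differential test mentioned above.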

cargo bench -p musefs-core --bench read_throughput -- cold_first_read seek_read sequential_read
MUSEFS_BENCH_LATENCY_PROFILE=nfs-hdd cargo test --release -p musefs-core \
  --features metrics --test bench_ingest bench_read_under_latency -- --ignored --nocapture

#69 — Refresh O(changed)

e7ae912^ → e7ae912. bench_refresh, RAM. No overlay.

What changed: changelog-driven change detection (changelog_since + render_keys_for on just the changed ids) replaces the O(N) render-key scan, and collision-gated apply_changes dirtying stops the old parent chain from being rebuilt unconditionally. Refresh-1 cost becomes O(changed).
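
A minimal sketch of changelog-driven detection follows. The row layout and the function name are assumptions for illustration (the real changelog_since reads from the SQLite store); the point is only that the work is proportional to the rows newer than the last-seen sequence number, not to the library size.

```rust
use std::collections::BTreeSet;

/// One changelog row: a monotonically increasing sequence number and the
/// track it touched. Field names are illustrative, not the real schema.
struct ChangeRow {
    seq: u64,
    track_id: u64,
}

/// Hypothetical stand-in for `changelog_since`: the distinct track ids
/// touched after `since`, so a refresh re-renders only those.
fn changed_ids_since(log: &[ChangeRow], since: u64) -> BTreeSet<u64> {
    log.iter()
        .filter(|row| row.seq > since)
        .map(|row| row.track_id)
        .collect()
}
```

A track edited twice since the last refresh appears once in the result, and a refresh with nothing new gets an empty set and does no rendering work.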

Single-track refresh vs library size (3 runs, median)

A single-track re-tag moves the track out of its shared album dir — the structural worst case for a flat corpus (one artist / one album, N siblings).

| library size | before, full rebuild (ms) | after, O(changed) (ms) | factor |
|---|---|---|---|
| 100 | 0 | 0 | |
| 1000 | 6 | 0 | ∞ (sub-ms) |
| 5000 | 33 | 0 | ∞ (sub-ms) |
| 20000 | 173 | 1 | ~170× |

The after sweep is flat: refresh-1 @ 20 000 is within 1 ms of @ 100, against a linear ~170 ms slope before.

One-vs-many (same Musefs instance, 200-track ci tier)

| label | wall (ms) |
|---|---|
| refresh-1 | 0 |
| refresh-N (100 touched) | 4 |

refresh-N scales with the touched set, not the library.

# before (apply the 4-point sweep edit first):
sed -i 's/\[100usize, 1000, 5000\]/[100usize, 1000, 5000, 20000]/' musefs-core/tests/bench_refresh.rs
cargo test -p musefs-core --release --test bench_refresh \
  bench_refresh_one_across_library_sizes -- --ignored --nocapture
cargo test -p musefs-core --release --test bench_refresh \
  bench_refresh_one_vs_many -- --ignored --nocapture

#114 — Rendered child lookup (root fan-out)

0881b31^ → 0881b31. bench_refresh, RAM. Overlay: the bench_refresh_root_fanout_one_across_library_sizes bench was added by #114, so its harness is overlaid on the before checkout.

What changed: a rendered-name child index turns the root sibling scan in deepest_existing_ancestor into an indexed miss. The corpus uses N top-level artist directories; the timed update retags one track to fallback Unknown/…, exercising an absent rendered-name lookup at root.
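
The difference between a sibling scan and an indexed miss can be sketched as below. `DirNode` and its methods are hypothetical types for illustration, not the project's tree; they only contrast a linear walk over children with a hash lookup keyed by rendered name.

```rust
use std::collections::HashMap;

/// A directory's children, kept both as an ordered list and as a
/// rendered-name index. Illustrative types only.
#[derive(Default)]
struct DirNode {
    children: Vec<(String, u64)>,  // (rendered name, node id)
    by_name: HashMap<String, u64>, // rendered name -> node id
}

impl DirNode {
    fn insert(&mut self, name: &str, id: u64) {
        self.children.push((name.to_string(), id));
        self.by_name.insert(name.to_string(), id);
    }

    /// O(children) sibling scan: a miss walks every entry.
    fn lookup_scan(&self, name: &str) -> Option<u64> {
        self.children
            .iter()
            .find(|(n, _)| n == name)
            .map(|(_, id)| *id)
    }

    /// Expected O(1) indexed lookup: a miss is a single hash probe.
    fn lookup_indexed(&self, name: &str) -> Option<u64> {
        self.by_name.get(name).copied()
    }
}
```

Looking up an absent name such as the fallback directory is the worst case for the scan, since it must visit all N siblings before reporting a miss, whereas the index answers without touching the list.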

| library size (top-level artists) | before (ms) | after (ms) |
|---|---|---|
| 100 | 0 | 0 |
| 1000 | 0 | 0 |
| 5000 | 2 | 0 |
| 20000 | 5 | 1 |

~5× at the 20 000-artist fan-out; ≤5 k is already ≤2 ms on both sides.

cargo test -p musefs-core --release --test bench_refresh \
  bench_refresh_root_fanout_one_across_library_sizes -- --ignored --nocapture

PR2 — Scan pair (#67/#68)

2d4faf3^ → 2d4faf3. bench_ingest, --features metrics, RAM, 3 runs. No overlay.

What changed: (#67) gate the 128-byte ID3v1 tail read to .mp3 files — only MP3 consumes the frame; (#68) ingest_bulk drains the owned Unit batch by value, moving picture payloads into the DB structs instead of cloning.

Wall time — ci tier (200 tracks × 4 KiB, no art), median of 3

| format | before (ms) | after (ms) |
|---|---|---|
| flac | 29 | 30 |
| mp3 | 21 | 23 |
| m4a | 27 | 26 |
| m4a-last | 32 | 26 |
| ogg | 22 | 24 |
| wav | 21 | 24 |

Wall time is within run-to-run noise — at ci tier (4 KiB files, no embedded art) there is no picture payload to move, so #68's win doesn't show here. It appears on art-bearing corpora (the bandwidth tier / real libraries) where the clone was O(art-size) per file.

Scan I/O — the #67 signal (scan_bytes_read)

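| format | before (B) | after (B) | Δ total | Δ per file |
|---|---|---|---|---|
| flac | 870 600 | 845 000 | −25 600 | −128 B |
| mp3 | 847 200 | 847 200 | 0 | 0 (tail still read) |
| m4a | 0 | 0 | 0 | n/a (seek-reader path) |
| m4a-last | 0 | 0 | 0 | n/a |
| ogg | 873 000 | 847 400 | −25 600 | −128 B |
| wav | 853 600 | 828 000 | −25 600 | −128 B |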

Non-MP3 formats drop exactly the 128-byte ID3v1 tail per file (−25 600 B over the 200-track corpus). MP3 is unchanged; M4A uses the seek-reader, not the front-anchored probe path.
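
The gating rule and the per-corpus saving can be reproduced with a few lines. This is a sketch, not the scanner's code: `id3v1_tail_len` is a hypothetical helper, and matching the extension case-insensitively is an assumption.

```rust
use std::path::Path;

/// Size of an ID3v1 tag, which occupies the last 128 bytes of a file.
const ID3V1_LEN: u64 = 128;

/// Hypothetical gate: only `.mp3` files pay for the tail read.
/// Case-insensitive extension matching is an assumption of this sketch.
fn id3v1_tail_len(path: &Path) -> u64 {
    let is_mp3 = path
        .extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case("mp3"));
    if is_mp3 { ID3V1_LEN } else { 0 }
}
```

Over the 200-track ci corpus, skipping the tail read on a non-MP3 format saves 200 × 128 = 25 600 bytes, which matches the Δ total column.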

MUSEFS_BENCH_TIER=ci MUSEFS_BENCH_DIR=/dev/shm/bench \
  cargo test -p musefs-core --release --features metrics --test bench_ingest \
  -- --ignored --nocapture bench_cold_scan_and_revalidate

PR3 — Serve-path copies (#70)

32be8f0^ → 32be8f0. Criterion read_throughput, RAM. No overlay.

What changed: four stacked serve-path copy eliminations — DB chunk readers fill the caller's &mut [u8]; read_segments writes ArtImage/BinaryTag/raw OggArtSlice arms into the output buffer's resized tail; Musefs::read_into serves into a caller buffer; and the FUSE layer reuses a per-worker thread-local scratch buffer. None touches synthesis or layout (served audio stays byte-identical).

sequential_read

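| format | before (µs) | after (µs) | Δ | verdict |
|---|---|---|---|---|
| flac | 939.8 | 924.8 | −2.1% | noise |
| mp3 | 917.2 | 884.1 | −3.1% | noise |
| m4a | 904.1 | 877.6 | −3.7% | noise |
| m4a-last | 909.8 | 860.3 | −7.4% | improved |
| ogg | 1080.4 | 963.4 | −9.1% | improved |
| wav | 925.6 | 815.7 | −11.1% | improved |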

cold_first_read / seek_read / concurrent

| bench | before | after | Δ | verdict |
|---|---|---|---|---|
| cold_first_read/flac | 1.652 ms | 1.557 ms | −5.8% | improved |
| cold_first_read/mp3 | 1.590 ms | 1.678 ms | +5.5% | regressed (within 10%) |
| cold_first_read/ogg | 1.781 ms | 1.694 ms | −4.9% | improved |
| seek_read (all) | | | within ±2.7% | held |
| concurrent_read_walk/m16 | 9.490 ms | 7.642 ms | −19.5% | improved |

No format breaches the >10% rise gate. The concurrent burst metric improves 19% here (it is high-variance and swings run-to-run; see SP3).

cargo bench -p musefs-core --bench read_throughput -- \
  sequential_read concurrent_read_walk cold_first_read seek_read

#136 — HeaderCache → quick_cache

2e6674e^ → 2e6674e. Criterion read_throughput, RAM. No overlay.

What changed: an S3-FIFO byte-weighted quick_cache replaces the hand-rolled 16-shard Mutex LRU — the serve path's last shared std lock is gone.

At a glance: within noise. No workload regresses outside noise; the only movers are marginal sequential_read improvements on the metadata-light formats.

| bench | before | after | Δ | verdict |
|---|---|---|---|---|
| sequential_read/m4a | 851.1 µs | 794.7 µs | −6.6% | improved |
| sequential_read/m4a-last | 855.2 µs | 798.3 µs | −6.7% | improved |
| sequential_read/ogg | 1.043 ms | 962.9 µs | −7.7% | improved |
| sequential_read/flac,mp3,wav | | | within noise | held |
| cold_first_read (all) | | | within noise / −3.6% m4a | held |
| seek_read (all) | | | within noise | held |
| concurrent_read_walk/m16 | 5.557 ms | 5.451 ms | −1.9% | held |

cargo bench -p musefs-core --bench read_throughput

#112 — StructureOnly kernel passthrough

0881b31 → faec017. Bespoke dd harness (committed: benches/passthrough_dd.sh), sudo (passthrough needs CAP_SYS_ADMIN).

What changed: the backing fd is registered at open (FUSE passthrough, kernel ≥6.9); the kernel serves StructureOnly reads directly from the backing inode, bypassing the daemon round-trip.

512 MiB WAV backing on /dev/shm (RAM-cached, isolates FUSE-path overhead), dd bs=1M sequential read, fresh mount per binary, 3 runs each:

| config | run 1 | run 2 | run 3 | median |
|---|---|---|---|---|
| before (daemon reads) | 2.5 GB/s | 2.5 GB/s | 2.7 GB/s | 2.5 GB/s |
| after (passthrough) | 8.4 GB/s | 8.3 GB/s | 8.9 GB/s | 8.4 GB/s |

3.36× on this RAM-cached sequential workload: the before path round-trips every ~128 KiB chunk through the daemon (wakeup + positioned read + copy back via /dev/fuse); the after path reads straight from the backing inode's page cache.

sudo benches/passthrough_dd.sh target/release/musefs /dev/shm/pt 512

Cumulative detail

16caba4 → current main (e02223e). Derived, non-isolating — composed from the per-pass isolated deltas above, anchored to current-main absolutes. A same-harness end-to-end run is infeasible: MountConfig.case_insensitive and scan_directory_with/ScanOptions/revalidate_with don't exist at 16caba4 (so main's harnesses can't compile there), and the 16caba4-era harness omits the now-required case_insensitive field (so it can't compile on main either). The deltas below name the contributing passes and the dominant one; unrelated speedups are not multiplied into a single headline.

Current-main absolutes (1 run, native harness)

Ingest — ci tier, /data, bench_ingest:

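| format | scan (ms) | revalidate (ms) | RSS (KiB) |
|---|---|---|---|
| flac | 47 | 2 | 6900 |
| mp3 | 25 | 2 | 6944 |
| m4a | 55 | 2 | 6956 |
| m4a-last | 39 | 3 | 6980 |
| ogg | 20 | 2 | 6980 |
| wav | 25 | 3 | 6984 |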

Refresh — RAM, bench_refresh_one_across_library_sizes: refresh-1 @ 100 / 1000 / 5000 = 0 ms; @ 20 000 = 1 ms.

Serve — RAM, read_throughput (Criterion median): sequential_read flac 569 µs · mp3 563 µs · m4a 566 µs · m4a-last 568 µs · ogg 737 µs · wav 598 µs; cold_first_read ogg 1.507 ms; seek_read ogg 806 µs; concurrent m16+walker 4.15 ms.

Composed per-subsystem deltas

Ingest = SP1 ∘ PR2. Dominated by SP1's durable-fsync elimination; PR2 is the −128 B/file + move-not-clone refinement.

| metric | pre-SP1 | current main | Δ |
|---|---|---|---|
| fsync count (latencyfs) | 403 | 0 | eliminated |
| scan_wall (ci flac) | 32 206 ms | 47 ms | ~685× |
| scan_wall (bandwidth flac) | 378 041 ms | ~15 228 ms† | ~24.8× |
| scan_bytes_read (ci flac) | 870 600 B | 845 000 B | −128 B/file |

† Bandwidth tier not re-measured at main; figure is SP1's after number.

Refresh = SP2 ∘ #69 ∘ #114. The O(N)→flat journey; dominant pass is #69 (changelog-driven O(changed) rebuild), with #114 shaving the 20 k root fan-out on top.

| metric | pre-SP2 | current main | Δ |
|---|---|---|---|
| refresh-1 @ 1000 | 5 ms | 0 ms | ∞ (sub-ms) |
| refresh-1 @ 5000 | 32 ms | 0 ms | ∞ (sub-ms) |
| refresh-1 @ 20000 | 173 ms | 1 ms | ~173× |

Serve = SP3 ∘ SP4 ∘ PR3 ∘ #136. SP3 + PR3 drive the cross-format sequential/cold/seek wins (alloc elimination + copy reduction); SP4 owns the ogg cold/seek collapse.

| metric | pre-SP3 | current main | Δ |
|---|---|---|---|
| sequential_read/flac | 929 µs | 569 µs | −38.8% |
| sequential_read/mp3 | 940 µs | 563 µs | −40.1% |
| sequential_read/m4a | 940 µs | 566 µs | −39.8% |
| sequential_read/ogg | 967 µs | 737 µs | −23.8% |
| sequential_read/wav | 935 µs | 598 µs | −36.1% |
| cold_first_read/ogg | 14.96 ms | 1.51 ms | −89.9% |
| seek_read/ogg | 13.54 ms | 806 µs | −94.0% |
| concurrent m16+walker | 8.20 ms | 4.15 ms | −49.4% |

Criterion's own change: lines compare against the previous on-machine baseline (itself already optimized); the absolutes above are the reliable end-to-end signal.


Storage tunables

A proposed --storage-profile {ssd,hdd,nfs} preset would have bumped --max-readahead-kib and --max-background (and enabled --keep-cache) per medium, on the premise that "larger read-ahead hides HDD/NFS latency." Measured against real storage, that premise does not hold — only --keep-cache shows a benefit — so the preset was dropped and these flags keep their defaults. This section records the evidence.

Methodology

Unlike the optimization passes above (tmpfs, in-process Criterion), these run through a real kernel mount with a real reader, because the tunables are kernel↔FUSE negotiation parameters invisible to an in-process driver:

  • Backing: real RAID-1 HDD (/home, /dev/md127) and a btrfs HDD span (/data, /dev/sda3); for NFS, a loopback NFSv4.2 export (exportfs + mount -t nfs localhost:…) whose backing is tmpfs (isolates the RPC tax) or HDD (RPC + seeks).
  • Latency: tc qdisc add dev lo root netem delay <X>ms adds X per packet → ≈2X RTT per NFS RPC. Tested at 8 ms, 50 ms, and 200 ms RTT (the last ≈ a trans-Pacific server).
  • Cold reads: sync; echo 3 > /proc/sys/vm/drop_caches before each measured read — without it the page cache serves repeats and hides all backing latency.
  • Mode: synthesis, not structure-only. Structure-only triggers kernel FUSE passthrough when the process is privileged (these run as root), which serves the backing fd directly and bypasses the daemon read path — and with it every tunable that acts on that path. Synthesis splices BackingAudio reads through the daemon, the real serving path.
  • Why not the injected MUSEFS_FAULT_*_US model: it cannot show a read-ahead effect. FUSE delivers reads to the daemon in fixed ≤256 KiB chunks (max_pages, already pinned at the kernel's 1 MiB ceiling by fuser's 16 MiB default max_write), so the per-pread count — and thus any per-pread injected latency total — is independent of max_readahead.

Reproduce: benches/storage_tunables_bench.sh (needs /dev/fuse, root, and for the NFS rows nfs-kernel-server + tc). HDD numbers are noisy (±10–15%); the trends, not the digits, are the signal.

--max-readahead-kib — no benefit anywhere; hurts on HDD

Cold single-stream sequential throughput (MB/s), synthesis:

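| readahead KiB | HDD /home (RAID1) | HDD /data (btrfs) | NFS 8 ms | NFS-on-HDD 50 ms | NFS-on-HDD 200 ms |
|---|---|---|---|---|---|
| 512 (default) | 248 | 127 | 30.8 | 4.7 | 1.3 |
| 2048 | 191 | 72 | 30.6 | 4.9 | 1.3 |
| 4096 | 153 | 84 | 30.5 | 4.9 | 1.3 |
| 32 (probe) | 237 | 75 | | | |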

(File sizes differ per column — 512 MiB local, 96 MiB at 50 ms, 48 MiB at 200 ms — so compare within a column, not across. The 200 ms column ≈ a trans-Pacific server: flat to the last digit.)

The window size barely moves throughput, and on HDD values ≥2048 KiB are among the slowest (peak is ~128–512 KiB). The reason is visible on NFS: 512 MiB ÷ 256 KiB × 8 ms ≈ 16 s ≈ the observed 31 MB/s — a single stream is served serially, one ≤256 KiB read at a time, each paying the full RTT, with no prefetch overlap that a larger window could exploit.
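
The serial-RTT arithmetic above generalizes to the other latency columns. The function below is a worked model under the stated assumption that every chunk is fetched serially at one full round trip each and that transfer time is negligible; the name is hypothetical and the result is in MiB/s.

```rust
/// Predicted single-stream throughput in MiB/s when every chunk is fetched
/// serially and each fetch costs one full round trip (transfer time ignored).
fn serial_throughput_mib_s(file_mib: f64, chunk_kib: f64, rtt_ms: f64) -> f64 {
    let chunks = file_mib * 1024.0 / chunk_kib;
    let seconds = chunks * rtt_ms / 1000.0;
    file_mib / seconds
}
```

For 256 KiB chunks the model gives 31.25 MiB/s at 8 ms, 5 MiB/s at 50 ms, and 1.25 MiB/s at 200 ms, in line with the measured 30.8, 4.7–4.9, and 1.3 MB/s. The file size cancels out, which is why the differing file sizes per column do not change the prediction.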

--max-background — no effect on read throughput

Wall time (s) for N concurrent cold streams over distinct tracks:

| max_background | HDD /home (16 streams) | NFS 8 ms (16) | NFS-on-HDD 50 ms (80) | NFS-on-HDD 200 ms (24) |
|---|---|---|---|---|
| 64 (default) | 4.55 | 5.16 | 177.8 | 238.5 |
| 128 | 5.05 | 5.18 | 175.7 | 237.4 |

64 ≈ 128 even with 80 > 64 streams. Expected: musefs's FuseConfig notes max_background caps background work and that "foreground reads are bounded only by client concurrency, not by this." The concurrent reads here are foreground. (Concurrency does hide latency — 16 NFS streams reach ~10× single-stream aggregate — but that is client parallelism, which max_background does not gate.)

--keep-cache — the one real win (~3×)

Cold read then immediate reopen (no cache drop between); reopen_s is the signal:

| keep_cache | HDD reopen (s) | NFS 8 ms reopen (s) | NFS-on-HDD 50 ms reopen (s) |
|---|---|---|---|
| false | 0.224 | 0.207 | 0.039 |
| true | 0.062 | 0.060 | 0.014 |

With --keep-cache the kernel retains the page cache across opens, so a re-opened file is served from RAM instead of re-fetched over slow storage — ~3× faster reopen, consistent across HDD and NFS. This is the only tunable worth changing for slow backing (relevant for players/scanners that re-open files), and it needs no preset. It is on by default as of #432 (inode invalidation on retag keeps it consistent); disable with --keep-cache false on memory-constrained hosts.

Conclusions

  • Drop the --storage-profile preset. Of the four knobs it would have set, three (max_readahead, max_background, and by extension a per-medium combination of them) show no benefit; max_readahead ≥2048 KiB actively hurts on HDD. The only justified change — enable --keep-cache on HDD/NFS — does not need an abstraction.
  • Single-stream latency hiding — addressed in #255 (next section). The serialized read path measured above (512 MiB ÷ 256 KiB × RTT) is exactly what backing read-ahead now fixes.

Backing read-ahead (#255)

Each --max-readahead-kib row above exposed the real bottleneck: a single stream is served one ≤256 KiB FUSE chunk at a time, each paying the full backing RTT, so a 200 ms-RTT NFS mount tops out at ~1.3 MB/s regardless of the kernel read-ahead window. The fix is read amplification in the daemon: BackingReader coalesces a stream's small reads into one large positioned pread (geometric window growth, global RAM budget with LRU eviction), so the backing client can pipeline/parallelize the RPCs behind one syscall. A background-prefetch-threads layer ("Phase 2") was also built but is off by default (see below).

Methodology

Two harnesses:

  • Real kernel mount (benches/storage_tunables_bench.sh): a real reader (dd) over a real FUSE mount, cold (drop_caches) each sample, median of 3. Local backing on a btrfs HDD; NFS via a loopback NFSv4.2 export plus tc netem for RTT. The corpus is real FLAC (MUSEFS_BENCH_CORPUS_SRC): a /dev/zero corpus on a compressing fs (btrfs compress=zstd) collapses to a cached extent and never touches the platter, which silently inverts the HDD numbers, whereas real already-compressed audio is incompressible.
  • In-process (musefs-core/tests/bench_ingest.rs::bench_read_under_latency): the core read path over musefs-latencyfs (per-op injected latency), isolating the daemon from the kernel FUSE layer.

Configurations: off = --read-ahead-budget-mib 0; phase1 = the default (amplification only); phase1+2 = --read-ahead-prefetch.

Single-stream cold throughput (MB/s)

| backing | off | phase 1 (default) | phase 1+2 | passthrough |
|---|---|---|---|---|
| local HDD (btrfs, real FLAC) | ~60 | ~62 | ~60 | ~58 |
| NFS, tmpfs-backed, 200 ms RTT | 1.2 | 7.4 | 6.8 | 9.8 |

On NFS read-ahead is a ~6× single-stream win (1.2 → 7.4 MB/s, 75 % of the kernel-passthrough ceiling). On a real local HDD all four configs sit within run-to-run noise (~±15 %) — read-ahead is neutral, not a regression. (An earlier /dev/zero corpus showed a spurious −35 %; it was the zstd-compression artifact above, not read-ahead.)

Concurrent streams (8 × distinct tracks, aggregate MB/s, NFS 200 ms RTT)

| off | phase 1 (default) | phase 1+2 | passthrough |
|---|---|---|---|
| 1.6 | 13.6 | 12.1 | 16.3 |

In-process, per-op latency (16 MiB Ogg whole read; wall ms / backing preads)

| profile | off | phase 1 (default) |
|---|---|---|
| ssd (80 µs/op) | 45 ms / 774 preads | 26 ms / 32 preads |
| nfs-ssd (600 µs/op) | 138 ms / 774 preads | 112 ms / 32 preads |

Amplification collapses 774 backing round-trips to 32; the win scales with per-op latency and is already material at SSD speeds (1.7×).

Phase 2 is off by default

Background prefetch threads (Phase 2) never beat amplification alone and cost a consistent ~10 %: single-stream NFS 6.8 vs 7.4, concurrent NFS 12.1 vs 13.6, neutral on HDD. A single large pread already lets the NFS client pipeline its RPCs, so the threads add coordination overhead without overlap to exploit. Phase 2 is therefore opt-in (--read-ahead-prefetch), retained for hypothetical backends where one large read does not self-pipeline.

Defaults: read-ahead on at --read-ahead-budget-mib 64, Phase-1 amplification only. Set 0 to disable on local-disk-only setups (no benefit there, though no harm either).

Internal window cap on HDD (#433)

The amplification window doubles per sequential read up to WINDOW_ABS_CAP (8 MiB, musefs-core/src/readahead.rs). The #256 sweep above measured the kernel max_readahead knob — where ≥2048 KiB hurts on HDD — but never this daemon-internal cap, so #433 asked whether 8 MiB is too large for spinning media.
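To make the doubling rule concrete, here is a minimal sketch of how a sequential read would be split into backing reads. Only the doubling and the 8 MiB cap come from the text; the 256 KiB starting window is an assumption for illustration, and the sketch ignores the RAM budget, LRU eviction, and anything else the real reader does, so it is not a model of the measured pread counts.

```python
KIB = 1024
MIB = 1024 * KIB
WINDOW_ABS_CAP = 8 * MIB  # cap named in the text


def backing_reads(file_bytes: int, first_window: int, cap: int = WINDOW_ABS_CAP) -> list:
    """Sizes of the backing reads needed to stream a file sequentially,
    doubling the window after each read until it reaches the cap."""
    sizes = []
    window = first_window  # assumed starting size, not taken from musefs
    offset = 0
    while offset < file_bytes:
        size = min(window, file_bytes - offset)
        sizes.append(size)
        offset += size
        window = min(window * 2, cap)
    return sizes


reads = backing_reads(16 * MIB, 256 * KIB)
print(len(reads), [s // KIB for s in reads])  # read count and sizes in KiB
```

Under these assumptions a lower cap only changes how many large reads the tail of the file needs, which is the quantity the sweep below varies.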

Methodology. WINDOW_ABS_CAP is a compile-time const, so the sweep builds one release binary per value (benches/storage_tunables_bench.sh window-cap, which patches the const in place and restores it after each build). Cold (drop_caches) single-stream reads of the same ~270 MiB real FLAC, synthesis mount, default flags (amplification on, prefetch off), real backing on a btrfs HDD (/data, 4389-track corpus). Reproduce:

WINDOW_CAP_MIB="1 2 4 8 16" MUSEFS_BENCH_CORPUS_SRC=<music-tree> MUSEFS_BENCH_CORPUS_MAX_MIB=300 \
  benches/storage_tunables_bench.sh window-cap <hdd-backing-dir>

Result: no measurable cap effect — the medium's noise dominates. Median MB/s (and the within-cap min–max over 7 cold samples) overlap across every cap, and the apparent ordering is an artifact of measurement order, not the cap: throughput drifts down through each run, so whatever runs first looks fastest. Sweeping the caps in the reverse order reverses the "trend".

| cap (MiB) | ascending sweep, median (min–max) | descending sweep, median |
|---|---|---|
| 1 | 104 (65–132) | 46 |
| 2 | 85 (54–142) | 58 |
| 4 | 83 (51–104) | 61 |
| 8 | 61 (54–72) | 61 |
| 16 | 68 (54–94) | 86 |

The current default (8 MiB) lands at ~61 MB/s in both orderings; the first-measured cap is fastest in both (104 for cap 1 ascending, 86 for cap 16 descending). The within-cap spread (≈50–140 MB/s) dwarfs every between-cap median gap. This corroborates the #255 finding that backing read-ahead is neutral on local HDD.

Decision: keep 8 MiB, no runtime knob. There is no HDD gain to capture, and the cap exists for the proven case — the ~6× single-stream amplification win on high-RTT NFS/remote, where coalescing into one large pread lets the client pipeline RPCs. Lowering the cap to chase an unmeasurable HDD effect would regress that win.


Global allocator — steady-state RSS (#360)

Long-lived high-churn FUSE load fragments glibc malloc, growing daemon RSS over days without a true leak. The musefs binary now defaults to the jemalloc global allocator with a background purge thread. Measured with scripts/rss-churn-bench.sh (Linux; median VmRSS over the flattened tail — steady state, not peak).

Parameters: WORKERS=8 (nproc), FILES=500, CYCLES=200, WARMUP=20, no REFRESH_CMD. DB = a freshly-scanned 4427-track store on tmpfs (/tmp); backing audio on /data (HDD). Concurrent cat-to-/dev/null churn drives the open/read/release handle-table and read-synthesis allocation path.

| Allocator | Steady-state RSS |
|---|---|
| system malloc | ~74.7 MiB (76496 kiB) |
| jemalloc | ~28.7 MiB (29368 kiB) |

Decision: SHIP jemalloc. Steady-state RSS is ~62% lower (jemalloc ≤ system malloc, the §4 ship rule). Under identical churn glibc retained ~46 MiB of dirty pages that jemalloc's decay + background purge return to the OS — the #360 fragmentation failure mode, reproduced and fixed. The gap is far outside run-to-run noise, so no within-noise tie-break was needed.
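The "median VmRSS over the flattened tail" metric and the ~62% figure can be reproduced with a few lines. The real measurement is done by scripts/rss-churn-bench.sh; the helper names below are hypothetical and only illustrate the calculation.

```python
import statistics


def parse_vmrss_kib(status_text: str) -> int:
    """Extract VmRSS (in kiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def steady_state_kib(samples: list, tail: int) -> float:
    """Median of the last `tail` samples, ignoring warm-up growth and peaks."""
    return statistics.median(samples[-tail:])


def reduction_percent(baseline_kib: float, candidate_kib: float) -> float:
    """How much lower the candidate is, as a percentage of the baseline."""
    return 100.0 * (baseline_kib - candidate_kib) / baseline_kib


# Figures from the table above: system malloc vs jemalloc.
print(round(reduction_percent(76496, 29368), 1))  # 61.6
```

Taking the median of the tail rather than the maximum keeps a transient peak during churn from being mistaken for steady-state growth.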


Scan fingerprint overhead (#464)

Bench: cargo bench -p musefs-core --bench fingerprint_overhead. Corpus: 200 minimal FLAC files (~200 B metadata + 4 KiB audio) in tempfile::tempdir() (TMPDIR = tmpfs/RAM). Single-threaded scan (jobs: 1). Criterion, 20 samples.

| Tier | Median (ms) | µs/file |
|---|---|---|
| None | 47.0 | 235 |
| Fingerprint | 107.6 | 538 |

Delta: +60.6 ms / 200 files = +303 µs/file overhead (+129%).

Interpretation: The 129% overhead on this synthetic RAM-backed bench exceeds the plan's ≤15% threshold. The overhead is dominated by the extra UPDATE tracks SET fingerprint = … SQLite execution per file inside the batch transaction — not by SHA-256 hashing cost (SHA-256 of a few hundred bytes is sub-microsecond). On a real HDD-backed library the probe I/O (tens-of-ms per file) is the bottleneck, making both the hash and the DB write negligible. See plan Task E2 step 3 decision note: the decision to keep SHA-256 and add the length CHECK was escalated to the controller because the raw percentage exceeded the stated threshold, even though the absolute overhead (303 µs/file) is operationally negligible at disk I/O rates.


Scan fingerprint overhead — SSD latency profile (#464)

Bench: bench_scan_under_latency in musefs-core/tests/bench_ingest.rs (MUSEFS_BENCH_LATENCY_PROFILE=ssd). Corpus: 200 minimal FLAC files (~200 B metadata + 4 KiB audio) on a musefs-latencyfs SSD-latency FUSE mount. Default thread count (jobs: 0). 3 runs each, median reported.

| Tier | Median (ms) | µs/file |
|---|---|---|
| None | 222 | 1110 |
| Fingerprint | 241 | 1205 |

Delta: +19 ms / 200 files = +95 µs/file overhead (+8.6%).

Interpretation: Under an SSD latency profile the I/O dominates and the fingerprint overhead drops to +8.6% (+95 µs/file), well within the plan's ≤15% threshold. The RAM bench's +129% (+303 µs/file) was an artifact of RAM eliminating the I/O that would normally dwarf the extra SHA-256 hash and DB write. At real SSD rates the fingerprint cost is operationally negligible.
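Both deltas follow from the same calculation, shown here as a short worked example. The function name is illustrative; the inputs are the medians from the two tables and the 15% threshold is the plan's stated limit.

```python
THRESHOLD_PERCENT = 15.0  # the plan's stated limit


def fingerprint_overhead(base_ms: float, with_ms: float, files: int) -> tuple:
    """Return (extra microseconds per file, overhead in percent of the baseline)."""
    delta_ms = with_ms - base_ms
    per_file_us = delta_ms * 1000.0 / files
    percent = 100.0 * delta_ms / base_ms
    return per_file_us, percent


for label, base, with_fp in [("RAM", 47.0, 107.6), ("SSD profile", 222, 241)]:
    us, pct = fingerprint_overhead(base, with_fp, files=200)
    verdict = "within" if pct <= THRESHOLD_PERCENT else "over"
    print(f"{label}: +{us:.0f} µs/file, +{pct:.1f}% ({verdict} threshold)")
```

The absolute per-file cost shrinks only modestly between the two benches, while the percentage falls sharply because the baseline grows once I/O latency is included.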

Release notes

Curated, upgrade-focused notes for each release. For the exhaustive, per-change list see the Changelog; for the external-writer contrib/ packages (which version independently) see the contrib changelog.

v1.1.0

A feature-and-hardening release on top of the v1.0.0 stable line. No CLI flags or store columns were removed, but the on-disk schema steps to version 2 and a few defaults change observable behavior — read Upgrading from v1.0.0 before you update an existing store.

Highlights

  • Runtime telemetry. An opt-in --expose-metrics (env MUSEFS_EXPOSE_METRICS) surfaces a synthetic .musefs-metrics/ directory at the mount root whose metrics file renders Prometheus-format counters for getattr/read/open activity, backing read-ahead behavior, and (with the jemalloc build) allocator stats. Off by default. See Tuning & metrics.
  • Scan progress indicator. scan and scan --revalidate render a live progress bar on an interactive terminal and fall back to periodic ingested N/M (P%) lines when output is redirected. A new --quiet/-q suppresses it.
  • --skip-on-missing template flag. Opt-in (env MUSEFS_SKIP_ON_MISSING): drops a track from the mount when a top-level template field stays unresolved, instead of substituting --default-fallback. The motivating case is --template '$!{beets_path}' --skip-on-missing, hiding tracks beets left without a beets_path rather than collapsing them into an Unknown bucket.
  • --read-ahead-prefetch flag. Opt-in background prefetch threads layered on read amplification, default off — benchmarks found amplification alone delivers the read-ahead win, so enable this only when profiling a backend where a single large read does not self-pipeline.
  • riscv64 release platform. Prebuilt riscv64gc-unknown-linux-{gnu,musl} binaries and linux/riscv64 Docker images now ship with each tagged release. Container bases moved to current stable (Debian trixie, Alpine 3.23).
  • statfs reply. The mount now reports a synthetic non-zero capacity with ample free space, so df no longer shows a 0-byte filesystem and capacity-checking importers (Lidarr et al.) no longer balk.
  • Per-extension skip breakdown. End-of-scan summary breaks the skipped count down by lowercased extension (e.g. skipped 42: jpg=20, cue=10, log=8) so a large skip count is diagnosable. Log-only; the counters are unchanged.
  • musefs vacuum. A maintenance command that compacts the SQLite store — reclaiming the free pages that prunes, orphan-art GC, and the migration leave behind — and reports the space reclaimed. Run it while unmounted. See Maintenance.

Plus a substantial round of correctness and robustness fixes across the read fast path (rowid-reuse consistency for art segments), the MP4/QuickTime metadata walk, ID3 synthesis, and the prune/delete paths — see the Changelog for the full list.

Upgrading from v1.0.0

1. Back up your store. The schema migration below is one-way. While no scan or external writer is touching the database, copy musefs.db (and its -wal / -shm sidecars if present). A v1.0.0 binary has no guard against a newer store and may misread one that has been migrated, so keep the backup if you might roll back. From v1.1.0 onward a binary instead refuses to open a store whose schema is newer than it understands, with a clear error.

2. Automatic schema migration (user_version 1 → 2). The first time a v1.1.0 binary opens the store — for example musefs scan — it migrates in a single transaction. The migration:

  • Adds scanner-owned tracks.fingerprint and tracks.content_hash columns (nullable SHA-256 hex, non-unique by design) plus a fingerprint index. They start NULL and are populated on the next scan; external writers do not set them.
  • Rebuilds the tags table so the 256 KiB value cap counts bytes rather than characters (the v1 CHECK was up to ~4× looser for multibyte text). Any row already over the byte cap is dropped in the rebuild. This only reaches genuinely pathological data: a single tag value larger than 256 KiB of bytes, which a real library never has. Such rows were already unreadable under the byte-counting read guard, so in practice no store is affected.
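The difference between counting characters and counting bytes is easy to see in SQLite itself, where length() on text counts characters and length() on a blob counts bytes. The snippet below is a standalone illustration; the actual CHECK expression musefs uses is not shown in these notes, so treat the query as an assumption about the general technique only.

```python
import sqlite3

CAP = 256 * 1024  # the 256 KiB value cap from the text

value = "\U0001F3B5" * 100_000  # 100,000 musical-note characters, 4 UTF-8 bytes each

conn = sqlite3.connect(":memory:")
chars, nbytes = conn.execute(
    "SELECT length(?), length(CAST(? AS BLOB))", (value, value)
).fetchone()
conn.close()

print(chars, chars <= CAP)    # 100000 True: passes a character-counting cap
print(nbytes, nbytes <= CAP)  # 400000 False: rejected by a byte-counting cap
```

A value made entirely of 4-byte characters is the worst case, which is where the "up to ~4× looser" figure comes from.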

Although the migration is automatic, you should still run musefs scan --db <store> once after upgrading: that is what populates the new fingerprint / content_hash columns, which the scanner's content-identity refind logic relies on. Then remount. See The SQLite store for the full schema contract.

3. Behavior changes to check.

  • scan exit code. scan/scan --revalidate now exit 2 when any file fails to parse or ingest (previously always 0 on a non-fatal run). A clean scan still exits 0; a hard error still exits 1. Pipelines that key off the exit status — e.g. musefs scan … && musefs mount … — will now correctly stop on a partial-ingest failure; update any script that assumed 0.
  • --fallback keys are case-insensitive. A per-field --fallback AlbumArtist=… (or any non-lowercase key) is now matched against the template field instead of silently never applying. If you worked around the old bug by lowercasing keys, no change is needed; uppercase keys now take effect.
  • df on the mount now shows a synthetic capacity instead of zeros.
  • Extended attributes (getxattr/setxattr/…) now return ENOTSUP explicitly on the read-only mount; the caller-visible result is unchanged, but the per-probe [Not Implemented] warning is gone.

4. External writers (beets, Picard, Lidarr, python-musefs) version independently and need no change for this upgrade: the new fingerprint / content_hash columns are scanner-owned and nullable, so the external-writer contract is unchanged. Update those packages on their own cadence.
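For scripts that previously assumed scan always exits 0, the three exit statuses from step 3 can be handled explicitly. This is a minimal sketch: the outcome labels and function names are made up for illustration, and the command line is the bare musefs scan --db <store> form used above, so add whatever other arguments your setup passes.

```python
import subprocess

# Exit statuses as described in step 3: 0 clean, 1 hard error, 2 partial ingest failure.
SCAN_OUTCOMES = {0: "clean", 1: "hard-error", 2: "partial-ingest-failure"}


def classify_scan_exit(code: int) -> str:
    """Map a scan exit status to a label; anything else is reported as unknown."""
    return SCAN_OUTCOMES.get(code, "unknown")


def scan_store(db_path: str) -> str:
    """Run `musefs scan --db <store>` and classify how it ended."""
    result = subprocess.run(["musefs", "scan", "--db", db_path])
    return classify_scan_exit(result.returncode)


print(classify_scan_exit(2))  # partial-ingest-failure
```

A wrapper like this lets a pipeline decide whether a partial ingest should block the mount or merely be reported, instead of relying on && alone.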

Earlier releases

For v1.0.0 and earlier, see the Changelog.

Changelog

All notable changes to this project are documented here. The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

The contrib/ Python packages have their own decoupled version and changelog: see the contrib changelog.

For curated, upgrade-focused notes (highlights and per-version migration steps), see the Release notes.

Unreleased

1.1.0 - 2026-06-17

Added

  • Runtime telemetry (.musefs-metrics): an opt-in --expose-metrics flag (env MUSEFS_EXPOSE_METRICS) surfaces a synthetic .musefs-metrics file at the mount root rendering Prometheus-format counters — getattr/read/open activity, backing read-ahead behavior, and (when built with jemalloc) allocator stats. Off by default; the file is absent unless enabled. See the README Metrics section (#394).
  • Scan progress indicator: scan and scan --revalidate render a live progress bar (indicatif) with an elapsed-time summary on an interactive terminal, falling back to periodic ingested N/M (P%) log lines when output is non-interactive. A new --quiet/-q flag suppresses it (#406).
  • --skip-on-missing template flag: an opt-in --skip-on-missing (env MUSEFS_SKIP_ON_MISSING) drops a track from the mount when a top-level template field stays unresolved, instead of substituting --default-fallback. Per-field --fallback chains and [...] optional sections are unaffected (a field resolved via its fallback counts as present). The motivating case is --template '$!{beets_path}' --skip-on-missing, which hides tracks beets left without a beets_path rather than collapsing them into an Unknown bucket (#408).
  • --read-ahead-prefetch flag: opt-in background prefetch threads layered on top of read amplification, default off — benchmarks found amplification alone delivers the entire read-ahead win, while the threads add ~10% overhead with no measured benefit. Enable only when profiling a backend where a single large read does not self-pipeline (#255).
  • riscv64 release platform: prebuilt riscv64gc-unknown-linux-{gnu,musl} binaries and linux/riscv64 Docker images now ship with each tagged release. Container bases bumped to current stable: glibc Debian bookworm → trixie (bookworm has no riscv64 image), musl Alpine 3.20 → 3.23 (3.20 is end-of-life).
  • statfs reply: the mount now reports a non-zero synthetic capacity with ample free space instead of fuser's all-zero default, so df no longer shows a 0-byte filesystem and capacity-checking importers (Lidarr et al.) don't balk (#368).
  • Per-extension skip breakdown: at end of scan, a summary line breaks the skipped count down by lowercased extension (e.g. skipped 42: jpg=20, cue=10, log=8, <none>=4), logged at warn so it shows by default, so a large skip count is diagnosable — expected sidecars versus genuinely unexpected files. Log-only; the ScanStats struct and CLI summary are unchanged (#341).
  • musefs vacuum command: compact the SQLite store, reclaiming free pages left by prunes, orphan-art GC, and the schema migration. Runs VACUUM + a WAL checkpoint and reports the space reclaimed; run it while unmounted (#566).
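The per-extension skip breakdown in the list above amounts to a grouped count. The sketch below reproduces the shape of the summary line in Python; it is not the scanner's code, and ordering the entries by descending count is an assumption based on the example line.

```python
from collections import Counter
from pathlib import PurePath


def skip_breakdown(skipped_paths: list) -> str:
    """Build a summary line of skipped files grouped by lowercased extension."""
    counts = Counter(
        PurePath(p).suffix.lower().lstrip(".") or "<none>" for p in skipped_paths
    )
    parts = ", ".join(f"{ext}={n}" for ext, n in counts.most_common())
    return f"skipped {len(skipped_paths)}: {parts}"


paths = ["a/Cover.JPG", "a/folder.jpg", "a/album.cue", "a/README"]
print(skip_breakdown(paths))  # skipped 4: jpg=2, cue=1, <none>=1
```

Lowercasing before counting is what merges Cover.JPG and folder.jpg into one bucket, so the line stays short on libraries with mixed-case sidecars.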

Fixed

  • Art/serve rowid-reuse consistency: the read fast path's WAL-snapshot + content_version guard, previously gated only on binary-tag layouts, now covers all DB-rowid segments (art ArtImage/OggArtSlice too) via RegionLayout::streams_db_rowid, and the stateless no-fh read fallback now applies the same snapshot/recheck and re-validates its freshly opened backing fd against the resolved stamp. A concurrent external retag + gc_orphan_art + reinsert can no longer splice a wrong image or stale tag bytes mid-read (the audio-bytes invariant was never affected) (#502, #503).
  • Per-field --fallback case-insensitivity: fallback keys are now ASCII lowercased to match template field names, so --fallback AlbumArtist=… (any uppercase) is honored instead of silently never matching (#504).
  • Tag value byte cap: both the schema CHECK (rebuilt in the MIGRATION_V2 upgrade) and the read-time tags.value guard now count bytes, not UTF-8 characters, so the 256 KiB materialized-memory bound is exact rather than up to ~4x looser for multibyte text. The upgrade drops any pre-existing over-cap rows (already unreadable under the byte-counting reader guard) (#505).
  • Embedded NUL in ID3 metadata: synthesized ID3 frames now reject a DB-sourced tag key, tag value, art mime, or art description containing an embedded NUL instead of emitting a frame a downstream parser would misread (#506).
  • Orphan-art GC NULL safety: gc_orphan_art uses NOT EXISTS rather than NOT IN (subquery), so a NULL art_id can no longer silently turn the GC into a no-op (#507).
  • Mount usability: mount now warns when the mountpoint is non-empty (its contents are shadowed for the mount's lifetime), and a permission-denied mount (e.g. an AppArmor-restricted prefix) prints actionable guidance instead of a bare "Permission denied" (#508, #509).
  • Silent mp4 oversize drops: oversized embedded covr cover art and binary freeform (----) values in .m4a/.m4b files are skipped in the format layer before materialization (to avoid building a large image out of a large moov). Previously this dropped them with nothing in the logs; the scan now emits a warn line for each, matching the logging the other formats already had (#343, follow-up to #284).
  • xattr log noise: getxattr/listxattr/setxattr/removexattr now reply ENOTSUP explicitly (read-only filesystem, no extended attributes) instead of falling through to fuser's default, which logged a [Not Implemented] warn on every xattr probe (ls -l, indexers, backup tools). The caller-visible result is unchanged (#364).
  • MP4 path-to-ilst leniency: the walk to moov/udta/meta/ilst now uses the same lenient box scan as the metadata extractors, so a single malformed or truncated sibling box anywhere on the path no longer suppresses an otherwise well-formed ilst and silently drops every tag and cover. The audio/structure path stays strict (#542).
  • QuickTime bare meta atoms: the meta parser only consumes the 4-byte FullBox version/flags prefix when it is actually present (a zero word), so a QuickTime-style bare meta — which has no such prefix — is read instead of landing mid-header and dropping all tags and art (#543).
  • scan exit code on ingest failure: scan/scan --revalidate now exit 2 when any file fails to parse/ingest (failed > 0), instead of always exiting 0. A pipeline such as musefs scan … && musefs mount … can now detect a partial or total ingest failure; a clean scan still exits 0 and a hard error still exits 1 (#554).
  • Release smoke audio-bytes check: scripts/smoke-binary.sh (the per-arch release gate) now compares the served file's encoded audio stream against the untouched backing file, asserting the cardinal byte-identical-audio invariant rather than only checking the fLaC magic — so a target-specific positioned-read or offset regression in a cross-compiled binary is caught (#547).
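The NOT IN versus NOT EXISTS fix in the list above rests on standard SQL three-valued logic, which a tiny in-memory database can demonstrate. The table and column layout here is a hypothetical stand-in, not the musefs schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE art (id INTEGER PRIMARY KEY);
    CREATE TABLE track_art (art_id INTEGER);      -- hypothetical reference table
    INSERT INTO art (id) VALUES (1), (2);
    INSERT INTO track_art (art_id) VALUES (1), (NULL);
""")

# NOT IN: `2 NOT IN (1, NULL)` evaluates to NULL, so the orphan row 2 is never selected.
not_in = conn.execute(
    "SELECT id FROM art WHERE id NOT IN (SELECT art_id FROM track_art)"
).fetchall()

# NOT EXISTS: the NULL row simply fails the equality test, so row 2 is found.
not_exists = conn.execute(
    "SELECT id FROM art a WHERE NOT EXISTS "
    "(SELECT 1 FROM track_art t WHERE t.art_id = a.id)"
).fetchall()
conn.close()

print(not_in)      # []
print(not_exists)  # [(2,)]
```

A single NULL in the subquery is enough to make the NOT IN form select nothing, which is how a garbage collector built on it can quietly stop collecting.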

1.0.0 - 2026-06-12

First stable release.

Added

  • Lidarr integration: a new contrib/lidarr/ package that drives symlink-based placeholder imports and syncs Lidarr metadata into the musefs SQLite store.
  • FUSE mount-access controls: new --allow-other, --owner, and --group flags mount with allow_other + default_permissions so accounts other than the mounting user can reach the view and the presented owner/group/mode bits are enforced; --owner/--group imply --allow-other. A non-root allow_other mount is pre-flight checked against /etc/fuse.conf user_allow_other and fails early with guidance if it is missing. See the README Ownership and permissions section (#293, #294).
  • Hardened deployment assets: the container image runs as a dedicated unprivileged user with a build-arg-configurable UID/GID, and the musefs-scan.service systemd unit ships a strong sandbox (the FUSE-mounting musefs.service deliberately cannot be sandboxed). See the systemd hardening notes (#317, #318, #319).
  • crates.io distribution: the musefs binary is published to crates.io as of this release and installable with cargo install musefs. A new thin musefs wrapper crate owns the binary (musefs-cli is now a library crate), and a tag-triggered release workflow publishes all crates in dependency order.
  • Fuzzing & property tests: coverage-guided cargo-fuzz targets for every format parser (FLAC, MP3, MP4, Ogg, WAV), the byte-level primitives (Ogg page parsing, base64 windowing, VorbisComment), and the serve path — the latter drives the full synthesis pipeline over hostile DB rows and binary tags via a fuzzing-gated Db::with_raw_conn. Plus proptest invariants — panic-freedom, the byte-identical audio guarantee, and tag round-trip — an end-to-end read-fidelity property, and a mutagen interop test asserting an independent reader sees the tags we synthesize.

Changed

  • mount --db now requires an existing store. Mounting against a missing database path is rejected before any FUSE setup instead of silently creating and migrating an empty store, so a mistyped --db fails loudly rather than mounting an empty view. scan --db still creates the store if absent (#309).

Fixed

  • Scanner no longer drops files and embedded art silently: embedded cover art over MAX_ART_BYTES (and binary tags over MAX_BINARY_TAG_BYTES) were filtered out at ingest with no log line, so a track whose art exceeded the cap appeared to simply have none — indistinguishable from a scan bug. The drop is now logged (RUST_LOG=warn). Likewise, a supported-extension file that fails to parse or errors mid-probe was counted failed with the underlying error discarded; the reason is now logged. Note: oversized art in .m4a/.m4b files is dropped earlier, inside the format layer, and is not yet logged (#284, #343).
  • Lidarr custom-script env var casing: Lidarr stores custom-script environment variables in a .NET StringDictionary, which lowercases every key, so a Linux script actually receives lidarr_sourcepath / lidarr_eventtype rather than the PascalCase names Lidarr's docs list. The integration read the PascalCase names, so with a real Lidarr every import failed and every event parsed as unsupported. Lidarr env vars are now resolved case-insensitively. Found by the issue #141 real-instance smoke run.
  • VorbisComment parse OOM (DoS): a crafted comment block declaring a huge entry count made Vec::with_capacity attempt a multi-gigabyte allocation; the pre-allocation is now bounded by the readable byte count. Found by the new vorbiscomment fuzz target.
  • MP4 box-bounds integer overflow: an untrusted 64-bit extended box size made the box-bounds check (pos + total) overflow usize — a panic in debug and a silent wrap in release that accepted a bogus box length. The addition is now checked. Found by the mp4 fuzz target.
  • ID3v2 parsing unbounded allocation (DoS): the id3 crate eagerly allocates a frame's declared size (ID3v2.3 frame sizes are plain 32-bit, up to 4 GiB), so a crafted tag could exhaust memory at scan time — via an MP3 or a WAV embedded id3 chunk. Parsing is now gated on validated ID3v2 frame bounds and an ID3v2 tag at offset 0 (the id3 reader scans forward). Found by the mp3 and wav fuzz targets.
  • Scan counters now match their documented contract: musefs scan reports every non-audio file (any unsupported or missing extension — .jpg, .cue, .log, .nfo, cover art, etc.) as skipped, and supported-extension files that fail to parse (e.g. a corrupt .flac) as failed. Previously malformed files were miscounted as skipped and unsupported files were not counted at all, so expect skipped to be larger than before on a real library (#301).
  • Symlink scans no longer double-count: with --follow-symlinks, a file reached via both its real path and a symlink is ingested and counted once instead of inflating scanned; multiple hardlinks to the same inode are likewise collapsed to a single track (#302).
  • Stable inodes on case-insensitive mounts: the inode allocator is now keyed on the case-folded path in case-insensitive mode, so an unrelated deletion that flips a merged directory's display casing no longer reassigns a survivor's inode (#305).
  • Lidarr autoscan now honors the scan timeout: an import/release-triggered autoscan applies the shared 120s scan timeout, matching the beets and Picard integrations, so a wedged musefs scan fails with a controlled timeout instead of blocking the custom-script process indefinitely (#312).
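The Lidarr env var fix in the list above comes down to a case-insensitive lookup. A minimal sketch of that idea follows; the helper name is hypothetical and this is not the integration's actual code. The lowercased key is the one the text says a Linux script receives, and the uppercase spelling in the call is just an arbitrary different casing.

```python
import os
from typing import Mapping, Optional


def get_env_ci(name: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Look up an environment variable ignoring the case of its name.
    An exact match wins; otherwise the first case-insensitive match is used."""
    if name in environ:
        return environ[name]
    wanted = name.lower()
    for key, value in environ.items():
        if key.lower() == wanted:
            return value
    return None


env = {"lidarr_sourcepath": "/music/a.flac", "lidarr_eventtype": "Download"}
print(get_env_ci("LIDARR_SOURCEPATH", env))  # /music/a.flac
```

Preferring the exact match first keeps behavior unchanged on any platform that does preserve the documented casing.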

0.2.0 - 2026-05-27

First public release.

Added

  • Formats: synthesis for M4A/M4B (MP4), Ogg (Opus, Vorbis, FLAC-in-Ogg), and WAV, alongside the existing FLAC and MP3 — metadata generated on the fly from the SQLite store and spliced in front of byte-identical backing audio.
  • Arbitrary tag support: a single canonical tag vocabulary maps common fields to each format's native slot (ID3 frame / MP4 atom / Vorbis field); any other tag round-trips through the format's extension slot (ID3 TXXX, MP4 ---- freeform, raw Vorbis field). User-defined key casing is preserved.
  • beets plugin (contrib/beets/): syncs beets' canonical tags and cover art into the store keyed by each file's real path, with no remount and no audio rewrite.
  • Performance, concurrency & caching pass: worker-pool offload of blocking reads, lock-free virtual-tree swap, per-handle I/O, a bounded LRU header-layout cache, debounced single-flighted refresh with stable inodes, kernel/mount tuning flags, bounded-memory MP4 resolves, and opt-in --keep-cache with auto-invalidation.

Notes

  • Read-only mount; tag edits happen out-of-band against the SQLite store and are picked up automatically (PRAGMA data_version polling). See the README Supported formats section and the per-format docs for round-trip limitations.
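PRAGMA data_version is a stock SQLite feature: the value a connection sees changes whenever another connection commits a change. The sketch below shows that mechanism in isolation, using a throwaway database and a made-up table; how musefs schedules its polling is not covered here.

```python
import os
import sqlite3
import tempfile


def data_version(conn: sqlite3.Connection) -> int:
    """Value of PRAGMA data_version as seen by this connection."""
    return conn.execute("PRAGMA data_version").fetchone()[0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store.db")  # hypothetical store path
    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE demo (key TEXT, value TEXT)")
    writer.commit()

    watcher = sqlite3.connect(path)
    before = data_version(watcher)

    # An out-of-band edit from a different connection.
    writer.execute("INSERT INTO demo VALUES ('artist', 'Example')")
    writer.commit()

    after = data_version(watcher)
    changed = after != before                   # another connection committed
    unchanged = data_version(watcher) == after  # nothing committed since

    watcher.close()
    writer.close()

print(changed, unchanged)  # True True
```

Polling this one integer is cheap, so a watcher can check it frequently and rebuild its view only when the value moves.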

0.1.0

  • Initial MVP (FLAC and MP3 synthesis, virtual tree with beets-style templates, synthesis / structure-only mount modes, auto-refresh, scan / scan --revalidate). Never published publicly; superseded by 0.2.0.

Security

Security Policy

Supported versions

Security fixes target the latest release (see CHANGELOG.md); there are no maintained backport branches.

Reporting a vulnerability

Please report vulnerabilities privately via GitHub's security advisory form: github.com/Sohex/musefs/security/advisories/new. Do not open a public issue for an undisclosed vulnerability.

You can expect an acknowledgment within a few days. Confirmed issues are fixed as a priority, the fix is noted in the changelog, and you will be credited in the advisory unless you prefer otherwise.

What counts

musefs's primary threat surface is parsing untrusted media files: the scanner probes arbitrary bytes at scan time, and the serve path re-parses file fronts at resolve/read time. Anything a crafted file can do beyond "fail to scan with a controlled error" is in scope — memory unsafety, panics reachable from file contents, unbounded allocation, and hangs. Parser denial-of-service findings are real vulnerabilities here, not mere robustness bugs: several (a VorbisComment pre-allocation OOM, an MP4 box-bounds overflow, an ID3v2 allocation bomb) have been found by the project's fuzz targets and fixed — see CHANGELOG.md. Those fuzz and property suites run continuously (CONTRIBUTING.md); a fuzz reproducer is the ideal report attachment.

Also in scope: anything that lets a crafted database (the mount trusts its --db only as far as the documented contract) or a hostile local writer violate the read-only guarantee on backing files.