WAV
How musefs scans and synthesizes RIFF/WAVE files (.wav). WAV has no single
native tag standard, so musefs writes metadata twice: a broad-compatibility
LIST/INFO chunk and a full-fidelity embedded id3 chunk. For the
segment model these layouts plug into, see
the segment model. The ID3v2 tag inside
the id3 chunk is built by the same code as MP3's — MP3's
round-trip and lossy-edge rules apply to it wholesale.
What round-trips
- All text tags, via the embedded id3 chunk (full ID3v2.4, exactly as for
  MP3: canonical frames, TXXX extension slot, frame-id passthrough).
- The INFO subset, twice. Seven canonical keys also get a native LIST/INFO
  subchunk for ID3-unaware readers: title→INAM, artist→IART, album→IPRD,
  date→ICRD, genre→IGNR, comment→ICMT, tracknumber→ITRK.
- Binary ID3 frames and promoted tags (POPM → rating/playcount, MusicBrainz
  UFID → musicbrainz_trackid, opaque PRIV/GEOB/… byte-exact). Classification
  is identical to MP3; only the chunk extraction differs.
- Embedded pictures: APIC frames inside the id3 chunk, MIME + picture type +
  description preserved, image bytes streamed.
- Structural chunks: fmt (required) and fact (when present) are preserved
  from the original front.
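
The seven-key INFO vocabulary can be written down as a lookup table. This is a
minimal sketch, not the musefs source: the helper name info_fourcc is
hypothetical, and only the key-to-FourCC pairs come from the list above.

```rust
/// Hypothetical helper: map a canonical tag key to its LIST/INFO subchunk id.
/// Returns None for keys with no INFO counterpart; per the list above, those
/// keys are carried only by the id3 chunk.
fn info_fourcc(key: &str) -> Option<&'static [u8; 4]> {
    match key {
        "title" => Some(b"INAM"),
        "artist" => Some(b"IART"),
        "album" => Some(b"IPRD"),
        "date" => Some(b"ICRD"),
        "genre" => Some(b"IGNR"),
        "comment" => Some(b"ICMT"),
        "tracknumber" => Some(b"ITRK"),
        _ => None,
    }
}
```

A key outside the seven, for example composer, maps to None here and would be
written to the id3 chunk only.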
At scan time, tags are merged per field from both surfaces with id3 taking
precedence and INFO filling gaps; only chunk headers are walked — the
data payload is never read.
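
The per-field merge can be sketched as follows. This is an illustration under
assumptions, not the musefs implementation: the Tags alias and merge_tags are
hypothetical names, and each key is simplified to a single string value.

```rust
use std::collections::BTreeMap;

/// Hypothetical tag map: canonical key -> single value (a simplification).
type Tags = BTreeMap<String, String>;

/// Merge per field: every id3 value wins, and INFO only fills keys that
/// the id3 surface does not carry.
fn merge_tags(id3: &Tags, info: &Tags) -> Tags {
    let mut merged = id3.clone();
    for (key, value) in info {
        // or_insert_with leaves an existing id3 value untouched.
        merged.entry(key.clone()).or_insert_with(|| value.clone());
    }
    merged
}
```

With a title on both surfaces and a genre only in INFO, the merged map keeps
the id3 title and picks up the INFO genre.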
Lossy edges
- Non-structural chunks are dropped. The synthesized front carries only fmt,
  fact, the new LIST/INFO, and the new id3 chunk: cue points (cue),
  broadcast-wave metadata (bext), sampler loops (smpl), and any other chunk
  from the original front are not reproduced.
- The INFO chunk carries only the seven-field vocabulary above; readers that
  understand only INFO see just those fields. Everything still rides in the
  id3 chunk.
- All of MP3's ID3 lossy edges apply to the id3 chunk: ID3v2.4-only output,
  placeholder-language COMM/USLT reset to XXX, POPM owner dropped, ID3v1
  ignored, the OOM-guard skips (the authoritative list lives in MP3's lossy
  edges).
- Tags trailing a very large data payload are not seen. When the data payload
  pushes any LIST/INFO or id3 chunk beyond the scan probe ceiling (64 MiB),
  the file is still ingested (the data chunk header gives the audio bounds
  without reading the payload), but those trailing tag chunks are not read at
  scan time. Front-positioned metadata is unaffected.
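
A header-only walk with a probe ceiling might look like the sketch below. It
is not the musefs scanner: ChunkHeader, walk_headers and SCAN_PROBE_CEILING
are hypothetical names, the file is an in-memory slice for brevity (a real
scanner would seek instead), and the rule that a header starting at or past
the ceiling is not visited is an assumption about how the 64 MiB limit is
applied.

```rust
/// Hypothetical constant mirroring the 64 MiB scan probe ceiling.
const SCAN_PROBE_CEILING: u64 = 64 * 1024 * 1024;

/// One top-level chunk header: id, payload offset, declared payload size.
#[derive(Debug, PartialEq)]
struct ChunkHeader {
    id: [u8; 4],
    payload_offset: u64,
    size: u32,
}

/// Walk top-level chunk headers, reading only the 8 header bytes of each
/// chunk. Headers that start at or past `ceiling` are not visited.
fn walk_headers(file: &[u8], ceiling: u64) -> Vec<ChunkHeader> {
    let mut found = Vec::new();
    let mut offset: u64 = 12; // skip "RIFF", the form size, and "WAVE"
    while offset < ceiling {
        let start = offset as usize;
        // Stop when fewer than 8 header bytes remain.
        let Some(h) = file.get(start..start + 8) else { break };
        let size = u32::from_le_bytes([h[4], h[5], h[6], h[7]]);
        found.push(ChunkHeader {
            id: [h[0], h[1], h[2], h[3]],
            payload_offset: offset + 8,
            size,
        });
        // Chunks are word-aligned: an odd-sized payload has one pad byte.
        offset += 8 + u64::from(size) + u64::from(size & 1);
    }
    found
}
```

Because the walk jumps by the declared size, the data payload is skipped
rather than read; a LIST or id3 header that lands past the ceiling is simply
never reached, while the data header before it still yields the audio bounds.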
How synthesis works
wav::synthesize_layout (musefs-format/src/wav.rs) regenerates the entire
RIFF front, then serves the untouched payload:
offset 0
┌────────────────────────────────────────────────┐ ┐
│ █ RIFF/WAVE framing                   (Inline) │ │
│ █ fmt (+ fact), preserved             (Inline) │ │ regenerated
│ █ LIST/INFO chunk (7-field subset)    (Inline) │ │ RIFF front
│ █ id3 chunk: ID3v2.4 text frames      (Inline) │ │ (metadata
│ █ frame header + ▒ opaque body     (BinaryTag) │ │ written
│ █ APIC framing + ▒ image bytes      (ArtImage) │ │ twice)
├────────────────────────────────────────────────┤ ┘
│ ░ data chunk payload, verbatim  (BackingAudio) │
└────────────────────────────────────────────────┘
EOF      █ inline-generated   ▒ DB-streamed   ░ untouched backing
- Inline: RIFF/WAVE framing, the preserved fmt (and fact) chunks, the rebuilt
  LIST/INFO chunk, and the embedded id3 chunk's text frames. Every chunk
  length is known up front, so the RIFF size and each chunk size field are
  byte-exact, with no placeholder sizes.
- Inside the id3 chunk: APIC framing inline with ArtImage segments streaming
  image bytes, and BinaryTag segments streaming opaque ID3 frame bodies,
  exactly as in MP3 synthesis.
- BackingAudio: the original data chunk payload, served verbatim by
  positioned reads.
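
The size bookkeeping behind "no placeholder sizes" can be sketched as below.
This is an illustration, not the code in wav::synthesize_layout: the variant
names follow the segment kinds named above, but the fields and the helpers
chunk_len, riff_size and total_len are assumptions made for the example.

```rust
use std::convert::TryFrom;

/// Hypothetical segment model; the fields are assumptions for this sketch.
enum Segment {
    /// Bytes generated in memory (framing, headers, text frames).
    Inline(Vec<u8>),
    /// Opaque ID3 frame body streamed from the database.
    BinaryTag { len: u64 },
    /// Picture bytes streamed from the database.
    ArtImage { len: u64 },
    /// Byte range of the original file, served by positioned reads.
    BackingAudio { offset: u64, len: u64 },
}

impl Segment {
    fn len(&self) -> u64 {
        match self {
            Segment::Inline(bytes) => bytes.len() as u64,
            Segment::BinaryTag { len }
            | Segment::ArtImage { len }
            | Segment::BackingAudio { len, .. } => *len,
        }
    }
}

/// On-disk size of one chunk: 8-byte header, payload, pad byte if odd.
fn chunk_len(payload: u64) -> u64 {
    8 + payload + (payload & 1)
}

/// Form size for bytes 4..8: the "WAVE" tag plus every top-level chunk.
/// Returns None when the total does not fit the 32-bit field.
fn riff_size(payloads: &[u64]) -> Option<u32> {
    let chunks: u64 = payloads.iter().map(|&p| chunk_len(p)).sum();
    u32::try_from(4 + chunks).ok()
}

/// Total length of the synthesized file: the sum of all segment lengths.
fn total_len(layout: &[Segment]) -> u64 {
    layout.iter().map(Segment::len).sum()
}
```

Since streamed segments contribute a known length without their bytes being
loaded, the form size can be computed before any image or audio data is
touched, and the total file length always equals 8 plus that form size.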
RIFF form-size enforcement
Every RIFF/WAVE file declares a form size at bytes 4..8 (riff_size).
The form covers bytes 8 through 8 + riff_size and must encompass all
top-level chunks (fmt , data, LIST, id3 , …). musefs enforces
this at parse time:
- riff_wave_start parses the RIFF size and returns form_end = 8 + riff_size.
- locate_audio and locate_audio_at_ceiling reject any file where form_end
  exceeds the physical file or where the data chunk payload extends past
  form_end.
- Streaming or concatenated WAVs that write riff_size = 0 or 0xFFFFFFFF are
  rejected, but only incidentally: there is no explicit sentinel check.
  riff_size = 0 yields form_end = 8, which is smaller than any file carrying
  a data payload, and 0xFFFFFFFF yields a form_end larger than any real file;
  both fall foul of the bounds checks above. Detecting and honouring those
  sentinels explicitly is a deferred follow-up.
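
The two bounds checks, and why the sentinel values fail them, can be restated
as a small sketch. The functions form_end and bounds_ok are hypothetical
restatements of the rules above, not the signatures of riff_wave_start or
locate_audio.

```rust
/// form_end = 8 + riff_size, computed in u64 so that 0xFFFFFFFF cannot
/// overflow.
fn form_end(riff_size: u32) -> u64 {
    8 + u64::from(riff_size)
}

/// Accept a file only if the form fits inside the physical file and the
/// data payload fits inside the form.
fn bounds_ok(riff_size: u32, file_len: u64, data_offset: u64, data_size: u32) -> bool {
    let end = form_end(riff_size);
    let data_end = data_offset + u64::from(data_size);
    end <= file_len && data_end <= end
}
```

For a 62-byte file whose data payload sits at offset 44 with 5 bytes, a
riff_size of 54 passes. A riff_size of 0 fails because the payload ends past
form_end = 8, and 0xFFFFFFFF fails because form_end exceeds the file length.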
Quirks & invariants
- A file must have both a fmt chunk and a data chunk to scan; the declared
  data size must lie within the file.
- The ID3-in-WAV path inherits MP3's allocation-bomb guard
  (id3v2_alloc_safe): a crafted id3 chunk cannot OOM the scanner. This exact
  vector was found by the wav fuzz target.
- Byte-identical audio and front re-parseability are asserted by
  musefs-format/tests/proptest_wav.rs and the mutagen interop suite
  (musefs-core/tests/interop_emit.rs).