The audio file nobody can open
Sound sends a polyphonic BWF with 24 channels. Your DAW imports two channels of silence.
The production sound mixer delivers a hard drive at wrap. On it: polyphonic Broadcast Wave files, 24 channels each, meticulously recorded across boom mics, lavalieres, plant mics, and ambience rigs. Every channel is labelled. Every take has timecode. The metadata is clean.
The assistant editor drags a file into their DAW. Two channels of silence.
Not because the file is broken. Not because the recording failed. Because the DAW saw 24 channels, assumed stereo, took the first two, and those happened to be empty ISOs waiting for a different mixer’s configuration. The other 22 channels of perfectly good production audio are right there in the file. The tool just didn’t look for them.
What “polyphonic” means in practice
A polyphonic BWF is a single WAV file containing multiple channels of audio. On a feature film, a Sound Devices 970 or Cantar X3 might record 24 or 32 tracks simultaneously: booms, body mics, plant mics, a stereo mix, and sometimes ambisonic captures from a spatial microphone.
This is different from multi-mono, where each channel gets its own file. Both approaches exist in production. Both are valid. The industry never standardised on one, so every downstream tool has to handle both. Most tools handle multi-mono fine because a mono WAV is hard to misinterpret. The problems start with polyphonic files.
In a polyphonic BWF, the fmt chunk tells you how many channels exist, the sample rate, and the sample format. That’s it. It doesn’t tell you what each channel is. Is channel 1 a boom? A lav? The left side of a stereo mix? A height channel from an Ambisonics array? The fmt chunk doesn’t know and doesn’t care. It just knows there are 24 of them.
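To make that concrete, here is a minimal Python sketch that walks the chunks of a WAV file held in memory and decodes the fixed part of fmt. The names iter_chunks and parse_fmt are invented for this example, and the sketch assumes a plain RIFF file; RF64/BW64 files use 64-bit sizes and need extra handling.

```python
import struct

def iter_chunks(data: bytes):
    """Yield (chunk_id, payload) for each top-level chunk of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a plain RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id, data[pos + 8 : pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunks are padded to an even length

def parse_fmt(payload: bytes) -> dict:
    """Decode the 16 fixed bytes at the start of the fmt chunk."""
    tag, channels, rate, _byte_rate, block_align, bits = struct.unpack_from(
        "<HHIIHH", payload, 0
    )
    return {
        "format_tag": tag,
        "channels": channels,
        "sample_rate": rate,
        "block_align": block_align,
        "bits_per_sample": bits,
    }

# Usage: a hand-built, header-only 24-channel, 48 kHz, 24-bit file.
fmt_payload = struct.pack("<HHIIHH", 1, 24, 48000, 48000 * 72, 72, 24)
body = (
    b"WAVE"
    + b"fmt " + struct.pack("<I", len(fmt_payload)) + fmt_payload
    + b"data" + struct.pack("<I", 0)
)
wav = b"RIFF" + struct.pack("<I", len(body)) + body

chunks = dict(iter_chunks(wav))
info = parse_fmt(chunks[b"fmt "])
print(info["channels"], info["sample_rate"], info["bits_per_sample"])
```

Everything the dictionary holds is a number. Nothing in it says which of the 24 channels is the boom.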
Where the channel map lives
The actual channel identification lives in two places, both optional.
The iXML chunk carries a TRACK_LIST element where each track gets a name, an interleave index, and sometimes a function tag. This is the production sound metadata. When a mixer labels their channels on the recorder (“Boom”, “Lav Alice”, “Lav Bob”, “Plant SFX”), those names go into iXML. A well-configured recorder writes detailed track lists. A poorly configured one writes “Track 1” through “Track 24” and calls it a day.
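As a rough sketch of what reading that track list involves, the Python below parses an iXML payload with the standard library. The element names (TRACK_LIST, TRACK, INTERLEAVE_INDEX, NAME, FUNCTION) follow the iXML specification as commonly written by recorders; the function name parse_track_list and the sample payload are made up for illustration, and real files vary in which fields they fill in.

```python
import xml.etree.ElementTree as ET

def parse_track_list(ixml_payload: bytes) -> list:
    """Return one dict per <TRACK> in the iXML TRACK_LIST, in document order."""
    # Some writers pad the chunk with NULs or whitespace; strip before parsing.
    root = ET.fromstring(ixml_payload.strip(b"\x00 \t\r\n"))
    tracks = []
    for track in root.findall("./TRACK_LIST/TRACK"):
        index_text = (track.findtext("INTERLEAVE_INDEX") or "").strip()
        tracks.append({
            # None marks a missing or non-numeric index rather than guessing one.
            "interleave_index": int(index_text) if index_text.isdigit() else None,
            "name": (track.findtext("NAME") or "").strip(),
            "function": (track.findtext("FUNCTION") or "").strip(),
        })
    return tracks

# Hypothetical two-track payload, padded with NULs as some recorders do.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<BWFXML>
  <TRACK_LIST>
    <TRACK_COUNT>2</TRACK_COUNT>
    <TRACK>
      <CHANNEL_INDEX>1</CHANNEL_INDEX>
      <INTERLEAVE_INDEX>1</INTERLEAVE_INDEX>
      <NAME>Boom</NAME>
    </TRACK>
    <TRACK>
      <CHANNEL_INDEX>3</CHANNEL_INDEX>
      <INTERLEAVE_INDEX>2</INTERLEAVE_INDEX>
      <NAME>Lav Alice</NAME>
    </TRACK>
  </TRACK_LIST>
</BWFXML>\x00\x00"""

tracks = parse_track_list(SAMPLE)
for t in tracks:
    print(t["interleave_index"], t["name"])
```

The sketch keeps missing fields as empty strings or None on purpose, so a later step can tell "the mixer left this blank" apart from "the tool made something up".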
The channel mask in the fmt chunk extension (for WAVE_FORMAT_EXTENSIBLE files) maps channels to speaker positions using Microsoft’s channel assignment bitmask. Left, right, centre, LFE, and so on, up to 18 predefined positions. This was designed for surround-sound playback, not production audio. There’s no bitmask value for “boom mic” or “plant mic behind the door.” Production channels don’t map to speaker positions because they’re not meant for speakers. They’re meant for a mixing stage.
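The sketch below shows, under stated assumptions, how the mask is read and why it cannot describe a 24-channel production file. The position names mirror Microsoft's SPEAKER_* constants in bit order; the function names read_channel_mask and describe_channels are hypothetical, and the byte offset assumes the standard 40-byte WAVE_FORMAT_EXTENSIBLE fmt payload.

```python
import struct

WAVE_FORMAT_EXTENSIBLE = 0xFFFE

# Bit 0 through bit 17 of dwChannelMask, in order.
SPEAKER_POSITIONS = [
    "FRONT_LEFT", "FRONT_RIGHT", "FRONT_CENTER", "LOW_FREQUENCY",
    "BACK_LEFT", "BACK_RIGHT", "FRONT_LEFT_OF_CENTER", "FRONT_RIGHT_OF_CENTER",
    "BACK_CENTER", "SIDE_LEFT", "SIDE_RIGHT", "TOP_CENTER",
    "TOP_FRONT_LEFT", "TOP_FRONT_CENTER", "TOP_FRONT_RIGHT",
    "TOP_BACK_LEFT", "TOP_BACK_CENTER", "TOP_BACK_RIGHT",
]

def read_channel_mask(fmt_payload: bytes):
    """Return dwChannelMask, or None if the file is not WAVE_FORMAT_EXTENSIBLE."""
    tag = struct.unpack_from("<H", fmt_payload, 0)[0]
    if tag != WAVE_FORMAT_EXTENSIBLE or len(fmt_payload) < 40:
        return None
    # 16 fixed bytes, then cbSize (2) and wValidBitsPerSample (2), then the mask.
    return struct.unpack_from("<I", fmt_payload, 20)[0]

def describe_channels(channel_count: int, mask: int) -> list:
    """Assign set mask bits to channels in order; the rest stay unassigned."""
    positions = [
        name for bit, name in enumerate(SPEAKER_POSITIONS) if mask & (1 << bit)
    ]
    labels = positions[:channel_count]
    labels += ["unassigned"] * (channel_count - len(labels))
    return labels

# Usage: a 24-channel extensible header whose mask claims a 5.1 layout (0x3F).
ext_fmt = struct.pack(
    "<HHIIHHHHI16s",
    WAVE_FORMAT_EXTENSIBLE, 24, 48000, 48000 * 72, 72, 24,  # fixed fields
    22, 24, 0x3F, bytes(16),  # cbSize, valid bits, mask, SubFormat placeholder
)
mask = read_channel_mask(ext_fmt)
labels = describe_channels(24, mask)
print(labels[:6], labels.count("unassigned"))
```

Even with every bit set, 18 positions cannot cover 24 channels, so a tool that insists on speaker labels will always have channels it cannot place.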
This is the fundamental mismatch. The channel identification system baked into the WAV format assumes you’re describing a speaker layout. Production audio doesn’t have a speaker layout. It has a collection of microphones that were pointed at different things. The metadata formats that actually describe production channels (iXML, the ADM/BW64 standard for object-based audio) are bolt-on additions that many tools have never heard of.
What goes wrong
A DAW that doesn’t read iXML opens a 24-channel BWF and has to decide what it’s looking at. Common failure modes:
Stereo assumption. The tool assumes anything with more than one channel is stereo, takes channels 1 and 2, and ignores the rest. This is the most common failure and the most destructive, because it discards 22 channels of audio without warning. If channels 1 and 2 happen to be mix tracks, you get something that sounds plausible but isn’t the original recordings. If they happen to be empty ISOs, you get silence and spend twenty minutes wondering if the sound department recorded anything at all.
Channel mask misinterpretation. The tool reads the WAVE_FORMAT_EXTENSIBLE channel mask and tries to map 24 channels to a surround layout, even though the mask defines only 18 positions. Channels get assigned to LFE, rear surrounds, height speakers. The boom mic is now routed to the subwoofer output. The lav is playing from the right rear. Nothing is where it should be, and if you’re monitoring in stereo, half the channels vanish into outputs that don’t exist.
Partial import. The tool imports all 24 channels but puts them on a single track as an interleaved surround file. You can see the waveforms, but you can’t solo channel 7 or mute channel 12. The audio is all there and completely inaccessible for editorial work.
Ambisonics misidentification. Some files carry ambisonics flags in iXML, indicating the first four channels are a B-format spatial capture (W, X, Y, Z). A tool that sees the ambisonics flag might try to decode those channels to speaker feeds, applying rotation matrices and shelf filters to what is actually a boom microphone. The audio plays back but sounds like it’s been run through a broken reverb.
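The partial-import failure in particular has a simple remedy: split the interleaved data into one stream per channel, so each can sit on its own track. The Python below is a minimal sketch of that step for integer PCM. The function name deinterleave is invented here, and the approach favours clarity over speed; a real tool would stream from disk rather than hold the whole data chunk in memory.

```python
def deinterleave(pcm: bytes, channels: int, bytes_per_sample: int) -> list:
    """Split interleaved PCM into one bytes object per channel.

    Frames are laid out as ch1, ch2, ..., chN, ch1, ch2, ... so channel k
    starts at byte offset k * bytes_per_sample and repeats every frame.
    """
    if channels < 1 or bytes_per_sample < 1:
        raise ValueError("channels and bytes_per_sample must be positive")
    frame_size = channels * bytes_per_sample
    usable = len(pcm) - len(pcm) % frame_size  # ignore a trailing partial frame
    return [
        b"".join(
            pcm[i : i + bytes_per_sample]
            for i in range(ch * bytes_per_sample, usable, frame_size)
        )
        for ch in range(channels)
    ]

# Usage: two frames of three 16-bit channels, with readable fake sample bytes.
interleaved = b"A1B1C1A2B2C2"
mono_streams = deinterleave(interleaved, channels=3, bytes_per_sample=2)
print(mono_streams)
```

Each returned stream can then be written out as a mono file or placed on its own track, which is what makes soloing channel 7 or muting channel 12 possible.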
The silence problem
The worst outcome is silence, because silence looks like a failure to record rather than a failure to interpret. A picture editor who gets two channels of silence from a 24-channel poly file will assume the sound department didn’t deliver. They’ll call production. Production will check the drive, open the file in a tool that reads all channels, confirm the audio is there, and send the same file back. The editor will import it again with the same result.
This loop can repeat for days before someone identifies the actual problem. In a strict technical sense the DAW is not misreading the file: it’s reading the channels it knows how to find. It just doesn’t know there are others.
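One way to break the loop early is to measure every channel before anyone picks up the phone. The sketch below computes the peak sample value per channel for signed little-endian integer PCM (16, 24, or 32 bit); it does not handle 8-bit unsigned or floating-point files. The function name channel_peaks and the sample data are assumptions for illustration.

```python
import struct

def channel_peaks(pcm: bytes, channels: int, bytes_per_sample: int) -> list:
    """Return the peak absolute sample value for each channel."""
    frame_size = channels * bytes_per_sample
    usable = len(pcm) - len(pcm) % frame_size  # ignore a trailing partial frame
    peaks = [0] * channels
    for frame_start in range(0, usable, frame_size):
        for ch in range(channels):
            start = frame_start + ch * bytes_per_sample
            value = int.from_bytes(
                pcm[start : start + bytes_per_sample], "little", signed=True
            )
            peaks[ch] = max(peaks[ch], abs(value))
    return peaks

# Usage: two frames of four 16-bit channels; channels 1 and 2 are empty ISOs.
frames = struct.pack("<4h", 0, 0, 1000, -2000) + struct.pack("<4h", 0, 0, -500, 300)
peaks = channel_peaks(frames, channels=4, bytes_per_sample=2)
silent = [n for n, peak in enumerate(peaks, start=1) if peak == 0]
print(f"{len(silent)} of {len(peaks)} channels are silent: {silent}")
```

A report like "2 of 24 channels are silent" tells the editor the recording is fine and the import is the problem, which is a very different conversation from "the file is empty".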
Why tools don’t read the metadata
The obvious question: if iXML tells you what each channel is, why don’t all tools read it?
Because iXML is an optional extension with no enforcement mechanism. The spec was designed by production sound manufacturers (Aaton, Sound Devices, Zaxcom) for their own equipment. It’s well documented, widely used in professional production sound, and almost completely unknown outside that world. DAW developers who have never worked in film production don’t know iXML exists. Video editing tools that handle BWF for the timecode in the bext chunk have no reason to parse a second chunk of XML they’ve never encountered.
There’s also the problem of trust. Even when a tool does read iXML, the metadata might be wrong. Channel names from a recycled template. A track count in iXML that disagrees with the fmt chunk. Ambisonics flags on a file that contains straight mono ISOs because the mixer toggled the wrong setting. The metadata can lie, and a tool that trusts it without verification can produce results that are wrong in more creative ways than a tool that ignores it entirely.
What correct handling looks like
A tool that handles polyphonic BWF correctly does several things.
It reads the fmt chunk for channel count and sample format. It reads the bext chunk for timecode. Then it reads the iXML chunk for track names, functions, and layout information. If iXML is present and the track count matches fmt, it presents each channel with its label. If iXML is absent or contradictory, it presents all channels with generic labels and flags the discrepancy. It never silently drops channels. It never force-fits production audio into a surround speaker layout.
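The reconciliation step described above can be sketched in a few lines of Python. This is an illustration under assumptions, not a reference implementation: the function name label_channels is invented, the track dictionaries use hypothetical keys (interleave_index, name), and interleave indices are assumed to be 1-based.

```python
def label_channels(channel_count: int, tracks) -> tuple:
    """Return (labels, warnings). Always yields exactly channel_count labels."""
    generic = [f"Channel {n}" for n in range(1, channel_count + 1)]

    if not tracks:
        return generic, ["no iXML track list: showing generic labels"]

    if len(tracks) != channel_count:
        return generic, [
            f"iXML lists {len(tracks)} tracks but fmt declares "
            f"{channel_count} channels: showing generic labels"
        ]

    indices = sorted(
        t.get("interleave_index")
        for t in tracks
        if isinstance(t.get("interleave_index"), int)
    )
    if indices != list(range(1, channel_count + 1)):
        return generic, [
            "iXML interleave indices do not cover every channel exactly once: "
            "showing generic labels"
        ]

    labels = list(generic)
    for t in tracks:
        if t.get("name"):  # keep the generic label when the name is blank
            labels[t["interleave_index"] - 1] = t["name"]
    return labels, []

# Usage: consistent metadata, then a track count that contradicts fmt.
good = [
    {"interleave_index": 2, "name": "Lav Alice"},
    {"interleave_index": 1, "name": "Boom"},
]
print(label_channels(2, good))
print(label_channels(24, good))
```

Whatever the metadata says, the function returns one label per channel declared in fmt, so nothing downstream has an excuse to drop a channel.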
The key principle is: when the metadata is ambiguous, show everything and let the operator decide. Don’t guess. Don’t assume stereo. Don’t assume surround. Don’t assume anything about what a channel contains based on its position in the interleave.
This is not a hard engineering problem. Parsing iXML is straightforward XML reading. Comparing the iXML track count against the fmt channel count is a single integer comparison. Presenting 24 channels on 24 tracks instead of 2 is a layout decision, not a technical limitation. The tools that get this wrong aren’t facing difficult constraints. They’re making assumptions they never needed to make.
The pattern
This is a recurring theme in production audio: the information exists, in a well-documented format, in the file you already have. The file isn’t broken. The spec isn’t ambiguous. The recorder wrote everything correctly. The problem is that the tool on the receiving end has a narrower model of the world than the file demands.
Twenty-four channels of production audio is not exotic. It’s Tuesday on a feature film set. The formats that describe it have existed for over a decade. The failure isn’t in the recording or the format or the delivery. It’s in the assumption, baked into too many tools, that audio comes in stereo pairs and anything else is someone else’s problem.
It shouldn’t take a phone call to production to find out that the audio was there all along.