FLAC Vs. WAV Vs. AIFF
FLAC, WAV, and AIFF are all lossless audio formats that promise high-fidelity audio but result in larger file sizes. Unlike three decades ago, this trade-off is ideal for many people and is unavoidable for certain tasks. So which format should you use?
FLAC is compressed, while WAV and AIFF are not, but there is no difference in quality. WAV and AIFF are similar container formats for various kinds of audio data but typically contain raw audio samples. FLAC’s encoding is more advanced, and the format is designed specifically for audio.
FLAC is a more complex format that might suit your specific needs better. However, WAV and AIFF still hold their own in particular areas of the industry, and at other times you will need to use ALAC instead. Unpacking the history and the technical differences will give you what you need to pick the right tool for the job.
The Essential Distinction Between FLAC, WAV, And AIFF
For most practical purposes, the difference will be that WAV and AIFF take up more space than FLAC. FLAC is also not as widely supported, but support for each format varies considerably. However, you’ll want to take much more into account when choosing a lossless audio format.
Free Lossless Audio Codec (FLAC) is a codec: it encodes and decodes sound information in a specific way. By default, FLAC-encoded data is stored in its own file format, known as native FLAC. However, you can also find FLAC data in more general-purpose container formats, such as Matroska or Core Audio Format.
Waveform Audio File Format (WAVE or WAV) and Audio Interchange File Format (AIFF) are container formats for other audio data. WAV and AIFF are not codecs. For example, they can contain various forms of uncompressed PCM.
However, WAV and AIFF most often contain LPCM – for now, think of LPCM as a list of raw audio samples that represent the sound wave.
From now on, we’ll continue to use FLAC to refer to both the codec and file format but will make the distinction between WAV, AIFF, and (L)PCM.
FLAC, WAV, And AIFF’s Place In The Industry
FLAC, WAV, and AIFF came about at different times, each meeting the industry’s needs and capabilities at its inception. So understanding the formats’ history gives necessary insight into why you might want to choose one over the other.
The History of FLAC, WAV, and AIFF
Our story begins in 1985 when Electronic Arts – a video game publisher and developer – designed the Interchange File Format (IFF) in conjunction with Commodore. IFF would help identify and manage digital files and assets, particularly on the Amiga platform, where the format was heavily used.
IFF is designed around splitting data into chunks with 4-character tags to identify the kind of data. As an aside, chunks could have similar sub-chunks with their own labels. The tagging system was very useful in determining the type of data in a file and how it should be used and interpreted.
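The chunk layout described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the common IFF convention of a 4-character tag, a 32-bit length, the payload, and padding to an even number of bytes. The helper names `make_chunk` and `iter_chunks` and the sample payloads are made up for this example, and nested sub-chunks are not handled.

```python
import struct


def make_chunk(tag: str, payload: bytes, big_endian: bool = True) -> bytes:
    """Build one chunk: 4-character tag, 32-bit length, payload, even padding."""
    length = struct.pack(">I" if big_endian else "<I", len(payload))
    pad = b"\x00" if len(payload) % 2 else b""
    return tag.encode("ascii") + length + payload + pad


def iter_chunks(data: bytes, big_endian: bool = True):
    """Yield (tag, payload) pairs from a flat run of chunks."""
    fmt = ">I" if big_endian else "<I"
    pos = 0
    while pos + 8 <= len(data):
        tag = data[pos:pos + 4].decode("ascii")
        (size,) = struct.unpack_from(fmt, data, pos + 4)
        yield tag, data[pos + 8:pos + 8 + size]
        # Skip the payload plus the padding byte that follows odd-sized payloads.
        pos += 8 + size + (size & 1)


# Illustrative data only: two chunks with made-up contents.
blob = make_chunk("NAME", b"demo!") + make_chunk("BODY", b"\x01\x02\x03\x04")
chunks = list(iter_chunks(blob))
print(chunks)
```

IFF and AIFF store the length big-endian, while RIFF stores it little-endian, which is why the sketch takes a `big_endian` flag.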
In 1988, Apple developed AIFF (the Macintosh II had been released the year before, in 1987). It was based on IFF but intended for uncompressed, raw audio. The related AIFF-C/AIFC format is similar but supports rudimentary compression of the data, as well as more PCM encoding formats.
In 1991, Microsoft and IBM developed their own iteration of the IFF format: RIFF, the Resource Interchange File Format. RIFF was used as a basis for WAVE (and other media formats like AVI). WAVE became commonly known as WAV due to its .wav file extension. WAV supports more than just uncompressed PCM; it can also hold compressed audio, such as MP3.
Unlike the many variations on the basic IFF format, FLAC was novel. In 2000, Josh Coalson started to develop what would become FLAC, which was released in 2001. The Xiph.Org Foundation later brought the format under its wing and has overseen its development across all major platforms since.
Compatibility Of FLAC Vs. WAV Vs. AIFF
The compatibility of FLAC, WAV, and AIFF is a direct consequence of their origins. The older WAV and AIFF formats are more prevalent than FLAC, although all three are supported on all modern platforms. AIFF is better supported on Apple devices, and WAV is better supported on Windows.
Of note is that Apple developed its own compressed, lossless format called Apple Lossless, or ALAC. FLAC is not supported by iTunes or the Music app – Apple opted to only allow the use of ALAC.
FLAC support has since been introduced to iOS, and third-party software that plays FLAC (like the VLC Media Player) is readily available. Even so, ALAC is still more convenient for users of the Apple ecosystem.
The technical differences between FLAC and ALAC are irrelevant for most users. However, ALAC is reportedly about four times more computationally expensive to decode and may use more power during playback as a result. Usually, your phone’s screen is a far bigger power hog, though, so don’t worry about it too much.
Understanding PCM, The Core Of WAV And AIFF
Pulse-Code Modulation, abbreviated as PCM, is the primary method used to represent sampled audio digitally. PCM consists of amplitude samples taken of a signal at regular intervals. In this case, we’re interested in the amplitude of a sound wave over time.
WAV and AIFF typically contain PCM data. However, not all PCM data is the same:
- Sample rate: the number of samples taken per second
- Sample depth: the number of bits of data per sample
- Quantization levels: the distribution of representable amplitudes – see below
Let’s unpack quantization. Digital samples are discrete. In contrast, analog signals are continuous, so we round off (quantize) each measurement of the signal amplitude. We can be a bit creative with how we round off, though. For instance, we could extend the dynamic range at the cost of accuracy.
In order to reconstruct the original sound, we interpolate (move between) the sample data. While we won’t get the original data back, if we use a high enough sample rate and depth, the difference should be negligible. This is partly why not all WAV or FLAC files are the same quality, even if both formats are “lossless” – the sample format matters.
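To make the effect of sample depth concrete, here is a small sketch that samples a sine wave, rounds each sample to a signed integer level, and measures the worst rounding error. The 440 Hz test tone, the function names, and the simple linear scaling are assumptions chosen for illustration, not part of any format.

```python
import math


def quantize(x: float, bits: int) -> int:
    """Round an amplitude in [-1.0, 1.0] to the nearest signed integer level."""
    max_level = 2 ** (bits - 1) - 1
    return round(x * max_level)


def worst_error(bits: int, freq_hz: float = 440.0,
                sample_rate: int = 44100, count: int = 1000) -> float:
    """Largest gap between the true amplitude and its quantized value."""
    max_level = 2 ** (bits - 1) - 1
    worst = 0.0
    for i in range(count):
        x = math.sin(2 * math.pi * freq_hz * i / sample_rate)
        worst = max(worst, abs(x - quantize(x, bits) / max_level))
    return worst


error_8 = worst_error(8)
error_16 = worst_error(16)
print(error_8, error_16)
```

With linear rounding, the error can never exceed half a quantization step, so each extra bit of depth halves the worst case.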
Crash Course On Sound And Hearing
Sound is a wave, and waves are disturbances in a medium (like in the air or through the wall between you and your neighbor). These disturbances repeat at a given frequency, which we hear as pitch. The size of the disturbance is the amplitude, which we hear as volume.
Human hearing is confined to about 20-20,000 Hz, where Hertz is the number of vibrations per second. To represent a 20 kHz sound without a form of distortion called “aliasing,” we need a sample rate of at least double that frequency; this minimum is called the Nyquist rate. In practice, we first filter out frequencies above 20 kHz, then sample at around 40-50 kHz. This way, we’ll have good coverage of the range of human hearing.
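Aliasing is easier to believe once you see it. The sketch below samples two cosine tones at 44.1 kHz: one at 30 kHz, which is above half the sample rate, and one at 14.1 kHz. The helper names are made up for this example, and the tones are idealized (no filtering, no quantization).

```python
import math


def sample_cosine(freq_hz: float, sample_rate: int, count: int) -> list:
    """Take `count` samples of a cosine tone at the given sample rate."""
    return [math.cos(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]


def alias_frequency(freq_hz: float, sample_rate: int) -> float:
    """Frequency at or below half the sample rate that yields the same samples."""
    folded = freq_hz % sample_rate
    return min(folded, sample_rate - folded)


too_high = sample_cosine(30000, 44100, 100)
alias = sample_cosine(alias_frequency(30000, 44100), 44100, 100)

# The two sample lists match to within floating-point error.
largest_gap = max(abs(a - b) for a, b in zip(too_high, alias))
print(alias_frequency(30000, 44100), largest_gap)
```

Once sampled, the 30 kHz tone cannot be told apart from a 14.1 kHz one, which is why the high frequencies have to be filtered out before sampling rather than after.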
Regarding volume, 8 bits per sample only gets us 256 different quantization levels, which is very low. Telephones try to work with this using a non-linear distribution (meaning 2 is not double the level of 1), but it still sounds poor. 16 bits instead gives us a whopping 65,536 quantization levels, which is good enough with a linear distribution.
We also have two ears. With this shocking revelation in mind, it’ll seem reasonable that CD-quality audio is just 2 tracks of linear PCM (LPCM) with a sample rate of 44.1 kHz and a sample depth of 16 bits. This can be called stereo 16b 44.1kHz LPCM.
The WAV, The AIFF, And The Raw File Formats
We can create a simple CD-quality file format with a rough idea of all of this in mind. We’ll take two audio tracks – “left” and “right” – with the correct sample rate and depth, and write the first sample of left to the file, then the first of right, the second of the left track, the second of right, and so on until we’re done. We’ve more-or-less created the raw audio format.
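As a rough sketch of that raw layout, the following packs two lists of 16-bit samples as left, right, left, right, and unpacks them again. Little-endian byte order is an assumption here (it is what WAV typically uses, while AIFF is big-endian), and the function names are illustrative.

```python
import struct


def interleave_to_raw(left: list, right: list) -> bytes:
    """Pack two equal-length lists of 16-bit samples as L0 R0 L1 R1 ..."""
    if len(left) != len(right):
        raise ValueError("both channels need the same number of samples")
    raw = bytearray()
    for l_sample, r_sample in zip(left, right):
        raw += struct.pack("<hh", l_sample, r_sample)  # little-endian int16 pair
    return bytes(raw)


def deinterleave_from_raw(raw: bytes) -> tuple:
    """Split interleaved 16-bit stereo data back into two sample lists."""
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return list(values[0::2]), list(values[1::2])


raw = interleave_to_raw([1, 2, 3], [-1, -2, -3])
print(raw.hex())
print(deinterleave_from_raw(raw))
```

Nothing in these bytes says how many channels there are, how many bits each sample uses, or how fast to play them back.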
However, we don’t usually use raw audio, as media players won’t know how to read it. Instead, we must tell the decoder that we’re talking about interleaved stereo 16b 44.1kHz LPCM. So let’s add that information about the audio data to our file: this is called metadata. We’ll begin the file with the metadata so the media player knows what’s going on.
It seems reasonable, then, to break up our file into multiple chunks and tag the first chunk as audio metadata and the second chunk as audio data. Now, this is starting to sound a lot like the IFF-derived formats. And while oversimplified, this is the essence of what WAV and AIFF really are.
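Python’s standard `wave` module can be used to see this structure in a real file. The sketch below wraps a few raw frames in a WAV container in memory and then reads the tags and format fields back out of the first bytes. The fixed offsets in `describe_header` assume the simple layout that `wave` writes (a `fmt ` chunk directly after the RIFF header); files from other tools may carry extra chunks, so treat this as an illustration rather than a general parser.

```python
import io
import struct
import wave


def build_wav(frames: bytes, channels: int = 2, sample_width: int = 2,
              sample_rate: int = 44100) -> bytes:
    """Wrap raw interleaved LPCM frames in a WAV container, in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buffer.getvalue()


def describe_header(wav_bytes: bytes) -> dict:
    """Read the tags and format fields, assuming 'fmt ' is the first chunk."""
    riff_tag, _riff_size, form = struct.unpack_from("<4sI4s", wav_bytes, 0)
    fmt_tag, _fmt_size = struct.unpack_from("<4sI", wav_bytes, 12)
    codec, channels, rate, _byte_rate, _block_align, bits = struct.unpack_from(
        "<HHIIHH", wav_bytes, 20)
    return {"container": riff_tag, "form": form, "first_chunk": fmt_tag,
            "codec": codec, "channels": channels,
            "sample_rate": rate, "bits_per_sample": bits}


# Two stereo frames of 16-bit samples (illustrative values).
frames = struct.pack("<4h", 100, -100, 200, -200)
wav_bytes = build_wav(frames)
header = describe_header(wav_bytes)
print(header)
```

The codec field of 1 marks plain PCM; the samples themselves sit in a second chunk tagged `data`.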
The Advantages And Disadvantages of WAV And AIFF
WAV and AIFF are typically adapted IFF containers around uncompressed PCM data, which makes them simple. Many audio engineers prefer to work with these formats, as there is no need to decode or re-encode the data whenever it is viewed or modified. It just needs to be read off the disk. However, the lack of compression means the files are often rather large.
For example, for CD-quality LPCM of a 5-minute track: 2 channels x 16 bits per sample x 44,100 samples per second x 300 seconds. This results in about 53 MB (roughly 50 MiB) of data, which is a lot. So much so that we invented formats like MP3 to crush this down to about a tenth of that size, at the expense of quality. However, we need not bite that particular bullet.
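The arithmetic is worth spelling out, since megabytes (MB, powers of ten) and mebibytes (MiB, powers of two) give noticeably different figures. A minimal sketch, with a function name chosen for this example and container overhead ignored:

```python
def lpcm_size_bytes(channels: int, bits_per_sample: int,
                    sample_rate: int, seconds: int) -> int:
    """Size of uncompressed LPCM audio data, ignoring any container header."""
    return channels * (bits_per_sample // 8) * sample_rate * seconds


size = lpcm_size_bytes(channels=2, bits_per_sample=16,
                       sample_rate=44100, seconds=5 * 60)
print(size)                           # 52920000 bytes
print(f"{size / 1000 ** 2:.2f} MB")   # 52.92 MB
print(f"{size / 1024 ** 2:.2f} MiB")  # 50.47 MiB
```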
Everything You Need To Know About FLAC
If you want perfect-fidelity audio while saving space, FLAC (and its Apple counterpart, ALAC) may be the best fit for your goals. While WAV and AIFF are straightforward but very large, FLAC sacrifices simplicity in the name of saving your bits.
What Makes FLAC Special?
FLAC is the de facto standard for lossless, compressed audio. Encoding the same recording in a FLAC file will yield a file size of around 50% to 60% of the size of the equivalent WAV or AIFF file. A reduction by half may seem rather unimpressive, so a point of comparison should help.
Take a WAV file of some music on your PC, and zip it into an archive. ZIP (or GZIP, if Linux is your bread and butter) uses the DEFLATE algorithm, a more generic lossless compression method.
DEFLATE is quite effective at compressing various file types, from text to programs to documents. However, audio samples are much harder to compress this way. A zipped WAV file will typically still be about 80-95% of the original file’s size, which leaves a lot to be desired.
FLAC instead uses a very special-purpose compression algorithm that leverages the nature of audio data to reduce file sizes by up to a factor of two while still decoding to the exact same output as a WAV or AIFF file.
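If you would like to measure the general-purpose baseline yourself, Python’s `zlib` module implements DEFLATE. The sketch below computes the compressed size as a fraction of the original for any block of bytes. The two inputs are synthetic extremes (pure silence and random noise) chosen only to show the range; real music falls somewhere in between, and you would need to feed in the sample data from your own files to get a meaningful figure.

```python
import os
import zlib


def deflate_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size divided by original size (lower is better)."""
    compressed = zlib.compress(data, level)
    # Lossless check: decompressing must give back the exact input.
    assert zlib.decompress(compressed) == data
    return len(compressed) / len(data)


silence = bytes(100_000)       # all-zero samples
noise = os.urandom(100_000)    # stand-in for incompressible data

silence_ratio = deflate_ratio(silence)
noise_ratio = deflate_ratio(noise)
print(silence_ratio, noise_ratio)
```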
The FLAC Encoding
While not as trivial as WAV or AIFF, a sufficient understanding of FLAC is attainable by us mere mortals, for the audio engineers and computer scientists have provided good documentation. So let’s break it down:
- Break up the input samples into blocks, potentially of variable sizes.
- Each block may contain multiple channels. Break the block into subblocks, one for each channel, but where subblocks are similar, encode them relative to each other.
- For each subblock, find a simple mathematical curve that fits the data well enough.
- Take the difference between each sample and the quantization of the curve. These are the residuals, and hopefully, they’ll be small.
- Use Rice coding to represent the small residuals with a small number of bits each.
Although a sound wave’s amplitude follows a complicated curve overall, short sections of it are usually smooth, so we can often find simple curves that fit them well. This allows us to encode our samples as tiny corrections given by the difference between the sample and the curve. After that, efficiently storing a list of small numbers is relatively easy.
After that, it’s a matter of arranging all the subblocks into the correct blocks and figuring out what goes where. This information is all present in the metadata of a native FLAC file.
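The steps above can be sketched in miniature. This is a simplified illustration, not FLAC’s actual bitstream: it uses the simplest possible “curve” (predict each sample as the previous one) and stores the second channel as the difference between left and right, which is one of the stereo modes FLAC offers. All function names are made up for this example, and blocks are assumed to be non-empty.

```python
def split_blocks(samples: list, block_size: int) -> list:
    """Cut a long list of samples into fixed-size blocks."""
    return [samples[i:i + block_size] for i in range(0, len(samples), block_size)]


def to_residuals(samples: list) -> list:
    """Keep the first sample, then store each sample minus the previous one."""
    return [samples[0]] + [samples[i] - samples[i - 1] for i in range(1, len(samples))]


def from_residuals(residuals: list) -> list:
    """Undo to_residuals by adding each correction to the running value."""
    samples = [residuals[0]]
    for correction in residuals[1:]:
        samples.append(samples[-1] + correction)
    return samples


def encode_stereo_block(left: list, right: list) -> tuple:
    """Encode left as-is and right relative to left (a 'side' channel)."""
    side = [l - r for l, r in zip(left, right)]
    return to_residuals(left), to_residuals(side)


def decode_stereo_block(left_res: list, side_res: list) -> tuple:
    left = from_residuals(left_res)
    side = from_residuals(side_res)
    return left, [l - s for l, s in zip(left, side)]


left = [100, 104, 109, 113, 116]
right = [98, 103, 107, 112, 114]
encoded = encode_stereo_block(left, right)
print(encoded)
print(decode_stereo_block(*encoded))
```

The encoded numbers are mostly single digits even though the samples are around 100, and decoding returns the input exactly.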
How Does Rice Encoding Work?
Rice encoding cleverly uses unary to express small numbers (especially zero) most efficiently. In unary encoding, we represent a number n as n ones followed by a zero:
| n | Binary Representation |
|---|---|
| 0 | 0 |
| 1 | 10 |
| 2 | 110 |
| 3 | 1110 |
And so on. For example, 0001000101110010 represents 0, 0, 0, 1, 0, 0, 1, 3, 0, 1. That’s 10 numbers in 16 bits. On the other hand, if we represented each number using 16 bits (as is the case with CD quality PCM), this would require 160 bits! This is why finding a good curve, and thus reducing our residuals, is so helpful.
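The unary scheme is short enough to write out in full. The sketch below uses strings of "0" and "1" characters instead of real bits to keep things readable; the function names are illustrative.

```python
def unary_encode(numbers: list) -> str:
    """Write each number n as n ones followed by a zero."""
    return "".join("1" * n + "0" for n in numbers)


def unary_decode(bits: str) -> list:
    """Count the ones before each zero to recover the numbers."""
    numbers = []
    run = 0
    for bit in bits:
        if bit == "1":
            run += 1
        else:
            numbers.append(run)
            run = 0
    return numbers


encoded = unary_encode([0, 0, 0, 1, 0, 0, 1, 3, 0, 1])
print(encoded)                # 0001000101110010
print(unary_decode(encoded))  # [0, 0, 0, 1, 0, 0, 1, 3, 0, 1]
```

The catch is that a value of n costs n + 1 bits, so unary alone quickly becomes wasteful as the numbers grow.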
Rice coding builds on top of unary, allowing slightly larger values to be represented more efficiently using a hybrid of binary and unary, but the details are less insightful.
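For the curious, here is one way the hybrid can look. With a parameter k, the low k bits of a number are written in plain binary and the rest is written in unary. Residuals can be negative, so this sketch first folds them onto the non-negative integers. It follows this article’s unary convention (ones followed by a zero) and is an illustration of the idea rather than FLAC’s exact bit layout; the function names and the choice of k = 2 are assumptions.

```python
def zigzag(value: int) -> int:
    """Fold signed values onto 0, 1, 2, ...: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * value if value >= 0 else -2 * value - 1


def unzigzag(folded: int) -> int:
    return folded // 2 if folded % 2 == 0 else -(folded + 1) // 2


def rice_encode(number: int, k: int) -> str:
    """Unary quotient, a terminating zero, then k bits of binary remainder."""
    quotient = number >> k
    remainder = number & ((1 << k) - 1)
    binary = format(remainder, "b").zfill(k) if k else ""
    return "1" * quotient + "0" + binary


def encode_residuals(residuals: list, k: int) -> str:
    return "".join(rice_encode(zigzag(r), k) for r in residuals)


def decode_residuals(bits: str, k: int, count: int) -> list:
    position, residuals = 0, []
    for _ in range(count):
        quotient = 0
        while bits[position] == "1":
            quotient += 1
            position += 1
        position += 1  # skip the zero that ends the unary part
        remainder = int(bits[position:position + k], 2) if k else 0
        position += k
        residuals.append(unzigzag((quotient << k) | remainder))
    return residuals


residuals = [0, -1, 3, -4, 2, 9]
bits = encode_residuals(residuals, k=2)
print(bits, len(bits))
print(decode_residuals(bits, k=2, count=len(residuals)))
```

A larger k suits larger residuals, since it shortens the unary part at the cost of a longer fixed part for every value.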
Choosing A Suitable Curve
FLAC employs four classes of predictors. The encoder ought to select the most efficient predictor per subblock, depending on the kind of audio data it needs to encode.
- Verbatim: No prediction is used, and the samples are listed as-is. This is useful when the audio data is random (no patterns, hard to compress).
- Constant: In cases where every sample in the subblock has the same value (such as “digital silence”), that value is stored just once, a form of run-length encoding.
- Fixed linear predictor: A set of curves that are computationally efficient to use and fixed (as defined by the specification) so the file doesn’t need to store much data.
- Linear predictor (up to 32nd order): A flexible curve, with coefficients stored in the file, allowing encoders to model the data more precisely. This can be much more efficient but is more difficult to compute.
This flexibility allows FLAC to compress almost any kind of data relatively efficiently. In particular, note how using the verbatim predictor allows FLAC to act like WAV or AIFF – this prevents exceptional cases where FLAC would become significantly less efficient.
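A toy version of that selection step might look like the following. It covers only the constant case and the fixed predictors of order 0 to 4, whose coefficients are set by the FLAC specification; the verbatim and general linear predictors are left out. The cost measure (sum of absolute values of everything that would need storing) is a crude stand-in chosen for this sketch, not what real encoders use, and the function names are illustrative.

```python
# Fixed predictor coefficients, applied to the most recent samples first.
FIXED_COEFFS = {
    0: [],
    1: [1],
    2: [2, -1],
    3: [3, -3, 1],
    4: [4, -6, 4, -1],
}


def fixed_residuals(samples: list, order: int) -> list:
    """Difference between each sample and its prediction from earlier samples."""
    coeffs = FIXED_COEFFS[order]
    residuals = []
    for i in range(order, len(samples)):
        prediction = sum(c * samples[i - 1 - j] for j, c in enumerate(coeffs))
        residuals.append(samples[i] - prediction)
    return residuals


def cost(samples: list, order: int) -> int:
    """Crude size estimate: warm-up samples plus residual magnitudes."""
    warm_up = samples[:order]  # stored as-is to start the predictor
    return (sum(abs(s) for s in warm_up)
            + sum(abs(r) for r in fixed_residuals(samples, order)))


def choose_predictor(samples: list) -> tuple:
    """Return ('constant', 0) or ('fixed', best_order) for one subblock."""
    if all(s == samples[0] for s in samples):
        return "constant", 0
    best_order = min(FIXED_COEFFS, key=lambda order: cost(samples, order))
    return "fixed", best_order


print(choose_predictor([7] * 16))                    # flat line
print(choose_predictor(list(range(0, 160, 10))))     # straight ramp
print(choose_predictor([i * i for i in range(16)]))  # parabola
```

A straight ramp is predicted perfectly by order 2 and a parabola by order 3, so their residuals are all zero and the sketch picks those orders.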
Converting Between FLAC, WAV, And AIFF
Converting between file formats (transcoding) is a familiar process for anyone who has run into compatibility issues with audio files. Unfortunately, it comes with the misconception that you will always lose quality during conversions, which is not true.
Lossless formats, by definition, do not sacrifice any information during encoding, so the original audio can be reconstructed perfectly. We can therefore encode and decode audio data repeatedly, using any lossless formats, with no loss in quality, provided the sample rate and sample depth are left unchanged along the way.
As long as you avoid lossy formats such as MP3, AAC, and Ogg Vorbis, you can store your files as FLAC and convert them to any other format as you see fit. If you keep the original FLAC file around, you will always have a perfect copy to listen to or convert from again.
Conclusion
While their fidelity is equally perfect, FLAC differs greatly from WAV and AIFF. Whether you’re outside of the Apple ecosystem or within, FLAC and ALAC, respectively, will be the most storage-friendly choices. Nonetheless, WAV and AIFF will still provide a simple, mature, and compatible option for those who may need it.