Digital audio quality
February 7th, 2006 | Tags: music | 9 Comments
Pliable has a mostly-reasonable rant about digital audio and audio quality. I’d like to raise a couple of objections, though. These thoughts were to be posted in a comment at the overgrown path, but have become overgrown in their own right.
There are two factors in the potential fidelity of PCM digital audio: sample rate and bit depth. This is only true for PCM audio (i.e., excluding SACD) and only true for uncompressed formats; more on MP3 and friends in a moment. The highest frequency that a digital signal can reproduce is given by the Nyquist-Shannon sampling theorem as Sample rate/2. (I have written about the sampling theorem and its blatant violation by popular culture here; this is such a pet peeve of mine that friends mention it as a joke in my presence.)
The bit depth, on the other hand, is what enables dynamic range. More bits means more discrete volume levels; CDs use 16 bits, meaning that there are 65,536 possible discrete dynamic levels for any sample. (If it sounds to you like reducing a continuous analog signal to one of a certain number of fixed amplitudes can introduce distortion, you’re right: read about dither.)
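To make those two factors concrete, here is a minimal sketch of the arithmetic. The function names are mine, invented for illustration, and the decibel figure uses the usual rule of thumb (full scale divided by one quantization step), which ignores dither and noise shaping.

```python
import math

def nyquist_frequency(sample_rate_hz):
    """Highest frequency a given sample rate can represent (sample rate / 2)."""
    return sample_rate_hz / 2

def quantization_levels(bit_depth):
    """Number of discrete amplitude levels available to each sample."""
    return 2 ** bit_depth

def approx_dynamic_range_db(bit_depth):
    """Ratio of full scale to a single quantization step, in decibels."""
    return 20 * math.log10(2 ** bit_depth)

print(nyquist_frequency(44100))               # 22050.0
print(quantization_levels(16))                # 65536
print(round(approx_dynamic_range_db(16), 1))  # 96.3
```

Each extra bit doubles the number of levels, which works out to roughly 6 dB of additional dynamic range per bit.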
When you multiply the bit depth by the sampling rate (and by the number of channels), you get the bit rate: the number of bits per second used to represent an analog signal. There is a reasonable relationship between bit rate and fidelity, for obvious reasons, and Pliable is right to lament the decrease in fidelity from the perfect analog signal to a CD to a low-bit-rate MP3. However, there are a few problems with Pliable’s argument:
- Vinyl is not as great as we remember it. Sure, records sound “warm,” but that’s just pleasing harmonic distortion. A good record can represent frequencies up to 18 kHz, which would only require a sampling rate of 36 kHz to reproduce in the digital domain. Furthermore, vinyl does not have a uniform frequency response (hence the RIAA EQ curve). Finally, the dynamic range of vinyl suffers at the quiet end because of the medium’s low signal-to-noise ratio and at the loud end because very loud sounds may throw the needle.
- We aren’t recording for dogs. Independently of whether or not we can actually hear such high harmonics as to make high sampling rates necessary, high-end microphones like the Neumann U87 can only capture up to 20 kHz. Of course, with bits as cheap as they are, there is no reason not to increase the bit rate as high as the source equipment allows, but it seems to me that bits are better spent on depth (dynamic range) than on sampling rate (frequency response), at least after a certain point.
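The numbers in play here are easy to check. This is a rough sketch that assumes two-channel (stereo) CD audio at 16 bits and 44.1 kHz; the helper names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels

def required_sample_rate(max_frequency_hz):
    """Minimum sample rate needed to represent a given top frequency."""
    return 2 * max_frequency_hz

cd_bps = pcm_bit_rate(44100, 16, 2)
print(cd_bps / 1000)                # 1411.2 (kbps)
print(round(cd_bps / 128000, 1))    # 11.0, the ratio of CD to a 128 kbps file
print(required_sample_rate(18000))  # 36000, enough for an 18 kHz record
```

The ratio only compares raw bit counts. It says nothing about what each format spends its bits on.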
The biggest problem with Pliable’s argument, however, is that it relies on the claim that a 128kbps MP3 file is only about 1/10 the bit rate of a 1411kbps CD. This is true, but meaningless because the two formats represent audio in different ways. On a CD, an analog signal is represented as a stream of sampled amplitudes, as in the figures below:
A continuous signal
The continuous signal, sampled 32 times with 19 possible amplitude levels
The continuous signal, sampled 32 times with 7 possible amplitude levels
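The figures can be approximated in a few lines of code. This sketch assumes the continuous signal is one period of a unit-amplitude sine and that the amplitude levels are evenly spaced between -1 and 1; both are my assumptions about how the figures were drawn.

```python
import math

def sample_sine(num_samples):
    """Sample one period of a unit-amplitude sine at evenly spaced times."""
    return [math.sin(2 * math.pi * n / num_samples) for n in range(num_samples)]

def quantize(samples, levels):
    """Round each sample in [-1, 1] to the nearest of `levels` evenly spaced amplitudes."""
    step = 2 / (levels - 1)
    return [round((s + 1) / step) * step - 1 for s in samples]

def max_error(original, quantized):
    """Largest absolute difference between the two sample lists."""
    return max(abs(a - b) for a, b in zip(original, quantized))

samples = sample_sine(32)
for levels in (19, 7):
    quantized = quantize(samples, levels)
    print(levels, "levels, worst-case error:", round(max_error(samples, quantized), 3))
```

The worst-case error can never exceed half the spacing between adjacent levels, which is why fewer levels means a coarser staircase.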
An MP3, on the other hand, represents audio data in the frequency domain, or as a time-varying spectrum. In the case of the continuous signal above, it could be represented perfectly in the frequency domain by a file that said “play a single period of a sine wave with frequency X, amplitude Y, and phase Z.” This file, it should be clear, would be substantially smaller than any digitally-sampled rendition of the sine, as well as more accurate.
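As a minimal sketch of that idea, a plain discrete Fourier transform (not the filter bank and transform that MP3 actually uses) turns the 32 samples of one sine period into a spectrum with only one meaningful component. The deliberately slow `dft` helper below is written out by hand so nothing beyond the standard library is needed.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: one complex value per frequency bin."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

n = 32
sine = [math.sin(2 * math.pi * t / n) for t in range(n)]
spectrum = dft(sine)

# Bins whose magnitude is more than floating-point noise.
significant = [k for k, c in enumerate(spectrum) if abs(c) > 1e-9]
print(significant)  # [1, 31]
```

Bin 31 is just the mirror image of bin 1, which always appears for real-valued signals, so a single frequency, amplitude, and phase describe all 32 samples.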
You can in fact represent any time-domain signal (that is, one that represents amplitudes varying over time, like the sine above) as a frequency-domain signal (that is, one that represents a sound as a series of frequency spectra). Obviously, the sine above is a special case (any recording that Pliable or I would be listening to would require a much more complex description). Furthermore, recording increasingly precise representations of continuous signals takes a great deal of space no matter whether you are operating in the time domain or the frequency domain. However, it is possible to “cheat” in the frequency domain and save space in ways that aren’t possible in the time domain.
- One can restrict the precision of the frequency analysis. This adversely affects sound quality, but it may be possible to reduce the size of a precise spectral representation of a sound a great deal before the effect is perceptible.
- Alternatively (this is the approach that MP3 uses) one can throw away “psychoacoustically insignificant” frequencies, saving space in the representation. This “lossy encoding” doesn’t always work all that well, since an algorithm (the psychoacoustic model) has to determine which frequencies aren’t important (and, for spectrally complex or “in-tune” music, the algorithm is often wrong). A 96kbps MP3 of a pop song might be adequate, but the Tallis Scholars at 96kbps is probably unlistenable.
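The second kind of cheating can be illustrated with a toy example. To be clear, this is not MP3's psychoacoustic model: it simply discards any frequency bin that is much weaker than the strongest one, and the test signal and the 5% threshold are arbitrary choices of mine.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def keep_strongest(spectrum, threshold):
    """Zero every bin weaker than `threshold` times the strongest bin."""
    peak = max(abs(c) for c in spectrum)
    return [c if abs(c) >= threshold * peak else 0 for c in spectrum]

n = 64
# A strong tone plus a second tone at 1% of its amplitude.
signal = [math.sin(2 * math.pi * 3 * t / n) + 0.01 * math.sin(2 * math.pi * 17 * t / n)
          for t in range(n)]

kept = keep_strongest(dft(signal), 0.05)
stored_bins = sum(1 for c in kept if c != 0)
rebuilt = idft(kept)
worst_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(stored_bins, "of", n, "bins kept; worst-case error", round(worst_error, 4))
```

Whether the discarded tone would have been audible is exactly the question a real psychoacoustic model has to answer, and a fixed threshold like this one would get it wrong often.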
Basically, my claim is this: digital audio necessarily discards parts of the original analog signal. However, it is not clear that there exists any reasonable way to reproduce an analog signal with complete fidelity (or what that would mean in the case of reproducing frequencies well above the range of human perception.) Compact discs and MP3s discard different things to save space, but direct comparisons between PCM bit rates (as in CD, DAT, etc.) and compressed bit rates (as in MP3, digital radio, AAC, etc.) are meaningless.
Finally, as long as the bulk of money is in the Antares-Autotune-riddled, equal-tempered, dynamic-range-free pop music world, there is no particular financial incentive for record companies to improve upon the lowly CD.
I’m currently listening to Dies Sind Die Heiligen Zehen Gebot BWV 678 from the album “Clavier-Ubung III” by Masaaki Suzuki
February 7th, 2006 at 03:52:38 PM (#)
Will, a beautifully thought through and presented response. Although we may differ on nuances we share a passion and concern for the future quality of recorded music, and that’s what counts.
I am delighted at the thoughtful and constructive responses that this article has generated. I guess my principal objective was to start some thought and debate, and that’s clearly happening.
Anyway, someone who is listening to Clavier-Ubung III is on my side. I trust you caught ‘Mortal defeat for the mob in Paris’ over at http://theovergrownpath.blogspot.com/2005/10/mortal-defeat-for-mob-in-paris.html ?
I also have a passion for Brahms’s last work, the Eleven Chorale Preludes for organ which have links to the Clavier-Ubung III. Francois Menissier’s recording of the Brahms on Edition Hortus has given me much pleasure. In the end 64Kbps or 2.8 Mbps is not the point, it is the music that matters.
February 7th, 2006 at 05:03:16 PM (#)
To be completely fair to Pliable, he did say that “sample rate does not have a linear relationship to sound quality”…
And like Pliable, I too appreciated your clear presentation of the ‘facts’. However, I can’t quite agree that any comparison between PCM and MP3 is completely meaningless. My points are
1. Notwithstanding the re-distribution of data, MP3 discards MOST of the available information.
2. Who can say with confidence, what is, and what is not audible, what is worthy and what unworthy to be retained?
3. If we cease to strive to improve audio – and to make improvements available to the public – what system will remain to improve the quality of sound that’s available for the distribution of recorded music? This is of particular relevance and importance for classical music for reasons that I begin to mention in my blog.
February 7th, 2006 at 05:35:02 PM (#)
Thanks, Pliable. I do get all of your posts via my syndication reader.
Guthry, I don’t think we disagree substantially. Perhaps “meaningless” struck you as a bit strong, but it really is comparing apples and oranges to compare PCM data rates to compressed data rates. At any rate, I am not particularly interested in listening to real music in compressed form (although if I am at my office and using headphones, a high-quality AAC file is not substantially worse than a CD.)
With regard to your point #1, I think you’re ignoring the differences in representation; a spectral representation requires widely different amounts of data to render different sorts of source material, while a time-domain representation (like PCM) requires r samples per second regardless of the harmonic content of sounds. (As an admittedly pathological example, consider ten seconds of silence, which would require 10r samples in PCM but only a few [or even one!] low-resolution spectra in a spectral representation.)
With regard to your point #2, I readily concede that the psychoacoustic model is the absolute weak point of the whole process. Early models, such as the Fraunhofer one, were barely suitable for contemporary commercial music. It remains an active area of research, and some lossy compression schemes can do quite well at very low bit rates (e.g. 1/3 of CD bit rates.) Frequency-domain representations can be more efficient. (Indeed, since you can represent 24-bit audio in some lossy schemes, they might be even better than a CD!)
Finally, I completely agree with your point #3. However, I fear that “ceasing to strive to improve audio” has in large part already happened.
February 22nd, 2006 at 03:43:05 PM (#)
Will
presuming you’re still following this tangent….
Do you have any knowledge of iTunes? I’d really like to know the original quality of their lossless files. Are they 16bit PCM compressed in lossless format, or [more likely] are they already 320kb compressed files?
any ideas
GT
February 22nd, 2006 at 04:14:11 PM (#)
GT,
In the iTunes application, you can choose to import in a lossless format, called (appropriately) “Apple Lossless” (ALAC). This process converts from whatever source material you have (e.g. a 16/44.1 CD, a 128kbps mp3, or anything — I have converted 24 bit/88.2khz files!) into a compressed file that reproduces the original exactly. (I don’t know much about the mathematics of lossless audio compression, but I believe it relies on the autocorrelation of a signal — that is, how sample n is related to sample n – k.)
The ALAC files produce sound identical to PCM at about 60% of the bit rate and can be played on an iPod through those (quite hi-fi, indeed) earbuds.
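As a rough illustration of how the relationship between sample n and sample n – k can be exploited, here is a sketch that predicts each sample from the one before it and stores only the difference. This is the simplest possible predictor and is not a description of what ALAC actually does; the 440 Hz test tone and the function names are hypothetical.

```python
import math

def encode(samples):
    """Keep the first sample, then store each sample's difference from its predecessor."""
    residuals = [samples[0]]
    for n in range(1, len(samples)):
        residuals.append(samples[n] - samples[n - 1])
    return residuals

def decode(residuals):
    """Rebuild the original samples by accumulating the stored differences."""
    samples = [residuals[0]]
    for r in residuals[1:]:
        samples.append(samples[-1] + r)
    return samples

# Hypothetical test signal: a 440 Hz tone as integer samples at 44.1 kHz.
pcm = [round(20000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(1000)]
residuals = encode(pcm)

print(decode(residuals) == pcm)  # True: nothing is lost
print(max(abs(s) for s in pcm), max(abs(r) for r in residuals[1:]))
```

Because neighbouring samples are so similar, the differences span a much smaller range than the samples themselves and so need fewer bits to store, yet the original comes back exactly.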
The music available through the iTunes music store, though, is compressed using the lossy AAC codec. (To my ear, AAC is superior to MP3 at any given bit rate; an AAC file may even be better than an MP3 with a higher bit rate.) Most of the files are 128kbps, but some isolated files are encoded at a higher rate.
The fidelity of the source material for these compressed files depends on the source of the music; individual record labels encode and submit their tracks to Apple. I’d imagine that most record labels make their AAC files from CDs or from digital masters, but some places that cater to independent musicians (e.g. the Tunecore service) will deal in compressed “masters.”
best,
wb
February 23rd, 2006 at 01:22:22 PM (#)
Many thanks for your help Will – it’s sort of what I suspected. It seems rather disingenuous to me for iTunes to emphasise the lossless packaging – which I’m quite sure is terrific technology – without explaining that its success depends entirely on the quality of the original. I very much doubt whether they are offering losslessly compressed 16bit 44.1Khz files: it would take too long and eat up too much HD space for most people to bother. Even classical stuff like LSO live recordings I think are no better than 320kb.
thanks again
GT
July 14th, 2007 at 06:05:49 AM (#)
In the late 70s the dynamic range of some records and tape was multiplied by two using a technology called DBX. There were special audiophile DBX encoded records that required a DBX decoder. The recording was compressed by 50% and the playback expanded by an equal amount. It is startling to hear the stylus hit the first groove and hear silence until the music starts. It is also startling to hear tape without hiss. This is 30-year-old technology and holds up quite well today. One could argue about audible pumping during compression and expansion but I don’t notice it. It is dangerous to generalize about some things as there can be notable exceptions. Dynamic range is not realized unless you have a quiet listening environment. Frequency response is age dependent. Lack of distortion is noticed by all ages. Our ears are most sensitive to the voice frequencies by design. Therefore a sound system that is near perfect in the voice range can sound better than one that is not.
March 19th, 2009 at 02:32:44 PM (#)
One of the great unsung heroes in compression that pertains to classical recording was Sony’s last implementation of ATRAC3plus. While it was not as clean as PCM/44.1, it runs rings around mp3 at any bit rate.
Having a HI-MD recorder operating in stereo recording mode for 7 hours with this algorithm in the chain is much better than leaving the old tape recorder running at 1 7/8 with a 10 1/2 inch reel.
JG
Toronto
March 19th, 2009 at 02:46:27 PM (#)
John, thanks so much for your comment! (I have a number of — ahem — field recordings that I’m quite happy with from classic MDs, and I do understand that the recent ATRAC improvements were quite good indeed.)