In the days ahead, I am going to start doing some Audio DiffMaker tests where appropriate; it is another freely available tool the audiophile tester can use to find out what works, what doesn't, and to identify the difference. If you have not already guessed, part of my motivation in doing these tests is not only to feed my own curiosity, but also to encourage others to understand the tests and technology - and hopefully, in time, to elevate the knowledge base rather than leave the many senseless audiophile myths out there unquestioned.
If you peruse the DiffMaker site, it's quite obvious what this program does. It basically takes two recordings of the same audio (presumably under 2 conditions or with different hardware), inverts one of them, and adds it to the other to see if the signals "null" each other out. The "magic" of course is in the algorithm used to align the recordings in time (including sample rate drift) and in amplitude. If the recordings are identical, there should be a complete null where the result is silence. The program will create the "null" WAV file to review (very useful) and spit out a number representing the amount of "audio energy" left in the resulting nulled audio file, expressed in dB. The program calls this the "Correlated Null Depth". The higher this value, the deeper the null and the more correlated the 2 recordings are (i.e. the "closer" they sound).
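To make the idea concrete, here is a minimal Python sketch of a null test. It is not DiffMaker's actual algorithm, whose internals are not described here. It assumes the two recordings are already time-aligned and the same length, matches their level with a single least-squares gain, and reports how far the residual sits below the reference in dB. The function name null_depth_db and this exact definition of "depth" are assumptions for illustration only.

```python
import math

def null_depth_db(reference, test):
    """Return how far (in dB) the null residual sits below the reference.

    Assumes both inputs are equal-length lists of samples that are
    already time-aligned. This is an illustrative definition, not
    necessarily the one DiffMaker uses.
    """
    if len(reference) != len(test):
        raise ValueError("signals must be the same length (already aligned)")
    # Least-squares gain that best matches the test level to the reference.
    ref_dot_test = sum(r * t for r, t in zip(reference, test))
    test_energy = sum(t * t for t in test)
    gain = ref_dot_test / test_energy if test_energy else 0.0
    # "Invert and add": subtract the gain-matched test from the reference.
    residual = [r - gain * t for r, t in zip(reference, test)]
    ref_energy = sum(r * r for r in reference)
    res_energy = sum(x * x for x in residual)
    if res_energy == 0.0:
        return math.inf  # perfect null: nothing left over
    return 10.0 * math.log10(ref_energy / res_energy)

# Hypothetical example: a square-ish signal plus a tiny constant error.
reference = [1.0, -1.0, 1.0, -1.0]
test = [r + 0.001 for r in reference]
print(round(null_depth_db(reference, test), 1))
```

Identical inputs give an infinite depth (a perfect null); noise, timing error, or any level difference that a single gain cannot fix pulls the number down.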
The beauty of this method is that one is free to use any audio input signal, no longer bound to the synthetic test tones I have been using thus far. The main limitation I have seen with this software so far is memory: I've run into limits with long audio segments, and it also takes a fair bit of computation to get the results. With my 6GB Windows 8 x64 laptop and DiffMaker 3.22 (September 2008), once I go beyond ~35 seconds of 24/96 audio, the program runs into an error condition - presumably a memory issue. Fair enough, I think 35 seconds is adequate to allow a decent comparison.
After a bit of consideration, I decided to create a "composite" audio test signal that I hope represents a reasonable survey of real music that is also challenging enough for a high-end audio system to reproduce. For fun, I've called this audio track the "DiffMaker Audio Composite" (DMAC) Test, which I think would be a reasonable test to apply to future evaluations I post on the blog. The DMAC consists of the following 4 tracks - all converted to 24/44kHz. Why, you may ask? Simply because most digital music exists at 44kHz so it's important that this sampling rate be done right, and it is believed by many that 24-bit depth is the major factor lending improvement to hi-res audio quality. The tracks:
Rebecca Pidgeon - "Spanish Harlem" 3:02-3:11 (The Raven, 1994) - 9 seconds taken from the 2009 Bob Katz 15th Anniversary Edition at 24/88. Well known to most audiophiles as a vocal test track... Shakers in the background and such... Good evaluation of the mids.
The Prodigy - "Smack My Bitch Up" 2:13-2:22 (Fat Of The Land, 1997) - 9 seconds of loud and clipped techno/electronica. I applied -2dB to the track to allow extra headroom for the ADC without clipping. Low dynamic range, but intense bass. An example of "modern" mastering efforts. Taken from the CD 16/44.
Rachel Podger & Brecon Baroque - "Concerto In G Minor, BWV 1056: Presto" 00:02-0:10 (J.S. Bach: Violin Concertos, 2010, Channel Classics SACD to 24/88) - 8 seconds of lovely string classical work - good mid-range to highs, nice "microdynamics".
Pink Floyd - "Time" 00:06-00:10 (Dark Side Of The Moon, 1973) - 4 seconds of bells & chimes taken from the start of this track. Quite a lot of high-frequency content, detail in the sound, and channel separation. I used the 2011 24/96 Immersion Box Set remaster.
Between tracks are dual 0.1s bursts of 1kHz tone at -4dBFS, separated by 0.1s of silence. These serve as a "beacon" for DiffMaker's alignment algorithm. The trickiest part of this test is temporal alignment, and adding the beacon has significantly improved the consistency of the results for me.
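For anyone wanting to build a similar beacon, the following Python sketch generates one. The exact layout is an assumption on my part (burst, gap, burst, gap), as is treating -4dBFS as the peak level of the sine relative to a full scale of ±1.0; the function name and parameters are illustrative.

```python
import math

def beacon(sample_rate=44100, freq_hz=1000.0, level_dbfs=-4.0,
           burst_s=0.1, gap_s=0.1, bursts=2):
    """Return a list of float samples: tone bursts separated by silence.

    Assumed layout: burst, gap, burst, gap. Level is taken as peak
    amplitude relative to full scale (+/-1.0).
    """
    amplitude = 10.0 ** (level_dbfs / 20.0)   # dBFS -> linear peak
    n_burst = round(burst_s * sample_rate)
    n_gap = round(gap_s * sample_rate)
    tone = [amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_burst)]
    silence = [0.0] * n_gap
    samples = []
    for _ in range(bursts):
        samples.extend(tone)
        samples.extend(silence)
    return samples

marker = beacon()
print(len(marker) / 44100, "seconds")
```

The sharp on/off edges of the bursts give an alignment routine an unambiguous landmark, which steady music does not always provide.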
Vital stats for the 35 second test track:
DR9 (thanks in a large part to the loud compressed Prodigy track). Peak volume: -1.37 / -1.46 dB. Average RMS Power: -27.1 / -26.66 dB.
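For reference, peak and RMS figures like these can be computed from the raw samples of each channel. The sketch below assumes floating-point samples where full scale is ±1.0 and computes a single overall RMS value; audio editors may average RMS over short windows or use a different 0dB reference, so the numbers will not necessarily match theirs exactly. The function names are mine.

```python
import math

def peak_dbfs(samples):
    """Peak level of one channel in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0.0 else -math.inf

def rms_dbfs(samples):
    """Overall RMS level of one channel in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0.0 else -math.inf

# Hypothetical check signal: one cycle of a full-scale sine wave.
sine = [math.sin(2.0 * math.pi * n / 1000) for n in range(1000)]
print(round(peak_dbfs(sine), 2), round(rms_dbfs(sine), 2))
```

With this convention a full-scale sine reads 0dB peak and about -3dB RMS.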
As with any proposed test, first thing to do is some form of validation.
I. Reliability
Setup: MacBook Pro Decibel --> shielded USB --> TEAC UD-501 (SHARP filter) --> shielded RCA --> E-MU 0404USB --> shielded USB --> Win8 laptop
Although the DMAC track is 24/44, it was recorded back at 24/96 where the E-MU 0404USB functions optimally. I also turned ON compensation for sample rate drift. The rest of the settings were left at their defaults.
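Why does timing matter so much in a null test? For a pure sine wave of frequency f, subtracting a copy delayed by t seconds leaves a residual whose amplitude is 2*|sin(pi*f*t)| times the original, so the best possible null depth is -20*log10 of that factor. The Python sketch below simply evaluates this identity; the function name and the example numbers are illustrative assumptions, not DiffMaker settings.

```python
import math

def null_depth_for_delay_db(freq_hz, delay_s):
    """Best-case null depth (dB) for a sine nulled against a delayed copy."""
    # sin(wt) - sin(w(t - d)) has amplitude 2*|sin(pi * f * d)|.
    residual_ratio = 2.0 * abs(math.sin(math.pi * freq_hz * delay_s))
    if residual_ratio == 0.0:
        return math.inf  # no timing error: perfect null
    return -20.0 * math.log10(residual_ratio)

# One sample of misalignment at a 96 kHz capture rate, for a 1 kHz tone.
one_sample = 1.0 / 96000.0
print(round(null_depth_for_delay_db(1000.0, one_sample), 1))
```

With these inputs the null is only about 24dB deep, and it gets worse as frequency rises, which is why sub-sample alignment and drift compensation are needed to reach null depths in the 80dB range.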
Here are 15 runs with the DMAC track played back through my TEAC UD-501, looking at the "correlated null depth" reported by the program as an objective measure. I also had a look at the null waveforms to ensure there were no obvious technical issues. To get a sense of the error range, the runs were spaced out over 24 hours to capture changes in conditions over the course of the day: temperature variation, electrical conditions, and how long the DAC and ADC had been turned on. Interestingly, from what I can tell, the result seemed to vary with ambient temperature. Trials 4-8 were done at mid-day, with temperatures going up to ~30 degrees Celsius where I did the tests. Of course, other factors like electrical noise and powerline quality may also have had a hand in the variation during that time of the day. In general, since I do most of my testing in the evenings, those lower results serve as a reasonable lower extreme for this test. (BTW: I turned the WiFi off on the computers if anyone thinks that makes a difference.)
As you see, there is a range of results (mean = 80.74/79.66, standard dev = 3.88 / 3.89). Remember that because we are measuring the analogue output from the DAC, there will be some noise in the signal - this is an inevitable property of analogue signals especially since I'm re-digitizing it back with the ADC to measure.
II. Validity
Given the error range above, is it good enough to detect very small changes?
Let's try to measure the following conditions:
1. Adobe Audition 3 Graphic EQ boost of +0.3dB at 16kHz with another EQ boost of +0.3dB at 5kHz. The 16kHz change should be inaudible, and the 5kHz adjustment likewise should be inaudible except maybe to the best young golden ears. I was unable to ABX this EQ change using the Sennheiser HD800 + TEAC UD-501.
2. TEAC UD-501 digital filter set to SLOW. This involves a high frequency roll-off starting north of 15kHz. May be detectable to those with excellent high-frequency hearing but I think for the vast majority of us, this difference is unlikely to pass an ABX test.
3. TEAC UD-501 digital filter set to OFF. This is of course the "NOS" mode for the TEAC. I can quite readily hear the difference in an A-B test. Should not be a problem for the DMAC protocol.
Reminder of the TEAC filter frequency response curves:
Result of test conditions 1-3:
4. Changes due to MP3 encoding. We know lossy encoding changes the bit-perfect nature of the signal. We know ~320kbps is audibly very subtle (as per the test that kicked off this blog). We know that lower bit rates will result in more sonic degradation. Can the DMAC test differentiate MP3 from the lossless and further discriminate different bit rates using LAME 3.99.5 (3 runs each condition)?
Nice, it looks like indeed we can! Good correlation between decrease in "correlated null depth" (increasing variance) and lower bitrate for MP3 encoding. The machine isn't fooled by MP3 algorithms :-).
Of course there are other things I can do to demonstrate the validity of this test to show variance... I've done a few other things like varying degrees of EQ changes to demonstrate the correlation which I won't bore you with here.
Summary:
As you can see, it looks like the DMAC Test is quite reliable and, with the E-MU 0404USB as a measurement device, can be shown to discriminate differences in audio even down to levels that are very unlikely to be heard by human listeners.
A word about tests like this and audibility. Remember that humans listen with a powerful psychoacoustic "filter". The ear has significant physiological limitations. For example, we are especially sensitive to the 1-5kHz audio spectrum and quickly lose sensitivity to frequencies higher up - have a look at the Fletcher-Munson curves. Secondly, psychoacoustic effects like simultaneous and temporal masking render certain details inaudible. This is part of the "magic" of lossy encoding algorithms - allowing software to throw out quite a lot of data/details while maintaining excellent audio quality. (Interestingly, the DiffMaker program does have an "ARM-468 weighted energy" setting which may be closer to human perception, but I have not tried it yet.)
I believe the results of tests like this one can be used to demonstrate how closely the sonic output of two signals correlates (which is of course the intent of the software developers). However, because the machine does not have the psychoacoustic mechanism of humans, the results can never directly correlate with what is being heard subjectively. A good example is the similar score between the digital filter OFF (NOS) condition and MP3 192kbps. They both score around 50dB in "correlated null depth", but I would argue the MP3 encoding changes the sound significantly less than removing the digital filter (i.e. the effect from a NOS DAC). In an AB test, I can detect a "dulling" of the high frequencies on tracks like the Prodigy sample with the digital filter turned off, whereas the MP3 sounds less 'colored'.
One more thing about using the "Correlated Null Depth" value. What I'm showing here is all based on measurements off my own equipment (the E-MU 0404USB and TEAC UD-501 DAC) and the procedure/settings I'm using. This means it's only useful for my test purposes and cannot be generalized otherwise. The measured value itself will of course fluctuate, and from time to time I'm going to need to readjust the reference score based on hardware changes.
I look forward to incorporating this test with the others in the days ahead...
Addendum: Curious to see the difference between Reference null and what happens without a digital filter (ie. "NOS mode" on the TEAC)?
The following is what a high quality null WAV output looks like (~85dB) - "Spectral Frequency View" where the X-axis is time and Y-axis is frequency with the color representing amplitude at that specific frequency (blue/dark = low amount, red/bright = high):
Here's the TEAC UD-501 in "NOS mode" with digital filter turned off: