Friday, 11 April 2014

ANALYSIS: A Comparison of DSD Encoders & Decoders (KORG AudioGate, JRiver MC, Weiss Saracon)



Hello guys & gals, over the years I've seen people on message boards ask how the various DSD conversion programs compare, but I have never seen anyone try to "compare and contrast" them with objective analysis. Let's at least give it a try here. I don't promise unequivocal answers, but hopefully this is a decent stab at it :-).

Remember that any conversion between DSD and PCM is a "lossy" process. Therefore, it is of course preferable to keep PCM-sourced recordings in PCM, and DSD-sourced recordings in DSD, whenever possible; some accuracy is compromised each time a conversion happens. Even though the bitrates of DSD64 and 24/96 PCM are similar, the modulation technique used to represent the resulting sound wave is different (as per the above image). The question, of course, is how much difference there is and whether it's quantifiable.
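
To put "similar bitrates" into numbers, here is a quick back-of-the-envelope calculation of the raw, uncompressed bitrate per channel. It's plain arithmetic on the nominal sample rates and word lengths; file container overhead and any lossless compression (DST, FLAC) are ignored.

```python
# Raw (uncompressed) bitrate per channel, in kbit/s.
DSD64_RATE_HZ = 2_822_400   # 64 x 44.1 kHz, 1 bit per sample
PCM_RATE_HZ = 96_000        # 24/96 PCM
PCM_BITS = 24

def bitrate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Raw bitrate of a single channel in kbit/s."""
    return sample_rate_hz * bits_per_sample / 1000

dsd64 = bitrate_kbps(DSD64_RATE_HZ, 1)
pcm_24_96 = bitrate_kbps(PCM_RATE_HZ, PCM_BITS)
print(f"DSD64:     {dsd64:.1f} kbit/s per channel")
print(f"24/96 PCM: {pcm_24_96:.1f} kbit/s per channel")
print(f"ratio:     {dsd64 / pcm_24_96:.3f}")
```

So DSD64 carries roughly 1.2x the raw data of 24/96 PCM; the two are in the same ballpark, but the bits are spent very differently (1-bit samples at a very high rate versus 24-bit samples at a much lower rate).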

This question of conversion is important because, as I have discussed before, many if not most DSD releases have gone through some kind of conversion for the flexibility and ease of editing in the PCM domain. The most blatant examples are the ones sourced from 44/48kHz material, but I'm sure many others are of 96/192kHz origin, and those would not be easy to differentiate from a DSD original.

I. Procedure:


I do not have access to DSD recording gear, but I can convert recorded PCM to DSD and back again to see what the conversion process does. For example, the PCM test signal from RightMark Audio Analyzer can be sent through the conversion process, and we can see what happens to it to get an idea of the amount of degradation. For these tests, I chose the 24/96 test signal, which I feel is a very reasonable high-resolution specification that exceeds DSD64 in a number of resolution domains. I know that 88.2kHz may be better, since the 2.8224MHz DSD64 sample rate is an exact integer multiple of it, but I figure that these days 24/96 is probably the standard in a high-resolution studio, and it is common in high-resolution HDTracks and Blu-ray audio releases.
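
A quick check of why 88.2kHz is the "friendlier" rate: dividing the DSD64 sample rate by each PCM rate shows that 88.2kHz is an exact 32:1 ratio, while 96kHz needs a fractional (147/5) rate change. This is only arithmetic on the nominal rates; it says nothing about how any particular converter actually implements the rate change.

```python
from fractions import Fraction

DSD64_RATE_HZ = 2_822_400  # 64 x 44.1 kHz

def decimation_ratio(pcm_rate_hz: int) -> Fraction:
    """Exact ratio between the DSD64 rate and a target PCM rate."""
    return Fraction(DSD64_RATE_HZ, pcm_rate_hz)

for rate in (88_200, 96_000):
    ratio = decimation_ratio(rate)
    kind = "integer" if ratio.denominator == 1 else "non-integer"
    print(f"{rate} Hz: ratio {ratio} ({float(ratio):.3f}, {kind})")
```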

Here then is the general procedure:
- Take the 24/96 RightMark test signal.
- Convert the PCM to DSD using the various encoders.
- Reconvert the DSD file back into 24/96 PCM using each program.
- Analyse effects of the 2-way conversion and differences between the programs.
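
Since each of the three programs listed below can work in both directions, these steps produce one output file per encoder/decoder pairing. A trivial sketch of that bookkeeping, with the program names used purely as labels (no actual conversion happens here):

```python
from itertools import product

# The three programs tested; each can encode (PCM->DSD) and decode (DSD->PCM).
PROGRAMS = ["AudioGate", "JRiver", "Saracon"]

def round_trip_pairs(programs):
    """Every (encoder, decoder) combination for a PCM -> DSD -> PCM round trip."""
    return list(product(programs, repeat=2))

pairs = round_trip_pairs(PROGRAMS)
for encoder, decoder in pairs:
    # "AudioGate then JRiver" = AudioGate encodes to DSD, JRiver decodes back to PCM
    print(f"{encoder} then {decoder}")
print(len(pairs), "output files")
```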

I decided to use 3 commonly available conversion programs for this test - something free, something a consumer can afford, and finally the professional "standard", made available to me thanks to a friend who runs a studio. This results in a total of 9 final 24/96 WAV files to "measure" with the RightMark software (3 PCM-to-DSD encoders x 3 DSD-to-PCM decoders). The 3 software programs used for conversion are:

1. KORG AudioGate 2.3.3. This software is available free. All it takes to run conversions is access to your Twitter account, since the software tweets each time a conversion takes place. A small price to pay for the ability to do the conversion, I suppose. I used the default DSD encoding (DSDIFF / Stereo Interleaved / 2.8MHz / 1-bit) and decoding to PCM (WAV / Stereo Interleaved / 96kHz / 24-bit) parameters. I noticed that AudioGate applies a +6dB gain with the DSD to PCM conversion. By convention, -6dB DSD is equivalent to 0dBFS PCM; not uncommonly this standard is not followed, in which case a +6dB gain can result in clipping.

2. JRiver Media Center 19.0.117. I used this program last year to test PCM-to-DSD conversion during playback. You can also save the resulting PCM --> DSD and, conversely, DSD --> PCM conversion files. The DSD --> PCM conversion outputs 24/352.8, so I used the best resampler I have, iZotope RX 3, to convert back to 24/96 for final analysis using a steep filter at 48kHz. There is also no +6dB gain applied, so the default volume of the PCM output file is softer than with AudioGate and Saracon at default settings.

3. Weiss Saracon 01.61-27. The standard DSD <--> PCM conversion package, used by labels such as Channel Classics and Pentatone and for many HDTracks releases. Again, I just used the default settings for conversion to DSD (dff, CRFB 8th Order, 0 gain, 2.8224MHz, Auto channel mode, Smart Interleave, Enable Stabilizer). Likewise, the conversion back to PCM was done with default settings (WAV, 24-bit fixed point, TPDF dither, 96.0kHz, +6 dB gain, Smart Interleave).
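
The +6dB gain convention is easy to sanity-check in the dB domain. A minimal sketch: levels are treated as simple peak values in dB, and the helper names are my own illustration, not settings or functions from any of the programs above.

```python
def db_to_amplitude(db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (db / 20)

def clips_after_gain(peak_db: float, gain_db: float) -> bool:
    """True if a signal peaking at peak_db ends up above 0 dBFS after gain_db."""
    return peak_db + gain_db > 0.0

# +6 dB is (almost exactly) a doubling of amplitude.
print(f"+6 dB = x{db_to_amplitude(6.0):.3f}")
# A DSD file that respects the -6 dB convention lands right at 0 dBFS.
print(clips_after_gain(-6.0, 6.0))
# A "hot" DSD master peaking at -3 dB would clip after a +6 dB gain.
print(clips_after_gain(-3.0, 6.0))
```

This is also why JRiver's output (no gain applied) measures about 6dB softer than the AudioGate and Saracon outputs at default settings.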

II. Result:

As usual, I'm going to start by presenting the data as summary charts. There are 3 DSD encoders and the same 3 can be used to decode DSD, so let's organize the results by the encoder used. When I say something like "AudioGate then JRiver", I mean that AudioGate was used as the DSD encoder, then JRiver converted the file back to high-resolution PCM to be analyzed by RightMark (remember, in JRiver's case I also used iZotope RX 3 to resample from 24/352.8 --> 24/96).

AudioGate as DSD encoder.
JRiver as DSD encoder.
Weiss Saracon as DSD encoder.
As you can see, the first column in each table is the 24/96 RightMark PCM test signal with no conversion done. These would be the ideal numbers if one could measure a perfect DAC/ADC setup or in this case the results of perfect conversion.

The rest of the columns reflect what happens to the 24/96 PCM test signal as it goes through the DSD encoding and decoding steps. Remember that RightMark analyzes only the audible 20Hz to 20kHz spectrum. As you know, DSD64 conversion adds quite a lot of ultrasonic noise if left unfiltered, and this would result in worse noise levels and lower dynamic range if frequencies >20kHz were analysed.
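
To illustrate where that ultrasonic noise comes from, here's a toy first-order 1-bit sigma-delta modulator in plain Python. This is a heavy simplification: real DSD encoders use much higher-order modulators (Saracon's default is an 8th-order CRFB design), and a crude moving-average filter stands in for a proper low-pass. Still, it shows the basic idea: the 1-bit output differs from the input by a large error, but most of that error sits at high frequencies and largely disappears once the bitstream is low-pass filtered.

```python
import math

def sigma_delta_1st_order(x):
    """Toy first-order 1-bit sigma-delta modulator; returns +1.0/-1.0 values."""
    integrator = 0.0
    y_prev = 0.0
    out = []
    for sample in x:
        integrator += sample - y_prev          # accumulate input minus previous output
        y = 1.0 if integrator >= 0 else -1.0   # 1-bit quantizer
        out.append(y)
        y_prev = y
    return out

def moving_average(x, width):
    """Crude boxcar low-pass filter: mean of each full window of `width` samples."""
    running = sum(x[:width])
    out = [running / width]
    for n in range(width, len(x)):
        running += x[n] - x[n - width]
        out.append(running / width)
    return out

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

FS = 2_822_400   # DSD64 sample rate in Hz
N = 1 << 16      # about 23 ms of signal
x = [0.5 * math.sin(2 * math.pi * 1000 * n / FS) for n in range(N)]  # 1 kHz sine at -6 dB
y = sigma_delta_1st_order(x)
err = [yi - xi for yi, xi in zip(y, x)]   # total error added by the 1-bit modulator

total = rms(err)
low_band = rms(moving_average(err, 64))   # error remaining after crude low-pass filtering
print(f"total error RMS:      {total:.3f}")
print(f"low-passed error RMS: {low_band:.4f}")
print(f"ratio:                {20 * math.log10(total / low_band):.1f} dB")
```

For this first-order loop the output is exactly the input plus the difference of successive quantizer errors, so the added error rises with frequency; higher-order modulators push even more of it above the audible band, which is why an unfiltered DSD64 measurement up to 48kHz would look noisier.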

Indeed, various amounts of distortion and imperfections can be seen. On the whole, though, it's far from bad. At worst, the cumulative noise level is still below -120dB and the dynamic range is >120dB with each of these encoder/decoder pairs.
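
For context on those figures: an ideal N-bit quantizer driven by a full-scale sine has a theoretical SNR of 6.02N + 1.76dB (about 146dB at 24 bits, about 98dB at 16 bits), and -120dB corresponds to one millionth of full-scale amplitude. This is textbook arithmetic, not a model of how RightMark computes its numbers.

```python
import math

def ideal_pcm_dynamic_range_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantizer with a full-scale sine (6.02*N + 1.76 dB)."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

def db_to_amplitude(db: float) -> float:
    """Linear amplitude relative to full scale for a level in dB."""
    return 10 ** (db / 20)

print(f"ideal 24-bit: {ideal_pcm_dynamic_range_db(24):.1f} dB")
print(f"ideal 16-bit: {ideal_pcm_dynamic_range_db(16):.1f} dB")
print(f"-120 dB as a fraction of full scale: {db_to_amplitude(-120):.0e}")
```

So even the worst encoder/decoder pair here keeps its noise more than 20dB below what an ideal 16-bit CD could achieve.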

Comparatively, you can see in the tables above that the free KORG AudioGate encoder had the worst noise level results irrespective of which software was used to convert back to PCM. JRiver comes next, and Saracon puts out some very fine numbers.

There's a similar tendency when comparing the DSD-to-PCM decoders. In general, the JRiver and Saracon DSD-to-PCM conversions (columns 3 & 4) resulted in better noise level and dynamic range measurements than AudioGate (column 2).

Let's now have a look at some individual graphs to see what's going on - here's using AudioGate to encode PCM-to-DSD:
Frequency Response
Notice that the programs used to convert DSD back to PCM all have different low-pass filters. As expected, PCM (white) is flat all the way to 48kHz. AudioGate (green) uses a very weak filter and is attenuated by <1dB at 48kHz; Saracon has the next weakest filter. JRiver at the default "Safe" setting applies a steep 24kHz, 48dB/octave slope (you can change this to a 30kHz cutoff or a 50kHz cutoff, or turn the filter OFF).
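
To get a feel for what a 48dB/octave slope at 24kHz means in and just above the audible band, here's a rough model. I'm assuming an idealized 8th-order Butterworth response (each order contributes about 6dB/octave), which is not necessarily how JRiver implements its "Safe" filter.

```python
import math

def butterworth_attenuation_db(f_hz: float, cutoff_hz: float, order: int) -> float:
    """Attenuation (positive dB) of an idealized Butterworth low-pass filter."""
    return 10 * math.log10(1 + (f_hz / cutoff_hz) ** (2 * order))

# Assumption: a 24 kHz, 48 dB/octave filter modelled as an 8th-order Butterworth.
CUTOFF_HZ = 24_000
ORDER = 8
for f in (20_000, 24_000, 30_000, 48_000):
    print(f"{f / 1000:>4.0f} kHz: -{butterworth_attenuation_db(f, CUTOFF_HZ, ORDER):.2f} dB")
```

Under that assumption the filter costs only about a quarter of a dB at 20kHz, is -3dB at the 24kHz cutoff, and reaches roughly -48dB one octave up at 48kHz.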

Noise Level
As you can see, there's deviation from the PCM noise floor with AudioGate DSD encoding. AudioGate is noisier at converting PCM to DSD than the other programs (as will be evidenced later). Its noise floor also isn't as smooth as the others (note the interesting notches at 10kHz and 20kHz).

In comparison, let's have a look at the JRiver PCM-to-DSD encoding:
Frequency Response
Noise Level
The frequency response curves are similar to those with AudioGate DSD encoding, reflecting the respective low-pass filter settings of the DSD-to-PCM converters. The main difference is in the noise level. As you can see, JRiver as DSD encoder maintains a very clean noise floor, essentially equivalent to 24-bit PCM, up to about 13kHz before it rises, and this low noise floor is preserved by JRiver and Saracon when converting back to PCM. In comparison, the AudioGate DSD-to-PCM conversion has a higher noise floor throughout the audible spectrum; perhaps a higher level of dithering is being applied?

Finally, let's look at Saracon used as the PCM-to-DSD encoder:
Frequency Response
Noise Level
A clean PCM-like noise floor all the way to 20kHz is achievable after Saracon DSD encoding, but the noise increases quickly above that. Again, AudioGate conversion to PCM results in a higher noise floor, which I speculate is due to stronger dithering.
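
My guess about dithering can at least be put into perspective. Standard TPDF dither (what Saracon's default PCM output uses, per its settings listed earlier) raises the total noise of an ideal quantizer by a fixed amount relative to plain rounding. A minimal sketch for an ideal quantizer only; whether AudioGate uses a different or "stronger" dither is speculation on my part, and this code doesn't model any of these programs.

```python
import math

def noise_floor_dbfs(bits: int, tpdf_dither: bool) -> float:
    """Total quantization noise of an ideal N-bit quantizer, in dB relative to a full-scale sine.

    Plain rounding error has variance LSB^2/12. TPDF dither (two independent
    uniform +/-0.5 LSB sources summed) adds variance LSB^2/6, for LSB^2/4 total.
    """
    lsb = 2.0 / (2 ** bits)        # full scale spans -1.0 .. +1.0
    variance = lsb ** 2 / 12
    if tpdf_dither:
        variance += lsb ** 2 / 6
    full_scale_sine_power = 0.5    # power of a sine with amplitude 1.0
    return 10 * math.log10(variance / full_scale_sine_power)

plain = noise_floor_dbfs(24, tpdf_dither=False)
dithered = noise_floor_dbfs(24, tpdf_dither=True)
print(f"24-bit, no dither:   {plain:.1f} dBFS")
print(f"24-bit, TPDF dither: {dithered:.1f} dBFS")
print(f"difference:          {dithered - plain:.2f} dB")
```

At 24 bits that's roughly -146dBFS versus about -141dBFS, a difference of about 4.8dB buried far below anything audible.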

III. Conclusion:

Since DSD <--> PCM conversion isn't a straightforward process (unlike, say, resampling in PCM), it's no surprise that at a "microscopic level" the conversion software does make a difference in resolution.

What is much harder to quantify is audibility. The differences in frequency response, noise floor, distortion, and crosstalk are all below what I believe are the human thresholds of audibility, and overall there is minimal change to the original 24/96 PCM signal within the audible frequency range. Remember, the results I show here include both conversion to DSD and back again to PCM, not just a single conversion step. Yet I have seen commenters online insisting that the conversion results in audible deterioration in sound (even with just a single step like DSD --> PCM).

Looking at these 3 software programs, we can say with some certainty from an objective perspective that Saracon PCM-to-DSD transcoding maintains the lowest noise floor from 20Hz to 20kHz. JRiver is also very good in this respect, while AudioGate's results are less accurate but obviously still very good and of questionable audible significance given that the difference is still below the measured noise floor of all except maybe the very best DACs.

Of course, one has to pay big bucks for Saracon compared to the free AudioGate software!

As for DSD-to-PCM conversion, the main difference appears to be where each program has decided to put the low-pass filter to remove DSD's ultrasonic noise. Of the 3, JRiver has the most conservative low-pass filter by default, at 24kHz (with a small measurable effect beginning around 20kHz). Saracon lets a bit more through, up to around 30kHz, and AudioGate lets essentially everything through up to 48kHz with 24/96 sampling. The only other difference seems to be a stronger dithering algorithm with AudioGate (I'm guessing here), such that its noise floor is marginally higher than the others. Again, we're looking at differences way down in the noise floor, so it really should not be an issue. I think the real question is where you think the low-pass filter should be set for DSD64 material (i.e. at what point is recorded ultrasonic signal drowned out by noise and no longer worth keeping?).

From what I see here, I'm quite happy that Saracon is used for DSD-to-PCM conversion in most of the commercial releases I've come across. Within the 20Hz to 20kHz audible spectrum it does appear to be the best, though I highly doubt one could go wrong with any of these. Just remember that the steep low-pass filter in Saracon means there's nothing above ~40kHz, and therefore no point buying a Saracon DSD-converted file above 96kHz (88.2kHz is all that's needed).
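
The "88.2kHz is all that's needed" point is simply the Nyquist criterion: the sample rate only has to exceed twice the highest frequency you want to keep. A trivial helper (the list of "standard" rates is my own choice) picks the lowest common rate that can hold content up to ~40kHz:

```python
STANDARD_PCM_RATES_HZ = [44_100, 48_000, 88_200, 96_000, 176_400, 192_000]

def lowest_rate_for_bandwidth(max_content_hz: float, rates=STANDARD_PCM_RATES_HZ) -> int:
    """Smallest listed sample rate whose Nyquist frequency (rate / 2) exceeds max_content_hz."""
    for rate in sorted(rates):
        if rate / 2 > max_content_hz:
            return rate
    raise ValueError("no listed sample rate is high enough")

print(lowest_rate_for_bandwidth(40_000))
```

With a 44.1kHz Nyquist limit, 88.2kHz leaves a few kHz of headroom above Saracon's ~40kHz cutoff and is also an exact 32:1 submultiple of the DSD64 rate.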

Over the years, I've listened to original DSD and compared it to 24/88 PCM conversions made with Saracon and AudioGate, with output levels matched as best I could (using the TEAC UD-501; I never tried formal ABX or blinding). IMO, it's tough to assess since you can't instantaneously switch from DSD to PCM. The PCM-converted files sound good to me, and I would not hesitate to archive a DSD64 library as 24/88. Whatever difference I heard has always been subtle at best (despite claims from the DSD faithful that somehow DSD sounds much better). I suppose it's possible that different DAC devices could also sound different depending on whether they receive PCM or DSD input.

Has anyone out there done an ABX or other controlled listening test with DSD-to-PCM conversion? Would love to hear of your experience and preference... 

---------------

Rant of the week...
In the high-fidelity audio world we've often discussed the ills of severe dynamic range compression (DRC). I'm just going to get on my soapbox for a couple of minutes and also complain about DRC in soundtracks these days... Notice how LOUD TV shows have become lately? A couple of years ago, I tried watching NBC's Hannibal. Not only was the pacing terrible, as if meant for folks with ADHD, but the audio was so annoyingly grating that I could not tolerate more than 3 episodes. (I don't know if the series improved after those 3 episodes...)

More recently, I've become annoyed by the new Cosmos: A Spacetime Odyssey, hosted by Neil deGrasse Tyson and playing on Fox and National Geographic Channel. I mean... COME ON PEOPLE! This is a science program. This is a documentary (with some science fiction entertainment thrown in). WHY DOES IT HAVE TO BE SO LOUD? It's like there's no subtlety left... No opportunity to whisper... No opportunity to wonder... No opportunity to enjoy the eye-candy of some excellent CGI graphics without the blaring of some "majestic" soundtrack through much of the show. Aren't the ideas being presented supposed to be what it's all about? Yet at times, the narration gets muddled by the background audio.

While I can still enjoy Cosmos 2014 with my kids for the topics presented, I'm left wondering how much better it could have been had the dialogue been allowed to take center stage, with the background soundtrack accentuating the emotional impact instead of being ridiculously front-and-center, as if I'm supposed to watch this program on a tiny smartphone screen on the subway (maybe that's the target audience!). As usual, it's hard to know who to blame: is it the sound engineers working on this series behind the mixing console, or the folks at the TV station running the signal through their compressor? Unfortunate.

I'll end with a quote from Carl Sagan. Certainly worth contemplating when reading comments posted on the Internet in general... (Not just as audiophiles.)

"We live in a society exquisitely dependent on science and technology, in which hardly anyone knows anything about science and technology." Carl Sagan (1989) [good article BTW]

I wonder what Mr. Sagan would think about the current state of affairs regarding the level of understanding of science in our society today. I suspect that if he were still alive (he died in 1996), he'd be impressed by the access to information and interconnectedness we have these days through the Internet. That doesn't necessarily say much about the level of understanding, though.


Still a great read after all these years... Originally published 1980.

5 comments:

  1. What happens if you convert back and forth, say, 10 times?

    1. Good question... I'll check if I have time :-).

      Probably gradual worsening but might not be too bad.

      I certainly hope no studio would do such a thing!

  2. Hi Archimago,
    Thanks for this excellent comparison. I use jRiver and it seems its conversion is about the same as the others (within hearing distance).
    I use jRiver to convert DSD to PCM on the fly. Many people claim that on the fly conversion is not as good as offline conversion (like on the fly flac to wav conversion being worse than streaming wav directly). I am not convinced of this at all. Could you possibly compare on the fly and offline conversion with jRiver and see if there is a difference in the output PCM file or not?
    Thanks a lot and regards

    1. Hi Rudolf,
      Haven't tried looking at the on-the-fly vs. pre-converted DSD --> PCM results... IF there is anything to find, one would imagine the difference would depend on whether the CPU is able to keep up with the processing demands. From what I have seen, with a reasonable CPU, this should not be a problem.

      Although not exactly what you're asking, you can see the effect of realtime conversion with my test results of PCM --> DSD realtime upsampling here:
      http://archimago.blogspot.ca/2013/09/measurements-pcm-to-dsd-upsampling.html

      Those tests were done with JRiver converting PCM to DSD64/128 in realtime. This would tax the CPU processing even more than DSD --> PCM yet I'm seeing quite reasonable numbers off the DAC compared to native PCM playback. The test above was done with an inexpensive AMD A10-5800K APU. These days, even my $65 Pentium G3220 processor has no problem upsampling PCM 24/192 --> DSD128. Furthermore, there's nothing outrageously wrong with the jitter spectrum compared to native PCM playback either.

      Bottom line. Although not specifically tested, I would be very surprised if there were audible differences unless your CPU is unable to keep up and you get buffer under-run issues which would be very noticeable.

    For 32-bit fp 384kHz (using the new RightMark 6.4) there's much less dithering going on, so conversion is pretty clean. Also, as there is no modulator overload, 10th-order CRFB can and should be used.
    This is streamlined 384/32fp->DSD->384/32fp conversion using saracon.

    DSD looks like a mighty good archival format from these graphs:

    http://xe5.net/si/DSD_Comparison.htm
