`sr-convert` sample rate conversion utility

Version 0.8 — February 21, 2003

Quick Start Guide

What It Is
Licensing
System Requirements
Installation
Use
Trading Quality for Space and Time

Supported Sampling Rates

Common Rates
Bandwidth-sharing Rates
Television Production Rates
Other Rates

Building the Program

Building under Linux
Building on CPUs without SSE
Building on Other Configurations

Future Plans
Thank You

Quick Start Guide

What It Is. sr-convert is a command-line utility which converts .WAV files from one sampling rate to another. It is designed to have high fidelity while also offering reasonable performance.

sr-convert supports both upsampling (converting to a higher sampling rate) and downsampling (converting to a lower sampling rate). When upsampling, sr-convert carefully preserves all the frequencies present in the original file while eliminating noise in higher frequencies. Files with low sampling rates usually sound better after upsampling with sr-convert than the originals do, because sr-convert removes the high-frequency noise that most PC hardware is not designed to remove during playback. (In effect, the file sounds better because you’re using sr-convert’s digital filter instead of the filters built into your sound card and/or device drivers.)

When downsampling, sr-convert preserves all the frequencies that can be present in the output file while eliminating frequencies that cannot be represented correctly. Therefore, the kind of noise known as aliasing (produced when a high-frequency sound is sampled at a rate too low to represent it) is virtually eliminated.
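
To make aliasing concrete, here is a small Python sketch. It is not part of sr-convert, and the function name alias_frequency is made up for illustration. It computes the frequency that a pure tone appears to have after being sampled with no filtering at all.

```python
import math

def alias_frequency(tone_hz, sample_rate):
    """Apparent frequency of a pure tone sampled at sample_rate with no filtering."""
    folded = tone_hz % sample_rate             # sampling cannot tell f from f + k*rate
    return min(folded, sample_rate - folded)   # nor f from rate - f

# A 5000 Hz tone sampled at 8000 Hz yields the same samples as a 3000 Hz tone.
rate = 8000
for n in range(8):
    original = math.cos(2 * math.pi * 5000 * n / rate)
    aliased = math.cos(2 * math.pi * alias_frequency(5000, rate) * n / rate)
    assert abs(original - aliased) < 1e-9

print(alias_frequency(5000, 8000))   # 3000
print(alias_frequency(3000, 8000))   # 3000 (below half the rate, so unchanged)
```

A downsampler that filters out everything above half the output rate before resampling never lets the 5000 Hz tone reach this stage.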

Upsampling and downsampling are both done using linear convolution with a low-pass filter. This method is vastly superior to duplicating and dropping samples, and is also superior to linear or polynomial interpolation. One side effect of this approach is that a fraction of a second of silence is added to each end of the file.

Licensing. This program is licensed under the terms of the GNU General Public License.

System Requirements. The precompiled MinGW binary should run on any 32-bit Windows machine. This includes Windows 95, Windows 98, Windows ME, Windows NT 3.51, Windows NT 4, Windows 2000, and Windows XP. As of this writing, it has been tested only under Windows 2000. However, the programmer notes that sr-convert does not make any use of the Windows API, beyond what use the C++ standard library makes of it, and therefore it is reasonable to assume that sr-convert will work on other Windows platforms without difficulty.

sr-convert will take advantage of the Streaming SIMD Extensions if your processor has them, but they are not required.

sr-convert uses the CPU the whole time it is running. If this adversely affects system performance then you might wish to use “start /low” to run it in its own window at a low priority.

Memory requirements vary according to the sampling rates being converted, but shouldn’t ever exceed 32 megabytes. This program does not load entire WAV files into memory.

Installation. The sr-convert.exe executable stands alone; to install it, simply move it to somewhere on your PATH, or else alter your PATH to include the directory where it is. You can also execute it from the command line if it is in the current directory or if you specify the full path to the .EXE’s location.

sr-convert does not maintain a configuration file and does not use the Windows registry. Deleting it is as simple as deleting the .EXE.

Use. To use the program, write a command line like this:

sr-convert infile inrate outfile outrate

infile should be the name of the input file, including the extension, such as file1.wav. In this version, the input file must be PCM with 8-bit samples or 16-bit samples. These are the most commonly occurring sample formats for WAV files. The WAV file format supports a wide variety of additional sample formats which this program does not yet support (such as 12-bit samples, 24-bit samples, floating-point samples, mu-law, A-law, etc.) and some which it never will support (for example, it is possible to put a WAV file header on an MP3, but this program would have to include an MP3 decoder in order to convert such an input file).

inrate should be the sampling rate of the input file, or a single hyphen (“-”) if you want sr-convert to get the rate from the file itself. The single hyphen is most often used. Note that regardless of whether the rate comes from the file or from the command line, it must be one of the supported sampling rates. Any rate you specify will override the rate specified in the file.

outfile should be the name of the output file, including the extension if you want one. If the output file already exists, it will be overwritten. The output file should not be the same as the input file, or the program will overwrite its own input and crash. For maximum quality, sr-convert always produces a 16-bit PCM output file.

outrate is the sampling rate you would like the output file to have. It can be any of the supported sampling rates.
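
Before running a conversion, it can be handy to check that a file meets the input requirements above. The following Python sketch does that with the standard wave module. It is a separate helper, not part of sr-convert, and the names describe_wav and acceptable_input are illustrative. It does not check whether the rate is one of the supported sampling rates.

```python
import wave

def describe_wav(path):
    """Return (sample_rate, bits_per_sample) for a PCM WAV file.

    The standard wave module raises wave.Error for files it cannot parse,
    which includes compressed (non-PCM) WAV files.
    """
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8

def acceptable_input(path):
    """True if the file is PCM with 8-bit or 16-bit samples."""
    try:
        _rate, bits = describe_wav(path)
    except (wave.Error, EOFError):
        return False
    return bits in (8, 16)
```

If acceptable_input("file1.wav") returns True, the rate reported by describe_wav is the one sr-convert would pick up when inrate is given as a single hyphen.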

sr-convert prints a lot of “debugging information” as it prepares to run; this is normal and can be ignored. Then it prints dots and occasionally exclamation marks as it runs. An exclamation mark indicates that clipping occurred when encoding the output into PCM; this is most common when clipping occurs in the input file. If the output sampling rate is very low, then sr-convert will print dots very slowly.

Every three million bytes, sr-convert updates the .WAV file header. Some programs such as Sound Recorder can load the partial .WAV file based on this header, even as the sample-rate converter continues to run. Other programs such as WinAMP won’t play the file until it is finished.

Trading Quality for Space and Time. sr-convert uses a digital filter in order to convert from one sampling rate to another. The impulse response of this filter is the way it responds over time to a single 1-value sample surrounded by 0-value samples (an impulse). As the impulse response gets longer, the quality improves, but sr-convert has to do more arithmetic and the impulse response takes up more space in memory.

The impulse response is a mathematical function, and it doesn’t have a sampling rate until it is sampled prior to the conversion. Its size is more easily measured by counting the points where it crosses zero; these points are called nulls. The length of the impulse response defaults to 500 nulls, and this produces exceptional quality. A different value, which must be a positive integer, can be given to sr-convert as follows:

sr-convert infile inrate outfile outrate nulls

Smaller values of nulls produce more speed at the expense of quality, and larger values produce more quality at the expense of speed. However, there is a limit to how fast sr-convert can go, because it doesn’t spend all its time traversing the impulse response. On the other hand, very long impulse responses might stop offering additional quality after a point, instead leading only to an increased accumulation of floating-point round-off error. (This shouldn’t happen for a while, though.)

The number of nulls also determines the amount of silence added to the ends of the output file.
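
The relationship between nulls and the size of the impulse response can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a sinc function tapered by a Hann window, the window that sr-convert really uses may differ, and the names impulse_response and samples_per_null are invented here.

```python
import math

def impulse_response(nulls, samples_per_null):
    """Sample a Hann-windowed sinc that spans `nulls` zero crossings.

    Time is measured in units of the null spacing, so the sinc crosses zero
    at every nonzero integer. Assumption: a Hann window; the window that
    sr-convert really uses may differ.
    """
    half_span = nulls / 2
    count = nulls * samples_per_null + 1
    taps = []
    for i in range(count):
        t = i / samples_per_null - half_span    # runs from -half_span to +half_span
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 + 0.5 * math.cos(math.pi * t / half_span)   # 1 at center, 0 at the ends
        taps.append(sinc * window)
    return taps

taps = impulse_response(nulls=10, samples_per_null=4)
print(len(taps))               # 41
print(taps[len(taps) // 2])    # 1.0 at the center
```

Doubling nulls doubles the number of taps, which is where both the extra arithmetic and the extra memory come from.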

Supported Sampling Rates

sr-convert supports a wide variety of sampling rates (68 at last counting) right out of the box, and more can be added if you have a Scheme interpreter and you’re willing to recompile the source code. You can convert from any sampling rate directly to any other. (sr-convert will automatically create multi-stage conversions if necessary. It is better to let sr-convert do it itself, since it uses floating-point samples between stages.)

You can get a list of all supported sampling rates by typing sr-convert with no arguments.

Sampling rates are divided into several groups.

Common Rates. These are rates that appear in typical applications.

44100. This is the rate of Compact Disc digital audio.
22050 and 11025. These rates, which are one-half and one-quarter of the CD rate, were introduced by Microsoft as part of the Multimedia PC requirements. The sound effects in Windows 3.1 were at 11025, as were those in Doom.
48000. This rate is used by professional recording studio equipment, on DVDs, and in some audio codecs.
24000 and 12000. These correspond to 48000 in the same way that 22050 and 11025 correspond to 44100.
32000. This rate is the lowest rate supported by MP3 high-quality encoding. Way back when the author had a 60 MHz Pentium and a motherboard which supported only one of the two DMA channels the sound card wanted, he noted that a 32 kHz MP3 would play more smoothly and consume less CPU than a 44.1 kHz MP3.
16000. This rate, along with 22050 and 24000, is supported by lower-quality MP2 encoding.
8000 and 7350. 8000 is the sampling rate of telephone signals, and 7350 corresponds to 44100 the same way that 8000 corresponds to 48000.
6000 and 5512+1/2. These are half of the 12000 and 11025 rates; they occur rarely, because of their poor sound quality. sr-convert uses the integer 5512 to represent 5512+1/2.
64000, 128000, 88200, 176400, 96000, 192000. These are two and four times the regular sampling rates of 32000, 44100, and 48000, and are supported by some high-quality codecs and by DVD audio.

Bandwidth-sharing Rates. Sometimes it might be useful to have two signals occupy the same space as a single larger signal. For example, if you have bandwidth enough for a 48000 Hz signal, then you might wish to split it into an 8000 Hz signal and a 40000 Hz signal. In order to do this, 40000 has to be supported as a rate.

So far there are few applications of bandwidth-sharing, but these rates were added because they were easy to support. The mathematics behind sample-rate conversion favors simple ratios, and the ratios between these rates are simple.

These rates can also be used to reduce file sizes without reducing quality very much. For example, using 36750 instead of 44100 is unlikely to create an audibly noticeable deterioration in quality, but it can save several megabytes of disk space.

14700, 29400, 33075, and 36750. These come up along with 7350, 11025, and 22050 when splitting the bandwidth of a 44100 signal. 14700 is a third, 29400 is two-thirds, 33075 is three quarters, and 36750 is five-sixths.
51450, 55125, 58800, 66150, 73500, 77175. These come up when splitting an 88200 signal. They correspond to 1+1/6, 1+1/4, 1+1/3, 1+1/2, 1+2/3, 1+3/4 of 44100.
102900, 110250, 117600, 132300, 147000, 154350, 161700. These may come up when splitting a 176400 signal. They correspond to 2+1/3, 2+1/2, 2+2/3, 3, 3+1/3, 3+1/2, and 3+2/3 of 44100.
36000, 40000. These come up along with 8000, 12000, 16000, 24000, and 32000 when splitting 48000.
56000, 60000, 72000, 80000, 84000, 88000. These come up along with 64000 when splitting 96000.
112000, 120000, 144000, 160000, 168000, 176000. These come up along with 128000 when splitting 192000.
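
The idea of splitting a signal can be checked with a little arithmetic. The Python sketch below is not part of sr-convert and the function name splits is illustrative. It takes the rates listed above for 44100 and 48000 and finds the pairs that add up to the full rate.

```python
from fractions import Fraction

# Rates listed above for splitting 44100 and 48000 signals.
RATES = [7350, 11025, 14700, 22050, 29400, 33075, 36750,
         8000, 12000, 16000, 24000, 32000, 36000, 40000]

def splits(total, rates=RATES):
    """Pairs (a, b) of listed rates with a <= b and a + b == total."""
    available = set(rates)
    return [(a, total - a) for a in sorted(available)
            if a <= total - a and (total - a) in available]

for total in (44100, 48000):
    for a, b in splits(total):
        print(total, "=", a, "+", b, " fractions:", Fraction(a, total), Fraction(b, total))
```

Every fraction that comes out has a small denominator (2, 3, 4, or 6), which is what is meant by saying the ratios between these rates are simple.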

Television Production Rates. It is often convenient to pretend that NTSC television (used in the U.S.) has a 60 Hz vertical refresh rate and a 30 Hz frame rate, but FCC regulations specify that the actual rates are 59.94 and 29.97 Hz (for color television). These are 999/1000 of the more convenient rates.

As a result, sometimes video has to be speeded up or slowed down slightly to convert between 30 Hz and 29.97 Hz, and this also affects the audio soundtrack. A 44100 Hz soundtrack on a 29.97 Hz movie might be increased to 44144+16/111 to go with 30 Hz. A 44100 soundtrack on a 30 Hz movie might be decreased to 44055+9/10 to go with 29.97 Hz.

The difference is so slight that the soundtrack doesn’t sound like it has been speeded up or slowed down, but synchronization can be totally destroyed by this effect; by the end of a two-hour movie the difference works out to 7.2 seconds.
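
The figures above follow directly from the 999/1000 ratio. Here is a short Python check using exact fractions; the names SLOWDOWN and drift_seconds are invented for this illustration.

```python
from fractions import Fraction

SLOWDOWN = Fraction(999, 1000)   # 30 Hz -> 29.97 Hz, as described above

def drift_seconds(duration_seconds):
    """How far audio and video drift apart if only one of them is retimed."""
    return duration_seconds * (1 - SLOWDOWN)

print(float(drift_seconds(2 * 60 * 60)))   # 7.2 seconds over a two-hour movie
print(Fraction(44100) * SLOWDOWN)          # 440559/10, i.e. 44055+9/10
print(Fraction(44100) / SLOWDOWN)          # 4900000/111, i.e. 44144+16/111
```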

Sometimes it becomes necessary to get rid of the oddball sampling rate of a speeded-up or slowed-down file while preserving the synchronization effects that were introduced by the speedup or slowdown. Sometimes, also, it is necessary to convert a file to the oddball rate without introducing synchronization artifacts. That is what sr-convert can do. (By specifying a different sampling rate for the input file, you can also get it to introduce these synchronization effects).

(You can also squeeze an additional 4.8 seconds of audio onto an 80-minute CD this way, but it doesn’t really seem worth an hour of processing to get those 4.8 seconds. On the other hand, if your source material needs to be converted anyway, and if you need the 4.8 seconds, it might be worth it.)

(This does nothing about the amount of blank space added to the beginning and end of the output file by the convolution process, either. However, it is possible to edit that out, and a future version may be able to prevent it from being generated.)

It’s important to note that WAV files can represent only integer sampling rates. Therefore, sr-convert uses the nearest integer (preferring the even integer in the event of a tie) to represent a rate — and when reading these rates, it interprets them as the fractional rate. That’s why 5512+1/2 is written as 5512 on the command-line and in the WAV file, and that’s also why 44055+9/10 is written as 44056. In the reverse direction, sr-convert always interprets 44056 as 44055+9/10.

31968, 63936, 127872, 44055+9/10, 88111+4/5, 176223+3/5, 47952, 95904, 191808. These are 999/1000 of the common sampling rates 32000, 64000, 128000, 44100, 88200, 176400, 48000, 96000, and 192000.
32032+32/999, 64064+64/999, 128128+128/999, 44144+16/111, 88288+32/111, 176576+64/111, 48048+16/333, 96096+32/333, 192192+64/333. These are 1000/999 of the same common sampling rates.
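
Both lists, and the integers that stand in for them on the command line, can be reproduced with exact arithmetic. This Python sketch is independent of sr-convert and make-tables.scm; the helper names wav_label and mixed are illustrative. It relies on the fact that rounding a Fraction in Python goes to the nearest integer and breaks ties toward the even integer, which matches the rule described above.

```python
from fractions import Fraction

COMMON = [32000, 64000, 128000, 44100, 88200, 176400, 48000, 96000, 192000]

def wav_label(rate):
    """Integer written in the WAV header: nearest integer, ties to even."""
    return round(Fraction(rate))   # Fraction rounds halves to the even integer

def mixed(rate):
    """Format an exact rate the way this guide writes it, e.g. 44055+9/10."""
    whole, rest = divmod(Fraction(rate), 1)
    return str(whole) if rest == 0 else f"{whole}+{rest}"

for rate in COMMON:
    slow = rate * Fraction(999, 1000)
    fast = rate * Fraction(1000, 999)
    print(rate, mixed(slow), wav_label(slow), mixed(fast), wav_label(fast))

print(wav_label(Fraction(11025, 2)))   # 5512, the label for 5512+1/2
```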

Other Rates. In order to make sr-convert support other rates, you must alter make-tables.scm (particularly near the line that begins “(define rates”) and rebuild the program. However, only rational sampling rates can be supported, and the ratios between sampling rates must be simple enough that excessive memory is not consumed by the impulse response.

sr-convert samples the impulse response at the least common multiple of the two sampling rates being used. Thus, if the sampling rates are 44100 and 48000, then the impulse response is sampled at 7.056 MHz. The nulls are spaced depending on the lower of the two sampling rates, so in this case they are 160 samples apart, and an impulse response spanning 500 nulls is 80,001 samples long. A similar method is used for any pair of input sampling rates. Thus, if you pick two large primes for sampling rates, the impulse response is likely to be so horrendously long that it can’t be addressed by a 32-bit machine, much less stored in memory.
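
The arithmetic in this paragraph can be reproduced with a few lines of Python. This is a sketch for integer rates only, and impulse_response_size is an invented name; it is not how make-tables.scm is written.

```python
import math

def impulse_response_size(rate_a, rate_b, nulls=500):
    """Return (oversampled_rate, samples_per_null, total_samples) for one stage."""
    oversampled = rate_a * rate_b // math.gcd(rate_a, rate_b)   # least common multiple
    samples_per_null = oversampled // min(rate_a, rate_b)
    return oversampled, samples_per_null, nulls * samples_per_null + 1

print(impulse_response_size(44100, 48000))   # (7056000, 160, 80001)
```

Trying the same function on two rates with no common factors shows how quickly the numbers get out of hand.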

make-tables.scm has code to detect such abnormally long impulse responses and to try to break the conversion up into stages. This is why a two-stage or three-stage conversion is used for some pairs of sampling rates. These conversions require intermediate sampling rates which are also supported. If you find yourself encountering abnormally long impulse responses, then try to factor both of the offending sampling rates and add an “intermediate” rate or two. Then let make-tables.scm construct a two-stage or three-stage converter.

For example, if you are converting from 65536 to 59049 in a single stage, the impulse response has to be sampled at over 3.8 gigahertz. However, you can take advantage of the fact that 65536 = pow(2,16) and 59049 = pow(3,10), and create intermediate sampling rates whose only prime factors are 2 and 3, such as 62208, which is pow(2,8)*pow(3,5). If you have to convert from 99929 to 123419, though, you are out of luck, because both numbers are prime.
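
To see how much an intermediate rate helps in this example, compare the least common multiples stage by stage. The Python sketch below is only an illustration (stage_rates is an invented name) and does not reproduce the search that make-tables.scm performs.

```python
import math

def lcm(a, b):
    return a * b // math.gcd(a, b)

def stage_rates(chain):
    """Oversampled rate needed by each stage of a chain of sampling rates."""
    return [lcm(a, b) for a, b in zip(chain, chain[1:])]

print(stage_rates([65536, 59049]))          # [3869835264]
print(stage_rates([65536, 62208, 59049]))   # [15925248, 15116544]
```

With 62208 in the middle, each stage works at roughly 15 to 16 MHz instead of nearly 3.9 GHz.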

When modifying make-tables.scm, bear in mind that it has a strict indentation style of two spaces for every open parenthesis (and no tabs). This style is easily checked by automated tools and helps debug missing or extra parentheses, which is a common problem in Scheme programs.

Building the Program

(Note added Jun 13 2003: sr-convert comes with a binary. If that binary works, you don’t need to build it at all, unless you want to change things. Also, the output of the Scheme program is included, so if you don’t modify the Scheme part of the source code, you can build without a Scheme interpreter.)

In order to build sr-convert from source, you need the MinGW compiler for Win32 and a Scheme interpreter such as Gambit Scheme. You’re going to have to modify the Makefile to point to the correct location of the Scheme interpreter. (If it’s on your PATH, as mine should have been, then you can specify just gsi without a full path.) The Scheme program uses only R4RS Scheme, so it should be easy to find Scheme implementations capable of running it. If you need to use another Scheme implementation besides Gambit, the Scheme code should run with little or no modification, but it has not yet been tested.

The Scheme interpreter is used to construct a skips.out file from the make-tables.scm file. If you don’t have a Scheme interpreter, you can’t build a new skips.out file, but you can build the program using the included skips.out file. If make-tables.scm is never modified, then the Makefile will never try to build skips.out and the Scheme interpreter specified in the Makefile will never be used.

Building under Linux. Building under Linux is substantially the same as building under MinGW. There is a version of Gambit Scheme for Linux which works the same way as the Windows version, and g++ under Linux works the same way as it does in MinGW. The only difference is that in Linux, if you don’t modify the Makefile, you will have to type the .exe extension in order to run the produced binary. You can either create a soft link, or rename the program, or modify the Makefile.

Building on CPUs without SSE. If your processor does not have SSE instructions, but is otherwise x86 compatible, then the question comes down to whether your assembler (as, from binutils) supports SSE instructions. If it does, then the program will require no modification. Even though the assembler will produce SSE instructions that your particular CPU doesn’t have, the program, when it runs, will detect that SSE is not available, and will use the floating-point unit.

Building on Other Configurations. If you have a non-x86 processor, then you can try removing the SSE code entirely, but this requires a knowledge of C++. C++ versions of the SSE functions already exist, and the program uses them on x86 platforms where SSE is not available. Deleting the SSE routines and all references to them will produce an ordinary C++ program which can be compiled with your local compiler.

Future Plans

This program accomplishes the purpose that was originally intended for it. However, it can be extended in a number of ways.

More types of WAV files can be supported. There is a new specification for WAV files that support multichannel audio and higher resolution samples; implementing this would be good. (Note written Jun 13 2003: the link has been updated.) (September 25 2005: link updated again.)
There should be an option to get rid of the “moment of silence” added to the ends of the file by the impulse response. This “moment of silence” is mathematically necessary, and it is conceivable that for a given file it would not be entirely silent; it prevents any “pop” from being introduced in the output file, and if there was one in the input file, the added samples allow the “pop” to be filtered along with the rest of the file. However, for some files it is necessary that the audio be synchronized with video or other audio, and the introduction of leading samples throws everything off.
Better command-line argument parsing will be helpful. (Note written Jun 13 2003: I have written better command-line argument parsing in another program, and need to port it over to this program.)
An overview of the existing source code should be written, which will help new programmers find their way around.
More platforms should be supported, and in such a way that the program doesn’t have to be modified. It is already a nuisance to modify the Makefile so that the .exe extension isn’t put on the executable in Linux.
The program currently prints a lot of “noise” as it runs, things such as the values of pointers, and lots of dots. It might be better if it simply gave a percent-done indication, since the size of the output can be computed exactly. Code to print debugging information can be retained, but it should be printed only in a debugging mode.
The existing C++ code could stand to be refactored for clarity in a few places, particularly in the main function itself.
It is theoretically possible to generalize the handling of multi-stage conversions to support any number of stages. This might make the code cleaner even if the actual number of stages never increases.
There needs to be a way for the Scheme program to create a table entry that indicates that a certain pair of sampling rates is not possible to convert. Right now, if make-tables.scm finds that conversion is possible with an absurdly large impulse response, it goes ahead and puts that in the table, which causes the C++ program to attempt to allocate huge amounts of memory.

Currently, this version of the program does produce audible aliasing when it is used with very low sampling rates. The cause of this is the nature of the digital filter. An ideal filter, which passes all frequencies from 0 to X Hz and blocks frequencies above X Hz, is mathematically possible, but requires an infinitely long impulse response. This program takes a mathematically perfect but infinite function and multiplies it by a window that makes it finite in time and tapers the ends; this, however, slightly weakens the filter, so that it takes one or two Hz to cut off as the frequency increases toward X and past it, instead of cutting off instantly when the frequency reaches X. When X is half the sampling rate and a frequency above X leaks through, this leads to aliasing. But when the sampling rate is very high, there is very little energy in that frequency, and the frequency is very high anyway, so no aliasing is heard. With lower sampling rates, though, this is not true.

The solution to this is to lower the cutoff frequency slightly, say by about 0.5 percent, for low sampling rates. If you pass a very low frequency sweep through that filter, then the frequency will get gradually cut off before aliasing occurs. Implementation of this will have to wait for a future version, though.

Thank You

The author of this program would like to thank you for testing it. If you would like to suggest any features or report any bugs, please send e-mail to edkiser@users.sourceforge.net. The author would also like to know which of the above enhancements and bugfixes you consider most important.