sr-convert
sample rate conversion utility
Version 0.8, February 21, 2003
What It Is. sr-convert is a command-line utility which converts .WAV files from one sampling rate to another. It is designed to have high fidelity while also offering reasonable performance.
sr-convert supports both upsampling (converting to a higher sampling rate) and downsampling (converting to a lower sampling rate). When upsampling, sr-convert carefully preserves all the frequencies present in the original file while eliminating noise in higher frequencies. Low-frequency files upsampled with sr-convert usually sound better than the originals, because sr-convert eliminates all high-frequency noise, and most PC hardware is not designed to do that during playback. (In effect, the file sounds better because you’re using sr-convert’s digital filter instead of using the filters built into your sound card and/or device drivers.)
When downsampling, sr-convert preserves all the frequencies that can be present in the output file
while eliminating frequencies that cannot be represented correctly. Therefore, the kind of noise known as
aliasing (produced when a high-frequency sound is sampled at a rate too low to represent it) is virtually
eliminated.
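To make the aliasing described above concrete, here is a small sketch of the standard frequency-folding rule. It is not taken from the sr-convert source; the function name alias_frequency is made up for illustration, and it only models what happens when a tone is stored at a given rate with no filtering at all.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency that is heard when tone_hz is sampled at sample_rate_hz
    without any filtering beforehand (standard frequency folding)."""
    # Sampling cannot distinguish a tone from the same tone shifted by a
    # whole multiple of the sampling rate.
    folded = tone_hz % sample_rate_hz
    # Anything above half the rate is reflected back into 0 .. rate/2.
    return min(folded, sample_rate_hz - folded)


# A 15 kHz tone stored at 22050 Hz lands on a different, lower frequency.
print(alias_frequency(15000, 22050))
# A 5 kHz tone is below half the rate and comes back unchanged.
print(alias_frequency(5000, 22050))
```

Removing everything above half the output rate before downsampling, as the paragraph above describes, is what keeps such folded tones out of the output.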
Upsampling and downsampling are both done using linear convolution and a low-pass filter. This method is vastly superior to duplicating and dropping samples, and is also superior to linear or polynomial interpolation. One side effect of this approach is that a fraction of a second of silence is added to each end of the file.
Licensing. This program is licensed under the terms of the GNU General Public License.
System Requirements. The precompiled MinGW binary should run on any 32-bit Windows machine. This includes Windows 95, Windows 98, Windows ME, Windows NT 3.51, Windows NT 4, Windows 2000, and Windows XP. As of this writing, it has been tested only under Windows 2000. However, the programmer notes that sr-convert does not make any use of the Windows API, beyond what use the C++ standard library makes of it, and therefore it is reasonable to assume that sr-convert will work on other Windows platforms without difficulty.
sr-convert will take advantage of the Streaming SIMD Extensions if your processor has them, but they are not required.
sr-convert uses the CPU the whole time it is running. If this adversely affects system performance, then you might wish to use “start /low” to run it in its own window at a low priority.
Memory requirements vary according to the sampling rates being converted, but shouldn’t ever exceed 32 megabytes. This program does not load entire WAV files into memory.
Installation. The sr-convert.exe executable stands alone; to install it, simply move it to somewhere on your PATH, or else alter your PATH to include the directory where it is. You can also execute it from the command line if it is in the current directory or if you specify the full path to the .EXE’s location.
sr-convert does not maintain a configuration file and does not use the Windows registry. Deleting it is as simple as deleting the .EXE.
Use. To use the program, write a command line like this:
sr-convert infile inrate outfile outrate
infile should be the name of the input file, including the extension, such as file1.wav. In this version, the input file must be PCM with 8-bit samples or 16-bit samples. These are the most commonly occurring sample formats for WAV files. The WAV file format supports a wide variety of additional sample formats which this program does not yet support (such as 12-bit samples, 24-bit samples, floating-point samples, mu-law, A-law, etc.) and some which it never will support (for example, it is possible to put a WAV file header on an MP3, but this program would have to include an MP3 decoder in order to convert such an input file).
inrate should be the sampling rate of the input file, or a single hyphen (“-”) if you want sr-convert to get the rate from the file itself. The single hyphen is most often used. Note that regardless of whether the rate comes from the file or from the command line, it must be one of the supported sampling rates. Any rate you specify will override the rate specified in the file.
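If you are unsure what a file declares before handing it to sr-convert, the header can be inspected separately. The following is a minimal sketch using Python's standard wave module; it is not part of sr-convert, and the helper names describe_wav and is_supported_input are hypothetical. It checks only the sample size mentioned above, not whether the rate is one of the supported rates.

```python
import wave


def describe_wav(path):
    """Report the WAV header fields that matter before a conversion."""
    with wave.open(path, "rb") as wav:
        return {
            "rate": wav.getframerate(),       # the rate used when inrate is "-"
            "bits": wav.getsampwidth() * 8,   # sample size in bits
            "channels": wav.getnchannels(),
        }


def is_supported_input(info):
    """True when the sample size is 8 or 16 bits, as this document requires."""
    return info["bits"] in (8, 16)
```

For example, describe_wav("file1.wav") returns a dictionary whose "rate" entry is the value sr-convert would pick up when a hyphen is given for inrate.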
outfile should be the name of the output file, including the extension if you want one. If the output file already exists, it will be overwritten. The output file should not be the same as the input file, or the program will overwrite its own input and crash. For maximum quality, sr-convert always produces a 16-bit PCM output file.
outrate is the sampling rate you would like the output file to have. It can be any of the supported sampling rates.
sr-convert prints a lot of “debugging information” as it prepares to run; this is normal and can be ignored. Then it prints dots and occasionally exclamation marks as it runs. An exclamation mark indicates that clipping occurred when encoding the output into PCM; this is most common when clipping occurs in the input file. If the output sampling rate is very low, then sr-convert will print dots very slowly.
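The clipping that an exclamation mark reports can be pictured with a short sketch. This is an illustration only, not the program's actual encoder: the function name to_pcm16 and the scaling of floating-point samples by 32767 are assumptions made for the example.

```python
def to_pcm16(samples):
    """Convert floats in the nominal range -1.0 .. 1.0 to 16-bit integers.

    Returns the integer samples and the number of samples that were clipped.
    """
    out = []
    clipped = 0
    for x in samples:
        value = round(x * 32767)          # assumed scaling, for illustration
        if value > 32767 or value < -32768:
            clipped += 1                  # this is the event a "!" would report
            value = max(-32768, min(32767, value))
        out.append(value)
    return out, clipped


pcm, clipped = to_pcm16([0.0, 1.0, 1.2, -1.5])
print(pcm, clipped)
```

A filter can overshoot slightly near samples that were already at full scale, which fits the observation above that clipping in the output is most common when the input was clipped.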
Every three million bytes, sr-convert updates the .WAV file header. Some programs such as Sound Recorder can load the partial .WAV file based on this header, even as the sample-rate converter continues to run. Other programs such as WinAMP won’t play the file until it is finished.
Trading Quality for Space and Time. sr-convert uses a digital filter in order to convert from one sampling rate to another. The impulse response of this filter is the way it responds over time to a single 1-value sample surrounded by 0-value samples (an impulse). As the impulse response gets longer, the quality improves, but sr-convert has to do more arithmetic and the impulse response takes up more space in memory.
The impulse response is a mathematical function, and it doesn’t have a sampling rate until it is sampled
prior to the conversion. Its size is more easily measured by counting the points where it crosses zero; these points
are called nulls. The length of the impulse response defaults to 500 nulls, and this produces exceptional
quality. A different value, which must be a positive integer, can be given to sr-convert as follows:
sr-convert infile inrate outfile outrate nulls
Smaller values of nulls produce more speed at the expense of quality, and larger values produce more quality at the expense of speed. However, there is a limit to how fast sr-convert can go, because it doesn’t spend all its time traversing the impulse response. On the other hand, very long impulse responses might stop offering additional quality after a point, instead leading only to an increased accumulation of floating-point round-off error. (This shouldn’t happen for a while, though.)
The number of nulls also determines the amount of silence added to the ends of the output file.
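As a rough picture of an impulse response measured in nulls, the sketch below samples a sinc function (the ideal low-pass response) and tapers it so that it ends after a chosen number of zero crossings. It is a sketch under stated assumptions, not the program's code: the Hann window, the function name windowed_sinc, and the samples_per_null parameter are all choices made for this example, and the window sr-convert really uses is not stated in this document.

```python
import math


def windowed_sinc(nulls, samples_per_null):
    """Sample a sinc whose zero crossings fall every samples_per_null samples,
    cut off after `nulls` zero crossings and tapered by a Hann window."""
    if nulls < 2 or nulls % 2:
        raise ValueError("this sketch expects an even number of nulls >= 2")
    half = (nulls // 2) * samples_per_null    # samples on each side of the peak
    taps = []
    for n in range(-half, half + 1):
        t = n / samples_per_null              # time measured in null spacings
        sinc = 1.0 if n == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 * (1.0 + math.cos(math.pi * n / half))  # 1 at centre, 0 at ends
        taps.append(sinc * window)
    return taps


taps = windowed_sinc(nulls=10, samples_per_null=4)
print(len(taps))   # nulls * samples_per_null + 1 samples in total
```

Doubling nulls doubles the number of samples each output value has to be computed from, which is the speed cost described above.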
sr-convert supports a wide variety of sampling rates (68 at last counting) right out of the box, and more can be added if you have a Scheme interpreter and you’re willing to recompile the source code. You can convert from any sampling rate directly to any other. (sr-convert will automatically create multi-stage conversions if necessary. It is better to let sr-convert do it itself, since it uses floating-point samples between stages.)
You can get a list of all supported sampling rates by typing sr-convert with no arguments.
Sampling rates are divided into several groups.
Common Rates. These are rates that appear in typical applications.
44100. This is the rate of Compact Disc digital audio.
22050 and 11025. These rates, which are one-half and one-quarter of the CD rate, were introduced by Microsoft as part of the Multimedia PC requirements. The sound effects in Windows 3.1 were at 11025, as were those in Doom.
48000. This rate is used by professional recording studio equipment, on DVDs, and in some audio codecs.
24000 and 12000. These correspond to 48000 in the same way that 22050 and 11025 correspond to 44100.
32000. This rate is the lowest rate supported by MP3 high-quality encoding. Way back when the author had a 60 MHz Pentium and a motherboard which supported only one of the two DMA channels the sound card wanted, he noted that a 32 kHz MP3 would play more smoothly and consume less CPU than a 44.1kHz MP3.
16000. This rate, along with 22050 and 24000, is supported by lower-quality MP2 encoding.
8000 and 7350. 8000 is the sampling rate of telephone signals, and 7350 corresponds to 44100 the same way that 8000 corresponds to 48000.
6000 and 5512+1/2. These are half of the 12000 and 11025 rates; they occur rarely, because of their poor sound quality. sr-convert uses the integer 5512 to represent 5512+1/2.
64000, 128000, 88200, 176400, 96000, 192000. These are two and four times the regular sampling rates of 32000, 44100, and 48000, and are supported by some high-quality codecs and by DVD audio.
Bandwidth-sharing Rates. Sometimes it might be useful to have two signals occupy the same space as a single larger signal. For example, if you have bandwidth enough for a 48000 Hz signal, then you might wish to split it into an 8000 Hz signal and a 40000 Hz signal. In order to do this, 40000 has to be supported as a rate.
So far there are few applications of bandwidth-sharing, but these rates were added because they were easy to support. The mathematics behind sample-rate conversion favors ratios, and the ratios between these rates are simple.
These rates can also be used to reduce file sizes without reducing quality very much. For example, using 36750 instead of 44100 is unlikely to create an audibly noticeable deterioration in quality, but it can save several megabytes of disk space.
14700, 29400, 33075, and 36750. These come up along with 7350, 11025, and 22050 when splitting the bandwidth of a 44100 signal. 14700 is a third, 29400 is two-thirds, 33075 is three quarters, and 36750 is five-sixths.
51450, 55125, 58800, 66150, 73500, 77175. These come up when splitting an 88200 signal. They correspond to 1+1/6, 1+1/4, 1+1/3, 1+1/2, 1+2/3, 1+3/4 of 44100.
102900, 110250, 117600, 132300, 147000, 154350, 161700. These may come up when splitting a 176400 signal. They correspond to 2+1/3, 2+1/2, 2+2/3, 3, 3+1/3, 3+1/2, and 3+2/3 of 44100.
36000, 40000. These come up along with 8000, 12000, 16000, 24000, and 32000 when splitting 48000.
56000, 60000, 72000, 80000, 84000, 88000. These come up along with 64000 when splitting 96000.
112000, 120000, 144000, 160000, 168000, 176000. These come up along with 128000 when splitting 192000.
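The fractions quoted in the list above can be checked with exact arithmetic. This is only a verification sketch using Python's standard fractions module; the helper name share_of is made up for the example.

```python
from fractions import Fraction


def share_of(rate, full_rate):
    """Exact fraction of full_rate that rate occupies."""
    return Fraction(rate, full_rate)


# Shares of a 44100 signal, as listed above.
for rate in (14700, 29400, 33075, 36750):
    print(rate, share_of(rate, 44100))

# The two parts of a split add up to the whole, as in the 8000 + 40000 example.
print(share_of(8000, 48000) + share_of(40000, 48000))
```

The small denominators are what the text means by simple ratios between these rates.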
Television Production Rates. It is often convenient to pretend that NTSC television (used in the U.S.) has a 60 Hz vertical refresh rate and a 30 Hz frame rate, but FCC regulations specify that the actual rates are 59.94 and 29.97 Hz (for color television). These are 999/1000 of the more convenient rates.
As a result, sometimes video has to be speeded up or slowed down slightly to convert between 30 Hz and 29.97 Hz, and this also affects the audio soundtrack. A 44100 Hz soundtrack on a 29.97 Hz movie might be increased to 44144+16/111 to go with 30 Hz. A 44100 soundtrack on a 30 Hz movie might be decreased to 44055+9/10 to go with 29.97 Hz.
The difference is so slight that the soundtrack doesn’t sound like it has been speeded up or slowed down, but synchronization can be totally destroyed by this effect; by the end of a two-hour movie the difference works out to 7.2 seconds.
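The figures in the last two paragraphs follow from the 999/1000 ratio and can be reproduced exactly. The sketch below is only a check of that arithmetic; the names SLOWDOWN and as_mixed are made up for the example.

```python
from fractions import Fraction

SLOWDOWN = Fraction(999, 1000)   # 29.97 Hz relative to 30 Hz, as stated above


def as_mixed(rate):
    """Format an exact rate the way this document writes it, e.g. 44055+9/10."""
    whole, rest = divmod(rate, 1)
    return f"{whole}+{rest}" if rest else str(whole)


print(as_mixed(44100 / SLOWDOWN))   # rate after speeding up to match 30 Hz
print(as_mixed(44100 * SLOWDOWN))   # rate after slowing down to match 29.97 Hz

# Drift accumulated over a two-hour movie, in seconds.
drift = 2 * 60 * 60 * (1 - SLOWDOWN)
print(float(drift))
```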
Sometimes it becomes necessary to get rid of the oddball sampling rate of a speeded-up or slowed-down file while preserving the synchronization effects that were introduced by the speedup or slowdown. Sometimes, also, it is necessary to convert a file to the oddball rate without introducing synchronization artifacts. That is what sr-convert can do. (By specifying a different sampling rate for the input file, you can also get it to introduce these synchronization effects.)
(You can also squeeze an additional 4.8 seconds of audio onto an 80-minute CD this way, but it doesn’t really seem worth an hour of processing to get those 4.8 seconds. On the other hand, if your source material needs to be converted anyway, and if you need the 4.8 seconds, it might be worth it.)
(This does nothing about the amount of blank space added to the beginning and end of the output file by the convolution process, either. However, it is possible to edit that out, and a future version may be able to prevent it from being generated.)
It’s important to note that WAV files can represent only integer sampling rates. Therefore, sr-convert uses the nearest integer (preferring the even integer in the event of a tie) to represent a rate, and when reading these rates, it interprets them as the fractional rate. That’s why 5512+1/2 is written as 5512 on the command line and in the WAV file, and that’s also why 44055+9/10 is written as 44056. In the reverse direction, sr-convert always interprets 44056 as 44055+9/10.
31968, 63936, 127872, 44055+9/10, 88111+4/5, 176223+3/5, 47952, 95904, 191808. These are 999/1000 of the common sampling rates 32000, 64000, 128000, 44100, 88200, 176400, 48000, 96000, and 192000.
32032+32/999, 64064+64/999, 128128+128/999, 44144+16/111, 88288+32/111, 176576+64/111, 48048+16/333, 96096+32/333, 192192+64/333. These are 1000/999 of the same common sampling rates.
Other Rates. In order to make sr-convert support other rates, you must alter make-tables.scm (particularly near the line that begins “(define rates”) and rebuild the program. However, only rational sampling rates can be supported, and the ratios between sampling rates must be simple enough that excessive memory is not consumed by the impulse response.
sr-convert samples the impulse response at the least common multiple of the two sampling rates being used. Thus, if the sampling rates are 44100 and 48000, then the impulse response is sampled at 7.056 MHz. The nulls are spaced depending on the lower of the two sampling rates, so in this case they are 160 samples apart, and an impulse response spanning 500 nulls is 80,001 samples long. A similar method is used for any pair of input sampling rates. Thus, if you pick two large primes for sampling rates, the impulse response is likely to be so horrendously long that it can’t be addressed by a 32-bit machine, much less stored in memory.
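The numbers in the paragraph above can be reproduced with a few lines of integer arithmetic. This is a sketch of the calculation as the text describes it, for integer rates only; the function name impulse_response_size is made up, and the real tables are built by make-tables.scm.

```python
import math


def impulse_response_size(rate_a, rate_b, nulls=500):
    """Return (common rate, samples between nulls, total samples) for a
    single-stage conversion between two integer sampling rates."""
    common = rate_a * rate_b // math.gcd(rate_a, rate_b)   # least common multiple
    spacing = common // min(rate_a, rate_b)                # set by the lower rate
    length = nulls * spacing + 1                           # nulls plus the centre sample
    return common, spacing, length


print(impulse_response_size(44100, 48000))   # (7056000, 160, 80001)
```

Trying two rates with no common factors shows how quickly the length grows, which is the problem the following paragraphs address.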
make-tables.scm has code to detect such abnormally long impulse responses and to try to break the conversion up into stages. This is why for some pairs of sampling rates, a two-stage or three-stage conversion is used. These conversions require intermediate sampling rates which are also supported. If you find yourself encountering abnormally long impulse responses, then try to factor both of the offending sampling rates and produce an “intermediate” rate or two. Then let make-tables.scm construct a two-stage or three-stage converter.
For example, if you are converting from 65536 to 59049, the intermediate sampling rate is over 3.8 gigahertz. However, you can take advantage of the fact that 65536 = pow(2,16) and 59049 = pow(3,10), and you can create intermediate sampling rates that are products of powers of 2 and 3, such as 62208, which is pow(2,8)*pow(3,5). However, if you have to convert from 99929 to 123419, then you are out of luck, because both numbers are prime.
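To see why the intermediate rate helps, compare the rate at which the impulse response must be sampled for the direct conversion with the rates needed for each stage of the two-stage route. This is only an illustration of the arithmetic; the helper names lcm and stage_rates are made up and do not come from make-tables.scm.

```python
import math


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // math.gcd(a, b)


def stage_rates(chain):
    """Rate at which each stage's impulse response is sampled, for a chain
    of sampling rates such as [input, intermediate, output]."""
    return [lcm(a, b) for a, b in zip(chain, chain[1:])]


print(stage_rates([65536, 59049]))          # [3869835264]
print(stage_rates([65536, 62208, 59049]))   # [15925248, 15116544]
```

Each stage of the two-stage route works at roughly 15 to 16 MHz instead of nearly 3.9 GHz.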
When modifying make-tables.scm, bear in mind that it has a strict indentation style of two spaces for every open parenthesis (and no tabs). This style is easily checked by automated tools and helps debug missing or extra parentheses, which is a common problem in Scheme programs.
(Note added Jun 13 2003: sr-convert comes with a binary. If that binary works, you don’t need to build it at all, unless you want to change things. Also, the output of the Scheme program is included, so if you don’t modify the Scheme part of the source code, you can build without a Scheme interpreter.)
In order to build sr-convert from source, you need the MinGW compiler for Win32 and a Scheme interpreter such as Gambit Scheme. You’re going to have to modify the Makefile to point to the correct location of the Scheme interpreter. (If it’s on your PATH, as mine should have been, then you can specify just gsi without a full path.) The Scheme program uses only R4RS Scheme, so it should be easy to find Scheme implementations capable of running it. If you need to use another Scheme implementation besides Gambit, the Scheme code should run with little or no modification, but it has not yet been tested.
The Scheme interpreter is used to construct a skips.out file from the make-tables.scm file. If you don’t have a Scheme interpreter, you can’t build a new skips.out file, but you can build the program using the included skips.out file. If make-tables.scm is never modified, then the Makefile will never try to build skips.out and the Scheme interpreter specified in the Makefile will never be used.
Building under Linux. Building under Linux is substantially the same as building under MinGW. There is a version of Gambit Scheme for Linux which works the same way as the Windows version, and g++ under Linux works the same way as it does in MinGW. The only difference is that in Linux, if you don’t modify the Makefile, you will have to type the .exe extension in order to run the produced binary. You can either create a soft link, or rename the program, or modify the Makefile.
Building on CPUs without SSE. If your processor does not have SSE instructions, but is otherwise x86 compatible, then the question comes down to whether your binutils (specifically the assembler, as) support SSE instructions. If they do, then the program will require no modification. Even though the assembler will produce SSE instructions that your particular CPU doesn’t have, the program, when it runs, will detect that SSE is not available, and will use the floating-point unit.
Building on Other Configurations. If you have a non-x86 processor, then you can try removing the SSE code entirely, but this requires a knowledge of C++. C++ versions of the SSE functions already exist, and the program uses them on x86 platforms where SSE is not available. Deleting the SSE routines and all references to them will produce an ordinary C++ program which can be compiled with your local compiler.
This program accomplishes the purpose that was originally intended for it; however, it can be extended in a number of ways.
.exe extension isn’t put on the executable in Linux.
main function itself.
make-tables.scm finds that conversion is possible with an absurdly large impulse response, it goes ahead and puts that in the table, which causes the C++ program to attempt to allocate huge amounts of memory.
Currently, this version of the program does produce audible aliasing when it is used with very low sampling rates. The cause of this is the nature of the digital filter. An ideal filter, which passes all frequencies from 0 to X Hz and blocks frequencies above X Hz, is mathematically possible, but requires an infinitely long impulse response. This program takes a mathematically perfect but infinite function and multiplies it by a window that makes it finite in time and tapers the ends; this, however, slightly weakens the filter, so that it takes one or two Hz to cut off as the frequency increases toward X and past it, instead of cutting off instantly when the frequency reaches X. When X is half the sampling rate and a frequency above X leaks through, this leads to aliasing. But when the sampling rate is very high, there is very little energy in that frequency, and the frequency is very high anyway, so no aliasing is heard. With lower sampling rates, though, this is not true.
The solution to this is to lower the cutoff frequency slightly, say by about 0.5 percent, for low sampling rates. If you pass a very low frequency sweep through that filter, then the frequency will get gradually cut off before aliasing occurs. Implementation of this will have to wait for a future version, though.
The author of this program would like to thank you for testing it. If you would like to suggest any features or report any bugs, please send e-mail to edkiser@users.sourceforge.net. The author would also like to know which of the above enhancements and bugfixes you consider most important.