NAME
soxexam - SoX Examples (CHEAT SHEET)
CONVERSIONS
Introduction

In general, SoX will attempt to take an input sound file format and
convert it into a new file format using a similar data type and
sample rate. For instance, "sox monkey.au monkey.wav" would try to
convert the mono 8000 Hz u-law sample .au file that comes with SoX to
an 8000 Hz u-law .wav file.

If an output format doesn't support the same data type as the input
file, SoX will generally select a default data type to save it in.
You can override the default data type selection by using command
line options. This is also useful for producing an output file with
higher or lower precision data and/or sample rate.

Most file formats that contain headers can be read in automatically.
When working with header-less file formats, the user must manually
tell SoX the data type and sample rate using command line options.

When working with header-less files (raw files), you may take
advantage of the pseudo-file types .ub, .uw, .sb, .sw, .ul, and .sl.
By using these extensions on your filenames you will not have to
specify the corresponding options on the command line.

Precision

The following data types and formats can be represented by their
total uncompressed bit precision. When converting from one data type
to another, care must be taken to ensure that the new type has an
equal or greater precision. If not, the audio quality will be
degraded. This is not always a bad thing when you're working with
things such as voice audio and are concerned about disk space or
bandwidth of the audio data.
Data Format      Precision
___________      _________
unsigned byte    8-bit
signed byte      8-bit
u-law            14-bit
A-law            13-bit
unsigned word    16-bit
signed word      16-bit
ADPCM            16-bit
GSM              16-bit
unsigned long    32-bit
signed long      32-bit
___________      _________
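The precision rule above can be checked mechanically. The following is
only a sketch: the bit counts are copied from the table, while the
names PRECISION_BITS and loses_precision are made up for this
illustration and are not part of SoX.

```python
# Bit precision per data format, copied from the table above.
PRECISION_BITS = {
    "unsigned byte": 8,
    "signed byte": 8,
    "u-law": 14,
    "A-law": 13,
    "unsigned word": 16,
    "signed word": 16,
    "ADPCM": 16,
    "GSM": 16,
    "unsigned long": 32,
    "signed long": 32,
}


def loses_precision(source: str, target: str) -> bool:
    """Return True if the target format has fewer bits than the source."""
    return PRECISION_BITS[target] < PRECISION_BITS[source]


if __name__ == "__main__":
    # Signed words saved as u-law drop from 16 to 14 bits of precision.
    print(loses_precision("signed word", "u-law"))
    # Going the other way keeps every bit of the source.
    print(loses_precision("u-law", "signed word"))
```

A conversion flagged by this check is not necessarily wrong; as noted
above, it may be an acceptable trade for disk space or bandwidth.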
Examples

Use the '-V' option on all your command lines. It makes SoX print out
its idea of what is going on. '-V' is your friend.

To convert from unsigned bytes at 8000 Hz to signed words at 8000 Hz:
    sox -r 8000 -c 1 filename.ub newfile.sw

To convert from Apple's AIFF format to Microsoft's WAV format:
    sox filename.aiff filename.wav

To convert from mono raw 8000 Hz 8-bit unsigned PCM data to a WAV
file:
    sox -r 8000 -u -b -c 1 filename.raw filename.wav

SoX may even be used to convert sample rates. Downconverting will
reduce the bandwidth of a sample, but will also reduce the storage
space it needs on your disk. All such conversions are lossy and will
introduce some noise. You should really pass your sample through a
low-pass filter prior to downconverting, as this will prevent alias
signals (which would sound like additional noise). For example, to
convert from a sample recorded at 11025 Hz to a u-law file at 8000 Hz
sample rate:
    sox infile.wav -t au -r 8000 -U -b -c 1 outputfile.au

To add a low-pass filter (note use of stdout for output of the first
stage and stdin for input on the second stage):
    sox infile.wav -t raw -s -w -c 1 - lowpass 3700 |
    sox -t raw -r 11025 -s -w -c 1 - -t au -r 8000 -U -b -c 1 ofile.au
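The 3700 Hz cutoff in the pipeline above sits below half of the 8000 Hz
target rate, which is the highest frequency that rate can represent
(the Nyquist limit). A minimal sketch of that arithmetic follows; the
function names are hypothetical and nothing here calls SoX.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2.0


def cutoff_is_safe(cutoff_hz: float, target_rate_hz: float) -> bool:
    """True if a low-pass cutoff lies below the target rate's Nyquist limit."""
    return cutoff_hz < nyquist_hz(target_rate_hz)


if __name__ == "__main__":
    # Values taken from the example pipeline: lowpass 3700, then -r 8000.
    print(nyquist_hz(8000))
    print(cutoff_is_safe(3700, 8000))
```

Anything in the 11025 Hz source above the limit would fold back as
alias noise, which is why the filter runs before the rate change.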
If you hear some clicks and pops when converting to u-law or A-law,
reduce the output level slightly. For example, this will decrease it
by 20%:
    sox infile.wav -t au -r 8000 -U -b -c 1 -v .8 outputfile.au
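It can help to see the '-v' factor in decibels. The sketch below
assumes, as the "20%" wording above implies, that '-v' is a linear
amplitude factor; the helper names are made up for this illustration.

```python
import math


def factor_to_db(factor: float) -> float:
    """Convert a linear amplitude factor to decibels."""
    return 20.0 * math.log10(factor)


def db_to_factor(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    # -v .8 from the example above is a cut of a little under 2 dB.
    print(round(factor_to_db(0.8), 2))
```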
SoX is great to use along with other command line programs by passing
data between the programs using pipelines. The most common example is
to use mpg123 to convert mp3 files into wav files. The following
command line will do this:
    mpg123 -b 10000 -s filename.mp3 | sox -t raw -r 44100 -s -w -c 2 - filename.wav

When working with totally unknown audio data, the "auto" file format
may be of use. It attempts to guess what the file type is, and then
you may save it into a known audio format:
    sox -V -t auto filename.snd filename.wav

It is important to understand how the internals of SoX work with
compressed audio including u-law, A-law, ADPCM, or GSM. SoX takes ALL
input data types and converts them to uncompressed 32-bit signed
data. It will then convert this internal version into the requested
output format. This means additional noise can be introduced from
decompressing data and then recompressing. If applying multiple
effects to audio data, it is best to save the intermediate data as
PCM data. After the final effect is performed, you can specify a
compressed output format. This will keep noise introduction to a
minimum. The following example applies various effects to an 8000 Hz
ADPCM input file and ends up with the final file as 44100 Hz ADPCM:
    sox firstfile.wav -r 44100 -s -w secondfile.wav
    sox secondfile.wav thirdfile.wav swap
    sox thirdfile.wav -a -b finalfile.wav mask
Under a DOS shell, you can convert several audio files to a new
output format using something similar to the following command line:
    FOR %X IN (*.RAW) DO sox -r 11025 -w -s -t raw %X %X.wav
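On systems without a DOS shell, the same batch job can be scripted.
The sketch below only builds and prints the command lines, reusing the
SoX options from the FOR loop above unchanged; the function name
batch_commands is hypothetical.

```python
import glob
import shlex
from typing import Iterable, List


def batch_commands(raw_files: Iterable[str]) -> List[List[str]]:
    """Build one sox argument list per raw file, mirroring the FOR loop."""
    return [
        ["sox", "-r", "11025", "-w", "-s", "-t", "raw", name, name + ".wav"]
        for name in raw_files
    ]


if __name__ == "__main__":
    # Dry run: show what would be executed for every *.RAW file here.
    for command in batch_commands(sorted(glob.glob("*.RAW"))):
        print(shlex.join(command))
```

Each argument list could be handed to subprocess.run(command,
check=True) once the printed commands look right.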
EFFECTS
Special thanks go to Juergen Mueller (jmueller@uia.ua.ac.be) for this
write up on effects.

Introduction:

The core problem is that you need some experience in using effects in
order to say "that any old sound file sounds with effects absolutely
hip". There isn't any rule-based system which tells you the correct
setting of all the parameters for every effect. But after some time
you will become an expert in using effects.

Here are some examples which can be used with any music sample. (For
a sample where only a single instrument is playing, extreme parameter
settings may make well-known "typical" or "classical" sounds.
Likewise for drums, vocals or guitars.)

Single effects will be explained, along with some parameter settings
that can be used to understand the theory by listening to the sound
file with the added effect.

Using multiple effects in parallel or in series can result either in
a very nice sound or (mostly) in a dramatic overloading in variations
of sounds such that your ear may follow the sound but you will feel
unsatisfied. Hence, when using effects for the first time, try to
compose them as minimally as possible. We don't regard the
composition of effects in the examples because too many combinations
are possible and you really need a very fast machine and a lot of
memory to play them in real-time.

However, real-time playing of sounds will greatly speed up learning
and/or tuning the parameter settings for your sounds in order to get
that "perfect" effect.

Basically, we will use the "play" front-end of SoX since it is easier
to listen to sounds coming out of the speaker or earphone instead of
looking at cryptic data in sound files.

For easy listening of file.xxx ("xxx" is any sound format):
    play file.xxx effect-name effect-parameters

Or more SoX-like (for "dsp" output on a UNIX/Linux computer):
    sox file.xxx -t ossdsp -w -s /dev/dsp effect-name effect-parameters

or (for "au" output):
    sox file.xxx -t sunau -w -s /dev/audio effect-name effect-parameters

And for date freaks:
    sox file.xxx file.yyy effect-name effect-parameters
Additional options can be used. However, in this case, for real-time
playing you'll need a very fast machine.

Notes:

I played all examples in real-time on a Pentium 100 with 32 MB and
Linux 2.0.30 using a self-recorded sample (3:15 min long in "wav"
format with 44.1 kHz sample rate and stereo 16 bit). The sample
should not contain any of the effects. However, if you take any
recording of a sound track from radio or tape or CD, and it sounds
like a live concert or ten people are playing the same rhythm with
their drums or funky-grooves, then take any other sample. (Typically,
fewer than four different instruments and no synthesizer in the
sample is suitable. Likewise, the combination vocal, drums, bass and
guitar.)

Effects:

Echo

An echo effect can be naturally found in the mountains: standing
somewhere on a mountain and shouting a single word will result in one
or more repetitions of the word (if not, turn a bit around and try
again, or climb to the next mountain). The time difference between
shouting and repeating is the delay (time), and the loudness of the
repetition is the decay. Multiple echoes can have different delays
and decays.

It is very popular to use echoes to play an instrument together with
itself, as some guitar players (Brian May from Queen) or vocalists
do. For music samples of more than one instrument, echo can be used
to add a second sample shortly after the original one.

This will sound as if you are doubling the number of instruments
playing in the same sample:
    play file.xxx echo 0.8 0.88 60.0 0.4
If the delay is very short, then it sounds like a (metallic) robot
playing music:
    play file.xxx echo 0.8 0.88 6.0 0.4

A longer delay will sound like an open air concert in the mountains:
    play file.xxx echo 0.8 0.9 1000.0 0.3

One mountain more, and:
    play file.xxx echo 0.8 0.9 1000.0 0.3 1800.0 0.25

Echos

Like the echo effect, echos stands for "ECHO in Sequel", that is the
first echos takes the input, the second the input and the first
echos, the third the input and the first and the second echos, ...
and so on. Care should be taken using many echos (see introduction);
a single echos has the same effect as a single echo.

The sample will be bounced twice in symmetric echos:
    play file.xxx echos 0.8 0.7 700.0 0.25 700.0 0.3

The sample will be bounced twice in asymmetric echos:
    play file.xxx echos 0.8 0.7 700.0 0.25 900.0 0.3

The sample will sound as if played in a garage:
    play file.xxx echos 0.8 0.7 40.0 0.25 63.0 0.3
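The difference between echo and echos is easier to see on a single
impulse. The following is a simplified model written for this
illustration, not SoX's implementation. It assumes the first two
numbers on the command line are an input and an output gain, followed
by delay (in milliseconds) and decay pairs; all function names are
hypothetical.

```python
from typing import List, Sequence, Tuple

Tap = Tuple[float, float]  # (delay in milliseconds, decay factor)


def delayed(signal: Sequence[float], n: int) -> List[float]:
    """Shift a signal right by n samples, padding the start with silence."""
    n = min(n, len(signal))
    return [0.0] * n + list(signal[: len(signal) - n])


def ms_to_samples(delay_ms: float, rate_hz: int) -> int:
    """Convert a delay in milliseconds to a whole number of samples."""
    return int(round(delay_ms * rate_hz / 1000.0))


def echo(signal: Sequence[float], rate_hz: int, gain_in: float,
         gain_out: float, taps: Sequence[Tap]) -> List[float]:
    """Every tap replays the dry input, delayed and scaled by its decay."""
    wet = [gain_in * x for x in signal]
    for delay_ms, decay in taps:
        copy = delayed(signal, ms_to_samples(delay_ms, rate_hz))
        wet = [w + decay * c for w, c in zip(wet, copy)]
    return [gain_out * w for w in wet]


def echos(signal: Sequence[float], rate_hz: int, gain_in: float,
          gain_out: float, taps: Sequence[Tap]) -> List[float]:
    """Each stage is fed the dry input plus the previous stage's echo."""
    wet = [gain_in * x for x in signal]
    feed = list(signal)
    for delay_ms, decay in taps:
        copy = delayed(feed, ms_to_samples(delay_ms, rate_hz))
        wet = [w + decay * c for w, c in zip(wet, copy)]
        feed = [x + c for x, c in zip(signal, copy)]
    return [gain_out * w for w in wet]


if __name__ == "__main__":
    # At 1000 Hz one sample lasts one millisecond, which keeps this readable.
    impulse = [1.0] + [0.0] * 9
    taps = [(2.0, 0.5), (5.0, 0.25)]
    print(echo(impulse, 1000, 1.0, 1.0, taps))
    print(echos(impulse, 1000, 1.0, 1.0, taps))
```

In this model echo produces one repetition per tap, while echos adds a
further repetition at the summed delay, because the second stage also
hears the first stage's echo. With a single tap the two functions give
the same result, matching the remark above.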
Chorus

The chorus effect has its name because it will often be used to make
a single vocal sound like a chorus. But it can be applied to other
instrument samples too.

It works like the echo effect with a short delay, but the delay isn't
constant. The delay is varied using a sinusoidal or triangular
modulation. The modulation depth defines the range the modulated
delay is played before or after the delay. Hence the delayed sound
will sound slower or faster, that is the delayed sound is tuned
around the original one, like in a chorus where some vocals are a bit
out of tune.

The typical delay is around 40 ms to 60 ms, the speed of the
modulation is best near 0.25 Hz and the modulation depth around 2 ms.

A single delay will make the sample more overloaded:
    play file.xxx chorus 0.7 0.9 55.0 0.4 0.25 2.0 -t

Two delays of the original samples sound like this:
    play file.xxx chorus 0.6 0.9 50.0 0.4 0.25 2.0 -t 60.0 0.32 0.4 1.3 -s

A big chorus of the sample is (three additional samples):
    play file.xxx chorus 0.5 0.9 50.0 0.4 0.25 2.0 -t 60.0 0.32 0.4 2.3 -t 40.0 0.3 0.3 -s
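The moving delay behind the chorus can be sketched as a function of
time. This is an illustration only: it assumes the depth is a
symmetric swing around the base delay, as described above, and the
function name and the "sine"/"triangle" labels are made up here (on
the command line the shape is chosen with -s or -t).

```python
import math


def modulated_delay_ms(t_s: float, delay_ms: float, speed_hz: float,
                       depth_ms: float, shape: str = "sine") -> float:
    """Instantaneous delay at time t_s for a sine or triangle modulation."""
    phase = (t_s * speed_hz) % 1.0  # position within one modulation cycle
    if shape == "sine":
        lfo = math.sin(2.0 * math.pi * phase)
    elif shape == "triangle":
        lfo = 1.0 - 4.0 * abs(phase - 0.5)  # ramps -1 -> +1 -> -1
    else:
        raise ValueError("shape must be 'sine' or 'triangle'")
    return delay_ms + depth_ms * lfo


if __name__ == "__main__":
    # Base delay 55 ms, speed 0.25 Hz, depth 2 ms, as in the first example.
    for second in range(5):
        print(second, modulated_delay_ms(second, 55.0, 0.25, 2.0, "triangle"))
```

With these values the delay drifts between 53 ms and 57 ms once every
four seconds, which is what detunes the delayed copy slightly against
the original.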
Flanger

The flanger effect is like the chorus effect, but the delay varies
between 0 ms and at most 5 ms. It sounds like wind blowing, sometimes
faster or slower including changes of the speed.

The flanger effect is widely used in funk and soul music, where the
guitar sound varies frequently slow or a bit faster.

The typical delay is around 3 ms to 5 ms, the speed of the modulation
is best near 0.5 Hz.

Now, let's groove the sample:
    play file.xxx flanger 0.6 0.87 3.0 0.9 0.5 -s

Listen carefully for the difference between sinusoidal and triangular
modulation:
    play file.xxx flanger 0.6 0.87 3.0 0.9 0.5 -t

If the decay is a bit lower, then the effect sounds more popular:
    play file.xxx flanger 0.8 0.88 3.0 0.4 0.5 -t

The drunken loudspeaker system:
    play file.xxx flanger 0.9 0.9 4.0 0.23 1.3 -s
Reverb

The reverb effect is often used in audience halls which are too small
or contain so many visitors that they disturb (dampen) the reflection
of sound at the walls. Reverb will make the sound be perceived as if
it were in a large hall. You can try the reverb effect in your
bathroom or garage or sport halls by shouting some words loudly.
You'll hear the words reflected from the walls.

The biggest problem in using the reverb effect is the correct setting
of the (wall) delays such that the sound is realistic and doesn't
sound like music playing in a tin can or has overloaded feedback
which destroys any illusion of playing in a big hall. To help you
obtain realistic reverb effects, you should decide first how long the
reverb should take place until it is not loud enough to be registered
by your ears. This is done by varying the reverb time "t". To
simulate small halls, use 200 ms. To simulate large halls, use
1000 ms. Clearly, the walls of such a hall aren't far away, so you
should define its setting by giving every wall its delay time.
However, if the wall is too far away for the reverb time, you won't
hear the reverb, so the nearest wall will be best at "t/4" delay and
the farthest at "t/2". You can try other distances as well, but it
won't sound very realistic. The walls shouldn't stand too close to
each other and not in a multiple integer distance to each other (so
avoid walls like 200.0 and 202.0, or something like 100.0 and 200.0).

Since audience halls do have a lot of walls, we will start designing
one beginning with one wall:
    play file.xxx reverb 1.0 600.0 180.0

One wall more:
    play file.xxx reverb 1.0 600.0 180.0 200.0
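The wall-delay guidelines given earlier in this section (nearest wall
near t/4, farthest near t/2, no walls too close together or at
integer multiples of each other) can be turned into a small checker.
This is a sketch only: the function name is hypothetical, and the
5 ms "too close" threshold is an assumption chosen so that 200.0 and
202.0 are rejected while 180.0 and 200.0 pass.

```python
from typing import List, Sequence


def check_walls(reverb_ms: float, wall_delays_ms: Sequence[float],
                min_gap_ms: float = 5.0) -> List[str]:
    """Return a list of guideline violations for a set of wall delays."""
    problems = []
    nearest, farthest = reverb_ms / 4.0, reverb_ms / 2.0
    walls = sorted(wall_delays_ms)
    for wall in walls:
        if not nearest <= wall <= farthest:
            problems.append(f"{wall} ms is outside {nearest}..{farthest} ms")
    for i, a in enumerate(walls):
        for b in walls[i + 1:]:
            if b - a < min_gap_ms:
                problems.append(f"{a} ms and {b} ms are too close")
            elif a > 0 and abs(b / a - round(b / a)) < 1e-9:
                problems.append(f"{b} ms is an integer multiple of {a} ms")
    return problems


if __name__ == "__main__":
    # The first two walls of the hall being designed in this section.
    print(check_walls(600.0, [180.0, 200.0]))
    # The two wall pairs the text warns against.
    print(check_walls(600.0, [200.0, 202.0]))
    print(check_walls(600.0, [100.0, 200.0]))
```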
Next two walls:
    play file.xxx reverb 1.0 600.0 180.0 200.0 220.0 240.0

Now, why not a futuristic hall with six walls:
    play file.xxx reverb 1.0 600.0 180.0 200.0 220.0 240.0 280.0 300.0

If you run out of machine power or memory, then stop as many
applications as possible (every interrupt will consume a lot of CPU
time which for bigger halls is absolutely necessary).

Phaser

The phaser effect is like the flanger effect, but it uses a reverb
instead of an echo and does phase shifting. You'll hear the
difference in the examples comparing both effects (simply change the
effect name). The delay modulation can be sinusoidal or triangular;
the latter is preferable for multiple instruments. For single
instrument sounds, the sinusoidal phaser effect will give a sharper
phasing effect. The decay shouldn't be too close to 1.0, which will
cause dramatic feedback. A good range is about 0.5 to 0.1 for the
decay.

We will take a parameter setting as for the flanger before (gain-out
is lower since feedback can raise the output dramatically):
    play file.xxx phaser 0.8 0.74 3.0 0.4 0.5 -t

The drunken loudspeaker system (now less alcohol):
    play file.xxx phaser 0.9 0.85 4.0 0.23 1.3 -s

A popular sound of the sample is as follows:
    play file.xxx phaser 0.89 0.85 1.0 0.24 2.0 -t

The sample sounds as if ten springs are in your ears:
    play file.xxx phaser 0.6 0.66 3.0 0.6 2.0 -t
Compander

The compander effect allows the dynamic range of a signal to be
compressed or expanded. For most situations, the attack time
(response to the music getting louder) should be shorter than the
decay time because our ears are more sensitive to suddenly loud music
than to suddenly soft music.

For example, suppose you are listening to Strauss' "Also Sprach
Zarathustra" in a noisy environment such as a car. If you turn up the
volume enough to hear the soft passages over the road noise, the loud
sections will be too loud. You could try this:
    play file.xxx compand 0.3,1 -90,-90,-70,-70,-60,-20,0,0 -5 0 0.2

The transfer function ("-90,...") says that very soft sounds between
-90 and -70 decibels (-90 is about the limit of 16-bit encoding) will
remain unchanged. That keeps the compander from boosting the volume
on "silent" passages such as between movements. However, sounds in
the range -60 decibels to 0 decibels (maximum volume) will be boosted
so that the 60-dB dynamic range of the original music will be
compressed 3-to-1 into a 20-dB range, which is wide enough to enjoy
the music but narrow enough to get around the road noise. The -5 dB
output gain is needed to avoid clipping (the number is inexact, and
was derived by experimentation). The 0 for the initial volume will
work fine for a clip that starts with a bit of silence, and the delay
of 0.2 has the effect of causing the compander to react a bit more
quickly to sudden volume changes.

Changing the Rate of Playback

You can use stretch to change the rate of playback of an audio sample
while preserving the pitch. For example, to play at 1/2 the speed:
    play file.wav stretch 2

To play a file at twice the speed:
    play file.wav stretch .5

Other related options are "speed", to change the speed of play (and
changing the pitch accordingly), and "pitch", to alter the pitch of a
sample. For example, to speed a sample so it plays in 1/2 the time
(for those Mickey Mouse voices):
    play file.wav speed 2

To raise the pitch of a sample by one semitone (100 cents):
    play file.wav pitch 100
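Cents are a logarithmic unit: 1200 cents make one octave, which is a
doubling of frequency. The sketch below shows the conversion in both
directions, so the "pitch 100" and "speed 2" examples above can be
compared; the function names are made up for this illustration.

```python
import math


def cents_to_ratio(cents: float) -> float:
    """Frequency ratio produced by a pitch shift given in cents."""
    return 2.0 ** (cents / 1200.0)


def ratio_to_cents(ratio: float) -> float:
    """Pitch shift in cents that corresponds to a frequency ratio."""
    return 1200.0 * math.log2(ratio)


if __name__ == "__main__":
    # "pitch 100" raises every frequency by a little under 6 percent.
    print(round(cents_to_ratio(100), 4))
    # "speed 2" doubles every frequency, which is a full octave up.
    print(ratio_to_cents(2.0))
```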
Reducing noise in a recording

First find a period of silence in your recording, such as the
beginning or end of a piece. If the first 1.5 seconds of the
recording are silent, do
    sox file.wav -t nul /dev/null trim 0 1.5 noiseprof /tmp/profile

Next, use the noisered effect to actually reduce the noise:
    play file.wav noisered /tmp/profile
Other effects (copy, rate, avg, stat, vibro, lowp, highp, band,
reverb)

The other effects are simple to use. However, an "easy to use manual"
should be given here.

More effects (to do!)

There are a lot of effects around like noise gates, compressors,
wah-wah, stereo effects and so on. They should be implemented, making
SoX more useful in sound mixing techniques coming together with a
great variety of different sound effects.

Combining effects by using them in parallel or serially on different
channels needs some easy mechanism which is stable for use in
real-time.

Really missing are the changing of the parameters and
starting/stopping of effects while playing samples in real-time!

Good luck and have fun with all the effects!

Juergen Mueller (jmueller@uia.ua.ac.be)
SEE ALSO
sox(1), play(1), rec(1)
AUTHOR
Juergen Mueller (jmueller@uia.ua.ac.be)
Updates by Anonymous.