General options:
    [ -h ] [ -p ] [ -q ] [ -S ] [ -V ]
Format options:
    [ -t filetype ] [ -r rate ] [ -s/-u/-U/-A/-a/-i/-g/-f ] [ -b/-w/-l/-d ] [ -v volume ] [ -c channels ] [ -x ] [ -e ]
Effects:
    avg [ -l | -r | -f | -b | -1 | -2 | -3 | -4 | n,n,...,n ]
    band [ -n ] center [ width ]
    bandpass frequency bandwidth
    bandreject frequency bandwidth
    chorus gain-in gain-out delay decay speed depth -s | -t [ delay decay speed depth -s | -t ]
    compand attack1,decay1[, attack2,decay2...] in-dB1,out-dB1[,in-dB2,out-dB2...] [ gain [ initial-volume [ delay ] ] ]
    copy
    dcshift shift [ limitergain ]
    deemph
    earwax
    echo gain-in gain-out delay decay [ delay decay ... ]
    echos gain-in gain-out delay decay [ delay decay ... ]
    fade [ type ] fade-in-length [ stop-time [ fade-out-length ] ]
    filter [ low ]-[ high ] [ window-len [ beta ]]
    flanger gain-in gain-out delay decay speed < -s | -t >
    highp frequency
    highpass frequency
    lowp frequency
    lowpass frequency
    mask
    mcompand "attack1,decay1[, attack2,decay2...] in-dB1,out-dB1[,in-dB2,out-dB2...] [ gain [ initial-volume [ delay ] ] ]" xover_freq
    noiseprof [profile-file]
    noisered profile-file [threshold]
    pan direction
    phaser gain-in gain-out delay decay speed < -s | -t >
    pick [ -1 | -2 | -3 | -4 | -l | -r | -f | -b ]
    pitch shift [ width interpole fade ]
    polyphase [ -w < nut / ham > ] [ -width < long / short / # > ] [ -cutoff # ]
    rate
    repeat count
    resample [ -qs | -q | -ql ] [ rolloff [ beta ] ]
    reverb gain-out reverb-time delay [ delay ... ]
    reverse
    silence above_periods [ duration threshold[ d | % ] [ below_periods duration threshold[ d | % ]]
    speed [ -c ] factor
    stat [ -s n ] [ -rms ] [ -v ] [ -d ]
    stretch [ factor [ window fade shift fading ]
    swap [ 1 2 | 1 2 3 4 ]
    synth [ length ] type mix [ freq [ -freq2 ] [ off ] [ ph ] [ p1 ] [ p2 ] [ p3 ]
    trim start [ length ]
    vibro speed [ depth ]
    vol gain [ type [ limitergain ] ]
Filenames:
SoX can be used as a part of pipe operations by using the special filenames of "-". If specified as an input name, it will read data from stdin. If specified as an output name, it will send data to stdout.
General options:
is mangling your sound samples.
Format options:
Format options affect the input or output file that they immediately precede.
Self-describing input files can obtain all the format information directly from the header and so don't generally need format options. Headerless input files lack this information, so format options must be used to inform SoX of the file's data type, sample rate, and number of channels.
By default, SoX attempts to write audio data using the same data type, sample rate, and channel count as the input data. If the user wants the output file to be of a different format then format options can be used to specify the differences.
If an output file format doesn't support the same data type, sample rate, or channel count as the input file format, then SoX will automatically select the closest values it does support, so that the user does not have to specify these format changes manually.
The avg effect can also be invoked with up to 16 double-precision numbers, separated by commas, which specify the proportion (0.0 = 0% and 1.0 = 100%) of each input channel that is to be mixed into each output channel. In two-channel mode, 4 numbers are given: l->l, l->r, r->l, and r->r, respectively. In four-channel mode, the first 4 numbers give the proportions for the left-front output channel, as follows: lf->lf, rf->lf, lb->lf, and rb->lf. The next 4 give the right-front output in the same order, then left-back and right-back.
It is also possible to use the 16 numbers to expand or reduce the channel count; just specify 0 for unused channels.
Finally, certain reduced combinations of numbers can be specified for certain input/output channel combinations.
In Ch  Out Ch  Num  Mappings
_____  ______  ___  _____________________________
  2      1      2   l->l, r->l
  2      2      1   adjust balance
  4      1      4   lf->l, rf->l, lb->l, rb->l
  4      2      2   lf->l&rf->r, lb->l&rb->r
  4      4      1   adjust balance
  4      4      2   front balance, back balance
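The two-channel form of the mix described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SoX's code; the function name mix_stereo and the use of plain lists of floats are assumptions made for the example.

```python
def mix_stereo(left, right, ll, lr, rl, rr):
    """Mix two input channels into two output channels.

    ll, lr, rl, rr are the proportions l->l, l->r, r->l, r->r
    (0.0 = 0%, 1.0 = 100%), in the order the text gives them.
    Hypothetical helper for illustration only.
    """
    # Each output channel is a weighted sum of both input channels.
    out_left = [ll * l + rl * r for l, r in zip(left, right)]
    out_right = [lr * l + rr * r for l, r in zip(left, right)]
    return out_left, out_right


# Swap the channels: all of left goes to right, all of right goes to left.
swapped = mix_stereo([1.0, 2.0], [10.0, 20.0], 0.0, 1.0, 1.0, 0.0)

# Mix half of each input into each output.
blended = mix_stereo([1.0, 2.0], [10.0, 20.0], 0.5, 0.5, 0.5, 0.5)
```

The four-channel case follows the same pattern with 16 weights, 4 per output channel.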
The third (optional) parameter is a post-processing gain in dB which is applied after the compression has taken place; the fourth (optional) parameter is an initial volume to be assumed for each channel when the effect starts. This permits the user to supply a nominal level initially, so that, for example, a very large gain is not applied to initial signal levels before the companding action has begun to operate: it is quite probable that in such an event, the output would be severely clipped while the compander gain properly adjusts itself.
The fifth (optional) parameter is a delay in seconds. The input signal is analyzed immediately to control the compander, but it is delayed before being fed to the volume adjuster. Specifying a delay approximately equal to the attack/decay times allows the compander to effectively operate in a "predictive" rather than a reactive mode.
For fade-ins, this starts from the first sample and ramps the volume of the audio from 0 to full volume over fade-in-length seconds. Specify 0 seconds if no fade-in is wanted.
For fade-outs, the audio data will be truncated at the stop-time and the volume will be ramped from full volume down to 0 starting at fade-out-length seconds before the stop-time. If fade-out-length is not specified, it defaults to the same value as fade-in-length. No fade-out is performed if the stop-time is not specified.
All times can be specified either as periods of time or as sample counts. To specify a time period, use the format hh:mm:ss.frac. To specify a sample count, give the number of samples and append the letter 's' (for example 8000s).
An optional type can be specified to change the type of envelope. Choices are q for quarter of a sinewave, h for half a sinewave, t for linear slope, l for logarithmic, and p for inverted parabola. The default is a linear slope.
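As a rough sketch of the two time notations and of the default linear envelope, the following Python converts a time specification to a sample count and computes a linear fade-in gain. The names parse_time and linear_fade_in_gain are hypothetical, the acceptance of shortened forms such as "ss.frac" is an assumption, and the gain formula is the plain linear ramp the text describes rather than SoX's exact implementation.

```python
def parse_time(spec, rate):
    """Convert a time specification into a sample count.

    spec is either a sample count with a trailing 's' (e.g. "8000s")
    or a time in hh:mm:ss.frac form; rate is the sample rate in Hz.
    Assumption: shorter forms such as "ss.frac" are accepted as well.
    """
    if spec.endswith("s"):
        return int(spec[:-1])
    seconds = 0.0
    for field in spec.split(":"):
        # Each colon shifts the accumulated value up by a factor of 60.
        seconds = seconds * 60 + float(field)
    return round(seconds * rate)


def linear_fade_in_gain(n, fade_len):
    """Gain applied to sample n of a linear fade-in lasting fade_len samples."""
    if fade_len <= 0 or n >= fade_len:
        return 1.0  # no fade-in requested, or the fade-in has finished
    return n / fade_len


fade_samples = parse_time("00:00:01.5", 8000)
```

With these definitions, a fade-in length of 0 leaves every sample at full volume, matching the "specify 0 seconds if no fade-in is wanted" rule.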
A lowpass filter is obtained by leaving low unspecified, or 0. A highpass filter is obtained by leaving high unspecified, or 0, or greater than or equal to the Nyquist frequency.
The window-len, if unspecified, defaults to 128. Longer windows give a sharper cutoff, shorter windows a more gradual cutoff.
The beta, if unspecified, defaults to 16. This selects a Kaiser window. You can select a Nuttall window by specifying anything <= 2.0 here. For more discussion of beta, look under the resample effect.
The multi-band compander is similar to the single-band compander, but the audio file is first divided up into bands and then the compander is run on each band. See the compand effect for the definition of its options. Compand options are specified between double quotes, and the crossover frequency for that band is specified separately with xover_freq. This can be repeated multiple times to create multiple bands.
To actually remove the noise, run SoX again with the noisered filter. The filter needs one argument, profile-file, which contains the noise profile from noiseprof. threshold specifies how much noise should be removed, and may be between 0 and 1 with a default of 0.5. Higher values will remove more noise but present a greater possibility of distorting the desired audio signal. Experiment with different threshold values to find the optimal one for your sample.
-w < nut / ham > : select either a Nuttall (~90 dB stopband) or Hamming (~43 dB stopband) window. Default is nut.
-width long / short / # : specify the (approximate) width of the filter. long is 1024 samples; short is 128 samples. Alternatively, an exact number can be used. Default is long. The short option is not recommended, as it produces poor quality results.
-cutoff # : specify the filter cutoff frequency as a fraction of the frequency bandwidth, also known as the Nyquist frequency. Please see the resample effect for further information on the Nyquist frequency. If upsampling, this is the fraction of the original signal that should go through. If downsampling, this is the fraction of the signal left after downsampling. Default is 0.95. Remember that this is a float.
Linear interpolation ("lerp-ing") is acceptable for cheap 8-bit sound hardware, but for CD-quality sound you should instead use either resample or polyphase. If you are wondering which rate changing effects to use, you will want to read a detailed analysis of all of them at http://leute.server.de/wilde/resample.html
By default, linear interpolation is used, with a window width of about 45 samples at the lower of the two rates. This gives an accuracy of about 16 bits, but insufficient stopband rejection if you want a rolloff greater than about 0.80 of the Nyquist frequency.
The -q* options will change the default values for rolloff and beta as well as use quadratic interpolation of filter coefficients, resulting in about 24 bits precision. The -qs, -q, or -ql options specify increased accuracy at the cost of lower execution speed. It is optional to specify rolloff and beta parameters when using the -q* options.
Following is a table of the reasonable defaults which are built-in to SoX:
Option  Window  rolloff  beta  interpolation
------  ------  -------  ----  -------------
(none)    45     0.80     16   linear
-qs       45     0.80     16   quadratic
-q        75     0.875    16   quadratic
-ql      149     0.94     16   quadratic
------  ------  -------  ----  -------------
-qs, -q, or -ql use window lengths of 45, 75, or 149 samples, respectively, at the lower sample-rate of the two files. This means progressively sharper stop-band rejection, at proportionally slower execution times.
rolloff refers to the cut-off frequency of the low pass filter and is given in terms of the Nyquist frequency for the lower sample rate. rolloff therefore should be something between 0.0 and 1.0, in practice 0.8-0.95. The defaults are indicated above.
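To make the units concrete, the cut-off frequency implied by a given rolloff can be worked out as below. This is a small arithmetic sketch only; the function name rolloff_cutoff_hz is hypothetical and not part of SoX.

```python
def rolloff_cutoff_hz(in_rate, out_rate, rolloff=0.80):
    """Low-pass cut-off in Hz for a rolloff given as a fraction of the
    Nyquist frequency of the lower of the two sample rates."""
    if not 0.0 <= rolloff <= 1.0:
        raise ValueError("rolloff should be between 0.0 and 1.0")
    nyquist = min(in_rate, out_rate) / 2
    return rolloff * nyquist


# Downsampling 44100 Hz to 22050 Hz with the default rolloff of 0.80:
# the lower rate has a Nyquist frequency of 11025 Hz.
default_cutoff = rolloff_cutoff_hz(44100, 22050)

# The -ql default of 0.94 when converting between 44100 Hz and 48000 Hz.
ql_cutoff = rolloff_cutoff_hz(44100, 48000, 0.94)
```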
The Nyquist frequency is equal to (sample rate / 2). Logically, this is because the A/D converter needs at least 2 samples to detect 1 cycle at the Nyquist frequency. Frequencies higher than the Nyquist frequency will actually appear as lower frequencies to the A/D converter; this is called aliasing. Normally, A/D converters run the signal through a lowpass filter first to avoid these problems.
Similar problems will happen in software when reducing the sample rate of an audio file (frequencies above the new Nyquist frequency can be aliased to lower frequencies). Therefore, a good resample effect will remove all frequency information above the new Nyquist frequency.
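The folding behaviour described above can be illustrated with a short calculation. The sketch below uses the standard frequency-folding rule for a pure tone sampled without any filtering; the function name alias_frequency is an assumption made for the example.

```python
def alias_frequency(freq, sample_rate):
    """Frequency (Hz) at which a pure tone of `freq` Hz appears after
    being sampled at `sample_rate` Hz with no anti-aliasing filter."""
    folded = freq % sample_rate
    # Anything above the Nyquist frequency mirrors back below it.
    return min(folded, sample_rate - folded)


# A 15000 Hz tone in a file being reduced to 22050 Hz lies above the new
# Nyquist frequency (11025 Hz) and would fold down to a lower frequency.
aliased = alias_frequency(15000, 22050)

# A 1000 Hz tone is below the Nyquist frequency and is unaffected.
unchanged = alias_frequency(1000, 22050)
```

This is why the text calls for removing everything above the new Nyquist frequency before the rate is reduced.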
The rolloff refers to how close to the Nyquist frequency this cutoff is, with closer being better. When increasing the sample rate of an audio file you would not expect any frequencies to exist above the original Nyquist frequency. Because of resampling properties, however, it is common for aliasing data to be created above the old Nyquist frequency. In that case the rolloff refers to how close to the original Nyquist frequency a lowpass filter is applied to remove this false data, with closer also being better.
The beta parameter determines the type of filter window used. Any value greater than 2.0 is the beta for a Kaiser window. Beta <= 2.0 selects a Nuttall window. If unspecified, the default is a Kaiser window with beta 16.
In the case of a Kaiser window (beta > 2.0), lower betas produce a somewhat faster transition from passband to stopband, at the cost of noticeable artifacts. A beta of 16 is the default; a beta less than 10 is not recommended. If you want a sharper cutoff, don't use low betas, use a longer sample window. A Nuttall window is selected by specifying any 'beta' <= 2, and the Nuttall window has a somewhat steeper cutoff than the default Kaiser window. You will probably not need to use the beta parameter at all, unless you are just curious about comparing the effects of Nuttall vs. Kaiser windows.
This is the default effect if the two files have different sampling rates. Default parameters are, as indicated above, Kaiser window of length 45, rolloff 0.80, beta 16, linear interpolation.
NOTE: -qs is only slightly slower, but more accurate for 16-bit or higher precision.
NOTE: In many cases of up-sampling, no interpolation is needed, as exact filter coefficients can be computed in a reasonable amount of space. To be precise, this is done when
input_rate < output_rate && output_rate/gcd(input_rate,output_rate) <= 511
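The condition above is easy to check for a given pair of rates. The following sketch evaluates it directly; the function name uses_exact_coefficients is hypothetical.

```python
from math import gcd


def uses_exact_coefficients(input_rate, output_rate):
    """Evaluate the condition quoted above for a pair of integer sample rates."""
    return (input_rate < output_rate
            and output_rate // gcd(input_rate, output_rate) <= 511)


# 22050 -> 44100: gcd is 22050, so the ratio term is 2.
doubling = uses_exact_coefficients(22050, 44100)

# 11025 -> 48000: gcd is 75, so the ratio term is 640, above the limit.
awkward = uses_exact_coefficients(11025, 48000)

# 48000 -> 44100 is down-sampling, so the first clause already fails.
downsampling = uses_exact_coefficients(48000, 44100)
```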
The above_periods value is used to indicate if sound should be trimmed at the beginning of the audio file. A value of zero indicates no silence should be trimmed from the beginning. When a non-zero above_periods is specified, audio is trimmed until non-silence is found. Normally, when trimming silence from the beginning of audio, above_periods will be 1, but it can be increased to higher values to trim all data up to a specific count of non-silence periods. For example, if you had an audio file with two songs that each contained 2 seconds of silence before the song, you could specify an above_periods of 2 to strip out both silence periods and the first song.
When above_periods is non-zero, you must also specify a duration and threshold. Duration indicates the amount of time that non-silence must be detected before trimming stops. By increasing the duration, bursts of noise can be treated as silence and trimmed off.
Threshold is used to indicate what sample value you should treat as silence. For digital audio, a value of 0 may be fine, but for audio recorded from analog, you may wish to increase the value to account for background noise.
When optionally trimming silence from the end of a sound file, you specify a below_periods count. In this case, below_periods means to remove all audio data after silence is detected. Normally, this will be a value of 1, but it can be increased to skip over periods of silence that are wanted. For example, if you have a song with 2 seconds of silence in the middle and 2 seconds at the end, you could set below_periods to a value of 2 to skip over the silence in the middle of the audio file.
For below_periods, duration specifies a period of silence that must exist before data is not copied any more. By specifying a higher duration, silence that is wanted can be left in the audio. For example, if you have a song with an expected 1 second of silence in the middle and 2 seconds of silence at the end, a duration of 2 seconds could be used to skip over the middle silence.
Unfortunately, you must know the length of the silence at the end of your audio file to trim off silence reliably. A workaround is to use the silence effect in combination with the reverse effect. By first reversing the audio, you can use above_periods to reliably trim all silence from what looks like the front of the file. Then reverse the file again to get back to normal.
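The reverse-trim-reverse idea can be modelled on a plain list of samples. The Python below is a deliberately simplified model of above_periods = 1, with duration counted in samples and threshold as an absolute sample value; it is not SoX's implementation, and both function names are hypothetical.

```python
def trim_leading_silence(samples, threshold, duration):
    """Drop everything before the first run of `duration` consecutive
    samples whose magnitude exceeds `threshold`.

    Simplified model of above_periods = 1; an assumption for illustration.
    """
    run = 0
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            run += 1
            if run >= duration:
                # Keep the samples that made up the non-silent run.
                return samples[i - duration + 1:]
        else:
            run = 0  # a short burst did not last long enough
    return []


def trim_trailing_silence(samples, threshold, duration):
    """Reverse the audio, trim what is now the front, then reverse back."""
    trimmed = trim_leading_silence(samples[::-1], threshold, duration)
    return trimmed[::-1]


audio = [0, 0, 5, 6, 7, 0, 0, 0]
front_trimmed = trim_leading_silence(audio, 1, 2)
back_trimmed = trim_trailing_silence(audio, 1, 2)
```

Because the trailing silence is handled as leading silence on the reversed data, its length does not need to be known in advance.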
To remove silence from the middle of a file, specify a below_periods that is negative. This value is then treated as a positive value and is also used to indicate the effect should restart processing as specified by the above_periods, making it suitable for removing periods of silence in the middle of the sound file.
The period counts are in units of samples. Duration counts may be in the format of hh:mm:ss.frac, or the exact count of samples. Threshold numbers may be suffixed with d or % to indicate the value is in decibels or a percentage of the maximum sample value (0% specifies pure digital silence).
The "Volume Adjustment:" field in the statistics gives you the argument to the -v number which will make the sample as loud as possible without clipping.
The option -v will print out the "Volume Adjustment:" field's value only and return. This could be of use in scripts to auto convert the volume.
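The idea behind the "Volume Adjustment:" figure can be sketched as the ratio of full scale to the loudest sample. This is an assumption about how such a figure is derived, not a transcription of SoX's code; the function name volume_adjustment and the default full-scale value (the maximum of a signed 32-bit integer) are chosen for the example.

```python
def volume_adjustment(samples, full_scale=0x7FFFFFFF):
    """Largest gain that keeps every sample within +/- full_scale.

    Assumption: samples are signed integers and full_scale is the
    largest representable magnitude.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        raise ValueError("silent input has no finite volume adjustment")
    return full_scale / peak


# With a full scale of 2000 and a peak of 1000, the signal can be doubled.
gain = volume_adjustment([1000, -500, 250], full_scale=2000)
```

A script could feed a value computed this way back to the -v format option to normalise a file in a second pass.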
The -s n option is used to scale the input data by a given factor. The default value of n is the max value of a signed long variable (0x7fffffff). Internal effects always work with signed long PCM data and so the value should relate to this fact.
The -rms option will convert all output average values to root mean square format.
There is also an optional parameter -d that will print out a hex dump of the sound file from the internal buffer, which holds 32-bit signed PCM data. This is mainly of use in tracking down endian problems that creep into SoX on cross-platform versions.