22/11/04, pm-n.

pVoc.

Steffen Brandorff
( sbrand@imv.au.dk )
Department of Information- and Media Science

Peter Møller-Nielsen
( pmn@daimi.au.dk )
Department of Computer Science

pVoc is a set of MATLAB functions for manipulation of digitized sounds in the frequency domain. The functions form a testbed for new applications and algorithms. Most of the functions are designed to be building blocks for a variety of applications and experiments. However, a few applications are included to exemplify the use of the building blocks.
The section Getting started explains the most common use of the functions by means of a sample session. Sound examples are included. Some of the algorithms, especially those for conversion between the time domain and the frequency domain, are described in [1]. The use of MATLAB puts a limit on the speed of the applications and the duration of the manipulated sounds. It typically takes around 18 sec. to produce 10 sec. of sound. An application must be reprogrammed in C if speed or capacity is essential.

[1]: Steffen Brandorff, Peter Møller-Nielsen,
Sound Manipulation in the Frequency Domain.
( Rev. 4/12/02 )


Getting started:

Download the MATLAB functions and sounds from THIS page, and select pVoc-tar as the working directory.

You start by typing:

matlab
pVoc;

Time-stretching with a constant stretch factor.

The file named mozart.wav contains the first 12.4 sec. of a Mozart string quartet in wave-format. You can hear it HERE. You can stretch that sound by a factor 1.3 by typing:

stretchFile( 'mozart.wav', 'mozartOut1.wav', 1.3 );

The stretched sound is now available in the file named mozartOut1.wav. You can hear mozartOut1.wav HERE. If you only want to stretch the part of the sound in mozart.wav from 3 sec. to 7.3 sec., then you should type:

stretchFile( 'mozart.wav', 'mozartOut2.wav', 1.3, 3.0, 7.3 );
You can hear mozartOut2.wav HERE.
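
The durations to expect from the two calls can be checked with a little arithmetic. The sketch below is not part of pVoc; it assumes that the fourth and fifth arguments of stretchFile are the start and the end of the stretched interval in seconds, as the example suggests, and the variable names are purely illustrative.

```matlab
% Expected durations for the two stretchFile examples above.
% Assumption: arguments 4 and 5 are start and end times in seconds.
inDuration = 12.4;          % length of mozart.wav in sec. (from the text)
factor     = 1.3;           % stretch factor used in both calls

% Whole file stretched: every second of input becomes 1.3 sec. of output.
fullOut = inDuration * factor;

% Only the interval from 3.0 sec. to 7.3 sec. stretched.
tStart = 3.0;
tEnd   = 7.3;
segmentOut = (tEnd - tStart) * factor;

fprintf('Whole file:         %.2f sec.\n', fullOut);
fprintf('Stretched interval: %.2f sec.\n', segmentOut);
```

Whether mozartOut2.wav contains only the stretched interval or also the untouched remainder is not stated here, so the second figure is the length of the stretched part only.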

Stretching or compressing with a very large or very small stretch factor can produce surprising results. Here is an example:
This is a few bars played on a bodhran. This is the result of stretching by a factor of 0.7 ( i.e. a compression ). The tempo is higher, but the character of the rhythm ( four beats in a bar ) is maintained. This is the result of stretching by a factor of 0.3. The tempo is higher still, but now there are three beats in a bar.

Time-stretching with a time-dependent stretch factor.

This variant of time-stretching is useful if you want to leave the attack of a sound untouched and only scale the steady-state. You define the variation of the stretch factor by means of a table which maps moments in the out-sound to moments in the in-sound. If you type:

time_map = [ 0, 3; 3, 6; 4, 6.6; 5, 6.9; 6, 7; 8, 7 ];
sweepFile( 'mozart.wav', 'mozartOut3.wav', time_map );

then mozartOut3.wav contains a sound lasting 8 sec. produced by a sweep over the interval from 3 to 7 sec. in the mozart.wav. You can hear mozartOut3.wav HERE. In the beginning the scale factor is 1, but after 3 sec. it starts to grow and reaches infinity after 6 sec.
sweepFile produces a graphical representation of the map. This is shown below:
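
Independently of the plot, the stretch factor implied by each segment of the map can be read off numerically. The following is a minimal sketch, not part of pVoc, assuming the map is interpreted piecewise linearly between the listed points (this page does not say how sweepFile interpolates). The local stretch factor is then the out-sound time spent per second of in-sound.

```matlab
% Local stretch factor for each segment of a time map.
% Column 1: moments in the out-sound, column 2: moments in the in-sound.
time_map = [ 0, 3; 3, 6; 4, 6.6; 5, 6.9; 6, 7; 8, 7 ];

dOut = diff(time_map(:, 1));    % out-sound seconds per segment
dIn  = diff(time_map(:, 2));    % in-sound seconds covered per segment

% Out-sound time per in-sound second; Inf where the sweep stands still.
stretch = dOut ./ dIn;

for k = 1:numel(stretch)
    fprintf('%g to %g sec.: stretch factor %g\n', ...
            time_map(k, 1), time_map(k + 1, 1), stretch(k));
end
```

For this map the factor is 1 on the first segment, grows on the following ones, and is Inf on the last, where the in-sound position stays at 7 sec.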

The sweep can be backward as well as forward or a mixture. If you type:

time_map = [ 0, 3; 3, 6; 4, 6.6; 5, 6.9; 6, 7; 8, 7; ...
             9, 6.75; 10, 6.25; 12, 4.35; 13, 4.15; 14, 4.1; 16, 4.1 ];
sweepFile( 'mozart.wav', 'mozartOut4.wav', time_map );

Your map now looks like:

You can hear mozartOut4.wav HERE. The first 8 sec. of mozartOut4.wav are the same as before, but then the sweep starts to move backward. After 12 sec. it slows down again and eventually it "freezes" after 14 sec. In total mozartOut4.wav lasts for 16 sec.
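
To see which moment of mozart.wav is being read at a given moment of the out-sound, the map can be evaluated with MATLAB's interp1. This is only a sketch, assuming linear interpolation between the table points, which this page does not confirm for sweepFile. The direction of the sweep follows from the sign of the slope of each segment.

```matlab
% Evaluate a mixed forward/backward time map.
time_map = [ 0, 3; 3, 6; 4, 6.6; 5, 6.9; 6, 7; 8, 7; ...
             9, 6.75; 10, 6.25; 12, 4.35; 13, 4.15; 14, 4.1; 16, 4.1 ];
tOut = time_map(:, 1);          % moments in the out-sound
tIn  = time_map(:, 2);          % moments in the in-sound

% In-sound position read at 11 sec. of the out-sound.
posAt11 = interp1(tOut, tIn, 11);

% Slope of each segment: > 0 forward, < 0 backward, 0 frozen.
slope = diff(tIn) ./ diff(tOut);
direction = sign(slope);

% The out-sound ends at the last out-sound moment in the table.
outDuration = tOut(end);

fprintf('In-sound position at 11 sec.: %g sec.\n', posAt11);
fprintf('Out-sound duration: %g sec.\n', outDuration);
```

The zero entries in direction mark the two plateaus of the map, from 6 to 8 sec. and from 14 to 16 sec. of the out-sound.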

Morphing.

The file named violinStart.wav contains the first 0.2 sec. of a sound produced by a violin. It sounds like THIS. The file named trumpetEnd.wav contains the last 1.28 sec. of a sound produced by a trumpet. It sounds like THIS.

  morphFiles( 'violinStart.wav', 'trumpetEnd.wav', 'vioPet30.wav', 3.0 );

produces a sound consisting of the two sounds separated by a central piece of 3 sec. in which the first sound evolves into the second sound. This is called morphing. You can hear vioPet30.wav HERE.

  morphFiles( 'violinStart.wav', 'trumpetEnd.wav', 'vioPet05.wav', 0.5 );

  morphFiles( 'violinStart.wav', 'trumpetEnd.wav', 'vioPet10.wav', 1.0 );

and

  morphFiles( 'violinStart.wav', 'trumpetEnd.wav', 'vioPet50.wav', 5.0 );

produce sounds with a separation of 0.5 sec., 1 sec. and 5 sec. You can hear vioPet05.wav HERE, vioPet10.wav HERE and vioPet50.wav HERE. Afterwards, you can change the smoothness of the generated central piece by using time-stretching with a varying stretch factor. In this way you can change the relative duration of the various subsections of the central piece. The techniques used in morphFiles(...) are described in [1]. They are rather primitive in the sense that only the last burst of violinStart.wav and the first burst of trumpetEnd.wav are used to generate the central piece. This implies that vibrato, tremolo and similar time-varying qualities of violinStart.wav and trumpetEnd.wav are not recognized, and a kind of lifelessness is often inflicted on the central piece.
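
The four morphs can also be produced in one loop. The sketch below uses only the morphFiles call shown above; the naming pattern is inferred from the file names on this page, and the total duration is an estimate that assumes the result is simply the violin part, the central piece and the trumpet part placed end to end. The call is skipped when pVoc is not on the MATLAB path, so the script can be tried without it.

```matlab
% Generate the four morphs with one loop.
% Out-file names follow the pattern used above: vioPet<10*separation>.wav
separations = [ 0.5, 1.0, 3.0, 5.0 ];     % length of the central piece in sec.
outNames = cell(size(separations));
totalDur = zeros(size(separations));

for k = 1:numel(separations)
    outNames{k} = sprintf('vioPet%02d.wav', round(10 * separations(k)));

    % Assumption: total length = violin part + central piece + trumpet part.
    totalDur(k) = 0.2 + separations(k) + 1.28;

    % Call pVoc only if it is on the path.
    if exist('morphFiles', 'file') == 2
        morphFiles('violinStart.wav', 'trumpetEnd.wav', ...
                   outNames{k}, separations(k));
    end

    fprintf('%s: about %.2f sec.\n', outNames{k}, totalDur(k));
end
```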

How to modify the processing.

The processing in stretchFile, sweepFile and morphFiles is controlled by a lot of parameters. They control the size of a burst, the size of a frame ( the spectral representation of a burst ), the shape of the window ( Hanning, Hamming etc. ), the size of the overlap between consecutive bursts etc. They also control strategies for the synthesis, i.e. the conversion from the frequency domain to the time domain. The various parameters and strategies are explained in [1].
Initially the parameters are assigned some reasonable default values. However, the value of a parameter can be changed by executing the function setGlobal('param', val). 'param' identifies the parameter to be changed and val is the new value, e.g.

  setGlobal('burstSize', 0.03)

changes the size of a burst to 30 msec. and prints the previous value. Below is a list of some parameters that can be changed in this way:

  setGlobal('burstSize', 0.04 )
    The length of a sound burst ( often called a 'window' )
    is set to 0.04 sec.

  setGlobal('windowType', 'blackman' )
    The function used to shape the burst is set to be the
    blackman-function.

  setGlobal('Soverlap', 16 )
    The overlap ( i.e. the number of bursts that contribute to each
    sample ) during synthesis is set to 16.
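
The relation between 'burstSize' and 'Soverlap' can be made concrete. If each sample receives contributions from Soverlap bursts and the bursts are evenly spaced, consecutive bursts start burstSize / Soverlap apart. The sketch below works this out for the values listed above; the sample rate of 44100 Hz is an assumption made for illustration, and the variable names are not pVoc parameters.

```matlab
% Spacing between consecutive bursts during synthesis.
burstSize = 0.04;     % sec., as in setGlobal('burstSize', 0.04)
Soverlap  = 16;       % as in setGlobal('Soverlap', 16)
fs        = 44100;    % assumed sample rate in Hz (not given in the text)

% With evenly spaced bursts, Soverlap bursts cover each sample when
% consecutive bursts start this far apart.
hop = burstSize / Soverlap;

% The same quantities in samples, rounded to whole samples.
burstSamples = round(burstSize * fs);
hopSamples   = round(hop * fs);

fprintf('Burst: %g sec. (%d samples)\n', burstSize, burstSamples);
fprintf('Hop:   %g sec. (%d samples)\n', hop, hopSamples);
```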

A complete list of parameters can be printed by typing:

  setGlobal

The current setting of parameters can be checked by typing:

  printEnvironment

The default values of the parameters are chosen to benefit sounds with a harmonic character. This may not be the best setting for a percussive sound. Here is an example:
This is a few bars played on a bodhran. If it is stretched by a factor of 1.2 using default parameter values it sounds like this. A kind of pre-echo is clearly heard on the first beat. Using a shorter window ( 23 msec. instead of 46 msec. ) and subdividing the spectrum into 5 sections instead of 50 when transforming back into the time domain, it sounds like this. If you only change the number of subsections ( from 50 to 5 ) you get this. If you only change the size of the window ( from 46 msec. to 23 msec. ) you get this.
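
The window-size part of this experiment can be repeated with setGlobal and stretchFile. The following is a sketch under several assumptions: the file name bodhran.wav and the out-file names are hypothetical, the default window is taken to be 0.046 sec. as quoted above, and the number of spectrum sections is left untouched because the name of that parameter is not given on this page (typing setGlobal lists the available names). The pVoc calls are skipped when pVoc or the sound file is not available.

```matlab
% Stretch a percussive sound with the default and with a shorter window.
inFile     = 'bodhran.wav';          % hypothetical file name
factor     = 1.2;                    % stretch factor from the text
burstSizes = [ 0.046, 0.023 ];       % sec.: default and shorter window
outFiles   = cell(size(burstSizes));

% Run pVoc only if its functions and the sound file can be found.
havePVoc = exist('setGlobal', 'file') == 2 && ...
           exist('stretchFile', 'file') == 2 && ...
           exist(inFile, 'file') == 2;

for k = 1:numel(burstSizes)
    outFiles{k} = sprintf('bodhranOut%dms.wav', round(1000 * burstSizes(k)));
    if havePVoc
        setGlobal('burstSize', burstSizes(k));
        stretchFile(inFile, outFiles{k}, factor);
    end
end

% Put the window size back to the value assumed to be the default.
if havePVoc
    setGlobal('burstSize', burstSizes(1));
end
```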