Chair of Multimedia Telecommunications and Microelectronics - Audio Research Group

SinMod - hybrid sinusoidal audio modeling toolbox for Matlab

About Sinusoidal Modeling

Sinusoidal modeling (SM) and hybrid sinusoidal + transient + noise modeling (STN) are well-established signal processing frameworks applicable to speech and audio analysis, time and pitch scaling, enhancement, restoration, source separation, automatic recognition, watermarking, compression, and synthesis [1-11]. Within an STN model, the signal is modeled as a sum of transients (a sequence of damped sinusoids or short-time impulses), quasi-sinusoids with continuously varying magnitudes and frequencies (called the deterministic component), and a stochastic component (noise) whose short-time power spectral envelope changes over time. To obtain such a representation, the input signal is decomposed into impulse-like wideband components, narrow-band components, and the residual.

For sinusoidal analysis, the signal is split into overlapping segments called frames, the magnitude spectrum is computed in each frame, quasi-sinusoidal partials are detected as distinctive peaks, and the corresponding parameters (frequency, amplitude and phase) are estimated. Subsequently, these peaks are linked between neighboring frames by a tracking algorithm that takes into account various continuity criteria and forms sinusoidal trajectories.

The signal can be reconstructed from the trajectory data by generating quasi-sinusoidal components sample by sample, with continuously variable parameters interpolated between the estimated and linked values. This reconstructed signal may be subtracted from the original signal in order to obtain a residual. The residual contains mostly the non-sinusoidal part of the signal and is usually modeled as non-stationary noise with varying intensity and spectral envelope. The spectral envelope may be modeled as an all-pole filter response, e.g. by the use of the LPC technique.
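
The peak detection step of the sinusoidal analysis described above can be illustrated with a minimal sketch. It is not the toolbox's detector: it assumes a plain Hann-windowed FFT, local-maximum picking, and parabolic interpolation of the log-magnitude spectrum, which is a common textbook refinement. All names (frame, N, thr_db) and the 40 dB threshold are hypothetical and chosen only for illustration.

```matlab
% Minimal sketch: detect sinusoidal peaks in one analysis frame.
% Assumption: plain FFT peak picking with parabolic interpolation;
% this is an illustration, not the SinMod implementation.
N  = 1024;                            % frame length in samples (hypothetical)
n  = (0:N-1)';
f0 = [0.05 0.12];                     % true frequencies (cycles per sample)
frame = cos(2*pi*f0(1)*n) + 0.5*cos(2*pi*f0(2)*n + 1);

w   = 0.5 - 0.5*cos(2*pi*n/N);        % periodic Hann window (no toolbox needed)
X   = fft(frame .* w);
mag = 20*log10(abs(X(1:N/2)) + eps);  % log-magnitude of bins 0..N/2-1

thr_db = max(mag) - 40;               % keep peaks within 40 dB of the maximum
k = find(mag(2:end-1) >  mag(1:end-2) & ...
         mag(2:end-1) >= mag(3:end)   & ...
         mag(2:end-1) >  thr_db) + 1; % 1-based indices of local maxima

% Parabolic interpolation around each peak bin refines frequency and level
al = mag(k-1); be = mag(k); ga = mag(k+1);
d  = 0.5*(al - ga) ./ (al - 2*be + ga);  % fractional bin offset, -0.5..0.5
f_est = (k - 1 + d) / N;                 % refined frequency (cycles per sample)
a_db  = be - 0.25*(al - ga).*d;          % interpolated peak level (uncalibrated dB)
```

In a full analysis, the estimates f_est and a_db of each frame would be handed to the tracking stage, which links them into trajectories.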

Developers:

Maciej Bartkowiak, mbartkow_at_multimedia.edu.pl
Tomasz Żernicki, tomasz.zernicki_at_zylia.pl
Łukasz Januszkiewicz, ljanuszkiewicz_at_multimedia.edu.pl

References:

  1. T.F. Quatieri, R.J. McAulay, "Speech analysis/synthesis based on a sinusoidal representation", IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 34, no. 4, 1986.
  2. J.O. Smith, X. Serra, "PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation", Int. Computer Music Conference, 1987.
  3. G. Peeters, X. Rodet, "SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum", Proc. International Computer Music Conference (ICMC), Beijing.
  4. K. Fitz, L. Haken, "Sinusoidal modeling and manipulation using Lemur", Computer Music Journal, 20:4, Winter 1996.
  5. M. Macon, L. Jensen, J. Oliverio, M. Clements, E. George, "A Singing Voice Synthesis System Based on Sinusoidal Modeling", Proc. ICASSP'97, vol. 1, 1997.
  6. T. Verma, T.H.Y. Meng, "Time Scale Modification Using a Sines + Transients + Noise Model", Proc. DAFx'98, Barcelona, 1998.
  7. H. Purnhagen, B. Edler, Ch. Ferekidis, "Object-Based Analysis/Synthesis Audio Coder for Very Low Bit Rates", Paper 4747, Proc. 104th AES Convention, 1998.
  8. T. Tolonen, "Methods for Separation of Harmonic Sound Sources Using Sinusoidal Modeling", Paper 4958, Proc. 106th AES Convention, May 1999.
  9. L. Girin, S. Marchand, "Watermarking of speech signals using the sinusoidal model and frequency modulation of the partials", Proc. ICASSP'04, vol. 1, 2004.
  10. J. Jensen, J.H.L. Hansen, "Speech enhancement using a constrained iterative sinusoidal model", IEEE Trans. SAP, vol. 9, no. 7, 2001.
  11. T. Heittola, A. Klapuri, T. Virtanen, "Musical Instrument Recognition in Polyphonic Audio Using Source-Filter Model for Sound Separation", Proc. ISMIR 2009.

About the toolbox

The hybrid sinusoidal modeling Matlab toolbox offered on this webpage is an open-source project intended strictly for research and educational purposes. The package is currently under development. The current version is the second release; occasional bugs may still occur. We welcome comments, bug reports, and suggestions for further development.

The SinMod toolbox supports various audio signal modeling tasks:

The most important features of this implementation are:

The toolbox also features an advanced, multi-criterion, heavily parameterized offline tracking utility that combines a broad set of rules for track continuation and birth/death in order to obtain the best possible sinusoidal tracking results. Accurate and reliable tracking is essential for high-quality (or even transparent) re-synthesis, and it is also very important in signal analysis applications in which no re-synthesis is performed.

[Figure] Haffner: spectrograms of the original and the resynthesized signal (detail).
[Figure] Salvation: spectrograms of the original signal and of the resynthesized signal after 150% time stretching of all model data.
[Figure] Harpsichord: spectrograms of the original signal (fragment) and of the resynthesized signal after 200% time stretching of all model data.
[Figure] Badu: spectrograms of the original signal and of the resynthesized signal after 10% pitch shifting of all model data.

Download

Download the toolbox package (124kB).
Download a set of important utilities (72kB).

The SinMod Toolbox requires MATLAB 7.6 (R2008a) or later and the Signal Processing Toolbox.
In addition, to access all functions of the Toolbox, you will need the following external public domain toolboxes and utilities:

Should you encounter any problems with getting hold of these, do not hesitate to contact us.

How to use for plain analysis-transformation-synthesis

A complete sequence of audio signal analysis and synthesis is run by a single function:

[synsig,model] = model_ansyn(signal,config,pitch,speed)

The required arguments are a vector of samples (signal) and a structure of configuration parameters (config). The remaining two optional arguments define simple transformations applied during synthesis: the pitch scaling ratio and the time stretching ratio. The outputs are the resynthesized signal (synsig) and a structure (model) holding all parameters and data needed to re-synthesize the signal.

The configuration structure should have a strictly defined list of fields. To prepare such a structure, call

config = model_conf(freq_res,time_res)

This will automatically set up the config structure for you, with parameter values defined inside model_conf.m. You may edit model_conf.m and modify these parameter values. The function also has two optional parameters (freq_res, time_res) that modify the default settings for the analysis frame length and offset (hop). After calling model_conf, you may also modify any parameter value from the command line.
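
The two calls above can be combined into a short session, sketched here under stated assumptions: the toolbox is on the MATLAB path, the input is a mono column vector, and both optional arguments of model_conf may be omitted. The test tone and the sampling rate fs are hypothetical stand-ins for real audio. The guard lets the script run even when the toolbox is not installed.

```matlab
% Minimal sketch: run the analysis/synthesis chain on a synthetic test tone.
% Assumptions: SinMod is on the MATLAB path and accepts a mono column vector.
fs = 44100;                               % sampling rate in Hz (hypothetical)
t  = (0:fs-1)' / fs;                      % one second of sample times
signal = 0.5*sin(2*pi*440*t) + 0.01*randn(fs, 1);   % tone plus weak noise

if exist('model_conf', 'file') && exist('model_ansyn', 'file')
    config = model_conf();                % default settings from model_conf.m
    [synsig, model] = model_ansyn(signal, config);   % no pitch/speed change
    disp(fieldnames(model));              % list the fields of the model structure
else
    disp('SinMod toolbox not found on the path; skipping analysis.');
end
```

To apply a transformation during synthesis, pass the pitch and speed ratios as the third and fourth arguments of model_ansyn.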

The call of model_ansyn will perform a sequence of consecutive analysis-synthesis steps:

The optional output of model_ansyn is a structure of model parameters, featuring the following fields:

trans: [struct] which contains sets of parameters for each individual transient:
t_peak, t_start - specify the location of transient in samples
env_b, env_chi - envelope shape parameters
freqs, ampls, phase - parameters of the underlying modulating sinusoids

data: [struct] which contains a set of full matrices with raw data from signal analysis:
f, a, ph, fm, am, cnf, g, f0

trj: [struct] which contains a set of sparse matrices with parameters arranged in sinusoidal trajectories:
f, a, ph, fm, am, cnf, z, g

noise: [struct] which contains spectral model parameters of the noise residual
lpc_coeffs: [double matrix] LPC/WLPC parameters for each frame
lpc_gain: [double vector] which contains noise energy in each frame

In the above list, matrices with sinusoidal data represent respectively:

f - frequencies (cps, i.e. cycles per sample, 0...0.5, linear scale, normalized w.r.t. the sampling frequency)
a - amplitudes (0...1, in linear scale)
ph - phase (the principal angle)
fm - frequency chirp rate (fraction of cps per sample)
am - amplitude slew rate (dB per sample)
cnf - likelihood measure of detection of sinusoid
g - group membership information (used in harmonic analysis of groups of partials)
z - binary flags of zombie nodes in trajectories
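
Because frequencies are normalized and amplitudes are linear, a conversion to physical units is often the first step when inspecting the data. The following is a minimal sketch with small hypothetical matrices standing in for model.data.f and model.data.a. It assumes that rows are partials, columns are frames, and a zero entry means that no partial is present; fs is the sampling rate of the analyzed signal.

```matlab
% Minimal sketch: convert normalized model data to physical units.
% Assumptions: rows are partials, columns are frames, zero means "no data".
fs = 44100;                         % sampling rate in Hz (hypothetical)
f  = [0.010 0.010 0.011;            % stand-in for model.data.f
      0.000 0.250 0.250];
a  = [0.50 0.25 0.10;               % stand-in for model.data.a
      0.00 1.00 0.50];

valid = f > 0;                      % entries that actually hold a partial
f_hz  = f * fs;                     % cycles per sample -> Hz
a_db  = -Inf(size(a));              % -Inf dB where there is no partial
a_db(valid) = 20*log10(a(valid));   % linear amplitude -> dB re full scale
```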

To display the data, you may just use the plot command, for example:

plot(model.data.f','.');

or

model.trj.f(model.trj.f==0)=NaN; plot(model.trj.f','.-');

For a more advanced study of sinusoidal data, trajectory data, zombie nodes, etc., it is helpful to show them in the context of the signal spectrogram. For this purpose, you may use

sinmod_compare(signal,frame_length,frame_hop,data1,style1,data2,style2,...)

Sinmod_compare draws your data similarly to the plot command, but on the background of the signal spectrogram. It is important to give the same frame length and hop as used in the sinusoidal analysis, since these parameters determine the alignment of data points with the centers of frames. A further advantage of sinmod_compare over plot is that it handles the zero-valued entries in trajectories, which are not shown. Apart from typical drawing style descriptions (like 'k.', 'w.-', '*', etc.), there is an additional line style selected by the 'o-x' string, which makes each trajectory appear as a line starting with an o (birth) and ending with an x (death).
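
The role of the frame length and hop in this alignment can be seen in a minimal sketch that maps frame indices to frame-center times and hides zero entries. This is not the sinmod_compare code. It assumes that frame k (1-based) starts at sample (k-1)*frame_hop and that rows are trajectories and columns are frames; all values, including the small sparse matrix standing in for model.trj.f, are hypothetical.

```matlab
% Minimal sketch: place trajectory points at frame centers and hide zeros.
% Assumptions: frame k (1-based) starts at sample (k-1)*frame_hop;
% rows are trajectories, columns are frames.
fs = 44100; frame_length = 1024; frame_hop = 256;      % hypothetical values
trj_f = sparse([0.01 0.01 0 0; 0 0.02 0.021 0.022]);   % stand-in for model.trj.f

F = full(trj_f);                    % work on a full copy of the sparse matrix
F(F == 0) = NaN;                    % NaN entries are skipped when plotting
n_frames = size(F, 2);
t_center = ((0:n_frames-1)*frame_hop + frame_length/2) / fs;   % seconds
F_hz = F * fs;                      % cycles per sample -> Hz
```

With these variables, plot(t_center, F_hz', '.-') draws one line per trajectory on a time axis in seconds. A wrong frame_length or frame_hop shifts or stretches every point relative to the spectrogram.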

PLEASE NOTE: THIS PAGE IS UNDER CONSTRUCTION