Sinusoidal modeling (SM) and hybrid sinusoidal + transient + noise modeling (STN) are well-established signal processing frameworks applicable to speech and audio analysis, time and pitch scaling, enhancement, restoration, source separation, automatic recognition, watermarking, compression, and synthesis [1-11]. Within an STN model, the signal is modeled as a sum of transients (a sequence of damped sinusoids or short-time impulses), quasi-sinusoids with continuously varying magnitudes and frequencies (called the deterministic component), and a stochastic component (noise) whose short-time power spectral envelope changes over time. To obtain such a representation, the input signal is decomposed into impulse-like wideband components, narrow-band components, and the residual.

For sinusoidal analysis, the signal is split into overlapping segments called frames, the magnitude spectrum is computed in each frame, quasi-sinusoidal partials are detected as distinctive peaks, and the corresponding parameters (frequency, amplitude and phase) are estimated. Subsequently, these peaks are linked between neighboring frames by a tracking algorithm that takes various continuity criteria into account and forms sinusoidal trajectories.

The signal can be reconstructed from the trajectory data by generating quasi-sinusoidal components sample by sample, with continuously variable parameters interpolated between the estimated and linked values. This reconstructed signal may be subtracted from the original signal in order to obtain a residual. The residual contains mostly the non-sinusoidal part of the signal and is usually modeled as non-stationary noise with varying intensity and spectral envelope. The spectral envelope may be modeled as an all-pole filter response, e.g. by means of the LPC (linear predictive coding) technique.
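The peak detection step described above can be illustrated with a minimal sketch for a single frame. This is not the toolbox code: the test signal, the frame length, the zero-padding factor, the -20 dB threshold and the parabolic interpolation of the dB spectrum are all assumptions chosen for illustration, and only base MATLAB functions are used.

```matlab
% Illustrative sketch only (not the toolbox code): detect spectral peaks in one
% analysis frame and estimate frequency, amplitude and phase of each partial.
fs   = 8000;                              % sampling rate in Hz (assumed)
N    = 1024;                              % frame length in samples (assumed)
n    = (0:N-1)';
x    = 0.8*cos(2*pi*440/fs*n) + 0.3*cos(2*pi*1250/fs*n + 1);   % test frame

w     = 0.5 - 0.5*cos(2*pi*n/N);          % periodic Hann window
Nfft  = 4*N;                              % zero padding for a denser frequency grid
X     = fft(x.*w, Nfft);
magdB = 20*log10(abs(X(1:Nfft/2))*2/sum(w) + eps);   % a unit sinusoid reads about 0 dB

% A bin is a peak if it exceeds both neighbours and a fixed threshold
thr = -20;                                % dB, hypothetical detection threshold
k   = find(magdB(2:end-1) >  magdB(1:end-2) & ...
           magdB(2:end-1) >= magdB(3:end)   & ...
           magdB(2:end-1) >  thr) + 1;

% Parabolic interpolation through the three bins around each peak
al = magdB(k-1);  be = magdB(k);  ga = magdB(k+1);
d  = 0.5*(al - ga) ./ (al - 2*be + ga);   % fractional bin offset
freq = (k - 1 + d) / Nfft;                % frequency in cycles per sample (0..0.5)
ampl = 10.^((be - 0.25*(al - ga).*d)/20); % linear amplitude

% Phase referred to the frame centre (sample N/2), where the window is symmetric
phs = angle(X(k) .* exp(1j*2*pi*(k-1)/Nfft*N/2));

disp([freq*fs, ampl, phs]);               % one row per detected partial
```

The threshold is deliberately set above the Hann sidelobe level so that sidelobes of the stronger partial are not reported as separate peaks; a practical detector would need a more robust criterion.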

Developers:

Maciej Bartkowiak, **mbartkow_at_multimedia.edu.pl**

Tomasz Żernicki, **tomasz.zernicki_at_zylia.pl**

Łukasz Januszkiewicz, **ljanuszkiewicz_at_multimedia.edu.pl**

References:

1. T. F. Quatieri, R. J. McAulay, "Speech analysis/synthesis based on a sinusoidal representation", IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 34, no. 4, 1986.
2. J. O. Smith, X. Serra, "PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation", Proc. Int. Computer Music Conference (ICMC), 1987.
3. G. Peeters, X. Rodet, "SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum", Proc. Int. Computer Music Conference (ICMC), Beijing.
4. K. Fitz, L. Haken, "Sinusoidal modeling and manipulation using Lemur", Computer Music Journal, vol. 20, no. 4, Winter 1996.
5. M. Macon, L. Jensen, J. Oliverio, M. Clements, E. George, "A Singing Voice Synthesis System Based on Sinusoidal Modeling", Proc. ICASSP'97, vol. 1, 1997.
6. T. Verma, T. H. Y. Meng, "Time Scale Modification Using a Sines + Transients + Noise Model", Proc. DAFx'98, Barcelona, 1998.
7. H. Purnhagen, B. Edler, Ch. Ferekidis, "Object-Based Analysis/Synthesis Audio Coder for Very Low Bit Rates", Paper 4747, Proc. 104th AES Convention, 1998.
8. T. Tolonen, "Methods for Separation of Harmonic Sound Sources Using Sinusoidal Modeling", Paper 4958, Proc. 106th AES Convention, May 1999.
9. L. Girin, S. Marchand, "Watermarking of speech signals using the sinusoidal model and frequency modulation of the partials", Proc. ICASSP'04, vol. 1, 2004.
10. J. Jensen, J. H. L. Hansen, "Speech enhancement using a constrained iterative sinusoidal model", IEEE Trans. SAP, vol. 9, no. 7, 2001.
11. T. Heittola, A. Klapuri, T. Virtanen, "Musical Instrument Recognition in Polyphonic Audio Using Source-Filter Model for Sound Separation", Proc. ISMIR 2009.

The **hybrid sinusoidal modeling Matlab toolbox** offered on this webpage is an open-source project
intended strictly for research and educational purposes. The package is currently
under development. The current version is the second release; however, occasional bugs
may still occur. We welcome any comments, bug reports and suggestions for further development.

The SinMod toolbox supports the following audio signal modeling tasks:

- General sinusoidal (unconstrained) and harmonic sinusoidal analysis/synthesis
- Hybrid (sinusoidal + transients + noise) analysis/synthesis with optional pitch and speed manipulations
- Estimation and tracking of single and multiple fundamental frequencies

The most important features of this implementation are:

- **High quality**: the reconstructed audio quality is near transparent even for complicated wideband music, which makes it stand up well against other implementations intended mostly for speech and simple sounds
- **Modularity**: it consists of a number of independent modules performing separate signal modeling tasks: sinusoidal analysis, tracking, transient detection, spectral envelope estimation, etc.
- **Run-time optimization**: it is optimized for fast calculations and features a data caching system that avoids repeated computations on identical data
- **Structured data**: the input/output data and parameters are organized in simple structures that can be easily studied or visualized; however, the SDIF file format is not supported (yet)

The toolbox also features an advanced, multi-criterion and heavily parameterized offline tracking utility that combines a broad set of rules for track continuation and birth/death, with the aim of obtaining the best possible sinusoidal tracking results. Accurate and reliable tracking is essential for high-quality (or even transparent) re-synthesis, and it is also very important in signal analysis applications in which no re-synthesis is performed.

| Spectrogram of the original signal | Spectrogram of the resynthesized signal |
|---|---|
| Haffner (detail) | Haffner (detail) |
| Salvation | Salvation, after 150% time stretching of all model data |
| Harpsichord (fragment) | Harpsichord, after 200% time stretching of all model data |
| Badu | Badu, after 10% pitch shifting of all model data |

Download the toolbox package (124kB).

Download a set of important utilities (72kB).

The SinMod Toolbox requires MATLAB 7.6 (R2008a) or later and the Signal Processing Toolbox.

In addition, to access all functions of the Toolbox, you will need the following external public domain toolboxes and utilities:

- WarpTB - Matlab Toolbox for Warped DSP by Aki Härmä and Matti Karjalainen
- YIN fundamental frequency estimator by Alain de Cheveigné
- linframe, winit and a few other vectorized data manipulation utilities by Marios Athineos

Should you encounter any problems with getting hold of these, do not hesitate to contact us.

A complete sequence of audio signal analysis and synthesis is controlled by a single function:

`[synsig,model] = model_ansyn(signal,config,pitch,speed)`

The required arguments are a vector of samples (signal) and a structure of configuration parameters (config). The remaining
two optional arguments define simple transformations applied during synthesis: the pitch scaling ratio and the time stretching ratio.
The outputs are the resynthesized signal (synsig) and a structure (model) holding all parameters and data needed to re-synthesize
the signal.

The configuration structure should have a strictly defined list of fields. To prepare such a structure, call

`config = model_conf(freq_res,time_res)`

This will automatically set up the config structure for you, with parameter values defined **inside model_conf.m**.
You may edit model_conf.m and modify these parameter values. The function also has two optional parameters
that modify the default settings for the analysis frame length and offset (hop). After calling model_conf,
you may also modify any parameter value from the command line.
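A typical session might look like the sketch below. It uses only the two signatures quoted above; the test tone, the sampling rate, the column-vector orientation of the signal and the interpretation of a pitch ratio of 1.1 as a 10% upward shift are assumptions. The toolbox calls are guarded so that the snippet also runs when SinMod is not on the MATLAB path.

```matlab
% Usage sketch based on the signatures quoted above. The toolbox calls are
% executed only if SinMod is found on the MATLAB path.
fs     = 44100;                                       % sampling rate in Hz (assumed)
t      = (0:fs-1)'/fs;                                % one second of time stamps
signal = 0.5*sin(2*pi*440*t) + 0.25*sin(2*pi*880*t);  % toy test tone (column vector, assumed)

have_sinmod = exist('model_conf', 'file') == 2 && exist('model_ansyn', 'file') == 2;
if have_sinmod
    config = model_conf();                  % default settings from model_conf.m
    disp(fieldnames(config));               % inspect the parameter names before editing any

    % Plain analysis and resynthesis, no transformation
    [synsig, model] = model_ansyn(signal, config);

    % Same analysis with a pitch ratio of 1.1 and unchanged speed
    % (assumed meaning of the two optional arguments)
    shifted = model_ansyn(signal, config, 1.1, 1);
else
    disp('SinMod toolbox not found on the path; skipping analysis.');
end
```

Listing the field names first avoids guessing parameter names, since the configuration structure is expected to have a strictly defined set of fields.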

The call of **model_ansyn** will perform a sequence of consecutive analysis-synthesis steps:

- Transient detection (if transient modeling enabled)
- Transient analysis (estimation of transient parameters)
- Transient synthesis
- Subtraction of the synthesized transients from the original signal
- Fundamental frequency estimation (if the analysis type is 'harmonic', or tracking is based on harmonic groups)
- Sinusoidal analysis or Harmonic analysis (depending on the analysis type option)
- Tracking of fundamental frequencies (if harmonic groups enabled)
- Tracking of sinusoidal trajectories or harmonic groups (depending on the analysis type option)
- Simplification of the model by merging of nearby trajectories (optional)
- Re-estimation of sinusoidal parameters on the basis of time-varying model data (optional)
- Synthesis of sinusoids from unmodified data (optional)
- Calculation of the noise residual by spectral masking or plain subtraction (optional, depending on settings)
- Spectral modeling of the residual by applying the LPC or WLPC technique (optional)
- Modification (pitch and speed scaling) of sinusoidal and noise data (optional)
- Synthesis of modified sinusoids (if required by the above)
- Re-synthesis of noise (optional)
- Mixing of synthesized sinusoids, transients, and synthesized noise
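The modification and synthesis steps in the list above can be illustrated with a small, self-contained sketch. It is not the toolbox implementation: the track matrices, the hop size and the two ratios are made-up values, phases are obtained by integrating frequency rather than from estimated phase data, and a speed value of 2 is assumed to mean "twice as long".

```matlab
% Illustrative sketch (not the toolbox code): additive resynthesis of a few
% partial tracks with optional pitch and time scaling.
hop   = 256;                          % analysis hop in samples (assumed)
F     = [0.010 0.010 0.011 0.012;     % rows: partials, columns: frames
         0.020 0.020 0.022 0.024];    % frequencies in cycles per sample
A     = [0.5 0.5 0.4 0.0;             % linear amplitudes
         0.2 0.3 0.3 0.0];
pitch = 1.5;                          % hypothetical pitch scaling ratio
speed = 2.0;                          % hypothetical time stretching ratio

Fmod   = F * pitch;                   % pitch scaling multiplies all frequencies
Amod   = A .* (Fmod < 0.5);           % mute nodes that would exceed Nyquist
hopmod = round(hop * speed);          % stretching spaces the frames further apart

nfrm = size(F, 2);
tf   = (0:nfrm-1) * hopmod;           % frame positions in output samples
n    = 0:tf(end);                     % output sample indices
y    = zeros(size(n));
for p = 1:size(F, 1)
    f_n = interp1(tf, Fmod(p,:), n);  % per-sample frequency, linear interpolation
    a_n = interp1(tf, Amod(p,:), n);  % per-sample amplitude, linear interpolation
    phi = 2*pi*cumsum(f_n);           % phase as the running sum of frequency
    y   = y + a_n .* cos(phi);        % accumulate the partial
end
```

Because time stretching only changes the frame spacing, the pitch of each partial is unaffected by `speed`, and `pitch` does not change the duration.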

The optional output of model_ansyn is a structure of model parameters, featuring the following fields:

- `trans: [struct]` contains sets of parameters for each individual transient:
  - `t_peak, t_start`: location of the transient, in samples
  - `env_b, env_chi`: envelope shape parameters
  - `freqs, ampls, phase`: parameters of the underlying modulating sinusoids
- `data: [struct]` contains a set of full matrices with raw data from signal analysis: `f, a, ph, fm, am, cnf, g, f0`
- `trj: [struct]` contains a set of sparse matrices with parameters arranged in sinusoidal trajectories: `f, a, ph, fm, am, cnf, z, g`
- `noise: [struct]` contains spectral model parameters of the noise residual:
  - `lpc_coeffs: [double matrix]`: LPC/WLPC parameters for each frame
  - `lpc_gain: [double vector]`: noise energy in each frame

In the above list, matrices with sinusoidal data represent respectively:

- `f`: frequencies (in cycles per sample (cps), 0..0.5, linear scale, normalized w.r.t. the sampling frequency)
- `a`: amplitudes (0...1, linear scale)
- `ph`: phase (the principal angle)
- `fm`: frequency chirp rate (fraction of cps per sample)
- `am`: amplitude slew rate (dB per sample)
- `cnf`: likelihood measure of the detection of a sinusoid
- `g`: group membership information (used in harmonic analysis of groups of partials)
- `z`: binary flags of zombie nodes in trajectories
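Since the stored values are normalized, a conversion to physical units is often the first step in any study of the data. The sketch below uses small hand-made sparse matrices as stand-ins for `model.trj.f` and `model.trj.a`; the sampling rate and the layout (rows as trajectories, columns as frames, zeros marking missing nodes) are assumptions inferred from the plotting examples on this page.

```matlab
% Sketch with hand-made stand-in data; in practice the matrices would come
% from model.trj as returned by model_ansyn.
fs    = 44100;                                  % sampling rate in Hz (assumed)
trj.f = sparse([0.010 0.0101 0.0102 0;          % cycles per sample, 0 = no node
                0     0.0200 0.0201 0.0202]);
trj.a = sparse([0.50  0.40   0.30   0;          % linear amplitudes
                0     0.10   0.20   0.10]);

idx  = find(trj.f);                    % linear indices of the existing nodes
f_hz = full(trj.f(idx)) * fs;          % frequencies in Hz
a_db = 20*log10(full(trj.a(idx)));     % amplitudes in dB relative to full scale
len  = full(sum(trj.f ~= 0, 2));       % length of each trajectory in frames

fprintf('%d nodes, longest trajectory spans %d frames\n', numel(idx), max(len));
```

Working on the non-zero entries only keeps the empty cells of the sparse matrices out of the logarithm and out of any statistics.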

To display the data, you may just use the `plot` command, for example:

`plot(model.data.f','.');`

or

`model.trj.f(model.trj.f==0)=NaN; plot(model.trj.f','.-'); `

For a more advanced study of sinusoidal data, trajectory data, zombie nodes, etc., it is always advantageous
to show them in the context of the signal spectrogram. For this purpose, you may use

`sinmod_compare(signal,frame_length,frame_hop,data1,style1,data2,style2,...)`

sinmod_compare draws your data similarly to the plot command, on the background of the signal spectrogram.
It is important to give the same frame length and hop as used in the sinusoidal analysis, since these
parameters are crucial for the alignment of data points to the centers of frames. A further advantage of sinmod_compare
over plot is that it takes care of the zero-valued entries in trajectories, which are not shown.
Apart from typical drawing style descriptions (like 'k.', 'w.-', '*', etc.), there is an additional line style
defined by the 'o-x' string, which makes each trajectory appear as a line starting with an o (birth) and ending with
an x (death).
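Why the frame length and hop matter for the alignment can be seen from the mapping between frame index and time. The sketch below assumes that frame k (1-based) starts at sample (k-1)*frame_hop and that its data point belongs to the frame center; the toolbox may use a different convention, and all numeric values are illustrative.

```matlab
% Sketch of the frame-to-time mapping under the assumed framing convention.
fs           = 44100;    % sampling rate in Hz (assumed)
frame_length = 2048;     % should match the value used in the analysis
frame_hop    = 512;      % should match the value used in the analysis
nframes      = 5;        % number of analysis frames (illustrative)

k        = 1:nframes;
center_n = (k - 1)*frame_hop + frame_length/2;   % frame centers in samples
center_t = center_n / fs;                        % frame centers in seconds

fprintf('frame %d is centered at %.4f s\n', [k; center_t]);
```

With a wrong hop, the error in the time position grows linearly with the frame index, while a wrong frame length shifts all data points by a constant offset.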