This page provides additional information and demonstrations for the paper:

**ANALYSIS/SYNTHESIS OF HARMONIC SOUNDS BASED ON AM/FM SCALING OF A PROTOTYPE SIGNAL**

submitted to IEEE Transactions on Audio, Speech and Language Processing.

**Abstract.** The paper describes an enhancement of the harmonic sinusoidal modeling technique
that extends its ability to efficiently represent sounds that are not purely tonal.
Reconstruction accuracy is improved by introducing additional modulating components into the
instantaneous frequencies and instantaneous amplitudes of the partials.
For this purpose, a classical heterodyne analysis technique is combined with principal component
analysis of the partials. The sound is represented by a discrete harmonic envelope and a narrow-band
prototype signal that carries the modulations. We show experimentally that the model can
reproduce transients and a significant part of the mechanical noise in musical sounds, and that it yields
good subjective quality at SNRs of 15 dB to 26 dB.

In the paper we propose a model for harmonic audio signals that may be used in object-based **low bit rate
coding**. The high-level, low-resolution spectral characteristics of the signal are represented by a
two-dimensional harmonic envelope (HE, shown below, middle), a complex-valued structure similar
(but not identical) to the data produced by Beauchamp's harmonic phase vocoder and Serra's harmonic
sinusoidal model. High temporal resolution is provided by the second part of the model,
the prototype signal (below, right).
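To make the two-part representation concrete, here is a minimal container sketch in Python with NumPy. The class name, the field names, the assumption that the decoder's scaling constants hold one value per partial, and the assumption that the envelope holds exactly one sample every R prototype samples are all illustrative choices, not the paper's data format.

```python
import math
from dataclasses import dataclass

import numpy as np


@dataclass
class HarmonicPrototypeModel:
    """Hypothetical container for the two-part representation (illustrative field names).

    he        : complex harmonic envelope, shape (K, M): K partials x M decimated frames
    prototype : narrow-band prototype signal at the full sample rate, shape (N,)
    gains     : complex scaling constants, assumed to be one per partial, shape (K,)
    R         : decimation factor of the envelope (e.g. 500 or 1000)
    """
    he: np.ndarray
    prototype: np.ndarray
    gains: np.ndarray
    R: int

    def __post_init__(self):
        K, M = self.he.shape
        if self.gains.shape != (K,):
            raise ValueError("expected one scaling constant per partial")
        # assumed framing: one envelope sample every R prototype samples
        if M != math.ceil(len(self.prototype) / self.R):
            raise ValueError("envelope length must equal ceil(N / R)")

    @property
    def num_partials(self) -> int:
        return self.he.shape[0]
```

The point of the layout is that only the small `he` matrix grows with the number of partials, while the single prototype carries the fine temporal detail shared by all of them.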

**Advantages:** The fundamental advantage of this representation is that both parts are much
less complex signals than the original signal and can therefore be encoded very efficiently. The paper describes
the analysis and synthesis process in detail; on this page we demonstrate that our model can
accurately represent the significant acoustic features of the sound: its pitch, texture, timbre, and tonal/noisy
character. In a traditional sinusoidal model (SM) this would require a high data rate (a small hop between consecutive
frames, or an excessively large number of partials), which is prohibitive in compression applications.
In our model the data rate is low (partial control parameters are typically subsampled 1:500 or 1:1000); however,
the inclusion of the prototype signal, which is mostly narrow-band, makes it possible to efficiently represent the dominant
low-level fluctuations of the instantaneous parameters (instantaneous frequency, IF, and instantaneous amplitude, IA)
that are responsible for the acoustic features mentioned above.
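The subsampling ratios quoted above translate into concrete envelope rates. The short calculation below assumes a 44.1 kHz sample rate and 20 partials, both hypothetical example values, and counts only the harmonic-envelope values; the prototype signal has to be coded separately.

```python
def control_rate(fs_hz: float, R: int) -> float:
    """Control-parameter samples per second for one partial after 1:R subsampling."""
    return fs_hz / R


def envelope_rate(fs_hz: float, R: int, num_partials: int) -> float:
    """Complex HE values per second for all partials (the prototype is coded separately)."""
    return num_partials * control_rate(fs_hz, R)


# hypothetical example: 44.1 kHz audio, 20 partials
for R in (500, 1000):
    print(f"R={R}: {control_rate(44100.0, R):.1f} values/s per partial, "
          f"{envelope_rate(44100.0, R, 20):.0f} values/s for 20 partials")
```

At 44.1 kHz, R = 500 gives 88.2 envelope samples per second per partial (a hop of about 11.3 ms), and R = 1000 gives 44.1 (about 22.7 ms).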

**How it works:** The core of the system is a

Each complex-valued output signal of the analyzer is then decimated (1:R). This low-resolution information is stored in the harmonic envelope (HE).
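The analyzer's internals are not described on this page, so the following is only a sketch of a generic harmonic heterodyne analyzer. It assumes a known fundamental-frequency track `f0_hz` (a hypothetical input, e.g. from a pitch tracker), shifts harmonic k down to 0 Hz by multiplying with exp(-j k φ[n]), and uses a length-R moving average as a stand-in low-pass filter before keeping every R-th sample. The function name `heterodyne_analyze` is illustrative.

```python
import numpy as np


def heterodyne_analyze(x, f0_hz, fs_hz, num_partials, R):
    """Harmonic heterodyne analysis followed by 1:R decimation (illustrative sketch).

    x     : real input signal, shape (N,)
    f0_hz : fundamental-frequency track in Hz, shape (N,), assumed to be known
    Returns (baseband, he): complex arrays of shape (K, N) and (K, ceil(N / R)).
    """
    x = np.asarray(x, dtype=float)
    phase = 2 * np.pi * np.cumsum(f0_hz) / fs_hz
    phase -= phase[0]                                   # running phase of the fundamental, starting at 0
    k = np.arange(1, num_partials + 1)[:, None]         # partial orders 1..K as a column
    # shift partial k down to 0 Hz; the factor 2 maps a real cosine A*cos(.) to amplitude A
    shifted = 2 * x[None, :] * np.exp(-1j * k * phase[None, :])
    # stand-in low-pass filter: length-R moving average (a real analyzer would use a designed filter)
    kernel = np.ones(R) / R
    baseband = np.array([np.convolve(row, kernel, mode="same") for row in shifted])
    he = baseband[:, ::R]                               # 1:R decimation gives the harmonic envelope
    return baseband, he
```

With a steady 100 Hz tone at 8 kHz and R = 80 (one fundamental period), the moving average cancels all other harmonics exactly away from the edges, so each HE row settles at that partial's amplitude.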

For each baseband signal, its high-frequency residual (the remainder after subtracting the upsampled version
of the decimated signal) is obtained (above, right). The residuals of a single sound are then subjected to Principal Component
Analysis (PCA). The aim of this analysis is to identify and extract the common residual amplitude modulation (AM) that enhances
the representation stored in the HE.
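The residual computation can be sketched as follows. The interpolator used for upsampling is not specified here, so linear interpolation is an assumption; note that `np.interp` holds the last value beyond the final decimated sample, so the residual is only meaningful up to that point. The helper names are hypothetical.

```python
import numpy as np


def upsample_linear(coarse, R, N):
    """Linearly interpolate a complex sequence sampled at n = 0, R, 2R, ... onto n = 0..N-1."""
    coarse_n = np.arange(len(coarse)) * R
    n = np.arange(N)
    # np.interp works on real data, so interpolate real and imaginary parts separately
    return np.interp(n, coarse_n, coarse.real) + 1j * np.interp(n, coarse_n, coarse.imag)


def high_frequency_residual(baseband, R):
    """Subtract from each baseband row its decimated-then-upsampled version."""
    baseband = np.asarray(baseband, dtype=complex)
    K, N = baseband.shape
    smooth = np.array([upsample_linear(row[::R], R, N) for row in baseband])
    return baseband - smooth
```

Anything the coarse envelope can follow (here, straight-line trends) vanishes from the residual, while fluctuations faster than the envelope's sampling survive.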
PCA represents the set of input residual signals $a_k[n]$ as linear combinations of other signals,
which we may regard as a local orthogonal basis. Only one of these signals, the one corresponding to the principal eigenvector
of the covariance matrix, is preserved; we denote it by $a_0[n]$. To reverse the linear transformation
in the decoder, we also need a short vector of complex-valued scaling constants, $\mathbf{G}_1$.
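A minimal rank-1 PCA sketch: stack the residuals as rows, take the principal eigenvector of their covariance matrix as the scaling constants, and project onto it to obtain the single retained component. Two assumptions are worth stating: the covariance is left uncentered (the residuals are high-pass and roughly zero-mean), and for complex residuals the component returned here is in general complex, so the step that makes the page's $a_0[n]$ real-valued is not reproduced. The function name is hypothetical.

```python
import numpy as np


def principal_residual(residuals):
    """Keep only the principal component of the K residual signals.

    residuals : array (K, N), row k holds a_k[n] (may be complex)
    Returns (a0, g): a0 has shape (N,), g has shape (K,), and np.outer(g, a0)
    is the best rank-1 least-squares approximation of `residuals`.
    """
    X = np.asarray(residuals)
    C = X @ X.conj().T / X.shape[1]          # K x K (uncentered) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    g = eigvecs[:, -1]                       # principal eigenvector (unit norm)
    a0 = g.conj() @ X                        # project the residuals onto it
    return a0, g
```

The decoder only needs the K constants in `g` to spread `a0` back over the partials, which is why the scaling vector can be short.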

The final **prototype signal** is the product of an AM term and an FM term. The FM term is defined by $IF_0[n]$,
and the AM term by the real-valued PCA output offset by an arbitrary constant $b > \max_n |a_0[n]|$, which
prevents the amplitude from going below zero (overmodulation). This simple trick allows both terms to be
separated during demodulation. An illustrative example below (left) shows the idea (not to scale). Real examples
of the prototype IF and IA are also shown below (middle and right).
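The construction and its demodulation can be sketched as below, assuming $IF_0[n]$ is given in Hz at sample rate `fs_hz` and that the FM phase is its running sum. The rule `margin * max|a0|` for picking $b$ is a hypothetical choice; the page only requires $b > \max_n |a_0[n]|$.

```python
import numpy as np


def make_prototype(a0, if0_hz, fs_hz, margin=1.1):
    """Build p[n] = (b + a0[n]) * exp(j * phi0[n]) with an offset b > max|a0[n]|.

    margin (> 1) is a hypothetical rule for choosing b.
    """
    a0 = np.asarray(a0, dtype=float)
    peak = np.max(np.abs(a0))
    b = margin * peak if peak > 0 else 1.0
    phi0 = 2 * np.pi * np.cumsum(if0_hz) / fs_hz      # FM term: phase accumulated from IF_0[n]
    return (b + a0) * np.exp(1j * phi0), b


def split_prototype(p, b):
    """Demodulate: since b + a0[n] > 0, |p| is the AM term and angle(p) the wrapped phase."""
    return np.abs(p) - b, np.angle(p)
```

Without the offset, a negative $a_0[n]$ would appear as a sign flip, i.e. a jump of π in phase, and the AM and FM terms could no longer be told apart.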

**Reconstruction** is straightforward. The sound is generated by additive
synthesis, as in a conventional sinusoidal model. Each partial is synthesized as a product of AM and FM
modulations and is formed from two complex exponentials. The first is the corresponding row of the HE, upsampled. The second
is obtained by appropriately scaling the IF and IA recovered from the prototype: the IF is scaled simply by the partial
order $k$, and the IA by the corresponding element of the scaling vector $\mathbf{G}_1$.
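The exact synthesis formula is not given on this page. The sketch below assumes one reading consistent with the residual definition above: each baseband signal is approximated by its upsampled HE row plus its scaled common residual $g_k a_0[n]$, and is then shifted to the k-th harmonic with k times the prototype phase. This additive form, the linear interpolator, and the function name are assumptions rather than the paper's method.

```python
import numpy as np


def synthesize(he, R, a0, phi0, g):
    """Additive-synthesis sketch: sum_k Re{(env_k[n] + g_k * a0[n]) * exp(j * k * phi0[n])}.

    he   : complex harmonic envelope, shape (K, M), one sample every R output samples
    a0   : residual AM signal recovered from the prototype (offset b removed), shape (N,)
    phi0 : instantaneous phase of the prototype, shape (N,)
    g    : scaling constants, one per partial, shape (K,)
    """
    he = np.asarray(he, dtype=complex)
    K, M = he.shape
    N = len(phi0)
    n = np.arange(N)
    coarse_n = np.arange(M) * R
    out = np.zeros(N)
    for k in range(1, K + 1):
        row = he[k - 1]
        # upsampled HE row (linear interpolation is an assumed interpolator)
        env = np.interp(n, coarse_n, row.real) + 1j * np.interp(n, coarse_n, row.imag)
        # add the scaled common residual, then move the partial to k times the prototype phase
        partial = (env + g[k - 1] * a0) * np.exp(1j * k * phi0)
        out += partial.real
    return out
```

With a flat unit envelope and no residual, the output reduces to a plain sum of harmonics, $\cos\varphi_0[n] + \cos 2\varphi_0[n] + \dots$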

To perform this scaling, the IF and IA have to be recovered from the prototype signal. Because the prototype is narrow-band and mono-component, this can be done quite reliably with the standard Gabor method based on the Hilbert transform. Strictly speaking, we do not estimate the instantaneous frequency (IF), which would involve phase unwrapping. For synthesis of partials we simply measure the instantaneous phase of the prototype φ
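The Gabor analytic signal can be computed with the FFT. This sketch assumes the prototype is available as a real-valued signal (the page does not say how it is stored) and uses the same spectral construction as `scipy.signal.hilbert`, written with NumPy only. The function names are hypothetical; the phase is returned wrapped, without unwrapping.

```python
import numpy as np


def analytic_signal(x):
    """Gabor analytic signal x + j*H{x}, built by zeroing negative frequencies in the FFT."""
    x = np.asarray(x, dtype=float)
    N = x.size
    h = np.zeros(N)
    if N % 2 == 0:
        h[0] = h[N // 2] = 1.0   # keep DC and Nyquist once
        h[1:N // 2] = 2.0        # double positive frequencies
    else:
        h[0] = 1.0
        h[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def prototype_ia_and_phase(p_real):
    """Instantaneous amplitude and wrapped instantaneous phase of a real prototype."""
    z = analytic_signal(p_real)
    return np.abs(z), np.angle(z)
```

Skipping unwrapping is safe for partial synthesis because the partial order k is an integer: $e^{jk(\varphi + 2\pi m)} = e^{jk\varphi}$ for any integer m, so the wrapped phase produces exactly the same partial as the unwrapped one.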
