New bandwidth extension techniques for wideband perceptual audio coding
- Tomasz Żernicki,
"Kompresja cyfrowych sygnałów fonicznych z łącznym wykorzystaniem rozszerzania widma i modelowania"
(Digital sound compression including bandwidth extension and signal modeling), in Polish,
Ph.D. dissertation, Poznan University of Technology, 2010.
For more information, contact the author at tomasz.zernicki_at_gmail.com
Both the High Efficiency profile of the MPEG-4 AAC standard (MPEG-4 AAC HE) and the forthcoming new standard
MPEG-D USAC (Unified Speech and Audio Coding) include a technique called Spectral Band Replication (SBR).
SBR rests on the observation that the signal spectrum in the high-frequency (HF) bands is
highly correlated with the signal spectrum in the low-frequency (LF) band. It is therefore possible to replace
the HF signal components with a modified version of the LF band, which avoids transmitting the HF signal
itself. The encoder transmits only a very small amount of control information, which the decoder uses
to shape the spectrum in the HF band. In addition, sinusoidal and noise components that cannot be
obtained by copying may be synthesized, to a limited extent, in the SBR decoder.
In general, SBR reconstructs HF components by employing a patching algorithm, i.e. copying of LF spectral content
in the time-frequency plane created through filterbank analysis of the reconstructed signal.
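The patching idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the SBR decoder itself: the function name `sbr_patch`, the cyclic reuse of LF subbands, and the per-band `target_energy` list (standing in for the transmitted control information) are all hypothetical choices made for the example.

```python
def sbr_patch(low_bands, n_high, target_energy):
    """Copy LF subband signals upward and rescale them to a target envelope.

    low_bands     : list of LF subbands, each a list of subband samples
    n_high        : number of HF subbands to reconstruct
    target_energy : desired mean energy per HF subband (assumed control data)
    """
    n_low = len(low_bands)
    high_bands = []
    for k in range(n_high):
        # Patching: reuse the LF subbands cyclically as HF source material.
        source = low_bands[k % n_low]
        energy = sum(abs(x) ** 2 for x in source) / len(source)
        # Envelope adjustment: scale the copy so its mean energy hits the target.
        gain = (target_energy[k] / energy) ** 0.5 if energy > 0 else 0.0
        high_bands.append([gain * x for x in source])
    return high_bands


# Two LF subbands, three HF subbands to fill.
low = [[1.0, -1.0, 1.0, -1.0], [2.0, 0.0, -2.0, 0.0]]
high = sbr_patch(low, 3, [0.25, 0.5, 4.0])
for band in high:
    print(band)
```

Note that the copy only changes the level of each subband. Whatever fine structure the LF source has (pitch, modulation, noisiness) is carried into the HF band unchanged, which is where the limitations discussed next come from.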
In certain situations, described below, the patching algorithm cannot reconstruct some important components:
- If the signal has prominent components with a fundamental frequency near or above fSBR (the frequency at which the SBR range begins).
This includes high-pitched sounds such as orchestral bells and other percussive instruments. In this case, no
shifting or scaling can re-create such components in the SBR range. The SBR tool may use an additional
technique called "sinusoidal coding" to inject a sinusoidal component into a certain subband of the QMF
filterbank. This component has a fixed frequency and amplitude and a low frequency resolution, so it causes a
significant discrepancy of timbre due to the added inharmonicity.
- If the signal has a significantly varying frequency (e.g. vibrato modulation), its energy in the lower band
is spread over a range of transform coefficients, which are subsequently distorted by quantization. At low
bit rates the local SNR becomes very low, and a partial that was originally purely tonal may no longer be
regarded as tonal. In such a case, patching leads to additional artifacts, since the frequency modulation is not
properly scaled (in the copied content, modulation depth does not increase with partial order).
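Both cases can be made concrete with a little arithmetic. The Python sketch below is illustrative only and rests on stated assumptions: a uniform 64-band filterbank at a 44.1 kHz sampling rate, an injected sinusoid placed at the centre of its subband, and a harmonic tone whose k-th partial follows k times the fundamental. The function names and the example frequencies are hypothetical.

```python
def subband_center_hz(freq_hz, sample_rate_hz=44100.0, n_bands=64):
    """Centre frequency of the uniform subband that contains freq_hz."""
    band_width = sample_rate_hz / (2 * n_bands)   # bands cover 0 .. fs/2
    band_index = int(freq_hz // band_width)
    return (band_index + 0.5) * band_width


def vibrato_deviation_hz(partial, depth_hz):
    """Peak frequency deviation of a partial of a harmonic tone whose
    fundamental deviates by depth_hz (deviation grows with partial order)."""
    return partial * depth_hz


# Case 1: a partial at 7040 Hz is replaced by a sinusoid at the subband centre.
partial_hz = 7040.0
center_hz = subband_center_hz(partial_hz)
detune_hz = center_hz - partial_hz
print(f"injected at {center_hz:.2f} Hz, detuned by {detune_hz:.2f} Hz")

# Case 2: partial 3 of a tone with 5 Hz vibrato depth is copied upward to
# stand in for partial 9. The copy keeps the deviation of its source.
copied = vibrato_deviation_hz(3, 5.0)
wanted = vibrato_deviation_hz(9, 5.0)
print(f"copied deviation {copied} Hz, harmonic deviation would be {wanted} Hz")
```

In the first case the injected sinusoid no longer sits on the harmonic grid of the tone, which is heard as inharmonicity. In the second case the copied partial wobbles only a third as much as a true ninth partial would, so the reconstructed HF band does not follow the vibrato of the LF band.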
New HFR techniques
In this project, two new high-frequency reconstruction (HFR) tools are proposed for improved coding
of tonal HF components in audio compression employing the SBR tool. They are described in the following publications:
- Tomasz Żernicki, Maciej Bartkowiak, Marek Domański,
"Improved coding of tonal components in audio techniques utilizing the SBR tool",
ISO/IEC JTC1/SC29/WG11 MPEG 2010 / M17914, Geneva, Switzerland, July 2010
- Tomasz Żernicki, Marek Domański,
"Improved coding of tonal components in MPEG-4 AAC with SBR",
16th European Signal Processing Conference, August 25-29, 2008, Lausanne, Switzerland
- Tomasz Żernicki, Maciej Bartkowiak,
"Audio bandwidth extension by frequency scaling of sinusoidal components",
125th Convention of the Audio Engineering Society, AES Preprint 7622, 2-5 October 2008, San Francisco, USA