New bandwidth extension techniques for wideband perceptual audio coding

Reference

  1. Tomasz Żernicki,
    "Kompresja cyfrowych sygnałów fonicznych z łącznym wykorzystaniem rozszerzania widma i modelowania"
    (Digital sound compression including bandwidth extension and signal modeling), in Polish,
    Ph.D. dissertation, Poznan University of Technology, 2010.

For more information, contact the author at tomasz.zernicki_at_gmail.com

Introduction

The High Efficiency profile of the MPEG-4 AAC standard (MPEG-4 HE-AAC), as well as the forthcoming MPEG-D USAC (Unified Speech and Audio Coding) standard, includes a technique called Spectral Band Replication (SBR). SBR is based on the observation that the signal spectrum in the high-frequency (HF) bands is highly correlated with the signal spectrum in the low-frequency (LF) band. It is therefore possible to replace the HF signal components with a modified version of the LF band and avoid transmitting the HF signal at all. The encoder additionally transmits a very small amount of control information, which the decoder uses to shape the spectrum in the HF band. Sinusoids and noise components that cannot be obtained by copying may be synthesized, to a limited extent, in the SBR decoder.
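The copy-and-shape idea can be illustrated with a minimal sketch. It is not the SBR algorithm of the standard: real SBR works on QMF subband samples, whereas this toy example uses a plain list of spectral magnitudes. The function names, the band edges and the sample values are all assumptions made for illustration.

```python
import math

def band_energies(mag, edges):
    """Mean energy of mag within each band [edges[i], edges[i+1])."""
    return [sum(x * x for x in mag[a:b]) / (b - a)
            for a, b in zip(edges[:-1], edges[1:])]

def patch_high_band(lf_mag, target_energies, edges):
    """Rebuild a high band by copying the low band and matching band energies.

    lf_mag          : magnitudes of the decoded low band
    target_energies : per-band mean energies, standing in for the control data
    edges           : band edges as bin indices into the high band
    """
    hf = list(lf_mag)  # patching: the LF content is reused as HF content
    current = band_energies(hf, edges)
    bands = zip(edges[:-1], edges[1:])
    for (a, b), cur, target in zip(bands, current, target_energies):
        if cur > 0.0:
            gain = math.sqrt(target / cur)  # envelope adjustment for this band
            hf[a:b] = [x * gain for x in hf[a:b]]
    return hf

# Toy magnitude spectrum (assumed values): the first 8 bins are transmitted,
# the last 8 bins are known only to the encoder.
full = [1.0, 0.8, 0.9, 0.5, 0.6, 0.4, 0.45, 0.3,
        0.25, 0.2, 0.22, 0.15, 0.1, 0.12, 0.08, 0.05]
lf, hf_true = full[:8], full[8:]
edges = [0, 2, 4, 8]

side_info = band_energies(hf_true, edges)       # what the encoder would send
hf_rec = patch_high_band(lf, side_info, edges)  # what the decoder rebuilds
decoded = lf + hf_rec
```

The rebuilt high band matches the original only in its per-band energy. Its fine structure is that of the low band, which is why components with no counterpart in the low band cannot be recovered by copying alone.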

In general, SBR reconstructs HF components by employing a patching algorithm, i.e. by copying LF spectral content in the time-frequency plane created through filterbank analysis of the reconstructed signal. In the situations described below, the patching algorithm cannot reconstruct some important tonal components.

  1. If the signal has prominent components with a fundamental frequency near or above fSBR (the frequency at which the SBR range begins). This includes high-pitched sounds such as orchestral bells and other percussive instruments. In this case, no shifting or scaling can re-create such components in the SBR range. The SBR tool may use an additional technique called "sinusoidal coding" to inject a sinusoidal component into a certain subband of the QMF filterbank. This component has a fixed frequency and amplitude and a low frequency resolution, which causes a significant discrepancy of timbre due to the added inharmonicity.
  2. If the signal has a significantly varying frequency (e.g. vibrato modulation), its energy in the lower band is spread over a range of transform coefficients, which are subsequently distorted by quantization. At low bit rates the local SNR becomes very low, and a partial that was originally purely tonal may no longer be considered tonal. In such a case, patching leads to additional artifacts, since the frequency modulations are not properly scaled (the modulation depth does not increase with partial order).
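
The scaling problem in the second case can be made concrete with a small numeric sketch. For a harmonic sound, the partial of order k follows k times the fundamental, so its peak frequency deviation is k times that of the fundamental. A partial copied upward by a constant shift keeps the deviation of its LF source. The fundamental, vibrato depth and patch shift below are assumed values chosen for illustration, and the helper names are hypothetical.

```python
F0_HZ = 500.0            # fundamental frequency (assumed)
VIBRATO_HZ = 5.0         # peak frequency deviation of the fundamental (assumed)
PATCH_SHIFT_HZ = 4000.0  # constant upward shift applied by patching (assumed)

def partial_deviation(order, fundamental_deviation_hz):
    """Peak frequency deviation of a harmonic partial; it grows with the order."""
    return order * fundamental_deviation_hz

def compare(source_order):
    """Return (target order, expected deviation, deviation after patching)."""
    shift_in_orders = round(PATCH_SHIFT_HZ / F0_HZ)
    target_order = source_order + shift_in_orders
    expected = partial_deviation(target_order, VIBRATO_HZ)
    # The copied partial still carries the deviation of its LF source.
    patched = partial_deviation(source_order, VIBRATO_HZ)
    return target_order, expected, patched

rows = [compare(k) for k in (3, 5, 8)]
for target_order, expected, patched in rows:
    print(f"order {target_order}: expected {expected} Hz, patched {patched} Hz")
```

With these values, the partial of order 3 is copied to the position of order 11, where a deviation of 55 Hz would be expected but only 15 Hz is present.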

New HFR techniques

In this project, two new high frequency reconstruction (HFR) tools are proposed for improved coding of tonal HF components in audio compression employing the SBR tool.

Other references

  1. Tomasz Zernicki, Maciej Bartkowiak, Marek Domanski,
    "Improved coding of tonal components in audio techniques utilizing the SBR tool", ISO/IEC JTC1/SC29/WG11 MPEG 2010 / M17914, Geneva, Switzerland, July 2010

  2. Tomasz Żernicki, Marek Domański,
    "Improved coding of tonal components in MPEG-4 AAC with SBR"
    16th European Signal Processing Conference, August 25-29, 2008, Lausanne, Switzerland

  3. Tomasz Żernicki, Maciej Bartkowiak,
    "Audio bandwidth extension by frequency scaling of sinusoidal components", 125th Convention of the Audio Engineering Society, AES Preprint 7622, 2-5 October 2008, San Francisco, USA