Audio: MFCC: Use 32 bit FFT and Mel frequency scale filters for better precision #10750
Draft
singalsu wants to merge 20 commits into thesofproject:main from
This patch replaces the sample-by-sample loops that clear and copy data in mfcc_sink_copy_zero_s16() and mfcc_sink_copy_data_s16() with memset() and memcpy(). The function mfcc_source_copy_s16() is moved further down, under CONFIG_FORMAT_S16LE where it belongs. There are no changes to the function itself. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
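The loop replacement described above can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper names demo_sink_zero_s16() and demo_sink_copy_s16() are hypothetical, and the sketch assumes a linear destination region, ignoring any buffer wrap handling the real functions may need.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: clear n 16-bit samples with a single memset()
 * call instead of a sample-by-sample loop.
 */
static void demo_sink_zero_s16(int16_t *dst, size_t n)
{
	memset(dst, 0, n * sizeof(int16_t));
}

/* Hypothetical helper: copy n 16-bit samples with a single memcpy()
 * call. The source and destination must not overlap.
 */
static void demo_sink_copy_s16(int16_t *dst, const int16_t *src, size_t n)
{
	memcpy(dst, src, n * sizeof(int16_t));
}
```

The byte count is always the sample count times sizeof(int16_t); passing the sample count directly is the typical mistake when converting such loops.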
The memset() and memcpy() functions are as fast as the HiFi data clear and copy functions, so mfcc_sink_copy_zero_s16() and mfcc_sink_copy_data_s16() can be moved to mfcc_common.c. This change will also help with possible future changes to the audio features output data format. The current data format, a fake PCM stream, may change to a compress encode stream type. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Add S24_4LE and S32_LE processing functions for MFCC component. The new format variants convert input samples to internal 16-bit representation for FFT processing and expand cepstral output back to the sink format. Implementations are added for generic, HiFi3, and HiFi4 architectures. The source copy functions handle pre-emphasis filtering with the format conversion. The sink copy functions write 16-bit cepstral coefficients expanded to the 32-bit container format. The MFCC magic marker is written directly as a raw 32-bit value without format conversion. The function map in mfcc.c is updated to wire the new processing functions for S24_4LE and S32_LE formats. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The configuration blob uses the value -1 for the input channel select with mono format. This patch adds an error if -1 is used with anything other than a mono input stream. The low-information comp_info() trace print is changed into a better error message. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
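The validation rule above could look roughly like the sketch below. The function name demo_check_channel_select() and its parameters are hypothetical, and the upper bound check for non-negative channel indices is an added assumption that the commit message does not mention.

```c
#include <errno.h>

/* Hypothetical check: channel is the blob's input channel select,
 * stream_channels is the channel count of the input stream.
 * Returns 0 when the combination is valid, -EINVAL otherwise.
 */
static int demo_check_channel_select(int channel, int stream_channels)
{
	/* -1 means "assume mono", so it is valid only for a mono stream */
	if (channel == -1)
		return stream_channels == 1 ? 0 : -EINVAL;

	/* Assumption: any other value must index an existing channel */
	if (channel < 0 || channel >= stream_channels)
		return -EINVAL;

	return 0;
}
```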
Add a mode where cepstral coefficients are not computed and the Mel frequency logarithm values are passed directly to the sink buffer. The mode is activated when sof_mfcc_config member num_ceps is set to zero.
When num_ceps is zero:
- DCT matrix and cepstral lifter are not allocated or initialized
- The Mel log spectra (num_mel_bins values) are output to the sink instead of cepstral coefficients
- A mel_only flag is added to mfcc_state for runtime path selection
This is useful for applications that need Mel spectrogram features without the DCT transform, such as some neural network audio front-ends.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This change allows more than, for example, 30 cepstral coefficients or Mel values plus the magic sync value in a single stereo 16 kHz 16-bit period. As much data can be packed as the FFT hop size and the sink format in use allow. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The description for top_db was wrong. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Support for the Hann window was missing from the MFCC setup function. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
For compatibility with OpenVINO Whisper audio features, this patch adds to the function mfcc_stft_process() peak tracking of the Mel spectra maximum in mel_only mode and clamping of Mel spectral values to the found maximum minus config->top_db. The parameters for peak tracking and clamping are set via the configuration blob. The absolute maximum behavior of the Whisper audio features can be achieved by setting mmax_coef to zero. The mmax value then rises to the detected peak and remains there. The patch also adds normalization of Mel values with a configurable offset and scale. Whisper uses hard-coded values, but making them configuration parameters in the blob is more flexible. The input parameter state is changed to struct mfcc_comp_data *cd to be able to access both the state and the configuration of the module. The ABI header user/mfcc.h is modified in a way that does not affect the previous default operation for cepstral coefficients. The new Mel-only mode uses fields added in place of previously reserved fields in the configuration blob. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
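A floating point sketch of the peak tracking, clamping, and normalization steps may help to follow the description. Everything here is an assumption for illustration: the function name, the parameter list, the use of float instead of the fixed point formats of the component, and especially the decay rule, which is only one update rule that is consistent with "mmax_coef zero keeps the maximum at the detected peak".

```c
#include <stddef.h>

/* Hypothetical model of one frame of Mel-only post-processing.
 * mel: log Mel values of the frame, modified in place
 * mmax: tracked maximum from previous frames
 * Returns the updated tracked maximum.
 */
static float demo_mel_clamp_normalize(float *mel, size_t n, float mmax,
				      float mmax_coef, float top_db,
				      float offset, float scale)
{
	float peak, floor_val, v;
	size_t i;

	if (n == 0)
		return mmax;

	/* Find the peak of this frame */
	peak = mel[0];
	for (i = 1; i < n; i++)
		if (mel[i] > peak)
			peak = mel[i];

	/* Assumed rule: jump up to a new peak, otherwise decay towards
	 * the frame peak. With mmax_coef == 0 the maximum never decays.
	 */
	if (peak > mmax)
		mmax = peak;
	else
		mmax -= mmax_coef * (mmax - peak);

	/* Clamp to the tracked maximum minus top_db, then normalize */
	floor_val = mmax - top_db;
	for (i = 0; i < n; i++) {
		v = mel[i] < floor_val ? floor_val : mel[i];
		mel[i] = (v + offset) * scale;
	}

	return mmax;
}
```

The caller keeps the returned value between frames, so that a quiet frame following a loud one is still clamped relative to the earlier peak.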
There are several changes:
- The topology v1 format blob export is removed. The script updates the MFCC module blob default.conf and adds a new blob mel_spectrogram.conf for topology v2.
- The script is organized to be able to output multiple blobs.
- The topology sof-hda-benchmark-mfcc16/24/32.tplg uses stereo data format, so the blob configuration -1 for channels, which assumes mono, is wrong in setup_mfcc.m.
- A blob for Mel frequency scale logarithmic spectrum output is added. It sets num_ceps to zero to indicate Mel mode for MFCC. The parameters are set for Whisper compatible audio features with 80 Mel bins, Hann window, FFT size 400 (padded to 512) with hop of 160.
- The missing export of mel_log (log/log10/db) and norm parameters (none/slaney) is added.
- Parameters are added for compatibility with OpenVINO's Whisper audio features extractor. The Mel values are clamped against the tracked maximum of Mel values and the existing top_db parameter, and normalized with a configurable offset and scale.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds the build of test topologies for testing a SOF MFCC setup that is compatible with the OpenVINO Whisper audio features extractor. The topology names are sof-hda-benchmark-mfccmel16/24/32.tplg. The MFCC module is initialized to produce spectrogram data for 80 Mel frequency bands. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch contains several updates:
- A run with valgrind is added to catch memory leaks.
- The script applied duplicate "-i" and "-o" arguments. They are removed from the "OPT" variables.
- The sof-testbench4 can't override the channels count in the topology the way the IPC3 testbench could. Since the current topology is for stereo 16 kHz, the input data and command line must match it.
- To be able to compare MFCC output for successive runs, the "-R" option is added to the run of the sox audio convert utility to prevent, for example, randomization of dither.
- The script converts the input to s24 and s32 formats and runs them, to make it easier to check correct operation with the supported formats. The conversion is done from the s16 version to be able to compare the output audio features, which should be the same if the internal processing is 16 bit.
- A run with Mel configured MFCC is added for s16/24/32 formats.
- A script to decode and visualize Mel spectrogram data is added as decode_mel.m.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Change the Mel filterbank 32-bit variant psy_apply_mel_filterbank_32() output from int16_t Q9.7 (was wrongly commented as Q8.7) to int32_t Q9.23 format for improved signal resolution. The output parameter type is changed from int16_t* to int32_t* in both the implementation and the header declaration. The auditory unit test is updated to allocate int32_t output and convert Q9.23 to Q9.7 for comparison against existing reference vectors. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The input samples must be shifted logically to sign bit and then shifted right arithmetically into place for the 16 bit saturation instruction to work correctly. This fixes a possible overflow with large input. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
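The shift order described above can be illustrated in portable C for an S24_4LE input sample. This is a sketch under assumptions, not the HiFi code from the patch: the function name is hypothetical, the upper byte of the 32-bit container is assumed to hold possibly invalid data, and the rounding is an illustrative choice.

```c
#include <stdint.h>

/* Hypothetical conversion of one S24_4LE sample to 16 bits with
 * rounding and saturation.
 */
static int16_t demo_s24_to_s16(int32_t in)
{
	/* Logical left shift by 8 moves bit 23 to the sign bit and drops
	 * whatever the top byte of the container held. The cast back to
	 * int32_t wraps on the compilers this sketch targets.
	 */
	int32_t x = (int32_t)((uint32_t)in << 8);

	/* Arithmetic right shift by 16 with rounding. 64-bit math avoids
	 * overflow when the rounding constant is added. Right shift of a
	 * negative value is assumed to be arithmetic.
	 */
	int64_t r = ((int64_t)x + (1 << 15)) >> 16;

	/* Saturate to the 16-bit range */
	if (r > INT16_MAX)
		r = INT16_MAX;
	if (r < INT16_MIN)
		r = INT16_MIN;

	return (int16_t)r;
}
```

With the sample aligned to the sign bit first, the largest positive 24-bit input rounds up past the 16-bit range and is caught by the saturation instead of wrapping to a negative value.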
Remove the duplicate AE_MULFP32X16X2RS_H call in the 32-bit FFT path of mfcc_apply_window(). Its result was immediately overwritten by the AE_MULFP32X16X2RS_L call on the next line, making it dead code. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch switches MFCC_FFT_BITS from 16 to 32 to use 32-bit FFT mode for better precision in the MFCC processing pipeline. In cepstral mode (num_ceps > 0), the 32-bit Q9.23 Mel output from psy_apply_mel_filterbank_32() is converted to 16-bit Q9.7 before the existing 16-bit DCT calculation, preserving the current DCT and cepstral lifter behavior.
In Mel-only mode, output format depends on sink format:
- s16: Q9.7 (current format, backwards compatible)
- s24: Q9.15 (one int32_t per Mel value)
- s32: Q9.23 (full precision, one int32_t per Mel value)
The mel_log_32 scratch buffer is placed after power_spectra in the fft_buf scratch area. A bounds check is added in mfcc_setup() to fail if num_mel_bins exceeds the available scratch space. The decode_mel.m Octave script is updated with s24 and s32 format support for the changed output encoding.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
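The three Mel-only output formats differ only in the number of fractional bits, so the conversion from the internal Q9.23 value is a right shift by 16, 8, or 0 bits. The sketch below shows that arithmetic. The enum, the function name, and the use of rounding and saturation are assumptions for illustration and may differ from the actual implementation.

```c
#include <stdint.h>

/* Hypothetical sink format identifiers for this sketch only */
enum demo_sink_fmt { DEMO_SINK_S16, DEMO_SINK_S24, DEMO_SINK_S32 };

/* Convert a Q9.23 Mel value to Q9.7 (s16), Q9.15 (s24) or keep it as
 * Q9.23 (s32). The result is returned in an int32_t in all cases.
 */
static int32_t demo_mel_q9_23_to_sink(int32_t mel, enum demo_sink_fmt fmt)
{
	int64_t r, max, min;
	int shift;

	switch (fmt) {
	case DEMO_SINK_S16:
		shift = 16;		/* 23 - 7 fractional bits */
		max = INT16_MAX;
		min = INT16_MIN;
		break;
	case DEMO_SINK_S24:
		shift = 8;		/* 23 - 15 fractional bits */
		max = 8388607;		/* 2^23 - 1 */
		min = -8388608;
		break;
	default:
		return mel;		/* s32: full precision */
	}

	/* Round to nearest, then saturate to the target word length */
	r = ((int64_t)mel + ((int64_t)1 << (shift - 1))) >> shift;
	if (r > max)
		r = max;
	if (r < min)
		r = min;

	return (int32_t)r;
}
```

For example, the value 1.0 is 1 << 23 in Q9.23, 1 << 15 in Q9.15, and 1 << 7 in Q9.7.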
When MFCC_FFT_BITS is 32, the HiFi3/4 mfcc_fill_fft_buffer() used AE_S16_0_XP to write 16-bit samples into 32-bit icomplex32 containers. This left the upper 16 bits of .real with stale data and .imag unzeroed, causing corrupted FFT input after the first frame when scratch buffers are reused for power_spectra and mel_log_32. Replace all platform-specific implementations with a single generic C version in mfcc_common.c. The function performs only data copying with no arithmetic, so HiFi intrinsics provide very little benefit. The new implementation uses conditional pointer types (int16_t for 16-bit FFT, int32_t for 32-bit FFT) with matching element stride, and relies on the caller's bzero of fft_buf to keep imaginary parts zero. Add missing icomplex16.h include to fft.h. The header uses struct icomplex16 in struct fft_plan but did not include its definition. After psy_apply_mel_filterbank_16() writes Q9.7 int16_t values to mel_spectra->data, convert to Q9.23 in mel_log_32 so that all downstream processing (dynamic mmax, clamping, scaling, DCT) works correctly in 16-bit FFT mode. Fix mel_log_32 scratch space check to use fft_buffer_size instead of assuming sizeof(icomplex32) per element, which overestimated available space by 2x in 16-bit mode. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
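The stale data problem and its fix can be shown with a small generic C sketch for the 32-bit FFT case. The struct and function names here are hypothetical stand-ins, not the SOF definitions, and the sketch covers only the 32-bit container variant rather than the conditional pointer types mentioned above.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a 32-bit complex FFT element in this sketch */
struct demo_icomplex32 {
	int32_t real;
	int32_t imag;
};

/* Copy 16-bit samples into the real parts with a full 32-bit store.
 * The assignment sign-extends the sample, so no stale upper bits
 * remain. Imaginary parts are not touched here.
 */
static void demo_fill_fft_buffer(struct demo_icomplex32 *fft_buf,
				 const int16_t *in, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		fft_buf[i].real = in[i];
}

/* Caller side: clear the reused scratch buffer first so that the
 * imaginary parts are zero, then fill the real parts.
 */
static void demo_prepare_frame(struct demo_icomplex32 *fft_buf,
			       const int16_t *in, size_t n)
{
	memset(fft_buf, 0, n * sizeof(*fft_buf));
	demo_fill_fft_buffer(fft_buf, in, n);
}
```

A 16-bit store into the same container would update only half of .real, which is how leftover data from earlier scratch use could corrupt the FFT input.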
In 32-bit FFT mode the input data is 16-bit stored in the lower half of a 32-bit icomplex32 container. The AE_MULFP32X16X2RS_L intrinsic performs a Q1.31 x Q1.15 fractional multiply, so the 16-bit sample must first be shifted left by 16 to Q1.31 format. Without this shift the multiply treats the value as having 16 zero fractional bits, producing near-zero windowed output and a corrupt FFT result. Add the missing AE_SLAI32S(sample, 16) before the multiply in both HiFi3 and HiFi4 mfcc_apply_window() 32-bit paths, matching the generic C implementation. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
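The effect of the missing shift can be reproduced with plain integer arithmetic. The sketch below is a generic C model of a Q1.31 x Q1.15 fractional multiply with rounding and saturation. It is not the HiFi intrinsic, and the function name and the exact rounding are assumptions.

```c
#include <stdint.h>

/* Hypothetical model: window one 16-bit sample stored in a 32-bit
 * container. sample is Q1.15 data, win is a Q1.15 window coefficient,
 * and the result is Q1.31.
 */
static int32_t demo_window_sample(int16_t sample, int16_t win)
{
	/* Scale the sample from Q1.15 to Q1.31 (shift left by 16).
	 * Multiplication is used to avoid shifting a negative value.
	 */
	int32_t s_q31 = (int32_t)sample * 65536;

	/* Q1.31 x Q1.15 gives Q2.46, so shift right by 15 with rounding
	 * to return to Q1.31.
	 */
	int64_t p = (int64_t)s_q31 * win;

	p = (p + (1 << 14)) >> 15;

	/* Saturate, needed for the -1.0 x -1.0 case */
	if (p > INT32_MAX)
		p = INT32_MAX;
	if (p < INT32_MIN)
		p = INT32_MIN;

	return (int32_t)p;
}
```

With sample and win both at 0.5 (16384 in Q1.15) the result is 0.25 in Q1.31, which is 536870912. If the scaling step were skipped, the same inputs would yield 8192, a value close to zero in Q1.31.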
Add missing cleanup for fft_plan. After mod_fft_plan_new() succeeds, failures in window setup and mel filterbank initialization jumped to free_fft_out, leaking the fft_plan. Add free_fft_plan label and route these error paths through it. Add missing cleanup for lifter.matrix. Late validation checks (mel_log_32 space, output capacity) jumped to free_dct_matrix, skipping the lifter matrix that may have been allocated. Add free_lifter label for these paths. Replace rfree() with mod_free() in all error cleanup labels to match the mod_zalloc() allocations and the existing mfcc_free_buffers() implementation. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
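The label ordering described above follows the usual goto cleanup idiom in C, where each label frees one resource and falls through to the labels for resources allocated earlier. The sketch below demonstrates the idiom with calloc() and free() standing in for mod_zalloc() and mod_free(). All names, the fail_step parameter, and the allocation counter are hypothetical and exist only to make the leak check visible.

```c
#include <stdlib.h>

struct demo_state {
	void *fft_out;
	void *fft_plan;
	void *lifter;
};

static int demo_live_allocs;	/* outstanding allocations, for the demo */

static void *demo_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live_allocs--;
		free(p);
	}
}

/* fail_step: 0 = success, 1 = failure after the FFT plan is created,
 * 2 = failure in a late validation check.
 */
static int demo_setup(struct demo_state *st, int fail_step)
{
	st->fft_out = demo_alloc(64);
	if (!st->fft_out)
		return -1;

	st->fft_plan = demo_alloc(64);
	if (!st->fft_plan)
		goto free_fft_out;

	if (fail_step == 1)
		goto free_fft_plan;	/* not free_fft_out, which would leak */

	st->lifter = demo_alloc(64);
	if (!st->lifter)
		goto free_fft_plan;

	if (fail_step == 2)
		goto free_lifter;	/* late check must free the lifter too */

	return 0;

free_lifter:
	demo_free(st->lifter);
free_fft_plan:
	demo_free(st->fft_plan);
free_fft_out:
	demo_free(st->fft_out);
	return -1;
}

static void demo_release(struct demo_state *st)
{
	demo_free(st->lifter);
	demo_free(st->fft_plan);
	demo_free(st->fft_out);
}
```

Jumping to a label that sits too late in the chain skips the frees above it, which is the class of leak this commit fixes.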
Refactor run_mfcc.sh into functions for input conversion and testbench execution to reduce code duplication. Add Xtensa testbench support when XTENSA_PATH environment variable is set, producing xt_ prefixed output files. Add decode_all.m Octave script to decode and plot all MFCC cepstral and Mel spectrogram output files from run_mfcc.sh, including Xtensa variants. Update README.txt to document the current run_mfcc.sh output files, Xtensa support, and decode_all.m usage. Export XTENSA_PATH in rebuild-testbench.sh so that run_mfcc.sh can find the Xtensa toolchain path for the testbench build. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
No description provided.