For certain input audio lengths, pitch calculation and MFCC calculation produce outputs with different numbers of frames. See below for a minimal repro. Here is the output of this script: output.txt
#include "feat/feature-mfcc.h"
#include "feat/pitch-functions.h"
#include "feat/wave-reader.h"

int main() {
  using namespace kaldi;
  PitchExtractionOptions pitch_options;
  ProcessPitchOptions opp;
  WaveData wave;
  MfccOptions mfcc_options;
  Mfcc mfcc(mfcc_options);
  // Try every input length from 400 to 3999 samples (all-zero waveform).
  for (int i = 400; i < 4000; i++) {
    Vector<BaseFloat> waveform(i);
    Matrix<BaseFloat> m1, m2;
    ComputeAndProcessKaldiPitch(pitch_options, opp, waveform, &m1);
    mfcc.Compute(waveform, 1.0, &m2);
    // Report lengths where the two feature matrices disagree on frame count.
    if (m1.NumRows() != m2.NumRows()) {
      KALDI_LOG << "I: " << i << " Pitch " << m1.NumRows() << " MFCC " << m2.NumRows();
    }
  }
}
Note that the mismatch appears as the input length approaches 600, 720, 880. I think the pattern is when
(len(input) - frame_size) % frame_shift > 156
i.e. (len(input) - 400) % 160 > 156
i.e. in the last 3 samples of each frame calculation window (?)
Assuming these values are right, I think the MFCC lengths are as expected and pitch extraction sometimes returns one more frame than expected.
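To make the expectation concrete, here is a small standalone sketch (no Kaldi dependency) of the frame count I would expect when only complete windows are kept, plus the suspected mismatch condition from above. The helper names `ExpectedNumFrames` and `InSuspectedMismatchRegion` are made up for illustration, and the defaults of 400 and 160 samples assume 25 ms frames with a 10 ms shift at 16 kHz. The threshold 156 is only my guess from the observed output, not something taken from the pitch code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of frames when only complete windows are kept.
// Assumes 400-sample frames and a 160-sample shift (25 ms / 10 ms at 16 kHz).
int64_t ExpectedNumFrames(int64_t num_samples,
                          int64_t frame_length = 400,
                          int64_t frame_shift = 160) {
  assert(frame_length > 0 && frame_shift > 0);
  if (num_samples < frame_length) return 0;
  return 1 + (num_samples - frame_length) / frame_shift;
}

// Hypothetical helper: the suspected condition under which pitch extraction
// returns one more frame than MFCC. The threshold 156 is an empirical guess.
bool InSuspectedMismatchRegion(int64_t num_samples,
                               int64_t frame_length = 400,
                               int64_t frame_shift = 160,
                               int64_t threshold = 156) {
  assert(frame_length > 0 && frame_shift > 0);
  if (num_samples < frame_length) return false;
  return (num_samples - frame_length) % frame_shift > threshold;
}
```

If the guess holds, calling `InSuspectedMismatchRegion(i)` for each `i` in the repro loop should flag the same lengths that the script logs, and `ExpectedNumFrames(i)` should match the MFCC row count.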
Doesn't seem like a major bug, but caused some headaches for me when using the MFA library (ref)