Pitch and MFCC output lengths differ for same input audio #4960

@scottbreyfogle

Description

For certain input audio lengths, pitch extraction and MFCC computation produce a different number of output frames. See below for a minimal repro. Here is the output of this script: output.txt

#include "feat/feature-mfcc.h"
#include "feat/pitch-functions.h"
#include "feat/wave-reader.h"

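int main() {
  using namespace kaldi;
  // Default-constructed options for both extractors.
  PitchExtractionOptions pitch_options;
  ProcessPitchOptions opp;
  WaveData wave;
  MfccOptions mfcc_options;
  Mfcc mfcc(mfcc_options);

  // Try every input length from 400 to 3999 samples.
  for (int i = 400; i < 4000; i++) {
    // Zero-filled waveform of i samples; only its length matters here.
    Vector<BaseFloat> waveform(i);
    Matrix<BaseFloat> m1, m2;
    ComputeAndProcessKaldiPitch(pitch_options, opp, waveform, &m1);
    mfcc.Compute(waveform, 1.0, &m2);  // 1.0 = no VTLN warping
    // Report every length where the two frame counts disagree.
    if (m1.NumRows() != m2.NumRows()) {
      KALDI_LOG << "I: " << i << " Pitch " << m1.NumRows() << " MFCC " << m2.NumRows();
    }
  }
}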

Note that the mismatch appears as the input length approaches 560, 720, 880. I think the pattern is when
(len(input) - frame_size) % frame_shift > 156
i.e. (len(input) - 400) % 160 > 156
i.e. in the last 3 samples of each frame calculation window (?)
Assuming these values are right, I think that the MFCC lengths are as expected, and pitch extraction sometimes returns one more frame than expected.
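To make the suspected pattern easier to check against output.txt, here is a small standalone sketch that does not depend on Kaldi. It assumes the values quoted above (frame size 400, frame shift 160, threshold 156) and the usual "whole windows only" frame count of 1 + (len - frame_size) / frame_shift. The function names are made up for illustration and are not part of the Kaldi API.

```cpp
#include <vector>

// Values taken from the description above; treat them as assumptions.
const int kFrameSize = 400;   // samples per analysis window
const int kFrameShift = 160;  // samples between window starts
const int kThreshold = 156;   // remainder above which the mismatch is suspected

// Frame count when only complete windows are kept (hypothetical helper,
// not copied from Kaldi).
int ExpectedNumFrames(int num_samples) {
  if (num_samples < kFrameSize) return 0;
  return 1 + (num_samples - kFrameSize) / kFrameShift;
}

// True when the length falls in the range where the mismatch is suspected.
bool InSuspectedMismatchRange(int num_samples) {
  if (num_samples < kFrameSize) return false;
  return (num_samples - kFrameSize) % kFrameShift > kThreshold;
}

// All lengths in [begin, end) that match the suspected pattern.
std::vector<int> SuspectedLengths(int begin, int end) {
  std::vector<int> lengths;
  for (int n = begin; n < end; ++n) {
    if (InSuspectedMismatchRange(n)) lengths.push_back(n);
  }
  return lengths;
}
```

Calling SuspectedLengths(400, 4000) and comparing the result with the "I:" values logged by the repro would confirm or refute the pattern. If it holds, the pitch frame count at those lengths would be ExpectedNumFrames(n) + 1.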

Doesn't seem like a major bug, but caused some headaches for me when using the MFA library (ref)

Labels: bug, stale
