Analysis of voices based on the Mel-frequency band

Goal: Identify which speaker is talking at each point in time (diarization) and calculate each speaker's share of the total speech time (in %).
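
The speech share can be sketched as simple counting once every frame has a speaker label. This is a minimal sketch, not the project's actual code: the function name `speech_partition` and the convention that silent frames carry the label `None` are assumptions made for illustration.

```python
from collections import Counter

def speech_partition(frame_labels, silence_label=None):
    """Return each speaker's share of the voiced frames in percent.

    frame_labels: one speaker label per fixed-length frame (hypothetical
    input format). Frames equal to silence_label are ignored.
    """
    voiced = [lab for lab in frame_labels if lab != silence_label]
    if not voiced:
        return {}
    counts = Counter(voiced)
    return {spk: 100.0 * n / len(voiced) for spk, n in counts.items()}

# Toy example: 50 frames of speaker A, 30 of speaker B, 20 silent frames.
labels = ["A"] * 50 + ["B"] * 30 + [None] * 20
shares = speech_partition(labels)
print(shares)
```

Because all frames have the same duration, frame counts are proportional to speaking time, so no timestamps are needed here.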


  • Collect voice data
  • Sample audio data from x speakers, each speaking y times, to simulate a round of people talking
  • Annotate samples with speaker labels and merge them into a single audio file
  • Create train & test split of samples
  • Train unsupervised clustering model to detect the number of speakers
  • Train supervised RNN classifier to determine who is speaking at time t
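
The notes do not say which clustering method is used to detect the number of speakers. One possible sketch, assuming k-means with a silhouette score to pick the cluster count (both are assumptions here, as are the function names), using NumPy only:

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain k-means with farthest-first initialisation (deterministic)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def silhouette(X, labels):
    """Mean silhouette coefficient; higher means better separated clusters."""
    clusters = np.unique(labels)
    if clusters.size < 2:
        return -1.0
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def estimate_num_speakers(X, max_k=6):
    """Pick the k in [2, max_k] with the best silhouette score."""
    scores = {k: silhouette(X, kmeans(X, k)) for k in range(2, max_k + 1)}
    return max(scores, key=scores.get)

# Synthetic stand-in for per-window voice features of three speakers.
rng = np.random.default_rng(0)
centers = np.eye(3, 8) * 10.0
X = np.vstack([c + 0.5 * rng.standard_normal((50, 8)) for c in centers])
print(estimate_num_speakers(X))
```

The silhouette score is undefined for a single cluster, so this sketch cannot report "one speaker"; a real pipeline would need a separate check for that case.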


Data collection:

  • Convert files to .wav
  • Collect data via the LibriSpeech corpus (audio files)
  • Extract x random speakers with y audio samples per speaker

Result: generated audio samples of length 30-60 seconds
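
Merging the per-speaker clips into one annotated sample could look roughly like this. It is a sketch under assumptions: `merge_turns` is a hypothetical helper, the clips are already loaded as 1-D NumPy arrays at a common sample rate, and the turn order is a random shuffle.

```python
import numpy as np

def merge_turns(utterances, sr, seed=0, max_seconds=60.0):
    """Concatenate clips in random order and record who speaks when.

    utterances: dict mapping speaker id -> list of 1-D sample arrays.
    Returns (audio, annotations) where annotations is a list of
    (start_seconds, end_seconds, speaker_id) tuples.
    """
    rng = np.random.default_rng(seed)
    turns = [(spk, clip) for spk, clips in utterances.items() for clip in clips]
    audio, annotations, cursor = [], [], 0
    for i in rng.permutation(len(turns)):
        spk, clip = turns[i]
        if (cursor + len(clip)) / sr > max_seconds:
            break  # keep the merged sample within the target length
        audio.append(clip)
        annotations.append((cursor / sr, (cursor + len(clip)) / sr, spk))
        cursor += len(clip)
    merged = np.concatenate(audio) if audio else np.zeros(0)
    return merged, annotations

# Synthetic stand-ins for real recordings: 2 speakers, 3 clips of 5 s each.
sr = 8000
utterances = {
    "spk1": [np.zeros(5 * sr) for _ in range(3)],
    "spk2": [np.ones(5 * sr) for _ in range(3)],
}
audio, annotations = merge_turns(utterances, sr)
for start, end, spk in annotations:
    print(f"{start:5.1f}s - {end:5.1f}s  {spk}")
```

The annotation tuples double as training labels: each frame of the merged file can be assigned the speaker whose interval contains it.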

Feature extraction:

  • Create mel-frequency spectrum for each audio file
  • Define overlapping feature windows for training
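
Both feature steps can be sketched with NumPy alone. The parameter values (512-point FFT, hop of 160 samples, 40 mel bands, windows of 20 frames with a step of 10) are illustrative assumptions, not values taken from the project, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(signal, sr, n_fft=512, hop=160, n_mels=40):
    """Log-mel features of shape (n_frames, n_mels).

    Assumes the signal is at least n_fft samples long.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # small offset avoids log(0)

def sliding_windows(features, win, step):
    """Cut (n_frames, n_mels) features into overlapping windows of win frames."""
    starts = range(0, features.shape[0] - win + 1, step)
    return np.stack([features[s: s + win] for s in starts])

# One second of a 440 Hz tone as a stand-in for a real recording.
sr = 16000
t = np.arange(sr) / sr
feats = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
windows = sliding_windows(feats, win=20, step=10)
print(feats.shape, windows.shape)  # (97, 40) (8, 20, 40)
```

With a step smaller than the window length, neighbouring windows share frames (here 10 of 20), which gives the classifier more training examples and smoother predictions at speaker changes.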




