TorchFCPE

Overview

TorchFCPE (Fast Context-based Pitch Estimation) is a PyTorch-based library for audio pitch extraction and MIDI conversion. This README provides a quick guide to using the library for audio pitch inference and MIDI extraction.

Note: the MIDI extractor in FCPE quantizes MIDI from f0 using non-neural-network methods.
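
As a rough illustration of what such a post-processing step can look like, here is a minimal sketch (not FCPE's actual code) that maps an f0 curve in Hz to rounded MIDI note numbers using the standard conversion formula:

import numpy as np

def hz_to_midi_note(f0_hz: np.ndarray) -> np.ndarray:
    # Standard Hz-to-MIDI mapping; unvoiced frames (f0 <= 0) are marked with -1.
    midi = np.full_like(f0_hz, -1.0)
    voiced = f0_hz > 0
    midi[voiced] = np.round(69 + 12 * np.log2(f0_hz[voiced] / 440.0))
    return midi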

Note: I won't be updating FCPE (or the benchmark) anytime soon, but I will release a version with cleaned-up code no later than next year.

Installation

Before using the library, make sure you have the necessary dependencies installed:

pip install torchfcpe

Usage

1. Audio Pitch Inference

from torchfcpe import spawn_bundled_infer_model
import torch
import librosa

# Configure device and target hop size
device = 'cpu'  # or 'cuda' if using a GPU
sr = 16000  # Sample rate
hop_size = 160  # Hop size for processing

# Load and preprocess audio
audio, sr = librosa.load('test.wav', sr=sr)
audio = librosa.to_mono(audio)
audio_length = len(audio)
f0_target_length = (audio_length // hop_size) + 1
audio = torch.from_numpy(audio).float().unsqueeze(0).unsqueeze(-1).to(device)

# Load the model
model = spawn_bundled_infer_model(device=device)

# Perform pitch inference
f0 = model.infer(
    audio,
    sr=sr,
    decoder_mode='local_argmax',  # Recommended mode
    threshold=0.006,  # Threshold for V/UV decision
    f0_min=80,  # Minimum pitch
    f0_max=880,  # Maximum pitch
    interp_uv=False,  # Whether to interpolate unvoiced (UV) frames
    output_interp_target_length=f0_target_length,  # Interpolate to target length
)

print(f0)
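
The call above returns a torch tensor of per-frame pitch values in Hz. A minimal follow-up sketch (not part of the library; it assumes the returned tensor has singleton batch/channel dimensions matching the input layout above) for flattening it into a NumPy array for downstream processing:

# Flatten the pitch tensor to a 1-D NumPy array of Hz values.
f0_hz = f0.squeeze().cpu().numpy()
print(f0_hz.shape)  # expected: (f0_target_length,)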

2. MIDI Extraction

# Extract MIDI from audio
midi = model.extact_midi(
    audio,
    sr=sr,
    decoder_mode='local_argmax',  # Recommended mode
    threshold=0.006,  # Threshold for V/UV decision
    f0_min=80,  # Minimum pitch
    f0_max=880,  # Maximum pitch
    output_path="test.mid",  # Save MIDI to file
)

print(midi)
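
If output_path was provided, the resulting file can be inspected with any general-purpose MIDI library. A minimal sketch using pretty_midi (a separate dependency, not part of torchfcpe):

import pretty_midi  # pip install pretty_midi

pm = pretty_midi.PrettyMIDI('test.mid')
for note in pm.instruments[0].notes[:10]:
    # Each note carries a MIDI pitch number and start/end times in seconds.
    print(note.pitch, round(note.start, 3), round(note.end, 3))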

Notes

  • Inference Parameters:

    • audio: Input audio as a torch.Tensor.
    • sr: Sample rate of the audio.
    • decoder_mode (Optional): Mode for decoding; 'local_argmax' is recommended.
    • threshold (Optional): Threshold for the voiced/unvoiced decision; default is 0.006.
    • f0_min (Optional): Minimum pitch value; default is 80 Hz.
    • f0_max (Optional): Maximum pitch value; default is 880 Hz.
    • interp_uv (Optional): Whether to interpolate unvoiced frames; default is False.
    • output_interp_target_length (Optional): Length to which the output pitch should be interpolated.
  • MIDI Extraction Parameters:

    • audio: Input audio as a torch.Tensor.
    • sr: Sample rate of the audio.
    • decoder_mode (Optional): Mode for decoding; 'local_argmax' is recommended.
    • threshold (Optional): Threshold for the voiced/unvoiced decision; default is 0.006.
    • f0_min (Optional): Minimum pitch value; default is 80 Hz.
    • f0_max (Optional): Maximum pitch value; default is 880 Hz.
    • output_path (Optional): File path to save the MIDI file. If not provided, only the MIDI structure is returned.
    • tempo (Optional): BPM for the MIDI file. If None, the BPM is predicted automatically. (Both options are illustrated in the sketch after this list.)
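
For example, a minimal sketch (same model and audio as above) that fixes the tempo and skips writing a file, so only the in-memory MIDI structure is returned:

midi_struct = model.extact_midi(
    audio,
    sr=sr,
    decoder_mode='local_argmax',
    tempo=120,  # fix the BPM instead of letting it be predicted
    # output_path omitted: nothing is written to disk
)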

Additional Features

  • Model as a PyTorch Module: You can use the model as a standard PyTorch module. For example:

    # Change device
    model = model.to(device)
    
    # Compile model
    model = torch.compile(model)
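
    For instance, a minimal sketch (assuming a CUDA-capable GPU is available) that moves the bundled model onto the GPU and repeats the inference call from above there:

    if torch.cuda.is_available():
        model = model.to('cuda')
        f0_gpu = model.infer(audio.to('cuda'), sr=sr)
        print(f0_gpu.shape)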

Paper

If you find our work useful, please consider citing the paper:

@misc{luo2025fcpefastcontextbasedpitch,
      title={FCPE: A Fast Context-based Pitch Estimation Model}, 
      author={Yuxin Luo and Ruoyi Zhang and Lu-Chuan Liu and Tianyu Li and Hangyu Liu},
      year={2025},
      eprint={2509.15140},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2509.15140}, 
}

Important details

The model used in our paper is DDSP-200K; you can get it from here: DDSP-200K Model.

There is also an earlier model, released before the paper, which you can get here: FCPE-Previous.

More information about the experiments will be released after the paper is accepted or rejected.
