TorchFCPE (Fast Context-based Pitch Estimation) is a PyTorch-based library for audio pitch extraction and MIDI conversion. This README provides a quick guide to using the library for pitch inference and MIDI extraction.
Note: the MIDI extractor in FCPE quantizes MIDI from f0 using non-neural-network methods.
Note: I won't be updating FCPE (or the benchmark) soon, but I will definitely release a version with cleaned-up code no later than next year.
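For intuition, the heart of such a non-neural quantization step is the equal-temperament mapping from Hz to MIDI note numbers. Below is a minimal, self-contained sketch of that idea; it is illustrative only and not the library's actual extractor (which also handles segmentation, V/UV decisions, and tempo):

```python
import numpy as np

def f0_to_midi_note(f0_hz: np.ndarray) -> np.ndarray:
    """Map an f0 curve in Hz to integer MIDI note numbers (-1 = unvoiced)."""
    notes = np.full(f0_hz.shape, -1, dtype=int)
    voiced = f0_hz > 0  # treat 0 Hz frames as unvoiced
    # Equal-temperament mapping: A4 = 440 Hz corresponds to MIDI note 69
    notes[voiced] = np.round(69 + 12 * np.log2(f0_hz[voiced] / 440.0)).astype(int)
    return notes

# Example: A4, an unvoiced frame, and middle C -> [69, -1, 60]
print(f0_to_midi_note(np.array([440.0, 0.0, 261.63])))
```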
Before using the library, make sure you have the necessary dependencies installed:
```bash
pip install torchfcpe
```

Then run pitch inference and MIDI extraction:

```python
from torchfcpe import spawn_bundled_infer_model
import torch
import librosa
# Configure device and target hop size
device = 'cpu' # or 'cuda' if using a GPU
sr = 16000 # Sample rate
hop_size = 160 # Hop size for processing
# Load and preprocess audio
audio, sr = librosa.load('test.wav', sr=sr)
audio = librosa.to_mono(audio)
audio_length = len(audio)
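# Expected number of f0 frames for this audio at the given hop size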
f0_target_length = (audio_length // hop_size) + 1
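# Add batch and channel dimensions -> shape (1, n_samples, 1)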
audio = torch.from_numpy(audio).float().unsqueeze(0).unsqueeze(-1).to(device)
# Load the model
model = spawn_bundled_infer_model(device=device)
# Perform pitch inference
f0 = model.infer(
    audio,
    sr=sr,
    decoder_mode='local_argmax',  # Recommended decoding mode
    threshold=0.006,              # Threshold for the V/UV decision
    f0_min=80,                    # Minimum pitch in Hz
    f0_max=880,                   # Maximum pitch in Hz
    interp_uv=False,              # Whether to interpolate unvoiced frames
    output_interp_target_length=f0_target_length,  # Interpolate output to target length
)
print(f0)

# Extract MIDI from audio
midi = model.extact_midi(
    audio,
    sr=sr,
    decoder_mode='local_argmax',  # Recommended decoding mode
    threshold=0.006,              # Threshold for the V/UV decision
    f0_min=80,                    # Minimum pitch in Hz
    f0_max=880,                   # Maximum pitch in Hz
    output_path="test.mid",       # Save MIDI to this file
)
print(midi)
```

- **Inference Parameters:**
  - `audio`: Input audio as a `torch.Tensor`.
  - `sr`: Sample rate of the audio.
  - `decoder_mode` (Optional): Decoding mode; `'local_argmax'` is recommended.
  - `threshold` (Optional): Threshold for the voiced/unvoiced (V/UV) decision; default is 0.006.
  - `f0_min` (Optional): Minimum pitch in Hz; default is 80.
  - `f0_max` (Optional): Maximum pitch in Hz; default is 880.
  - `interp_uv` (Optional): Whether to interpolate unvoiced frames; default is False.
  - `output_interp_target_length` (Optional): Length to which the output pitch sequence is interpolated.
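  Everything except `audio` and `sr` is optional, so a minimal call can rely on the documented defaults (a sketch, reusing `model` and `audio` as prepared above):

  ```python
  # Minimal inference call using the documented defaults
  f0 = model.infer(audio, sr=sr)
  ```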
- **MIDI Extraction Parameters:**
  - `audio`: Input audio as a `torch.Tensor`.
  - `sr`: Sample rate of the audio.
  - `decoder_mode` (Optional): Decoding mode; `'local_argmax'` is recommended.
  - `threshold` (Optional): Threshold for the voiced/unvoiced (V/UV) decision; default is 0.006.
  - `f0_min` (Optional): Minimum pitch in Hz; default is 80.
  - `f0_max` (Optional): Maximum pitch in Hz; default is 880.
  - `output_path` (Optional): File path to save the MIDI file. If not provided, only the MIDI structure is returned.
  - `tempo` (Optional): BPM for the MIDI file. If None, the BPM is predicted automatically.
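  Because `output_path` is optional, the result can also be kept in memory only, and `tempo` can be pinned instead of predicted (a sketch based on the parameters above):

  ```python
  # Extract MIDI without writing a file, forcing a fixed tempo of 120 BPM
  midi = model.extact_midi(audio, sr=sr, tempo=120)
  ```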
- **Model as a PyTorch Module:** You can use the model as a standard PyTorch module, for example:

  ```python
  # Change device
  model = model.to(device)

  # Compile model
  model = torch.compile(model)
  ```
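  As with any `nn.Module`, the model and its inputs must live on the same device; a minimal sketch assuming a CUDA GPU is available:

  ```python
  # Move both the model and the input tensor to the GPU before inference
  model = model.to('cuda')
  audio = audio.to('cuda')
  f0 = model.infer(audio, sr=sr)
  ```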
If you find our work useful, please consider citing the paper:
```bibtex
@misc{luo2025fcpefastcontextbasedpitch,
      title={FCPE: A Fast Context-based Pitch Estimation Model},
      author={Yuxin Luo and Ruoyi Zhang and Lu-Chuan Liu and Tianyu Li and Hangyu Liu},
      year={2025},
      eprint={2509.15140},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2509.15140},
}
```
The model used in our paper is DDSP-200K; you can get it here: DDSP-200K Model.
There is also an earlier release, available here: FCPE-Previous.
More information about the experiments will be released after the paper is accepted or rejected.