- [2025.08.25] Added support for OpenRouter API - Release v0.25.7
- [2025.07.23] Added support for custom prompt templates with YAML files - Release v0.25.0. You can now integrate your own prompt and language model with just a few lines of code. Check out the Reasonrank integration as an example.
- [2025.05.25] Our RankLLM resource paper is accepted to SIGIR 2025! 🎉🎉🎉
We offer a suite of rerankers: pointwise models like MonoT5, pairwise models like DuoT5, and listwise models with a focus on open-source LLMs compatible with vLLM, SGLang, or TensorRT-LLM. We also support RankGPT and RankGemini variants, which are proprietary listwise rerankers. Additionally, we support reranking with the first-token logits only to improve inference efficiency. Some of the code in this repository is borrowed from RankGPT, PyGaggle, and LiT5!
current_version = "0.25.7"
- Installation
- Quick Start
- End-to-end Run and 2CR
- Model Zoo
- Training
- Community Contribution
- References and Citations
- Acknowledgments
⚠️ RankLLM is not compatible with macOS, regardless of whether you are using an Intel-based Mac or Apple Silicon (M-series). We recommend using Linux or Windows instead.
Because rank_llm relies on Anserini, JDK 21 must be installed. JDK 11 is not supported and may lead to errors.
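Both requirements above (no macOS, JDK 21) can be checked before installing anything. The following is a minimal sketch using only the standard library; the helper names `parse_java_major` and `check_environment` are hypothetical and not part of rank_llm, and treating every major version other than 21 as a mismatch is an assumption based on the note above.

```python
import platform
import re
import subprocess
from typing import List, Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from the banner printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy numbering: Java 8 reports itself as "1.8.0_xxx".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def check_environment(required_java: int = 21) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    if platform.system() == "Darwin":
        problems.append("macOS detected: RankLLM is not compatible with macOS.")
    try:
        # `java -version` writes its banner to stderr, so both streams are inspected.
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
        major = parse_java_major(result.stderr + result.stdout)
    except OSError:
        major = None
    if major is None:
        problems.append("No usable `java` executable found on PATH.")
    elif major != required_java:
        problems.append(f"Found JDK {major}, but JDK {required_java} is required.")
    return problems
```

Calling `check_environment()` and printing each returned string gives a quick preflight report before creating the conda environment.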
conda create -n rankllm python=3.11
conda activate rankllm
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
conda install -c conda-forge openjdk=21 maven -y
pip install "rank-llm[pyserini]"
pip install -e .[all]      # local installation for development
pip install rank-llm[all]  # or pip installation
pip install -e .[sglang]      # local installation for development
pip install rank-llm[sglang]  # or pip installation
Remember to install flashinfer to use the SGLang backend.
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
pip install -e .[tensorrt-llm]      # local installation for development
pip install rank-llm[tensorrt-llm]  # or pip installation
pip install -e .[training]      # local installation for development
pip install rank-llm[training]  # or pip installation
Remember to also install flash-attn, an optimized implementation of the attention mechanism used in Transformer models.
pip install flash-attn --no-build-isolation
The following code snippet is a minimal walkthrough of retrieval, reranking, evaluation, and invocation analysis of the top 100 retrieved documents for queries from DL19. In this example, BM25 is used as the retriever and RankZephyr as the reranker. Additional sample snippets are available to run under the src/rank_llm/demo directory.
from pathlib import Path
from rank_llm.analysis.response_analysis import ResponseAnalyzer
from rank_llm.data import DataWriter
from rank_llm.evaluation.trec_eval import EvalFunction
from rank_llm.rerank import Reranker, get_openai_api_key
from rank_llm.rerank.listwise import (
    SafeOpenai,
    VicunaReranker,
    ZephyrReranker,
)
from rank_llm.retrieve.retriever import RetrievalMethod, Retriever
from rank_llm.retrieve.topics_dict import TOPICS
# -------- Retrieval --------
# By default BM25 is used for retrieval of top 100 candidates.
dataset_name = "dl19"
retrieved_results = Retriever.from_dataset_with_prebuilt_index(dataset_name)
# Users can specify other retrieval methods and number of retrieved candidates.
# retrieved_results = Retriever.from_dataset_with_prebuilt_index(
#     dataset_name, RetrievalMethod.SPLADE_P_P_ENSEMBLE_DISTIL, k=50
# )
# ---------------------------
# --------- Rerank ----------
# Rank Zephyr model
reranker = ZephyrReranker()
# Rank Vicuna model
# reranker = VicunaReranker()
# RankGPT
# model_coordinator = SafeOpenai("gpt-4o-mini", 4096, keys=get_openai_api_key())
# reranker = Reranker(model_coordinator)
kwargs = {"populate_invocations_history": True}
rerank_results = reranker.rerank_batch(requests=retrieved_results, **kwargs)
# ---------------------------
# ------- Evaluation --------
# Evaluate retrieved results.
topics = TOPICS[dataset_name]
ndcg_10_retrieved = EvalFunction.from_results(retrieved_results, topics)
print(ndcg_10_retrieved)
# Evaluate rerank results.
ndcg_10_rerank = EvalFunction.from_results(rerank_results, topics)
print(ndcg_10_rerank)
# By default ndcg@10 is the eval metric, other value can be specified:
# eval_args = ["-c", "-m", "map_cut.100", "-l2"]
# map_100_rerank = EvalFunction.from_results(rerank_results, topics, eval_args)
# print(map_100_rerank)
# eval_args = ["-c", "-m", "recall.20"]
# recall_20_rerank = EvalFunction.from_results(rerank_results, topics, eval_args)
# print(recall_20_rerank)
# ---------------------------
# --- Analyze invocations ---
analyzer = ResponseAnalyzer.from_inline_results(rerank_results)
error_counts = analyzer.count_errors(verbose=True)
print(error_counts)
# ---------------------------
# ------ Save results -------
writer = DataWriter(rerank_results)
Path("demo_outputs/").mkdir(parents=True, exist_ok=True)
writer.write_in_jsonl_format("demo_outputs/rerank_results.jsonl")
writer.write_in_trec_eval_format("demo_outputs/rerank_results.txt")
writer.write_inference_invocations_history(
    "demo_outputs/inference_invocations_history.json"
)
# ---------------------------
If you are interested in running retrieval and reranking end-to-end or reproducing the results from the reference papers, run_rank_llm.py is a convenient wrapper script that combines these two steps.
The comprehensive list of our two-click reproduction (2CR) commands is available on the MS MARCO V1 and MS MARCO V2 webpages, for the DL19 and DL20 datasets and the DL21-23 datasets, respectively. Moving forward, we plan to cover more datasets and retrievers in our 2CR pages. The rest of this section provides some sample e2e runs.
We can run the RankZephyr model with the following command:
python src/rank_llm/scripts/run_rank_llm.py  --model_path=castorini/rank_zephyr_7b_v1_full --top_k_candidates=100 --dataset=dl20 \
  --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  --context_size=4096 --variable_passages
Including the --sglang_batched flag will allow you to run the model in batched mode using the SGLang library.
Including the --tensorrt_batched flag will allow you to run the model in batched mode using the TensorRT-LLM library.
If you want to run multiple passes of the model, you can use the --num_passes flag.
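When the same run is repeated with different flag combinations, the RankZephyr command above can be assembled programmatically. The following is a rough sketch: `build_command` is a hypothetical helper, not part of rank_llm, and the flag names and values are copied from the command shown above.

```python
import shlex
import sys


def build_command(script, flags):
    """Turn a {flag_name: value} mapping into an argv list.

    A value of True emits a bare switch (e.g. --variable_passages),
    False or None drops the flag, and anything else becomes --name=value.
    """
    argv = [sys.executable, script]
    for name, value in flags.items():
        if value is True:
            argv.append(f"--{name}")
        elif value is False or value is None:
            continue
        else:
            argv.append(f"--{name}={value}")
    return argv


rank_zephyr_flags = {
    "model_path": "castorini/rank_zephyr_7b_v1_full",
    "top_k_candidates": 100,
    "dataset": "dl20",
    "retrieval_method": "SPLADE++_EnsembleDistil_ONNX",
    "prompt_template_path": "src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml",
    "context_size": 4096,
    "variable_passages": True,
    "sglang_batched": False,  # set to True to add the --sglang_batched switch
}

command = build_command("src/rank_llm/scripts/run_rank_llm.py", rank_zephyr_flags)
# Print the arguments in a shell-safe form for inspection.
print(shlex.join(command[1:]))
# To launch the run: subprocess.run(command, check=True)
```

Keeping the flags in a dictionary makes it easy to toggle a backend switch or change the dataset without editing a long shell line.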
We can run the RankGPT4-o model with the following command:
python src/rank_llm/scripts/run_rank_llm.py  --model_path=gpt-4o --top_k_candidates=100 --dataset=dl20 \
  --retrieval_method=bm25 --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_gpt_apeer_template.yaml  --context_size=4096 --use_azure_openai
Note that --prompt_template_path is set to the rank_gpt_apeer template to use the LLM-refined prompt from APEER.
This can be changed to rank_GPT to use the original prompt.
We can run the LiT5-Distill V2 model (which can rerank 100 documents in a single pass) with the following command:
python src/rank_llm/scripts/run_rank_llm.py  --model_path=castorini/LiT5-Distill-large-v2 --top_k_candidates=100 --dataset=dl19 \
    --retrieval_method=bm25 --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_fid_template.yaml  --context_size=150 --batch_size=4 \
    --variable_passages --window_size=100
We can run the LiT5-Distill original model (which works with a window size of 20) with the following command:
python src/rank_llm/scripts/run_rank_llm.py  --model_path=castorini/LiT5-Distill-large --top_k_candidates=100 --dataset=dl19 \
    --retrieval_method=bm25 --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_fid_template.yaml  --context_size=150 --batch_size=32 \
    --variable_passages
We can run the LiT5-Score model with the following command:
python src/rank_llm/scripts/run_rank_llm.py  --model_path=castorini/LiT5-Score-large --top_k_candidates=100 --dataset=dl19 \
    --retrieval_method=bm25 --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_fid_score_template.yaml --context_size=150 --batch_size=8 \
    --window_size=100 --variable_passages
The following runs the 3B variant of MonoT5 trained for 10K steps:
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/monot5-3b-msmarco-10k --top_k_candidates=1000 --dataset=dl19 \
    --retrieval_method=bm25 --prompt_template_path=src/rank_llm/rerank/prompt_templates/monot5_template.yaml --context_size=512
Note that we usually rerank 1K candidates with MonoT5.
The following runs the 3B variant of DuoT5 trained for 10K steps:
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/duot5-3b-msmarco-10k --top_k_candidates=50 --dataset=dl19 \
    --retrieval_method=bm25 --prompt_template_path=src/rank_llm/rerank/prompt_templates/duot5_template.yaml
Since Duo's pairwise comparison has $O(n^2)$ runtime complexity, we recommend reranking the top 50 candidates using DuoT5 models.
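To see why the candidate budgets differ so much between MonoT5 (1K) and DuoT5 (50), it helps to count model inferences per query. The sketch below assumes a pointwise reranker scores each candidate once and a pairwise reranker scores every ordered pair of distinct candidates; the second assumption is an illustration of quadratic growth, not a statement about how DuoT5 is invoked in this repository.

```python
def pointwise_inferences(num_candidates: int) -> int:
    """One model call per (query, document) pair."""
    return num_candidates


def pairwise_inferences(num_candidates: int) -> int:
    """One model call per ordered pair (d_i, d_j) with i != j."""
    return num_candidates * (num_candidates - 1)


for n in (50, 100, 1000):
    print(
        f"n={n:>4}: pointwise={pointwise_inferences(n):>5}, "
        f"pairwise={pairwise_inferences(n):>7}"
    )
```

Under these assumptions, 50 candidates require 2,450 pairwise comparisons, while 1,000 candidates would require 999,000.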
We can run the FirstMistral model, which reranks using the first-token logits only, with the following command:
python src/rank_llm/scripts/run_rank_llm.py  --model_path=castorini/first_mistral --top_k_candidates=100 --dataset=dl20 --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  --context_size=4096 --variable_passages --use_logits --use_alpha --num_gpus 1
Omit --use_logits if you wish to perform traditional listwise reranking.
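The idea behind first-token reranking can be illustrated with a toy example: instead of generating a full permutation, the passages are ordered by the logit that the model assigns to each passage identifier at the first output position. The sketch below uses made-up logits and letter identifiers as illustrative assumptions; `rank_by_first_token_logits` is a hypothetical function and not the implementation used by rank_llm.

```python
import string


def rank_by_first_token_logits(doc_ids, identifier_logits):
    """Order documents by the logit of their identifier token.

    doc_ids[i] is labelled with the i-th uppercase letter (A, B, C, ...).
    identifier_logits maps each label to the logit for emitting that
    label as the first output token.
    """
    if len(doc_ids) > len(string.ascii_uppercase):
        raise ValueError("this toy labelling supports at most 26 documents")
    labels = string.ascii_uppercase[: len(doc_ids)]
    scored = [
        (identifier_logits[label], doc_id) for label, doc_id in zip(labels, doc_ids)
    ]
    # A higher logit means the model prefers that passage nearer the top.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


# Made-up logits for a window of four passages.
toy_logits = {"A": 1.2, "B": 3.4, "C": -0.5, "D": 2.0}
print(rank_by_first_token_logits(["d1", "d2", "d3", "d4"], toy_logits))
# prints ['d2', 'd4', 'd1', 'd3']
```

Because only one decoding step is needed per window, this style of reranking avoids generating the whole ranked list token by token.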
First install genai:
pip install -e .[genai]      # local installation for development
pip install rank-llm[genai]  # or pip installation
Then run the following command:
python src/rank_llm/scripts/run_rank_llm.py  --model_path=gemini-2.0-flash-001 --top_k_candidates=100 --dataset=dl20 \
    --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_gpt_apeer_template.yaml  --context_size=4096
The following is a table of the listwise models our repository was primarily built to handle (with the models hosted on HuggingFace):
vLLM, SGLang, and TensorRT-LLM backends are only supported for RankZephyr and RankVicuna models.
| Model Name | Hugging Face Identifier/Link | 
|---|---|
| RankZephyr 7B V1 - Full - BF16 | castorini/rank_zephyr_7b_v1_full | 
| RankVicuna 7B - V1 | castorini/rank_vicuna_7b_v1 | 
| RankVicuna 7B - V1 - No Data Augmentation | castorini/rank_vicuna_7b_v1_noda | 
| RankVicuna 7B - V1 - FP16 | castorini/rank_vicuna_7b_v1_fp16 | 
| RankVicuna 7B - V1 - No Data Augmentation - FP16 | castorini/rank_vicuna_7b_v1_noda_fp16 | 
We also officially support the following rerankers built by our group:
The following is a table specifically for our LiT5 suite of models hosted on HuggingFace:
| Model Name | 🤗 Hugging Face Identifier/Link | 
|---|---|
| LiT5 Distill base | castorini/LiT5-Distill-base | 
| LiT5 Distill large | castorini/LiT5-Distill-large | 
| LiT5 Distill xl | castorini/LiT5-Distill-xl | 
| LiT5 Distill base v2 | castorini/LiT5-Distill-base-v2 | 
| LiT5 Distill large v2 | castorini/LiT5-Distill-large-v2 | 
| LiT5 Distill xl v2 | castorini/LiT5-Distill-xl-v2 | 
| LiT5 Score base | castorini/LiT5-Score-base | 
| LiT5 Score large | castorini/LiT5-Score-large | 
| LiT5 Score xl | castorini/LiT5-Score-xl | 
Now you can run top-100 reranking with the v2 model in a single pass while maintaining efficiency!
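The benefit of a single pass is easiest to see by counting the windows a listwise reranker has to process. The sketch below assumes a back-to-front sliding window with a stride of half the window size; both the traversal order and the stride of 10 are assumptions for illustration and are not stated in this README.

```python
def sliding_windows(num_candidates, window_size, stride):
    """Return (start, end) index pairs, from the bottom of the list to the top."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    windows = []
    end = num_candidates
    start = max(0, end - window_size)
    while True:
        windows.append((start, end))
        if start == 0:
            break
        # Slide the window towards the top of the ranked list.
        end -= stride
        start = max(0, end - window_size)
    return windows


# Window of 20 with an assumed stride of 10 over 100 candidates.
print(len(sliding_windows(100, 20, 10)))   # prints 9
# Window of 100: the whole candidate list fits in one window.
print(len(sliding_windows(100, 100, 50)))  # prints 1
```

Under these assumptions, a window of 20 needs nine model calls per query for 100 candidates, whereas a window of 100 needs one.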
The following is a table specifically for our monoT5 suite of models hosted on HuggingFace:
| Model Name | 🤗 Hugging Face Identifier/Link | 
|---|---|
| monoT5 Small MSMARCO 10K | castorini/monot5-small-msmarco-10k | 
| monoT5 Small MSMARCO 100K | castorini/monot5-small-msmarco-100k | 
| monoT5 Base MSMARCO | castorini/monot5-base-msmarco | 
| monoT5 Base MSMARCO 10K | castorini/monot5-base-msmarco-10k | 
| monoT5 Large MSMARCO 10K | castorini/monot5-large-msmarco-10k | 
| monoT5 Large MSMARCO | castorini/monot5-large-msmarco | 
| monoT5 3B MSMARCO 10K | castorini/monot5-3b-msmarco-10k | 
| monoT5 3B MSMARCO | castorini/monot5-3b-msmarco | 
| monoT5 Base Med MSMARCO | castorini/monot5-base-med-msmarco | 
| monoT5 3B Med MSMARCO | castorini/monot5-3b-med-msmarco | 
We recommend the Med models for biomedical retrieval. We also provide both 10K checkpoints (generally better out-of-domain effectiveness) and 100K checkpoints (better in-domain effectiveness).
Please check the training directory for finetuning open-source listwise rerankers.
RankLLM is integrated into many popular toolkits such as LlamaIndex, rerankers, and LangChain. For usage of RankLLM in those toolkits and examples, please check this external integrations README.
If you would like to contribute to the project, please refer to the contribution guidelines.
- v0.25.7: August 25, 2025 [Release Notes]
- v0.25.6: August 5, 2025 [Release Notes]
- v0.25.0: July 23, 2025 [Release Notes]
If you use RankLLM, please cite the following relevant papers:
[2505.19284] RankLLM: A Python Package for Reranking with LLMs
@inproceedings{sharifymoghaddam2025rankllm,
author = {Sharifymoghaddam, Sahel and Pradeep, Ronak and Slavescu, Andre and Nguyen, Ryan and Xu, Andrew and Chen, Zijian and Zhang, Yilin and Chen, Yidi and Xian, Jasper and Lin, Jimmy},
title = {{RankLLM}: A Python Package for Reranking with LLMs},
year = {2025},
isbn = {9798400715921},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
booktitle = {Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages = {3681--3690},
numpages = {10},
keywords = {information retrieval, large language models, python, reranking},
location = {Padua, Italy},
series = {SIGIR '25}
}
@ARTICLE{pradeep2023rankvicuna,
  title   = {{RankVicuna}: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models},
  author  = {Ronak Pradeep and Sahel Sharifymoghaddam and Jimmy Lin},
  year    = {2023},
  journal = {arXiv:2309.15088}
}
[2312.02724] RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
@ARTICLE{pradeep2023rankzephyr,
  title   = {{RankZephyr}: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!},
  author  = {Ronak Pradeep and Sahel Sharifymoghaddam and Jimmy Lin},
  year    = {2023},
  journal = {arXiv:2312.02724}
}
If you use one of the LiT5 models, please cite the following relevant paper:
@ARTICLE{tamber2023scaling,
  title   = {Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models},
  author  = {Manveer Singh Tamber and Ronak Pradeep and Jimmy Lin},
  year    = {2023},
  journal = {arXiv:2312.16098}
}
If you use one of the monoT5 models, please cite the following relevant paper:
@ARTICLE{pradeep2021emd,
  title = {The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models},
  author = {Ronak Pradeep and Rodrigo Nogueira and Jimmy Lin},
  year = {2021},
  journal = {arXiv:2101.05667},
}
If you use the FirstMistral model, please consider citing:
@ARTICLE{chen2024firstrepro,
  title   = {An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking},
  author  = {Zijian Chen and Ronak Pradeep and Jimmy Lin},
  year    = {2024},
  journal = {arXiv:2411.05508}
}
If you would like to cite the FIRST methodology, please consider citing:
[2406.15657] FIRST: Faster Improved Listwise Reranking with Single Token Decoding
@ARTICLE{reddy2024first,
  title   = {FIRST: Faster Improved Listwise Reranking with Single Token Decoding},
  author  = {Reddy, Revanth Gangi and Doo, JaeHyeok and Xu, Yifei and Sultan, Md Arafat and Swain, Deevya and Sil, Avirup and Ji, Heng},
  year    = {2024},
  journal = {arXiv:2406.15657},
}
This research is supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada.