Empty file added -collection
Empty file.
Empty file added -generator
Empty file.
Empty file added -index
Empty file.
Empty file added -input
Empty file.
Empty file added -threads
Empty file.
46 changes: 22 additions & 24 deletions docs/experiments-msmarco-passage.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,21 @@
# Anserini: BM25 Baselines for MS MARCO Passage Ranking

This page contains instructions for running BM25 baselines on the [MS MARCO *passage* ranking task](https://microsoft.github.io/msmarco/).
Note that there is a separate [MS MARCO *document* ranking task](experiments-msmarco-doc.md).
This page contains instructions for running BM25 baselines on the [MS MARCO _passage_ ranking task](https://microsoft.github.io/msmarco/).
Note that there is a separate [MS MARCO _document_ ranking task](experiments-msmarco-doc.md).
This exercise will require a machine with >8 GB RAM and >15 GB free disk space.
If you're using a Windows machine, equivalent commands are provided alongside the Unix-like (Linux/macOS) commands.

If you're a Waterloo student traversing the [onboarding path](https://github.com/lintool/guide/blob/master/ura.md), [start here](start-here.md
).
If you're a Waterloo student traversing the [onboarding path](https://github.com/lintool/guide/blob/master/ura.md), [start here](start-here.md).
In general, don't try to rush through this guide by just blindly copying and pasting commands into a shell;
that's what I call [cargo culting](https://en.wikipedia.org/wiki/Cargo_cult_programming).
Instead, really try to understand what's going on.


**Learning outcomes** for this guide, building on previous steps in the onboarding path:

+ Be able to use Anserini to build a Lucene inverted index on the MS MARCO passage collection.
+ Be able to use Anserini to perform a batch retrieval run on the MS MARCO passage collection with the dev queries.
+ Be able to evaluate the retrieved results above.
+ Understand the MRR metric.
- Be able to use Anserini to build a Lucene inverted index on the MS MARCO passage collection.
- Be able to use Anserini to perform a batch retrieval run on the MS MARCO passage collection with the dev queries.
- Be able to evaluate the retrieved results above.
- Understand the MRR metric.

What's Anserini?
Well, it's the repo that you're in right now.
Expand All @@ -33,8 +31,7 @@ That is, most things done with Anserini can be "translated" into Elasticsearch q
## Data Prep

In this guide, we're just going through the mechanical steps of data prep.
To better understand what you're actually doing, go through the [start here](start-here.md
) guide.
To better understand what you're actually doing, go through the [start here](start-here.md) guide.
The guide contains the exact same instructions, but provides more detailed explanations.

We're going to use the repository's root directory as the working directory.
Expand Down Expand Up @@ -80,8 +77,8 @@ The output queries file `collections/msmarco-passage/queries.dev.small.tsv` shou

In building a retrieval system, there are generally two phases:

+ In the **indexing** phase, an indexer takes the document collection (i.e., corpus) and builds an index, which is a data structure that supports efficient retrieval.
+ In the **retrieval** (or **search**) phase, the retrieval system returns a ranked list given a query _q_, with the aid of the index constructed in the previous phase.
- In the **indexing** phase, an indexer takes the document collection (i.e., corpus) and builds an index, which is a data structure that supports efficient retrieval.
- In the **retrieval** (or **search**) phase, the retrieval system returns a ranked list given a query _q_, with the aid of the index constructed in the previous phase.

(There's also a training phase when we start to discuss models that _learn_ from data, but we're not there yet.)
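The two phases can be sketched with a toy in-memory example. This is purely illustrative: Anserini builds a real Lucene inverted index with positions, doc vectors, and BM25 scoring, not the bare dictionary and term-overlap ranking shown here.

```python
from collections import defaultdict

# Toy corpus: doc_id -> text (a stand-in for the MS MARCO passages).
corpus = {
    "d1": "what is the capital of france",
    "d2": "paris is the capital of france",
    "d3": "bm25 is a ranking function",
}

# Indexing phase: map each term to the set of documents containing it.
index = defaultdict(set)
for doc_id, text in corpus.items():
    for term in text.split():
        index[term].add(doc_id)

# Retrieval phase: rank documents by how many query terms they contain
# (a real system would use a weighted score like BM25 instead).
def search(query):
    scores = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))

print(search("capital of france"))  # ['d1', 'd2']
```

The point of the indexing phase is exactly this precomputation: at query time we only touch the postings for the query terms, never the whole collection.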

Expand All @@ -102,13 +99,13 @@ bin/run.sh io.anserini.index.IndexCollection \
-generator DefaultLuceneDocumentGenerator \
-threads 9 -storePositions -storeDocvectors -storeRaw
```

For Windows:

```bash
bin\run.bat io.anserini.index.IndexCollection -collection JsonCollection -input collections\msmarco-passage\collection_jsonl -index indexes\msmarco-passage\lucene-index-msmarco -generator DefaultLuceneDocumentGenerator -threads 9 -storePositions -storeDocvectors -storeRaw
```



In this case, Lucene creates what is known as an **inverted index**.

Upon completion, we should have an index with 8,841,823 documents.
Expand All @@ -117,7 +114,6 @@ On the new MacBook Pro M3 Laptop, if you only have 8GB memory, you might encount
finishes. This is likely caused by the JVM allocating more memory than is available on the system, causing excessive memory swapping without active
garbage collection. To mitigate this issue, you may need to modify `run.sh` to change the `-Xms` option to 2GB and `-Xmx` to 6GB.


## Retrieval

In the above step, we've built the inverted index.
Expand All @@ -132,7 +128,9 @@ bin/run.sh io.anserini.search.SearchCollection \
-parallelism 4 \
-bm25 -bm25.k1 0.82 -bm25.b 0.68 -hits 1000
```

For Windows:

```bash
bin\run.bat io.anserini.search.SearchCollection -index indexes\msmarco-passage\lucene-index-msmarco -topics collections\msmarco-passage\queries.dev.small.tsv -topicReader TsvInt -output runs\run.msmarco-passage.dev.small.tsv -format msmarco -parallelism 4 -bm25 -bm25.k1 0.82 -bm25.b 0.68 -hits 1000
```
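If you want to poke at the run file programmatically, a minimal reader looks like the sketch below. It assumes the `-format msmarco` output is tab-separated lines of query id, passage id, and rank; check a few lines of your own run file to confirm the layout before relying on it.

```python
# Sketch: read a run file assumed to contain tab-separated
# <query_id> <doc_id> <rank> lines, one hit per line.
def read_run(path):
    run = {}
    with open(path) as f:
        for line in f:
            qid, docid, rank = line.strip().split("\t")
            # Hits for a query are assumed to appear in rank order.
            run.setdefault(qid, []).append((docid, int(rank)))
    return run
```

With the ranked lists in a plain dict, the sanity checks and metrics below are easy to reproduce by hand.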
Expand Down Expand Up @@ -191,8 +189,7 @@ $ grep 7187158 collections/msmarco-passage/collection.tsv
In this case, the document (hit) seems relevant.
That is, it contains information that addresses the information need.
So here, the retrieval system "did well".
Remember that this document was indeed marked relevant in the qrels, as we saw in the [start here](start-here.md
) guide.
Remember that this document was indeed marked relevant in the qrels, as we saw in the [start here](start-here.md) guide.

As an additional sanity check, run the following:

Expand Down Expand Up @@ -224,8 +221,7 @@ QueriesRanked: 6980

(Yea, the number of digits of precision is a bit... excessive)

Remember from the [start here](start-here.md
) guide that with relevance judgments (qrels), we can automatically evaluate the retrieval system output (i.e., the run).
Remember from the [start here](start-here.md) guide that with relevance judgments (qrels), we can automatically evaluate the retrieval system output (i.e., the run).

The final ingredient is a metric, i.e., how to quantify the "quality" of a ranked list.
Here, we're using a metric called MRR, or mean reciprocal rank.
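The metric is simple enough to compute by hand. In the sketch below, the `run` and `qrels` dicts are hypothetical stand-ins for the real run and qrels files, just to show the arithmetic:

```python
# MRR@10: for each query, take the reciprocal of the rank of the first
# relevant document in the top 10 (0 if none appears), then average
# over all queries.
def mrr_at_10(run, qrels):
    total = 0.0
    for qid, ranked_docs in run.items():
        for rank, docid in enumerate(ranked_docs[:10], start=1):
            if docid in qrels.get(qid, set()):
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(run)

run = {"q1": ["d3", "d7", "d2"], "q2": ["d5", "d9"]}
qrels = {"q1": {"d7"}, "q2": {"d1"}}
print(mrr_at_10(run, qrels))  # (1/2 + 0) / 2 = 0.25
```

Note how unforgiving the metric is: a query whose first relevant passage lands at rank 11 contributes exactly zero.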
Expand Down Expand Up @@ -329,22 +325,23 @@ There are five different sets of 10k samples (using the `shuf` command).
We tuned on each individual set and then averaged parameter values across all five sets (this has the effect of regularization).
In separate trials, we optimized for:

+ recall@1000, since Anserini output serves as input to downstream rerankers (e.g., based on BERT), and we want to maximize the number of relevant documents the rerankers have to work with;
+ MRR@10, for the case where Anserini output is directly presented to users (i.e., no downstream reranking).
- recall@1000, since Anserini output serves as input to downstream rerankers (e.g., based on BERT), and we want to maximize the number of relevant documents the rerankers have to work with;
- MRR@10, for the case where Anserini output is directly presented to users (i.e., no downstream reranking).

It turns out that optimizing for MRR@10 and MAP yields the same settings.

Here's the comparison between the Anserini default and optimized parameters:

| Setting | MRR@10 | MAP | Recall@1000 |
|:------------------------------------------------|-------:|-------:|------------:|
| :---------------------------------------------- | -----: | -----: | ----------: |
| Default (`k1=0.9`, `b=0.4`) | 0.1840 | 0.1926 | 0.8526 |
| Optimized for recall@1000 (`k1=0.82`, `b=0.68`) | 0.1874 | 0.1957 | 0.8573 |
| Optimized for MRR@10/MAP (`k1=0.60`, `b=0.62`) | 0.1892 | 0.1972 | 0.8555 |

As mentioned above, the BM25 run with `k1=0.82`, `b=0.68` corresponds to the entry "BM25 (Lucene8, tuned)" dated 2019/06/26 on the [MS MARCO Passage Ranking Leaderboard](https://microsoft.github.io/MSMARCO-Passage-Ranking-Submissions/leaderboard/).
The BM25 run with default parameters `k1=0.9`, `b=0.4` roughly corresponds to the entry "BM25 (Anserini)" dated 2019/04/10 (but Anserini was using Lucene 7.6 at the time).
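To see what the tuned parameters actually control, here is the textbook BM25 term weight as a sketch. Lucene's implementation differs in details (its IDF formulation and its encoding of document lengths are not identical to this), so treat it as an illustration of the roles of `k1` and `b`, not as Anserini's exact scoring code:

```python
import math

# Textbook BM25 weight of one query term in one document.
#   tf: term frequency in the document   df: document frequency of the term
#   N: collection size   dl: document length   avgdl: average document length
def bm25_term_weight(tf, df, N, dl, avgdl, k1=0.82, b=0.68):
    idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
    # b controls how strongly scores are normalized by document length;
    # k1 controls how quickly repeated occurrences of a term saturate.
    norm = k1 * (1 - b + b * dl / avgdl)
    return idf * tf * (k1 + 1) / (tf + norm)
```

With `b=0`, document length is ignored entirely; with larger `k1`, additional occurrences of a term keep increasing the score for longer before saturating. The tuning above is just a search over this two-dimensional space.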


## Reproduction Log[*](reproducibility.md)

+ Results reproduced by [@ronakice](https://github.com/ronakice) on 2019-08-12 (commit [`5b29d16`](https://github.com/castorini/anserini/commit/5b29d1654abc5e8a014c2230da990ab2f91fb340))
Expand Down Expand Up @@ -542,5 +539,6 @@ The BM25 run with default parameters `k1=0.9`, `b=0.4` roughly corresponds to th
+ Results reproduced by [@sherloc512](https://github.com/sherloc512) on 2024-12-04 (commit [`9e55b1c`](https://github.com/castorini/anserini/commit/9e55b1c97fced46530dac1f78975d19635ffaf7a))
+ Results reproduced by [@zdann15](https://github.com/zdann15) on 2024-12-04 (commit [`9d311b4`](https://github.com/castorini/anserini/commit/9d311b4409a9ff3d79b01910178eaec3931f0abe))
+ Results reproduced by [@Alireza-Zwolf](https://github.com/Alireza-Zwolf) on 2024-12-15 (commit [`c7dff5f`](https://github.com/castorini/anserini/commit/c7dff5f8417905612ad9f97e85012440e9e16087))
+ Results reproduced by [@Linsen-gao-457](https://github.com/Linsen-gao-457) on 2024-12-17 (commit [a86484a6](https://github.com/castorini/anserini/commit/a86484a6e99a7a97966c423d230ad05279b24508))
+ Results reproduced by [@Linsen-gao-457](https://github.com/Linsen-gao-457) on 2024-12-17 (commit [`a86484a`](https://github.com/castorini/anserini/commit/a86484a6e99a7a97966c423d230ad05279b24508))
+ Results reproduced by [@vincent-4](https://github.com/vincent-4) on 2024-12-20 (commit [`c619dc8`](https://github.com/castorini/anserini/commit/c619dc8d9ab28298251964053a927906e9957f51))

12 changes: 6 additions & 6 deletions docs/regressions/regressions-backgroundlinking18.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,21 +45,21 @@ After indexing has completed, you should be able to perform retrieval as follows
```
bin/run.sh io.anserini.search.SearchCollection \
-index indexes/lucene-index.wapo.v2/ \
-topics tools/topics-and-qrels/topics.backgroundlinking18.txt \
-topics tools\topics-and-qrels\topics.backgroundlinking18.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v2.bm25.topics.backgroundlinking18.txt \
-backgroundLinking -backgroundLinking.k 100 -bm25 -hits 100 &

bin/run.sh io.anserini.search.SearchCollection \
-index indexes/lucene-index.wapo.v2/ \
-topics tools/topics-and-qrels/topics.backgroundlinking18.txt \
-topics tools\topics-and-qrels\topics.backgroundlinking18.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking18.txt \
-backgroundLinking -backgroundLinking.k 100 -bm25 -rm3 -hits 100 &

bin/run.sh io.anserini.search.SearchCollection \
-index indexes/lucene-index.wapo.v2/ \
-topics tools/topics-and-qrels/topics.backgroundlinking18.txt \
-topics tools\topics-and-qrels\topics.backgroundlinking18.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking18.txt \
-backgroundLinking -backgroundLinking.dateFilter -backgroundLinking.k 100 -bm25 -rm3 -hits 100 &
Expand All @@ -68,11 +68,11 @@ bin/run.sh io.anserini.search.SearchCollection \
Evaluation can be performed using `trec_eval`:

```
bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools/topics-and-qrels/qrels.backgroundlinking18.txt runs/run.wapo.v2.bm25.topics.backgroundlinking18.txt
bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools\topics-and-qrels\qrels.backgroundlinking18.txt runs/run.wapo.v2.bm25.topics.backgroundlinking18.txt

bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools/topics-and-qrels/qrels.backgroundlinking18.txt runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking18.txt
bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools\topics-and-qrels\qrels.backgroundlinking18.txt runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking18.txt

bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools/topics-and-qrels/qrels.backgroundlinking18.txt runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking18.txt
bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools\topics-and-qrels\qrels.backgroundlinking18.txt runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking18.txt
```

## Effectiveness
Expand Down
12 changes: 6 additions & 6 deletions docs/regressions/regressions-backgroundlinking19.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,21 +45,21 @@ After indexing has completed, you should be able to perform retrieval as follows
```
bin/run.sh io.anserini.search.SearchCollection \
-index indexes/lucene-index.wapo.v2/ \
-topics tools/topics-and-qrels/topics.backgroundlinking19.txt \
-topics tools\topics-and-qrels\topics.backgroundlinking19.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v2.bm25.topics.backgroundlinking19.txt \
-backgroundLinking -backgroundLinking.k 100 -bm25 -hits 100 &

bin/run.sh io.anserini.search.SearchCollection \
-index indexes/lucene-index.wapo.v2/ \
-topics tools/topics-and-qrels/topics.backgroundlinking19.txt \
-topics tools\topics-and-qrels\topics.backgroundlinking19.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking19.txt \
-backgroundLinking -backgroundLinking.k 100 -bm25 -rm3 -hits 100 &

bin/run.sh io.anserini.search.SearchCollection \
-index indexes/lucene-index.wapo.v2/ \
-topics tools/topics-and-qrels/topics.backgroundlinking19.txt \
-topics tools\topics-and-qrels\topics.backgroundlinking19.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking19.txt \
-backgroundLinking -backgroundLinking.dateFilter -backgroundLinking.k 100 -bm25 -rm3 -hits 100 &
Expand All @@ -68,11 +68,11 @@ bin/run.sh io.anserini.search.SearchCollection \
Evaluation can be performed using `trec_eval`:

```
bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools/topics-and-qrels/qrels.backgroundlinking19.txt runs/run.wapo.v2.bm25.topics.backgroundlinking19.txt
bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools\topics-and-qrels\qrels.backgroundlinking19.txt runs/run.wapo.v2.bm25.topics.backgroundlinking19.txt

bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools/topics-and-qrels/qrels.backgroundlinking19.txt runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking19.txt
bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools\topics-and-qrels\qrels.backgroundlinking19.txt runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking19.txt

bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools/topics-and-qrels/qrels.backgroundlinking19.txt runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking19.txt
bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools\topics-and-qrels\qrels.backgroundlinking19.txt runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking19.txt
```

## Effectiveness
Expand Down
12 changes: 6 additions & 6 deletions docs/regressions/regressions-backgroundlinking20.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,21 +45,21 @@ After indexing has completed, you should be able to perform retrieval as follows
```
bin/run.sh io.anserini.search.SearchCollection \
-index indexes/lucene-index.wapo.v3/ \
-topics tools/topics-and-qrels/topics.backgroundlinking20.txt \
-topics tools\topics-and-qrels\topics.backgroundlinking20.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v3.bm25.topics.backgroundlinking20.txt \
-backgroundLinking -backgroundLinking.k 100 -bm25 -hits 100 &

bin/run.sh io.anserini.search.SearchCollection \
-index indexes/lucene-index.wapo.v3/ \
-topics tools/topics-and-qrels/topics.backgroundlinking20.txt \
-topics tools\topics-and-qrels\topics.backgroundlinking20.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v3.bm25+rm3.topics.backgroundlinking20.txt \
-backgroundLinking -backgroundLinking.k 100 -bm25 -rm3 -hits 100 &

bin/run.sh io.anserini.search.SearchCollection \
-index indexes/lucene-index.wapo.v3/ \
-topics tools/topics-and-qrels/topics.backgroundlinking20.txt \
-topics tools\topics-and-qrels\topics.backgroundlinking20.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v3.bm25+rm3+df.topics.backgroundlinking20.txt \
-backgroundLinking -backgroundLinking.dateFilter -backgroundLinking.k 100 -bm25 -rm3 -hits 100 &
Expand All @@ -68,11 +68,11 @@ bin/run.sh io.anserini.search.SearchCollection \
Evaluation can be performed using `trec_eval`:

```
bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.wapo.v3.bm25.topics.backgroundlinking20.txt
bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools\topics-and-qrels\qrels.backgroundlinking20.txt runs/run.wapo.v3.bm25.topics.backgroundlinking20.txt

bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.wapo.v3.bm25+rm3.topics.backgroundlinking20.txt
bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools\topics-and-qrels\qrels.backgroundlinking20.txt runs/run.wapo.v3.bm25+rm3.topics.backgroundlinking20.txt

bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.wapo.v3.bm25+rm3+df.topics.backgroundlinking20.txt
bin/trec_eval -c -M1000 -m map -c -M1000 -m ndcg_cut.5 tools\topics-and-qrels\qrels.backgroundlinking20.txt runs/run.wapo.v3.bm25+rm3+df.topics.backgroundlinking20.txt
```

## Effectiveness
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ After indexing has completed, you should be able to perform retrieval as follows
```
bin/run.sh io.anserini.search.SearchFlatDenseVectors \
-index indexes/lucene-flat-int8.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-topics tools/topics-and-qrels/topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.gz \
-topics tools\topics-and-qrels\topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.gz \
-topicReader JsonStringVector \
-output runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt \
-hits 1000 -removeQuery -threads 16 &
Expand All @@ -62,9 +62,9 @@ bin/run.sh io.anserini.search.SearchFlatDenseVectors \
Evaluation can be performed using `trec_eval`:

```
bin/trec_eval -c -m ndcg_cut.10 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt
bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt
bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt
bin/trec_eval -c -m ndcg_cut.10 tools\topics-and-qrels\qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt
bin/trec_eval -c -m recall.100 tools\topics-and-qrels\qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt
bin/trec_eval -c -m recall.1000 tools\topics-and-qrels\qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt
```

## Effectiveness
Expand Down