This is the official repo of the ICML 2025 spotlight paper "Is Complex Query Answering Really Complex?" (https://arxiv.org/pdf/2410.12537).
If you use our repo, please cite:
@article{gregucci2024complex,
  title={Is Complex Query Answering Really Complex?},
  author={Gregucci, Cosimo and Xiong, Bo and Hernandez, Daniel and Loconte, Lorenzo and Minervini, Pasquale and Staab, Steffen and Vergari, Antonio},
  journal={arXiv preprint arXiv:2410.12537},
  year={2024}
}
This repo contains several algorithms for multi-hop reasoning on knowledge graphs, including the official PyTorch implementation of Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs and a PyTorch implementation of Complex Query Answering with Neural Link Predictors.
To reproduce our stratified analysis on both new and old benchmarks, or to execute it for other benchmarks, run read_queries_pair.py.
To change the benchmark it is sufficient to set the --dataset parameter.
The script generates the files needed for both the stratified analysis and the cardinality analysis, which you can find in the folders benchmark/test-query-red and benchmark/test-query-card.
These files are included, for each benchmark we considered, in the release https://github.com/april-tools/is-cqa-complex/releases/tag/benchs-1.0
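The stratified analysis builds on the idea that some query-answer pairs can be reduced to a simpler query type (see the 2p-to-1p subset in the testing options). A toy sketch of that idea for 2p queries follows, assuming a pair counts as reducible to 1p when some path to the answer uses one link from the training graph and one missing link. This is an illustration only: the function names, the triple format, and the toy graph are hypothetical, and `read_queries_pair.py` may implement the analysis differently.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail); hypothetical format


def min_missing_links_2p(anchor: str, r1: str, r2: str, answer: str,
                         train: Set[Triple], full: Set[Triple]) -> Optional[int]:
    """Fewest non-training links over all paths anchor -r1-> v -r2-> answer.

    Returns None if no such path exists in the full graph.
    """
    best = None
    for head, rel, mid in full:
        if head != anchor or rel != r1:
            continue
        if (mid, r2, answer) not in full:
            continue
        missing = int((head, rel, mid) not in train) + int((mid, r2, answer) not in train)
        best = missing if best is None else min(best, missing)
    return best


def stratum_2p(missing: Optional[int]) -> str:
    """Map the number of missing links to an (assumed) stratum label."""
    labels = {0: "observed", 1: "1p", 2: "2p"}
    return labels.get(missing, "not an answer")


# Toy graph: only (a, r1, b) is in the training graph.
train = {("a", "r1", "b")}
full = train | {("b", "r2", "d"), ("a", "r1", "c"), ("c", "r2", "e")}

# d is reached via b with one missing link, e via c with two missing links.
label_d = stratum_2p(min_missing_links_2p("a", "r1", "r2", "d", train, full))
label_e = stratum_2p(min_missing_links_2p("a", "r1", "r2", "e", train, full))
```

In this toy graph the pair with answer `d` needs only one predicted link, so it falls in the 1p stratum, while the pair with answer `e` needs both links predicted and stays a full 2p pair.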
To generate new benchmarks following the strategy we described in the paper, run create_queries.py, which is a modified version of the one included in the official PyTorch implementation of Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs.
The KG data and the benchmarks we used in this paper (FB15k237+H, NELL995+H, ICEWS18+H) can be downloaded from https://github.com/april-tools/is-cqa-complex/releases/tag/benchs-1.0
The folder contains both the old and the new benchmarks, including the benchmark files for their stratified analysis.
All pre-trained models we used in this paper can be downloaded from here
- Download the new benchmarks and pre-trained models (see above).
- To test the subset of 2p queries that can be reduced to 1p using CQD or CQD-Hybrid, set `--tasks -2p` and `--subtask 1p`. For details about CQD-Hybrid, see `cqd/CQD.md`. With `--subtask None`, the whole original set of queries is tested; with `--subtask New`, the new set of queries is tested. An example for NELL995 is provided in `job-NELL995.sh`.
- A file `results.csv` containing the MRR, H@1, H@3, and H@10 for every task/subtask is created while the script runs.
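For reference, MRR and H@k are usually computed from the rank assigned to each correct answer. The sketch below shows these standard definitions on a small made-up list of ranks. It is not the evaluation code of this repo, and the function name and the example ranks are assumptions for illustration.

```python
from typing import Dict, Sequence


def ranking_metrics(ranks: Sequence[int], ks: Sequence[int] = (1, 3, 10)) -> Dict[str, float]:
    """Compute MRR and Hits@k from 1-based ranks of the correct answers."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    n = len(ranks)
    # MRR: mean of the reciprocal ranks.
    metrics = {"MRR": sum(1.0 / r for r in ranks) / n}
    # H@k: fraction of answers ranked within the top k.
    for k in ks:
        metrics[f"H@{k}"] = sum(r <= k for r in ranks) / n
    return metrics


# Made-up ranks for four answers.
example = ranking_metrics([1, 2, 4, 20])
```

The tables in this README report such values multiplied by 100, as the example of 10.1 versus 54.6 for QTO suggests.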
There is no clear state-of-the-art (SoTA) method for the new benchmarks. As shown in the tables below, the Mean Reciprocal Rank (MRR) on the new benchmarks is much lower than on the old ones. For example, for 3i queries, QTO achieves an MRR of 10.1 on FB15k237+H, compared with 54.6 on FB15k237.

MRR on FB15k237+H:

| Model | 1p | 2p | 3p | 2i | 3i | 1p2i | 2i1p | 2u | 2u1p | 4p | 4i | 2in | 3in | 2pi1pn | 2nu1p | 2in1p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GNN-QE | 42.8 | 5.2 | 4.0 | 6.0 | 8.8 | 5.6 | 9.9 | 32.5 | 10.0 | 4.3 | 20.0 | 6.8 | 6.5 | 3.7 | 5.0 | 3.3 | 
| ULTRAQ | 40.6 | 4.5 | 3.5 | 5.2 | 7.2 | 5.3 | 10.1 | 29.4 | 8.3 | 3.8 | 16.4 | 5.3 | 5.5 | 2.6 | 3.7 | 2.2 | 
| CQD | 46.7 | 4.4 | 2.4 | 11.3 | 12.8 | 6.0 | 11.5 | 40.1 | 10.6 | 1.1 | 23.8 | 3.3 | 2.6 | 0.6 | 4.9 | 1.2 | 
| CQD-Hybrid | 46.7 | 4.8 | 3.1 | 6.0 | 8.6 | 5.5 | 12.9 | 42.2 | 12.0 | 2.4 | 17.4 | 4.7 | 1.6 | 1.0 | 3.2 | 1.3 | 
| ConE | 41.8 | 4.6 | 3.9 | 9.1 | 10.3 | 3.8 | 7.9 | 22.8 | 6.0 | 3.5 | 20.3 | 5.1 | 4.9 | 2.9 | 3.3 | 3.6 | 
| QTO | 46.7 | 4.9 | 3.7 | 8.7 | 10.1 | 6.1 | 13.5 | 30.6 | 11.2 | 3.9 | 20.2 | 10.6 | 3.1 | 2.0 | 5.3 | 1.5 | 
| CLMPT | 45.3 | 5.3 | 4.7 | 10.2 | 12.2 | 5.6 | 14.9 | 33.6 | 14.2 | 4.5 | 24.0 | 6.8 | 2.3 | 1.6 | 4.8 | 2.5 | 

MRR on NELL995+H:

| Model | 1p | 2p | 3p | 2i | 3i | 1p2i | 2i1p | 2u | 2u1p | 4p | 4i | 2in | 3in | 2pi1pn | 2nu1p | 2in1p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GNN-QE | 53.6 | 8.0 | 6.0 | 10.7 | 13.3 | 16.0 | 13.5 | 47.5 | 9.8 | 4.7 | 19.4 | 5.5 | 6.4 | 5.8 | 3.3 | 4.4 | 
| ULTRAQ | 38.9 | 6.1 | 4.1 | 7.9 | 10.2 | 15.8 | 9.3 | 28.1 | 9.5 | 4.2 | 15.6 | 4.5 | 5.9 | 4.3 | 2.7 | 3.6 | 
| CQD | 60.4 | 9.6 | 4.2 | 18.5 | 19.6 | 18.9 | 22.6 | 46.3 | 18.5 | 2.0 | 24.8 | 4.2 | 1.5 | 1.5 | 4.9 | 2.6 | 
| CQD-Hybrid | 60.4 | 9.0 | 6.1 | 12.1 | 14.4 | 17.4 | 21.2 | 46.4 | 19.3 | 3.5 | 20.4 | 5.1 | 1.2 | 1.4 | 4.3 | 2.4 | 
| ConE | 53.1 | 7.9 | 6.7 | 21.8 | 23.6 | 14.9 | 11.8 | 39.9 | 8.8 | 5.2 | 27.6 | 4.6 | 6.0 | 3.7 | 2.7 | 6.4 | 
| QTO | 60.3 | 9.8 | 8.0 | 14.6 | 15.8 | 17.6 | 21.1 | 49.1 | 18.9 | 7.0 | 20.9 | 10.2 | 2.3 | 3.1 | 8.4 | 2.4 | 
| CLMPT | 58.1 | 10.1 | 7.8 | 22.7 | 25.0 | 17.2 | 24.4 | 50.0 | 22.0 | 7.2 | 29.1 | 6.5 | 2.4 | 4.1 | 2.3 | 4.5 | 

MRR on ICEWS18+H:

| Model | 1p | 2p | 3p | 2i | 3i | 1p2i | 2i1p | 2u | 2u1p | 4p | 4i | 2in | 3in | 2pi1pn | 2nu1p | 2in1p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GNN-QE | 12.2 | 0.9 | 0.5 | 16.1 | 26.5 | 19.1 | 3.5 | 7.6 | 1.1 | 0.4 | 34.0 | 4.5 | 6.9 | 0.9 | 3.5 | 0.8 | 
| ULTRAQ | 6.3 | 1.2 | 1.2 | 7.0 | 11.7 | 8.8 | 1.3 | 3.3 | 1.2 | 0.8 | 15.9 | 2.3 | 4.8 | 1.2 | 2.2 | 1.6 | 
| CQD | 16.6 | 2.5 | 1.5 | 13.0 | 19.5 | 17.1 | 6.7 | 6.8 | 5.9 | 1.1 | 24.0 | 1.5 | 2.9 | 0.2 | 2.7 | 0.9 | 
| CQD-Hybrid | 16.6 | 2.6 | 1.5 | 15.0 | 25.5 | 17.5 | 5.8 | 6.8 | 5.6 | 0.9 | 33.2 | 1.7 | 4.0 | 0.3 | 2.2 | 1.1 | 
| ConE | 3.5 | 0.9 | 0.9 | 1.2 | 0.5 | 1.2 | 1.6 | 1.1 | 0.9 | 0.6 | 0.3 | 1.7 | 2.9 | 1.1 | 0.9 | 1.3 | 
| QTO | 16.6 | 2.6 | 1.4 | 15.7 | 25.0 | 18.4 | 6.2 | 6.7 | 4.9 | 1.1 | 31.5 | 4.9 | 8.7 | 1.2 | 3.0 | 0.9 | 
| CLMPT | 4.7 | 0.8 | 0.1 | 12.0 | 23.0 | 9.7 | 2.1 | 2.7 | 2.2 | 0.1 | 31.0 | 1.2 | 2.1 | 0.1 | 1.0 | 0.2 |
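
One way to read the first results table above (the one in which QTO scores 10.1 on 3i) is to check which model ranks first for each query type. The sketch below does this for the first five columns, with the values copied from that table. The `best_models` helper is illustrative and not part of this repo.

```python
from typing import Dict, List, Sequence

QUERY_TYPES = ["1p", "2p", "3p", "2i", "3i"]

# MRR values copied from the first five columns of the first results table.
MRR_TABLE: Dict[str, List[float]] = {
    "GNN-QE":     [42.8, 5.2, 4.0, 6.0, 8.8],
    "ULTRAQ":     [40.6, 4.5, 3.5, 5.2, 7.2],
    "CQD":        [46.7, 4.4, 2.4, 11.3, 12.8],
    "CQD-Hybrid": [46.7, 4.8, 3.1, 6.0, 8.6],
    "ConE":       [41.8, 4.6, 3.9, 9.1, 10.3],
    "QTO":        [46.7, 4.9, 3.7, 8.7, 10.1],
    "CLMPT":      [45.3, 5.3, 4.7, 10.2, 12.2],
}


def best_models(table: Dict[str, List[float]],
                query_types: Sequence[str]) -> Dict[str, List[str]]:
    """Return, for each query type, the model(s) with the highest MRR (ties kept)."""
    best = {}
    for i, query_type in enumerate(query_types):
        top = max(scores[i] for scores in table.values())
        best[query_type] = sorted(m for m, scores in table.items() if scores[i] == top)
    return best


BEST = best_models(MRR_TABLE, QUERY_TYPES)
for query_type, models in BEST.items():
    print(query_type, ", ".join(models))
```

On these five columns the top spot is shared by CQD, CQD-Hybrid and QTO on 1p, goes to CLMPT on 2p and 3p, and to CQD on 2i and 3i, which is consistent with the absence of a single best method.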