UPDATE: Proceedings are now online at https://ceur-ws.org/Vol-3592/
Task Description
The importance of how typical Web users can access the body of knowledge on the Web grows with the amount of structured data published thereon. Over the past years, there has been an increasing amount of research on interaction paradigms that allow end users to profit from the expressive power of Semantic Web standards while at the same time hiding their complexity behind an intuitive and easy-to-use interface.
However, no natural language interface allows one to access scholarly data such as papers, authors, institutions, models, or datasets. The key challenge is to translate the users’ information needs into a form such that they can be evaluated using standard Semantic Web query processing and inference techniques. Such interfaces would allow users to express arbitrarily complex information needs in an intuitive fashion and, at least in principle, in their own words. The intent of this challenge is thus of great importance and interest to Semantic Web scholars.
By taking part in the challenge's sub-tasks, participants can explore a versatile range of academic and industrial approaches and applications through the multi-faceted workshop format. We will try to bridge the gap between academia and industry and to attract junior as well as senior researchers from both worlds, which should make for a memorable experience. We aim to publish the system descriptions as CEUR-WS proceedings, as in past years.
Thus, in its first iteration at ISWC 2023, the challenge has two independent tasks:
Task 1: DBLP-QUAD — Knowledge Graph Question Answering over DBLP: For this task, participants will use the DBLP-QuAD dataset, which consists of 10,000 question-SPARQL pairs answerable over the DBLP Knowledge Graph. The task is hosted on https://codalab.lisn.upsaclay.fr/competitions/14264.
Task 2: SciQA — Question Answering of Scholarly Knowledge: This new task, introduced this year, uses the scholarly data source ORKG as the target repository for answering comparative questions. The task is hosted on https://codalab.lisn.upsaclay.fr/competitions/14759.
Dataset
Task 1: DBLP-QUAD — Knowledge Graph Question Answering over DBLP
For this task, participants will use the DBLP-QuAD dataset (https://doi.org/10.5281/zenodo.7643971; see also https://huggingface.co/datasets/awalesushil/DBLP-QuAD), which consists of 10,000 question-SPARQL pairs answerable over the DBLP Knowledge Graph (https://blog.dblp.org/2022/03/02/dblp-in-rdf/ and https://zenodo.org/record/7638511). A live SPARQL endpoint for the DBLP KG is available at https://dblp-kg.ltdemos.informatik.uni-hamburg.de/sparql.
DBLP is a well-known repository for computer science bibliography and has recently released an RDF dump, which allows users to query it as a knowledge graph. The first subtask is to fetch the right answer from the DBLP KG given the question. The second subtask is entity linking (EL) on the same dataset. The DBLP-QuAD dataset was created using the OVERNIGHT approach: logical forms are first generated from a KG, and canonical questions are then generated from these logical forms. An example entry from the dataset:
{
  "id": "Q0001",
  "query_type": "SINGLE_FACT",
  "question": {
    "string": "Show the Wikidata ID of the person Robert Schober."
  },
  "paraphrased_question": {
    "string": "What is the Wikidata identifier of the author Robert S.?"
  },
  "query": {
    "sparql": "SELECT DISTINCT ?answer WHERE { <https://dblp.org/pid/95/2265> <https://dblp.org/rdf/schema#wikidata> ?answer }"
  },
  "template_id": "TC04",
  "entities": [
    "<https://dblp.org/pid/95/2265>"
  ],
  "relations": [
    "<https://dblp.org/rdf/schema#wikidata>"
  ],
  "temporal": false,
  "held_out": false
}
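As a rough illustration of how such an entry might be consumed, the following Python sketch parses a trimmed copy of the record, extracts the gold entities for the entity linking subtask, and prepares a query request for the endpoint listed above. It assumes the endpoint accepts standard SPARQL 1.1 Protocol GET requests with a `query` parameter; the helper names `strip_brackets` and `build_request` are hypothetical and not part of any challenge tooling. The request is only built, not sent.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.request import Request

# Endpoint quoted from the task description.
DBLP_ENDPOINT = "https://dblp-kg.ltdemos.informatik.uni-hamburg.de/sparql"

# Trimmed copy of the example entry, with the query on a single line.
RECORD_JSON = """
{
  "id": "Q0001",
  "question": {"string": "Show the Wikidata ID of the person Robert Schober."},
  "query": {"sparql": "SELECT DISTINCT ?answer WHERE { <https://dblp.org/pid/95/2265> <https://dblp.org/rdf/schema#wikidata> ?answer }"},
  "entities": ["<https://dblp.org/pid/95/2265>"],
  "relations": ["<https://dblp.org/rdf/schema#wikidata>"]
}
"""


def strip_brackets(iri: str) -> str:
    """Turn '<https://...>' into 'https://...' (hypothetical helper)."""
    return iri[1:-1] if iri.startswith("<") and iri.endswith(">") else iri


def build_request(endpoint: str, sparql: str) -> Request:
    """Build, but do not send, a GET request carrying the query.

    Assumption: the endpoint follows the SPARQL 1.1 Protocol and can
    return results as application/sparql-results+json.
    """
    url = endpoint + "?" + urlencode({"query": sparql})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


record = json.loads(RECORD_JSON)
gold_entities = [strip_brackets(e) for e in record["entities"]]
request = build_request(DBLP_ENDPOINT, record["query"]["sparql"])

print(gold_entities)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would then execute the query, provided the endpoint is reachable and behaves as assumed.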
Task 2: SciQA — Question Answering of Scholarly Knowledge
This new task, introduced this year, uses the scholarly data source ORKG as the target repository for answering comparative questions. KGQA benchmarks and systems have so far been geared mainly towards encyclopedic knowledge graphs such as DBpedia and Wikidata.
This task uses SciQA, a novel QA benchmark for scholarly knowledge (https://zenodo.org/record/7744048; see also https://huggingface.co/datasets/orkg/SciQA). A live SPARQL endpoint is available at https://orkg.org/SciQA. The benchmark leverages the Open Research Knowledge Graph (ORKG), which includes over 100,000 resources describing complex research contributions. The dataset contains 1,795 training, 257 validation, and 513 test questions. An example entry from the dataset:
{
  "id": "AQ2193",
  "query_type": "Factoid",
  "question": {
    "string": "Can you provide links to code used in papers that benchmark the Transformer-XL Base model?"
  },
  "paraphrased_question": [],
  "query": {
    "sparql": "SELECT DISTINCT ?code WHERE { ?model a orkgc:Model; rdfs:label ?model_lbl. FILTER (str(?model_lbl) = \"Transformer-XL Base\") ?benchmark orkgp:HAS_DATASET ?dataset. ?cont orkgp:HAS_BENCHMARK ?benchmark. ?cont orkgp:HAS_MODEL ?model; orkgp:HAS_SOURCE_CODE ?code. }"
  },
  "template_id": "T07",
  "auto_generated": true,
  "query_shape": "Tree",
  "query_class": "WHICH-WHAT",
  "number_of_patterns": 4
}
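The SciQA query above uses the prefixes `orkgc:`, `orkgp:`, and `rdfs:` without declaring them. If the endpoint or a local triple store does not predefine these prefixes, the declarations have to be added before the query can run. The Python sketch below shows one way to do this. The ORKG namespace IRIs in `ASSUMED_PREFIXES` are assumptions that should be checked against the ORKG documentation, and `used_prefixes` and `add_prefixes` are hypothetical helper names. The sketch also assumes the query contains no `PREFIX` declarations of its own.

```python
import re

# Assumed namespace IRIs; verify the ORKG ones before relying on them.
ASSUMED_PREFIXES = {
    "orkgc": "http://orkg.org/orkg/class/",
    "orkgp": "http://orkg.org/orkg/predicate/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# The SPARQL query from the example entry.
QUERY = (
    "SELECT DISTINCT ?code WHERE { ?model a orkgc:Model; "
    "rdfs:label ?model_lbl. "
    'FILTER (str(?model_lbl) = "Transformer-XL Base") '
    "?benchmark orkgp:HAS_DATASET ?dataset. "
    "?cont orkgp:HAS_BENCHMARK ?benchmark. "
    "?cont orkgp:HAS_MODEL ?model; "
    "orkgp:HAS_SOURCE_CODE ?code. }"
)


def used_prefixes(sparql: str) -> list:
    """Return the sorted prefix labels used as 'label:local' in the query."""
    # Blank out string literals and <...> IRIs so colons inside them are ignored.
    stripped = re.sub(r'"[^"]*"|<[^>]*>', " ", sparql)
    return sorted(set(re.findall(r"\b([A-Za-z][\w-]*):[A-Za-z_]", stripped)))


def add_prefixes(sparql: str, prefixes: dict = ASSUMED_PREFIXES) -> str:
    """Prepend a PREFIX declaration for every prefix the query uses."""
    needed = used_prefixes(sparql)
    missing = [p for p in needed if p not in prefixes]
    if missing:
        raise KeyError(f"no namespace known for prefixes: {missing}")
    header = "\n".join(f"PREFIX {p}: <{prefixes[p]}>" for p in needed)
    return header + "\n" + sparql


full_query = add_prefixes(QUERY)
print(full_query)
```

Raising an error on unknown prefixes, rather than silently skipping them, makes it easier to spot queries that rely on namespaces this table does not cover.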
Evaluation
Leaderboard
Here are the standings of the final submissions:
DBLP-QuAD Entity Linking
Team | F1 |
---|---|
nsteinmetz | 0.8353 |
ruijie.wang | 0.7961 |
hannabiakl-dsti | 0.7100 |
Shreyaar12 | 0.6235 |
DBLP-QuAD KGQA
Team | F1 |
---|---|
ruijie.wang | 0.8488 |
longquanj | 0.6619 |
Shreyaar12 | 0.2175 |
SciQA KGQA
Team | F1 |
---|---|
longquanj | 0.9919 |
tilahun | 0.9904 |
zeio | 0.9358 |
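The leaderboards report F1 scores, but this page does not spell out how they are computed. As a rough guide only, the Python sketch below shows one common convention for KGQA and entity linking: precision, recall, and F1 are computed per question over sets of answers (or linked entities) and then averaged over all questions. The averaging scheme, the handling of empty answer sets, and the question IDs and answers in the example are all assumptions for illustration, not the challenge's official evaluation script.

```python
def precision_recall_f1(gold: set, predicted: set) -> tuple:
    """Set-based precision, recall, and F1 for a single question."""
    if not gold and not predicted:
        # Assumed convention: an empty prediction for an empty gold set is correct.
        return 1.0, 1.0, 1.0
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_f1(gold_by_id: dict, predicted_by_id: dict) -> float:
    """Average the per-question F1 over every question in the gold data.

    Questions without a prediction count as an empty prediction.
    """
    scores = [
        precision_recall_f1(set(gold), set(predicted_by_id.get(qid, ())))[2]
        for qid, gold in gold_by_id.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical gold answers and system output for two questions.
gold_answers = {"Q1": {"a"}, "Q2": {"b", "c"}}
system_answers = {"Q1": {"a"}, "Q2": {"b", "d"}}

score = macro_f1(gold_answers, system_answers)
print(score)  # 0.75: F1 is 1.0 for Q1 and 0.5 for Q2
```

The same function applies to the entity linking subtask if the sets hold entity IRIs instead of answers.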
Important Dates
The tentative timeline for the Scholarly QALD Challenge is as follows. All deadlines are 23:59 AoE (anywhere on earth).
Date | Description |
---|---|
2023-03-16 | Website & first call for participation (https://kgqa.github.io/scholarly-QALD-challenge/2023/) |
2023-10-06 | Submission of system results |
2023-10-20 | Submission of papers |
2023-10-27 | Notification of acceptance |
2023-11-10 | Camera-ready submission |
6th to 10th November 2023 | ISWC Conference, visit iswc2023.semanticweb.org |