Scholarly QALD at ISWC 2023
Subtasks for Question Answering over Scholarly Knowledge Graphs

UPDATE: Proceedings are now online at https://ceur-ws.org/Vol-3592/

Task Description

As the amount of structured data published on the Web grows, so does the importance of enabling typical Web users to access this body of knowledge. Over the past years, there has been an increasing amount of research on interaction paradigms that allow end users to profit from the expressive power of Semantic Web standards while hiding their complexity behind an intuitive and easy-to-use interface.

However, natural language interfaces that give access to scholarly data such as papers, authors, institutions, models, or datasets are still largely missing. The key challenge is to translate users' information needs into a form that can be evaluated using standard Semantic Web query processing and inference techniques. Such interfaces would allow users to express arbitrarily complex information needs in an intuitive fashion and, at least in principle, in their own words. The intent of this challenge is thus of great importance and interest to Semantic Web scholars.

By taking part in the challenge's subtasks, participants can explore and experience a versatile range of academic and industrial approaches and applications through the multi-faceted workshop format. We aim to bridge the gap between academia and industry and to attract junior as well as senior researchers from both worlds, leading to a memorable experience. We plan to publish the system descriptions as CEUR-WS proceedings, as in past years.


In its first iteration at ISWC 2023, the challenge thus consists of two independent tasks:

Task 1: DBLP-QuAD — Knowledge Graph Question Answering over DBLP: For this task, participants will use the DBLP-QuAD dataset, which consists of 10,000 question-SPARQL pairs answerable over the DBLP Knowledge Graph. The task is hosted on https://codalab.lisn.upsaclay.fr/competitions/14264.

Task 2: SciQA — Question Answering of Scholarly Knowledge: This task, newly introduced this year, uses the scholarly knowledge graph ORKG as the target repository for answering comparative questions. The task is hosted on https://codalab.lisn.upsaclay.fr/competitions/14759.

Dataset

Task 1: DBLP-QuAD — Knowledge Graph Question Answering over DBLP

For this task, participants will use the DBLP-QuAD dataset (https://doi.org/10.5281/zenodo.7643971), see also https://huggingface.co/datasets/awalesushil/DBLP-QuAD, which consists of 10,000 question-SPARQL pairs answerable over the DBLP Knowledge Graph (https://blog.dblp.org/2022/03/02/dblp-in-rdf/, https://zenodo.org/record/7638511). A live SPARQL endpoint for the DBLP KG is available at https://dblp-kg.ltdemos.informatik.uni-hamburg.de/sparql.

DBLP is a well-known computer science bibliography that has recently released an RDF dump, which allows users to query it as a knowledge graph. The first subtask is to retrieve the correct answer from the DBLP KG for a given question. The second subtask is entity linking (EL) on the same dataset. The DBLP-QuAD dataset was created using the OVERNIGHT approach: logical forms are first generated from the KG, and canonical questions are then generated from these logical forms. An example record looks as follows:


{
    "id": "Q0001",
    "query_type": "SINGLE_FACT",
    "question": {
        "string": "Show the Wikidata ID of the person Robert Schober."
    },
    "paraphrased_question": {
        "string": "What is the Wikidata identifier of the author Robert S.?"
    },
    "query": {
        "sparql": "SELECT DISTINCT ?answer WHERE { 
	         <https://dblp.org/pid/95/2265> <https://dblp.org/rdf/schema#wikidata> ?answer 
	          }"
    },
    "template_id": "TC04",
    "entities": [
        "<https://dblp.org/pid/95/2265>"
    ],
    "relations": [
        "<https://dblp.org/rdf/schema#wikidata>"
    ],
    "temporal": false,
    "held_out": false
} 
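
To make the record above concrete, the following sketch (Python, not part of the official challenge tooling) sends the gold SPARQL query of Q0001 to the live DBLP KG endpoint listed above. It assumes the endpoint accepts GET requests with a query parameter and returns results in the standard SPARQL JSON format.

# Illustrative only: execute the gold query for Q0001 against the live DBLP KG
# endpoint and collect the bindings of ?answer. Assumes the endpoint supports
# GET requests and the SPARQL 1.1 JSON results format.
import requests

ENDPOINT = "https://dblp-kg.ltdemos.informatik.uni-hamburg.de/sparql"

QUERY = """
SELECT DISTINCT ?answer WHERE {
  <https://dblp.org/pid/95/2265> <https://dblp.org/rdf/schema#wikidata> ?answer
}
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
response.raise_for_status()

# Each binding maps variable names to {"type": ..., "value": ...} dictionaries.
answers = {b["answer"]["value"] for b in response.json()["results"]["bindings"]}
print(answers)  # expected: the Wikidata IRI of Robert Schober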
            

Task 2: SciQA — Question Answering of Scholarly Knowledge

This task, newly introduced this year, uses the scholarly knowledge graph ORKG as the target repository for answering comparative questions. So far, KGQA benchmarks and systems have mainly been geared towards encyclopedic knowledge graphs such as DBpedia and Wikidata.

In this task, we will use a novel QA benchmark for scholarly knowledge, SciQA (https://zenodo.org/record/7744048), see also https://huggingface.co/datasets/orkg/SciQA. A live SPARQL endpoint is available at https://orkg.org/SciQA. The benchmark builds on the Open Research Knowledge Graph (ORKG), which includes over 100,000 resources describing complex research contributions. The dataset contains 1,795 training, 257 validation, and 513 test questions. An example record looks as follows:


{
    "id": "AQ2193",
    "query_type": "Factoid",
    "question": {
        "string": "Can you provide links to code used in papers that benchmark the 
	           Transformer-XL Base model?"
    },
    "paraphrased_question": [],
    "query": {
        "sparql": "SELECT DISTINCT ?code WHERE { ?model a orkgc:Model;
	                                                rdfs:label ?model_lbl.  
					 FILTER (str(?model_lbl) = "Transformer-XL Base")  
					 ?benchmark      orkgp:HAS_DATASET        ?dataset.  
					 ?cont           orkgp:HAS_BENCHMARK      ?benchmark.  
					 ?cont           orkgp:HAS_MODEL          ?model;                  
					 orkgp:HAS_SOURCE_CODE    ?code.
		  }"
    },
    "template_id": "T07",
    "auto_generated": true,
    "query_shape": "Tree",
    "query_class": "WHICH-WHAT",
    "number_of_patterns": 4
 }
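
As a quick way to explore the benchmark, the sketch below loads SciQA via the Hugging Face datasets library. The dataset name comes from the link above; the split names and the nested question fields are assumed to match the record shown here and may need adapting to the actual column layout.

# Illustrative only: load SciQA from the Hugging Face Hub and inspect a record.
# Assumes train/validation/test splits and fields matching the example above.
from datasets import load_dataset

sciqa = load_dataset("orkg/SciQA")

# Expected sizes: 1,795 train, 257 validation, and 513 test questions.
for split_name, split in sciqa.items():
    print(split_name, len(split))

sample = sciqa["train"][0]

# The question may be a nested dict (as in the record above) or a flattened
# string, depending on how the loader exposes the columns.
question = sample["question"]
print(question["string"] if isinstance(question, dict) else question)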
        

Evaluation

For both tasks, we aim to evaluate the participants' approaches using CodaLab: participants upload their solutions to CodaLab and an automated evaluation takes place on the platform. For Task 1 (DBLP-QuAD), please visit https://codalab.lisn.upsaclay.fr/competitions/14264; for Task 2 (SciQA), please visit https://codalab.lisn.upsaclay.fr/competitions/14759. Participants are also required to upload a paper of up to 8 pages (excluding references) in the CEUR-WS format on EasyChair by the date mentioned below. Further details regarding evaluation, such as the procedure for uploading system results to CodaLab and the evaluation scripts, will be made available shortly.
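
The leaderboards below report a single F1 score per submission. As a rough illustration only (the official scoring scripts will be released with the CodaLab competitions), a macro-averaged F1 over per-question answer sets can be computed along these lines:

# Illustrative only: macro-averaged F1 over per-question answer sets.
# This sketches the kind of metric shown on the leaderboards; it is an
# assumption, not the official evaluation script.
def answer_f1(gold: set, predicted: set) -> float:
    if not gold and not predicted:
        return 1.0  # both empty: count as a perfect match
    overlap = len(gold & predicted)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold_sets: list[set], predicted_sets: list[set]) -> float:
    scores = [answer_f1(g, p) for g, p in zip(gold_sets, predicted_sets)]
    return sum(scores) / len(scores)

# Toy example: two questions with partially correct predictions.
print(macro_f1([{"a", "b"}, {"c"}], [{"a"}, {"c", "d"}]))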

Leaderboard

Here are the standings of the final submissions:


DBLP-QuAD Entity Linking

Team              F1
nsteinmetz        0.8353
ruijie.wang       0.7961
hannabiakl-dsti   0.7100
Shreyaar12        0.6235



DBLP-QuAD KGQA

Team          F1
ruijie.wang   0.8488
longquanj     0.6619
Shreyaar12    0.2175



SciQA KGQA

Team        F1
longquanj   0.9919
tilahun     0.9904
zeio        0.9358

Important Dates

The timeline for the Scholarly QALD Challenge is as follows. All deadlines are 23:59 AoE (Anywhere on Earth).
Date                       Description
2023-03-16                 Website & first call for participation (https://kgqa.github.io/scholarly-QALD-challenge/2023/)
2023-10-06                 Submission of System Results
2023-10-20                 Submission of Papers (extended from 2023-10-13)
2023-10-27                 Notification of Acceptance
2023-11-10                 Camera-ready Submission (extended from 2023-11-03)
2023-11-06 to 2023-11-10   ISWC Conference (iswc2023.semanticweb.org)