Date: September 19, 2019, 9 am to 5 pm

Location: MIT, Cambridge MA (Room TBD)



QA Challenge

This year at KR2ML, we are organizing a shared task inspired by the challenges encountered when working with real-world enterprise data. End-users interact with AI systems via natural language interfaces (customer support, chatbots, enterprise search, etc.). These users are typically unaware of the internal knowledge representation employed by the AI system and often lack the expertise to issue structured queries against the underlying knowledge base.

To this end, we propose a task: automatically translate a natural language query into a structured representation (SPARQL) that can be used to query the underlying knowledge base. We will build upon the series of QALD challenges and release a training set of natural language questions paired with their corresponding SPARQL queries. The questions will be a subset of previous QALD challenges, chosen to be representative of issues commonly encountered in real-world applications (noise, ambiguous language, errors in the knowledge base, etc.).

The underlying knowledge base will be DBpedia (2016-10 version), with Wikipedia as an optional text corpus. Participants may use information from both the structured knowledge base and the unstructured text corpus to understand the natural language query and output the structured representation. Many existing systems for translating natural language queries into structured representations rely on hand-crafted rules or templates. As a result, such systems are brittle and fail to perform satisfactorily when faced with even slight variations of a query. To test how sensitive the competing algorithms are to such changes, we generated multiple paraphrases/variants of the same question.

Examples

Q1: Give me the currency of China.
Q1a: What is the name of currency used in China?

Q2: When did Latvia join the EU?
Q2a: When did Latvia become part of the EU?

Training Data

The training data is available for download now.

The file contains 5 tab-separated columns:

ID: question ID
Original Question: original text of the question
Paraphrasing 1: paraphrase of the original question by annotator 1
Paraphrasing 2: paraphrase of the original question by annotator 2
SPARQL Query: SPARQL query for the original question
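As a rough illustration, a file in this format can be read with a standard TSV parser. The sketch below makes several assumptions not stated above: the sample row is invented (including its second paraphrase and SPARQL query), and the released file may or may not include a header row.

```python
import csv
import io

# Column names as described above.
COLUMNS = ["ID", "Original Question", "Paraphrasing 1", "Paraphrasing 2", "SPARQL Query"]

def parse_training_file(text):
    """Parse the 5-column tab-separated training data into a list of dicts."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
        rows.append(dict(zip(COLUMNS, row)))
    return rows

# Illustrative line only; the SPARQL query is a guess, not official data.
sample = ("1\tGive me the currency of China.\t"
          "What is the name of currency used in China?\t"
          "Which currency is used in China?\t"
          "SELECT ?c WHERE { dbr:China dbo:currency ?c }")
rows = parse_training_file(sample)
print(rows[0]["Original Question"])
```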

Test Data & Evaluation

The test data is available for download now.

The test file is a tab-separated file with the following columns:

ID: question ID
Question: a natural language question

You will be required to produce the SPARQL query that can be used to answer each question against the DBpedia version specified above.

You are required to submit the output produced by your system as a single tab-separated file with three columns:

ID: question ID
Question: a natural language question
Output SPARQL: SPARQL query as produced by your system

Note that each line corresponds to one natural language question. Please make sure that there are no line breaks within a SPARQL query.
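A minimal sanity check for this format is sketched below. This is only an illustration, not the organizers' validation procedure, and the sample SPARQL string is invented.

```python
def validate_submission(text):
    """Return a list of formatting problems: every non-empty line must have
    exactly three tab-separated fields (ID, Question, Output SPARQL).
    Because records are split on newlines, a SPARQL query containing a
    line break would also surface here as a malformed line."""
    errors = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            errors.append(f"line {lineno}: expected 3 columns, found {len(fields)}")
    return errors

# Illustrative rows (the SPARQL text is a guess, not official data):
good = "1\tGive me the currency of China.\tSELECT ?c WHERE { dbr:China dbo:currency ?c }"
bad = "2\tWhen did Latvia join the EU?"  # missing the Output SPARQL column
print(validate_submission(good))  # []
print(validate_submission(bad))
```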

The systems will be evaluated based on the accuracy of the answers retrieved via the generated structured representations. Participants should report their train and test accuracies in their presentations/posters during the workshop.
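One simple way to compute such an accuracy is exact match over answer sets, as sketched below. The official scoring details are not specified here, so treat both the metric and the sample answers as assumptions.

```python
def answer_accuracy(gold_answers, predicted_answers):
    """Fraction of questions whose predicted answer set exactly matches the
    gold answer set (one plausible metric; the official one may differ,
    e.g. precision/recall or F1 over partial matches)."""
    assert len(gold_answers) == len(predicted_answers)
    correct = sum(1 for g, p in zip(gold_answers, predicted_answers)
                  if set(g) == set(p))
    return correct / len(gold_answers)

# Toy example: the second prediction has a wrong year, so accuracy is 1/2.
gold = [["Renminbi"], ["2004-05-01"]]
pred = [["Renminbi"], ["2003-05-01"]]
print(answer_accuracy(gold, pred))  # 0.5
```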

Technical Report

You are also required to submit a brief description of your approach in the form of a technical report/arXiv paper. The paper should provide sufficient detail for readers to understand and replicate your approach.

Deadline

The deadline for submission of results is September 10, 2019.

The results will be announced during the challenge session in the workshop.

Contact

Please email your submissions, and direct any questions to sumitbhatia@in.ibm.com.