Question Answering Track
RIRES: Russian Information Retrieval Evaluation Seminar





This track is dedicated to retrieving answers to well-formed natural-language questions.

Test Collection

The source dataset is the test collection.
Documents from all of the narod.* and narod_training.* archives must be searched.

Task Description for Participating Systems

Each participant is granted access to the collection and a set of queries.

The queries used in the evaluation are selected randomly from the set of Russian-language questions proposed by the participants and the organizers. The following types of questions are accepted:

  • Questions about the attribute or the subject:
    • What is ...? (What is anaphoresis?)
    • Who is ...? (Who is Nabokov?)
    • Who did ...? (Who invented the bicycle?)
    • What/which ...? (Which country won the soccer championship?)
  • Questions about the direct object:
    • What did ... do? (What did Edison invent?)
  • Questions about the adverbial modifier:
    • How many/much ...? (How many people live in Moscow?)
    • What is the size/length/height/area of ...?
    • When? What day? What month? What year? How long?
      (What year did the house burn down?)
    • Where to? To which country/city? To which continent?
      (Where was the cargo sent to on May 18th?)
    • Where from? From which country/city? (From which country did the cargo come?)
    • Where? In which country/city? On which continent?
      (In which city is the Eiffel tower located?)
    • Why? (Why was the alarm activated?)
    • How? (How do you remove a stain from a carpet?)
  • Questions about the indirect object:
    • Preposition + <what> (Of what does water consist?)
    • What/which + <word with known semantics>?
    • What/which + <word with unknown semantics>?

Participants receive the tasks for a very short period (one day).

The expected result is an ordered list of no more than 10 "answers" for each query. Each answer must be accompanied by the URL of the document where the answer was found and a plain-text snippet from that document, no longer than 300 characters, that contains the answer.
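The constraints above can be checked mechanically before a run is submitted. The following is a minimal sketch in Python, not an official validator: the Answer class and the validate_answers function are hypothetical names introduced for illustration, and the actual submission format is not defined in this description. Only the two limits (10 answers, 300 characters) come from the text.

```python
from dataclasses import dataclass

MAX_ANSWERS = 10        # limit stated in the task description
MAX_SNIPPET_LEN = 300   # characters, limit stated in the task description


@dataclass
class Answer:
    """One ranked answer (hypothetical structure, not the official format)."""
    url: str       # URL of the document where the answer was found
    snippet: str   # plain-text fragment of that document containing the answer


def validate_answers(answers):
    """Return a list of problems found in one query's ranked answer list."""
    problems = []
    if len(answers) > MAX_ANSWERS:
        problems.append(f"too many answers: {len(answers)} > {MAX_ANSWERS}")
    for rank, answer in enumerate(answers, start=1):
        if not answer.url:
            problems.append(f"rank {rank}: missing URL")
        if not answer.snippet:
            problems.append(f"rank {rank}: empty snippet")
        elif len(answer.snippet) > MAX_SNIPPET_LEN:
            problems.append(
                f"rank {rank}: snippet has {len(answer.snippet)} characters"
            )
    return problems
```

An empty list returned by validate_answers means the answer list satisfies both limits; it says nothing about whether the snippet really contains the answer, which is left to the assessors.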

Task Collection

The task collection is built in four stages according to the following schedule:

  • May 23rd - each participant proposes their definition of "correct" questions with 5-10 examples
  • May 27th - final definition; the overall list of questions is formed
  • June 10th - each participant sends 200 questions to the organizers. From each participant's set, 50 questions are selected, so that the same number of questions is accepted from each participant.
  • June 15th - final query set (total of 500 questions)
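The equal-quota step in the schedule can be illustrated with a short Python sketch. It assumes that the 50 questions per participant are drawn at random; the description says only that the same number is accepted from each participant, so the sampling method, the select_questions name, and the seed parameter are all assumptions made for illustration.

```python
import random


def select_questions(proposed, per_participant=50, seed=0):
    """Pick the same number of questions from every participant.

    proposed: dict mapping a participant name to the list of questions
    that participant submitted. Random sampling is an assumption here;
    the organizers' actual filtering procedure is not specified.
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    selected = []
    for participant in sorted(proposed):
        questions = proposed[participant]
        if len(questions) < per_participant:
            raise ValueError(
                f"{participant} submitted only {len(questions)} questions"
            )
        selected.extend(rng.sample(questions, per_participant))
    return selected
```

With 50 questions per contributor, ten contributing groups would yield the stated total of 500; the number of groups is inferred from those two figures and is not given in the text.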

Evaluation Methodology

  • number of questions: 500
  • instructions for assessors:
    The assessor looks through the snippets with answers and the documents where they were found and tries to answer the following questions:
    • Does the snippet contain an answer to the question?
    • Having seen only the snippet, do you think it is likely that the document contains an answer to the question?
    • Does the document contain an answer to the question?
    The assessor also formulates a "correct" answer ("key criterion").
  • evaluation method: pooling (pool depth is 50)
  • relevance scale:
    • snippet contains an answer
    • document probably contains an answer
    • document contains an answer
    • no answer
    • impossible to evaluate
  • official metrics:
    • precision
    • recall

Data Formats
