Legal Documents Collection 2004
RIRES: Russian Information Retrieval Evaluation Seminar

This collection consists of Russian Federation legislative documents and is provided by Kodeks. It contains HTML documents and, unlike the Web collections, is much more uniform.

Dataset Parameters
  • Size of HTML data: 1.6 GB
  • Number of pages: 61 000
  • Encoding: cp1251
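
Because the data is encoded in cp1251 (Windows-1251), it has to be decoded explicitly before processing. The following is a minimal sketch in Python, assuming the raw bytes have already been read from a file; the function name decode_cp1251 and the sample string are illustrative and not part of the collection.

```python
def decode_cp1251(raw: bytes) -> str:
    """Decode raw bytes as cp1251, substituting U+FFFD for undecodable bytes."""
    # errors="replace" is an assumption: it keeps processing going if a
    # stray byte is encountered instead of raising UnicodeDecodeError.
    return raw.decode("cp1251", errors="replace")


# Illustrative sample only: a Russian phrase encoded the way the data is stored.
sample = "Федеральный закон".encode("cp1251")
print(decode_cp1251(sample))
```

Using errors="strict" instead would surface damaged files immediately, which may be preferable when validating a fresh download.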
Rights to Use

The rights to use the collection are granted to ROMIP by Kodeks, which owns the collection. To get access to the collection, you must sign the usage agreement (in Russian).

Data Format

The collection is distributed as XML files in a specific format. These files are split into two groups: legal.* and legal_training.*. Files in the second group contain the documents that were used as the training set in the legal documents classification track.
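
The split into the two file groups can be reproduced from the file names alone. The sketch below is in Python and assumes only the legal.* and legal_training.* prefixes described above; the function name group_collection_files and the example file names are hypothetical, and the internal XML format is not modeled here.

```python
def group_collection_files(names):
    """Split file names into (main, training) lists by their name prefix."""
    main, training = [], []
    for name in sorted(names):
        if name.startswith("legal_training."):
            training.append(name)   # training set for the classification track
        elif name.startswith("legal."):
            main.append(name)       # main part of the collection
        # Any other file name is ignored.
    return main, training


# Hypothetical file names, used only to illustrate the grouping.
example = ["legal.2.xml", "legal_training.1.xml", "legal.1.xml", "README"]
main_files, training_files = group_collection_files(example)
print(main_files)
print(training_files)
```

Keeping the training files separate makes it easy to exclude them when the collection is used for tracks other than classification.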

Tracks in Which the Collection Was Used
  • Ad hoc search in a collection of legal documents
  • Ad hoc search in a mixed collection
  • Similar documents search
  • Classification of legal documents
  • Query-biased summarization