Legal Documents Collection 2007
RIRES: Russian Information Retrieval Evaluation Seminar

This collection was created and provided by Kodeks in 2007.

It consists of legislative documents of the Russian Federation, Moscow and St. Petersburg, as of the second week of December 2006. The collection consists of HTML documents and, unlike the Web collections, is much more uniform.


  • The title of each document is placed in the title field of its HTML content.
  • Documents are formatted with styles, which are not included in the collection.
  • Hx tags (H1, H2, ...) are not used in the document text.
    (To detect headers, look for P tags whose class attribute is "headertext".)
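These conventions make it possible to recover a document's title and section headers without relying on Hx tags. Below is a minimal sketch using Python's standard html.parser module. It assumes that the "title field" is the HTML <title> element and that headers are written as <p class="headertext">...</p>. A header paragraph left unclosed is ended by the next P tag or by the end of input. The names LegalDocParser and extract_structure are hypothetical and are not part of any ROMIP or Kodeks tooling.

```python
from html.parser import HTMLParser


class LegalDocParser(HTMLParser):
    """Collect the <title> text and the text of <p class="headertext"> paragraphs.

    Hypothetical helper: assumes the title field is the HTML <title> element.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive already decoded
        self.title = ""
        self.headers = []
        self._in_title = False
        self._in_header = False
        self._buf = []

    @staticmethod
    def _clean(parts):
        # Join text fragments and collapse runs of whitespace into single spaces.
        return " ".join("".join(parts).split())

    def _finish_header(self):
        # Store the header currently being collected, if any.
        if self._in_header:
            self.headers.append(self._clean(self._buf))
            self._in_header = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self._buf = []
        elif tag == "p":
            # A new <p> implicitly ends a header paragraph that was left unclosed.
            self._finish_header()
            classes = (dict(attrs).get("class") or "").split()
            if "headertext" in classes:
                self._in_header = True
                self._buf = []

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self.title = self._clean(self._buf)
            self._in_title = False
        elif tag == "p":
            self._finish_header()

    def handle_data(self, data):
        if self._in_title or self._in_header:
            self._buf.append(data)

    def close(self):
        super().close()  # flush any text still buffered by the parser
        self._finish_header()  # then store a header still open at end of input


def extract_structure(html_text):
    """Return (title, [header texts]) for one HTML document."""
    parser = LegalDocParser()
    parser.feed(html_text)
    parser.close()
    return parser.title, parser.headers
```

Calling extract_structure on a decoded document returns its title together with its header texts in document order. Paragraphs without the "headertext" class are ignored.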

Dataset Parameters
  • Size of HTML data (bz2 archives): 1.6 GB
  • Number of pages: 300,000
  • Encoding: cp1251
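Given these parameters, the first processing step is usually to decompress the archives and decode the cp1251 text. Below is a minimal sketch using Python's standard bz2 module. It assumes that each archive is a single bz2 stream of cp1251-encoded text. The function names are hypothetical. If the XML files inside declare their own encoding, it may be safer to pass the raw decompressed bytes to an XML parser instead of decoding them yourself.

```python
import bz2


def decode_cp1251_bz2(data):
    """Decompress bz2-compressed bytes and decode them from cp1251 to str."""
    return bz2.decompress(data).decode("cp1251")


def read_cp1251_bz2(path):
    """Read a bz2 archive from disk and return its contents decoded from cp1251.

    Text mode ("rt") decompresses and decodes in one step; it also
    normalizes line endings to "\n".
    """
    with bz2.open(path, "rt", encoding="cp1251") as f:
        return f.read()
```

For the full 1.6 GB of data, iterating over the file object returned by bz2.open line by line avoids holding a whole archive in memory.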
Rights to Use

Kodeks, the owner of the collection, has granted ROMIP the right to use it. To get access to the collection, you must sign the usage agreement (in Russian).

Data Format

The collection is distributed as XML files in a specific format.

Tracks in Which the Collection Was Used
  • Ad hoc search in a collection of legal documents
  • Ad hoc search in a mixed collection
  • Classification of legal documents