ROMIP Test Collections
We prepared the following collections for evaluation of participating systems:
Narod.ru Web collection
A pseudorandom selection of web sites from the narod.ru domain
(narod.ru is a national free hosting provider in Russia).
The collection consists of 728 000 documents.
KM.ru Web collection 2007 (NEW)
The KM.ru collection is a copy of the www.km.ru multiportal. It consists of about 3 000 000 documents.
BY.web collection 2007 (NEW)
A subset of pages from the .by domain that were present in the Yandex index in May 2007.
DMOZ Web collection
Collection based on the Russian-language section of the
This collection is used as a training set in the Web site classification
and Web page classification tracks.
Legal documents collection 2004
A collection of Russian Federation legislative documents built in 2004. It consists of 61 000 documents.
Legal documents collection 2007 (NEW)
A collection of Russian Federation legislative documents built in 2007. It consists of 300 000 documents.
A set of news reports from 25 different sources covering three non-overlapping time intervals. The size of this collection is about 31 500 documents.