We prepared the following collections for evaluation of participating systems:
Narod.ru Web collection
It is a pseudorandom selection of Web sites from the narod.ru domain
(narod.ru is a national free hosting provider in Russia).
The collection consists of 728 000 documents.
DMOZ Web collection
A collection based on the Russian-language section of the
dmoz.org catalog.
It is used as a training set in the Web site classification and
Web page classification tracks.
Legal documents collection 2004
A collection of documents from the Russian Federation legislation, built in 2004. It consists of 61 000 documents.
Legal documents collection 2007 (NEW)
A collection of documents from the Russian Federation legislation, built in 2007. It consists of 300 000 documents.
News collection
A set of news reports from 25 different sources covering three non-overlapping time intervals. The size of this collection is about 31 500 documents.