Narod.ru Web Collection
The collection is a pseudorandom sample of about 3% of the web sites hosted in Russia by the national free hosting provider narod.ru. Non-HTML documents and pages built from the standard templates provided by narod.ru were excluded from the collection. The collection amounts to about 0.12-0.30% of the whole Russian segment of the Web.
Rights to Use
The collection is distributed as XML files in a specific format. These files are split into two groups: narod.* and narod_training.*. Files in the second group contain the documents that were used as the training set in the Web page classification track.
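The two file groups can be separated by name alone. The following is a minimal sketch of that split, assuming the files are matched by the patterns narod.* and narod_training.* given above; the function name `split_collection_files` and the sample file names are hypothetical and are not part of the collection's documentation.

```python
from fnmatch import fnmatchcase


def split_collection_files(names):
    """Partition file names into the main group and the training group.

    Names matching neither pattern are ignored.
    """
    groups = {"main": [], "training": []}
    for name in names:
        # Check the training pattern first; "narod.*" cannot match these
        # names anyway, because it requires a dot right after "narod".
        if fnmatchcase(name, "narod_training.*"):
            groups["training"].append(name)
        elif fnmatchcase(name, "narod.*"):
            groups["main"].append(name)
    return groups


# Hypothetical file names, used only for illustration.
example = ["narod.1.xml", "narod_training.1.xml", "readme.txt"]
result = split_collection_files(example)
```

`fnmatchcase` is used instead of `fnmatch` so that matching is case-sensitive on every platform.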
Tracks in Which the Collection Was Used