Narod.ru Web Collection
Description
The collection consists of a pseudorandom selection of about 3% of the web sites hosted in Russia by the national free hosting provider narod.ru. Non-HTML documents and pages built using the standard templates provided by narod.ru were excluded from the collection. The collection amounts to about 0.12-0.30% of the whole Russian segment of the Web.
Dataset Parameters
Rights to Use
Rights to use the Narod.ru collection are granted by
Data Format
The collection is distributed as XML files of a specific format. The files are split into two groups: narod.* and narod_training.*. Files in the second group contain the documents that were used as the training set in the Web page classification track.
Tracks in Which the Collection Was Used