Web site Classification Track
RIRES: Russian Information Retrieval Evaluation Seminar


Overview

The purpose of this track is to evaluate methods of Web site topic classification.

This track follows the standard procedure.

Test Collection

The source dataset consists of the Narod.ru and DMOZ collections. The latter is used as the training set.

Task Description for Participating Systems

Each participant is granted access to the training set (the DMOZ collection) and to a set of web sites (not separate web pages!) from the Narod.ru collection. The task is to assign topics from the training set to each web site. Each site may be assigned from 0 to 5 topics. Topics should be returned as an ordered list for each web site.
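The output constraint above (an ordered list of 0 to 5 topics per site) can be sketched as follows. This is only an illustration: the function name `rank_topics`, the per-topic scores, and the `threshold` cut-off are assumptions and are not part of the track definition. Only the limit of five topics and the ordering requirement come from the task description.

```python
from typing import Dict, List

MAX_TOPICS_PER_SITE = 5  # upper bound stated in the task description


def rank_topics(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return at most five topic ids for one site, best first.

    `scores` maps a topic id to a classifier confidence (hypothetical input).
    Topics scoring below `threshold` are dropped, so the list may be empty.
    """
    kept = [(topic, score) for topic, score in scores.items() if score >= threshold]
    # Sort by descending score; ties are broken by topic id for determinism.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [topic for topic, _ in kept[:MAX_TOPICS_PER_SITE]]
```

A site for which no topic passes the threshold simply gets an empty list, which the task allows.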

The training set is based on a subset of the Russian-language categories from the DMOZ catalog.

Participants are encouraged to classify all sites from the narod.ru collection, i.e. sites from both the narod.* and the narod_training.* archives.

The training set also contains several sites from the narod.ru domain. These sites will be excluded from the results during evaluation.
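The exclusion step can be pictured with a short sketch. It assumes that results are held as a mapping from a site identifier to its ordered topic list and that the training sites are known as a set of the same identifiers. Both representations, and the name `drop_training_sites`, are hypothetical.

```python
from typing import Dict, List, Set


def drop_training_sites(
    results: Dict[str, List[str]], training_sites: Set[str]
) -> Dict[str, List[str]]:
    """Keep only the sites that do not occur in the training set."""
    return {
        site: topics
        for site, topics in results.items()
        if site not in training_sites
    }
```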

Evaluation Methodology

  • instructions for assessors: evaluate web-site relevance to categories based on the extended category descriptions
  • relevance scale:
    • yes / probably yes / perhaps yes / no / impossible to evaluate
    • yes / no / impossible to evaluate
  • official metrics:
    • precision
    • recall
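The two official metrics can be illustrated with a minimal sketch. It assumes that the assessments have already been reduced to a set of relevant (site, topic) pairs and that both metrics are computed over all pairs at once. How the graded scale is mapped to relevant or not relevant, and how the official evaluation averages its figures, are not specified here, so both choices are assumptions.

```python
from typing import Dict, List, Set, Tuple


def precision_recall(
    assigned: Dict[str, List[str]], relevant: Set[Tuple[str, str]]
) -> Tuple[float, float]:
    """Compute precision and recall over (site, topic) pairs.

    `assigned` maps a site id to the topics a system returned for it.
    `relevant` holds the (site, topic) pairs judged relevant by assessors.
    """
    returned = {(site, topic) for site, topics in assigned.items() for topic in topics}
    hits = len(returned & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if a system returns three (site, topic) pairs, two of which are among three relevant pairs, both precision and recall equal 2/3.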

Data Formats