Web page Classification Track
RIRES: Russian Information Retrieval Evaluation Seminar



Overview

The purpose of this track is to evaluate methods of Web page topic classification.

This track uses the standard procedure.

Test Collection

The source dataset consists of the Narod.ru and DMOZ collections. The latter is used as the training set.

The training set consists of web sites; nevertheless, pages from the same site can be assigned different classes.

Task Description for Participating Systems

As in the web site classification track, each participant is granted access to the training set, and to the DMOZ and Narod.ru collections. The task is to assign a topic from the training set to each web page. Unlike the web site classification track, web sites are used only for training.

The expected result is an ordered list of web pages for each category.
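
As an illustration of what such a result could look like, here is a minimal Python sketch that turns classifier scores into one ordered list of pages per category. The input format (triples of page identifier, category, and score), the function name rank_pages_by_category, and the sample data are assumptions made for this example; they are not the track's submission format.

```python
from collections import defaultdict


def rank_pages_by_category(scores):
    """Group (page_id, category, score) triples into ordered lists per category.

    The triple format is hypothetical; it stands in for whatever scores
    a participating system produces.
    """
    by_category = defaultdict(list)
    for page_id, category, score in scores:
        by_category[category].append((score, page_id))

    # Highest score first; ties are broken by page id to keep the order stable.
    return {
        category: [
            page_id
            for _, page_id in sorted(items, key=lambda item: (-item[0], item[1]))
        ]
        for category, items in by_category.items()
    }


# Invented sample scores, for illustration only.
example_scores = [
    ("page1", "sports", 0.9),
    ("page2", "sports", 0.4),
    ("page1", "music", 0.2),
    ("page3", "music", 0.7),
]
ranking = rank_pages_by_category(example_scores)
```

In this sketch a page may appear in the lists of several categories, each time at the position its score for that category gives it.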

Each participant should classify all documents in the Narod.ru collection, i.e. not only the documents from the narod.* archives, but also those from narod_training.*.

Evaluation Methodology

  • instructions for assessors: evaluate web page relevance to categories based on the extended category descriptions
  • relevance scale:
    • yes / probably yes / perhaps yes / no / impossible to evaluate
    • yes / no / impossible to evaluate
  • official metrics:
    • precision
    • recall
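
The two official metrics can be sketched for a single category as follows. This is a minimal illustration under stated assumptions, not the seminar's evaluation code: the function name precision_recall, the choice of which labels count as relevant, the exclusion of pages marked "impossible to evaluate", and the sample judgments are all assumptions introduced here.

```python
def precision_recall(returned, judgments, relevant_labels=frozenset({"yes"})):
    """Compute precision and recall for one category.

    returned        -- page ids a system assigned to the category (no duplicates)
    judgments       -- dict mapping page id to an assessor label
    relevant_labels -- labels treated as relevant (an assumption of this sketch)
    """
    # Assumption: pages the assessors could not evaluate are left out entirely.
    judged = {
        page: label
        for page, label in judgments.items()
        if label != "impossible to evaluate"
    }
    relevant = {page for page, label in judged.items() if label in relevant_labels}

    # Only returned pages that have a usable judgment are counted.
    returned_judged = [page for page in returned if page in judged]
    hits = sum(1 for page in returned_judged if page in relevant)

    precision = hits / len(returned_judged) if returned_judged else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented judgments, for illustration only.
judgments = {
    "a": "yes",
    "b": "no",
    "c": "probably yes",
    "d": "impossible to evaluate",
    "e": "yes",
}
returned = ["a", "b", "c", "d"]

strict = precision_recall(returned, judgments)
lenient = precision_recall(returned, judgments, frozenset({"yes", "probably yes"}))
```

With the graded scale, the result depends on where the line between relevant and non-relevant is drawn: the strict call treats only "yes" as relevant, while the lenient call also accepts "probably yes".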

Data Formats