Near Duplicates Detection Track
RIRES: Russian Information Retrieval Evaluation Seminar





The purpose of this track is to evaluate content-based methods for detecting near duplicates in image collections. Near duplicates are images of exactly the same scene or object taken under different conditions. These conditions may differ in zoom, focus level, illumination, foreground occlusion, or viewpoint.

This task differs from the more common task of transformed image recognition. For the latter, evaluation usually relies on synthetic datasets, in which several images are processed automatically with various image transforms to produce a set of duplicate images. We instead provide a collection of natural near duplicates. Examples of images treated as near duplicates are shown below.

For this track the standard procedure is used.

Test Collection

The source dataset is the Yandex image collection.

Task Description for Participating Systems

Participants are to find all groups of near duplicates in the provided data collection.

Participants are required to submit lists of the images that form groups of near duplicates.

Evaluation Methodology

The evaluation will be performed in several steps. First, the intersections of all groups provided by different participants will be found and sorted by their size and by the number of participants who detected the intersecting groups. Then two subsets will be created for evaluation. The first subset consists of the images from the groups that form the N largest intersections. The second subset consists of the images from the K largest groups, for every participant, with the smallest rate of intersection. Both subsets will be evaluated by assessors, who will mark the true near duplicates in them. A binary classification will be used to judge images:

  • belongs to the group of near duplicates;
  • doesn't belong to the group of near duplicates.
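The intersection-ranking step described above can be sketched as follows. This is only an illustration under stated assumptions, not the organizers' procedure: the names `ranked_intersections`, `first_subset`, and `min_size`, the pairwise comparison of groups, and the tie-breaking order are all hypothetical, and the image identifiers are made up.

```python
from itertools import combinations


def ranked_intersections(submissions, min_size=2):
    """Rank intersections of groups submitted by different participants.

    submissions: dict mapping a participant name to a list of sets of image ids.
    min_size is an assumption: intersections with fewer images are ignored.
    """
    found = {}  # frozenset of shared images -> participants whose groups share them
    for (p1, groups1), (p2, groups2) in combinations(sorted(submissions.items()), 2):
        for g1 in groups1:
            for g2 in groups2:
                common = frozenset(g1) & frozenset(g2)
                if len(common) >= min_size:
                    found.setdefault(common, set()).update((p1, p2))
    # Largest intersections first, then those detected by more participants.
    return sorted(
        found.items(),
        key=lambda item: (-len(item[0]), -len(item[1]), sorted(item[0])),
    )


def first_subset(ranked, n):
    """Collect the images that belong to the n top-ranked intersections."""
    images = set()
    for common, _participants in ranked[:n]:
        images |= common
    return images


# Made-up submissions from three hypothetical participants.
submissions = {
    "A": [{"i1", "i2", "i3"}, {"i7", "i8"}],
    "B": [{"i1", "i2", "i3", "i4"}, {"i8", "i9"}],
    "C": [{"i2", "i3"}, {"i7", "i8"}],
}
ranked = ranked_intersections(submissions)
subset = first_subset(ranked, 2)
```

Here the three-image intersection shared by A and B ranks first, and the two-image intersection found by all three participants ranks ahead of the one found by only two.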

When assessors find several sets of near duplicates within the same group, the largest one is taken as the set for the entire group.
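The largest-set rule can be illustrated with a short sketch. The function name `judge_group` and the input layout are hypothetical. The track description does not say how ties between equally large sets are resolved, so this sketch simply keeps the first one.

```python
def judge_group(group, assessor_sets):
    """Return a binary judgement for every image in one submitted group.

    group: iterable of image ids submitted as one near-duplicate group.
    assessor_sets: sets of true near duplicates the assessors found in it.
    """
    # The largest assessor set stands for the whole group
    # (assumption: on a tie, the first such set is used).
    reference = max(assessor_sets, key=len, default=set())
    return {image: image in reference for image in group}


# Made-up example: assessors split one submitted group into two sets.
labels = judge_group(
    ["a", "b", "c", "d", "e"],
    [{"a", "b"}, {"c", "d", "e"}],
)
```

In this example the images `c`, `d`, and `e` are marked as belonging to the group of near duplicates, while `a` and `b` are not.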

Data Formats

  • collections
  • tasks
  • results