As in previous years, VBS 2023 will be part of the International Conference on MultiMedia Modeling 2023 (MMM 2023) in Bergen, Norway, and will be organized as a special side event to the Welcome Reception. It will be a moderated session in which participants solve Known-Item Search (KIS) and Ad-Hoc Video Search (AVS) tasks, which are issued as live presentations of scenes of interest, either as a visual clip or as a textual description. The goal is to find correct segments (exactly one segment for KIS, many segments for AVS) as fast as possible and to submit the segment description (video ID and frame number) to the VBS server, which evaluates the correctness of the submissions.
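As a rough illustration of what a submission carries, the sketch below models a submission as a (video ID, frame number) pair and checks it against a target segment. The class and function names are hypothetical, and the rule that the frame must fall inside the target's inclusive frame range is an assumption for illustration only; the actual judging is done by the VBS server, and in practice AVS submissions are usually assessed by human judges rather than against a fixed list.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Submission:
    video_id: str  # hypothetical video identifier, e.g. "00123"
    frame: int     # frame number within that video


@dataclass(frozen=True)
class TargetSegment:
    video_id: str
    start_frame: int  # first frame of the target scene (inclusive)
    end_frame: int    # last frame of the target scene (inclusive)


def hits(sub: Submission, target: TargetSegment) -> bool:
    """Assumed rule: same video and the frame lies inside the segment."""
    return (sub.video_id == target.video_id
            and target.start_frame <= sub.frame <= target.end_frame)


def is_correct_kis(sub: Submission, target: TargetSegment) -> bool:
    # KIS: exactly one target segment per task.
    return hits(sub, target)


def is_correct_avs(sub: Submission, targets: Iterable[TargetSegment]) -> bool:
    # AVS: many valid segments, modelled here as a known set (an assumption).
    return any(hits(sub, t) for t in targets)
```

For example, at an assumed 25 frames per second, a 3-second target scene covers only 75 frames (say, frames 1000 to 1074), whereas a 20-second scene covers 500 frames, so the acceptance window becomes much narrower.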

Changes from Previous Years

We plan several changes to make the VBS competition even more challenging, since visual content recognition has improved significantly over the last few years. These changes include:

  • shorter target scenes for KIS tasks (e.g., only 3 seconds instead of 20)
  • a dedicated session issuing tasks for a sub-dataset with highly redundant content (scuba diving footage)
  • to be discussed: a prohibition of any automatic submission, i.e., every submission must be clicked manually


VBS 2023 will use the V3C1+V3C2 dataset (from the Vimeo Creative Commons Collection) in collaboration with NIST, i.e., with TRECVID 2022 and its Ad-Hoc Video Search (AVS) Task, as well as a marine video (underwater/scuba diving) dataset. V3C1, which was also used in previous years, consists of 7,475 video files amounting to 1,000 hours of video content (1,082,659 predefined segments) and is 1.3 TB in size. V3C2 contains an additional 9,760 video files amounting to 1,300 hours of video content (1,425,454 predefined segments) and is 1.6 TB in size. To download the dataset (which is provided by NIST), please complete this data agreement form and send a scan to with CC to and You will then be provided with a link for downloading the data.
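The dataset figures above can be combined with some simple arithmetic, for example to estimate the size of the full collection or the average length of a predefined segment. The sketch below uses only the numbers stated in this call; because the hour counts are rounded, the derived averages are approximations.

```python
# Figures as stated in the call for participation (hours are rounded).
V3C1 = {"videos": 7_475, "hours": 1_000, "segments": 1_082_659, "size_tb": 1.3}
V3C2 = {"videos": 9_760, "hours": 1_300, "segments": 1_425_454, "size_tb": 1.6}


def combined(a: dict, b: dict) -> dict:
    """Sum each statistic over the two parts of the collection."""
    return {key: a[key] + b[key] for key in a}


def avg_segment_seconds(part: dict) -> float:
    """Approximate mean duration of a predefined segment, in seconds."""
    return part["hours"] * 3600 / part["segments"]


def avg_video_minutes(part: dict) -> float:
    """Approximate mean duration of a video file, in minutes."""
    return part["hours"] * 60 / part["videos"]


if __name__ == "__main__":
    total = combined(V3C1, V3C2)
    print(f"V3C1+V3C2: {total['videos']} videos, {total['hours']} h, "
          f"{total['segments']} segments, {total['size_tb']:.1f} TB")
    for name, part in (("V3C1", V3C1), ("V3C2", V3C2)):
        print(f"{name}: ~{avg_segment_seconds(part):.1f} s per segment, "
              f"~{avg_video_minutes(part):.1f} min per video")
```

Taken together, the two parts amount to 17,235 videos, 2,300 hours and 2,508,113 predefined segments, and in both parts a predefined segment lasts roughly 3.3 seconds on average.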

New: The marine video (underwater) dataset has been provided by Prof. Sai-Kit Yeung (many thanks!) and can be downloaded directly from this website (please contact Tan Sang Ha for the username and password). For VBS 2023, we will use the snapshot that is available at

AVS and KIS Tasks

We plan to test at least 20 search tasks:

  • 10 AVS tasks, randomly selected in collaboration with TRECVID AVS. Each AVS task has several/many target shots that should be found.
  • 10 KIS tasks, which are selected completely at random on site. Each KIS task has only one target segment.
  • New: There will be a dedicated session with tasks issued only for the scuba-diving dataset, so that VBS systems may internally switch to this dataset exclusively (without V3C1+V3C2).

VBS Server and Testing

The VBS uses its own Distributed Retrieval Evaluation Server (DRES) to evaluate the correctness of found segments. If VBS 2023 is held as a virtual or hybrid event, DRES will run as a public service on the Internet. Participants submit found segments to the server via a simple HTTP-based protocol, as described in the Client Examples. On site, the server is connected to a projector and displays the current scores of all teams live (in addition to presenting the task descriptions). The server as well as example tasks from previous years are provided here.
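A minimal client sketch for submitting a found segment is shown below. It assumes a DRES v1-style REST endpoint `/api/v1/submit` that takes the video ID as `item`, the frame number as `frame` and a previously obtained login token as `session`, and that the server replies with JSON. The base URL is a placeholder, and the Client Examples remain the authoritative reference for the exact path and parameters of the server version used at VBS 2023.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_submit_url(base_url: str, session_id: str, video_id: str, frame: int) -> str:
    """Build the GET URL for one submission (endpoint path and parameter names assumed)."""
    query = urlencode({"item": video_id, "frame": frame, "session": session_id})
    return f"{base_url.rstrip('/')}/api/v1/submit?{query}"


def submit(base_url: str, session_id: str, video_id: str, frame: int,
           timeout: float = 10.0) -> dict:
    """Send the submission and return the server's reply (JSON response assumed)."""
    url = build_submit_url(base_url, session_id, video_id, frame)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For instance, `build_submit_url("https://dres.example.org", "abc123", "00123", 1010)` yields `https://dres.example.org/api/v1/submit?item=00123&frame=1010&session=abc123`. Keeping URL construction separate from the network call makes it easy to log or test submissions before they are sent.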


Anyone with an exploratory video search tool that supports retrieval, interactive browsing, and exploration in a video collection may participate.
There are no restrictions on the features a system may use, except that recording the presentation screen during the competition is not allowed. This means that, in addition to interactive content search, you may also use any kind of automatic content search.

Available Analysis Results

To make it easy for new teams to get started, we provide content analysis results to all teams. The V3C1 and V3C2 datasets already come with segmentation information, including shot boundaries and keyframes. Moreover, we provide data from several content analysis steps for V3C1 and V3C2 (e.g., color, faces, text, detected ImageNet classes). The analysis data is available here and is described in this article (and in this one for V3C2). The ASR data has also been released here (many thanks to Luca Rossetto et al.)!

Shot Boundary Detection

Moreover, the SIRET team has shared their shot boundary detection network TransNet as well as TransNet V2 in (see the paper here). Many thanks to Jakub Lokoč and his team!
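Shot boundary detectors of this kind typically output a boundary probability for every frame, which then has to be turned into shot segments. The following is an independent sketch of that post-processing step, not code from the TransNet repositories: frames whose probability exceeds a threshold (0.5 here, an assumed default) are treated as transition frames, and each maximal run of remaining frames becomes one shot.

```python
from typing import List, Optional, Sequence, Tuple


def probabilities_to_shots(probs: Sequence[float],
                           threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Turn per-frame boundary probabilities into inclusive (start, end) shot ranges.

    Frames with probability > threshold are treated as transition frames and
    are not assigned to any shot (an assumption of this sketch).
    """
    shots: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p <= threshold:
            # Frame belongs to a shot; open a new shot if none is running.
            if start is None:
                start = i
        elif start is not None:
            # Transition frame: close the running shot at the previous frame.
            shots.append((start, i - 1))
            start = None
    if start is not None:
        # Close the last shot at the final frame.
        shots.append((start, len(probs) - 1))
    return shots
```

Excluding transition frames keeps gradual transitions such as dissolves out of every shot; an alternative design assigns a hard-cut frame to the preceding shot so that no frames are lost.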

Existing Tools

If you want to join the VBS competition but do not have enough resources to build a new system from scratch, you can start from and extend a lightweight version of SOMHunter, the winning system of VBS 2020. It comes with all the necessary metadata for the V3C1 dataset.

Providing a solid basis for research and development in the area of multimedia management and retrieval, vitrivr (overall winner of VBS 2021) is a modular open-source multimedia retrieval stack that has been participating in VBS for several years. Its flexible architecture allows it to serve as a platform for the development of new retrieval approaches. The entire stack is available from

Submission Instructions

To participate, please submit an extended demo paper (4-6 pages in Springer LNCS format) by the deadline via the MMM 2023 Submission System (please select the “Video Browser Showdown” track). The submission should include a detailed description of the video search tool (including a screenshot of the tool) and describe how it supports interactive search in video data. Submissions will be peer-reviewed to ensure maximum quality. Accepted papers will be published in the proceedings of the MMM conference. In the public VBS session, each system needs to be presented (typically as a very short introductory video, or sometimes as a poster; this will depend on the local situation and will be announced a few weeks before the competition).

Journal Paper

We plan to write a joint journal paper after the VBS competition, to which each participating team should contribute. The winning team will have the honor of leading the journal paper (as the main author).
