As in previous years, VBS 2024 will be part of the International Conference on MultiMedia Modeling 2024 (MMM 2024) in Amsterdam, The Netherlands, and organized as a special side event to the Welcome Reception. It will be a moderated session in which participants solve Known-Item Search (KIS), Ad-Hoc Video Search (AVS), and Question Answering (Q/A) tasks that are issued as live presentations of scenes of interest, either as a visual clip or as a textual description. The goal is to find correct segments (exactly one segment for KIS, many segments for AVS) or the correct answer (for Q/A tasks) as fast as possible and to submit it (for KIS and AVS: a segment description consisting of video id and frame number) to the VBS server (DRES), which evaluates the correctness of submissions.
Changes from Previous Years
We plan several changes to make the VBS competition even more challenging since visual content recognition has significantly improved over the last few years. These changes will include:
- introduction of Question Answering (Q/A) tasks, where a particular answer, rather than a segment, needs to be submitted as text.
- tasks in sub-datasets: marine videos and medical videos.
VBS2024 will use the V3C1+V3C2 dataset (from the Vimeo Creative Commons Collection) in collaboration with NIST and TRECVID 2023 (specifically its Ad-Hoc Video Search (AVS) task), as well as the marine video (underwater/scuba diving) dataset and the LapGyn100 dataset (surgeries in laparoscopic gynecology). V3C1 consists of 7,475 video files, amounting to 1,000 hours of video content (1,082,659 predefined segments) and 1.3 TB in size, and was also used in previous years. V3C2 contains an additional 9,760 video files, amounting to 1,300 hours of video content (1,425,454 predefined segments) and 1.6 TB in size. In order to download the dataset (provided by NIST), please complete this data agreement form and send a scan to email@example.com with CC to firstname.lastname@example.org and email@example.com. You will then receive a link for downloading the data.
New since 2023: The marine video (underwater) dataset has been provided by Prof. Sai-Kit Yeung (many thanks!) and can be downloaded directly from this website (please contact Tan Sang Ha for the username and password). A snapshot from 2023 is available at https://download-dbis.dmi.unibas.ch/mvk/.
New in 2024: The LapGyn100 dataset (laparoscopic gynecology) has been provided by Prof. Jörg Keckstein (many thanks!) and will be available for download shortly. Please contact Klaus Schoeffmann for the username and password and to sign a data agreement form.
AVS, KIS, and Q/A Tasks for Experts and Novices
We plan to test:
- about 10 AVS (Ad-Hoc Video Search) tasks. Each AVS task is described via text and has several (possibly many) target segments that should be found as fast as possible.
- about 10 KIS (Known-Item Search) tasks. Each KIS task is described either by a video clip or by text and has only one target segment.
- about 10 Q/A tasks. Each Q/A task is described via text and needs a textual answer (typically manually entered and sent to the server).
We will also split the tasks evenly between the expert and novice sessions. In the novice session, volunteers recruited from the audience will solve tasks with the participating search systems. Hence, the systems should be easy to use.
VBS Server and Testing
VBS uses the Distributed Retrieval Evaluation Server (DRES) to evaluate found segments for correctness. DRES will run as a public service on a server at Klagenfurt University, Austria. Participants can submit found segments to the server via a simple HTTP-based protocol described in the Client Examples. The server is connected to a projector on-site and displays the current scores of all teams live (in addition to presenting task descriptions). The server and example tasks from previous years are provided here; for Angular systems, another description is provided here.
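The submission step can be sketched as follows. Note that the endpoint path, parameter names, and example values below are illustrative assumptions based on typical DRES deployments; consult the official DRES documentation and the Client Examples for the authoritative API of the actual server instance.

```python
from urllib.parse import urlencode

def build_submission_url(base_url, item, frame, session):
    """Build a DRES-style submission URL for a KIS/AVS result.

    The path and parameter names are illustrative assumptions; check
    the DRES documentation for the exact API of your instance.
    """
    params = urlencode({"item": item, "frame": frame, "session": session})
    return f"{base_url}/api/v1/submit?{params}"

# Hypothetical example: submit frame 2435 of video 07653.
url = build_submission_url("https://vbs.example.org", "07653", 2435, "node0abc123")
print(url)
# An actual client would first log in to obtain a session token and then
# issue an HTTP GET on this URL, e.g. with urllib.request.urlopen(url).
```

Regardless of the exact endpoint, the key point is that a submission identifies the video, the frame within it, and the authenticated session of the submitting team.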
Anyone with an exploratory video search tool that allows for retrieval, interactive browsing, and exploration in a video collection may participate.
There are no restrictions on allowed features, except that recording the presentation screen during the competition is disallowed. This means that, in addition to interactive content search, you can also use any automatic content search.
Participants are expected to implement the functionality to send results and logs to the VBS Server via a REST API.
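A result/interaction log sent to that REST API might be assembled as below. The field names in this payload are assumptions chosen for illustration; the actual logging schema is defined by the VBS organizers and the DRES API, so check the official specification before implementing.

```python
import json
import time

def make_result_log(team, member, events):
    """Assemble an interaction-log payload as a JSON string.

    Field names ("team", "member", "timestamp", "events") are
    illustrative assumptions, not the official VBS logging schema.
    """
    payload = {
        "team": team,
        "member": member,
        "timestamp": int(time.time() * 1000),  # milliseconds since epoch
        "events": events,  # e.g. queries issued, browsing actions
    }
    return json.dumps(payload)

# Hypothetical example: one text query event.
log = make_result_log("myteam", 1, [{"category": "TEXT", "value": "red car"}])
print(log)
# A client would POST this JSON body to the logging endpoint of the VBS server.
```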
Available Analysis Results
In order to give new teams an easy entry, we provide results of content analysis to all teams. The V3C1 and V3C2 datasets already come with segmentation information and include shot boundaries and keyframes. Moreover, we provide resulting data for V3C1 and V3C2 from different content analysis steps (e.g., color, faces, text, detected ImageNet classes, etc.). The analysis data is available here and described in this article and this one for V3C2. Also, the ASR data has been released here (many thanks to Luca Rossetto et al.)!
Shot Boundary Detection
Moreover, the SIRET team shared their shot detection network (TransNet), as well as TransNet V2, at https://github.com/soCzech/TransNetV2 (see the paper: https://arxiv.org/pdf/2008.04838.pdf). Many thanks to Jakub Lokoc and his team!
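Detectors of this kind output a transition probability per frame, which then has to be turned into shot boundaries. A minimal post-processing sketch is shown below: threshold the probabilities, merge consecutive transition frames, and emit each shot as an inclusive frame range. The threshold value and the grouping logic are simplified illustrations, not the reference post-processing shipped with TransNetV2.

```python
def predictions_to_shots(probs, threshold=0.5):
    """Convert per-frame transition probabilities into shots.

    Frames with probability >= threshold are treated as transitions;
    consecutive transition frames are merged, and each shot is returned
    as an inclusive (start_frame, end_frame) pair. This is a simplified
    illustration, not TransNetV2's reference post-processing.
    """
    shots, start = [], 0
    in_transition = False
    for i, p in enumerate(probs):
        if p >= threshold and not in_transition:
            # Transition begins: close the current shot before this frame.
            if i > start:
                shots.append((start, i - 1))
            in_transition = True
        elif p < threshold and in_transition:
            # Transition ends: the next shot starts here.
            start = i
            in_transition = False
    if not in_transition and len(probs) > start:
        shots.append((start, len(probs) - 1))
    return shots

# Toy example: two shots separated by a transition around frames 3-4.
print(predictions_to_shots([0.01, 0.02, 0.01, 0.9, 0.8, 0.02, 0.01]))
# → [(0, 2), (5, 6)]
```

The resulting (start, end) frame pairs correspond to the predefined segments used for KIS/AVS submissions.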
For teams that want to join the VBS competition but lack the resources to build a new system from scratch, one option is to start with a simple, lightweight version of SOMHunter, the winning system of VBS 2020. The system is provided with all the necessary metadata for the V3C1 dataset: https://github.com/siret/somhunter
Providing a solid basis for research and development in the area of multimedia management and retrieval, vitrivr (overall winner of VBS 2021) is a modular open-source multimedia retrieval stack that has been participating in VBS for several years. Its flexible architecture allows it to serve as a platform for developing new retrieval approaches. The entire stack is available at https://vitrivr.org/
To participate, please submit an extended demo paper (6+2 pages in Springer LNCS format, where the 2 extra pages may be used for references) by the deadline via the MMM 2024 Submission System (please select the "Video Browser Showdown" track). The submission should include a detailed description of the video search tool (including a screenshot) and of how it supports interactive search in video data. Submissions will be peer-reviewed to ensure high quality. Accepted papers will be published in the proceedings of the MMM conference. In the public VBS session, each system needs to be presented (typically as a concise introductory video, or sometimes as a poster; this will depend on the local situation and will be announced a few weeks before the competition).
We plan to write a joint journal paper after the VBS competition, to which each participating team should contribute. The winning team will have the honor of being in charge of the journal paper (as the main author).