Informedia @ TRECVID 2010

Huan Li, Lei Bao, Zan Gao, Arnold Overwijk, Wei Liu, Long Fei Zhang, Shoou I. Yu, Ming Yu Chen, Florian Metze, Alexander Hauptmann

Research output: Contribution to conference > Paper > peer-review

9 Citations (Scopus)

Abstract

The Informedia group participated in four tasks this year: semantic indexing, known-item search, surveillance event detection, and the event detection in Internet multimedia pilot. For semantic indexing, in addition to training traditional SVM classifiers for each high-level feature using different low-level features, we trained a cascade classifier consisting of four layers, each using different visual features. For the known-item search task, we built a text-based video retrieval system and a visual-based video retrieval system, and then used query-class-dependent late fusion to combine the runs from these two systems. For surveillance event detection, we focused on analyzing motion and humans in videos, and detected events through three channels. First, we adopted a robust new descriptor called MoSIFT, which explicitly encodes appearance features together with motion information, and trained event classifiers over sliding windows using a bag-of-video-words approach. Second, we used human detection and tracking algorithms to detect and track human regions, and then considered only the MoSIFT points inside those regions. Third, after obtaining the detection decisions, we used the human detection results to filter them. In addition, to further reduce the number of false alarms, we aggregated short positive windows to favor long segments and applied a cascade classifier approach. Performance on the event detection task improved dramatically over last year. For the event detection in Internet multimedia pilot, our system is based purely on textual information obtained from Automatic Speech Recognition (ASR) and Optical Character Recognition (OCR). We submitted three runs: a run based on a simple combination of three different ASR transcripts, a run based on OCR only, and a run that combines ASR and OCR. We found that both ASR and OCR contribute to the goals of this task. However, the video collection is very challenging for these features, resulting in low recall but high precision.
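The abstract does not say how short positive windows were aggregated into longer segments. As a minimal sketch of one plausible reading, the Python below merges overlapping or adjacent positive sliding windows and discards merged segments that remain too short. The function name `merge_positive_windows` and the parameters `threshold`, `max_gap`, and `min_length` are illustrative assumptions, not details from the paper.

```python
from typing import List, Tuple

# (start_frame, end_frame, classifier_score); this layout is an assumption.
Window = Tuple[int, int, float]


def merge_positive_windows(
    windows: List[Window],
    threshold: float = 0.5,   # hypothetical decision threshold
    max_gap: int = 0,         # hypothetical: largest gap (frames) still merged
    min_length: int = 0,      # hypothetical: shortest segment (frames) kept
) -> List[Tuple[int, int]]:
    """Merge positive windows into longer segments and drop short ones."""
    # Keep only windows the classifier scored as positive, ordered by start.
    positives = sorted((s, e) for s, e, score in windows if score >= threshold)

    segments: List[List[int]] = []
    for start, end in positives:
        if segments and start - segments[-1][1] <= max_gap:
            # Overlaps or nearly touches the previous segment: extend it.
            segments[-1][1] = max(segments[-1][1], end)
        else:
            segments.append([start, end])

    # Short isolated detections are treated as likely false alarms.
    return [(s, e) for s, e in segments if e - s >= min_length]


example = [(0, 25, 0.9), (10, 35, 0.8), (20, 45, 0.2), (30, 55, 0.7), (100, 125, 0.6)]
result = merge_positive_windows(example, threshold=0.5, max_gap=0, min_length=30)
print(result)
```

In this example the three overlapping positive windows merge into one segment spanning frames 0 to 55, while the isolated 25-frame window starting at frame 100 is dropped for being shorter than `min_length`.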

Original language: English
Publication status: Published - 2010
Event: TREC Video Retrieval Evaluation, TRECVID 2010 - Gaithersburg, MD, United States
Duration: 15 Nov 2010 to 17 Nov 2010

Conference

Conference: TREC Video Retrieval Evaluation, TRECVID 2010
Country/Territory: United States
City: Gaithersburg, MD
Period: 15/11/10 to 17/11/10
