Informedia @ TRECVID 2010

Huan Li; Lei Bao; Zan Gao; Arnold Overwijk; Wei Liu; Long Fei Zhang; Shoou I. Yu; Ming Yu Chen; Florian Metze; Alexander Hauptmann

Informedia @ TRECVID 2010

Huan Li, Lei Bao, Zan Gao, Arnold Overwijk, Wei Liu, Long Fei Zhang, Shoou I. Yu, Ming Yu Chen, Florian Metze, Alexander Hauptmann

School of Computer Science and Technology

Research output: Contribution to conference › Paper › peer-review

9 Citations (Scopus)

Abstract

The Informedia group participated in four tasks this year, including Semantic indexing, Known-item search, Surveillance event detection and Event detection in Internet multimedia pilot. For semantic indexing, except for training traditional SVM classifiers for each high level feature by using different low level features, a kind of cascade classifier was trained which including four layers with different visual features respectively. For Known Item Search task, we built a text-based video retrieval and a visual-based video retrieval system, and then query-class dependent late fusion was used to combine the runs from these two systems. For surveillance event detection, we especially put our focus on analyzing motions and human in videos. We detected the events by three channels. Firstly, we adopted a robust new descriptor called MoSIFT, which explicitly encodes appearance features together with motion information. And then we trained event classifiers in sliding windows using a bag-of-video-word approach. Secondly, we used the human detection and tracking algorithms to detect and track the regions of human, and then just focus on the MoSIFT points in the human regions. Thirdly, after getting the decision, we also borrow the results of human detection to filter the decision. In addition, to reduce the number of false alarms further, we aggregated short positive windows to favor long segmentation and applied a cascade classifier approach. The performance shows dramatic improvement over last year on the event detection task. For event detection in internet multimedia pilot, our system is purely based on textual information in the form of Automatic Speech Recognition (ASR) and Optical Character Recognition (OCR).We submitted three runs; a run based on a simple combination of three different ASR transcripts, a run based on OCR only and a run that combines ASR and OCR. We noticed that both ASR and OCR contribute to the goals of this task. However the video collection is very challenging for those features, resulting in a low recall but high precision.

Original language	English
Publication status	Published - 2010
Event	TREC Video Retrieval Evaluation, TRECVID 2010 - Gaithersburg, MD, United States Duration: 15 Nov 2010 → 17 Nov 2010

Conference

Conference	TREC Video Retrieval Evaluation, TRECVID 2010
Country/Territory	United States
City	Gaithersburg, MD
Period	15/11/10 → 17/11/10

Cite this

@conference{c7716c7da8d14382a155dfdafc4a7ce7,

title = "Informedia @ TRECVID 2010",

abstract = "The Informedia group participated in four tasks this year, including Semantic indexing, Known-item search, Surveillance event detection and Event detection in Internet multimedia pilot. For semantic indexing, except for training traditional SVM classifiers for each high level feature by using different low level features, a kind of cascade classifier was trained which including four layers with different visual features respectively. For Known Item Search task, we built a text-based video retrieval and a visual-based video retrieval system, and then query-class dependent late fusion was used to combine the runs from these two systems. For surveillance event detection, we especially put our focus on analyzing motions and human in videos. We detected the events by three channels. Firstly, we adopted a robust new descriptor called MoSIFT, which explicitly encodes appearance features together with motion information. And then we trained event classifiers in sliding windows using a bag-of-video-word approach. Secondly, we used the human detection and tracking algorithms to detect and track the regions of human, and then just focus on the MoSIFT points in the human regions. Thirdly, after getting the decision, we also borrow the results of human detection to filter the decision. In addition, to reduce the number of false alarms further, we aggregated short positive windows to favor long segmentation and applied a cascade classifier approach. The performance shows dramatic improvement over last year on the event detection task. For event detection in internet multimedia pilot, our system is purely based on textual information in the form of Automatic Speech Recognition (ASR) and Optical Character Recognition (OCR).We submitted three runs; a run based on a simple combination of three different ASR transcripts, a run based on OCR only and a run that combines ASR and OCR. We noticed that both ASR and OCR contribute to the goals of this task. However the video collection is very challenging for those features, resulting in a low recall but high precision.",

author = "Huan Li and Lei Bao and Zan Gao and Arnold Overwijk and Wei Liu and Zhang, {Long Fei} and Yu, {Shoou I.} and Chen, {Ming Yu} and Florian Metze and Alexander Hauptmann",

year = "2010",

language = "English",

note = "TREC Video Retrieval Evaluation, TRECVID 2010 ; Conference date: 15-11-2010 Through 17-11-2010",

}

TY - CONF

T1 - Informedia @ TRECVID 2010

AU - Li, Huan

AU - Bao, Lei

AU - Gao, Zan

AU - Overwijk, Arnold

AU - Liu, Wei

AU - Zhang, Long Fei

AU - Yu, Shoou I.

AU - Chen, Ming Yu

AU - Metze, Florian

AU - Hauptmann, Alexander

PY - 2010

Y1 - 2010

N2 - The Informedia group participated in four tasks this year, including Semantic indexing, Known-item search, Surveillance event detection and Event detection in Internet multimedia pilot. For semantic indexing, except for training traditional SVM classifiers for each high level feature by using different low level features, a kind of cascade classifier was trained which including four layers with different visual features respectively. For Known Item Search task, we built a text-based video retrieval and a visual-based video retrieval system, and then query-class dependent late fusion was used to combine the runs from these two systems. For surveillance event detection, we especially put our focus on analyzing motions and human in videos. We detected the events by three channels. Firstly, we adopted a robust new descriptor called MoSIFT, which explicitly encodes appearance features together with motion information. And then we trained event classifiers in sliding windows using a bag-of-video-word approach. Secondly, we used the human detection and tracking algorithms to detect and track the regions of human, and then just focus on the MoSIFT points in the human regions. Thirdly, after getting the decision, we also borrow the results of human detection to filter the decision. In addition, to reduce the number of false alarms further, we aggregated short positive windows to favor long segmentation and applied a cascade classifier approach. The performance shows dramatic improvement over last year on the event detection task. For event detection in internet multimedia pilot, our system is purely based on textual information in the form of Automatic Speech Recognition (ASR) and Optical Character Recognition (OCR).We submitted three runs; a run based on a simple combination of three different ASR transcripts, a run based on OCR only and a run that combines ASR and OCR. We noticed that both ASR and OCR contribute to the goals of this task. However the video collection is very challenging for those features, resulting in a low recall but high precision.

AB - The Informedia group participated in four tasks this year, including Semantic indexing, Known-item search, Surveillance event detection and Event detection in Internet multimedia pilot. For semantic indexing, except for training traditional SVM classifiers for each high level feature by using different low level features, a kind of cascade classifier was trained which including four layers with different visual features respectively. For Known Item Search task, we built a text-based video retrieval and a visual-based video retrieval system, and then query-class dependent late fusion was used to combine the runs from these two systems. For surveillance event detection, we especially put our focus on analyzing motions and human in videos. We detected the events by three channels. Firstly, we adopted a robust new descriptor called MoSIFT, which explicitly encodes appearance features together with motion information. And then we trained event classifiers in sliding windows using a bag-of-video-word approach. Secondly, we used the human detection and tracking algorithms to detect and track the regions of human, and then just focus on the MoSIFT points in the human regions. Thirdly, after getting the decision, we also borrow the results of human detection to filter the decision. In addition, to reduce the number of false alarms further, we aggregated short positive windows to favor long segmentation and applied a cascade classifier approach. The performance shows dramatic improvement over last year on the event detection task. For event detection in internet multimedia pilot, our system is purely based on textual information in the form of Automatic Speech Recognition (ASR) and Optical Character Recognition (OCR).We submitted three runs; a run based on a simple combination of three different ASR transcripts, a run based on OCR only and a run that combines ASR and OCR. We noticed that both ASR and OCR contribute to the goals of this task. However the video collection is very challenging for those features, resulting in a low recall but high precision.

UR - http://www.scopus.com/inward/record.url?scp=84905186113&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:84905186113

T2 - TREC Video Retrieval Evaluation, TRECVID 2010

Y2 - 15 November 2010 through 17 November 2010

ER -

Informedia @ TRECVID 2010

Abstract

Conference

Other files and links

Fingerprint

Cite this