Publication date: May 2010  Publisher: Harbin Engineering University Press  Author: Ren Wei  Pages: 212
Preface
The problem of semantic video scene categorisation using spatio-temporal information is one of the significant open challenges in the field of video retrieval. During the past few years, advances in digital storage technology and computer performance have promoted video as a valuable information resource, and numerous video retrieval techniques have been successfully developed. Most techniques for video indexing and retrieval extend previous work in the context of image-based retrieval: video sequences are treated as collections of still images, relevant key-frames are first extracted, and these are then indexed using existing image processing techniques based on low-level features. For the research in this book, the key question is how to encode the spatial and temporal information in video for its efficient retrieval. Novel algorithms are proposed for matching videos and are compared with the state of the art. These algorithms take into account image objects and their spatial relationships, as well as temporal information within a video that correlates with its semantic class. The algorithms also perform hierarchical matching, starting at the frame and shot levels before an overall video-level similarity is computed. The approach is then exhaustively tested on the basis of precision and recall measures over a large number of queries, and the area under the average precision-recall curve is used to compare the methods with those in the literature. As part of this book, an international video benchmark, Minerva, is proposed, on which the results are discussed.
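The evaluation protocol described above (precision and recall computed over ranked query results, with methods compared by the area under the precision-recall curve) can be sketched as follows. The ranked relevance lists, the method names, and the trapezoidal-area computation are illustrative assumptions, not the book's actual data or code:

```python
# Illustrative sketch: comparing two hypothetical retrieval methods by the
# area under their precision-recall curves. Relevance lists are invented.

def precision_recall(ranked_relevance):
    """(recall, precision) after each position of a ranked result list.
    ranked_relevance: list of 0/1 flags, 1 = relevant to the query."""
    total_relevant = sum(ranked_relevance)
    hits, points = 0, []
    for i, rel in enumerate(ranked_relevance, start=1):
        hits += rel
        points.append((hits / total_relevant, hits / i))
    return points

def area_under_pr(points):
    """Trapezoidal area under a precision-recall curve."""
    area, prev_r, prev_p = 0.0, 0.0, 1.0
    for r, p in points:
        area += (r - prev_r) * (p + prev_p) / 2
        prev_r, prev_p = r, p
    return area

# Two hypothetical ranked result lists for the same query (1 = relevant)
method_a = [1, 1, 0, 1, 0, 0, 1, 0]
method_b = [1, 0, 0, 1, 1, 0, 0, 1]
auc_a = area_under_pr(precision_recall(method_a))
auc_b = area_under_pr(precision_recall(method_b))
print(auc_a > auc_b)  # method_a ranks relevant items earlier
```

A method that concentrates relevant items near the top of the ranking yields a larger area, which is why the book uses this single number to compare retrieval models across many queries.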
Overview
This book focuses on mining the spatio-temporal relationships in video and explores machine-learning methods for video segmentation and semantic classification. It is organised into seven chapters, which explain the various properties of images, discuss the features of video, systematically introduce the spatio-temporal logical relationships in video and statistical methods for video analysis, and investigate how to capture the spatio-temporal characteristics of video, how to use artificial neural networks for video segmentation, and how to train computers to "learn" to classify and retrieve video semantically in a human-like way. The chapters progress from the simple to the complex, from the elementary to the advanced, from theory to practice, and from techniques to systems. The book can serve as a graduate textbook and reference in signal and image processing, computer science, machine learning, artificial intelligence, machine vision, and related fields, and also as a reference for senior scientific and technical personnel working in these areas.
Table of Contents
Chapter I Introduction
  1.1 Motivation
  1.2 Proposed Solution
  1.3 Structure of Book
Chapter II Approaches to Video Retrieval
  2.1 Introduction
  2.2 Video Structure and Properties
  2.3 Query
  2.4 Similarity Metrics
  2.5 Performance Evaluation Metrics
  2.6 Systems
Chapter III Spatio-temporal Image and Video Analysis
  3.1 Spatio-temporal Information for Video Retrieval
  3.2 Spatial Information Modelling in Multimedia Retrieval
  3.3 Temporal Model
  3.4 Spatio-temporal Information Fusion
Chapter IV Video Spatio-temporal Analysis and Retrieval (VSTAR): A New Model
  4.1 VSTAR Model Components
  4.2 Spatial Image Analysis
  4.3 A Model for the Temporal Analysis of Image Sequences
  4.4 Video Representation, Indexing, and Retrieval Using VSTAR
  4.5 Conclusions
Chapter V Two Comparison Baseline Models for Video Retrieval
  5.1 Baseline Models
  5.2 Adjeroh et al. (1999) Sequence Matching--Video Retrieval Model
  5.3 Kim and Park (2002a) Data Set Matching--Video Retrieval Model
Chapter VI Spatio-temporal Video Retrieval--Experiments and Results
  6.1 Purpose of Experiments
  6.2 Data Description
  6.3 Spatial and Temporal Feature Extraction
  6.4 Video Retrieval Models: Procedure for Parameter Optimisation
  6.5 Video Retrieval Models: Results on Parameter Optimisation
  6.6 Comparison of Four Models
  6.7 Model Robustness (Noise)
  6.8 Computational Complexity
  6.9 Conclusions
Chapter VII Conclusions
  7.1 Reflections on the Book as a Whole
References
Excerpt
In Ioka and Kurokawa (1992), the user is allowed to specify a query by drawing a motion trajectory. The similarity is computed as the Euclidean distance between the query vector and the stored vector for each given interval, to match the specified trajectory with the trajectories of the sequences in the database.

3.3.2.2 Correlation Based Comparison

This approach is based on finding the maximum correlation between the predictor and the current one, for gesture recognition to identify actions. Martin and Shah (1992) used dense optical flow fields over a region, and computed correlation between different sequences for matching. In Campbell and Bobick's (1995) work on gesture recognition, the learning/training process is accomplished by fitting the unique curve of a gesture into the subset of the phase space with low-order polynomials.

Rui and Anandan (2000) addressed the problem of detecting action boundaries in a video sequence containing unfamiliar and arbitrary visual actions. Their approach was based on detecting temporal discontinuities of the spatial pattern of object region motion, which correspond to the temporal boundaries of the action. They represented frame-to-frame optical flow in terms of the coefficients calculated from all of the flow fields in a sequence, after principal components analysis to determine the most significant such flow fields. The temporal trajectories of those coefficients of the flow field are analysed to determine the locations of the action segment boundaries of video objects.
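The trajectory-matching step attributed to Ioka and Kurokawa (1992) above can be illustrated with a minimal sketch. The point sequences, clip names, and nearest-match helper are hypothetical, assuming equal-length trajectories sampled at corresponding intervals:

```python
# Hypothetical sketch of query-by-trajectory matching: a user-drawn motion
# trajectory is compared with stored trajectories by the Euclidean distance
# between corresponding points. Clip names and coordinates are invented.
import math

def trajectory_distance(query, stored):
    """Euclidean distance between two equal-length (x, y) point sequences."""
    return math.sqrt(sum((qx - sx) ** 2 + (qy - sy) ** 2
                         for (qx, qy), (sx, sy) in zip(query, stored)))

def best_match(query, database):
    """Name of the stored trajectory closest to the query."""
    return min(database, key=lambda name: trajectory_distance(query, database[name]))

query = [(0, 0), (1, 1), (2, 2)]              # user-drawn diagonal motion
database = {
    "clip_a": [(0, 0), (1, 0), (2, 0)],       # horizontal motion
    "clip_b": [(0, 0), (1, 1), (2, 3)],       # near-diagonal motion
}
print(best_match(query, database))            # → clip_b
```

The stored trajectory with the smallest per-interval distance is returned as the best match, mirroring the nearest-trajectory retrieval the excerpt describes.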
Editor's Recommendation
Spatio-temporal Video Retrieval (English Edition): part of the 学者书屋 (Scholar's Study) series.