The VideoRec'07 workshop will be held as a Special
Session of the CRV'07 conference
on the second day of the conference, Tuesday, 13:00 - 18:30.
Tuesday 15:30 - 18:30 (Maisonneuve - 36th Floor)
Abstract: This paper reports on the implementation of a GPU-based,
real-time eye blink detector on very low contrast images
acquired under near-infrared illumination. This detector
is part of a multi-sensor data acquisition and analysis
system for driver performance assessment and training.
Vision-Based Motorcycle Detection and Tracking System with Occlusion
Chung-Cheng Chiu, Min-Yu Ku, Hung-Tsung Chen
Real-time commercial recognition Using Color Moments and Hashing
Abhishek Shivadas, John M. Gauch
Constructing Face Image Logs that are Both Complete and Concise
Adam Fourney and Robert Laganière
Abstract: This paper describes a construct that we call a face image
log. Face image logs are collections of time stamped images
representing faces detected in surveillance videos. The techniques
demonstrated in this paper strive to construct face image logs that
are complete and concise in the sense that the logs contain only the
best images available for each individual observed. We begin by
describing how to assess and compare the quality of face images. We
then illustrate a robust method for selecting high quality images.
This selection process takes into consideration the limitations
inherent in existing face detection and person tracking techniques.
Experimental results demonstrate that face logs constructed in this
manner generally contain fewer than 5% of all detected faces, yet
these faces are of high quality, and they represent all individuals
detected in the video sequence.
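The selection step can be illustrated with a minimal sketch. It assumes each detected face already carries a person track ID and a scalar quality score; both are hypothetical stand-ins, since the abstract does not say how quality is computed or how tracks are formed. The function `build_face_log` and its parameter `k` are illustrative names, not taken from the paper.

```python
import heapq
from collections import defaultdict

def build_face_log(detections, k=1):
    """Keep the k highest-quality face images for each tracked person.

    detections: iterable of (track_id, timestamp, quality) tuples.
    The quality value is a hypothetical stand-in for the paper's
    face-quality assessment, which is not reproduced here.
    """
    by_person = defaultdict(list)
    for track_id, timestamp, quality in detections:
        by_person[track_id].append((quality, timestamp))

    log = {}
    for track_id, faces in by_person.items():
        best = heapq.nlargest(k, faces)  # highest quality first
        log[track_id] = [(timestamp, quality) for quality, timestamp in best]
    return log

# Made-up detections: five faces from two individuals.
detections = [
    ("person_a", 0.0, 0.41), ("person_a", 0.5, 0.87), ("person_a", 1.0, 0.63),
    ("person_b", 2.0, 0.72), ("person_b", 2.5, 0.55),
]
face_log = build_face_log(detections, k=1)
```

With `k=1` the log keeps one image per person, so every individual is represented while most detections are discarded.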
Real-time eye blink detection with GPU-based SIFT tracking
Marc Lalonde, David Byrns, Langis Gagnon, Normand Teasdale, Denis Laurendeau
Eye blinks are detected inside regions of interest that are
aligned with the subject’s eyes at initialization. Alignment
is maintained through time by tracking SIFT feature points that are used to estimate the affine transformation between
the initial face pose and the pose in subsequent frames. The
GPU implementation of the SIFT feature point extraction
algorithm ensures real-time processing. An eye blink detection
rate of 97% is obtained on a video dataset of 33,000
frames showing 237 blinks from 22 subjects.
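The alignment step, estimating an affine transformation from tracked feature points, can be sketched as an ordinary least-squares fit. This is an illustration under stated assumptions, not the authors' GPU implementation: the point correspondences are made up, the helper names are hypothetical, and SIFT extraction and matching are not shown.

```python
def solve3(m, v):
    """Solve the 3x3 system m * p = v by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("points are collinear; affine map is undetermined")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def estimate_affine(src, dst):
    """Least-squares affine map taking src points to dst points.

    Returns ((a, b, c), (d, e, f)) with x' = a*x + b*y + c and
    y' = d*x + e*y + f. Needs at least three non-collinear points.
    """
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations: (R^T R) p = R^T target, solved once per output axis.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    vx = [sum(r[i] * xd for r, (xd, _) in zip(rows, dst)) for i in range(3)]
    vy = [sum(r[i] * yd for r, (_, yd) in zip(rows, dst)) for i in range(3)]
    return tuple(solve3(m, vx)), tuple(solve3(m, vy))

def apply_affine(t, point):
    """Map a single (x, y) point through the affine transform t."""
    (a, b, c), (d, e, f) = t
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)

# Made-up example: feature points at initialization and in a later frame.
true_map = ((0.9, -0.1, 5.0), (0.1, 0.9, -3.0))
initial_pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
current_pts = [apply_affine(true_map, p) for p in initial_pts]

estimated = estimate_affine(initial_pts, current_pts)
# Move one corner of the initial eye region of interest into the new frame.
eye_roi_corner = apply_affine(estimated, (4.0, 6.0))
```

In practice some feature matches are wrong, so a robust fit (for example with outlier rejection) would usually replace the plain least-squares solve shown here.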
A Robust Video Foreground Segmentation by Using Generalized Gaussian
Mohand Saïd Allili, Nizar Bouguila and Djemel Ziou
A Framework for 3D Hand Tracking and Gesture Recognition
Ayman El-Sawah, Chris Joslin, Nicolas D. Georganas, Emil M. Petriu
Abstract: In this paper we present a framework for 3D hand
tracking and dynamic gesture recognition using a single camera. Hand
tracking is performed in a two-step process: we first generate 3D hand
posture hypotheses using geometric and kinematic inverse
transformations, and then validate the hypotheses by projecting the
postures on the image plane and comparing the projected model with the
ground truth using a probabilistic observation model. Dynamic gesture
recognition is performed using a Dynamic Bayesian Network model. The
framework utilizes elements of soft computing to resolve the ambiguity
inherent in vision-based tracking by producing a fuzzy hand posture
output from the hand tracking module and feeding back potential posture
hypotheses from the gesture recognition module.
Adaptive Appearance Model for Object Contour Tracking in Videos
Mohand Saïd Allili and Djemel Ziou
Automatic Annotation of Humans in Surveillance Video
T.B. Moeslund, D.M. Hansen, P.Y. Duizer
Abstract: In this paper we present a system for automatic annotation of humans
passing a surveillance camera. Each human has 4 associated
annotations: the primary color of the clothing, the height, and focus
of attention. The annotation occurs after robust background
subtraction based on a Codebook representation. The primary colors of
the clothing are estimated by grouping similar pixels according to a
body model. The height is estimated based on a 3D mapping using the
head and feet. Lastly, the focus of attention is defined as the
overall direction of the head, which is estimated using changes in
intensity at four different positions. Results show successful
detection and hence successful annotation for most test sequences.
[Video] [Head direction dataset]
Registration of IR and Video Sequences Based on Frame Difference
Zheng Liu and Robert Laganière
Abstract: Multi-modal imaging sensors have been employed in advanced
surveillance systems in recent years. The performance of
surveillance systems can be enhanced by using information beyond the
visible spectrum, for example, infrared imaging. To ensure correctness
of low- or high-level processing, multi-modal imagers must be fully
calibrated or registered. In this paper, an algorithm is proposed to
register the video sequences acquired by an infrared and an
electro-optical (CCD) camera. The registration method is based on the
silhouette extracted by differencing adjacent frames. This difference
is found by an image structural similarity measurement. Initial
registration is implemented by tracing the top head points in
consecutive frames. Finally, an optimization procedure to maximize
mutual information is employed to refine the registration results.
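The quantity maximized in the refinement step can be sketched as follows. This computes only the mutual information score between two images from their joint intensity histogram. The bin count, the flat-list image representation, and the function name are assumptions for illustration, and the optimization over registration parameters is not shown.

```python
import math
from collections import Counter

def mutual_information(img_a, img_b, bins=8, levels=256):
    """Mutual information, in bits, between two equally sized grayscale images.

    Images are flat sequences of integer intensities in [0, levels).
    Intensities are quantized into `bins` histogram bins first.
    """
    if len(img_a) != len(img_b) or len(img_a) == 0:
        raise ValueError("images must be non-empty and the same size")
    width = levels / bins
    qa = [min(int(p / width), bins - 1) for p in img_a]
    qb = [min(int(p / width), bins - 1) for p in img_b]

    n = len(qa)
    joint = Counter(zip(qa, qb))   # joint histogram
    hist_a = Counter(qa)           # marginal histograms
    hist_b = Counter(qb)

    mi = 0.0
    for (i, j), count in joint.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) )
        mi += (count / n) * math.log2(count * n / (hist_a[i] * hist_b[j]))
    return mi

# Tiny made-up images: perfectly aligned versus statistically unrelated.
aligned = mutual_information([0, 0, 255, 255], [0, 0, 255, 255])
unrelated = mutual_information([0, 0, 255, 255], [0, 255, 0, 255])
```

A registration refinement would evaluate this score for candidate transformations of one image and keep the transformation that gives the highest value.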
Posters & Demos
Held together with CRV'07 Poster Session 2
Tuesday 13:00 - 15:00 (Maisonneuve - 36th Floor)
Detecting, Tracking and Classifying Animals in Underwater Observatory Video
Duane R. Edgington, Danelle E. Cline, Jerome Mariette, Ishbel Kerkez
Abstract: For oceanographic research, remotely operated underwater vehicles
(ROVs) and underwater observatories routinely record several hours of video material every day. Manual processing of such large amounts of video has become a major bottleneck for scientific research based on this data. We have developed an automated system that detects, tracks, and classifies objects that are of potential interest for human video annotators. By pre-selecting salient targets for track initiation using a selective attention algorithm, we reduce the complexity of multi-target tracking. Then, if an object is tracked for several frames, a visual event is created and passed to a Bayesian classifier utilizing a Gaussian mixture model to determine the object class of the detected event.
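The final classification step can be illustrated with a minimal sketch of a Bayesian classifier whose class-conditional likelihoods are Gaussian mixtures. It assumes a single scalar feature per event. The class names, mixture parameters, and priors below are made up for illustration and are not taken from the paper.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gmm_likelihood(x, components):
    """Likelihood of x under a mixture; components are (weight, mean, std)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)

def classify(x, class_models, priors):
    """Return (most probable class, posteriors) using Bayes' rule."""
    scores = {c: priors[c] * gmm_likelihood(x, comps)
              for c, comps in class_models.items()}
    total = sum(scores.values())  # assumed non-zero for inputs near the data
    posteriors = {c: s / total for c, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors

# Hypothetical models over one scalar feature of a tracked visual event.
class_models = {
    "class_a": [(0.5, 10.0, 2.0), (0.5, 20.0, 3.0)],
    "class_b": [(1.0, 50.0, 5.0)],
}
priors = {"class_a": 0.5, "class_b": 0.5}

label_low, post_low = classify(12.0, class_models, priors)
label_high, post_high = classify(48.0, class_models, priors)
```

A real system would use multi-dimensional features and fit the mixture parameters from labeled training events.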
GET-based map icon Identification for Interaction with Map and Kiosks
Huiqiong Chen, Derek Reilly
Abstract: This paper presents a GET (Generic Edge Token) based
approach to detecting and recognizing objects by their
shapes, and applies it to our ongoing work on
interacting with paper maps using a handheld device.
In our work, the GET-based technique aims to
help users better locate points of interest on a map by
recognizing map icons in images and videos captured by
a handheld camera. In this method, video or image content is
described using a set of perceptual shape features called GETs.
Perceptible objects are extracted from a GET map
and then compared against pre-defined icon models
based on GET shape features. This method
provides a simple, efficient way to locate points of interest
on the map and, when combined with an RFID (sensor-based)
technique, to determine the handheld's location and orientation.
Tests show that GET-based object identification
runs in reasonable time for a real-time interaction
system. Detection and recognition are also
robust to changes in lighting conditions, camera focus,
camera rotation, and distance from the map.
Face Recognition in Video Using Modular ARTMAP Neural Networks
M. Barry and E. Granger
Abstract: In video-based face recognition applications, the What-and-Where Fusion Neural Network (WWFNN) has been shown to reduce the generalization error by accumulating a classifier’s predictions over time, according to each individual in the environment. In this paper, three ARTMAP variants – fuzzy ARTMAP, ART-EMAP (Stage 1) and ARTMAP-IC – are compared for the classification of faces detected in the WWFNN. ART-EMAP (Stage 1) and ARTMAP-IC expand on the well-known fuzzy ARTMAP by using distributed activation of category neurons, and by biasing distributed predictions according to the number of times these neurons are activated by training set patterns. The average performance of the WWFNNs with each ARTMAP network is compared to the WWFNN with a reference k-NN classifier in terms of generalization error, convergence time and compression, using a data set of real-world video sequences. Simulation results indicate that when ARTMAP-IC is used inside the WWFNN, it can yield a generalization error that is significantly higher (about 20% on average) than if fuzzy ARTMAP or ART-EMAP is used. Indeed, ARTMAP-IC is less discriminant than the two other ARTMAP networks in cases with complex decision boundaries, when the training data is limited and unbalanced, as found in complex video data. However, ARTMAP-IC can outperform the others when classes are designed with a larger number of training patterns.
A Simple Inter- and Intrascale Statistical Model for Video Denoising in
3-D Complex Wavelet Domain Using a Local Laplace Distribution
Hossein Rabbani, Mansur Vafadust, Saeed Gazor
Abstract: This paper presents a new video denoising algorithm
based on modeling the wavelet coefficients in each
subband with a Laplacian probability density function (pdf)
with local variance. Since the Laplacian pdf is
leptokurtic, it is able to model the sparsity of wavelet
coefficients. We estimate the local variance of this pdf
from adjacent coefficients at the same scale and at the parent
scale. This local variance models the interscale dependency
between adjacent scales and the intrascale dependency
between spatially adjacent coefficients. Within this framework, we
design a maximum a posteriori (MAP) estimator for
video denoising, which relies on the proposed local pdf.
Because separable 3-D transforms, such as ordinary 3-D
wavelet transforms, suffer from artifacts that degrade
their performance, we implement our
algorithm in the 3-D complex wavelet transform domain. This nonseparable
and oriented transform gives a motion-based
multiscale decomposition for video that isolates
motion along different directions in its subbands. In addition,
we apply our denoising algorithm in the 2-D complex wavelet
transform domain, where the 2-D transform is applied to each
frame individually. Despite its simple implementation, our method
achieves better performance than
several denoising methods both visually and in terms of
peak signal-to-noise ratio (PSNR).
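The kind of estimator described here can be sketched for a single coefficient. The sketch assumes additive Gaussian noise of known standard deviation and a zero-mean Laplacian prior; under those assumptions the MAP estimate reduces to the well-known soft-threshold rule with threshold sqrt(2) * noise_sigma^2 / sigma. The neighborhood values and function names are made up, and the paper's exact estimator and neighborhood definition may differ.

```python
import math

def local_sigma(neighbors, noise_sigma):
    """Estimate the signal standard deviation from noisy neighboring coefficients.

    `neighbors` stands in for coefficients at the same scale plus the
    parent coefficient; the exact neighborhood is an assumption here.
    """
    mean_sq = sum(v * v for v in neighbors) / len(neighbors)
    return math.sqrt(max(mean_sq - noise_sigma ** 2, 0.0))

def map_laplace_shrink(y, sigma, noise_sigma):
    """MAP estimate of a Laplacian-distributed coefficient observed in Gaussian noise.

    Soft thresholding: sign(y) * max(|y| - sqrt(2) * noise_sigma^2 / sigma, 0).
    """
    if sigma <= 0.0:
        return 0.0  # no signal energy detected locally
    threshold = math.sqrt(2.0) * noise_sigma ** 2 / sigma
    return math.copysign(max(abs(y) - threshold, 0.0), y)

# Made-up noisy coefficients around the one being denoised.
noise_sigma = 1.0
sigma = local_sigma([3.0, -4.0, 5.0, 4.0], noise_sigma)
large = map_laplace_shrink(5.0, sigma, noise_sigma)   # shrunk slightly
small = map_laplace_shrink(0.2, sigma, noise_sigma)   # set to zero
```

Because the threshold depends on the locally estimated variance, coefficients in high-energy regions are shrunk less than coefficients in flat regions.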
Automatic extraction of semantic object in image using local
Chee Sun Won
Abstract: This paper deals with the problem of segmenting
a semantic object in an image. A fully automatic solution
to this problem is not possible; human intervention
is needed to outline the rough boundary of the
semantic object to be segmented. Our goal is to make
the object extraction automatic after this first semiautomatic
segmentation. To achieve this goal, we
manipulate the contrast of the object and the background so that any contrast-based object
segmentation method can extract the object automatically.
Zoom on the evidence with
ACE Surveillance (+demo)
Dmitry O. Gorodnichy, Mohammad A. Ali, Elan Dubrofsky, Kris Woodbeck
Abstract: Despite the population's growing awareness of the need to use surveillance systems for better security in private and business settings, such systems still have not become commonplace.
The main reason for this is the amount of time and resources an average user has to dedicate in order to
collect video data and then to dig through it searching for evidence when using traditional
DVR-based surveillance systems. Here we present ACE-Surveillance, an automated surveillance technology based on real-time Annotation of Critical
Evidence that provides an efficient and low-cost solution to the problem.
We describe the main features of this technology as related to its two components: ACE-Capture and ACE-Browser.
The first component deals with detection and archival of annotated evidence, which is normally performed on a client's desktop
computer. The latter deals with browsing and displaying archived video evidence and can be performed either locally on the client's computer or remotely via a dedicated server.
A new Zoom-on-the-Evidence browsing technique featured in ACE Surveillance is introduced.
Demonstrations of running the technology on several real-life long-term monitoring assignments,
including its deployment by the NRC commissionaires, are shown.
[Paper] [Poster] [Link]
Working with computer hands-free using Nouse
Perceptual Vision Interface (+demo)
Dmitry O. Gorodnichy, Elan Dubrofsky, Mohammad A. Ali
Abstract: Normal work with a computer implies being able to perform the following three
computer control tasks: 1) pointing, 2) clicking, and 3) typing. Many attempts have been made to make it possible to perform these tasks hands-free using a video image of the user as input.
Nevertheless, as reported by assistive technology practitioners, no real vision-based hands-free computer control solution
that can be used by disabled users has been produced as of yet. Here we present the Nouse Perceptual Vision Interface
(Nouse-PVI), which we hope will finally offer such a solution. Evolved from the original Nouse "Nose as Mouse" concept and
currently under testing with SCO Health Service, Nouse-PVI has several unique features that make it
preferable to other hands-free computer input alternatives.
These include i) using the nose tip, which has been confirmed to be very convenient for disabled users, for
whom the nose literally becomes a new "finger" that can be used to
control the computer; ii) a feedback-providing mechanism called
Perceptual Cursor (Nousor) that creates an invisible link between the computer and user and which is very important for control as it allows the user to adjust his/her head motion so that the computer can better interpret
it, and iii) a number of design solutions specifically tailored for vision-based data entry using small range head motion, such as
motion-based code entry (NouseCode and NouseType), a motion-based virtual keyboard (NouseBoard and NousePad) and a word-by-word letter drawing tool
(NouseChalk). While demonstrating these innovative tools, we also address the issue of the
user's ability and readiness to work with a computer in a brand new way,
i.e. hands-free. The problem is that a user has to understand that it is not entirely the responsibility of
the computer to understand what one wants, but it's also the responsibility of the user.
Just as a conventional computer user cannot move the cursor on the screen without first putting
the hand on the mouse, so a perceptual interface user cannot work with a computer until
s/he "connects" to it.
That is, the computer and user must work as a team for the best control results to be achieved.
This presentation is therefore designed to serve both as a guide to those developing vision-based input devices and as a tutorial for those who will be using them.