The Video Yardstick
Title | The Video Yardstick |
Publication Type | Book Chapters |
Year of Publication | 1998 |
Authors | Brodský T, Fermüller C, Aloimonos Y |
Editor | Magnenat-Thalmann N, Thalmann D |
Book Title | Modelling and Motion Capture Techniques for Virtual Environments |
Series Title | Lecture Notes in Computer Science |
Volume | 1537 |
Pagination | 144–158 |
Publisher | Springer Berlin / Heidelberg |
ISBN Number | 978-3-540-65353-0 |
Abstract | Given uncalibrated video sequences, how can we recover rich descriptions of the scene content, beyond two-dimensional (2D) measurements such as color/texture or motion fields — descriptions of shape and three-dimensional (3D) motion? This is the well-known structure from motion (SFM) problem. Until now, SFM algorithms have proceeded in two well-defined steps: the first and most important step recovers the rigid transformation between two views, and the subsequent step uses this transformation to compute the structure of the scene in view. This paper introduces a novel approach to structure from motion in which both steps are accomplished in a synergistic manner. It addresses the classical structure from motion problem for a calibrated camera as well as the extension to an uncalibrated optical device. Existing approaches to estimating the viewing geometry are mostly based on optic flow, which, however, poses a problem at the locations of depth discontinuities. If we knew where the depth discontinuities were, we could (using any of a multitude of approaches based on smoothness constraints) accurately estimate flow values for image patches corresponding to smooth scene patches; but knowing the discontinuities requires solving the structure from motion problem first. In the past this dilemma has been addressed by improving the estimation of flow through sophisticated optimization techniques, whose performance often depends on the scene in view. In this paper we follow a different approach. We directly utilize the image derivatives and employ constraints that involve the 3D motion and shape of the scene, leading to a geometric and statistical estimation problem. The main idea rests on the interaction between 3D motion and shape, which allows us to estimate the 3D motion while at the same time segmenting the scene. If we use a wrong 3D motion estimate to compute depth, we obtain a distorted version of the depth function. The distortion, however, is such that the worse the motion estimate, the more likely we are to obtain depth estimates that are locally unsmooth, i.e., that vary more than the correct ones. Since local variability of depth is due either to the existence of a discontinuity or to a wrong 3D motion estimate, being able to differentiate between these two cases provides the correct motion, which yields the “smoothest” estimated depth as well as the image locations of scene discontinuities. We analyze the new constraints introduced by our approach and show their relationship to the minimization of the epipolar constraint, which becomes a special case of our theory. Finally, we present a number of experimental results on real image sequences that indicate the robustness of our method and its improvement over traditional methods. The resulting system is a video yardstick that can be applied to any video sequence to recover first the calibration parameters of the camera that captured the video and, subsequently, the structure of the scene. (A minimal illustrative sketch of the smoothest-depth criterion appears after this record.) |
URL | http://dx.doi.org/10.1007/3-540-49384-0_12 |
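To make the abstract's central idea concrete, the following is a minimal, illustrative Python sketch of the smoothest-depth criterion: for each candidate 3D motion, compute the depth map it implies and score how much that depth varies locally; the correct motion is the one whose implied depth is smoothest. This is a simplified stand-in, not the authors' algorithm: the chapter works directly from image derivatives, while this sketch assumes a dense optic-flow field (u, v) on a calibrated image grid (x, y) with unit focal length, the standard Longuet-Higgins/Prazdny instantaneous motion model, and a finite candidate set of motions. All function names here (derotated_flow, inverse_depth, unsmoothness, best_motion) are hypothetical.

```python
import numpy as np

# Illustrative sketch only -- not the chapter's algorithm. Assumes dense
# optic flow under the Longuet-Higgins/Prazdny model with unit focal
# length; the chapter itself works directly from image derivatives.

def derotated_flow(u, v, x, y, omega):
    """Remove the rotational flow predicted by a candidate rotation
    omega = (wx, wy, wz); what remains should be purely translational."""
    wx, wy, wz = omega
    u_rot = x * y * wx - (1.0 + x * x) * wy + y * wz
    v_rot = (1.0 + y * y) * wx - x * y * wy - x * wz
    return u - u_rot, v - v_rot

def inverse_depth(u_tr, v_tr, x, y, t):
    """Least-squares inverse depth 1/Z per pixel from the translational
    flow, for a candidate translation t = (U, V, W): the model predicts
    u_tr = (x*W - U)/Z and v_tr = (y*W - V)/Z."""
    U, V, W = t
    px = x * W - U
    py = y * W - V
    return (u_tr * px + v_tr * py) / (px * px + py * py + 1e-12)

def unsmoothness(inv_z, patch=5):
    """Sum of local variances of inverse depth over small patches: a wrong
    motion estimate distorts depth and inflates this score."""
    h, w = inv_z.shape
    return sum(inv_z[i:i + patch, j:j + patch].var()
               for i in range(0, h - patch + 1, patch)
               for j in range(0, w - patch + 1, patch))

def best_motion(u, v, x, y, candidates):
    """Pick the (omega, t) candidate whose implied depth map is smoothest."""
    def cost(motion):
        omega, t = motion
        u_tr, v_tr = derotated_flow(u, v, x, y, omega)
        return unsmoothness(inverse_depth(u_tr, v_tr, x, y, t))
    return min(candidates, key=cost)
```

In a fuller implementation the candidate set would be a discretized search over translation directions on the unit sphere (depth is recoverable only up to a global scale, so the magnitude of t is unobservable) together with small rotations, and patches straddling depth discontinuities would be distinguished from motion-induced unsmoothness rather than averaged over — which is, per the abstract, how the method simultaneously segments the scene.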