Using computer vision to generate customized spatial audio
Title | Using computer vision to generate customized spatial audio |
Publication Type | Conference Papers |
Year of Publication | 2003 |
Authors | Mohan A, Duraiswami R, Zotkin DN, DeMenthon D, Davis LS |
Conference Name | Multimedia and Expo, IEEE International Conference on |
Date Published | 2003 |
Publisher | IEEE Computer Society |
Conference Location | Los Alamitos, CA, USA |
ISBN Number | 0-7803-7965-9 |
Abstract | Creating high-quality virtual spatial audio over headphones requires real-time head tracking, personalized head-related transfer functions (HRTFs), and customized room response models. Existing solutions rely on costly head trackers and on measured personalized HRTFs and room responses, so they are not suitable for widespread or easy deployment and use. We report on the development of a system that uses computer vision to produce customizable models for both the HRTF and the room response, and to perform head tracking. The system uses relatively inexpensive cameras and widely available personal computers. Computer-vision-based anthropometric measurements of the head, torso, and external ears are used for HRTF customization. For low-frequency HRTF customization we employ a recently developed simple head-and-torso model [V. R. Algazi et al., 2002]. For high-frequency customization we employ measured pinna characteristics as an index into a database of HRTFs [D. N. Zotkin et al., 2002]. For head tracking we employ an online implementation of the POSIT algorithm [D. DeMenthon and L. Davis, 1995] along with active markers to compute head pose in real time. The system provides an enhanced virtual listening experience at low cost. |
DOI | 10.1109/ICME.2003.1221247 |
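
The abstract describes using measured pinna characteristics as an index into a database of HRTFs. The record does not say how that lookup is done, so the following is only a minimal sketch of one plausible reading: a nearest-neighbour match on a small vector of ear measurements. The database contents, the subject identifiers, the choice of four measurements, and the relative-error distance are all assumptions made for illustration, not details taken from the paper.

```python
import math

# Hypothetical database: subject id -> pinna measurements in cm.
# Identifiers and values are invented for illustration only.
PINNA_DB = {
    "subject_a": (1.9, 0.8, 1.7, 6.4),
    "subject_b": (2.1, 0.9, 1.5, 6.0),
    "subject_c": (1.6, 0.7, 1.9, 6.9),
}


def closest_subject(measured, database=PINNA_DB):
    """Return the id of the database entry nearest to `measured`.

    Each difference is divided by the measured value, so large
    features (such as pinna height) do not dominate small ones.
    This distance is an assumption, not the paper's stated metric.
    """
    def distance(candidate):
        return math.sqrt(
            sum(((m - c) / m) ** 2 for m, c in zip(measured, candidate))
        )

    return min(database, key=lambda subject: distance(database[subject]))


# Measurements that would come from the vision front end (assumed values).
best = closest_subject((2.0, 0.9, 1.5, 6.1))
```

In a full system the returned identifier would select that subject's stored HRTF set for the high-frequency part of the response, while the low-frequency part would come from the head-and-torso model mentioned in the abstract.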