Posted on 2017-02-27, 06:02. Authored by Tang, Titus Jia Jie.
This thesis presents novel computer vision technologies and spatial audio user interfaces that the author has developed for use in electronic visual aids for vision impaired people. It focuses on the challenges of assisted indoor navigation and object localisation, both active and expanding areas of research with many unsolved problems. The presented research leverages 3D sensing technologies to develop 3D modelling algorithms and spatial sonification techniques that cumulatively result in the development and characterisation of a visual aid prototype for object localisation.
Planar surfaces are a common geometric feature of man-made environments, and detecting them is useful in applications such as mapping and scene understanding. This thesis presents a novel method that allows a computer system to automatically detect multiple planar surfaces. The proposed algorithm uses the RANSAC paradigm to fit plane models to the depth data from an RGB-D sensor. Plane modelling and fitting are performed in inverse depth coordinates, which simplifies the error modelling of the depth data from the RGB-D sensor.
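To make the inverse-depth formulation concrete, the sketch below fits a single plane to RGB-D pixels with a basic RANSAC loop. Under a pinhole camera model, a 3D plane satisfies a linear relation q = a·u + b·v + c between pixel coordinates (u, v) and inverse depth q = 1/Z, and residuals measured in inverse depth have an approximately constant noise level for structured-light RGB-D sensors. The function name, thresholds, and refinement step are illustrative assumptions rather than the thesis implementation; detecting multiple planes would repeat the loop after removing each plane's inliers.

```python
import numpy as np

def fit_plane_inverse_depth(u, v, depth, n_iters=200, inlier_tol=0.002, rng=None):
    """RANSAC sketch: fit one plane to RGB-D data in inverse-depth coordinates.

    A 3D plane seen through a pinhole camera obeys q = a*u + b*v + c,
    where q = 1/Z is the inverse depth of pixel (u, v). Fitting in
    (u, v, q) is convenient because inverse-depth noise is roughly
    constant with range. Thresholds here are illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    valid = depth > 0                       # ignore missing depth readings
    if valid.sum() < 3:
        return None, None
    u, v, q = u[valid], v[valid], 1.0 / depth[valid]
    A = np.column_stack([u, v, np.ones_like(u)])

    best_inliers, best_model = None, None
    for _ in range(n_iters):
        idx = rng.choice(len(q), size=3, replace=False)   # minimal sample: 3 points
        try:
            model = np.linalg.solve(A[idx], q[idx])       # plane parameters [a, b, c]
        except np.linalg.LinAlgError:
            continue                                      # degenerate (collinear) sample
        residuals = np.abs(A @ model - q)                 # error measured in inverse depth
        inliers = residuals < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, model

    # Refine the best model on all of its inliers with least squares.
    if best_inliers is not None and best_inliers.sum() >= 3:
        best_model, *_ = np.linalg.lstsq(A[best_inliers], q[best_inliers], rcond=None)
    return best_model, best_inliers
```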
To evaluate the performance of the multiple plane detection algorithm, it is first applied to egomotion estimation of a moving RGB-D sensor. The proposed egomotion estimation algorithm relies on re-detecting planar correspondences between consecutive image frames to estimate the movement of the sensor. Next, the plane detection algorithm is used to develop a novel staircase detector. The staircase detection algorithm iteratively detects the multiple planar surfaces of a staircase based on the assumption that a staircase consists of multiple evenly spaced steps.
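As a rough illustration of the even-spacing assumption (not the thesis algorithm), the sketch below takes a set of detected plane models and reports a staircase candidate when several near-parallel planes have offsets that form an approximately constant step height. The data layout and all thresholds are assumed for illustration.

```python
import numpy as np

def find_staircase(planes, angle_tol_deg=10.0, spacing_tol=0.05, min_steps=3):
    """Illustrative staircase check over detected planes.

    `planes` is assumed to be a list of (normal, offset) pairs describing
    plane equations n . X = d in metric coordinates. A staircase candidate
    is a group of near-parallel planes whose offsets are evenly spaced,
    mirroring the assumption of regular step geometry.
    """
    if not planes:
        return None
    normals = np.array([n / np.linalg.norm(n) for n, _ in planes])
    offsets = np.array([d for _, d in planes])

    # Group planes whose normals lie within angle_tol_deg of the first plane.
    ref = normals[0]
    dots = normals @ ref
    parallel = np.abs(dots) > np.cos(np.deg2rad(angle_tol_deg))
    # Align plane orientations with the reference normal so offsets are comparable.
    aligned_offsets = np.where(dots > 0, offsets, -offsets)
    group = np.sort(aligned_offsets[parallel])
    if len(group) < min_steps:
        return None

    # Even spacing: consecutive offset differences should be nearly constant.
    gaps = np.diff(group)
    if np.all(np.abs(gaps - gaps.mean()) < spacing_tol):
        return {"step_height": float(gaps.mean()), "num_steps": int(len(group))}
    return None
```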
The plane detection algorithm is then integrated with a spatial audio user interface to form an end-to-end visual aid prototype that helps vision impaired users localise objects at close range. The prototype uses plane detection to narrow the search space for objects in a scene, and the locations of detected objects are conveyed to the user via spatial audio cues.
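The sketch below illustrates one simple way an object's 3D position could be turned into a stereo cue: azimuth is encoded with constant-power panning and distance with loudness attenuation. The thesis prototype uses a spatial audio interface whose exact rendering is not reproduced here; this stand-in, with its hypothetical function name and parameters, only shows the general position-to-sound mapping.

```python
import numpy as np

def object_to_stereo_cue(position, fs=44100, duration=0.2, freq=880.0):
    """Minimal sketch: map a 3D object position to a stereo audio cue.

    `position` is (x, y, z) in metres in the sensor/head frame (x right,
    y up, z forward). Azimuth is encoded with left/right panning and
    distance with loudness attenuation; values are illustrative only.
    """
    x, y, z = position
    azimuth = np.arctan2(x, z)                 # radians, 0 = straight ahead
    distance = np.linalg.norm(position)

    # Constant-power pan: map azimuth in [-pi/2, pi/2] to a pan value in [0, 1].
    pan = 0.5 * (np.clip(azimuth, -np.pi / 2, np.pi / 2) / (np.pi / 2) + 1.0)
    left_gain = np.cos(pan * np.pi / 2)
    right_gain = np.sin(pan * np.pi / 2)
    loudness = 1.0 / max(float(distance), 0.5)  # closer objects sound louder

    t = np.arange(int(fs * duration)) / fs
    tone = np.sin(2 * np.pi * freq * t) * loudness
    return np.column_stack([tone * left_gain, tone * right_gain])  # (N, 2) stereo buffer
```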
Characterisation of the prototype is performed through quantitative user trials with both blindfolded, normally sighted individuals and vision impaired individuals. These trials quantify human-in-the-loop performance and reveal interesting trends in user bias and variability. They also provide an understanding of the similarities and differences in performance and preference between the two user groups.
Additional user trials are conducted with a simultaneous sonification strategy to study how such a communication strategy could improve the effectiveness and efficiency of existing spatial audio user interfaces. The strategy is based on the hypothesis that human listeners can pay selective attention to relevant audio streams in the presence of multiple sound sources. Trial results show that simultaneously sonifying multiple sound sources does not adversely impact user performance.
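Reusing the object_to_stereo_cue stand-in sketched earlier, a simultaneous sonification could mix one cue per object into a single stereo stream, giving each object a distinct pitch so the listener can attend to one stream at a time. The positions and frequencies below are hypothetical.

```python
import numpy as np

# Hypothetical object positions (metres) and one distinct pitch per object.
objects = [(-0.4, 0.0, 1.0), (0.1, 0.2, 1.5), (0.5, -0.1, 0.8)]
pitches = [440.0, 660.0, 880.0]

# Render each object as a spatialised cue and sum them into one stereo stream.
mix = sum(object_to_stereo_cue(pos, freq=f) for pos, f in zip(objects, pitches))
mix = mix / np.max(np.abs(mix))   # normalise to avoid clipping before playback
```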
Finally, the lessons learned from the user trials are applied to improve the spatial audio user interface. A new sonification strategy that spatially distorts the sonified locations of objects is introduced. User trials with the new strategy reveal a statistically significant improvement of about 24% on average compared with earlier trials.
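The abstract does not describe the specific distortion used, so the sketch below is purely a hypothetical example of the general idea: exaggerating an object's azimuth before sonification so that small angular offsets from the listener's midline become easier to resolve.

```python
import numpy as np

def distort_azimuth(position, gain=1.5):
    """Hypothetical spatial distortion of a sonified object location.

    Exaggerates the azimuth of `position` (x right, y up, z forward) by
    `gain` before rendering. This is an illustrative assumption, not the
    distortion described in the thesis.
    """
    x, y, z = position
    azimuth = np.arctan2(x, z)
    radius = np.hypot(x, z)
    new_az = np.clip(azimuth * gain, -np.pi / 2, np.pi / 2)  # stay in the frontal field
    return (radius * np.sin(new_az), y, radius * np.cos(new_az))
```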