The research presented in this thesis is motivated by the growing demand for elderly care. A domestic assistive robot has the potential to supplement human caregivers by assisting the elderly with simple daily tasks, such as retrieving small objects from various places, switching lights on and off, and opening and closing doors. The proposed assistive robot possesses both transactional and spatial intelligence. This thesis concentrates on the realization of the transactional intelligence, which enables the robot to interact naturally and effectively with human users. The ultimate goal of this research is to develop a system that allows the robot to perceive the multiple modalities humans use during face-to-face communication, including speech, eye gaze and gestures, so that it can understand the user’s intention and respond appropriately.
The key features of the system’s design and implementation are as follows.
1. Naturalness and effectiveness are the fundamental principles in the design of the interaction interface. Accordingly, only cameras, which are non-contact sensing devices, are used.
2. The user is observed solely from the robot’s viewpoint, so that the interaction can take place anywhere rather than being confined to a particular room.
3. The behavioural differences between individuals are emphasized, enabling the robot to respond appropriately to different users. This is achieved by a user identification method together with a profile built for each user, which stores that user’s characteristics.
4. The proposed hand gesture recognition system recognizes both dynamic motion patterns and static hand postures. The 3D Particle Filter-based hand tracking approach combines colour, motion and depth information, and robustly tracks the hands even when the person wears a short-sleeved shirt that exposes the forearm.
5. The proposed multimodal interaction system aligns and then combines the different sources of information conveyed by speech, eye gaze and gestures. The approach takes into account that each sub-system may produce incomplete or erroneous results.
6. Mutual interaction is realised by a dialogue manager. Based on the perceived information, the robot decides either to perform the requested task or to negotiate with the user when the command is ambiguous or infeasible.
7. An initial attempt is also made to give the robot, as a social companion, the ability to infer the user’s emotional state.
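The cue-fusion idea behind the hand tracker in item 4 can be illustrated with a minimal particle filter sketch. This is not the thesis’s implementation: for brevity the state here is a 2D image position rather than a 3D hand pose, the cue maps (`colour_map`, `motion_map`, `depth_map`) are hypothetical per-pixel probability images, and the noise and resampling parameters are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def cue_likelihood(particles, colour_map, motion_map, depth_map):
    """Score each particle by multiplying the three cue probabilities
    at its image position, so a hypothesis must agree with all cues."""
    xs = np.clip(particles[:, 0].astype(int), 0, colour_map.shape[1] - 1)
    ys = np.clip(particles[:, 1].astype(int), 0, colour_map.shape[0] - 1)
    # Small floor keeps weights non-zero when all cues miss.
    return colour_map[ys, xs] * motion_map[ys, xs] * depth_map[ys, xs] + 1e-12

def particle_filter_step(particles, weights, cue_maps, noise=3.0):
    # 1. Predict: diffuse particles with Gaussian motion noise.
    particles = particles + rng.normal(0.0, noise, particles.shape)
    # 2. Update: reweight by the fused colour/motion/depth likelihood.
    weights = weights * cue_likelihood(particles, *cue_maps)
    weights = weights / weights.sum()
    # 3. Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    # Estimate: weighted mean position of the tracked hand.
    estimate = np.average(particles, axis=0, weights=weights)
    return particles, weights, estimate
```

Multiplicative fusion is one simple choice: a particle scores highly only where skin colour, motion and plausible depth coincide, which is what lets a tracker ignore an exposed forearm that matches colour but not the other cues.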
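The tolerance to incomplete or erroneous sub-system output described in item 5 can likewise be sketched with a toy late-fusion combiner. The `Hypothesis` class, the time window, and confidence summing are all hypothetical simplifications, not the alignment method developed in the thesis.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    modality: str      # "speech", "gaze" or "gesture"
    referent: str      # candidate object the user may be referring to
    confidence: float  # in [0, 1]; may be low or simply wrong
    t: float           # timestamp in seconds

def fuse(hypotheses, window=1.5):
    """Align hypotheses that fall within a time window of the latest
    observation and sum confidences per referent. A missing modality
    contributes nothing; an erroneous one is outvoted, not fatal."""
    if not hypotheses:
        return None
    anchor = max(h.t for h in hypotheses)
    scores = {}
    for h in hypotheses:
        if anchor - h.t <= window:
            scores[h.referent] = scores.get(h.referent, 0.0) + h.confidence
    return max(scores, key=scores.get)
```

For example, a confident speech hypothesis for "cup" plus a weak gaze hypothesis for "cup" would outweigh a single gesture hypothesis for "book", and fusion still works when only one modality is observed at all.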
The technical contributions of this thesis have been validated with a series of experiments in typical indoor environments.