HRI-JP Honda Research Institute Japan (HRI-JP) – Research and development of advanced technologies

October 2010

Multi-talker Speech Recognition under Ego-motion Noise using Missing Feature Theory

  • article   (1.64MB)
    Copyright (C) IEEE, 2010. The copyright of this material is retained by IEEE. This material is published on this web site by permission of IEEE for your personal use. Not for redistribution.
  • Ince_2010_634   (1.89KB)
  • G. Ince, K. Nakadai, T. Rodemann, H. Tsujino, J. Imura
  • in Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010), 2010, pp. 982-987
  • Conference paper

This paper presents a system that gives a mobile robot the ability to recognize a target speaker's speech even while the robot performs an action and multiple speakers are talking in the room. The problems associated with this system are twofold: (1) While the robot is moving, its joints inevitably generate ego-motion noise due to their motors. (2) Recognizing target speech against other interfering speech signals is a difficult task. Since the typical solutions to (1) and (2), motor noise suppression and sound source separation, both introduce distortion into the processed signals, the performance of automatic speech recognition (ASR) deteriorates. Instead of removing the ego-motion noise with conventional noise suppression methods, in this work we investigate methods to eliminate the unreliable parts of the audio features that are contaminated by the ego-motion noise. For this purpose, we model masks that filter unreliable speech features based on the ratio of speech and motor noise energies. We analyze the performance of the proposed technique under various test conditions by comparing it to the performance of existing Missing Feature Theory-based ASR implementations. Finally, we propose an integration framework for two different masks that are designed to eliminate ego noise and to filter the leakage energy of interfering sound sources. We demonstrate that the proposed methods achieve a high ASR accuracy.
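
The masking idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `snr_mask` and `combine_masks`, the 0 dB threshold, the hard (binary) mask, and the element-wise minimum used to integrate the two masks are all assumptions chosen for illustration. The abstract states only that masks are based on the ratio of speech and motor noise energies and that two masks are integrated.

```python
import numpy as np

def snr_mask(speech_energy, noise_energy, threshold_db=0.0, eps=1e-12):
    """Hypothetical hard mask over a (frames x bands) feature grid.

    A feature is marked reliable (1.0) when the estimated ratio of speech
    energy to motor-noise energy exceeds threshold_db, else unreliable (0.0).
    The threshold value is an assumption, not taken from the paper.
    """
    speech_energy = np.asarray(speech_energy, dtype=float)
    noise_energy = np.asarray(noise_energy, dtype=float)
    # eps guards against division by zero and log of zero.
    snr_db = 10.0 * np.log10((speech_energy + eps) / (noise_energy + eps))
    return (snr_db > threshold_db).astype(float)

def combine_masks(ego_noise_mask, leakage_mask):
    """Assumed integration rule: a feature is kept only if both masks keep it."""
    return np.minimum(ego_noise_mask, leakage_mask)

# Toy energies for 2 frames x 2 frequency bands (made-up numbers).
speech = [[4.0, 1.0], [0.5, 9.0]]
motor_noise = [[1.0, 2.0], [1.0, 1.0]]
ego_mask = snr_mask(speech, motor_noise)

# A second, made-up mask standing in for leakage from interfering talkers.
leak_mask = np.array([[1.0, 1.0], [0.0, 0.0]])
final_mask = combine_masks(ego_mask, leak_mask)
```

In a Missing Feature Theory-based recognizer, a mask of this kind would tell the decoder which spectral features to trust. A soft mask with values between 0 and 1 is a common alternative to the binary form used here.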
