October 2018

Multi-timescale Feature-extraction Architecture of Deep Neural Networks for Acoustic Model Training from Raw Speech Signal

  • R. Takeda, K. Nakadai, K. Komatani,
  • in Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018),
  • IEEE,
  • 2018,
  • pp. 2503-2510,
  • Conference paper

This paper describes a new architecture of deep neural networks (DNNs) for acoustic models. Training DNNs from raw speech signals will provide 1) novel features of signals, 2) processing free of normalization steps such as utterance-wise mean subtraction, and 3) low-latency speech recognition for robot audition. Exploiting the longer context of raw speech signals appears useful for improving recognition accuracy. However, naive use of longer contexts results in the loss of short-term patterns, and recognition accuracy therefore degrades. We propose a multi-timescale feature-extraction architecture of DNNs with blocks of different time scales, which enables capturing both long- and short-term patterns of speech signals. Each block consists of complex-valued networks that correspond to Fourier and filterbank transformations for analysis. Experiments showed that the proposed multi-timescale architecture reduced the word error rate by about 3% compared with architectures using only the long-term context. Analysis of the extracted features revealed that our architecture efficiently captured both the slow and fast changes of speech features.
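
The sketch below illustrates the general idea of multi-timescale feature extraction from raw speech, not the authors' actual implementation: it assumes PyTorch, replaces the paper's complex-valued Fourier/filterbank blocks with plain real-valued 1-D convolutions, and uses hypothetical filter counts and window lengths. Each branch analyses the waveform at a different time scale, and the branch outputs are concatenated so long- and short-term patterns are presented jointly to the acoustic model.

    # Minimal, illustrative multi-timescale front end (assumption: PyTorch;
    # real-valued convolutions stand in for the paper's complex-valued
    # Fourier/filterbank analysis blocks).
    import torch
    import torch.nn as nn

    class MultiTimescaleFrontEnd(nn.Module):
        def __init__(self, n_filters=40, kernel_sizes=(80, 400, 1600), hop=160):
            # kernel_sizes roughly correspond to 5 ms, 25 ms, and 100 ms
            # windows at 16 kHz (hypothetical values chosen for illustration).
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Sequential(
                    nn.Conv1d(1, n_filters, kernel_size=k, stride=hop,
                              padding=k // 2),
                    nn.ReLU(),
                )
                for k in kernel_sizes
            )

        def forward(self, wav):
            # wav: (batch, 1, samples) raw waveform
            feats = [branch(wav) for branch in self.branches]
            # Trim every branch to the shortest number of frames, then
            # concatenate along the feature axis.
            frames = min(f.shape[-1] for f in feats)
            return torch.cat([f[..., :frames] for f in feats], dim=1)

    if __name__ == "__main__":
        frontend = MultiTimescaleFrontEnd()
        x = torch.randn(2, 1, 16000)   # two 1-second utterances at 16 kHz
        print(frontend(x).shape)       # torch.Size([2, 120, 101])

A downstream acoustic-model network would take the concatenated frame sequence as input in place of hand-crafted filterbank features; the per-branch window lengths are the knob that trades short-term detail against long-term context.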
