Current projects in HRI-JP
HRI-JP Principal Researcher Mikio Nakano
Our research is about “developing models for controlling speech and action of robots.” We study robots with built-in spoken-dialogue functionality that can understand, learn, and act on spoken language by autonomously estimating the current situation and context. In other words, these robots act because they understand human speech, even when the situation changes and multiple tasks are required.
You have probably seen robots talking with humans on TV. One railway company operates an automated phone service that talks with customers and issues express train tickets. In reality, however, these systems execute given tasks with fixed objectives under fixed conditions.
For robots to take various actions based on what people say, they must be preprogrammed with many contingent behaviors that can be invoked depending on the conversation and various conditions. Multidomain conversation functionality is a spoken-dialogue capability in which a number of these contingent behaviors, each built as its own system, are loaded onto a single robot, and the robot flexibly chooses among them.
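The idea of loading several domain-specific behaviors onto one robot and choosing among them can be sketched as follows. This is a minimal illustration under simple assumptions: the domain names, the keyword-based scoring, and the class names are all hypothetical, not the actual HRI-JP architecture.

```python
class DomainExpert:
    """One self-contained conversational domain with its own trigger keywords."""
    def __init__(self, name, keywords):
        self.name = name
        self.keywords = keywords

    def score(self, utterance):
        # Confidence that this domain should handle the utterance:
        # here, simply the fraction of the domain's keywords that appear.
        words = utterance.lower().split()
        hits = sum(1 for k in self.keywords if k in words)
        return hits / len(self.keywords)

class MultiDomainManager:
    """Holds several domain experts and routes each utterance
    to the highest-scoring one."""
    def __init__(self, experts):
        self.experts = experts

    def select(self, utterance):
        return max(self.experts, key=lambda e: e.score(utterance))

experts = [
    DomainExpert("weather", ["weather", "rain", "sunny"]),
    DomainExpert("navigation", ["go", "room", "meeting"]),
]
manager = MultiDomainManager(experts)
print(manager.select("please go to the meeting room").name)  # navigation
```

A real system would replace the keyword scores with statistical confidence estimates and let the chosen expert keep the dialogue state, but the routing decision itself has this shape.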
To advance “controlling speech and action of robots,” we need to program multiple conversation domains into a single robot so that it can carry on multidomain conversations. Furthermore, we need functionality for deciding how to react to speech and in which domain to hold the conversation, as well as a mechanism for the robot to detect and learn unknown words.
For example, if we ask a robot to “go to the meeting room,” the robot cannot act on what it heard if it does not know what “the meeting room” means. The robot needs to know the meaning and to predict where the phrase will occur in the conversation. If it cannot predict correctly, it may completely misunderstand the context and take unexpected action.
Even if the robot does not recognize a phrase, as long as it realizes that the phrase is unknown, it can ask the human it is talking with for clarification, learn what the phrase means, and take the correct action. Understanding and building this mechanism for processing language is the most difficult and most interesting part of our research.
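The clarification loop described above can be sketched in a few lines. Again, this is only an illustrative assumption: the `Lexicon` class, the dialogue flow, and the location answer are invented for the example and do not describe HRI-JP's actual mechanism.

```python
class Lexicon:
    """The robot's store of learned phrase meanings."""
    def __init__(self):
        self.meanings = {}  # phrase -> meaning (e.g. a location)

    def knows(self, phrase):
        return phrase in self.meanings

    def learn(self, phrase, meaning):
        self.meanings[phrase] = meaning

def handle_command(lexicon, phrase, ask_human):
    """If the phrase is unknown, ask the human for clarification,
    store the answer, then act on the (now known) meaning."""
    if not lexicon.knows(phrase):
        meaning = ask_human(f"I don't know '{phrase}'. What does it mean?")
        lexicon.learn(phrase, meaning)
    return f"going to {lexicon.meanings[phrase]}"

lexicon = Lexicon()
# Simulated human answer to the clarification question:
reply = handle_command(lexicon, "the meeting room",
                       ask_human=lambda question: "room 201")
print(reply)                               # going to room 201
print(lexicon.knows("the meeting room"))   # True
```

The key point is that the robot first has to notice that a phrase is unknown; only then can the clarification question be asked and the answer stored for future use.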
We take two approaches in our research: understanding how and why people converse with each other, and how people can communicate with machines. In both approaches we set up actual conversation situations, analyze the large amounts of resulting data, and build various conversation models.
What kind of mechanisms do people use to understand speech? What kinds of problems arise when people talk to robots and the robots do not respond? When people judge that a robot does not understand well, they speak slowly, say only the important words, or change their tone or speech pattern. These are very characteristic behaviors, and we can collect very interesting samples.
As explained above, many areas of speech technology remain unexplored when it comes to natural voice interfaces. However, we believe that communication by voice, which comes naturally to everyone from children to the elderly, is one of the most human-friendly interfaces.
As voice interfaces connecting people with robots develop, we think a new relationship between people and robots can be built. We predict that in the future there will be “handbook-free” robots that listen to human speech and learn by themselves.
I enjoy studying the elaborate intellectual activities of people and the mechanisms behind our language. At HRI-JP we are not assigned research topics; we choose and pursue our own.
Being able to choose our own topics is liberating, but it comes with responsibility: What are the issues? What can be uncovered? What will the output be? We approach challenging research topics not as short-term tasks but as steps toward long-term goals.