Researcher: Research and Development of Multimodal Conversational Intelligence for Human-Centric AI

Job Overview

Position Overview

HRI-JP is looking for a forward-thinking Research Scientist or Engineer to advance the capabilities of next-generation conversational systems. In this role, you will drive research at the intersection of large language models and multimodal signal processing (speech, vision, and text). Your primary focus will be developing multimodal architectures designed to deeply understand and adapt to complex human interaction scenarios in real time.

You will work on foundational techniques for extracting and fusing rich information from diverse modalities, enabling systems to distinguish and respond to nuances in multi-party environments. Collaborating with our global network of research institutes in Germany and the U.S., you will help shape the future of human-centric AI, deploying robust solutions that function seamlessly across languages and cultural contexts.

Key Responsibilities

Advance Multimodal Architectures: Design and validate novel research concepts for next-generation conversational models, focusing on enhancing internal reasoning (chain-of-thought) and reducing latency to support complex, real-time decision-making.
Develop Context-Aware Models: Create algorithms that analyze continuous streams of audio, visual, and textual data to interpret user dynamics and adapt to environmental context.
Solve Real-World Challenges: Design mechanisms to convert real-world interaction signals into high-quality training data, ensuring continuous model improvement and bias mitigation while maintaining rigorous evaluation pipelines.
Drive Global Collaboration: Work closely with our sister institutes in Germany and the U.S. to align research with corporate strategy and tackle ambitious research challenges that have a direct path to product impact.
Technical Leadership & Growth: Proactively identify emerging research themes and grow into a team leadership role, taking responsibility for project management, mentoring junior researchers, and representing our work at top-tier international venues.

Job Characteristics

You will take ownership of the full research lifecycle, independently driving the design and implementation of novel multimodal frameworks suited for dynamic, real-world environments. This role bridges the gap between academic discovery and practical deployment, giving you direct experience in translating foundation models into robust systems that solve genuine user challenges. Working within a tight-knit, expert-led team, you will benefit from close mentorship while maintaining the autonomy to shape your own research agenda, propose innovative technical solutions, and actively publish your work in leading scientific venues.

Technologies Used

Multimodal Fusion & LLMs: Vision-Language Models (VLMs) and multimodal foundation models, focusing on cross-modal attention mechanisms to align visual cues with linguistic and acoustic streams.
Human-Centric Computer Vision: Algorithms for analyzing interaction dynamics, including active speaker detection, gaze tracking, pose estimation, and facial attribute analysis to ground conversational context visually.
Speech & Natural Language Processing: Integrated systems that fuse visual signals with speech for robust speaker-aware recognition, combined with NLP algorithms to extract semantic meaning and conversational context.

Mission

Guided by Honda’s core philosophy, “Technology for People,” we aim to lead research and development with speed and innovative ideas, striving to create technologies that are truly unique in the world.

By understanding the latest trends in multimodal AI, we identify high-value research initiatives and proactively advance research themes that align with Honda’s corporate strategy.

We anchor our work in concrete application domains, aiming for high-impact results through rapid prototyping, rigorous experimentation, and continuous feedback loops. To contribute to the global research community and enhance our presence, we actively disseminate our achievements through top-tier international conferences and academic publications.

Project Scale

You will work in collaboration with team members and researchers from external research institutions.
Close collaboration with overseas sister companies (Germany and the United States) enables engagement in globally oriented research projects.

Team Structure

Research projects are conducted under the Research Division Manager (Research Division).
Projects are initiated based on proposals submitted by researchers themselves, including content and budget, and are launched upon board approval.

Work Environment

Approximately half of our employees are non-Japanese nationals, creating a highly international workplace
Daily communication is conducted in both Japanese and English
The workplace is located within Honda’s award-winning suburban Wako Campus
Researchers are granted a high degree of autonomy, and external publication of research achievements is strongly encouraged
The company values both organizational direction and individual researchers’ initiative

About Honda Research Institute Japan

Established in 2003 in Japan, the U.S., and Europe as a wholly owned subsidiary of Honda R&D, Honda Research Institute aims to explore new domains beyond automotive technology.
Our guiding philosophy is “Innovate through Science.”
Through multidisciplinary approaches spanning artificial intelligence, robotics, systems science, neuroscience, materials science, psychology, and social ethics, we focus on communication and sensing under the concepts of Cooperative Intelligence and Cooperative Devices, pursuing a wide range of research initiatives.

Required Skills/Experience

Qualifications

Minimum Qualifications

Master's or Ph.D. in machine learning, artificial intelligence, or computer science (or equivalent practical experience).
Solid foundation in Deep Learning with expertise in multimodal signal processing (intersection of Natural Language Processing, Computer Vision, and Speech Processing).
Strong programming skills in Python and proficiency with ML frameworks such as PyTorch or TensorFlow.
A minimum of 3 years of relevant professional and research experience.
Excellent collaboration skills and business-level English proficiency (spoken and written).

Preferred Qualifications

Experience with multimodal learning, LLMs, and/or conversational AI systems.
A proven track record of publications at top-tier conferences (e.g., AAAI, NeurIPS, ICASSP, CVPR, ACL).
Experience with industry-standard open-source toolkits (e.g., vLLM, ESPnet, NeMo, Hugging Face).
Experience with pretraining, post-training techniques, representation learning, few-shot learning, and evaluations.
Familiarity with large-scale model training and deployment.
Contributions to open-source projects.
Experience working in a collaborative, cross-functional team environment or business activities with overseas research institutions.

Even if you feel you do not meet all of the above requirements, we encourage you to apply!
We are looking for people who approach problem solving systematically and who can work toward realistic, sustainable solutions as team players in a multicultural environment.

Language

Japanese: Native or higher than JLPT N3
English: Native or fluent (guideline: TOEIC 600 points or higher)

Type of Employment

Employment status

Fixed-term contract (3 years)

* Renewable
* 6-month probationary period

Salary

Negotiable in accordance with company regulations, taking into account experience, ability, and previous salary

How to Apply

To apply for this position, please send your resume and cover letter via email or postal mail to the address below. *Application documents will be disposed of responsibly after the selection process is complete.

Address: Recruitment Office
Honda Research Institute Japan Co., Ltd.
8-1 Honcho, Wakō-shi, Saitama 351-0188
Email: recruit (at) jp.honda-ri.com