Incoming PhD Student · Multimodal AI · Embodied Intelligence

Bohan Hou 侯博涵

I work on multimodal models, information retrieval, embodied intelligence, and robot learning.

Multimodal Models · Information Retrieval · Embodied Intelligence · Robot Learning

Email · GitHub

Short Bio

A concise research-focused introduction.

I am an incoming PhD student at Nanyang Technological University (NTU), where I will be advised by Prof. Jianfei Yang. I received my undergraduate education at Shandong University, where I studied Computer Science at Taishan College under the guidance of Prof. Xuemeng Song. I have also been a research assistant at the Intelligent Media Research Center (iLearn), led by Prof. Liqiang Nie. My research interests include multimodal models, information retrieval, embodied intelligence, robot learning, and related areas.

Experience

Research appointments and internships.

Nov. 2025 — Present

Alibaba DAMO Academy

Research Intern · Embodied Intelligence and World Models

Mentors: Dr. Jun Cen, Ronghao Dang, and Dr. Xin Li.

I also work closely with my talented colleagues Jiayan Guo (we explore agents) and Siteng Huang (we explore VLA models).

Sep. 2022 — Present

iLearn Lab, Shandong University

Research Assistant · Multimodal Large Language Models and Information Retrieval

Supervisors: Prof. Xuemeng Song and Prof. Liqiang Nie.

Education

Academic training and degree programs.

Incoming

Nanyang Technological University

Incoming PhD Student

Advised by Prof. Jianfei Yang. Research directions: multimodal learning, MLLM agents, and embodied intelligence.

Sep. 2022 — Jun. 2026

Shandong University

B.Eng. in Computer Science and Technology

Taishan College, First-Honor Program. GPA: 88.5 / 100; Rank: 5 / 22.

First-Author Publications

arXiv 2026 · Agentic Search

InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search

Bohan Hou, Jiuning Gu, Jiayan Guo, Ronghao Dang, Sicong Leng, Xin Li, Xuemeng Song, and Jianfei Yang.

A benchmark for interleaved language-vision agentic search with active visual evidence seeking and open-web multimodal search.

Paper · Code

arXiv 2026 · Survey

World Model for Robot Learning: A Comprehensive Survey

Bohan Hou, Gen Li, Jindou Jia, Tuo An, Xinying Guo, Sicong Leng, Haoran Geng, Yanjie Ze, Tatsuya Harada, Philip Torr, Oier Mees, Marc Pollefeys, Zhuang Liu, Jiajun Wu, Pieter Abbeel, Jitendra Malik, Yilun Du, and Jianfei Yang.

A robotics-centered survey of world models for policy learning, planning, learned simulation, evaluation, and video generation.

Paper · Project

arXiv 2026 · Embodied Foundation Models

RynnBrain: Open Embodied Foundation Models

Ronghao Dang*, Jiayan Guo*, Bohan Hou*, Sicong Leng*, Kehan Li*, Xin Li*, et al.

An open embodied foundation model family for egocentric understanding, spatiotemporal localization, physical reasoning, and planning.

Paper · Project · Code

SIGIR 2025 · Conference · CCF-A / CORE-A

FiRE: Enhancing MLLMs with Fine-Grained Context Learning for Complex Image Retrieval

Bohan Hou, Haoqiang Lin, Xuemeng Song, Haokun Wen, Meng Liu, Yupeng Hu, and Xiangyu Zhao.

Fine-grained context learning for complex image retrieval with multimodal large language models.

Paper · Code

WWW 2025 Challenge · Oral · CCF-A

Visual Anchor Point for Multimodal Document Retrieval

Bohan Hou, Haokun Wen, Haoqiang Lin, Xuemeng Song, and Liqiang Nie.

Winner of the WWW’25 Multimodal Document Retrieval Challenge.

Code

IJCNN 2025 · Conference · CORE-A

Pseudo-triplet Guided Few-shot Composed Image Retrieval

Bohan Hou, Haoqiang Lin, Haokun Wen, Meng Liu, Mingzhu Xu, and Xuemeng Song.

A pseudo-triplet guided scheme for improving few-shot composed image retrieval.

Paper · Code

Co-Author Publications

NeurIPS 2025 · Conference · CCF-A

ImgEdit: A Unified Image Editing Dataset and Benchmark

Yang Ye, Xianyi He, Zongjian Li, Bin Lin, Shenghai Yuan, Zhiyuan Yan, Bohan Hou, and Li Yuan.

A unified image editing dataset, benchmark, and model for high-quality single-turn and multi-turn image editing.

Paper · Code

TOIS · Journal · CCF-A

A Comprehensive Survey on Composed Image Retrieval

Xuemeng Song, Haoqiang Lin, Haokun Wen, Bohan Hou, Mingzhu Xu, and Liqiang Nie.

A comprehensive survey of composed image retrieval, covering supervised, zero-shot, and related multimodal retrieval settings.

Paper · Resource

Selected Project

Representative open-source effort.

Embodied-AI-Guide

★ 13.6k stars

A popular Chinese-language knowledge base and resource index for embodied intelligence, organized in an encyclopedia-style format. It covers robotics foundations, robot learning, simulators, benchmarks, datasets, multimodal models, VLA models, navigation, and related embodied-AI research directions.

GitHub · README

Honors & Misc.

Selected Honors

Champion, WWW’25 Multimodal Document Retrieval Challenge.
Shandong University Outstanding Student Scholarship, 2023–2025.
Shandong University Academic Innovation Scholarship, 2025.
Second Prize, National Undergraduate Mathematics Competition, 2024.
