Ruihang Li (李睿航)

Ruihang Li is a PhD student in a joint program between the University of Science and Technology of China (USTC) and the Shanghai Innovation Institute (SII), advised by Prof. Wenjie Wang and Dr. Jiaqi Wang. His research focuses on reinforcement learning, visual generation, and unified multimodal models.

He is currently a research intern at Baidu ERNIE, focusing on training unified multimodal models. He was a research intern at Tencent Hunyuan Frontier Lab, where he worked on RL and evaluation for visual generation. He also implemented a robust RL pipeline for the unified multimodal model DeepGen, which has gained 580+ GitHub stars and 2,500+ Hugging Face downloads.

Previously, he interned at Microsoft Research Asia, where he collaborated closely with Han Hu, Zheng Zhang, and Houwen Peng on LLM pretraining. He received his B.S. from USTC in 2023. He enjoys vibe coding and exploring the unknown, and aspires to push the boundaries of multimodal machine intelligence.

GitHub / Google Scholar / LinkedIn

Research

I'm interested in reinforcement learning, visual generation, unified multimodal models, and evaluation for generative systems.

	Optimizing Visual Generative Models via Distribution-wise Rewards
	Ruihang Li, Mengde Xu, Shuyang Gu, Leigang Qu, Fuli Feng, Han Hu, Wenjie Wang
	ICML 2026 Main
	Proposes a distribution-wise RL framework for visual generation that mitigates reward hacking and mode collapse. Using an efficient subset-replace strategy, the approach significantly improves visual quality and diversity on the SiT model.

	DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
	Dianyi Wang, Ruihang Li, Feng Han, Chaofan Ma, Wei Song, Siyuan Wang, Yibin Wang, Yi Xin, Hongjian Liu, Zhixiong Zhang, Shengyuan Ding, Tianhang Wang, Zhenglin Cheng, Tao Lin, Cheng Jin, Kaicheng Yu, Jingjing Chen, Wenjie Wang, Zhongyu Wei, Jiaqi Wang
	Technical Report, 2026
	paper / page / code / blog (量子位)
	Presents a lightweight unified multimodal model for image generation and editing. Uses MR-GRPO to achieve stable 1,500-step RL training, improved text rendering, and stronger generation and editing performance for the 5B model.

	GenArena: How Can We Achieve Human-Aligned Evaluation for Visual Generation Tasks?
	Ruihang Li, Leigang Qu, Jingxu Zhang, Dongnan Gui, Mengde Xu, Xiaosong Zhang, Han Hu, Wenjie Wang, Jiaqi Wang
	Preprint, 2026
	paper / page / code
	Introduces GenArena, a pairwise comparison framework that uses open-source VLM judges to evaluate visual generation. It improves evaluation accuracy by 25.4% and produces rankings with an 86% match rate to LMArena across 15+ models.

	ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws
	Ruihang Li, Yixuan Wei, Miaosen Zhang, Nenghai Yu, Han Hu, Houwen Peng
	EMNLP 2024 Main
	paper / code
	Presents ScalingFilter, a reference-free text data filtering method for LLM pretraining. By applying scaling laws inversely, it improves downstream performance by 1.12% over previous state-of-the-art filtering methods while preserving greater semantic diversity.

Design and source code from Jon Barron's website