Research
I'm interested in reinforcement learning, visual generation, unified multimodal models, and evaluation for generative systems.
Optimizing Visual Generative Models via Distribution-wise Rewards
Ruihang Li, Mengde Xu, Shuyang Gu, Leigang Qu, Fuli Feng, Han Hu, Wenjie Wang
ICML 2026 Main
Proposes a distribution-wise RL framework for visual generation that mitigates reward hacking and mode collapse. Using an efficient subset-replace strategy, the approach significantly improves both the visual quality and the diversity of the SiT model's outputs.
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
Dianyi Wang*, Ruihang Li*, Feng Han*, Chaofan Ma*, Wei Song*, Siyuan Wang*, Yibin Wang*, Yi Xin, Hongjian Liu, Zhixiong Zhang, Shengyuan Ding, Tianhang Wang, Zhenglin Cheng, Tao Lin, Cheng Jin, Kaicheng Yu, Jingjing Chen, Wenjie Wang, Zhongyu Wei, Jiaqi Wang
Technical Report, 2026
paper / page / code / blog (量子位)
Presents a lightweight unified multimodal model for image generation and editing. Uses MR-GRPO for stable RL training over 1,500 steps, yielding improved text rendering and stronger generation and editing performance for the 5B model.
GenArena: How Can We Achieve Human-Aligned Evaluation for Visual Generation Tasks?
Ruihang Li, Leigang Qu, Jingxu Zhang, Dongnan Gui, Mengde Xu, Xiaosong Zhang, Han Hu, Wenjie Wang, Jiaqi Wang
Preprint, 2026
paper / page / code
Introduces GenArena, a pairwise comparison framework that uses open-source VLM judges to evaluate visual generation. It improves evaluation accuracy by 25.4% and produces rankings with an 86% match rate to LMArena across 15+ models.
ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws
Ruihang Li, Yixuan Wei, Miaosen Zhang, Nenghai Yu, Han Hu, Houwen Peng
EMNLP 2024 Main
paper / code
Presents ScalingFilter, a reference-free text data filtering method for LLM pretraining. By inversely applying scaling laws, it improves downstream performance by 1.12% over previous state-of-the-art filtering methods while preserving stronger semantic diversity.