πŸ€” About-me

I earned my Master’s degree in Artificial Intelligence from Tsinghua University, where I conducted research under the supervision of Prof. Haoqian Wang and collaborated closely with Prof. Yebin Liu on 3D vision and human avatar reconstruction. Prior to this, I completed my B.Eng. in Measurement and Control Technology & Instruments at Southeast University. During my graduate studies, I also had the privilege of visiting Harvard University as a research intern, working with Prof. Hanspeter Pfister on computer graphics.

I am currently a Researcher at ByteDance Seed, focusing on cutting-edge challenges in perception, generation, and world models.


I am actively recruiting research interns to collaborate on:

πŸ“Œ 3D Scene Perception.

πŸ“Œ 3D Content Creation.

πŸ“Œ World Model.

If you are seeking any form of academic cooperation, please feel free to email me at qinminghan1999@gmail.com.

πŸ”₯ News

  • 2024.07: Β πŸŽ‰πŸŽ‰ 1 paper accepted to ACM MM 2025 !!!
  • 2025.06: Β πŸŽ‰πŸŽ‰ 2 paper accepted to ICCV 2025 !!!
  • 2025.06: Β πŸŽ‰πŸŽ‰ NOVA3D has been selected as ICME 2025 Bestpaper Candidate!!!
  • 2025.02: Β πŸŽ‰πŸŽ‰ 2 paper accepted to CVPR 2025 !!!
More News
  • 2024.09: πŸŽ‰πŸŽ‰ 1 paper accepted to NeurIPS 2024!!!
  • 2024.07: πŸŽ‰πŸŽ‰ 1 paper accepted to ACM MM 2024!!!
  • 2024.02: πŸŽ‰πŸŽ‰ 2 papers accepted to ECCV 2024!!!
  • 2024.02: πŸŽ‰πŸŽ‰ LangSplat has been selected as a CVPR 2024 Highlight!!!
  • 2024.02: πŸŽ‰πŸŽ‰ 1 paper accepted to CVPR 2024!!!
  • 2023.11: πŸŽ‰πŸŽ‰ 1 paper accepted to AAAI 2024!!!

πŸ“ Publications


Vision-Language 3D Perception

CVPR 2024 Highlight

LangSplat: 3D Language Gaussian Splatting

Minghan Qin*, Wanhua Li*†, Jiawei Zhou*, Haoqian Wang†, Hanspeter Pfister

Website


  • We introduces LangSplat, which constructs a 3D language field that enables precise and efficient open-vocabulary querying within 3D spaces.
CVPR 2025

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

Wanhua Li*, Renping Zhou*, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang, Hanspeter Pfister

Website

  • We present 4D LangSplat, an approach to constructing a dynamic 4D language field in evolving scenes, leveraging Multimodal Large Language Models.
arXiv

LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding

Hao Li*, Minghan Qin*†, Zhengyu Zou*, Diqi He, Bohan Li, Bingquan Dai, Dingwen Zhang†, Junwei Han

Website

  • We propose LangSurf, a model that aligns language features with object surfaces to enhance 3D scene understanding
ACM MM 2025

SLGaussian: Fast Language Gaussian Splatting in Sparse Views

Kangjie Chen, Bingquan Dai, Minghan Qin, Dongbin Zhang, Peihao Li, Yingshuang Zou, Haoqian Wang†

Website

  • We propose SLGaussian, a feed-forward method for constructing 3D semantic fields from sparse viewpoints, allowing direct inference of 3DGS-based scenes.

Generation Model

ICCV 2025

VAP: Precise Action-to-Video Generation through Visual Action Prompts

Yuang Wang, Chao Wen, Haoyu Guo, Sida Peng, Minghan Qin, Hujun Bao, Xiaowei Zhou, Ruizhen Hu

Website

  • VAP harnesses subject renderings as action proxies for interactive video generation, striking an balance between precision and generality in action representation.
ICME 2025, Best Paper Award Candidate

NOVA3D: Normal Aligned Video Diffusion Model for Single Image to 3D Generation

Yuxiao Yang, Peihao Li, Yuhong Zhang, Junzhe Lu, Xianglong He, Minghan Qin, Weitao Wang, Haoqian Wang†

  • NOVA3D unleashes geometric 3D prior from a video diffusion model to generate high-quality textured meshes from input image.

Digital Human

ICCV 2025

GUAVA: Generalizable Upper Body 3D Gaussian Avatar

Dongbin Zhang, Yunfei Liu, Lijian Lin, Ye Zhu, Yang Li, Minghan Qin, Yu Li, Haoqian Wang

Website

  • For each single image with a tracked pose, GUAVA can reconstruct a 3D upper-body Gaussian avatar via feed-forward inference within sub-second time, enabling real-time expressive animation and novel view synthesis at 512βœ–οΈ512 resolution.
CVPR 2025

HRAvatar: High-Quality and Relightable Gaussian Head Avatar

Dongbin Zhang, Yunfei Liu, Lijian Lin, Ye Zhu, Kangjie Chen, Minghan Qin, Yu Li†, Haoqian Wang†

Website

  • With monocular video input, HRAvatar reconstructs a high-quality, animatable 3D head avatar that enables realistic relighting effects and simple material editing.
ACM MM 2024

Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars

Yang Liu, Xiang Huang, Minghan Qin, Qinwei Lin, Haoqian Wang (* indicates equal contribution)

Website

  • We propose Animatable 3D Gaussian, a novel neural representation for fast and high-fidelity reconstruction of multiple animatable human avatars, which can animate and render the model at interactive rate.
AAAI 2024

High-Fidelity 3D Head Avatars Reconstruction through Spatially-Varying Expression Conditioned Neural Radiance Field

Minghan Qin*, Yifan Liu*, Yuelang Xu, Xiaochen Zhao, Yebin Liu†, Haoqian Wang†

Website

  • We introduce a novel Spatially-Varying Expression (SVE) conditioning, encompassing both spatial positional features and global expression information.

3D Reconstruction

NeurIPS 2024

HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting

Yuanhao Cai, Zihao Xiao, Yixun Liang, Minghan Qin, Yulun Zhang, Xiaokang Yang, Yaoyao Liu, Alan Yuille

Website

  • The first 3D Gaussian splatting-based method for high dynamic range imaging
ECCV 2024

Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections

Dongbin Zhang*, Chuming Wang*, Weitao Wang, Peihao Li, Minghan Qin, Haoqian Wang†

Website

  • We utilize 3D Gaussian Splatting with introduced separated intrinsic and dynamic appearance to reconstruct scenes from uncontrolled images, achieving high-quality results and a 1000 Γ— rendering speed increase.
ECCV 2024

Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images

Chuanrui Zhang*, Yonggen Ling*†, Minglei Lu, Minghan Qin, Haoqian Wang†

Website Datasets

  • We present CODERS, a one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images.

πŸŽ– Honors and Awards

πŸ’» Research Experience

  • 2023.09 - 2024.4, Harvard University - VCG Lab - Computer Vision Group. I spent a good time with Prof. Hanspeter Pfister.

πŸ’ Academic Service

Reviewer for: CVPR, ECCV, ICCV, NeurIPS, SIGGRAPH, ACM MM, AAAI, 3DV, etc.