I am currently a Senior Algorithm Engineer at the Big Data and Intelligence Lab, Alibaba Cloud. I received my master's degree in Computer Vision from Zhejiang University in 2022, advised by Prof. Zhiyu Xiang, and my bachelor's degree, also from Zhejiang University, in 2019.
Research
I specialize in multi-modal large language models (MLLMs) and various vision applications in 3D computer vision. My current work focuses on multi-modal reasoning, particularly multi-modal Chain-of-Thought (CoT) and spatial reasoning within MLLMs. Previously, I worked extensively on 3D vision derived from 2D images, including 3D visual localization, 3D scene understanding, and the development of 3D vision-language models.