Malachy Xinyu Yang
He/They
Ph.D. Student
Electrical and Computer Engineering Department
Carnegie Mellon University
Welcome! I'm delighted to have you here on this page. Sending you a virtual but heartfelt greeting! 👋
I am currently a second-year Ph.D. student in the InfiniAI Lab and Catalyst at Carnegie Mellon University, where I am fortunate to be advised by Prof. Beidi Chen. I also collaborate closely with Prof. Tianqi Chen at CMU and Prof. Huaxiu Yao at UNC.
Previously, I obtained my bachelor's degree from the ACM Honors Class, Zhiyuan College, Shanghai Jiao Tong University, where I conducted research with Prof. Junchi Yan at ThinkLab. I also had a wonderful time through internships with Prof. Song Han in the HAN Lab at MIT, Prof. Chelsea Finn in the IRIS Lab at Stanford, and Dr. Chen Luo at Amazon Search.
Additionally, I am a passionate community builder: I founded the Foundation Models in the Wild workshop series. Please follow us on Twitter for the latest news, or join our Slack for workshop questions and active discussions. I also co-organize the ASAP Seminar Series, which focuses on sequence modeling from algorithmic perspectives.
News
Research Highlights
My research centers on the intersection of machine learning systems and foundation models, with a specific focus on developing scalable and generalizable foundation model systems in the wild. Recently, I have been particularly interested in hardware-aware algorithm design with sub-linear complexity.
- Infinite-length Retrieval: For each query, how can we retrieve relevant information from an $O(n)$-length contextual cache in foundation models using only $O(\log(n))$ computations and positions?
- Infinite-depth Reasoning: An $O(n)$-depth reasoning problem requires a Transformer with either $O(\log(n))$ layers or $O(n)$ width. Can we achieve equivalent capabilities by repeating layers?
- Infinite-volume Memory: How can we encode $O(n)$-volume knowledge into model parameters? The model size can grow linearly, but the computational cost should stay $O(\log(n))$ through sparse activation (a toy sketch follows below).
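To make the sparse-activation idea above a bit more concrete, here is a minimal, hypothetical sketch in PyTorch (not a description of any published method of mine): a toy mixture-of-experts feed-forward layer whose parameter count grows linearly with the number of experts, while each token activates only about $\log_2(\text{num\_experts})$ of them. All names (`SparseFFN`, `router`, `top_k`) are illustrative.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseFFN(nn.Module):
    """Toy sparsely-activated layer: parameters grow with num_experts,
    but each token only runs ~log2(num_experts) experts."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)
        # Active compute per token stays O(log n) as the expert pool grows.
        self.top_k = max(1, math.ceil(math.log2(num_experts)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep ~log2(n) experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):  # naive per-token loop, for clarity only
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[int(e)](x[t])
        return out

# Example: 16 experts, each token runs ceil(log2(16)) = 4 of them.
layer = SparseFFN(d_model=64, num_experts=16)
y = layer(torch.randn(8, 64))
```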
Additionally, I am fascinated by structural contextual cache architectures that transcend traditional sequential patterns. This is important for multi-modal foundation models with more diverse structures.
- Uni-directional and Bi-directional Relations: Relations can typically be categorized as uni- or bi-directional. How can we separate them with different attention masks and position embeddings? (A toy sketch follows after this list.)
- One-to-many and Many-to-one Relations: Current cache architectures only support one-to-many relations. How can we further enable many-to-one relations to foster information aggregation?
- Static and Dynamic Relations: Current caches establish static relations, but many operations involve dynamic relations, since they may not depend on prior information or affect subsequent information.
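As a toy illustration of the uni- vs bi-directional point above (again a hypothetical sketch, not a method from my papers), the snippet below builds a boolean attention mask that keeps causal, uni-directional attention globally while allowing full, bi-directional attention inside designated segments; the function name and arguments are made up for this example.

```python
import torch

def build_relation_mask(segment_ids: torch.Tensor,
                        bidirectional: torch.Tensor) -> torch.Tensor:
    """Toy mask mixing relation types over a structured cache.

    segment_ids:   (seq_len,) id of the segment each token belongs to.
    bidirectional: (seq_len,) True if the token sits in a bi-directional
                   segment (e.g. an image or a retrieved chunk).
    Returns a (seq_len, seq_len) boolean mask; True = attention allowed.
    """
    n = segment_ids.size(0)
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))  # uni-directional default
    same_segment = segment_ids.unsqueeze(0) == segment_ids.unsqueeze(1)
    # Tokens in a bi-directional segment may also attend forward,
    # but only to other tokens of the same segment.
    bidir_rows = bidirectional.unsqueeze(1).expand(n, n)
    return causal | (same_segment & bidir_rows)

# Example: two bi-directional chunks followed by a causal query segment.
seg = torch.tensor([0, 0, 0, 1, 1, 2, 2])
bid = torch.tensor([True, True, True, True, True, False, False])
mask = build_relation_mask(seg, bid)   # shape (7, 7)
```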
If you would like to chat more about these topics, please feel free to email me to schedule a meeting.
Publications

APE: Faster and Longer Context‑Augmented Generation via Adaptive Parallel Encoding

FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving

VcLLM: Video Codecs are Secretly Tensor Codecs

It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF

Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity

S2FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Improving Domain Generalization with Domain Relations

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
Experience
Education

Aug 2023 - Present
Research

Mar 2022 - Oct 2023

Nov 2021 - Jun 2023
Industry

May 2024 - Aug 2024
Service
Seminar Organization
- Co-lead Organizer, Advances in Sequence Modeling from Algorithmic Perspectives (ASAP) Seminar Series
Workshop Organization
- Lead Organizer and Program Chair, The 2nd Workshop on Foundation Models in the Wild, ICLR 2025
- Lead Organizer and Program Chair, Workshop on Foundation Models in the Wild, ICML 2024
- Organizer, The 3rd Workshop for Out-of-Distribution Generalization in Computer Vision Foundation Models, ECCV 2024
Conference Review
- International Conference on Machine Learning (ICML), 2024-2025
- International Conference on Learning Representations (ICLR), 2025
- Conference on Neural Information Processing Systems (NeurIPS), 2024
- Conference on Language Modeling (COLM), 2024
- Empirical Methods in Natural Language Processing (EMNLP), 2024
- IEEE International Conference on Robotics & Automation (ICRA), 2025
Journal Review
- Transactions on Machine Learning Research (TMLR)
- IEEE Robotics and Automation Letters (RA-L)
Contact Me
Misc
- Before moving to the US, I studied and lived in Shanghai, China for the first two decades of my life.
- Aside from research, I am passionate about traveling, dining, gaming, shopping, anime and manga.