Zeyuan Allen-Zhu is a research scientist at Meta, specialising in the physics of language models and AI.
Zeyuan Allen-Zhu received his Doctor of Science in Computer Science from the Massachusetts Institute of Technology (MIT), advised by Jon Kelner and Silvio Micali. He also holds a Master's degree in Computer Science from MIT and a Bachelor's degree in Mathematics and Physics, summa cum laude, from Tsinghua University. During his undergraduate studies, he was awarded the Chi-Sun Yeh prize for his physics major.
Zeyuan Allen-Zhu is currently an AI research scientist at Meta/FAIR Labs, a position he has held since 2022. Before that, he joined Microsoft Research Redmond in 2017 as a senior researcher and was later promoted to principal researcher. From 2015 to 2017, he was a postdoctoral researcher at Princeton University and the Institute for Advanced Study (IAS), hosted by Elad Hazan and Avi Wigderson.
Zeyuan Allen-Zhu's research focuses on investigating the physics of language models and AI, designing experiments to uncover the fundamental principles governing how transformers/GPTs learn to perform various tasks. He aims to understand the intricate physical mechanisms behind large language models by probing the neurons of pre-trained transformers.
Previously, he worked on the mathematics of deep learning, developing theoretical proofs to explain the learnability of neural networks and certain phenomena observed in deep learning. He has also worked in machine learning, optimisation theory, and theoretical computer science.
Zeyuan Allen-Zhu has received several awards and honours for his work. His paper on ensemble and knowledge distillation received an award at ICLR 2023. He also holds the following accolades:
Zeyuan Allen-Zhu has numerous publications, including:
Youtube Title: Theory of accelerated methods - Zeyuan Allen-Zhu
Youtube Link: link
Youtube Channel Name: Institute for Advanced Study
Youtube Channel Link: https://www.youtube.com/@videosfromIAS
Youtube Title: Accelerated stochastic gradient ..first-order optimization - Zeyuan Allen-Zhu
Youtube Link: link
Youtube Channel Name: Institute for Advanced Study
Youtube Channel Link: https://www.youtube.com/@videosfromIAS
Youtube Title: Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: Three ICML 2016 Talks on Optimization
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: ICML 2017 Tutorial: Recent Advances in Stochastic Convex and Non-Convex Optimization
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: Using Optimization to Solve Positive LPs Faster in Parallel
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: ICML 2017 Tutorial: Recent Advances in Stochastic Convex and Non-Convex Optimization (audio fixed)
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: Linear Coupling of Gradient and Mirror Descent
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: Optimal Black-Box Reductions Between Optimization Objectives
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: Optimal Experimental Design via A New Regret Minimization Framework
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: First Efficient Convergence for Streaming k-PCA: a Global, Gap-Free, and Near-Optimal Rate
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: How to Swing By Saddle Points: Faster Non-Convex Optimization Than SGD
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: Backward Feature Correction: How Deep Learning Performs Deep Learning (May 2020 by Yuanzhi Li)
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: 03 - Allen-Zhu - Linear Coupling of Gradient and Mirror Descent
Youtube Link: link
Youtube Channel Name: ITCS Conference
Youtube Channel Link: https://www.youtube.com/@itcsconference6649
Youtube Title: Nearly-Linear Time Positive LP Solver with Faster Convergence Rate (STOC 2015)
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan
Youtube Title: Knightian Self Uncertainty in the VCG Mechanism for Unrestricted Combinatorial Auctions
Youtube Link: link
Youtube Channel Name: Zeyuan Allen-Zhu
Youtube Channel Link: https://www.youtube.com/@zhuzeyuan