Weizhu Chen is a researcher in artificial intelligence and machine learning, currently leading a modelling team at Microsoft GenAI. His work focuses on large-scale model training and fine-tuning, with a particular interest in models developed by OpenAI and Microsoft.
Education & Career
Hong Kong University of Science and Technology
Microsoft (current)
Publications
Weizhu Chen has an extensive list of publications, including:
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts (2024)
SciAgent: Tool-augmented Language Models for Scientific Reasoning (2024)
Multi-LoRA Composition for Image Generation (2024)
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning (2024)
Code Execution with Pre-trained Language Models (2023)
Making Language Models Better Reasoners with Step-Aware Verifier (2023)
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models (2023)
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation (2023)
Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy (2023)
Skill-Based Few-Shot Selection for In-Context Learning (2023)
CodeT: Code Generation with Generated Tests (2023)
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing (2023)
Diffusion-GAN: Training GANs with Diffusion (2023)
LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation (2023)
Meet in the Middle: A New Pre-training Paradigm (2023)
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models (2023)
In-Context Learning Unlocked for Diffusion Models (2023)
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation (2023)
Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models (2023)
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback (2023)
CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing (2022)
XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge (2022)
Finding the Dominant Winning Ticket in Pre-Trained Language Models (2022)
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (2022)
Controllable Natural Language Generation with Contrastive Prefixes (2022)
DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation (2022)
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation (2022)
LoRA: Low-Rank Adaptation of Large Language Models (2022) (sketched after this list)
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (2022)
OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering (2022)
ALLSH: Active Learning Guided by Local Sensitivity and Hardness (2022)
PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (2022)
CodeRetriever: A Large Scale Contrastive Pre-Training Method for Code Search (2022)
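Among these publications, LoRA is the most widely adopted in practice: it freezes the pretrained weights and trains only a small low-rank update on top of them. Below is a minimal sketch of that idea, assuming a PyTorch-style linear layer; the class name, rank, and scaling values are illustrative assumptions, not the authors' reference implementation.

# Minimal sketch of the low-rank adaptation idea (illustrative, not reference code).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (B A) x * scale."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # The pretrained weight stays frozen during fine-tuning.
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        # Low-rank factors: A projects down to the rank, B projects back up.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))  # zero-init so training starts from W
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

if __name__ == "__main__":
    layer = LoRALinear(in_features=512, out_features=512, rank=8)
    x = torch.randn(4, 512)
    print(layer(x).shape)  # torch.Size([4, 512])

Because only the two small factor matrices are optimised, the adapted model is cheap to train and the update can be stored or swapped independently of the frozen base weights.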
YouTube Videos
Developer Tech Minutes: AI for Natural Language Understanding