Weizhu Chen
Weizhu Chen is a researcher in artificial intelligence and machine learning. He currently leads a modeling team in Microsoft GenAI, where his work centers on large-scale model training, particularly for OpenAI and Microsoft models.
Education & Career
- Hong Kong University of Science and Technology
- Microsoft (current)
Publications
Weizhu Chen has an extensive publication record. Selected publications include:
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts (2024)
- SciAgent: Tool-augmented Language Models for Scientific Reasoning (2024)
- Multi-LoRA Composition for Image Generation (2024)
- Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning (2024)
- Code Execution with Pre-trained Language Models (2023)
- Making Language Models Better Reasoners with Step-Aware Verifier (2023)
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models (2023)
- RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation (2023)
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy (2023)
- Skill-Based Few-Shot Selection for In-Context Learning (2023)
- CodeT: Code Generation with Generated Tests (2023)
- DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing (2023)
- Diffusion-GAN: Training GANs with Diffusion (2023)
- LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation (2023)
- Meet in the Middle: A New Pre-training Paradigm (2023)
- Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models (2023)
- In-Context Learning Unlocked for Diffusion Models (2023)
- AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation (2023)
- Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models (2023)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback (2023)
- CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing (2022)
- XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge (2022)
- Finding the Dominant Winning Ticket in Pre-Trained Language Models (2022)
- A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (2022)
- Controllable Natural Language Generation with Contrastive Prefixes (2022)
- DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation (2022)
- A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation (2022)
- LoRA: Low-Rank Adaptation of Large Language Models (2022)
- No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (2022)
- OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering (2022)
- ALLSH: Active Learning Guided by Local Sensitivity and Hardness (2022)
- PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (2022)
- CodeRetriever: A Large Scale Contrastive Pre-Training Method for Code Search (2022)
YouTube Videos
- Developer Tech Minutes: AI for Natural Language Understanding (Microsoft Developer, https://www.youtube.com/@MicrosoftDeveloper)