
SDK orchestrator UI retest — researchers working on MoE routing and expert load…

Completed · 93 qualified · 1 run · Apr 20, 1:44 PM · sdk-orchestrator-ui-retest-researchers-working-on-moe-routin
Parsed 4 topics · Researcher
Pipeline stages:
1. Generating seed nodes — 0 proposed
2. Exploring queries — 0 explored, 0/0 done
3. Expanding nodes — queued
4. Qualifying candidates — queued

    Qualified Candidates (92)

    AB

    Abhimanyu Rajeshkumar Bambhaniya

    high hireability

    Research Intern@Meta

    Previously: Intern @ Google

    San Francisco, US

    • Direct MoE acceleration researcher at Meta (intern, Georgia Tech PhD)
    • Paper 'MoE-ERAS: Expert Residency Aware Selection' (2024) is on-topic expert selection for MoE
    • Research expertise explicitly includes 'MoE Model Acceleration, Sparse model training, HW-SW codesign'. h=6, hireability=high
    • SF-based
    AP

    Ashwinee Panda

    high hireability

    Postdoctoral Fellow@University of Maryland

    Previously: PhD Candidate @ Princeton University

    San Francisco, US

    • Active MoE routing researcher
    • PhD (postdoc UMD/TogetherAI)
    • Papers directly on MoE: 'Dense Backpropagation Improves Training for Sparse MoE' (NeurIPS 2025), 'Continual Pre-training of MoEs: How robust is your router?' (2025), 'StructMoE' (ICLR 2025 withdrawn)
    • ICLR 2025 outstanding paper award
    • Recent website activity (Feb 2026 new paper)
    • Open Philanthropy grantee
    • On the job market now
    CX

    Chaojun Xiao

    high hireability

    Post-Doctoral Researcher@Tsinghua University

    Previously: Business Development Intern @ P.E.R.K. Consulting

    Beijing, CN

    • Research expertise explicitly lists 'mixture of expert, pre-training'
    • Has 'BlockFFN: Towards End-Side Acceleration-Friendly MoE with Chunk-Level Activation Sparsity' (2025)
    • Post-Doctoral Researcher at Tsinghua, h=22
    • Website active until Dec 2025
    • China-based limits mobility
    • Hireability: HIGH — postdoc is prime hiring target despite China location
    CZ

    Chenggang Zhao

    high hireability

    infra@DeepSeek AI

    ex-NVIDIA, SenseTime

    Hangzhou, CN

    • Core DeepSeek infra engineer with direct MoE architecture authorship
    • Co-authored DeepSeekMoE (575 citations, ultimate expert specialization), DeepSeek-V2 (531 citations, MoE LLM), and Auxiliary-Loss-Free Load Balancing (49 citations) — the most important recent paper on MoE load balancing without auxiliary loss
    • Also on DeepSeek-R1 (4614 citations) and DeepSeek-V3 (2341 citations)
    • Currently at DeepSeek AI infra team in Hangzhou
    • Tsinghua background
    CS

    Chenyang Song

    high hireability

    PhD student@Tsinghua University

    Beijing, CN

    • Research expertise: 'Mixture of Experts, Activation Sparsity, LLM'. 'BlockFFN: Towards End-Side Acceleration-Friendly MoE with Chunk-Level Activation Sparsity' (2025)
    • Strong activation sparsity background: ReLU2 Wins (47 cites), ProSparse (38 cites), Sparsing Law (9 cites)
    • PhD @ Tsinghua. hireability=high, score=60.85
    • MoE + sparsity intersection is directly relevant
    • Beijing, CN
    FS

    Filip Szatkowski

    high hireability

    PhD Student@IDEAS NCBR

    Previously: Applied Scientist Intern @ Amazon

    Warsaw, PL

    • MoE conversion and adaptive computation researcher. 'Exploiting Activation Sparsity with Dense to Dynamic-k MoE Conversion' (13 cites, 2024) — converts dense models to dynamic MoE via activation sparsity
    • PhD student at IDEAS NCBR, h-index 4
    • Broader expertise in speculative decoding, knowledge distillation, conditional computation
    • GitHub active (fszatkowski)
    HN

    Huy Nguyen

    high hireability

    PhD Candidate@Department of Statistics and Data Sciences, The University of Texas at Austin

    Previously: Research Intern @ Microsoft

    Austin, US

    • PhD candidate at UT Austin, hireability=high, h=14
    • Dedicated MoE routing theorist: 5 MoE papers including 'Demystifying Softmax Gating Function in Gaussian Mixture of Experts' (2023, 43 cites), 'Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts' (2023, 26 cites), 'On Least Square Estimation in Softmax Gating MoE' (2024, 22 cites), 'FuseMoE' (2024, 54 cites)
    • Strong theoretical grounding in MoE gating/routing from optimal transport perspective
    • Very active output (27 website changes, prev intern at Microsoft AI)
    JK

    Jakub Krajewski

    high hireability

    Head of Biofuels Development Office@ORLEN

    Previously: Biofuels Development Program Manager @ ORLEN

    Warsaw, PL

    • Core MoE scaling researcher. 'Scaling Laws for Fine-Grained MoE' (83 cites) + 'MoE-Mamba' (80 cites) + 'Joint MoE Scaling Laws' + 'Mu-Parametrization for MoE' + 'Scaling Fine-Grained MoE Beyond 50B'
    • PhD student at IDEAS NCBR (Warsaw), previously interned at Apple and Nvidia on LLMs. h-index 4
    • One of the most relevant candidates for MoE architecture and expert routing research
    JL

    Jan Ludziejewski

    high hireability

    AI Research Scientist@Mistral AI

    Previously: Doctoral Researcher @ IDEAS NCBR

    Warsaw, PL

    • AI Research Scientist at Mistral AI working directly on MoE scaling laws
    • Research expertise explicitly listed as 'LLM, Mixture of Experts, Scaling Laws, Pretraining'
    • Published 5 MoE papers including 'Scaling Laws for Fine-Grained Mixture of Experts' (83 citations), 'MoE-Mamba' (80 citations), and 'Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient' (2025)
    • Direct production MoE experience at Mistral, one of the top MoE model builders
    • PhD in ML (Warsaw). h=6
    • Poland-based
    LW

    Lean Wang

    high hireability

    Research Intern@DeepSeek

    Previously: PhD student @ Peking University

    Beijing, CN

    • Research Intern at DeepSeek, core contributor to DeepSeek's MoE architecture
    • Research expertise explicitly 'LLM backbone; MoE, interpretation & analysis of LLM'
    • Published 'Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts' (2025, 63 citations) — directly on expert load balancing, the core query topic
    • PhD student at PKU. h=6
    • Beijing-based
    PP

    Piotr Piekos

    high hireability

    PhD Student@KAUST

    Previously: Applied Scientist Intern @ Amazon

    Warsaw, PL

    • MoE attention and routing researcher. 'SwitchHead: Accelerating Transformers with MoE Attention' (23 cites, 2024) + 'Mixture of Sparse Attention: Content-Based Expert-Choice Routing' (2025)
    • PhD student at KAUST. h-index 4
    • Discovered via Joan Puigcerver (SoftMoE) collaboration
    • Expert-choice routing directly aligns with load balancing query
    TN

    TrungTin Nguyen

    high hireability

    MACSYS Postdoctoral Research Fellow (Applied Statistics)@Queensland University of Technology

    Previously: Postdoctoral Research Fellow @ University of Queensland

    Brisbane, AU

    • Postdoctoral Research Fellow at Queensland University of Technology (Applied Statistics/MACSYS)
    • Hireability 'high'
    • Dedicated MoE theorist: 'CompeteSMoE - Effective Training of Sparse MoE via Competition' (19 cites, 2024), 'HyperRouter: Towards Efficient Training and Inference of Sparse MoE' (15 cites, 2023), 'Towards Convergence Rates for Parameter Estimation in Gaussian-gated MoE' (20 cites, 2024), 'A non-asymptotic approach for model selection via penalization in high-dimensional MoE models' (25 cites, 2023)
    • Unique theory-to-practice profile bridging statistical MoE theory and efficient routing algorithms
    • LinkedIn update Feb 2026 indicates possible job search (new headline 'Statistician & Mathematician | Bridging Natural & Artificial Intelligence')
    VC

    Vitaliy Chiley

    high hireability

    Researcher@Meta

    Previously: Staff Research Scientist (Head of LLM Pretraining in MosaicAI Org) @ Databricks

    San Francisco, US

    • Researcher at Google DeepMind (recently moved from Meta)
    • Pinned repos include 'databricks/megablocks' — the MoE training library — plus llm-foundry and composer (MosaicML stack)
    • Published 'Training MoEs at Scale with PyTorch' (2024)
    • Strong LLM training background: co-authored MPT-7B (358 citations), DBRX (MoE model, 38 citations), LoRA Learns Less and Forgets Less (276 citations)
    • Deep MoE production experience via MosaicML/Databricks. h=8
    • SF-based
    XY

    Xiaozhe Yao

    high hireability

    Doctoral Student@ETH Zurich

    Previously: Research Scientist Intern @ Meta

    Zurich, CH

    • Doctoral Student at ETH Zurich (Zurich, CH), highly active coder (1374 GitHub contributions in 2024, 760 in 2025)
    • Published 'DeltaMoE: Memory-Efficient Inference for Merged Mixture of Experts with Delta Compression' (2025)
    • Primary research is ML systems and LLM inference (RedPajama, HexGen, DeltaZip)
    • MoE paper is recent but ML systems/inference focus is strong. h=8, hireability=high
    • Systems-oriented MoE work aligns with load balancing efficiency
    ZL

    Zhili Liu

    high hireability

    Ph.D. Candidate@Hong Kong University of Science and Technology

    Previously: Intern @ Huawei

    Hong Kong, CN

    • PhD candidate at HKUST, hireability=high, h=14
    • Research expertise: 'Diffusion Model, Mixture of Experts'
    • Top MoE paper: 'Mixture of cluster-conditional LoRA experts for vision-language instruction tuning' (2023, 113 cites)
    • Also 'Task-Customized Masked Autoencoder via Mixture of Cluster-conditional Experts' (2022, 30 cites), 'MoTE: Synergy of Thought Chains and Expert Mixtures' (2024)
    • Active research (13 website changes, multiple papers Jan 2026)
    • Based in Hong Kong
    AC

    Aakanksha Chowdhery

    medium hireability

    Member of Technical Staff@ReflectionAI

    Previously: Senior Staff Research Scientist @ Meta

    San Francisco, US

    • PhD, h-index 40
    • MoE researcher: authored DSelect-k differentiable top-k for MoE (2021) and sparse differentiable MoE routing (2022)
    • Now Member of Technical Staff at ReflectionAI in SF
    AA

    Ahmed Hassan Awadallah

    medium hireability

    Partner Research Manager@Microsoft

    • h-index 60
    • Strong MoE background: Sparsely Activated MoE (2023, 61 cites) and AutoMoE architecture search (2023, 17 cites)
    • Partner Research Manager at Microsoft
    AD

    Andrew M. Dai

    medium hireability
    • Senior researcher / co-founder at stealth startup (ex-Google DeepMind director)
    • Authored seminal 'Mixture-of-Experts with Expert Choice Routing' (NeurIPS 2022) and holds patent on 'Routing to expert subnetworks in mixture-of-experts neural networks' (2025). h_index ~56
    • Left Google DeepMind (Gemini Data Area Lead) Jan 2026 to found stealth startup — actively in motion and open to opportunity
    AK

    Aran Komatsuzaki

    medium hireability

    Founder@Stealth Startup

    Previously: Co-Founder @ XinobiAI

    San Francisco, US

    • Seminal MoE contributor: 'Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints' (2022, 191 cites) — pioneered dense→MoE conversion
    • Also co-authored GPT-J-6B (1075 cites) and LAION-400M (1873 cites)
    • Now Founder at stealth startup in SF. h=6
    • Despite hireability=low in DB, the MoE research impact is high and directly relevant
    BB

    Babak Ehteshami Bejnordi

    medium hireability

    Sr. Staff Engineer and Manager@Qualcomm

    Previously: Deep Learning and Computer Vision for Autonomous Driving @ Mapscape

    Amsterdam, NL

    • MoE and conditional computation researcher at Qualcomm AI Research, San Diego
    • Papers on 'Read-ME: Router-Decoupled MoE with System Co-Design' (2024), 'Revisiting Single-gated Mixtures of Experts' (2023, 8c), conditional gating networks (batch-shaping 97c), and early-exit methods
    • Bridges MoE routing with efficient inference systems. h=25
    • Hireability: MEDIUM
    BM

    Basil Mustafa

    medium hireability

    Staff Research Engineer@Google

    Previously: Senior Research Software Engineer @ Google

    Zurich, CH

    • Strong MoE researcher at Google Zurich (Staff Research Engineer). 6 MoE papers: V-MoE/Scaling Vision with Sparse MoE (NeurIPS 2021), LIMoE multimodal MoE (NeurIPS 2022), Sparse Upcycling (ICLR 2023), Soft MoE (ICLR 2024 spotlight), Sparse MoEs meet Ensembles
    • Conditional computation specialist
    • MS degree
    • Hireability flagged low by DB (stable Google role in Zurich), but MoE depth is exceptional
    BT

    Benjamin Thérien

    medium hireability

    Research Scientist Intern@Meta

    Previously: Applied Research Intern (LLM Pre-training) @ Capital One

    New York, US

    • Research Scientist Intern at Meta, PhD (Université de Montréal). 4 MoE papers focused directly on routing and training: 'Continual Pre-training of MoEs: How robust is your router?' (2025), 'Dense Backpropagation Improves Routing for Sparsely-Gated MoE' (2024), 'StructMoE: Structured MoE Using Low Rank Experts' (2024)
    • Strong focus on MoE routing robustness and training. h=7
    • Research expertise includes muP and LLMs
    • NY-based intern — likely finishing PhD soon
    CB

    Charlie Blake

    medium hireability

    AI research engineer@Graphcore

    Previously: MS student @ University of Oxford

    • Research expertise explicitly lists 'mixture-of-experts, routing networks' at Graphcore (IPU hardware company)
    • Strong numerics/low-precision background: SparQ Attention (77 cites), FP8 training (31 cites), unit scaling
    • No MoE-specific papers in DB but MoE routing is listed as core expertise
    • MS degree. h=5
    DD

    Damai Dai

    medium hireability

    Researcher@DeepSeek AI

    Previously: PhD student @ Peking University

    • Core DeepSeekMoE author at DeepSeek AI, Beijing
    • Led work on DeepSeekMoE (592c) with fine-grained expert segmentation and shared-expert isolation, StableMoE stable routing strategy (106c), representation collapse in sparse MoE (136c), and DeepSeek-V2 MoE architecture (600c)
    • Direct expert on MoE routing and load balancing. h=25
    • Hireability: MEDIUM
    DD

    Di Dai

    medium hireability

    PhD student@Peking University

    Previously: MS student @ Nanjing University

    Beijing, CN

    • DeepSeekMoE co-author at Peking University, Beijing
    • Deep involvement in DeepSeekMoE (592c) routing and expert specialization, StableMoE stable routing (106c), representation collapse in sparse MoE (136c), and DeepSeek-V2 MoE (600c)
    • Direct MoE routing and training expertise. h=25
    • Hireability: MEDIUM
    DT

    Dustin Tran

    medium hireability

    Member of Technical Staff@xAI

    Previously: Senior Staff Research Scientist @ DeepMind

    • h-index 48
    • MoE paper: Sparse MoEs meet Efficient Ensembles (2021, 30 cites)
    • Member of Technical Staff at xAI
    • Strong Bayesian/probabilistic ML background
    GH

    ghostplant

    medium hireability
    • Top contributor to microsoft/tutel MoE library (160+ merged PRs)
    • Deep MoE infra expertise: added DeepSeek/Kimi 1T-param support, FP8/NVFP4/MXFP4 gating, expert token sort APIs, cudaGraph-compatible all_reduce, ROCm support
    • Also contributes to mscclpp (GPU-driven comms)
    • Microsoft employee
    • No public identity but clearly the primary engineer driving Tutel's MoE routing/load balancing implementation
    HH

    Haiyang Huang

    medium hireability

    Software Engineer@Google

    Previously: PhD student @ Duke University

    San Francisco, US

    • Software Engineer at Google (SF), PhD
    • Research expertise lists 'mixture of experts, large language model, machine learning infrastructure, machine learning inference' directly
    • Published 'Toward Efficient Inference for Mixture of Experts' (2024, 47 citations) and 'Towards MoE Deployment: Mitigating Inefficiencies in MoE Inference' (2023)
    • Focus on MoE inference efficiency and load balancing at serving time
    • Strong interpretable ML and dim-reduction background. h=6
    HL

    Hanxiao Liu

    medium hireability

    Member of Technical Staff@Microsoft

    Previously: Member of Technical Staff @ Inflection AI

    San Francisco, US

    • Core MoE routing researcher: co-author of 'Expert Choice Routing' (514 citations), JetMoE (56 citations), Mod-Squad (154 citations), ModuleFormer (43 citations), 'Dense training sparse inference' (33 citations)
    • Research Scientist at Microsoft (formerly Google Brain). h_index 32
    • Expert Choice Routing is a foundational load-balancing paper directly matching query
    HZ

    Haozhen Zhang

    medium hireability

    Ph.D. Student in Computer Science@Nanyang Technological University

    Previously: Research Assistant @ The Hong Kong University of Science and Technology

    Singapore, SG

    • LLM routing researcher. 'Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via RL' (5 cites, 2025) + 'Fusing LLM Capabilities with Routing Data' (2025)
    • PhD student at NTU Singapore
    • Research expertise: 'LLM Routers'
    • Active website (28 changes, recent)
    • Note: routing here is LLM-to-LLM routing, not MoE expert routing — adjacent but not identical to load balancing
    HG

    Huazuo Gao

    medium hireability

    Researcher@DeepSeek

    Previously: Undergrad student @ Peking University

    • Researcher at DeepSeek. h=14
    • Author of 'Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts' (2025) — directly addresses MoE expert load balancing, the core search topic
    • BS degree only but at DeepSeek working on frontier MoE systems (DeepSeek-MoE/V2/V3 team)
    • Extremely targeted match for MoE load balancing research
    JW

    Jason Wei

    medium hireability

    Research Scientist@Meta

    Previously: Research Scientist @ OpenAI

    San Francisco, US

    • h-index 47, Research Scientist at Meta, SF
    • MoE + instruction tuning paper (2024, 117 cites)
    • Known for chain-of-thought prompting; contributed to scaling MoE models with instruction tuning
    JJ

    Juyong Jiang

    medium hireability

    Research Intern@NHN

    Previously: Research Assistant @ Hong Kong University of Science and Technology

    Seoul, KR

    • HKUST PhD (currently Research Intern at NHN, Seoul) with MoE survey authorship. 'A Survey on Mixture of Experts in Large Language Models' (110 citations, 2025) and 'Shortcut-connected Expert Parallelism for Accelerating Mixture of Experts' (24 citations, 2025)
    • Research expertise covers MoE, LLMs, PEFT, RL
    • Survey paper signals broad domain knowledge; expert parallelism paper shows systems-level practical work. h=10, hireability=high
    LD

    Li Dong

    medium hireability

    Partner Group Product Manager@Microsoft

    Previously: Principal Group Program Manager, Bing Relevance @ Microsoft

    Seattle, US

    • Senior Research Scientist at Google DeepMind (Amsterdam). h-index 53
    • Published on Sparse Upcycling (training MoE from dense checkpoints, 2022)
    • Known for ViT and Vision Transformer scaling work
    • MoE paper count is thin — only 1 direct paper — but upcycling work is directly relevant to expert load balancing
    • High seniority at a top lab
    • Hireability 'medium' with no LinkedIn changes or site activity
    MS

    Maciej Stefaniak

    medium hireability

    LLM Researcher@IDEAS NCBR

    Previously: Machine Learning Developer @ TIDK

    Warsaw, PL

    • Co-author on MoE scaling papers from IDEAS NCBR group: 'Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient' (2025) + 'Mu-Parametrization for Mixture of Experts' (2025)
    • LLM Researcher at IDEAS NCBR, Warsaw
    • PhD. h-index 1 (early career)
    • Research expertise explicitly listed as 'LLM MoE'
    • Part of the same strong MoE research cluster as Jakub Krajewski
    MM

    Mayank Mishra

    medium hireability

    Graduate Student Researcher@University of California, Berkeley

    Previously: Research Engineer-II @ MIT-IBM Watson AI Lab

    Berkeley, US

    • PhD student at UC Berkeley with 1075 GitHub contributions in 2024 and 1461 in 2025 — highly active
    • Research focus: efficient LLM architectures and distributed training
    • Key MoE paper: 'Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models' (33 cites, 2024)
    • Also co-authored BLOOM (2161 cites), StarCoder (1286 cites), Granite Code Models (81 cites)
    • LinkedIn education update Jan 2026 and recent website activity (new paper Feb 2026)
    • Strong practical MoE + LLM systems background, very active open-source contributor
    MT

    Mingjie Tang

    medium hireability

    working on LLM systems and algorithms@iQuest Research Lab

    Previously: tech lead @ Ant Group

    • LLM systems researcher at iQuest Research Lab. h=16
    • Direct MoE work: 'MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA based Mixture of Experts' (2024, 96 cites) and 'DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism' (2024)
    • Active research output (9 website changes in last year, multiple new papers Jan 2026)
    • PhD
    • Broad LLM systems background with MoE routing specialization
    ND

    Nan Du

    medium hireability

    Member of Technical Staff@OpenAI

    Previously: Principal Researcher @ Apple

    San Francisco, US

    OF

    Orhan Firat

    medium hireability

    Research Scientist@DeepMind

    Previously: Research Scientist @ Google

    New York, US

    • PhD, h-index 57, Research Scientist at DeepMind, NY
    • Routing Strategies for Multilingual MoE (2020) — early expert routing work for multilingual NMT
    • Also contributed to GShard and multilingual MT scaling
    PL

    Pingzhi Li

    medium hireability

    External Collaborator@Eigen AI

    Previously: Research Intern @ Apple

    San Francisco, US

    • Strong MoE routing researcher: 'Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy' (2024, 61 cites) — directly on MoE routing analysis
    • Also 'Advancing MoE Efficiency: C2R Strategy for Better Expert Parallelism' (2025), 'Hexa-MoE: Efficient and Heterogeneous-aware Training for MoE' (2024), 'QuantMoE-Bench' (20 cites), 'Finding Fantastic Experts in MoEs' (2025)
    • PhD @ UNC, now External Collaborator at Eigen AI, SF
    • LinkedIn shows recent move from Apple Research Intern. h=6
    • Multiple focused MoE routing papers
    SS

    Sambit Sahu

    medium hireability

    Vice President, Core LLM & Agentic AI, AI Foundations@Capital One

    Previously: Senior Engineering Manager, Alexa AI @ Amazon

    New York, US

    • PhD, h-index 55, VP Core LLM & Agentic AI at Capital One, NY
    • Active 2024-2025 MoE work: StructMoE structured expert design (2024), Dense Backpropagation for sparse MoEs (2024), Continual Pre-training of MoEs (2025)
    SB

    Shruti Bhosale

    medium hireability

    Research Engineer@Meta

    Previously: Research Engineer @ Meta

    San Francisco, US

    • MoE research engineer at Meta (Menlo Park)
    • Co-authored 'Efficient Large Scale Language Modeling with Mixtures of Experts' (238c), 'Towards MoE Deployment: Mitigating Inefficiencies in MoE Inference' (30c), 'Toward Efficient Inference for MoE' (17c), and 'Fixing MoE Over-fitting on Low-Resource Languages' (10c)
    • Practical MoE deployment and load-balancing experience. h=27
    • Hireability: MEDIUM
    SK

    Sneha Kudugunta

    medium hireability

    Researcher@Google

    Previously: Researcher @ DeepMind

    • Researcher at Google with PhD
    • Three dedicated MoE papers: 'Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference' (139 cites, 2021), 'Exploring Routing Strategies for Multilingual MoE Models' (2020), and a 2025 patent on routing within multitask MoE models
    • Primary research angle is MoE routing for multilingual/multitask NLP — directly relevant to expert load balancing
    • Also co-author on NLLB (1288 cites)
    • Strong signal from Google affiliation with continued MoE work through 2025
    SR

    Stephen Rawls

    medium hireability

    Researcher@CapitalOne

    Previously: Researcher @ Amazon

    • Researcher at Capital One with h=13 and 4 MoE-specific papers: 'Continual Pre-training of MoEs: How robust is your router?' (2025), 'Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts' (2024), 'StructMoE: Structured MoE Using Low Rank Experts' (2024), 'StructMoE: Augmenting MoEs with Hierarchically Routed Low Rank Experts' (2024)
    • Research directly targets MoE routing robustness and hierarchical routing design
    • Top publications are in face recognition and NLP (Alexa Teacher Model, 80 cites)
    • The MoE routing work is recent and highly relevant despite industry/finance context
    SS

    Suvinay Subramanian

    medium hireability

    Software Engineer@Google

    San Francisco, US

    • AI infra engineer at Google (recently shifted to 'AI Infrastructure' per LinkedIn change). h=16
    • Deep expertise in sparsity and MoE for hardware: 'Sparse SIMD Cross-lane Processing Unit', 'N:M Sparsity Training in Transformers', 'Journey Matters: Average Parameter Count...Unifies Sparse and Dense Scaling Laws' (2025)
    • Research expertise explicitly lists 'Mixture-of-Experts, Sparsity, Hardware-software Codesign'
    • TPU v4 co-author (599 cites)
    • Strong systems + MoE-sparsity intersection — rare hardware-aware MoE profile
    TZ

    Tianyi Zhou

    medium hireability

    Assistant Professor of Computer Science@University of Maryland

    Previously: Visiting Research Scientist @ Google

    College Park, US

    • h-index 52, Asst Prof CS at U Maryland, College Park
    • Recent MoE-specific research: MoE embedding model (2025, 23 cites), MoE re-routing strategies (2025)
    • Active expert routing contributor
    XZ

    Xingkui Zhu

    medium hireability

    PhD student@Huazhong University of Science and Technology

    CN

    • Research expertise: 'mixture of experts, dynamic networks, parameter-efficient fine-tuning'
    • Paper 'MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks' (2024, 7 cites) focuses on adapting dense→MoE routing for vision
    • PhD @ HUST
    • Strong vision model background (TPH-YOLOv5: 2270 cites, PointMamba: 268 cites) — MoE work appears to be a more recent pivot. h=6
    • China-based, no GitHub/LinkedIn
    XP

    Xinglin Pan

    medium hireability

    PhD student@Hong Kong University of Science and Technology (Guangzhou)

    Previously: Intern @ Hong Kong Baptist University

    Guangzhou, CN

    • PhD student at HKUST Guangzhou, MLSys focus
    • Strong MoE systems papers: 'ScheMoE: Extensible MoE Distributed Training System with Tasks Scheduling' (44 cites, 2024), 'PipeMoE: Accelerating Mixture-of-Experts through Adaptive Pipelining' (38 cites, 2023), 'FSMoE: Flexible and Scalable Training System for Sparse MoE Models' (8 cites, 2025)
    • Research is directly on MoE distributed training, scheduling, and pipelining — core load-balancing infrastructure
    • Also has LLM pruning work (Pruner-Zero, 66 cites)
    • Location in CN is a downside for US-focused roles
    XM

    Xupeng Miao

    medium hireability

    Assistant Professor@Purdue University

    Previously: Post Doctoral Fellow @ Carnegie Mellon University

    West Lafayette, US

    • MoE training systems specialist, now AP at Purdue
    • Built FlexMoE (dynamic device placement for sparse MoE, 69c), EvoMoE (dense-to-sparse gate training, 46c), HetuMoE (trillion-scale MoE distributed training, 44c), dense-to-sparse gate (28c), and NetMoE (dynamic sample placement, 2025)
    • Focused on MoE inference and training efficiency. h=26
    • Hireability: MEDIUM
    YX

    Yifan Xiong

    medium hireability
    • Microsoft engineer with MoE-adjacent infra focus
    • Contributed 2D hierarchical AlltoAll algorithm to tutel (key for MoE expert parallelism dispatch)
    • Pinned repos include microsoft/tutel, msccl (collective comms), hivedscheduler, superbenchmark — all core to distributed MoE training infra. 3 tutel commits (2022)
    • More infra/systems than algorithm-focused MoE
    YS

    Yikang Shen

    medium hireability

    Member of Technical Staff@xAI

    Previously: Staff Research Scientist @ IBM

    San Francisco, US

    • Expertise explicitly lists MoE and Transformers
    • Papers: JetMoE (56 citations), Mod-Squad (154 citations), Dense training/sparse inference (33 citations), Sparse Universal Transformer (30 citations)
    • MTS at xAI — active MoE practitioner. h_index 30
    • Strong fit for MoE routing and load balancing
    YT

    Yi Tay

    medium hireability

    Senior Staff Research Scientist@DeepMind

    Previously: Chief Scientist & Co-founder @ Reka AI

    SG

    YK

    Young Jin Kim

    medium hireability

    Member of Technical Staff@Microsoft

    Previously: Principal Research Manager / Principal Researcher @ Microsoft

    Seattle, US

    • Member of Technical Staff at Microsoft AI Superintelligence (recently promoted per LinkedIn). h=14
    • Research expertise explicitly: 'Mixture of experts, large language models, natural language processing, high performance computing'. 5 MoE papers: 'GRIN: GRadient-INformed MoE' (2024), 'SlimMoE: Structured Compression of Large MoE Models' (2025), 'AutoMoE: Heterogeneous MoE with Adaptive Computation' (2023), 'Task-Based MoE for Multitask Multilingual MT' (2023)
    • Systems-oriented MoE researcher at top lab
    ZH

    Zeyu Huang

    medium hireability

    PhD student@University of Edinburgh

    Edinburgh, GB

    • Edinburgh PhD with deeply focused MoE routing research
    • Papers: 'Layerwise Recurrent Router for MoE' (4 citations, 2024), 'A Closer Look into MoE in LLMs' (25 citations, 2024), 'Demons in the Detail: Load Balancing Loss for Training Specialized MoE' (11 citations, 2025)
    • Also 'Mixture of Attention Heads: Selecting Attention Heads Per Token' (80 citations)
    • Exactly on target for MoE routing and load balancing
    • Currently PhD at Edinburgh; website activity shows paper additions in 2025. h=10
    ZF

    Zhaoye Fei

    medium hireability

    PhD student@Fudan University

    Previously: intern @ Huawei

    Shanghai, CN

    • Fudan PhD with specific MoE router work: 'Turn Waste into Worth: Rectifying Top-k Router of MoE' (5 citations, 2024) and 'Towards More Effective and Economic Sparsely-Activated Model' (19 citations, 2021)
    • Also contributor to InternLM2 (506 citations)
    • Research expertise includes 'Mixture of Expert'
    • Work addresses wasted expert capacity in top-k routing — relevant to expert load balancing
    • Shanghai-based PhD student
    ZY

    Ziyue Yang

    medium hireability

    Software Engineer@Microsoft

    Previously: Software Engineer Intern @ Microsoft

    Seattle, US

    • Microsoft Research Asia SWE (Networking/AI Infra group) with 4 commits to microsoft/tutel MoE library
    • DB shows Tutel paper (193 citations) under their slug
    • PRs in microsoft/ltp-megatron-lm fixing MoE expert DP sharding bugs — touching megatron/core/transformer/moe/experts.py
    • Also has Tutel patents (dynamic gating, switchable parallel modes, collective communication at MoE layer)
    • Hands-on MoE systems engineer building production-scale MoE infrastructure at Microsoft
    • Located Seattle/Beijing
    AR

    Adam Roberts

    low hireability

    Director of Research@DeepMind

    Previously: Senior Staff Software Engineer @ DeepMind

    San Francisco, US


    Ahmet Üstün

    low hireability

    Code Agents Lead@Cohere

    Previously: Senior Research Scientist @ Cohere

    Groningen, NL

    • Direct MoE researcher: 'Pushing MoE to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning' (166 citations), 'BAM! Just Like That: Simple and Efficient Parameter Upcycling for MoE' (13 citations), Nexus adaptive upcycling
    • Research expertise explicitly includes Mixture of Experts + Efficient Deep Learning
    • Code Agents Lead at Cohere, previously Cohere For AI researcher. h=20
    • Hireability: LOW — senior Code Agents Lead role at Cohere
    AK

    Anastasios Kyrillidis

    low hireability

    Dean Fellow in AI/Computation@George R. Brown School of Engineering and Computing

    Previously: Goldstine Fellow @ IBM

    Houston, US

    • Sparse optimization theorist at Rice (Assoc. Prof.), now publishing on MoE routing theory
    • Recent papers: 'Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings' (2025, 14c), 'Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed MoE' (2025), federated MoE (Fed-ZERO, FedJETs)
    • Sparse methods background (h=30) directly applicable to expert routing and load balancing
    • US-based
    • Hireability: LOW (tenured/tenure-track)
    AP

    André Susano Pinto

    low hireability

    Software Engineer@DeepMind

    Previously: Intern @ Google

    CH

    • Direct MoE researcher at Google DeepMind
    • Authored 'Scaling Vision with Sparse MoE' (2021, 39 citations) and 'Processing images using MoE' (2024)
    • Research expertise explicitly includes 'mixture of experts, sparsity, large models'. h=19, SWE at DeepMind
    • Hireability: LOW — based in Switzerland, low hireability flag
    BZ

    Barret Zoph

    low hireability

    Something New@OpenAI

    Previously: CTO, Co-Founder @ Thinking Machines

    San Francisco, US

    • Switch Transformers co-author (3190 citations), GLaM MoE (963 citations), ST-MoE: Designing Stable and Transferable Sparse Expert Models (307 citations) — among the most impactful MoE routing papers ever published
    • Research Scientist at OpenAI in SF
    • Hireability: LOW — RS at OpenAI (frontier lab), unlikely to move, but exceptional MoE pedigree
    BC

    Bin CUI

    low hireability

    Professor@Peking University

    Previously: Research Fellow @ Singapore-MIT Alliance

    • PhD, h-index 69, Professor at Peking University
    • Highly active MoE researcher: NetMoE (2025), LSH-MoE communication-efficient routing (2024), FlexMoE dynamic architecture (2023)
    • Based in China
    CR

    Carlos Riquelme Ruiz

    low hireability

    Principal Researcher@Microsoft

    Previously: Head of Language Models Team @ Stability AI

    Madrid, ES

    • Directly authored landmark MoE routing papers: 'Scaling Vision with Sparse MoE' (981 citations), 'From Sparse to Soft Mixtures of Experts' (235 citations), 'LIMoE' (332 citations), and 'Routers in Vision MoE: An Empirical Study'
    • Core MoE routing/sparse gating expert
    • Principal Researcher now at Microsoft Superintelligence Team (recently moved from Google Brain)
    • Hireability: LOW — senior Principal Researcher role, PhD, Madrid-based
    EH

    Ethan He

    low hireability

    Member of Technical Staff@xAI

    Previously: Staff Engineer @ NVIDIA

    San Francisco, US

    • Direct MoE work: 'Upcycling LLMs into MoE' (2024, 25 citations) and 'Llama 3 meets MoE: Efficient Upcycling' (2024, 5 citations)
    • Research expertise lists 'mixture of experts' explicitly
    • MTS at xAI, previously NVIDIA/Meta
    • Working on video generation with MoE backbone (Grok Imagine). h=19, 9k citations
    • Hireability: LOW — hireability flag is low
    FW

    Furu Wei

    low hireability

    Chief Scientist@Microsoft

    Previously: Partner Research Manager @ Microsoft

    Beijing, CN

    • Multiple direct MoE papers: Multi-Head MoE (2024), Mixture of LoRA Experts (2024, 152 citations), On Representation Collapse of Sparse MoE (2022), VLMo (modality-expert MoE, 2022)
    • Strong MoE architecture research at Microsoft Research
    • Hireability: LOW — Chief Scientist at Microsoft Beijing, very senior (h-index 120)
    GN

    Guoshun Nan

    low hireability

    Full Professor@Beijing University of Posts and Telecommunications

    Previously: Tenure-tracked Professor @ Beijing University of Posts and Telecommunications

    Beijing, CN

    • Published 'Advancing Expert Specialization for Better MoE' (2025, 1 citation) — directly on MoE expert specialization
    • However, core research expertise is Video LLMs, semantic communications, multimodal understanding — MoE is peripheral
    • Full Professor at Beijing University of Posts and Telecom, China-based. h=17
    • Hireability: not set
    HH

    Hannaneh Hajishirzi

    low hireability

    Senior Director@Allen Institute for Artificial Intelligence

    Previously: Senior Director @ Allen Institute for Artificial Intelligence

    • Co-author of OLMoE (Open Mixture-of-Experts LM, 2025, 110 citations) — direct MoE LLM architecture
    • Also 'SHARCS: Efficient Transformers Through Routing with Dynamic Width Sub-networks' (2023)
    • Senior Director at AI2 (OLMo/OLMoE project)
    • Hireability: LOW — Senior Director at AI2, very senior leadership (h-index 92)
    HG

    Hongcheng Gao

    low hireability

    Incoming PhD student@College of AI at Tsinghua University

    Previously: Intern @ Tsinghua University

    Beijing, CN

    • Strong direct MoE routing work: 'AdaMoE: Token-Adaptive Routing with Null Experts for MoE LMs' (2024, 27 citations) — proposes adaptive top-k routing with null experts and load-balancing loss
    • Incoming PhD at Tsinghua, h=20, highly active website (21 recent changes)
    • Core MoE routing/load balancing research
    • Hireability: LOW — China-based incoming PhD student
    HC

    Hyung Won Chung

    low hireability

    AI Research Scientist@Meta

    Previously: Research Scientist @ OpenAI

    San Francisco, US

    IT

    Ivan Titov

    low hireability

    Full Professor@ILCC, School of Informatics, University of Edinburgh and ILLC, University of Amsterdam

    Edinburgh, GB

    • PhD, h-index 55
    • Recent MoE-specific work: Load Balancing Loss for MoE (2025) and Layerwise Recurrent Router for MoE (2025)
    • Full Professor at U Edinburgh/U Amsterdam
    • Based in Edinburgh, GB
    JG

    Jianfeng Gao

    low hireability

    Distinguished Scientist & Vice President@Microsoft

    Previously: Partner Research Manager in Business AI @ Microsoft

    Woodinville, US

    • Multiple direct MoE papers: GRIN (Gradient-Informed MoE, 2024), AutoMoE (NAS for sparse MoE, 2022), sparse MoE pruning, sparsely activated MoE multi-task learners
    • Strong MoE architecture and routing expertise at Microsoft
    • Hireability: LOW — Distinguished Scientist & VP at Microsoft, very senior (h-index 139)
    JP

    Joan Puigcerver

    low hireability

    Senior Software Engineer in Research@Google

    Previously: Software Engineer in Research @ Google

    Zurich, CH

    • Direct MoE routing specialist at Google Zurich
    • Author of 'From Sparse to Soft Mixtures of Experts', 'Routers in Vision MoE: An Empirical Study', sparse upcycling MoE, LIMoE, and fast differentiable top-k for routing
    • One of the most focused MoE routing researchers in the field. h=29
    • Hireability: LOW (likely comfortable at Google)
    JZ

    Jun Zhu

    low hireability

    Professor@Tsinghua University

    Previously: Adjunct Faculty @ Carnegie Mellon University

    CN

    • Co-author of 'ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing' (2025) — direct MoE routing innovation replacing top-k with ReLU gating for fully differentiable training
    • Direct contribution to MoE routing mechanism design
    • Hireability: LOW — Professor at Tsinghua University (h-index 97)
    LZ

    Luke Zettlemoyer

    low hireability

    Research Scientist@Meta

    Previously: Research Lead @ Allen Institute for AI

    Seattle, US

    • OLMoE co-author (Open MoE Language Model, 2025, 110 citations) — direct sparse MoE architecture work
    • Also 'Efficient Large Scale Language Modeling with Mixtures of Experts' (2021) and MoMa (modality-aware MoE experts, 2024, 279 citations)
    • Research Scientist at Meta
    • Hireability: LOW — Full Professor at UW + Meta Research Scientist, very senior (h-index 135)
    MF

    Mehrdad Farajtabar

    low hireability

    Research Scientist/Team Lead@Apple

    Previously: Research Scientist @ DeepMind

    Seattle, US

    • MoE expertise listed explicitly; 2 recent MoE papers (2025): 'From Dense to Dynamic: Token-Difficulty Driven MoEfication' and 'MoE-PHDS: flexible runtime sparsity'
    • Research Scientist at Apple, h_index 37
    • Hireability low but active in MoE research
    MD

    Mostafa Dehghani

    low hireability

    Research Scientist@Google

    Previously: Researcher @ Apple

    Amsterdam, NL

    • h-index 53, Research Scientist at Google, Amsterdam
    • Authored Sparse Upcycling (2022) — converting dense models to sparse MoEs
    • Strong transformer architecture background (Vision Transformer, PerceiverIO)
    NH

    Neil Houlsby

    low hireability

    Member of Technical Staff@Anthropic

    Previously: Senior Staff Research Scientist @ Google

    Zurich, CH

    • Senior MoE researcher (Anthropic MTS, ex-Google Brain)
    • Led V-MoE (NeurIPS 2021), Soft MoE (ICLR 2024 spotlight), LIMoE (NeurIPS 2022), Scaling Laws for Sparsely-Connected Foundation Models (ICLR 2024 spotlight), Sparse Upcycling (ICLR 2023)
    • Pioneered adapter modules. 7 MoE/sparse papers
    • PhD
    • Currently at Anthropic (hireability low — unlikely to leave), but caliber is top-tier
    QL

    Quoc V Le

    low hireability

    Research Scientist@Google

    Previously: Research Visitor @ Max Planck Institute for Biological Cybernetics

    San Francisco, US

    • Co-author of 'Mixture-of-Experts with Expert Choice Routing' (2022/2023) and 'Diversity and Depth in Per-Example Routing Models' (2018)
    • Direct MoE routing + expert load balancing expertise
    • Also 'Routing to expert subnetworks in MoE neural networks' (2025)
    • Hireability: LOW — Research Scientist/senior leader at Google, very senior (h-index 147)
    SK

    Souvik Kundu

    low hireability

    Inference and SLM Optimization Lead@Intel

    Previously: Staff Research Scientist @ Intel

    Los Angeles, US

    • Research expertise explicitly includes 'Mixture of Experts' and 'Inference Efficiency and Optimizations for LLMs'
    • Has 'CITER: Collaborative Inference for Efficient LLM Decoding with Token-Level Routing' (2024) directly on routing
    • Active on website with recent papers
    • Intel Inference and SLM Optimization Lead; h=22
    • Hireability: LOW — senior industry optimization lead at Intel
    TN

    Tan Minh Nguyen

    low hireability

    Assistant Professor@National University of Singapore

    Previously: Postdoctoral Scholar @ University of California, Los Angeles

    Singapore, SG

    • Direct MoE work: 'MomentumSMoE: Integrating Momentum into Sparse MoE' (2024, 3 citations) and 'MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling' (2025, 1 citation)
    • Research expertise explicitly lists 'mixture of experts, state-space models'
    • Assistant Professor at NUS, h=20
    • Hireability: LOW — hireability flag is low
    WD

    Wei Dong

    low hireability

    Associate Professor@Xi'an University of Architecture and Technology

    Previously: PhD student @ Northwest Polytechnical University

    Xi'an, CN

    • Research expertise includes 'Mixture of Experts' alongside Parameter Efficient Fine-Tuning and Self-supervised Learning
    • Associate Professor at Xi'an University of Architecture and Technology, China. h=24
    • No MoE-specific papers retrieved from DB but MoE is listed expertise
    • China-based significantly limits mobility
    • Hireability: LOW — professor in China
    WC

    Weizhu Chen

    low hireability

    Technical Fellow and CVP@Microsoft

    Previously: Vice President @ Microsoft

    Redmond, US

    • h-index 73, Technical Fellow & CVP at Microsoft, Redmond
    • GRIN MoE (2024) — gradient-informed expert routing; MoEBERT (2022, 76 cites) — MoE for BERT efficiency
    • Core MoE architecture researcher
    WF

    William Fedus

    low hireability

    OpenAI

    • PhD, h-index 40, at OpenAI
    • Co-author of Switch Transformer (2022, seminal MoE routing paper) and MoE + Instruction Tuning
    • One of the most cited researchers specifically in MoE routing architecture
    WJ

    Wittawat Jitkrittum

    low hireability

    Research Scientist@DeepMind

    Previously: Research Scientist @ Google

    New York, US

    • Research expertise includes 'routing, model ensembles, cascade, adaptive computation'
    • Papers on 'Universal model routing for efficient LLM inference' (17 citations, 2025) and 'Universal LLM Routing with Correctness-Based Representation' (4 citations, 2025)
    • Won Google Tech Impact Award 2024 specifically for model routing work
    • Research Scientist at Google DeepMind, NY. h=22
    • Hireability: LOW — senior Research Scientist at DeepMind
    XN

    Xiaonan Nie

    low hireability

    Staff Research Scientist@ByteDance

    Previously: Technical Lead @ Tencent

    San Francisco, US

    • Core MoE systems researcher with 5 MoE papers: FlexMoE (2023, 69 citations) addresses routing imbalance/fluctuation via dynamic device placement; HetuMoE (2022, 44 citations) trillion-scale MoE training; EvoMoE (2021, 43 citations) dense-to-sparse gating; Dense-to-Sparse Gate (2021, 28 citations) load-balanced MoE gating; NetMoE (2025, 5 citations) dynamic sample placement
    • Staff RS at ByteDance SF. h=17
    • Expertise: LLM + distributed ML systems
    • Hireability: LOW — hireability flag is low
    XQ

    Xipeng Qiu

    low hireability

    Professor@Fudan University

    Shanghai, CN

    • h-index 78, Professor at Fudan University, Shanghai
    • MoE routing paper: Turn Waste into Worth: Rectifying Top-k Router of MoE (2024)
    • Prolific NLP researcher; based in China
    YH

    Yanping Huang

    low hireability

    Engineer@Google

    Previously: PhD student @ University of Washington

    • Exceptional MoE routing pedigree: GLaM (1174 citations, MoE LLM scaling), Expert Choice Routing (514 citations, foundational load-balancing paper), ST-MoE (307 citations, stable sparse expert training), Beyond Distillation MoE (139 citations)
    • Research Scientist at Google, h_index 30
    • Directly on-target for MoE routing and expert load balancing
    YZ

    Yanqi Zhou

    low hireability

    Staff Research Scientist@Google

    Previously: Senior Research Scientist @ Google

    San Francisco, US

    • Co-author of 'Mixture-of-Experts with Expert Choice Routing' (2022/2023) — a seminal load-balancing routing paper from Google Brain
    • Also 'MoE meets Instruction Tuning' (2024)
    • Research Scientist at Google Brain, h_index 36
    • Directly on-target for MoE routing and expert load balancing
    YZ

    Yifan Zhu

    low hireability

    Assistant Professor@Beijing University of Posts and Telecommunications

    Previously: Postdoctoral Research Fellow @ Tsinghua University

    Beijing, CN

    • Research expertise explicitly includes 'mixture of experts'
    • Has 'PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning' (2025) and 'A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models' (2025)
    • Active website with 37 changes through Feb 2026
    • Assistant Professor at BUPT Beijing. h=23
    • Hireability: LOW — faculty, China-based
    ZX

    Zhaozhuo Xu

    low hireability

    Assistant Professor@Stevens Institute of Technology

    Previously: PhD student @ Rice University

    • Has 'Replicate and Quantize: A Plug-and-Play Strategy for Load Balancing in Sparse MoE LLMs' (2025) — directly on MoE expert load balancing
    • Background in approximate nearest neighbor search (used in sparse routing)
    • Active with papers through late 2025
    • Assistant Professor at Stevens Institute. h=21
    • Hireability: LOW — faculty role
    ZL

    Zhenhuan Liu

    low hireability

    Software Engineer@NVIDIA

    Previously: Researcher @ NVIDIA

    Beijing, CN

    • Software Engineer at NVIDIA with two MoE-specific papers: 'Llama 3 Meets MoE: Efficient Upcycling' (2024, 5 cites) and 'MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core' (2025, 2 cites)
    • Research expertise includes 'MoE models, Distributed LLM Training'
    • Background is primarily image generation; MoE work appears more recent
    • Beijing, CN. h=5, hireability=low
    ZQ

    Zihan Qiu

    low hireability

    Researcher@Qwen

    Previously: Research Intern @ INF Technology

    Beijing, CN

    • Qwen researcher (Alibaba) working on scalable LLMs
    • Papers: HyperMoE (24 citations, 2024), 'A Closer Look into MoE in LLMs' (23 citations), 'GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory' (3 citations), 'Demons in the Detail: Load Balancing Loss' (6 citations)
    • Co-author Qwen2.5 (3885 citations)
    • Research expertise explicitly lists 'Mixture of Experts, Modular Networks'
    • LinkedIn updated Jan 2026 to 'not currently open to new opportunities' — low availability signal
    • Beijing-based

    Runs

#1 · completed · 93 qualified / 126 found · Apr 20, 1:44 PM