
SDK orchestrator UI retest — researchers working on MoE routing and expert load…

Completed · 93 qualified · 1 run · Apr 20, 1:44 PM · sdk-orchestrator-ui-retest-researchers-working-on-moe-routin
Parsed 4 topics · Researcher
Pipeline stages:
1. Generating seed nodes — 0 proposed
2. Exploring queries — 0 explored, 0/0 done
3. Expanding nodes — queued
4. Qualifying candidates — queued

    Qualified Candidates (92)

    AB

    Abhimanyu Rajeshkumar Bambhaniya

    high hireability

    Research Intern@Meta

    Previously: Intern @ Google

    San Francisco, US

    • Direct MoE acceleration researcher at Meta (intern, Georgia Tech PhD)
    • Paper 'MoE-ERAS: Expert Residency Aware Selection' (2024) is on-topic expert selection for MoE
    • Research expertise explicitly includes 'MoE Model Acceleration, Sparse model training, HW-SW codesign'. h=6, hireability=high
    • SF-based
    AP

    Ashwinee Panda

    high hireability

    Postdoctoral Fellow@University of Maryland

    Previously: PhD Candidate @ Princeton University

    San Francisco, US

    • Active MoE routing researcher
    • PhD (postdoc UMD/TogetherAI)
    • Papers directly on MoE: 'Dense Backpropagation Improves Training for Sparse MoE' (NeurIPS 2025), 'Continual Pre-training of MoEs: How robust is your router?' (2025), 'StructMoE' (ICLR 2025 withdrawn)
    • ICLR 2025 outstanding paper award
    • Recent website activity (Feb 2026 new paper)
    • Open Philanthropy grantee
    • On the job market now
    CX

    Chaojun Xiao

    high hireability

    Post-Doctoral Researcher@Tsinghua University

    Previously: Business Development Intern @ P.E.R.K. Consulting

    Beijing, CN

    • Research expertise explicitly lists 'mixture of expert, pre-training'
    • Has 'BlockFFN: Towards End-Side Acceleration-Friendly MoE with Chunk-Level Activation Sparsity' (2025)
    • Post-Doctoral Researcher at Tsinghua, h=22
    • Website active until Dec 2025
    • China-based limits mobility
    • Hireability: HIGH — postdoc is prime hiring target despite China location
    CZ

    Chenggang Zhao

    high hireability

    infra@DeepSeek AI

    ex-NVIDIA, SenseTime

    Hangzhou, CN

    • Core DeepSeek infra engineer with direct MoE architecture authorship
    • Co-authored DeepSeekMoE (575 citations, ultimate expert specialization), DeepSeek-V2 (531 citations, MoE LLM), and Auxiliary-Loss-Free Load Balancing (49 citations) — the most important recent paper on MoE load balancing without auxiliary loss
    • Also on DeepSeek-R1 (4614 citations) and DeepSeek-V3 (2341 citations)
    • Currently at DeepSeek AI infra team in Hangzhou
    • Tsinghua background
    CS

    Chenyang Song

    high hireability

    PhD student@Tsinghua University

    Beijing, CN

    • Research expertise: 'Mixture of Experts, Activation Sparsity, LLM'. 'BlockFFN: Towards End-Side Acceleration-Friendly MoE with Chunk-Level Activation Sparsity' (2025)
    • Strong activation sparsity background: ReLU2 Wins (47 cites), ProSparse (38 cites), Sparsing Law (9 cites)
    • PhD @ Tsinghua. hireability=high, score=60.85
    • MoE + sparsity intersection is directly relevant
    • Beijing, CN
    FS

    Filip Szatkowski

    high hireability

    PhD Student@IDEAS NCBR

    Previously: Applied Scientist Intern @ Amazon

    Warsaw, PL

    • MoE conversion and adaptive computation researcher. 'Exploiting Activation Sparsity with Dense to Dynamic-k MoE Conversion' (13 cites, 2024) — converts dense models to dynamic MoE via activation sparsity
    • PhD student at IDEAS NCBR, h-index 4
    • Broader expertise in speculative decoding, knowledge distillation, conditional computation
    • GitHub active (fszatkowski)
    HN

    Huy Nguyen

    high hireability

    PhD Candidate@Department of Statistics and Data Sciences, The University of Texas at Austin

    Previously: Research Intern @ Microsoft

    Austin, US

    • PhD candidate at UT Austin, hireability=high, h=14
    • Dedicated MoE routing theorist: 5 MoE papers including 'Demystifying Softmax Gating Function in Gaussian Mixture of Experts' (2023, 43 cites), 'Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts' (2023, 26 cites), 'On Least Square Estimation in Softmax Gating MoE' (2024, 22 cites), 'FuseMoE' (2024, 54 cites)
    • Strong theoretical grounding in MoE gating/routing from optimal transport perspective
    • Very active output (27 website changes, prev intern at Microsoft AI)
    JK

    Jakub Krajewski

    high hireability

    Head of Biofuels Development Office@ORLEN

    Previously: Biofuels Development Program Manager @ ORLEN

    Warsaw, PL

    • Core MoE scaling researcher. 'Scaling Laws for Fine-Grained MoE' (83 cites) + 'MoE-Mamba' (80 cites) + 'Joint MoE Scaling Laws' + 'Mu-Parametrization for MoE' + 'Scaling Fine-Grained MoE Beyond 50B'
    • PhD student at IDEAS NCBR (Warsaw), previously interned at Apple and Nvidia on LLMs. h-index 4
    • One of the most relevant candidates for MoE architecture and expert routing research
    JL

    Jan Ludziejewski

    high hireability

    AI Research Scientist@Mistral AI

    Previously: Doctoral Researcher @ IDEAS NCBR

    Warsaw, PL

    • AI Research Scientist at Mistral AI working directly on MoE scaling laws
    • Research expertise explicitly listed as 'LLM, Mixture of Experts, Scaling Laws, Pretraining'
    • Published 5 MoE papers including 'Scaling Laws for Fine-Grained Mixture of Experts' (83 citations), 'MoE-Mamba' (80 citations), and 'Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient' (2025)
    • Direct production MoE experience at Mistral, one of the top MoE model builders
    • PhD in ML (Warsaw). h=6
    • Poland-based
    LW

    Lean Wang

    high hireability

    Research Intern@DeepSeek

    Previously: PhD student @ Peking University

    Beijing, CN

    • Research Intern at DeepSeek, core contributor to DeepSeek's MoE architecture
    • Research expertise explicitly 'LLM backbone; MoE, interpretation & analysis of LLM'
    • Published 'Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts' (2025, 63 citations) — directly on expert load balancing, the core query topic
    • PhD student at PKU. h=6
    • Beijing-based
    PP

    Piotr Piekos

    high hireability

    PhD Student@KAUST

    Previously: Applied Scientist Intern @ Amazon

    Warsaw, PL

    • MoE attention and routing researcher. 'SwitchHead: Accelerating Transformers with MoE Attention' (23 cites, 2024) + 'Mixture of Sparse Attention: Content-Based Expert-Choice Routing' (2025)
    • PhD student at KAUST. h-index 4
    • Discovered via Joan Puigcerver (SoftMoE) collaboration
    • Expert-choice routing directly aligns with load balancing query
    TN

    TrungTin Nguyen

    high hireability

    MACSYS Postdoctoral Research Fellow (Applied Statistics)@Queensland University of Technology

    Previously: Postdoctoral Research Fellow @ University of Queensland

    Brisbane, AU

    • Postdoctoral Research Fellow at Queensland University of Technology (Applied Statistics/MACSYS)
    • Hireability 'high'
    • Dedicated MoE theorist: 'CompeteSMoE - Effective Training of Sparse MoE via Competition' (19 cites, 2024), 'HyperRouter: Towards Efficient Training and Inference of Sparse MoE' (15 cites, 2023), 'Towards Convergence Rates for Parameter Estimation in Gaussian-gated MoE' (20 cites, 2024), 'A non-asymptotic approach for model selection via penalization in high-dimensional MoE models' (25 cites, 2023)
    • Unique theory-to-practice profile bridging statistical MoE theory and efficient routing algorithms
    • LinkedIn update Feb 2026 indicates possible job search (new headline 'Statistician & Mathematician | Bridging Natural & Artificial Intelligence')
    VC

    Vitaliy Chiley

    high hireability

    Researcher@Meta

    Previously: Staff Research Scientist (Head of LLM Pretraining in MosaicAI Org) @ Databricks

    San Francisco, US

    • Researcher at Google DeepMind (recently moved from Meta)
    • Pinned repos include 'databricks/megablocks' — the MoE training library — plus llm-foundry and composer (MosaicML stack)
    • Published 'Training MoEs at Scale with PyTorch' (2024)
    • Strong LLM training background: co-authored MPT-7B (358 citations), DBRX (MoE model, 38 citations), LoRA Learns Less and Forgets Less (276 citations)
    • Deep MoE production experience via MosaicML/Databricks. h=8
    • SF-based
    XY

    Xiaozhe Yao

    high hireability

    Doctoral Student@ETH Zurich

    Previously: Research Scientist Intern @ Meta

    Zurich, CH

    • Doctoral Student at ETH Zurich (Zurich, CH), highly active coder (1374 GitHub contributions in 2024, 760 in 2025)
    • Published 'DeltaMoE: Memory-Efficient Inference for Merged Mixture of Experts with Delta Compression' (2025)
    • Primary research is ML systems and LLM inference (RedPajama, HexGen, DeltaZip)
    • MoE paper is recent but ML systems/inference focus is strong. h=8, hireability=high
    • Systems-oriented MoE work aligns with load balancing efficiency
    ZL

    Zhili Liu

    high hireability

    Ph.D. Candidate@Hong Kong University of Science and Technology

    Previously: Intern @ Huawei

    Hong Kong, CN

    • PhD candidate at HKUST, hireability=high, h=14
    • Research expertise: 'Diffusion Model, Mixture of Experts'
    • Top MoE paper: 'Mixture of cluster-conditional LoRA experts for vision-language instruction tuning' (2023, 113 cites)
    • Also 'Task-Customized Masked Autoencoder via Mixture of Cluster-conditional Experts' (2022, 30 cites), 'MoTE: Synergy of Thought Chains and Expert Mixtures' (2024)
    • Active research (13 website changes, multiple papers Jan 2026)
    • Based in Hong Kong
    AC

    Aakanksha Chowdhery

    medium hireability

    Member of Technical Staff@ReflectionAI

    Previously: Senior Staff Research Scientist @ Meta

    San Francisco, US

    • PhD, h-index 40
    • MoE researcher: authored DSelect-k differentiable top-k for MoE (2021) and sparse differentiable MoE routing (2022)
    • Now Member of Technical Staff at ReflectionAI in SF
    AA

    Ahmed Hassan Awadallah

    medium hireability

    Partner Research Manager@Microsoft

    • h-index 60
    • Strong MoE background: Sparsely Activated MoE (2023, 61 cites) and AutoMoE architecture search (2023, 17 cites)
    • Partner Research Manager at Microsoft
    AD

    Andrew M. Dai

    medium hireability
    • Senior researcher / co-founder at stealth startup (ex-Google DeepMind director)
    • Authored seminal 'Mixture-of-Experts with Expert Choice Routing' (NeurIPS 2022) and holds patent on 'Routing to expert subnetworks in mixture-of-experts neural networks' (2025). h_index ~56
    • Left Google DeepMind (Gemini Data Area Lead) Jan 2026 to found stealth startup — actively in motion and open to opportunity
    AK

    Aran Komatsuzaki

    medium hireability

    Founder@Stealth Startup

    Previously: Co-Founder @ XinobiAI

    San Francisco, US

    • Seminal MoE contributor: 'Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints' (2022, 191 cites) — pioneered dense→MoE conversion
    • Also co-authored GPT-J-6B (1075 cites) and LAION-400M (1873 cites)
    • Now Founder at stealth startup in SF. h=6
    • Despite hireability=low in DB, the MoE research impact is high and directly relevant
    BB

    Babak Ehteshami Bejnordi

    medium hireability

    Sr. Staff Engineer and Manager@Qualcomm

    Previously: Deep Learning and Computer Vision for Autonomous Driving @ Mapscape

    Amsterdam, NL

    • MoE and conditional computation researcher at Qualcomm AI Research, San Diego
    • Papers on 'Read-ME: Router-Decoupled MoE with System Co-Design' (2024), 'Revisiting Single-gated Mixtures of Experts' (2023, 8c), conditional gating networks (batch-shaping 97c), and early-exit methods
    • Bridges MoE routing with efficient inference systems. h=25
    • Hireability: MEDIUM
    BM

    Basil Mustafa

    medium hireability

    Staff Research Engineer@Google

    Previously: Senior Research Software Engineer @ Google

    Zurich, CH

    • Strong MoE researcher at Google Zurich (Staff Research Engineer). 6 MoE papers: V-MoE/Scaling Vision with Sparse MoE (NeurIPS 2021), LIMoE multimodal MoE (NeurIPS 2022), Sparse Upcycling (ICLR 2023), Soft MoE (ICLR 2024 spotlight), Sparse MoEs meet Ensembles
    • Conditional computation specialist
    • MS degree
    • Hireability flagged low by DB (stable Google role in Zurich), but MoE depth is exceptional
    BT

    Benjamin Thérien

    medium hireability

    Research Scientist Intern@Meta

    Previously: Applied Research Intern (LLM Pre-training) @ Capital One

    New York, US

    • Research Scientist Intern at Meta, PhD (Université de Montréal). 4 MoE papers focused directly on routing and training: 'Continual Pre-training of MoEs: How robust is your router?' (2025), 'Dense Backpropagation Improves Routing for Sparsely-Gated MoE' (2024), 'StructMoE: Structured MoE Using Low Rank Experts' (2024)
    • Strong focus on MoE routing robustness and training. h=7
    • Research expertise includes muP and LLMs
    • NY-based intern — likely finishing PhD soon
    CB

    Charlie Blake

    medium hireability

    AI research engineer@Graphcore

    Previously: MS student @ University of Oxford

    • Research expertise explicitly lists 'mixture-of-experts, routing networks' at Graphcore (IPU hardware company)
    • Strong numerics/low-precision background: SparQ Attention (77 cites), FP8 training (31 cites), unit scaling
    • No MoE-specific papers in DB but MoE routing is listed as core expertise
    • MS degree. h=5
    DD

    Damai Dai

    medium hireability

    Researcher@DeepSeek AI

    Previously: PhD student @ Peking University

    • Core DeepSeekMoE author at DeepSeek AI, Beijing
    • Led work on DeepSeekMoE (592c) with fine-grained expert segmentation and shared-expert isolation, StableMoE stable routing strategy (106c), representation collapse in sparse MoE (136c), and DeepSeek-V2 MoE architecture (600c)
    • Direct expert on MoE routing and load balancing. h=25
    • Hireability: MEDIUM
    DD

    Di Dai

    medium hireability

    PhD student@Peking University

    Previously: MS student @ Nanjing University

    Beijing, CN

    • DeepSeekMoE co-author at Peking University, Beijing
    • Deep involvement in DeepSeekMoE (592c) routing and expert specialization, StableMoE stable routing (106c), representation collapse in sparse MoE (136c), and DeepSeek-V2 MoE (600c)
    • Direct MoE routing and training expertise. h=25
    • Hireability: MEDIUM
    DT

    Dustin Tran

    medium hireability

    Member of Technical Staff@xAI

    Previously: Senior Staff Research Scientist @ DeepMind

    • h-index 48
    • MoE paper: Sparse MoEs meet Efficient Ensembles (2021, 30 cites)
    • Member of Technical Staff at xAI
    • Strong Bayesian/probabilistic ML background
    GH

    ghostplant

    medium hireability
    • Top contributor to microsoft/tutel MoE library (160+ merged PRs)
    • Deep MoE infra expertise: added DeepSeek/Kimi 1T-param support, FP8/NVFP4/MXFP4 gating, expert token sort APIs, cudaGraph-compatible all_reduce, ROCm support
    • Also contributes to mscclpp (GPU-driven comms)
    • Microsoft employee
    • No public identity but clearly the primary engineer driving Tutel's MoE routing/load balancing implementation
    HH

    Haiyang Huang

    medium hireability

    Software Engineer@Google

    Previously: PhD student @ Duke University

    San Francisco, US

    • Software Engineer at Google (SF), PhD
    • Research expertise lists 'mixture of experts, large language model, machine learning infrastructure, machine learning inference' directly
    • Published 'Toward Efficient Inference for Mixture of Experts' (2024, 47 citations) and 'Towards MoE Deployment: Mitigating Inefficiencies in MoE Inference' (2023)
    • Focus on MoE inference efficiency and load balancing at serving time
    • Strong interpretable ML and dim-reduction background. h=6
    HL

    Hanxiao Liu

    medium hireability

    Member of Technical Staff@Microsoft

    Previously: Member of Technical Staff @ Inflection AI

    San Francisco, US

    • Core MoE routing researcher: co-author of 'Expert Choice Routing' (514 citations), JetMoE (56 citations), Mod-Squad (154 citations), ModuleFormer (43 citations), 'Dense training sparse inference' (33 citations)
    • Research Scientist at Microsoft (formerly Google Brain). h_index 32
    • Expert Choice Routing is a foundational load-balancing paper directly matching query
    HZ

    Haozhen Zhang

    medium hireability

    Ph.D. Student in Computer Science@Nanyang Technological University

    Previously: Research Assistant @ The Hong Kong University of Science and Technology

    Singapore, SG

    • LLM routing researcher. 'Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via RL' (5 cites, 2025) + 'Fusing LLM Capabilities with Routing Data' (2025)
    • PhD student at NTU Singapore
    • Research expertise: 'LLM Routers'
    • Active website (28 changes, recent)
    • Note: routing here is LLM-to-LLM routing, not MoE expert routing — adjacent but not identical to load balancing
    HG

    Huazuo Gao

    medium hireability

    Researcher@DeepSeek

    Previously: Undergrad student @ Peking University

    • Researcher at DeepSeek. h=14
    • Author of 'Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts' (2025) — directly addresses MoE expert load balancing, the core search topic
    • BS degree only but at DeepSeek working on frontier MoE systems (DeepSeek-MoE/V2/V3 team)
    • Extremely targeted match for MoE load balancing research
    JW

    Jason Wei

    medium hireability

    Research Scientist@Meta

    Previously: Research Scientist @ OpenAI

    San Francisco, US

    • h-index 47, Research Scientist at Meta, SF
    • MoE + instruction tuning paper (2024, 117 cites)
    • Known for chain-of-thought prompting; contributed to scaling MoE models with instruction tuning
    JJ

    Juyong Jiang

    medium hireability

    Research Intern@NHN

    Previously: Research Assistant @ Hong Kong University of Science and Technology

    Seoul, KR

    • HKUST PhD (currently Research Intern at NHN, Seoul) with MoE survey authorship. 'A Survey on Mixture of Experts in Large Language Models' (110 citations, 2025) and 'Shortcut-connected Expert Parallelism for Accelerating Mixture of Experts' (24 citations, 2025)
    • Research expertise covers MoE, LLMs, PEFT, RL
    • Survey paper signals broad domain knowledge; expert parallelism paper shows systems-level practical work. h=10, hireability=high
    LD

    Li Dong

    medium hireability

    Partner Group Product Manager@Microsoft

    Previously: Principal Group Program Manager, Bing Relevance @ Microsoft

    Seattle, US

    • Senior Research Scientist at Google DeepMind (Amsterdam). h-index 53
    • Published on Sparse Upcycling (training MoE from dense checkpoints, 2022)
    • Known for ViT and Vision Transformer scaling work
    • MoE paper count is thin — only 1 direct paper — but upcycling work is directly relevant to expert load balancing
    • High seniority at a top lab
    • Hireability 'medium' with no LinkedIn changes or site activity
    MS

    Maciej Stefaniak

    medium hireability

    LLM Researcher@IDEAS NCBR

    Previously: Machine Learning Developer @ TIDK

    Warsaw, PL

    • Co-author on MoE scaling papers from IDEAS NCBR group: 'Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient' (2025) + 'Mu-Parametrization for Mixture of Experts' (2025)
    • LLM Researcher at IDEAS NCBR, Warsaw
    • PhD. h-index 1 (early career)
    • Research expertise explicitly listed as 'LLM MoE'
    • Part of the same strong MoE research cluster as Jakub Krajewski
    MM

    Mayank Mishra

    medium hireability

    Graduate Student Researcher@University of California, Berkeley

    Previously: Research Engineer-II @ MIT-IBM Watson AI Lab

    Berkeley, US

    • PhD student at UC Berkeley with 1075 GitHub contributions in 2024 and 1461 in 2025 — highly active
    • Research focus: efficient LLM architectures and distributed training
    • Key MoE paper: 'Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models' (33 cites, 2024)
    • Also co-authored BLOOM (2161 cites), StarCoder (1286 cites), Granite Code Models (81 cites)
    • LinkedIn education update Jan 2026 and recent website activity (new paper Feb 2026)
    • Strong practical MoE + LLM systems background, very active open-source contributor
    MT

    Mingjie Tang

    medium hireability

    working on LLM systems and algorithms@iQuest Research Lab

    Previously: tech lead @ Ant Group

    • LLM systems researcher at iQuest Research Lab. h=16
    • Direct MoE work: 'MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA based Mixture of Experts' (2024, 96 cites) and 'DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism' (2024)
    • Active research output (9 website changes in last year, multiple new papers Jan 2026)
    • PhD
    • Broad LLM systems background with MoE routing specialization
    ND

    Nan Du

    medium hireability

    Member of Technical Staff@OpenAI

    Previously: Principal Researcher @ Apple

    San Francisco, US

    OF

    Orhan Firat

    medium hireability

    Research Scientist@DeepMind

    Previously: Research Scientist @ Google

    New York, US

    • PhD, h-index 57, Research Scientist at DeepMind, NY
    • Routing Strategies for Multilingual MoE (2020) — early expert routing work for multilingual NMT
    • Also contributed to GShard and multilingual MT scaling
    PL

    Pingzhi Li

    medium hireability

    External Collaborator@Eigen AI

    Previously: Research Intern @ Apple

    San Francisco, US

    • Strong MoE routing researcher: 'Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy' (2024, 61 cites) — directly on MoE routing analysis
    • Also 'Advancing MoE Efficiency: C2R Strategy for Better Expert Parallelism' (2025), 'Hexa-MoE: Efficient and Heterogeneous-aware Training for MoE' (2024), 'QuantMoE-Bench' (20 cites), 'Finding Fantastic Experts in MoEs' (2025)
    • PhD @ UNC, now External Collaborator at Eigen AI, SF
    • LinkedIn shows recent move from Apple Research Intern. h=6
    • Multiple focused MoE routing papers
    SS

    Sambit Sahu

    medium hireability

    Vice President, Core LLM & Agentic AI, AI Foundations@Capital One

    Previously: Senior Engineering Manager, Alexa AI @ Amazon

    New York, US

    • PhD, h-index 55, VP Core LLM & Agentic AI at Capital One, NY
    • Active 2024-2025 MoE work: StructMoE structured expert design (2024), Dense Backpropagation for sparse MoEs (2024), Continual Pre-training of MoEs (2025)
    SB

    Shruti Bhosale

    medium hireability

    Research Engineer@Meta

    Previously: Research Engineer @ Meta

    San Francisco, US

    • MoE research engineer at Meta (Menlo Park)
    • Co-authored 'Efficient Large Scale Language Modeling with Mixtures of Experts' (238c), 'Towards MoE Deployment: Mitigating Inefficiencies in MoE Inference' (30c), 'Toward Efficient Inference for MoE' (17c), and 'Fixing MoE Over-fitting on Low-Resource Languages' (10c)
    • Practical MoE deployment and load-balancing experience. h=27
    • Hireability: MEDIUM
    SK

    Sneha Kudugunta

    medium hireability

    Researcher@Google

    Previously: Researcher @ DeepMind

    • Researcher at Google with PhD
    • Three dedicated MoE papers: 'Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference' (139 cites, 2021), 'Exploring Routing Strategies for Multilingual MoE Models' (2020), and a 2025 patent on routing within multitask MoE models
    • Primary research angle is MoE routing for multilingual/multitask NLP — directly relevant to expert load balancing
    • Also co-author on NLLB (1288 cites)
    • Strong signal from Google affiliation with continued MoE work through 2025
    SR

    Stephen Rawls

    medium hireability

    Researcher@CapitalOne

    Previously: Researcher @ Amazon

    • Researcher at Capital One with h=13 and 4 MoE-specific papers: 'Continual Pre-training of MoEs: How robust is your router?' (2025), 'Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts' (2024), 'StructMoE: Structured MoE Using Low Rank Experts' (2024), 'StructMoE: Augmenting MoEs with Hierarchically Routed Low Rank Experts' (2024)
    • Research directly targets MoE routing robustness and hierarchical routing design
    • Top publications are in face recognition and NLP (Alexa Teacher Model, 80 cites)
    • The MoE routing work is recent and highly relevant despite industry/finance context
    SS

    Suvinay Subramanian

    medium hireability

    Software Engineer@Google

    San Francisco, US

    • AI infra engineer at Google (recently shifted to 'AI Infrastructure' per LinkedIn change). h=16
    • Deep expertise in sparsity and MoE for hardware: 'Sparse SIMD Cross-lane Processing Unit', 'N:M Sparsity Training in Transformers', 'Journey Matters: Average Parameter Count...Unifies Sparse and Dense Scaling Laws' (2025)
    • Research expertise explicitly lists 'Mixture-of-Experts, Sparsity, Hardware-software Codesign'
    • TPU v4 co-author (599 cites)
    • Strong systems + MoE-sparsity intersection — rare hardware-aware MoE profile
    TZ

    Tianyi Zhou

    medium hireability

    Assistant Professor of Computer Science@University of Maryland

    Previously: Visiting Research Scientist @ Google

    College Park, US

    • h-index 52, Asst Prof CS at U Maryland, College Park
    • Recent MoE-specific research: MoE embedding model (2025, 23 cites), MoE re-routing strategies (2025)
    • Active expert routing contributor
    XZ

    Xingkui Zhu

    medium hireability

    PhD student@Huazhong University of Science and Technology

    CN

    • Research expertise: 'mixture of experts, dynamic networks, parameter-efficient fine-tuning'
    • Paper 'MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks' (2024, 7 cites) focuses on adapting dense→MoE routing for vision
    • PhD @ HUST
    • Strong vision model background (TPH-YOLOv5: 2270 cites, PointMamba: 268 cites) — MoE work appears to be a more recent pivot. h=6
    • China-based, no GitHub/LinkedIn
    XP

    Xinglin Pan

    medium hireability

    PhD student@Hong Kong University of Science and Technology (Guangzhou)

    Previously: Intern @ Hong Kong Baptist University

    Guangzhou, CN

    • PhD student at HKUST Guangzhou, MLSys focus
    • Strong MoE systems papers: 'ScheMoE: Extensible MoE Distributed Training System with Tasks Scheduling' (44 cites, 2024), 'PipeMoE: Accelerating Mixture-of-Experts through Adaptive Pipelining' (38 cites, 2023), 'FSMoE: Flexible and Scalable Training System for Sparse MoE Models' (8 cites, 2025)
    • Research is directly on MoE distributed training, scheduling, and pipelining — core load-balancing infrastructure
    • Also has LLM pruning work (Pruner-Zero, 66 cites)
    • Location in CN is a downside for US-focused roles
    XM

    Xupeng Miao

    medium hireability

    Assistant Professor@Purdue University

    Previously: Post Doctoral Fellow @ Carnegie Mellon University

    West Lafayette, US

    • MoE training systems specialist, now AP at Purdue
    • Built FlexMoE (dynamic device placement for sparse MoE, 69c), EvoMoE (dense-to-sparse gate training, 46c), HetuMoE (trillion-scale MoE distributed training, 44c), dense-to-sparse gate (28c), and NetMoE (dynamic sample placement, 2025)
    • Focused on MoE inference and training efficiency. h=26
    • Hireability: MEDIUM
    YX

    Yifan Xiong

    medium hireability
    • Microsoft engineer with MoE-adjacent infra focus
    • Contributed 2D hierarchical AlltoAll algorithm to tutel (key for MoE expert parallelism dispatch)
    • Pinned repos include microsoft/tutel, msccl (collective comms), hivedscheduler, superbenchmark — all core to distributed MoE training infra. 3 tutel commits (2022)
    • More infra/systems than algorithm-focused MoE
    YS

    Yikang Shen

    medium hireability

    Member of Technical Staff@xAI

    Previously: Staff Research Scientist @ IBM

    San Francisco, US

    • Expertise explicitly lists MoE and Transformers
    • Papers: JetMoE (56 citations), Mod-Squad (154 citations), Dense training/sparse inference (33 citations), Sparse Universal Transformer (30 citations)
    • MTS at xAI — active MoE practitioner. h_index 30
    • Strong fit for MoE routing and load balancing
    YT

    Yi Tay

    medium hireability

    Senior Staff Research Scientist@DeepMind

    Previously: Chief Scientist & Co-founder @ Reka AI

    SG

    YK

    Young Jin Kim

    medium hireability

    Member of Technical Staff@Microsoft

    Previously: Principal Research Manager / Principal Researcher @ Microsoft

    Seattle, US

    • Member of Technical Staff at Microsoft AI Superintelligence (recently promoted per LinkedIn). h=14
    • Research expertise explicitly: 'Mixture of experts, large language models, natural language processing, high performance computing'. 5 MoE papers: 'GRIN: GRadient-INformed MoE' (2024), 'SlimMoE: Structured Compression of Large MoE Models' (2025), 'AutoMoE: Heterogeneous MoE with Adaptive Computation' (2023), 'Task-Based MoE for Multitask Multilingual MT' (2023)
    • Systems-oriented MoE researcher at top lab
    ZH

    Zeyu Huang

    medium hireability

    PhD student@University of Edinburgh

    Edinburgh, GB

    • Edinburgh PhD with deeply focused MoE routing research
    • Papers: 'Layerwise Recurrent Router for MoE' (4 citations, 2024), 'A Closer Look into MoE in LLMs' (25 citations, 2024), 'Demons in the Detail: Load Balancing Loss for Training Specialized MoE' (11 citations, 2025)
    • Also 'Mixture of Attention Heads: Selecting Attention Heads Per Token' (80 citations)
    • Exactly on target for MoE routing and load balancing
    • Currently PhD at Edinburgh; website activity shows paper additions in 2025. h=10
    ZF

    Zhaoye Fei

    medium hireability

    PhD student@Fudan University

    Previously: intern @ Huawei

    Shanghai, CN

    • Fudan PhD with specific MoE router work: 'Turn Waste into Worth: Rectifying Top-k Router of MoE' (5 citations, 2024) and 'Towards More Effective and Economic Sparsely-Activated Model' (19 citations, 2021)
    • Also contributor to InternLM2 (506 citations)
    • Research expertise includes 'Mixture of Expert'
    • Work addresses wasted expert capacity in top-k routing — relevant to expert load balancing
    • Shanghai-based PhD student
    ZY

    Ziyue Yang

    medium hireability

    Software Engineer@Microsoft

    Previously: Software Engineer Intern @ Microsoft

    Seattle, US

    • Microsoft Research Asia SWE (Networking/AI Infra group) with 4 commits to microsoft/tutel MoE library
    • DB shows Tutel paper (193 citations) under their slug
    • PRs in microsoft/ltp-megatron-lm fixing MoE expert DP sharding bugs — touching megatron/core/transformer/moe/experts.py
    • Also has Tutel patents (dynamic gating, switchable parallel modes, collective communication at MoE layer)
    • Hands-on MoE systems engineer building production-scale MoE infrastructure at Microsoft
    • Located Seattle/Beijing
    AR

    Adam Roberts

    low hireability

    Director of Research@DeepMind

    Previously: Senior Staff Software Engineer @ DeepMind

    San Francisco, US


    Ahmet Üstün

    low hireability

    Code Agents Lead@Cohere

    Previously: Senior Research Scientist @ Cohere

    Groningen, NL

    • Direct MoE researcher: 'Pushing MoE to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning' (166 citations), 'BAM! Just Like That: Simple and Efficient Parameter Upcycling for MoE' (13 citations), Nexus adaptive upcycling
    • Research expertise explicitly includes Mixture of Experts + Efficient Deep Learning
    • Code Agents Lead at Cohere, previously Cohere For AI researcher. h=20
    • Hireability: LOW — senior Code Agents Lead role at Cohere
    AK

    Anastasios Kyrillidis

    low hireability

    Dean Fellow in AI/Computation@George R. Brown School of Engineering and Computing

    Previously: Goldstine Fellow @ IBM

    Houston, US

    • Sparse optimization theorist at Rice (Assoc. Prof.), now publishing on MoE routing theory
    • Recent papers: 'Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings' (2025, 14c), 'Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed MoE' (2025), federated MoE (Fed-ZERO, FedJETs)
    • Sparse methods background (h=30) directly applicable to expert routing and load balancing
    • US-based
    • Hireability: LOW (tenured/tenure-track)
    AP

    André Susano Pinto

    low hireability

    Software Engineer@DeepMind

    Previously: Intern @ Google

    CH

    • Direct MoE researcher at Google DeepMind
    • Authored 'Scaling Vision with Sparse MoE' (2021, 39 citations) and 'Processing images using MoE' (2024)
    • Research expertise explicitly includes 'mixture of experts, sparsity, large models'. h=19, SWE at DeepMind
    • Hireability: LOW — based in Switzerland, low hireability flag
    BZ

    Barret Zoph

    low hireability

    Something New@OpenAI

    Previously: CTO, Co-Founder @ Thinking Machines

    San Francisco, US

    • Switch Transformers co-author (3190 citations), GLaM MoE (963 citations), ST-MoE: Designing Stable and Transferable Sparse Expert Models (307 citations) — among the most impactful MoE routing papers ever published
    • Research Scientist at OpenAI in SF
    • Hireability: LOW — RS at OpenAI (frontier lab), unlikely to move, but exceptional MoE pedigree
    BC

    Bin CUI

    low hireability

    Professor@Peking University

    Previously: Research Fellow @ Singapore-MIT Alliance

    • PhD, h-index 69, Professor at Peking University
    • Highly active MoE researcher: NetMoE (2025), LSH-MoE communication-efficient routing (2024), FlexMoE dynamic architecture (2023)
    • Based in China
    CR

    Carlos Riquelme Ruiz

    low hireability

    Principal Researcher@Microsoft

    Previously: Head of Language Models Team @ Stability AI

    Madrid, ES

    • Directly authored landmark MoE routing papers: 'Scaling Vision with Sparse MoE' (981 citations), 'From Sparse to Soft Mixtures of Experts' (235 citations), 'LIMoE' (332 citations), and 'Routers in Vision MoE: An Empirical Study'
    • Core MoE routing/sparse gating expert
    • Principal Researcher now at Microsoft Superintelligence Team (recently moved from Google Brain)
    • Hireability: LOW — senior Principal Researcher role, PhD, Madrid-based
    EH

    Ethan He

    low hireability

    Member of Technical Staff@xAI

    Previously: Staff Engineer @ NVIDIA

    San Francisco, US

    • Direct MoE work: 'Upcycling LLMs into MoE' (2024, 25 citations) and 'Llama 3 meets MoE: Efficient Upcycling' (2024, 5 citations)
    • Research expertise lists 'mixture of experts' explicitly
    • MTS at xAI, previously NVIDIA/Meta
    • Working on video generation with MoE backbone (Grok Imagine). h=19, 9k citations
    • Hireability: LOW — hireability flag is low
    FW

    Furu Wei

    low hireability

    Chief Scientist@Microsoft

    Previously: Partner Research Manager @ Microsoft

    Beijing, CN

    • Multiple direct MoE papers: Multi-Head MoE (2024), Mixture of LoRA Experts (2024, 152 citations), On Representation Collapse of Sparse MoE (2022), VLMo (modality-expert MoE, 2022)
    • Strong MoE architecture research at Microsoft Research
    • Hireability: LOW — Chief Scientist at Microsoft Beijing, very senior (h-index 120)
    GN

    Guoshun Nan

    low hireability

    Full Professor@Beijing University of Posts and Telecommunications

    Previously: Tenure-tracked Professor @ Beijing University of Posts and Telecommunications

    Beijing, CN

    • Published 'Advancing Expert Specialization for Better MoE' (2025, 1 citation) — directly on MoE expert specialization
    • However, core research expertise is Video LLMs, semantic communications, multimodal understanding — MoE is peripheral
    • Full Professor at Beijing University of Posts and Telecom, China-based. h=17
    • Hireability: not set
    HH

    Hannaneh Hajishirzi

    low hireability

    Senior Director@Allen Institute for Artificial Intelligence

    Previously: Senior Director @ Allen Institute for Artificial Intelligence

    • Co-author of OLMoE (Open Mixture-of-Experts LM, 2025, 110 citations) — direct MoE LLM architecture
    • Also 'SHARCS: Efficient Transformers Through Routing with Dynamic Width Sub-networks' (2023)
    • Senior Director at AI2 (OLMo/OLMoE project)
    • Hireability: LOW — Senior Director at AI2, very senior leadership (h-index 92)
    HG

    Hongcheng Gao

    low hireability

    Incoming PhD student@College of AI at Tsinghua University

    Previously: Intern @ Tsinghua University

    Beijing, CN

    • Strong direct MoE routing work: 'AdaMoE: Token-Adaptive Routing with Null Experts for MoE LMs' (2024, 27 citations) — proposes adaptive top-k routing with null experts and load-balancing loss
    • Incoming PhD at Tsinghua, h=20, highly active website (21 recent changes)
    • Core MoE routing/load balancing research
    • Hireability: LOW — China-based incoming PhD student
    HC

    Hyung Won Chung

    low hireability

    AI Research Scientist@Meta

    Previously: Research Scientist @ OpenAI

    San Francisco, US

    IT

    Ivan Titov

    low hireability

    Full Professor@ILCC, School of Informatics, University of Edinburgh and ILLC, University of Amsterdam

    Edinburgh, GB

    • PhD, h-index 55
    • Recent MoE-specific work: Load Balancing Loss for MoE (2025) and Layerwise Recurrent Router for MoE (2025)
    • Full Professor at U Edinburgh/U Amsterdam
    • Based in Edinburgh, GB
    JG

    Jianfeng Gao

    low hireability

    Distinguished Scientist & Vice President@Microsoft

    Previously: Partner Research Manager in Business AI @ Microsoft

    Woodinville, US

    • Multiple direct MoE papers: GRIN (Gradient-Informed MoE, 2024), AutoMoE (NAS for sparse MoE, 2022), sparse MoE pruning, sparsely activated MoE multi-task learners
    • Strong MoE architecture and routing expertise at Microsoft
    • Hireability: LOW — Distinguished Scientist & VP at Microsoft, very senior (h-index 139)
    JP

    Joan Puigcerver

    low hireability

    Senior Software Engineer in Research@Google

    Previously: Software Engineer in Research @ Google

    Zurich, CH

    • Direct MoE routing specialist at Google Zurich
    • Author of 'From Sparse to Soft Mixtures of Experts', 'Routers in Vision MoE: An Empirical Study', sparse upcycling MoE, LIMoE, and fast differentiable top-k for routing
    • One of the most focused MoE routing researchers in the field. h=29
    • Hireability: LOW (likely comfortable at Google)
    JZ

    Jun Zhu

    low hireability

    Professor@Tsinghua University

    Previously: Adjunct Faculty @ Carnegie Mellon University

    CN

    • Co-author of 'ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing' (2025) — direct MoE routing innovation replacing top-k with ReLU gating for fully differentiable training
    • Direct contribution to MoE routing mechanism design
    • Hireability: LOW — Professor at Tsinghua University (h-index 97)
    LZ

    Luke Zettlemoyer

    low hireability

    Research Scientist@Meta

    Previously: Research Lead @ Allen Institute for AI

    Seattle, US

    • OLMoE co-author (Open MoE Language Model, 2025, 110 citations) — direct sparse MoE architecture work
    • Also 'Efficient Large Scale Language Modeling with Mixtures of Experts' (2021) and MoMa (modality-aware MoE experts, 2024, 279 citations)
    • Research Scientist at Meta
    • Hireability: LOW — Full Professor at UW + Meta Research Scientist, very senior (h-index 135)
    MF

    Mehrdad Farajtabar

    low hireability

    Research Scientist/Team Lead@Apple

    Previously: Research Scientist @ DeepMind

    Seattle, US

    • MoE expertise listed explicitly; 2 recent MoE papers (2025): 'From Dense to Dynamic: Token-Difficulty Driven MoEfication' and 'MoE-PHDS: flexible runtime sparsity'
    • Research Scientist at Apple, h_index 37
    • Hireability low but active in MoE research
    MD

    Mostafa Dehghani

    low hireability

    Research Scientist@Google

    Previously: Researcher @ Apple

    Amsterdam, NL

    • h-index 53, Research Scientist at Google, Amsterdam
    • Authored Sparse Upcycling (2022) — converting dense models to sparse MoEs
    • Strong transformer architecture background (Vision Transformer, PerceiverIO)
    NH

    Neil Houlsby

    low hireability

    Member of Technical Staff@Anthropic

    Previously: Senior Staff Research Scientist @ Google

    Zurich, CH

    • Senior MoE researcher (Anthropic MTS, ex-Google Brain)
    • Led V-MoE (NeurIPS 2021), Soft MoE (ICLR 2024 spotlight), LIMoE (NeurIPS 2022), Scaling Laws for Sparsely-Connected Foundation Models (ICLR 2024 spotlight), Sparse Upcycling (ICLR 2023)
    • Pioneered adapter modules. 7 MoE/sparse papers
    • PhD
    • Currently at Anthropic (hireability low — unlikely to leave), but caliber is top-tier
    QL

    Quoc V Le

    low hireability

    Research Scientist@Google

    Previously: Research Visitor @ Max Planck Institute for Biological Cybernetics

    San Francisco, US

    • Co-author of 'Mixture-of-Experts with Expert Choice Routing' (2022/2023) and 'Diversity and Depth in Per-Example Routing Models' (2018)
    • Direct MoE routing + expert load balancing expertise
    • Also 'Routing to expert subnetworks in MoE neural networks' (2025)
    • Hireability: LOW — Research Scientist/senior leader at Google, very senior (h-index 147)
    SK

    Souvik Kundu

    low hireability

    Inference and SLM Optimization Lead@Intel

    Previously: Staff Research Scientist @ Intel

    Los Angeles, US

    • Research expertise explicitly includes 'Mixture of Experts' and 'Inference Efficiency and Optimizations for LLMs'
    • Has 'CITER: Collaborative Inference for Efficient LLM Decoding with Token-Level Routing' (2024) directly on routing
    • Active on website with recent papers
    • Intel Inference and SLM Optimization Lead; h=22
    • Hireability: LOW — senior industry optimization lead at Intel
    TN

    Tan Minh Nguyen

    low hireability

    Assistant Professor@National University of Singapore

    Previously: Postdoctoral Scholar @ University of California, Los Angeles

    Singapore, SG

    • Direct MoE work: 'MomentumSMoE: Integrating Momentum into Sparse MoE' (2024, 3 citations) and 'MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling' (2025, 1 citation)
    • Research expertise explicitly lists 'mixture of experts, state-space models'
    • Assistant Professor at NUS, h=20
    • Hireability: LOW — hireability flag is low
    WD

    Wei Dong

    low hireability

    Associate Professor@Xi'an University of Architecture and Technology

    Previously: PhD student @ Northwest Polytechnical University

    Xi'an, CN

    • Research expertise includes 'Mixture of Experts' alongside Parameter Efficient Fine-Tuning and Self-supervised Learning
    • Associate Professor at Xi'an University of Architecture and Technology, China. h=24
    • No MoE-specific papers retrieved from DB but MoE is listed expertise
    • China-based significantly limits mobility
    • Hireability: LOW — professor in China
    WC

    Weizhu Chen

    low hireability

    Technical Fellow and CVP@Microsoft

    Previously: Vice President @ Microsoft

    Redmond, US

    • h-index 73, Technical Fellow & CVP at Microsoft, Redmond
    • GRIN MoE (2024) — gradient-informed expert routing; MoEBERT (2022, 76 cites) — MoE for BERT efficiency
    • Core MoE architecture researcher
    WF

    William Fedus

    low hireability

    OpenAI

    • PhD, h-index 40, at OpenAI
    • Co-author of Switch Transformer (2022, seminal MoE routing paper) and MoE + Instruction Tuning
    • One of the most cited researchers specifically in MoE routing architecture
    WJ

    Wittawat Jitkrittum

    low hireability

    Research Scientist@DeepMind

    Previously: Research Scientist @ Google

    New York, US

    • Research expertise includes 'routing, model ensembles, cascade, adaptive computation'
    • Papers on 'Universal model routing for efficient LLM inference' (17 citations, 2025) and 'Universal LLM Routing with Correctness-Based Representation' (4 citations, 2025)
    • Won Google Tech Impact Award 2024 specifically for model routing work
    • Research Scientist at Google DeepMind, NY. h=22
    • Hireability: LOW — senior Research Scientist at DeepMind
    XN

    Xiaonan Nie

    low hireability

    Staff Research Scientist@ByteDance

    Previously: Technical Lead @ Tencent

    San Francisco, US

    • Core MoE systems researcher with 5 MoE papers: FlexMoE (2023, 69 citations) addresses routing imbalance/fluctuation via dynamic device placement; HetuMoE (2022, 44 citations) trillion-scale MoE training; EvoMoE (2021, 43 citations) dense-to-sparse gating; Dense-to-Sparse Gate (2021, 28 citations) load-balanced MoE gating; NetMoE (2025, 5 citations) dynamic sample placement
    • Staff RS at ByteDance SF. h=17
    • Expertise: LLM + distributed ML systems
    • Hireability: LOW — hireability flag is low
    XQ

    Xipeng Qiu

    low hireability

    Professor@Fudan University

    Shanghai, CN

    • h-index 78, Professor at Fudan University, Shanghai
    • MoE routing paper: Turn Waste into Worth: Rectifying Top-k Router of MoE (2024)
    • Prolific NLP researcher; based in China
    YH

    Yanping Huang

    low hireability

    Engineer@Google

    Previously: PhD student @ University of Washington

    • Exceptional MoE routing pedigree: GLaM (1174 citations, MoE LLM scaling), Expert Choice Routing (514 citations, foundational load-balancing paper), ST-MoE (307 citations, stable sparse expert training), Beyond Distillation MoE (139 citations)
    • Research Scientist at Google, h_index 30
    • Directly on-target for MoE routing and expert load balancing
    YZ

    Yanqi Zhou

    low hireability

    Staff Research Scientist@Google

    Previously: Senior Research Scientist @ Google

    San Francisco, US

    • Co-author of 'Mixture-of-Experts with Expert Choice Routing' (2022/2023) — a seminal load-balancing routing paper from Google Brain
    • Also 'MoE meets Instruction Tuning' (2024)
    • Research Scientist at Google Brain, h_index 36
    • Directly on-target for MoE routing and expert load balancing
    YZ

    Yifan Zhu

    low hireability

    Assistant Professor@Beijing University of Posts and Telecommunications

    Previously: Postdoctoral Research Fellow @ Tsinghua University

    Beijing, CN

    • Research expertise explicitly includes 'mixture of experts'
    • Has 'PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning' (2025) and 'A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models' (2025)
    • Active website with 37 changes through Feb 2026
    • Assistant Professor at BUPT Beijing. h=23
    • Hireability: LOW — faculty, China-based
    ZX

    Zhaozhuo Xu

    low hireability

    Assistant Professor@Stevens Institute of Technology

    Previously: PhD student @ Rice University

    • Has 'Replicate and Quantize: A Plug-and-Play Strategy for Load Balancing in Sparse MoE LLMs' (2025) — directly on MoE expert load balancing
    • Background in approximate nearest neighbor search (used in sparse routing)
    • Active with papers through late 2025
    • Assistant Professor at Stevens Institute. h=21
    • Hireability: LOW — faculty role
    ZL

    Zhenhuan Liu

    low hireability

    Software Engineer@NVIDIA

    Previously: Researcher @ NVIDIA

    Beijing, CN

    • Software Engineer at NVIDIA with two MoE-specific papers: 'Llama 3 Meets MoE: Efficient Upcycling' (2024, 5 cites) and 'MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core' (2025, 2 cites)
    • Research expertise includes 'MoE models, Distributed LLM Training'
    • Background is primarily image generation; MoE work appears more recent
    • Beijing, CN. h=5, hireability=low
    ZQ

    Zihan Qiu

    low hireability

    Researcher@Qwen

    Previously: Research Intern @ INF Technology

    Beijing, CN

    • Qwen researcher (Alibaba) working on scalable LLMs
    • Papers: HyperMoE (24 citations, 2024), 'A Closer Look into MoE in LLMs' (23 citations), 'GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory' (3 citations), 'Demons in the Detail: Load Balancing Loss' (6 citations)
    • Co-author Qwen2.5 (3885 citations)
    • Research expertise explicitly lists 'Mixture of Experts, Modular Networks'
    • LinkedIn updated Jan 2026 to 'not currently open to new opportunities' — low availability signal
    • Beijing-based

    Runs

#1 · completed · 93 qualified / 126 found · Apr 20, 1:44 PM