
mid-level speculative decoding engineers in the US

Completed · 20 qualified · 2 runs · Apr 27, 12:32 PM · mid-level-speculative-decoding-engineers-in-the-us
Parsed · 1 topic · Mid · Engineer · US
    1. Generating seed nodes · 0 proposed
    2. Explored 0 queries · 0/0 done
    3. Expanding nodes · queued
    4. Qualifying candidates · queued

    Qualified Candidates (20)

    BA

    Ben Athiwaratkun

    high hireability

Staff Research Scientist @ Together AI

    Previously: AI Scientist @ Amazon

    New York, US

    • Senior Director at Together AI leading 15-person Core ML (Turbo) inference team; set technical direction for speculative decoding, built adaptive drafter systems with online RL (Aurora, ATLAS) and Bifurcated Attention (ICML 2024) for massively parallel decoding
    • PhD Cornell, New York
    • Over-seniority for mid-level: Senior Director managing 15 engineers
    • Hireability: HIGH — burst of 7 CV commits April 13-22 2026, rebuilt CV from scratch, classic active job-search pattern
    AG

    Amir Gholami

    medium hireability

Postdoc @ University of California, Berkeley

    San Francisco, US

    • Active speculative decoding researcher with 3 directly relevant papers: 'Speculative Decoding with Big Little Decoder' (NeurIPS 2023, 131 citations), 'SPEED: Speculative Pipelined Execution for Efficient Decoding' (2025, 36 citations), and 'QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache' (2025)
    • Also strong in KV cache quantization (KVQuant) and efficient LLM inference (SqueezeLLM)
    • Associate Research Scientist/Postdoc at BAIR, UC Berkeley, SF
    • Note: h-index 45 puts him senior academically — likely targeting staff/principal-level, not mid-level
    • Hireability: MEDIUM — postdoc at Berkeley with no explicit job-seeking signals (no LinkedIn changes, no website activity) but actively publishing in 2025; postdocs typically on the market
    ES

    Eric Sather

    medium hireability

Technical Lead Manager, Machine Learning @ Cerebras Systems

    Previously: Principal Machine Learning Engineer @ Rivian

    San Francisco, US

    • Co-authored DREAM (arXiv:2505.19201, 2025) — a novel speculative decoding framework for VLMs achieving 3.6x speedup, with direct contributions to draft model design using cross-attention and entropy-adaptive feature selection
    • Technical Lead Manager, ML at Cerebras Systems in SF; prolific patent record on NN quantization and inference circuits (2022-2025)
    • Note: TLM title is above mid-level but typical IC+lead hybrid at AI startups
    • Hireability: MEDIUM — no pipeline movement signals, actively publishing at Cerebras through 2025, stable role but within 2-4 year transition window
    HG

    Han Guo

    medium hireability

Research Intern @ Together AI

    Previously: Research Intern @ IBM

    San Francisco, US

    • Adjacent inference efficiency expertise — strong quantization/efficient-inference background (FLUTE fast matrix multiplications in C++, LQ-LoRA, training-free activation sparsity) but no direct speculative decoding work; co-authored with Pragaash Ponnusamy (Together AI speculative decoding) and interned at Together AI under Tri Dao (Summer 2025), suggesting adjacent exposure
    • MIT PhD (h-index 16), based in SF
    • Hireability: MEDIUM — late-stage PhD with 2026 Jane Street fellowship (likely defending 2026-2027), active GitHub commits as recent as April 2026; transition window imminent but not yet on market
    TX

    Tianhua Xia

    medium hireability

PhD student @ New York University

    New York, US

    • Published TWO speculative decoding papers: DREAM (NeurIPS 2025, multimodal SD with 3.6x speedup over conventional decoding) and STAR (ACL 2026, searchable drafting + target-aware refinement)
    • Also KV cache co-design (MICRO 2025), quantization for MoE (ICLR 2026), and LLM compression — deep systems + inference optimization profile
    • PhD student at NYU Tandon (SAI Lab, advisor Sai Zhang), New York. 2026 Dante Youla Award for Graduate Research Excellence + 2025 DAC Young Fellow
    • Hireability: MEDIUM — PhD student in active publishing phase through 2026, likely year 3-4 and approaching graduation window
    TC

    Tianle Cai

    medium hireability

Graduate Research Assistant @ Princeton University

    Previously: AI Researcher @ Together AI

    Princeton, US

    • Created Medusa (424 citations, multiple decoding heads for speculative decoding) and REST (128 citations, retrieval-based speculative decoding) — literally founded the FasterDecoding GitHub org and is the principal author of two landmark speculative decoding methods
    • PhD candidate at Princeton (Kai Li & Jason Lee group) working on ultra-efficient LLM inference systems; also part-time researcher at Together.ai
    • Based in Princeton, NJ (US)
    • Hireability: MEDIUM — OpenReview byline shows 'Researcher, ByteDance Inc.' suggesting a possible recent affiliation change; GitHub still shows Princeton. No explicit job-search signals detected, but final-stage PhD with major industry part-time work suggests transition window
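Medusa and REST above are both instances of the same draft-then-verify loop that underlies all of the speculative decoding work in this run. A minimal greedy sketch, using toy integer-sequence "models" (the `draft_model` and `target_model` functions are hypothetical stand-ins, not the actual Medusa implementation):

```python
def draft_model(ctx):
    # Hypothetical cheap drafter: proposes (last token + 1) mod 10.
    return (ctx[-1] + 1) % 10

def target_model(ctx):
    # Hypothetical expensive target: agrees with the drafter except after token 5.
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10

def speculative_step(ctx, k=4):
    """Draft k tokens cheaply, then verify them with the target model.

    Greedy verification: accept draft tokens while they match the target's
    own choice, then emit the target's correction on the first mismatch,
    so the output is identical to pure target-model decoding.
    """
    draft, d_ctx = [], list(ctx)
    for _ in range(k):
        t = draft_model(d_ctx)
        draft.append(t)
        d_ctx.append(t)

    accepted, v_ctx = [], list(ctx)
    for t in draft:
        want = target_model(v_ctx)
        if t != want:
            accepted.append(want)  # target overrules the drafter; stop here
            break
        accepted.append(t)
        v_ctx.append(t)
    return accepted

print(speculative_step([3]))  # [4, 5, 0]: two drafts accepted, third corrected
```

The speedup comes from the verification pass checking all k draft tokens with one (batched) target forward pass, whereas plain decoding would need one target pass per token.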
    VT

    Vithursan Thangarasa

    medium hireability

Principal Research Scientist @ Cerebras Systems

    Previously: Lead Research Scientist @ Cerebras Systems

    San Francisco, US

    • Published 3 speculative decoding papers in 2025 (DREAM at NeurIPS 2025, MASSV: VLM spec decoding, SD²: Self-Distilled Sparse Drafters) — directly researching spec decoding at Cerebras Systems in San Francisco
    • MASc from U of Guelph, no PhD
    • Principal Research Scientist title is senior-leaning vs 'mid-level' query
    • Hireability: MEDIUM — pipeline shows title tweak only (same company, added 'ML'), active 2025 publication output indicates high engagement at Cerebras, no open-to-work signals detected
    XW

    Xiaoxia Wu

    medium hireability

Principal Researcher @ Together.AI

    Previously: Researcher @ Microsoft

    Newton, US

    • Principal Researcher at Together.AI directly working on speculative decoding — co-authored 'Beat the long tail: Distribution-Aware Speculative Decoding for RL Training' (2025, arXiv:2511.13841) and 'Aurora: When RL Meets Adaptive Speculative Training' (2026, arXiv:2602.06932, 1.5x inference speedup)
    • Strong LLM inference background (ZeroQuant, ZeRO++, KV cache quantization)
    • US (Newton, MA)
    • GitHub forks of vllm and TensorRT-LLM
    • Note: Principal Researcher level is more senior than 'mid-level' but she is squarely in the speculative decoding domain
    • Hireability: MEDIUM — switched from Microsoft to Together.AI (shows mobility), but no explicit open-to-work signals and tenure at Together.AI is unclear
    YH

    Yunhai Hu

    medium hireability

Big Data Development Engineer @ Bilibili

    Shanghai, CN

    • Published 3 speculative decoding papers in 2025: DREAM (multimodal SD framework with 3.6x speedup, accepted at major venue), PipeSpec (hierarchical LLM decoding), and a survey 'Speculative Decoding and Beyond' (11 citations)
    • Actively coding EAGLE-Qwen3, EAGLE3, SpecForge-Qwen3VL, and RLSD repos — hands-on implementer of major SD frameworks
    • GitHub lists @New York University, Courant; DIY-NIW-EB1A repos confirm US presence and intent to stay
    • Hireability: MEDIUM — likely PhD student/researcher at NYU Courant; immigration trajectory (NIW/EB1A) shows strong intent to remain in US; personal website still lists Bilibili/Shanghai (likely outdated)
    ZZ

    Zhihao Zhang

    medium hireability

Ph.D. student @ Carnegie Mellon University

    Previously: MS student @ Carnegie Mellon University

    Pittsburgh, US

    • First author of SpecInfer (ASPLOS 2024) — tree-based speculative inference for LLM serving — plus SpecReason (NeurIPS 2025) and TidalDecode (ICLR 2025)
    • CMU PhD student (Zhihao Jia's group) focused on LLM serving systems; pinned FlashInfer (CUDA kernel library) and TVM forks show hands-on systems depth
    • Located Pittsburgh, PA (US)
    • Hireability: MEDIUM — ~5 years into PhD (papers from 2021–2025), LinkedIn profile fully wiped by Jan 2026 scrape (possible transition signal), OSDI 2026 paper in pipeline suggests near-graduation but not confirmed available yet
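SpecInfer's tree-based verification, cited across several cards here, generalizes the linear draft-verify loop: multiple drafted continuations are arranged as a token tree and checked in one target pass. A toy sketch under the assumption of a deterministic greedy target, with the hypothetical `target_next` function standing in for the target model's argmax:

```python
def target_next(ctx):
    # Hypothetical deterministic target: next token is (last + 1) mod 10.
    return (ctx[-1] + 1) % 10

def verify_tree(ctx, tree):
    """Verify a token tree of draft continuations against the target.

    `tree` maps each drafted token to its subtree of children. Returns the
    longest root-to-leaf path the target agrees with, plus one correction
    (the target's own next token) appended at the point of divergence.
    """
    want = target_next(ctx)
    if want in tree:  # a drafted branch matches: descend and keep verifying
        return [want] + verify_tree(ctx + [want], tree[want])
    return [want]  # no drafted child matches: emit the target's own token

# Drafted branches after prefix [1]: 2 -> 3, 2 -> 5, and a sibling 7.
token_tree = {2: {3: {}, 5: {}}, 7: {}}
print(verify_tree([1], token_tree))  # [2, 3, 4]
```

In the real system the whole tree is verified in a single batched forward pass via a tree attention mask; the recursion here only models the accept/correct logic.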
    ZC

    Zhuoming Chen

    medium hireability

Ph.D. student @ Carnegie Mellon University

    Previously: Research Intern @ Meta

    New York, US

    • Primary author of 6+ speculative decoding papers including SpecInfer (381 citations), Sequoia (69 citations), TriForce (74 citations), and MagicDec (50 citations) — one of the most prolific speculative decoding researchers
    • PhD at CMU (2023–) advised by Beidi Chen and Zhihao Jia
    • Recent Meta FAIR internship (2025, Leon Bottou)
    • Hireability: MEDIUM — 3rd year PhD, likely 1-2 more years in program, but Meta FAIR internship + CV update 73 days ago show active industry engagement
    AS

    Ananda Theertha Suresh

    low hireability

Senior Staff Research Scientist @ Google

    Previously: Graduate Student Researcher @ University of California, San Diego

    New York, US

    • Core speculative decoding researcher at Google NY — co-authored SpecTr (NeurIPS 2023, 122 citations), SpecTr++, Block Verification Accelerates Speculative Decoding (2024), Optimal block-level draft verification (2024), and Fast Speculative Decoding Using Multiple Parallel Drafts (2025)
    • Research expertise explicitly lists 'Speculative decoding'
    • PhD UC San Diego
    • Note: Senior Staff Research Scientist (L7) is above mid-level, but direct topic expertise is exceptional
    • Hireability: LOW — no pipeline signals of job movement, no website/LinkedIn activity detected, likely long-tenured at Google with no observed transition signals
    BA

    Bilge Acun

    low hireability

Research Scientist @ Meta

    Previously: Research Staff Member @ IBM

    San Francisco, US

    • Co-authored LayerSkip (ACL 2024, 175 citations) — self-speculative decoding via early exit inference, directly on-point for query
    • Also has CHAI (clustered head attention for efficient LLM inference) and 2025 work on hybrid architectures and LLM reasoning acceleration
    • Research Scientist at Meta FAIR in SF, PhD UIUC 2017
    • More senior than 'mid-level' but has exactly the right speculative decoding expertise
    • Hireability: LOW — 86 months (~7 yrs) at Meta FAIR with no open-to-work signals; website updated March 2026 still shows current role
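LayerSkip's self-speculative idea — drafting with the model's own early layers and verifying with the full stack, so no auxiliary draft model is needed — can be sketched with toy "layers". Everything below is illustrative (integer transforms in place of transformer blocks), not the paper's implementation:

```python
# Toy layer stack: integer transforms standing in for transformer blocks.
LAYERS = [lambda h: h + 1, lambda h: h, lambda h: h if h < 5 else h - 5]

def predict(ctx, exit_at):
    # Run the last token through the first `exit_at` layers, emit a token.
    h = ctx[-1]
    for layer in LAYERS[:exit_at]:
        h = layer(h)
    return h % 10

def self_speculative_step(ctx, k=3, exit_at=1):
    """Draft k tokens via early exit, then verify with the full stack."""
    draft, d_ctx = [], list(ctx)
    for _ in range(k):  # cheap pass: exit after `exit_at` layers
        t = predict(d_ctx, exit_at)
        draft.append(t)
        d_ctx.append(t)

    accepted, v_ctx = [], list(ctx)
    for t in draft:  # full pass: all layers check each draft token
        want = predict(v_ctx, len(LAYERS))
        if t != want:
            accepted.append(want)  # full model overrules the early exit
            break
        accepted.append(t)
        v_ctx.append(t)
    return accepted

print(self_speculative_step([2]))  # [3, 4, 0]: two drafts accepted, one corrected
```

Because drafter and verifier share weights, the draft pass's early-layer computation can be reused by the verification pass — the source of LayerSkip's memory and latency savings over two-model speculation.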
    CW

    Changhan Wang

    low hireability

Technical Lead @ Meta

    Previously: Technical Lead @ Meta

    New York, US

    • Co-authored LayerSkip (self-speculative decoding, ACL 2024, 150+ citations), CHAI (clustered head attention for efficient LLM inference), and multimodal generation inference acceleration papers — all directly on target
    • Technical Lead at Meta Superintelligence Labs in New York, h-index 34, 50+ papers
    • Hireability: LOW — pipeline shows position_update 73 days ago (recently transitioned to MSL), no open-to-work signals; likely just settled into new role. Possibly above 'mid-level' seniority
    DA

    Daiyaan Arfeen

    low hireability

PhD student @ Carnegie Mellon University

    Previously: Deep Learning Architecture Intern @ NVIDIA

    San Francisco, US

    • Co-authored SpecInfer (ASPLOS 2024, 386 citations) — one of the seminal speculative inference/decoding papers for LLMs, implementing tree-based speculative inference
    • PhD CMU CS 2020-2025, BS UC Berkeley
    • Also published at SOSP 2023 (Sia) and MLSys 2025 (PipeFill) showing strong LLM systems depth
    • Hireability: LOW — NVIDIA Deep Learning Architecture intern May 2024–Feb 2025, transitioned to full-time at NVIDIA post-PhD (~1 year in role), not open to work
    JC

    Jian Chen

    low hireability

Ph.D. student @ University of California San Diego

    Previously: Research Internship @ Microsoft

    Pittsburgh, US

    • First author of MagicDec (ICLR 2025, 50 citations) and DFlash (arXiv 2026) — both directly on speculative decoding
    • MagicDec breaks latency-throughput tradeoff for long-context LLMs via sparse-KV drafting; DFlash achieves 6x lossless speedup via block diffusion drafting
    • MS CMU, now PhD Year 1 at UC San Diego (Zhijian Liu lab)
    • Based in San Diego, CA
    • Hireability: LOW — just started PhD program ~Fall 2025, no open-to-work signals; website updated March 2026 but with paper additions only
    MN

    Mahyar Najibi

    low hireability

Co-Founder and Chief Scientific Officer @ ElastixAI

    Previously: Senior AIML Manager / Lead Scientist @ Apple

    San Francisco, US

    • Published directly on speculative decoding: 'Speculative Streaming: Fast LLM Inference without Auxiliary Models' (2024, arXiv:2402.11131) and 'QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache' (2025), both from his time as AI/ML Research Manager at Apple leading an LLM/CV team
    • Also: LazyLLM (efficient long-context inference), M2R2 (efficient transformer inference), Apple Intelligence Foundation LLMs
    • Seniority overshoot — PhD UMaryland, h-index 26, was managing a team at Apple, now Co-Founder/CSO
    • Hireability: LOW — ~10 months as Co-Founder/CSO of ElastixAI, actively building the startup
    SZ

    Sai Qian Zhang

    low hireability

Assistant Professor @ New York University

    Previously: Senior Research Scientist @ Meta

    New York, US

    • Directly relevant to speculative decoding: co-author of speculative decoding survey (2025), PipeSpec (hierarchical LLM decoding, 2025), and DREAM (multimodal speculative decoding, 2025)
    • Assistant Professor at NYU with prior experience as Senior Research Scientist at Meta Reality Labs; strong ML systems + hardware focus
    • Senior by academic standards — query asks for mid-level, but background is engineering-research crossover
    • Hireability: LOW — actively recruiting PhD students for their NYU lab, no open-to-work signals, stable faculty position
    ZZ

    Zhengxin Zhang

    low hireability

PhD Student @ Cornell University

    Previously: Research assistant @ Carnegie Mellon University

    Ithaca, US

    • Co-author on SpecInfer (arXiv:2305.09781, 386 citations) — one of the foundational speculative decoding papers — while RA at CMU under Prof. Zhihao Jia (FlexFlow group)
    • Also published QST (ACL 2024 Outstanding Paper) on quantized LLM fine-tuning
    • Now 2nd-year PhD student at Cornell
    • Hireability: LOW — just started PhD in 2024, no job search signals detected, no LinkedIn available
    ZL

    Zhijian Liu

    low hireability

Research Scientist @ NVIDIA

    Previously: Research Scientist @ NVIDIA

    San Francisco, US

    • Directly relevant to speculative decoding — leads z-lab (UCSD) with DFlash (Block Diffusion for Flash Speculative Decoding) and Fast-dLLM (parallel decoding for diffusion LLMs), plus LServe on efficient LLM serving; h-index 30
    • Note: he's senior-level (Assistant Professor at UCSD + Research Scientist at NVIDIA), not mid-level
    • Hireability: LOW — transitioned to Assistant Professor at UCSD ~Jan 2026, just launched his own lab (z-lab.ai), extremely unlikely to step down to industry engineer role

    Runs

    #2 · completed · 0 qualified / 0 found · Apr 27, 12:46 PM
    #1 · completed · 0 qualified / 0 found · Apr 27, 12:32 PM