
mid-level speculative decoding engineers in the US

Completed · 20 qualified · 2 runs · Apr 27, 12:32 PM · mid-level-speculative-decoding-engineers-in-the-us
Parsed · 1 topic · Mid · Engineer · US
    1. Generating seed nodes · 0 proposed
    2. Explored 0 queries · 0/0 done
    3. Expanding nodes · queued
    4. Qualifying candidates · queued

    Qualified Candidates (20)

    BA

    Ben Athiwaratkun

    high hireability

Staff Research Scientist @ Together AI

    Previously: AI Scientist @ Amazon

    New York, US

    • Senior Director at Together AI leading 15-person Core ML (Turbo) inference team; set technical direction for speculative decoding, built adaptive drafter systems with online RL (Aurora, ATLAS) and Bifurcated Attention (ICML 2024) for massively parallel decoding
    • PhD Cornell, New York
    • Over-seniority for mid-level: Senior Director managing 15 engineers
    • Hireability: HIGH — burst of 7 CV commits April 13-22 2026, rebuilt CV from scratch, classic active job-search pattern
    AG

    Amir Gholami

    medium hireability

Postdoc @ University of California, Berkeley

    San Francisco, US

    • Active speculative decoding researcher with 3 directly relevant papers: 'Speculative Decoding with Big Little Decoder' (NeurIPS 2023, 131 citations), 'SPEED: Speculative Pipelined Execution for Efficient Decoding' (2025, 36 citations), and 'QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache' (2025)
    • Also strong in KV cache quantization (KVQuant) and efficient LLM inference (SqueezeLLM)
    • Associate Research Scientist/Postdoc at BAIR, UC Berkeley, SF
    • Note: h-index 45 puts him senior academically — likely targeting staff/principal-level, not mid-level
    • Hireability: MEDIUM — postdoc at Berkeley with no explicit job-seeking signals (no LinkedIn changes, no website activity) but actively publishing in 2025; postdocs typically on the market
    ES

    Eric Sather

    medium hireability

Technical Lead Manager, Machine Learning @ Cerebras Systems

    Previously: Principal Machine Learning Engineer @ Rivian

    San Francisco, US

    • Co-authored DREAM (arXiv:2505.19201, 2025) — a novel speculative decoding framework for VLMs achieving 3.6x speedup, with direct contributions to draft model design using cross-attention and entropy-adaptive feature selection
    • Technical Lead Manager, ML at Cerebras Systems in SF; prolific patent record on NN quantization and inference circuits (2022-2025)
    • Note: TLM title is above mid-level but typical IC+lead hybrid at AI startups
    • Hireability: MEDIUM — no pipeline movement signals, actively publishing at Cerebras through 2025, stable role but within 2-4 year transition window
    HG

    Han Guo

    medium hireability

Research Intern @ Together AI

    Previously: Research Intern @ IBM

    San Francisco, US

    • Adjacent inference efficiency expertise — strong quantization/efficient-inference background (FLUTE fast matrix multiplications in C++, LQ-LoRA, training-free activation sparsity) but no direct speculative decoding work; co-authored with Pragaash Ponnusamy (Together AI speculative decoding) and interned at Together AI under Tri Dao (Summer 2025), suggesting adjacent exposure
    • MIT PhD (h-index 16), based in SF
    • Hireability: MEDIUM — late-stage PhD with 2026 Jane Street fellowship (likely defending 2026-2027), active GitHub commits as recent as April 2026; transition window imminent but not yet on market
    TX

    Tianhua Xia

    medium hireability

PhD student @ New York University

    New York, US

    • Published TWO speculative decoding papers: DREAM (NeurIPS 2025, multimodal SD with 3.6x speedup over conventional decoding) and STAR (ACL 2026, searchable drafting + target-aware refinement)
    • Also KV cache co-design (MICRO 2025), quantization for MoE (ICLR 2026), and LLM compression — deep systems + inference optimization profile
    • PhD student at NYU Tandon (SAI Lab, advisor Sai Zhang), New York. 2026 Dante Youla Award for Graduate Research Excellence + 2025 DAC Young Fellow
    • Hireability: MEDIUM — PhD student in active publishing phase through 2026, likely year 3-4 and approaching graduation window
    TC

    Tianle Cai

    medium hireability

Graduate Research Assistant @ Princeton University

    Previously: AI Researcher @ Together AI

    Princeton, US

    • Created Medusa (424 citations, multiple decoding heads for speculative decoding) and REST (128 citations, retrieval-based speculative decoding) — literally founded the FasterDecoding GitHub org and is the principal author of two landmark speculative decoding methods
    • PhD candidate at Princeton (Kai Li & Jason Lee group) working on ultra-efficient LLM inference systems; also part-time researcher at Together.ai
    • Based in Princeton, NJ (US)
    • Hireability: MEDIUM — OpenReview byline shows 'Researcher, ByteDance Inc.' suggesting a possible recent affiliation change; GitHub still shows Princeton. No explicit job-search signals detected, but final-stage PhD with major industry part-time work suggests transition window
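Medusa and REST above are both instances of the same draft-then-verify loop that underlies all of the speculative decoding work in this run. A minimal greedy sketch, using toy integer-sequence "models" (the `draft_model` and `target_model` functions are hypothetical stand-ins, not the actual Medusa implementation):

```python
def draft_model(ctx):
    # Hypothetical cheap drafter: proposes (last token + 1) mod 10.
    return (ctx[-1] + 1) % 10

def target_model(ctx):
    # Hypothetical expensive target: agrees with the drafter except after token 5.
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10

def speculative_step(ctx, k=4):
    """Draft k tokens cheaply, then verify them with the target model.

    Greedy verification: accept draft tokens while they match the target's
    own choice, then emit the target's correction on the first mismatch,
    so the output is identical to pure target-model decoding.
    """
    draft, d_ctx = [], list(ctx)
    for _ in range(k):
        t = draft_model(d_ctx)
        draft.append(t)
        d_ctx.append(t)

    accepted, v_ctx = [], list(ctx)
    for t in draft:
        want = target_model(v_ctx)
        if t != want:
            accepted.append(want)  # target overrules the drafter; stop here
            break
        accepted.append(t)
        v_ctx.append(t)
    return accepted

print(speculative_step([3]))  # [4, 5, 0]: two drafts accepted, third corrected
```

The speedup comes from the verification pass checking all k draft tokens with one (batched) target forward pass, whereas plain decoding would need one target pass per token.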
    VT

    Vithursan Thangarasa

    medium hireability

Principal Research Scientist @ Cerebras Systems

    Previously: Lead Research Scientist @ Cerebras Systems

    San Francisco, US

    • Published 3 speculative decoding papers in 2025 (DREAM at NeurIPS 2025, MASSV: VLM spec decoding, SD²: Self-Distilled Sparse Drafters) — directly researching spec decoding at Cerebras Systems in San Francisco
    • MASc from U of Guelph, no PhD
    • Principal Research Scientist title is senior-leaning vs 'mid-level' query
    • Hireability: MEDIUM — pipeline shows title tweak only (same company, added 'ML'), active 2025 publication output indicates high engagement at Cerebras, no open-to-work signals detected
    XW

    Xiaoxia Wu

    medium hireability

Principal Researcher @ Together.AI

    Previously: Researcher @ Microsoft

    Newton, US

    • Principal Researcher at Together.AI directly working on speculative decoding — co-authored 'Beat the long tail: Distribution-Aware Speculative Decoding for RL Training' (2025, arXiv:2511.13841) and 'Aurora: When RL Meets Adaptive Speculative Training' (2026, arXiv:2602.06932, 1.5x inference speedup)
    • Strong LLM inference background (ZeroQuant, ZeRO++, KV cache quantization)
    • US (Newton, MA)
    • GitHub forks of vllm and TensorRT-LLM
    • Note: Principal Researcher level is more senior than 'mid-level' but she is squarely in the speculative decoding domain
    • Hireability: MEDIUM — switched from Microsoft to Together.AI (shows mobility), but no explicit open-to-work signals and tenure at Together.AI is unclear
    YH

    Yunhai Hu

    medium hireability

Big Data Development Engineer @ Bilibili

    Shanghai, CN

    • Published 3 speculative decoding papers in 2025: DREAM (multimodal SD framework with 3.6x speedup, accepted at major venue), PipeSpec (hierarchical LLM decoding), and a survey 'Speculative Decoding and Beyond' (11 citations)
    • Actively coding EAGLE-Qwen3, EAGLE3, SpecForge-Qwen3VL, and RLSD repos — hands-on implementer of major SD frameworks
    • GitHub lists @New York University, Courant; DIY-NIW-EB1A repos confirm US presence and intent to stay
    • Hireability: MEDIUM — likely PhD student/researcher at NYU Courant; immigration trajectory (NIW/EB1A) shows strong intent to remain in US; personal website still lists Bilibili/Shanghai (likely outdated)
    ZZ

    Zhihao Zhang

    medium hireability

Ph.D. student @ Carnegie Mellon University

    Previously: MS student @ Carnegie Mellon University

    Pittsburgh, US

    • First author of SpecInfer (ASPLOS 2024) — tree-based speculative inference for LLM serving — plus SpecReason (NeurIPS 2025) and TidalDecode (ICLR 2025)
    • CMU PhD student (Zhihao Jia's group) focused on LLM serving systems; pinned FlashInfer (CUDA kernel library) and TVM forks show hands-on systems depth
    • Located Pittsburgh, PA (US)
    • Hireability: MEDIUM — ~5 years into PhD (papers from 2021–2025), LinkedIn profile fully wiped by Jan 2026 scrape (possible transition signal), OSDI 2026 paper in pipeline suggests near-graduation but not confirmed available yet
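SpecInfer's tree-based verification, cited across several cards here, generalizes the linear draft-verify loop: multiple drafted continuations are arranged as a token tree and checked in one target pass. A toy sketch under the assumption of a deterministic greedy target, with the hypothetical `target_next` function standing in for the target model's argmax:

```python
def target_next(ctx):
    # Hypothetical deterministic target: next token is (last + 1) mod 10.
    return (ctx[-1] + 1) % 10

def verify_tree(ctx, tree):
    """Verify a token tree of draft continuations against the target.

    `tree` maps each drafted token to its subtree of children. Returns the
    longest root-to-leaf path the target agrees with, plus one correction
    (the target's own next token) appended at the point of divergence.
    """
    want = target_next(ctx)
    if want in tree:  # a drafted branch matches: descend and keep verifying
        return [want] + verify_tree(ctx + [want], tree[want])
    return [want]  # no drafted child matches: emit the target's own token

# Drafted branches after prefix [1]: 2 -> 3, 2 -> 5, and a sibling 7.
token_tree = {2: {3: {}, 5: {}}, 7: {}}
print(verify_tree([1], token_tree))  # [2, 3, 4]
```

In the real system the whole tree is verified in a single batched forward pass via a tree attention mask; the recursion here only models the accept/correct logic.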
    ZC

    Zhuoming Chen

    medium hireability

Ph.D. student @ Carnegie Mellon University

    Previously: Research Intern @ Meta

    New York, US

    • Primary author of 6+ speculative decoding papers including SpecInfer (381 citations), Sequoia (69 citations), TriForce (74 citations), and MagicDec (50 citations) — one of the most prolific speculative decoding researchers
    • PhD at CMU (2023–) advised by Beidi Chen and Zhihao Jia
    • Recent Meta FAIR internship (2025, Leon Bottou)
    • Hireability: MEDIUM — 3rd year PhD, likely 1-2 more years in program, but Meta FAIR internship + CV update 73 days ago show active industry engagement
    AS

    Ananda Theertha Suresh

    low hireability

Senior Staff Research Scientist @ Google

    Previously: Graduate Student Researcher @ University of California, San Diego

    New York, US

    • Core speculative decoding researcher at Google NY — co-authored SpecTr (NeurIPS 2023, 122 citations), SpecTr++, Block Verification Accelerates Speculative Decoding (2024), Optimal block-level draft verification (2024), and Fast Speculative Decoding Using Multiple Parallel Drafts (2025)
    • Research expertise explicitly lists 'Speculative decoding'
    • PhD UC San Diego
    • Note: Senior Staff Research Scientist (L7) is above mid-level, but direct topic expertise is exceptional
    • Hireability: LOW — no pipeline signals of job movement, no website/LinkedIn activity detected, likely long-tenured at Google with no observed transition signals
    BA

    Bilge Acun

    low hireability

Research Scientist @ Meta

    Previously: Research Staff Member @ IBM

    San Francisco, US

    • Co-authored LayerSkip (ACL 2024, 175 citations) — self-speculative decoding via early exit inference, directly on-point for query
    • Also has CHAI (clustered head attention for efficient LLM inference) and 2025 work on hybrid architectures and LLM reasoning acceleration
    • Research Scientist at Meta FAIR in SF, PhD UIUC 2017
    • More senior than 'mid-level' but has exactly the right speculative decoding expertise
    • Hireability: LOW — 86 months (~7 yrs) at Meta FAIR with no open-to-work signals; website updated March 2026 still shows current role
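LayerSkip's self-speculative idea — drafting with the model's own early layers and verifying with the full stack, so no auxiliary draft model is needed — can be sketched with toy "layers". Everything below is illustrative (integer transforms in place of transformer blocks), not the paper's implementation:

```python
# Toy layer stack: integer transforms standing in for transformer blocks.
LAYERS = [lambda h: h + 1, lambda h: h, lambda h: h if h < 5 else h - 5]

def predict(ctx, exit_at):
    # Run the last token through the first `exit_at` layers, emit a token.
    h = ctx[-1]
    for layer in LAYERS[:exit_at]:
        h = layer(h)
    return h % 10

def self_speculative_step(ctx, k=3, exit_at=1):
    """Draft k tokens via early exit, then verify with the full stack."""
    draft, d_ctx = [], list(ctx)
    for _ in range(k):  # cheap pass: exit after `exit_at` layers
        t = predict(d_ctx, exit_at)
        draft.append(t)
        d_ctx.append(t)

    accepted, v_ctx = [], list(ctx)
    for t in draft:  # full pass: all layers check each draft token
        want = predict(v_ctx, len(LAYERS))
        if t != want:
            accepted.append(want)  # full model overrules the early exit
            break
        accepted.append(t)
        v_ctx.append(t)
    return accepted

print(self_speculative_step([2]))  # [3, 4, 0]: two drafts accepted, one corrected
```

Because drafter and verifier share weights, the draft pass's early-layer computation can be reused by the verification pass — the source of LayerSkip's memory and latency savings over two-model speculation.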
    CW

    Changhan Wang

    low hireability

Technical Lead @ Meta

    Previously: Technical Lead @ Meta

    New York, US

    • Co-authored LayerSkip (self-speculative decoding, ACL 2024, 150+ citations), CHAI (clustered head attention for efficient LLM inference), and multimodal generation inference acceleration papers — all directly on target
    • Technical Lead at Meta Superintelligence Labs in New York, h-index 34, 50+ papers
    • Hireability: LOW — pipeline shows position_update 73 days ago (recently transitioned to MSL), no open-to-work signals; likely just settled into new role. Possibly above 'mid-level' seniority
    DA

    Daiyaan Arfeen

    low hireability

PhD student @ Carnegie Mellon University

    Previously: Deep Learning Architecture Intern @ NVIDIA

    San Francisco, US

    • Co-authored SpecInfer (ASPLOS 2024, 386 citations) — one of the seminal speculative inference/decoding papers for LLMs, implementing tree-based speculative inference
    • PhD CMU CS 2020-2025, BS UC Berkeley
    • Also published at SOSP 2023 (Sia) and MLSys 2025 (PipeFill) showing strong LLM systems depth
    • Hireability: LOW — NVIDIA Deep Learning Architecture intern May 2024–Feb 2025, transitioned to full-time at NVIDIA post-PhD (~1 year in role), not open to work
    JC

    Jian Chen

    low hireability

Ph.D. student @ University of California San Diego

    Previously: Research Internship @ Microsoft

    Pittsburgh, US

    • First author of MagicDec (ICLR 2025, 50 citations) and DFlash (arXiv 2026) — both directly on speculative decoding
    • MagicDec breaks latency-throughput tradeoff for long-context LLMs via sparse-KV drafting; DFlash achieves 6x lossless speedup via block diffusion drafting
    • MS CMU, now PhD Year 1 at UC San Diego (Zhijian Liu lab)
    • Based in San Diego, CA
    • Hireability: LOW — just started PhD program ~Fall 2025, no open-to-work signals; website updated March 2026 but with paper additions only
    MN

    Mahyar Najibi

    low hireability

Co-Founder and Chief Scientific Officer @ ElastixAI

    Previously: Senior AIML Manager / Lead Scientist @ Apple

    San Francisco, US

    • Published directly on speculative decoding: 'Speculative Streaming: Fast LLM Inference without Auxiliary Models' (2024, arXiv:2402.11131) and 'QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache' (2025), both from his time as AI/ML Research Manager at Apple leading an LLM/CV team
    • Also: LazyLLM (efficient long-context inference), M2R2 (efficient transformer inference), Apple Intelligence Foundation LLMs
    • Seniority overshoot — PhD UMaryland, h-index 26, was managing a team at Apple, now Co-Founder/CSO
    • Hireability: LOW — ~10 months as Co-Founder/CSO of ElastixAI, actively building the startup
    SZ

    Sai Qian Zhang

    low hireability

Assistant Professor @ New York University

    Previously: Senior Research Scientist @ Meta

    New York, US

    • Directly relevant to speculative decoding: co-author of speculative decoding survey (2025), PipeSpec (hierarchical LLM decoding, 2025), and DREAM (multimodal speculative decoding, 2025)
    • Assistant Professor at NYU with prior experience as Senior Research Scientist at Meta Reality Labs; strong ML systems + hardware focus
    • Senior by academic standards — query asks for mid-level, but background is engineering-research crossover
    • Hireability: LOW — actively recruiting PhD students for their NYU lab, no open-to-work signals, stable faculty position
    ZZ

    Zhengxin Zhang

    low hireability

PhD Student @ Cornell University

    Previously: Research assistant @ Carnegie Mellon University

    Ithaca, US

    • Co-author on SpecInfer (arXiv:2305.09781, 386 citations) — one of the foundational speculative decoding papers — while RA at CMU under Prof. Zhihao Jia (FlexFlow group)
    • Also published QST (ACL 2024 Outstanding Paper) on quantized LLM fine-tuning
    • Now 2nd-year PhD student at Cornell
    • Hireability: LOW — just started PhD in 2024, no job search signals detected, no LinkedIn available
    ZL

    Zhijian Liu

    low hireability

Research Scientist @ NVIDIA

    Previously: Research Scientist @ NVIDIA

    San Francisco, US

    • Directly relevant to speculative decoding — leads z-lab (UCSD) with DFlash (Block Diffusion for Flash Speculative Decoding) and Fast-dLLM (parallel decoding for diffusion LLMs), plus LServe on efficient LLM serving; h-index 30
    • Note: he's senior-level (Assistant Professor at UCSD + Research Scientist at NVIDIA), not mid-level
    • Hireability: LOW — transitioned to Assistant Professor at UCSD ~Jan 2026, just launched his own lab (z-lab.ai), extremely unlikely to step down to industry engineer role

    Runs

    #2 · completed · 0 qualified / 0 found · Apr 27, 12:46 PM
    #1 · completed · 0 qualified / 0 found · Apr 27, 12:32 PM