
senior FP8 training engineers in the US

Completed · 22 qualified · 2 runs · Apr 27, 1:01 PM · senior-fp8-training-engineers-in-the-us
Parsed 1 topic · Senior · Engineer · US
    Generating seed nodes — 0 proposed · Explored 0 queries · 0/0 done
    3. Expanding nodes — queued
    4. Qualifying candidates — queued

    Qualified Candidates (20)

    AG

    Amir Gholami

    medium hireability

    Postdoc@University of California, Berkeley

    San Francisco, US

    • Leading quantization researcher at UC Berkeley (HAWQ series, 1894-cited quantization survey, FP16 tensor core contribution to NVIDIA, 2025 paper on low-precision tensor processing). h-index 45, deep expertise in mixed-precision training and inference
    • SF-based, US
    • Hireability: MEDIUM — Associate Research Scientist co-directing Pallas Lab (more PI-like than typical postdoc), no explicit open-to-work signals; academic trajectory suggests possible faculty pursuit over industry
    DN

    Deepak Narayanan

    medium hireability

    Senior Applied Deep Learning Research Scientist@NVIDIA

    Previously: Senior Researcher @ Microsoft

    Seattle, US

    • Core Megatron-LM developer at NVIDIA (322 commits, still active Apr 2026) with direct TransformerEngine FP8 contributions (2 merged PRs, incl. FP32 wgrad fix for weight tying in FP8 training)
    • Also landed 'FP32 gradient accumulation for subset of params' PR in Megatron-LM — a key FP8 training stability primitive
    • Senior Applied DL Research Scientist, PhD Stanford CS 2021, based in Bellevue WA
    • Hireability: MEDIUM — 4.5 years at NVIDIA, extremely active (commits days ago), no outbound signals; long tenure but below 6-year entrenchment threshold
    PN

    Phuong Ha Nguyen

    medium hireability

    Applied Researcher@eBay

    Previously: Research Fellow @ University of Connecticut

    San Francisco, US

    • Core NVIDIA TransformerEngine contributor with 127 PRs (Oct 2024–Apr 2026), including [JAX] Collective GEMM with FP8 and MXFP8 support, FP8 GEMM precision configuration (TE_FP8_GEMM_HIGH_PRECISION_ACCUMULATION), and collective NCCL operations for multi-GPU FP8 training on Hopper/Ada GPUs. h-index 21
    • Based in Santa Clara, CA
    • DB shows eBay but GitHub profile and nvidia.com commit emails confirm current NVIDIA role
    • Hireability: MEDIUM — ~1.5 years at NVIDIA (since Oct 2024); no explicit signals of looking but within typical transition window
    RZ

    Ritchie Zhao

    medium hireability

    Senior AI and Machine Learning Engineer@NVIDIA

    Previously: Senior Data Science Manager @ Microsoft

    Redmond, US

    • Senior engineer at NVIDIA (Redmond, US) with entire career devoted to narrow-precision ML training and inference: co-author of 'Shared Microexponents' (MXFP8 precursor, 71 citations, 2023), 'Pushing the Limits of Narrow Precision with MSFP' (159 citations, NeurIPS 2020), and 2025 patent on activation compression for neural network training
    • Also has recent work on activation compression for training (2024–2025)
    • Cornell PhD 2020 (ECE), h-index 22
    • Hireability: MEDIUM — no job-seeking signals detected, stable at NVIDIA; no LinkedIn/website changes recorded in pipeline. Career stage (~5 years post-PhD) is within typical transition window
    SA

    Sai Aparna Aketi

    medium hireability

    Research Scientist@Meta

    Previously: Postdoctoral Researcher @ Meta

    San Francisco, US

    • Low-precision training researcher at Meta (Privacy Preserving ML team, Research Scientist)
    • Published papers on 8-bit quantization for decentralized distributed training (2021-2022) and mixed precision training via Opacus for LLMs (2025)
    • Primary expertise is decentralized/federated learning algorithms rather than FP8 training infrastructure specifically; PhD Purdue
    • Based in SF
    • Hireability: MEDIUM — website position_update ~8 months ago (Aug 2025), no explicit job-seeking signals detected, unclear tenure at Meta
    VC

    Vikas Chandra

    medium hireability

    Senior Director, AI@Meta

    Previously: Director, Applied ML @ Arm

    San Francisco, US

    • Senior Director of AI at Meta Reality Labs leading efficient on-device LLMs and quantization research for AR products
    • Published LLM-QAT (data-free QAT for LLMs, 414 citations), CPT (cyclic precision training, ICLR 2021 Spotlight), and SpinQuant — all directly relevant to FP8 training methodology. h-index 47, PhD CMU ECE, in SF
    • Hireability: MEDIUM — ~8 years at Meta (long tenure, Senior Director level), but cv_update on personal website in Jan 2026 signals recent career motion
    XL

    Xing Liu

    medium hireability

    Principal Research Scientist@Meta

    Previously: Research Scientist @ Intel

    San Francisco, US

    • Principal Research Scientist at Meta (SF) with strong HPC and distributed training background — co-designed software-hardware training systems for DLRM at scale (186 citations), contributed to TorchRec and Jagged Flash Attention GPU kernels
    • No explicit FP8 papers found, but deep expertise in precision-sensitive training efficiency at GPU scale is directly adjacent. h-index 24
    • Hireability: MEDIUM — ~5 years at Meta (papers from 2021-2024), within transition window; no active mobility signals from pipeline (no LinkedIn changes, no website updates)
    BG

    Boris Ginsburg

    low hireability

    Senior Director, Conversational AI@NVIDIA

    Previously: Principal Engineer, Deep Learning - HW/SW acceleration @ Intel

    San Francisco, US

    • Pioneer of mixed precision training (co-authored seminal 'Mixed Precision Training' 2018 paper, foundational to all FP8 work) and co-author of Nemotron-H (2025) which introduced an FP8-based training recipe at NVIDIA
    • Senior Director of Conversational AI at NVIDIA in SF, h-index 46, 12 years at NVIDIA
    • Hireability: LOW — long-tenured senior director at NVIDIA (12 years), no pipeline signals of job seeking, DB pre-computed hireability is low
    BC

    Bryan Catanzaro

    low hireability

    Vice President, Applied Deep Learning Research@NVIDIA

    Previously: Senior Researcher @ Baidu

    San Francisco, US

    • VP Applied Deep Learning Research at NVIDIA; co-authored Megatron-LM (2019/2021), NVFP4 pretraining (2025), and Nemotron-4 340B — directly oversees large-scale FP8/low-precision LLM training infrastructure
    • US-based in Santa Clara, CA
    • H-index 73
    • Hireability: LOW — VP-level executive at NVIDIA with no signals of transition (no LinkedIn changes, no website activity, no GitHub activity)
    CP

    Christian Puhrsch

    low hireability

    Researcher@Meta

    Previously: MS student @ New York University

    • Core TorchAO contributor (70 commits on pytorch/ao, co-author on TorchAO 2025 paper)
    • PyTorch-native float8/FP8 training infra including work on torchao/float8 module — FP8 pretraining at 1.5x speedup on 405B-scale models
    • Researcher at Meta, based in Seattle WA (US)
    • H-index 8
    • Hireability: LOW — ~10 years at Meta/Facebook (joined ~2016), no LinkedIn changes, no website activity, no open-to-work signals
    CS

    Christopher De Sa

    low hireability

    Researcher@together.ai

    Previously: Assistant Professor @ Cornell University

    Ithaca, US

    • Leading researcher in low-precision training and LLM quantization — authored QPyTorch (low-precision arithmetic simulation framework supporting FP8-like formats), SWALP (low-precision training), QuIP#/QTIP (SOTA LLM post-training quantization, 2024)
    • Associate Professor at Cornell, PhD Stanford 2017, h-index 41
    • US-based (Ithaca)
    • Hireability: LOW — tenured faculty with NSF CAREER + DARPA Young Faculty awards and active lab; 'together.ai' in DB is likely a consulting/research collaboration, not a full-time role. No job-seeking signals detected
    JD

    Jordan Dotzel

    low hireability

    Student Researcher@Google

    Previously: Software Engineer @ Datto

    San Francisco, US

    • Multiple publications on mixed-precision floating-point quantization and LLM numerical formats (FLIQS: Best Paper AutoML 2024; t-distribution formats for LLMs) — directly relevant to FP8 training
    • PhD Cornell (Computer Systems Lab) June 2025, now Neural Architect at Google Gemini+Cloud Advanced Development in SF
    • Hireability: LOW — ~9 months into new role at Google after completing PhD, likely still settling in
    KK

    Kurt Keutzer

    low hireability

    Co-Founder and Strategic Advisor@SigIQ.ai

    Previously: Chief Strategy Officer (CSO) @ Nexusflow

    San Francisco, US

    • Directly relevant: co-authored 'COAT: Memory-Efficient FP8 Training' (2025) plus broad quantization portfolio (KVQuant 296 citations, HAWQ, SqueezeLLM, FGMP)
    • Professor Emeritus at Berkeley BAIR, h-index 117, in SF
    • Hireability: LOW — Professor Emeritus status and active Co-Founder/Strategic Advisor at SigIQ.ai (previously founded Deepscale→Tesla, Nexusflow.ai→NVIDIA); pipeline shows no LinkedIn movement or website activity
    MS

    Mohammad Shoeybi

    low hireability

    Senior Director of Applied Research@NVIDIA

    Previously: Senior Research Engineer - Tech Lead @ DeepMind

    San Francisco, US

    • Co-author of 'FP8 Formats for Deep Learning' (283 citations, 2022) and lead contributor on NVIDIA/Megatron-LM (339 commits) — the canonical large-scale LLM training framework
    • Senior Director of Applied Research at NVIDIA in SF, h-index 43
    • Directly on-point for senior FP8 training work
    • Hireability: LOW — 5+ years at NVIDIA building Megatron-LM since 2019, very senior/entrenched role, no open-to-work signals in pipeline, website, or GitHub
    MP

    Mostofa Patwary

    low hireability

    Director of Large Foundational Language Model, Applied Deep Learning Research@NVIDIA

    Previously: Principal Research Scientist and Senior Engineering Manager, Applied Deep Learning Research @ NVIDIA

    San Francisco, US

    • Director of Large Foundational LM at NVIDIA Applied Deep Learning Research in SF; 94 commits on NVIDIA/Megatron-LM and co-author of 'Pretraining Large Language Models with NVFP4' (2025), demonstrating direct low-precision pretraining expertise
    • H-index 43, PI on Megatron-LM papers with 2800+ citations
    • Hireability: LOW — long-tenured Director at NVIDIA with no signals of intent to leave; actively publishing multiple 2025 Nemotron papers, no LinkedIn changes or website updates detected
    SH

    Song Han

    low hireability

    Researcher@NVIDIA

    Previously: Assistant Professor @ MIT

    • Pioneer in FP8/quantized training — authored COAT (FP8 training paper, ICLR 2024/2025), SmoothQuant, and AWQ (MLSys 2024 Best Paper). h-index 79
    • Active FP4/NVFP4 hardware work (fouroversix repo)
    • Based in Cambridge, MA (MIT)
    • Hireability: LOW — tenured Associate Professor at MIT running HAN Lab; OpenReview/DB list NVIDIA affiliation suggesting industry collaboration, but primary role is academic PI with own lab
    TD

    Tim Dettmers

    low hireability

    Assistant Professor@Carnegie Mellon University

    Previously: Researcher @ Allen Institute for Artificial Intelligence

    • The definitive expert on low-precision LLM training — created bitsandbytes, authored LLM.int8(), QLoRA, 8-bit Optimizers, and SpQR
    • FP8 training is a direct extension of his decade-long 8-bit/4-bit quantization research
    • Assistant Professor at CMU + Research Scientist at AI2, based in US
    • Hireability: LOW — tenure-track faculty with dual CMU/AI2 appointment, no open-to-work signals, recent Jan 2026 blog posts show active research on coding agents at AI2 with no career transition signals
    VK

    Vijay Anand Korthikanti

    low hireability

    Principal Research Scientist@NVIDIA

    Previously: Member of Technical Staff @ Cerebras Systems

    Hyderabad, IN

    • Principal Research Scientist at NVIDIA with 150 commits on Megatron-LM and co-author of landmark training papers ('Reducing Activation Recomputation in Large Transformer Models' 410 cites, 'Efficient Large-Scale LM Training on GPU Clusters' 1229 cites, 2025 MoE Parallel Folding)
    • Deep expertise in distributed LLM training — sequence/expert parallelism, memory optimization, MoE scaling — directly underpins FP8 training infrastructure in Megatron-LM
    • Still actively committing to Megatron-LM (April 2025, paged attention + H100 clusters)
    • NOTE: DB location = Hyderabad (IN); commits reference NVIDIA DFW H100 clusters but US physical presence unconfirmed
    • Hireability: LOW — ~68 months (nearly six years) at NVIDIA as Principal Research Scientist, no open-to-work signals
    YL

    Yao Lu

    low hireability

    Distinguished Research Scientist@NVIDIA

    Previously: Principal Research Scientist @ NVIDIA

    San Francisco, US

    • Co-authored COAT (ICLR 2025) — 'Compressing Optimizer states and Activations for Memory-Efficient FP8 Training' — a direct FP8 training paper with dynamic range expansion and mixed-granularity activation quantization, achieving 1.54x memory reduction over BF16
    • Previously Distinguished Research Scientist at NVIDIA (2023-2025) working on VILA/LongVILA VLM training infra; h-index 35; SF-based
    • Hireability: LOW — recently moved to Physical Intelligence as Principal Researcher in 2025 (~1 year into new role), no open-to-work signals detected
    YH

    Yuxiong He

    low hireability

    Distinguished Engineer & Research Manager@Snowflake

    Previously: Partner Research & Product Manager - Cofounder and Leader of DeepSpeed @ Microsoft

    Seattle, US

    • Distinguished Engineer & Research Manager at Snowflake (formerly led Microsoft DeepSpeed team)
    • Direct FP8 work: ZeroQuant-FP (2023) on W4A8 quantization using FP8 activation; co-authored ZeRO++ (large model training communication), ZeroQuant, and Int4 quantization papers
    • H-index 54, very senior in the large-scale training and quantization space
    • US-based (Bellevue, WA)
    • Hireability: LOW — entrenched in senior leadership at Snowflake, actively announcing research releases (Arctic Inference, Arctic Long Sequence Training), no open-to-work signals on LinkedIn

    Runs

    #2 — completed · 0 qualified / 0 found · Apr 27, 1:17 PM
    #1 — completed · 0 qualified / 0 found · Apr 27, 1:01 PM