
senior FP8 training engineers in the US

Completed · 22 qualified · 2 runs · Apr 27, 1:01 PM · senior-fp8-training-engineers-in-the-us
Parsed 1 topic · Senior · Engineer · US
    Generating seed nodes — 0 proposed · Explored 0 queries · 0/0 done
    3. Expanding nodes — queued
    4. Qualifying candidates — queued

    Qualified Candidates (20)

    AG

    Amir Gholami

    medium hireability

    Postdoc@University of California, Berkeley

    San Francisco, US

    • Leading quantization researcher at UC Berkeley (HAWQ series, 1894-cited quantization survey, FP16 tensor core contribution to NVIDIA, 2025 paper on low-precision tensor processing). h-index 45, deep expertise in mixed-precision training and inference
    • SF-based, US
    • Hireability: MEDIUM — Associate Research Scientist co-directing Pallas Lab (more PI-like than typical postdoc), no explicit open-to-work signals; academic trajectory suggests possible faculty pursuit over industry
    DN

    Deepak Narayanan

    medium hireability

    Senior Applied Deep Learning Research Scientist@NVIDIA

    Previously: Senior Researcher @ Microsoft

    Seattle, US

    • Core Megatron-LM developer at NVIDIA (322 commits, still active Apr 2026) with direct TransformerEngine FP8 contributions (2 merged PRs, incl. FP32 wgrad fix for weight tying in FP8 training)
    • Also landed 'FP32 gradient accumulation for subset of params' PR in Megatron-LM — a key FP8 training stability primitive
    • Senior Applied DL Research Scientist, PhD Stanford CS 2021, based in Bellevue WA
    • Hireability: MEDIUM — 4.5 years at NVIDIA, extremely active (commits days ago), no outbound signals; long tenure but below 6-year entrenchment threshold
    PN

    Phuong Ha Nguyen

    medium hireability

    Applied Researcher@eBay

    Previously: Research Fellow @ University of Connecticut

    San Francisco, US

    • Core NVIDIA TransformerEngine contributor with 127 PRs (Oct 2024–Apr 2026), including [JAX] Collective GEMM with FP8 and MXFP8 support, FP8 GEMM precision configuration (TE_FP8_GEMM_HIGH_PRECISION_ACCUMULATION), and collective NCCL operations for multi-GPU FP8 training on Hopper/Ada GPUs. h-index 21
    • Based in Santa Clara, CA
    • DB shows eBay but GitHub profile and nvidia.com commit emails confirm current NVIDIA role
    • Hireability: MEDIUM — ~1.5 years at NVIDIA (since Oct 2024); no explicit signals of looking but within typical transition window
    RZ

    Ritchie Zhao

    medium hireability

    Senior AI and Machine Learning Engineer@NVIDIA

    Previously: Senior Data Science Manager @ Microsoft

    Redmond, US

    • Senior engineer at NVIDIA (Redmond, US) with entire career devoted to narrow-precision ML training and inference: co-author of 'Shared Microexponents' (MXFP8 precursor, 71 citations, 2023), 'Pushing the Limits of Narrow Precision with MSFP' (159 citations, NeurIPS 2020), and 2025 patent on activation compression for neural network training
    • Also has recent work on activation compression for training (2024–2025)
    • Cornell PhD 2020 (ECE), h-index 22
    • Hireability: MEDIUM — no job-seeking signals detected, stable at NVIDIA; no LinkedIn/website changes recorded in pipeline. Career stage (~5 years post-PhD) is within typical transition window
    SA

    Sai Aparna Aketi

    medium hireability

    Research Scientist@Meta

    Previously: Postdoctoral Researcher @ Meta

    San Francisco, US

    • Low-precision training researcher at Meta (Privacy Preserving ML team, Research Scientist)
    • Published papers on 8-bit quantization for decentralized distributed training (2021-2022) and mixed precision training via Opacus for LLMs (2025)
    • Primary expertise is decentralized/federated learning algorithms rather than FP8 training infrastructure specifically; PhD Purdue
    • Based in SF
    • Hireability: MEDIUM — website position_update ~8 months ago (Aug 2025), no explicit job-seeking signals detected, unclear tenure at Meta
    VC

    Vikas Chandra

    medium hireability

    Senior Director, AI@Meta

    Previously: Director, Applied ML @ Arm

    San Francisco, US

    • Senior Director of AI at Meta Reality Labs leading efficient on-device LLMs and quantization research for AR products
    • Published LLM-QAT (data-free QAT for LLMs, 414 citations), CPT (cyclic precision training, ICLR 2021 Spotlight), and SpinQuant — all directly relevant to FP8 training methodology. h-index 47, PhD CMU ECE, in SF
    • Hireability: MEDIUM — ~8 years at Meta (long tenure, Senior Director level), but cv_update on personal website in Jan 2026 signals recent career motion
    XL

    Xing Liu

    medium hireability

    Principal Research Scientist@Meta

    Previously: Research Scientist @ Intel

    San Francisco, US

    • Principal Research Scientist at Meta (SF) with strong HPC and distributed training background — co-designed software-hardware training systems for DLRM at scale (186 citations), contributed to TorchRec and Jagged Flash Attention GPU kernels
    • No explicit FP8 papers found, but deep expertise in precision-sensitive training efficiency at GPU scale is directly adjacent. h-index 24
    • Hireability: MEDIUM — ~5 years at Meta (papers from 2021-2024), within transition window; no active mobility signals from pipeline (no LinkedIn changes, no website updates)
    BG

    Boris Ginsburg

    low hireability

    Senior Director, Conversational AI@NVIDIA

    Previously: Principal Engineer, Deep Learning - HW/SW acceleration @ Intel

    San Francisco, US

    • Pioneer of mixed precision training (co-authored seminal 'Mixed Precision Training' 2018 paper, foundational to all FP8 work) and co-author of Nemotron-H (2025) which introduced an FP8-based training recipe at NVIDIA
    • Senior Director of Conversational AI at NVIDIA in SF, h-index 46, 12 years at NVIDIA
    • Hireability: LOW — long-tenured senior director at NVIDIA (12 years), no pipeline signals of job seeking, DB pre-computed hireability is low
    BC

    Bryan Catanzaro

    low hireability

    Vice President, Applied Deep Learning Research@NVIDIA

    Previously: Senior Researcher @ Baidu

    San Francisco, US

    • VP Applied Deep Learning Research at NVIDIA; co-authored Megatron-LM (2019/2021), NVFP4 pretraining (2025), and Nemotron-4 340B — directly oversees large-scale FP8/low-precision LLM training infrastructure
    • US-based in Santa Clara, CA
    • H-index 73
    • Hireability: LOW — VP-level executive at NVIDIA with no signals of transition (no LinkedIn changes, no website activity, no GitHub activity)
    CP

    Christian Puhrsch

    low hireability

    Researcher@Meta

    Previously: MS student @ New York University

    • Core TorchAO contributor (70 commits on pytorch/ao, co-author on TorchAO 2025 paper)
    • PyTorch-native float8/FP8 training infra including work on torchao/float8 module — FP8 pretraining at 1.5x speedup on 405B-scale models
    • Researcher at Meta, based in Seattle WA (US)
    • H-index 8
    • Hireability: LOW — ~10 years at Meta/Facebook (joined ~2016), no LinkedIn changes, no website activity, no open-to-work signals
    CS

    Christopher De Sa

    low hireability

    Researcher@together.ai

    Previously: Assistant Professor @ Cornell University

    Ithaca, US

    • Leading researcher in low-precision training and LLM quantization — authored QPyTorch (low-precision arithmetic simulation framework supporting FP8-like formats), SWALP (low-precision training), QuIP#/QTIP (SOTA LLM post-training quantization, 2024)
    • Associate Professor at Cornell, PhD Stanford 2017, h-index 41
    • US-based (Ithaca)
    • Hireability: LOW — tenured faculty with NSF CAREER + DARPA Young Faculty awards and active lab; 'together.ai' in DB is likely a consulting/research collaboration, not a full-time role. No job-seeking signals detected
    JD

    Jordan Dotzel

    low hireability

    Student Researcher@Google

    Previously: Software Engineer @ Datto

    San Francisco, US

    • Multiple publications on mixed-precision floating-point quantization and LLM numerical formats (FLIQS: Best Paper AutoML 2024; t-distribution formats for LLMs) — directly relevant to FP8 training
    • PhD Cornell (Computer Systems Lab) June 2025, now Neural Architect at Google Gemini+Cloud Advanced Development in SF
    • Hireability: LOW — ~9 months into new role at Google after completing PhD, likely still settling in
    KK

    Kurt Keutzer

    low hireability

    Co-Founder and Strategic Advisor@SigIQ.ai

    Previously: Chief Strategy Officer (CSO) @ Nexusflow

    San Francisco, US

    • Directly relevant: co-authored 'COAT: Memory-Efficient FP8 Training' (2025) plus broad quantization portfolio (KVQuant 296 citations, HAWQ, SqueezeLLM, FGMP)
    • Professor Emeritus at Berkeley BAIR, h-index 117, in SF
    • Hireability: LOW — Professor Emeritus status and active Co-Founder/Strategic Advisor at SigIQ.ai (previously founded Deepscale→Tesla, Nexusflow.ai→NVIDIA); pipeline shows no LinkedIn movement or website activity
    MS

    Mohammad Shoeybi

    low hireability

    Senior Director of Applied Research@NVIDIA

    Previously: Senior Research Engineer - Tech Lead @ DeepMind

    San Francisco, US

    • Co-author of 'FP8 Formats for Deep Learning' (283 citations, 2022) and lead contributor on NVIDIA/Megatron-LM (339 commits) — the canonical large-scale LLM training framework
    • Senior Director of Applied Research at NVIDIA in SF, h-index 43
    • Directly on-point for senior FP8 training work
    • Hireability: LOW — 5+ years at NVIDIA building Megatron-LM since 2019, very senior/entrenched role, no open-to-work signals in pipeline, website, or GitHub
    MP

    Mostofa Patwary

    low hireability

    Director of Large Foundational Language Model, Applied Deep Learning Research@NVIDIA

    Previously: Principal Research Scientist and Senior Engineering Manager, Applied Deep Learning Research @ NVIDIA

    San Francisco, US

    • Director of Large Foundational LM at NVIDIA Applied Deep Learning Research in SF; 94 commits on NVIDIA/Megatron-LM and co-author of 'Pretraining Large Language Models with NVFP4' (2025), demonstrating direct low-precision pretraining expertise
    • H-index 43, PI on Megatron-LM papers with 2800+ citations
    • Hireability: LOW — long-tenured Director at NVIDIA with no signals of intent to leave; actively publishing multiple 2025 Nemotron papers, no LinkedIn changes or website updates detected
    SH

    Song Han

    low hireability

    Researcher@NVIDIA

    Previously: Assistant Professor @ MIT

    • Pioneer in FP8/quantized training — authored COAT (FP8 training paper, ICLR 2024/2025), SmoothQuant, and AWQ (MLSys 2024 Best Paper). h-index 79
    • Active FP4/NVFP4 hardware work (fouroversix repo)
    • Based in Cambridge, MA (MIT)
    • Hireability: LOW — tenured Associate Professor at MIT running HAN Lab; OpenReview/DB list NVIDIA affiliation suggesting industry collaboration, but primary role is academic PI with own lab
    TD

    Tim Dettmers

    low hireability

    Assistant Professor@Carnegie Mellon University

    Previously: Researcher @ Allen Institute for Artificial Intelligence

    • The definitive expert on low-precision LLM training — created bitsandbytes, authored LLM.int8(), QLoRA, 8-bit Optimizers, and SpQR
    • FP8 training is a direct extension of his decade-long 8-bit/4-bit quantization research
    • Assistant Professor at CMU + Research Scientist at AI2, based in US
    • Hireability: LOW — tenure-track faculty with dual CMU/AI2 appointment, no open-to-work signals, recent Jan 2026 blog posts show active research on coding agents at AI2 with no career transition signals
    VK

    Vijay Anand Korthikanti

    low hireability

    Principal Research Scientist@NVIDIA

    Previously: Member of Technical Staff @ Cerebras Systems

    Hyderabad, IN

    • Principal Research Scientist at NVIDIA with 150 commits on Megatron-LM and co-author of landmark training papers ('Reducing Activation Recomputation in Large Transformer Models' 410 cites, 'Efficient Large-Scale LM Training on GPU Clusters' 1229 cites, 2025 MoE Parallel Folding)
    • Deep expertise in distributed LLM training — sequence/expert parallelism, memory optimization, MoE scaling — directly underpins FP8 training infrastructure in Megatron-LM
    • Still actively committing to Megatron-LM (April 2025, paged attention + H100 clusters)
    • NOTE: DB location = Hyderabad (IN); commits reference NVIDIA DFW H100 clusters but US physical presence unconfirmed
    • Hireability: LOW — ~68 months (nearly six years) at NVIDIA as Principal Research Scientist, no open-to-work signals
    YL

    Yao Lu

    low hireability

    Distinguished Research Scientist@NVIDIA

    Previously: Principal Research Scientist @ NVIDIA

    San Francisco, US

    • Co-authored COAT (ICLR 2025) — 'Compressing Optimizer states and Activations for Memory-Efficient FP8 Training' — a direct FP8 training paper with dynamic range expansion and mixed-granularity activation quantization, achieving 1.54x memory reduction over BF16
    • Previously Distinguished Research Scientist at NVIDIA (2023-2025) working on VILA/LongVILA VLM training infra; h-index 35; SF-based
    • Hireability: LOW — recently moved to Physical Intelligence as Principal Researcher in 2025 (~1 year into new role), no open-to-work signals detected
    YH

    Yuxiong He

    low hireability

    Distinguished Engineer & Research Manager@Snowflake

    Previously: Partner Research & Product Manager - Cofounder and Leader of DeepSpeed @ Microsoft

    Seattle, US

    • Distinguished Engineer & Research Manager at Snowflake (formerly led Microsoft DeepSpeed team)
    • Direct FP8 work: ZeroQuant-FP (2023) on W4A8 quantization using FP8 activation; co-authored ZeRO++ (large model training communication), ZeroQuant, and Int4 quantization papers
    • H-index 54, very senior in the large-scale training and quantization space
    • US-based (Bellevue, WA)
    • Hireability: LOW — entrenched in senior leadership at Snowflake, actively announcing research releases (Arctic Inference, Arctic Long Sequence Training), no open-to-work signals on LinkedIn

    Runs

    #2 — completed · 0 qualified / 0 found · Apr 27, 1:17 PM
    #1 — completed · 0 qualified / 0 found · Apr 27, 1:01 PM