Junior CUDA / GPU kernel engineers in the US

completed | 5 qualified | 1 run | Apr 28, 9:02 AM | junior-cuda-gpu-kernel-engineers-in-the-us-1777366932

Parsed: Junior

Qualified Candidates (5)

medium hireability

CS PhD Student @ Snowflake AI Research

Previously: Research Scientist Intern @ Snowflake

ML systems researcher at CMU (4th-year PhD) with GPU parallel computing focus
Co-author of Korch (kernel orchestration for tensor programs, ASPLOS 2024) and active FlexFlow contributor (C++/CUDA distributed DNN training framework)
Work touches low-level GPU execution through systems research; primary focus is LLM serving/speculative decoding rather than direct CUDA kernel writing
US-based (Pittsburgh / San Mateo)
Hireability: MEDIUM — 4th-year PhD, expected graduation ~2027, currently interning at Snowflake AI Research, no explicit job search signals detected

medium hireability

Ph.D. student @ Carnegie Mellon University

Previously: MS student @ Carnegie Mellon University

Pittsburgh, US

GPU kernel and LLM inference systems PhD student at CMU Catalyst (supervised by Zhihao Jia)
Pinned CUDA repo: FlashInfer (kernel library for LLM serving); also forked TVM
Papers at OSDI/ASPLOS/ICLR on speculative decoding and sparse attention kernels (SpecInfer, TidalDecode)
US (Pittsburgh)
Hireability: MEDIUM — LinkedIn profile went completely blank in Jan 2026 scrape (all experience/education/headline cleared), possibly post-PhD transition; no explicit open-to-work signal but dramatic profile change suggests career motion

medium hireability

Ph.D. student @ Carnegie Mellon University

Previously: Research Intern @ Meta

New York, US

LLM inference systems researcher in Beidi Chen's lab at CMU, working on speculative decoding (SpecInfer, 381 citations) and GPU-efficient attention (MagicPIG, LSH sampling)
Research sits at the systems/algorithms layer above raw CUDA kernel writing, but Beidi Chen's lab does substantial GPU kernel work so direct CUDA exposure is likely
US-based
Hireability: MEDIUM — year-3 PhD (started 2023 at CMU), not yet in final-year job market window; Meta FAIR internship in 2025 shows active industry engagement; Feb 2026 website update was styling-only with no job-search signals

low hireability

GEMM Kernels intern @ AMD

Previously: Research Intern @ Together AI

Austin, US

GEMM Kernels intern at AMD (Austin) and second-year ECE Master's student at UT Austin, focused on computing systems for large-scale AI
ICML 2025 paper on constant-sized KV caches (MorphKV); arXiv preprint on memory-efficient context parallelism (Untied Ulysses) from a Together AI internship; flash-attention fork pinned on GitHub
Directly relevant GPU/kernel background
Hireability: LOW — GitHub profile now shows NVIDIA + Bengaluru, indicating recent hire in India post-Master's; location mismatch for US requirement though DB still shows Austin

low hireability

PhD student @ Google (PhD, UC Irvine)

Previously: Research Associate @ AMD

Austin, US

ECE PhD student at UT Austin (started 2025) with an AMD Research affiliation, focused on GPU benchmarking, computer architecture, and AI hardware acceleration. Three years at NVIDIA before the PhD, CUDA Python certified, and a co-author with GEMM kernel researcher Ravi Ghadia (AMD)
Based in Austin, TX
Data note: the DB conflates two people. LinkedIn avinkumar2020 is the AMD/GPU candidate; the avinash0161 GitHub and website belong to a different person, a UC Irvine databases researcher
Hireability: LOW — just started PhD (2025-2028) with Amazon AI PhD Fellowship (Aug 2025); strong AMD Research ties and fellowship obligations make near-term availability unlikely

#1 | completed | 0 qualified / 0 found | Apr 28, 9:02 AM