
junior CUDA kernel engineers in the US

completed · 14 qualified · 1 run · Apr 27, 11:53 AM · junior-cuda-kernel-engineers-in-the-us
Parsed · 2 topics · Junior · Engineer · US
    1. Generating seed nodes — 0 proposed
    2. Exploring queries — 0/0 done
    3. Expanding nodes — queued
    4. Qualifying candidates — queued

    Qualified Candidates (14)


    Justin Lebar

    high hireability
    • Senior Triton contributor (131 commits on openai/triton), authored 'Linear Layouts From First Principles' (Jan 2026 blog post on GPU memory swizzling/tensor layouts)
    • GitHub bio: 'performance optimization, accelerators like GPUs'
    • No company listed on GitHub profile as of Apr 2026, resume updated March 2026 — appears actively job searching post-OpenAI
    • In SF, CA
    • Seniority note: likely staff/senior level (Google → Waymo → OpenAI, 10+ yrs), may exceed 'junior' target
    • Hireability: HIGH — no current employer, recent resume update is strong open-to-work signal

    Alexander Efimov

    medium hireability
    • Active AMD compiler engineer on ROCm/Triton backend — 86+ merged PRs implementing MFMA layouts, LDS optimization passes, LinearLayout conversions (AMD GPU kernel compiler work)
    • AMD work account alefimov-amd created April 2023 (~2 years FTE)
    • No PhD signals — engineering role
    • Location unconfirmed: no GitHub location set, but commit timing consistent with US Pacific hours; caution advised given Russian name and co-author connections
    • Hireability: MEDIUM — ~2 years at AMD, within typical transition window, no open-to-work signals

    Avinash Kumar

    medium hireability

PhD student @ Google (PhD, UC Irvine)

    Previously: Research Associate @ AMD

    Austin, US

• ECE PhD student at UT Austin; AMD PEAC Research Associate (Performant & Efficient AI Computing — LLM security/performance trade-offs); multiple prior NVIDIA roles (GPU Power Architect, Datacenter Architect Intern)
    • Co-authored GEMM kernel work during an internship at AMD
    • GPU architecture and hardware acceleration background is adjacent to CUDA kernel engineering; the database-systems papers appear to belong to a different namesake — LinkedIn/pipeline data is the reliable signal here
    • Hireability: MEDIUM — active PhD student recently named Amazon Fellow, currently engaged via AMD Research, likely 1-2 years from graduation

    Jeff Niu

    medium hireability
    • Core maintainer of triton-lang/triton (1 of 8), 271+ commits on openai/triton with daily active Gluon contributions targeting H100/GB300 Blackwell (attention, MoE kernels, MLIR/NVPTX backend)
    • Also contributes to llvm/llvm-project NVPTX backend
    • UWaterloo (Waterloop affiliation), likely BEng ~2022, no PhD
    • Commits in Pacific Time — US West Coast
    • Hireability: MEDIUM — ~3-4 years into career (within transition window), no open-to-work signals but tenure aligns with typical switching window

    Kyle Wang

    medium hireability
    • 61 commits on openai/triton; active contributor to ROCm/triton (MoE kernels, async_copy, pingpong, GFX1250 targets) and ROCm/iris (GEMM+ReduceScatter workgroup specialization)
    • Also merged LLVM VectorCombine PR (Aug 2025)
    • Based in Santa Clara, CA
• Hireability: MEDIUM — no company listed on GitHub; bio says 'Previously working on deep learning training and inference systems', suggesting a recent transition. No explicit 'open to work' signals, but steady ROCm contributions through March 2026 suggest active employment at AMD or a close partner

    Lixun Zhang

    medium hireability
    • Active Triton contributor (59 commits on openai/triton + many merged PRs to ROCm/triton and ROCm/aiter): deep GPU kernel compiler work covering GEMM, mxfp4 matmul, flashattention optimization, and TDM descriptor lowering for AMD gfx1250/CDNA3/CDNA4
    • At AMD in Austin TX — strong ROCm/HIP GPU kernel background highly transferable to CUDA
    • Hireability: MEDIUM — currently employed at AMD with PR merged 4 days ago (Apr 23 2026); no explicit seeking signals, but AMD has had significant layoffs and tenure is unknown

    Nick Riasanovsky

    medium hireability
    • Active Triton contributor (31 commits to openai/triton, AMD GPU kernel optimizations — ping-pong scheduler, buffer ops, GEMM passes) and contributor to facebookexperimental/triton at Meta; works on MSLK (Meta Superintelligence Labs Kernels — CUDA/HIP GPU ops for GenAI)
    • No PhD
    • US location unconfirmed but Meta GPU kernel roles are US-based
    • Hireability: MEDIUM — ~15 months at Meta (joined ~early 2025 per first Triton commit Mar 2025), slightly past new-hire window; previously at Bodo AI (compiler engineering). Strong technical match for junior CUDA kernel role

    Pengzhan Zhao

    medium hireability
    • Active AMD Triton contributor (45 commits on openai/triton, recent merged PRs on ROCm/triton adding mxfp flash attention kernel for MI350, ongoing [AMD][gfx1250] GLUON kernel work on triton-lang)
    • Bio: 'working GPU compiler and kernels.' UCLA alum (~2022 CS), SF Bay Area
    • Hireability: MEDIUM — ~2-3 years at AMD in GPU kernel role, within transition window; no explicit open-to-work signals

    Zahi Moudallal

    medium hireability
    • Active Triton compiler contributor (101 commits, AMD backend + frontend work)
    • Works at OpenAI in SF; also merged MLIR/ROCDL patches into llvm/llvm-project
    • Directly relevant to GPU kernel engineering via the Triton framework
    • Hireability: MEDIUM — ~2-3 years at OpenAI (prestigious and high-comp, likely sticky), no explicit open-to-work signals, but squarely in the typical transition window

    Corbin Robeck

    low hireability
    • Active Triton compiler contributor (31+ commits on openai/triton) at Meta; built persistent matmul Triton templates for AMD GPUs in PyTorch Inductor (merged PRs April 2026), added MLIR/ROCDL FP8 conversion instructions for GFX950 in llvm-project, and created LLVM/MLIR instrumentation tooling for AMD GPU kernels (CRobeck/instrument-amdgpu-kernels, being productized by AMD)
    • Strong GPU kernel compiler background, US-based at Meta
    • Hireability: LOW — very recently active at Meta (PRs merged this week), no open-to-work signals detected

    Da Yan

    low hireability

Member of Technical Staff @ Anthropic

    Previously: Independent Contractor @ OpenAI

    New York, US

    • Deep GPU compiler specialist: 44 commits on openai/triton, authored `turingas` (NVIDIA Volta/Turing GPU assembler, 241 stars), CUDA kernel optimization work (Winograd convolutions)
• Bio: 'GPU performance optimizing, GPU compiler'
    • Hireability: LOW — currently MTS at Anthropic working on AI compute & compilers, no signals of transition (no LinkedIn changes, no website activity, no open-to-work signals)

    Maksim Levental

    low hireability
    • Active Triton compiler contributor (44+ commits) working on GPU pipelining, loop scheduling, and AMD GFX950/GFX942 backend optimizations in MLIR/TritonGPU
    • PhD UChicago 2020-2024, now at Apple in Cupertino working on accelerator architectures
    • Also contributes to LLVM/MLIR and IREE — deep CUDA/GPU compiler expertise bridging kernel optimization and hardware architecture
    • Location: Cupertino, CA (US)
    • Hireability: LOW — likely joined Apple ~2024 (recent PhD grad, <2 years in role), posted 'my team is hiring!' on LinkedIn Dec 2025 indicating stable employment and not personally looking

    neildhar

    low hireability
    • Active Triton compiler contributor with 200+ PRs on triton-lang/triton and 51 commits on openai/triton; works on core MLIR dialect internals (rematerialization cost heuristics, allowReorder reshapes, CatOp) and LLVM/build infrastructure
    • Previously at Meta (Hermes JS engine GC/JIT, through June 2025), now at OpenAI
    • Forks of llvm-project, pytorch, tritonbench confirm deep compiler stack focus
    • Seniority uncertain — work quality suggests 2-4 years FTE but no LinkedIn to confirm <5yr threshold
    • Hireability: LOW — likely joined OpenAI ~6-9 months ago (Meta Hermes activity ended June 2025, Triton activity began Jan 2026), still settling in

    pawelszczerbuk

    low hireability
    • Active Triton contributor (161 commits, multiple PRs merged April 2026)
    • Works on FPSan (floating-point sanitizer) and ConSan (concurrency sanitizer) for Triton GPU kernel compiler — includes wgmma (Hopper), mma.sync (Ampere), and MLIR loop pipelining in llvm-project
    • Commit metadata (hostname: codex-gb201-0.brix.pawelszczerbuk.svc.cluster.local, co-author: Codex <noreply@openai.com>) strongly suggests current OpenAI employee
    • Hireability: LOW — likely settled at OpenAI doing exactly this work, ~2 years into role; no open-to-work signals

    Runs

    #1 · completed · 0 qualified / 0 found · Apr 27, 11:53 AM