
junior CUDA kernel engineers in the US

completed · 14 qualified · 1 run · Apr 27, 11:53 AM · junior-cuda-kernel-engineers-in-the-us
Parsed · 2 topics · Junior · Engineer · US
    1. Generating seed nodes — 0 proposed
    2. Exploring queries — 0/0 done
    3. Expanding nodes — queued
    4. Qualifying candidates — queued

    Qualified Candidates (14)


    Justin Lebar

    high hireability
    • Senior Triton contributor (131 commits on openai/triton), authored 'Linear Layouts From First Principles' (Jan 2026 blog post on GPU memory swizzling/tensor layouts)
    • GitHub bio: 'performance optimization, accelerators like GPUs'
    • No company listed on GitHub profile as of Apr 2026, resume updated March 2026 — appears actively job searching post-OpenAI
    • In SF, CA
    • Seniority note: likely staff/senior level (Google → Waymo → OpenAI, 10+ yrs), may exceed 'junior' target
    • Hireability: HIGH — no current employer, recent resume update is strong open-to-work signal

    Alexander Efimov

    medium hireability
    • Active AMD compiler engineer on ROCm/Triton backend — 86+ merged PRs implementing MFMA layouts, LDS optimization passes, LinearLayout conversions (AMD GPU kernel compiler work)
    • AMD work account alefimov-amd created April 2023 (~2 years FTE)
    • No PhD signals — engineering role
    • Location unconfirmed: no GitHub location set, but commit timing consistent with US Pacific hours; caution advised given Russian name and co-author connections
    • Hireability: MEDIUM — ~2 years at AMD, within typical transition window, no open-to-work signals

    Avinash Kumar

    medium hireability

PhD student @ Google (PhD, UC Irvine)

    Previously: Research Associate @ AMD

    Austin, US

• ECE PhD student at UT Austin; AMD PEAC Research Associate (Performant & Efficient AI Computing — LLM security/performance trade-offs); multiple prior NVIDIA roles (GPU Power Architect, Datacenter Architect Intern)
    • Co-authored GEMM kernel work during an internship at AMD
    • GPU architecture and hardware acceleration background is adjacent to CUDA kernel engineering; the database-systems papers appear to belong to a different namesake — LinkedIn/pipeline data is the reliable signal here
    • Hireability: MEDIUM — active PhD student recently named Amazon Fellow, currently engaged via AMD Research, likely 1-2 years from graduation

    Jeff Niu

    medium hireability
    • Core maintainer of triton-lang/triton (1 of 8), 271+ commits on openai/triton with daily active Gluon contributions targeting H100/GB300 Blackwell (attention, MoE kernels, MLIR/NVPTX backend)
    • Also contributes to llvm/llvm-project NVPTX backend
    • UWaterloo (Waterloop affiliation), likely BEng ~2022, no PhD
    • Commits in Pacific Time — US West Coast
    • Hireability: MEDIUM — ~3-4 years into career (within transition window), no open-to-work signals but tenure aligns with typical switching window

    Kyle Wang

    medium hireability
    • 61 commits on openai/triton; active contributor to ROCm/triton (MoE kernels, async_copy, pingpong, GFX1250 targets) and ROCm/iris (GEMM+ReduceScatter workgroup specialization)
    • Also merged LLVM VectorCombine PR (Aug 2025)
    • Based in Santa Clara, CA
• Hireability: MEDIUM — no company listed on GitHub; bio says 'Previously working on deep learning training and inference systems', suggesting a recent transition. No explicit 'open to work' signals, but steady ROCm contributions through March 2026 suggest active employment at AMD or a close partner

    Lixun Zhang

    medium hireability
    • Active Triton contributor (59 commits on openai/triton + many merged PRs to ROCm/triton and ROCm/aiter): deep GPU kernel compiler work covering GEMM, mxfp4 matmul, flashattention optimization, and TDM descriptor lowering for AMD gfx1250/CDNA3/CDNA4
    • At AMD in Austin TX — strong ROCm/HIP GPU kernel background highly transferable to CUDA
    • Hireability: MEDIUM — currently employed at AMD with PR merged 4 days ago (Apr 23 2026); no explicit seeking signals, but AMD has had significant layoffs and tenure is unknown

    Nick Riasanovsky

    medium hireability
    • Active Triton contributor (31 commits to openai/triton, AMD GPU kernel optimizations — ping-pong scheduler, buffer ops, GEMM passes) and contributor to facebookexperimental/triton at Meta; works on MSLK (Meta Superintelligence Labs Kernels — CUDA/HIP GPU ops for GenAI)
    • No PhD
    • US location unconfirmed but Meta GPU kernel roles are US-based
    • Hireability: MEDIUM — ~15 months at Meta (joined ~early 2025 per first Triton commit Mar 2025), slightly past new-hire window; previously at Bodo AI (compiler engineering). Strong technical match for junior CUDA kernel role

    Pengzhan Zhao

    medium hireability
    • Active AMD Triton contributor (45 commits on openai/triton, recent merged PRs on ROCm/triton adding mxfp flash attention kernel for MI350, ongoing [AMD][gfx1250] GLUON kernel work on triton-lang)
    • Bio: 'working GPU compiler and kernels.' UCLA alum (~2022 CS), SF Bay Area
    • Hireability: MEDIUM — ~2-3 years at AMD in GPU kernel role, within transition window; no explicit open-to-work signals

    Zahi Moudallal

    medium hireability
    • Active Triton compiler contributor (101 commits, AMD backend + frontend work)
    • Works at OpenAI in SF; also merged MLIR/ROCDL patches into llvm/llvm-project
    • Directly relevant to GPU kernel engineering via the Triton framework
    • Hireability: MEDIUM — ~2-3 years at OpenAI (prestigious and high-comp, likely sticky), no explicit open-to-work signals, but squarely in the typical transition window

    Corbin Robeck

    low hireability
    • Active Triton compiler contributor (31+ commits on openai/triton) at Meta; built persistent matmul Triton templates for AMD GPUs in PyTorch Inductor (merged PRs April 2026), added MLIR/ROCDL FP8 conversion instructions for GFX950 in llvm-project, and created LLVM/MLIR instrumentation tooling for AMD GPU kernels (CRobeck/instrument-amdgpu-kernels, being productized by AMD)
    • Strong GPU kernel compiler background, US-based at Meta
    • Hireability: LOW — very recently active at Meta (PRs merged this week), no open-to-work signals detected

    Da Yan

    low hireability

Member of Technical Staff @ Anthropic

    Previously: Independent Contractor @ OpenAI

    New York, US

    • Deep GPU compiler specialist: 44 commits on openai/triton, authored `turingas` (NVIDIA Volta/Turing GPU assembler, 241 stars), CUDA kernel optimization work (Winograd convolutions)
• Bio: 'GPU performance optimizing, GPU compiler'
    • Hireability: LOW — currently MTS at Anthropic working on AI compute & compilers, no signals of transition (no LinkedIn changes, no website activity, no open-to-work signals)

    Maksim Levental

    low hireability
    • Active Triton compiler contributor (44+ commits) working on GPU pipelining, loop scheduling, and AMD GFX950/GFX942 backend optimizations in MLIR/TritonGPU
    • PhD UChicago 2020-2024, now at Apple in Cupertino working on accelerator architectures
    • Also contributes to LLVM/MLIR and IREE — deep CUDA/GPU compiler expertise bridging kernel optimization and hardware architecture
    • Location: Cupertino, CA (US)
    • Hireability: LOW — likely joined Apple ~2024 (recent PhD grad, <2 years in role), posted 'my team is hiring!' on LinkedIn Dec 2025 indicating stable employment and not personally looking

    neildhar

    low hireability
    • Active Triton compiler contributor with 200+ PRs on triton-lang/triton and 51 commits on openai/triton; works on core MLIR dialect internals (rematerialization cost heuristics, allowReorder reshapes, CatOp) and LLVM/build infrastructure
    • Previously at Meta (Hermes JS engine GC/JIT, through June 2025), now at OpenAI
    • Forks of llvm-project, pytorch, tritonbench confirm deep compiler stack focus
    • Seniority uncertain — work quality suggests 2-4 years FTE but no LinkedIn to confirm <5yr threshold
    • Hireability: LOW — likely joined OpenAI ~6-9 months ago (Meta Hermes activity ended June 2025, Triton activity began Jan 2026), still settling in

    pawelszczerbuk

    low hireability
    • Active Triton contributor (161 commits, multiple PRs merged April 2026)
    • Works on FPSan (floating-point sanitizer) and ConSan (concurrency sanitizer) for Triton GPU kernel compiler — includes wgmma (Hopper), mma.sync (Ampere), and MLIR loop pipelining in llvm-project
    • Commit metadata (hostname: codex-gb201-0.brix.pawelszczerbuk.svc.cluster.local, co-author: Codex <noreply@openai.com>) strongly suggests current OpenAI employee
    • Hireability: LOW — likely settled at OpenAI doing exactly this work, ~2 years into role; no open-to-work signals

    Runs

    #1 · completed · 0 qualified / 0 found · Apr 27, 11:53 AM