Senior Triton contributor (131 commits on openai/triton), authored 'Linear Layouts From First Principles' (Jan 2026 blog post on GPU memory swizzling/tensor layouts)
GitHub bio: 'performance optimization, accelerators like GPUs'
No company listed on GitHub profile as of Apr 2026, resume updated March 2026 — appears actively job searching post-OpenAI
AMD work account alefimov-amd created April 2023 (~2 years FTE)
No PhD signals — engineering role
Location unconfirmed: no GitHub location set, but commit timing consistent with US Pacific hours; caution advised given Russian name and co-author connections
Hireability: MEDIUM — ~2 years at AMD, within typical transition window, no open-to-work signals
AK
Avinash Kumar
medium hireability
ECE PhD student @ UT Austin
Currently: Research Associate @ AMD (PEAC)
Austin, US
ECE PhD student at UT Austin, AMD PEAC Research Associate (Performant & Efficient AI Computing — LLM security/performance trade-offs) and multiple prior NVIDIA roles (GPU Power Architect, Datacenter Architect Intern)
Co-authored GEMM kernels work as an intern at AMD
GPU architecture and hardware acceleration background is adjacent to CUDA kernel engineering; the papers under this name (all database systems) appear to belong to a different namesake — LinkedIn/pipeline data is the reliable signal here
Hireability: MEDIUM — active PhD student recently named Amazon Fellow, currently engaged via AMD Research, likely 1-2 years from graduation
JN
Jeff Niu
medium hireability
Core maintainer of triton-lang/triton (1 of 8), 271+ commits on openai/triton with daily active Gluon contributions targeting H100 (Hopper) and GB300 (Blackwell): attention and MoE kernels, MLIR/NVPTX backend
Also contributes to llvm/llvm-project NVPTX backend
UWaterloo (Waterloop affiliation), likely BEng ~2022, no PhD
Commits in Pacific Time — US West Coast
Hireability: MEDIUM — ~3-4 years into career, no open-to-work signals, but tenure aligns with the typical switching window
KW
Kyle Wang
medium hireability
61 commits on openai/triton; active contributor to ROCm/triton (MoE kernels, async_copy, pingpong, GFX1250 targets) and ROCm/iris (GEMM+ReduceScatter workgroup specialization)
Also merged LLVM VectorCombine PR (Aug 2025)
Based in Santa Clara, CA
Hireability: MEDIUM — no company listed on GitHub; bio says 'Previously working on deep learning training and inference systems', suggesting a recent transition. No explicit 'open to work' signals, but steady ROCm contributions through March 2026 suggest current employment at AMD or a close partner
LZ
Lixun Zhang
medium hireability
Active Triton contributor (59 commits on openai/triton + many merged PRs to ROCm/triton and ROCm/aiter): deep GPU kernel compiler work covering GEMM, mxfp4 matmul, FlashAttention optimization, and TDM descriptor lowering for AMD gfx1250/CDNA3/CDNA4
At AMD in Austin TX — strong ROCm/HIP GPU kernel background highly transferable to CUDA
Hireability: MEDIUM — currently employed at AMD with PR merged 4 days ago (Apr 23 2026); no explicit seeking signals, but AMD has had significant layoffs and tenure is unknown
NR
Nick Riasanovsky
medium hireability
Active Triton contributor (31 commits to openai/triton, AMD GPU kernel optimizations — ping-pong scheduler, buffer ops, GEMM passes) and contributor to facebookexperimental/triton at Meta; works on MSLK (Meta Superintelligence Labs Kernels — CUDA/HIP GPU ops for GenAI)
No PhD
US location unconfirmed but Meta GPU kernel roles are US-based
Hireability: MEDIUM — ~15 months at Meta (joined ~early 2025 per first Triton commit Mar 2025), slightly past the new-hire window; previously at Bodo AI (compiler engineering). Strong technical match for a junior CUDA kernel role
PZ
Pengzhan Zhao
medium hireability
Active AMD Triton contributor (45 commits on openai/triton; recent merged PRs on ROCm/triton adding an mxfp flash attention kernel for MI350, ongoing [AMD][gfx1250] Gluon kernel work on triton-lang)
Bio: 'working GPU compiler and kernels.' UCLA alum (~2022 CS), SF Bay Area
Hireability: MEDIUM — ~2-3 years at AMD in GPU kernel role, within transition window; no explicit open-to-work signals
Works at OpenAI in SF; also merged MLIR/ROCDL patches into llvm/llvm-project
Directly relevant to GPU kernel engineering via the Triton framework
Hireability: MEDIUM — ~2-3 years at OpenAI (prestigious and high-comp, likely sticky), no explicit open-to-work signals, but squarely in the typical transition window
CR
Corbin Robeck
low hireability
Active Triton compiler contributor (31+ commits on openai/triton) at Meta; built persistent matmul Triton templates for AMD GPUs in PyTorch Inductor (merged PRs April 2026), added MLIR/ROCDL FP8 conversion instructions for GFX950 in llvm-project, and created LLVM/MLIR instrumentation tooling for AMD GPU kernels (CRobeck/instrument-amdgpu-kernels, being productized by AMD)
Strong GPU kernel compiler background, US-based at Meta
Hireability: LOW — very recently active at Meta (PRs merged this week), no open-to-work signals detected
DY
Da Yan
low hireability
Member Of Technical Staff@Anthropic
Previously: Independent Contractor @ OpenAI
New York, US
Deep GPU compiler specialist: 44 commits on openai/triton, authored `turingas` (NVIDIA Volta/Turing GPU assembler, 241 stars), CUDA kernel optimization work (Winograd convolutions)
Bio: 'GPU performance optimizing, GPU compiler'
Hireability: LOW — currently MTS at Anthropic working on AI compute & compilers, no signals of transition (no LinkedIn changes, no website activity, no open-to-work signals)
ML
Maksim Levental
low hireability
Active Triton compiler contributor (44+ commits) working on GPU pipelining, loop scheduling, and AMD GFX950/GFX942 backend optimizations in MLIR/TritonGPU
PhD UChicago 2020-2024, now at Apple in Cupertino working on accelerator architectures
Also contributes to LLVM/MLIR and IREE — deep CUDA/GPU compiler expertise bridging kernel optimization and hardware architecture
Location: Cupertino, CA (US)
Hireability: LOW — likely joined Apple ~2024 (recent PhD grad, <2 years in role), posted 'my team is hiring!' on LinkedIn Dec 2025 indicating stable employment and not personally looking
NE
neildhar
low hireability
Active Triton compiler contributor with 200+ PRs on triton-lang/triton and 51 commits on openai/triton; works on core MLIR dialect internals (rematerialization cost heuristics, allowReorder reshapes, CatOp) and LLVM/build infrastructure
Previously at Meta (Hermes JS engine GC/JIT, through June 2025), now at OpenAI
Forks of llvm-project, pytorch, tritonbench confirm deep compiler stack focus
Seniority uncertain — work quality suggests 2-4 years FTE but no LinkedIn to confirm <5yr threshold
Hireability: LOW — likely joined OpenAI ~6-9 months ago (Meta Hermes activity ended June 2025, Triton activity began Jan 2026), still settling in
PA
pawelszczerbuk
low hireability
Active Triton contributor (161 commits, multiple PRs merged April 2026)
Works on FPSan (floating-point sanitizer) and ConSan (concurrency sanitizer) for Triton GPU kernel compiler — includes wgmma (Hopper), mma.sync (Ampere), and MLIR loop pipelining in llvm-project