stage2-sonnet e2e #4 — junior CUDA/Triton GPU kernel engineers in the US

Completed · 8 qualified · 1 run · Apr 21, 3:33 PM

stage2-sonnet-e2e-4-junior-cudatriton-gpu-kernel-engineers-i

Parsed 3 topics · Junior · Engineer · United States

Generating seed nodes: 0 proposed · Explored 0 queries · 0/0 done

Expanding nodes: queued

Qualifying candidates: queued

Qualified Candidates (8)

Alex L Zhang

medium hireability

Researcher @ Sakana AI

Previously: Member of Technical Staff @ VantAI

Tokyo, JP

Core member of the GPU MODE leaderboard team at MIT CSAIL; wrote a Triton FlashAttention2 implementation (custom masks); contributor to gpu-mode/reference-kernels; work includes KernelBench (ICML 2025), on LLM-written GPU kernels, and KernelBot (CODEML @ ICML 2025, Spotlight), a platform for heterogeneous GPU code competitions
Princeton CS '24, now a PhD student at MIT, based in Cambridge, MA (US)
Hireability: MEDIUM. Just started PhD (~2024), but extremely active in the GPU industry ecosystem (running NVIDIA Blackwell and AMD $100k-$1M competitions); multiple position updates on personal site in early 2025 suggest active career motion

Hermann Kumbong

medium hireability

Research at Stanford Artificial Intelligence Laboratory (SAIL) @ Stanford CS PhD

Previously: Research @ NVIDIA

Stanford, US

Co-first author of FlashFFTConv (ICLR 2024, 38 citations), a GPU kernel paper that uses Tensor Cores and kernel fusion for FFT convolutions, reporting an 8.7x speedup over PyTorch
Research expertise includes high-performance computing
Stanford CS PhD student at SAIL (US)
Hireability: MEDIUM-HIGH. ~4-5 years into PhD (first ML papers in 2023); likely nearing graduation and entering the job market

Oliver Rausch

medium hireability

Member of Technical Staff @ Anthropic

Previously: Co-Founder @ Stealth Startup

San Francisco, US

DL compilers and scaling specialist at Anthropic (SF)
BS ETH Zurich (2021) and MS Oxford (2022); no PhD
Built daceml, an ML compiler that competed with PyTorch/TF/JAX on speed, and worked on ONNX Runtime architectural extensions at Microsoft Research
GitHub bio: 'DL compilers and scaling'
Directly matches the 'low level compilers' axis of the query
Likely under 3 years of full-time experience
Hireability: MEDIUM. ~2-3 years at Anthropic, within the typical transition window; no active job signals

Anne Ouyang

low hireability

Founder @ Standard Kernel

Previously: Deep Learning Engineer @ NVIDIA

San Francisco, US

Direct CUDA/GPU kernel expertise: wrote CUDA kernels on NVIDIA's cuDNN team, first author of KernelBench (GPU kernel benchmark, 23 citations), and founder of Standard Kernel (GPU kernel startup)
Stanford CS PhD student (on leave), based in SF
Hireability: LOW. Currently Founder at Standard Kernel and on leave from the PhD specifically to build the company; no signals of pivoting away

Carlo Baronio

low hireability

Research Engineer @ Cognition

Previously: Undergraduate Researcher @ Stanford University

San Francisco, US

Published 'Kevin: Multi-Turn RL for Generating CUDA Kernels' (2025, 5 citations) at Cognition AI; the work trains models to generate and optimize CUDA kernels, evaluated on correctness and speedup
Previously an undergraduate researcher at Stanford's Hazy Research (the FlashAttention/ThunderKittens lab)
BS Math, Stanford (class of 2027)
Hireability: LOW. ~12 months at Cognition (new hire), still an undergrad until 2027, not marked open to work on LinkedIn, no job-seeking signals

Jiaming Tang

low hireability

Ph.D. student @ MIT

Previously: Undergraduate researcher @ SJTU EPCC Lab

Boston, US

Strong CUDA/MLSys work at MIT Han Lab — Quest (ICML 2024, CUDA sparse attention kernels), AWQ (MLSys 2024 Best Paper, LLM quantization acceleration), OliVe (ISCA 2023, hardware-friendly quantization)
GitHub bio 'MLSys & Algo.'; pinned Quest repo (language: Cuda)
Based in Cambridge, MA
Hireability: LOW. Only a 2nd-year PhD student (started ~2024); recently added an RA position at Physical Intelligence; not on the job market yet

Pietro Marsella

low hireability

Research Intern @ Cognition

Previously: ML Compiler Researcher @ Stanford University

San Francisco, US

Published 'Kevin: Multi-Turn RL for Generating CUDA Kernels' (arXiv:2507.11948, ICML 2025 workshop): first-hand work on CUDA kernel generation and optimization using RL
Stanford Math undergrad (no PhD), SF-based
Hireability: LOW. Pipeline signals show a LinkedIn title change from Research Intern to Member of Technical Staff at Cognition (scraped Feb 2026), so likely under 6 months in the new role as of April 2026

Silas Alberti

low hireability

Founding Team @ Cognition

Previously: Student Researcher @ DeepMind

San Francisco, US

Co-authored 'Kevin: Multi-Turn RL for Generating CUDA Kernels' (2025, OpenReview), directly on-topic CUDA kernel generation research; Stanford AI PhD background
Based in SF, US
Hireability: LOW. Founding Team at Cognition (the startup behind Devin), a deep startup commitment that makes near-term availability unlikely

Runs

#1 · completed · 0 qualified / 0 found · Apr 21, 3:33 PM