stage2-sonnet e2e #4 — junior CUDA/Triton GPU kernel engineers in the US

completed · 8 qualified · 1 run · Apr 21, 3:33 PM · stage2-sonnet-e2e-4-junior-cudatriton-gpu-kernel-engineers-i
Parsed: 3 topics · Junior · Engineer · United States
Generating seed nodes: 0 proposed · explored 0 queries · 0/0 done
Expanding nodes: queued
Qualifying candidates: queued

    Qualified Candidates (8)

    Alex L Zhang

    medium hireability

    Researcher@Sakana AI

    Previously: Member of Technical Staff @ VantAI

    Tokyo, JP

    • Core GPU MODE leaderboard team member at MIT CSAIL; Triton FlashAttention2 implementation with custom masks; contributor to gpu-mode/reference-kernels, KernelBench (ICML 2025, on LLM-written GPU kernels), and KernelBot (CODEML @ ICML 2025 Spotlight, a heterogeneous GPU code competition platform)
    • Princeton CS '24, now PhD student at MIT; in US (Cambridge, MA)
    • Hireability: MEDIUM — just started PhD (~2024), but extremely active in GPU industry ecosystem (running NVIDIA Blackwell + AMD $100k-$1M competitions), with multiple position_updates on personal site in early 2025 suggesting active career motion

    Hermann Kumbong

    medium hireability

    Research at Stanford Artificial Intelligence Laboratory (SAIL)@Stanford CS PhD

    Previously: Research @ NVIDIA

    Stanford, US

    • Co-first author of FlashFFTConv (ICLR 2024, 38 citations) — GPU kernel paper using Tensor Cores + kernel fusion for FFT convolutions, achieving 8.7x speedup over PyTorch
    • Research expertise includes High Performance Computing
    • Stanford CS PhD at SAIL in US
    • Hireability: MEDIUM-HIGH — ~4-5 years into PhD (first ML papers 2023), likely nearing graduation and entering job market

    Oliver Rausch

    medium hireability

    Member of Technical Staff@Anthropic

    Previously: Co-Founder @ Stealth Startup

    San Francisco, US

    • DL compilers and scaling specialist at Anthropic (SF)
    • BS ETH Zurich 2021 + MS Oxford 2022, no PhD
    • Built daceml (an ML compiler described as the fastest, competing with PyTorch/TF/JAX); worked on ONNX Runtime architectural extensions at Microsoft Research
    • GitHub bio: 'DL compilers and scaling'
    • Directly matches 'low level compilers' axis of the query
    • Likely <3 years FTE
    • Hireability: MEDIUM — ~2-3 years at Anthropic, within transition window, no active job signals

    Anne Ouyang

    low hireability

    Founder@Standard Kernel

    Previously: Deep Learning Engineer @ NVIDIA

    San Francisco, US

    • Direct CUDA/GPU kernel expertise — wrote CUDA kernels on NVIDIA cuDNN team, first-author on KernelBench (GPU kernel benchmark, 23 citations), founded Standard Kernel (GPU kernel startup)
    • Stanford CS PhD student (currently on leave), based in SF
    • Hireability: LOW — currently Founder at Standard Kernel, on leave from PhD specifically to build this company; no signals of pivoting away

    Carlo Baronio

    low hireability

    Research Engineer@Cognition

    Previously: Undergraduate Researcher @ Stanford University

    San Francisco, US

    • Published 'Kevin: Multi-Turn RL for Generating CUDA Kernels' (2025, 5 citations) at Cognition AI — trains models to generate and optimize CUDA kernels with correctness + speedup evals
    • Previously Undergraduate Researcher at Hazy Research (Flash Attention/ThunderKittens lab) at Stanford
    • BS Math Stanford (class of 2027)
    • Hireability: LOW — ~12 months at Cognition (new hire), still an undergrad until 2027, open_to_work=false on LinkedIn, no job-seeking signals

    Jiaming Tang

    low hireability

    Ph.D. student@MIT

    Previously: Undergraduate researcher @ SJTU EPCC Lab

    Boston, US

    • Strong CUDA/MLSys work at MIT Han Lab — Quest (ICML 2024, CUDA sparse attention kernels), AWQ (MLSys 2024 Best Paper, LLM quantization acceleration), OliVe (ISCA 2023, hardware-friendly quantization)
    • GitHub bio 'MLSys & Algo.' and pinned Quest repo (Cuda language)
    • Based in Cambridge, MA
    • Hireability: LOW — only 2nd-year PhD student (started ~2024), recently added RA position at Physical Intelligence; not on job market yet

    Pietro Marsella

    low hireability

    Research Intern@Cognition

    Previously: ML Compiler Researcher @ Stanford University

    San Francisco, US

    • Published 'Kevin: Multi-Turn RL for Generating CUDA Kernels' (arXiv:2507.11948, ICML 2025 workshop) — directly authored work on CUDA kernel generation and optimization using RL
    • Stanford Math undergrad (no PhD), SF-based
    • Hireability: LOW — pipeline signals show LinkedIn title change from Research Intern to Member of Technical Staff at Cognition (scraped Feb 2026), likely <6 months in new role as of April 2026

    Silas Alberti

    low hireability

    Founding Team@Cognition

    Previously: Student Researcher @ DeepMind

    San Francisco, US

    • Co-authored 'Kevin: Multi-Turn RL for Generating CUDA Kernels' (2025, OpenReview) — directly on-topic CUDA kernel generation research from a Stanford AI PhD
    • Based in SF, US
    • Hireability: LOW — Founding Team at Cognition (Devin AI startup), a deep startup commitment that makes near-term availability unlikely

    Runs

    #1 · completed · 0 qualified / 0 found · Apr 21, 3:33 PM