
Junior GPU kernel engineers in the US with CUDA/Triton experience

Completed · 7 qualified · 1 run · Apr 21, 5:05 PM · junior-gpu-kernel-engineers-in-the-us-with-cudatriton-experi-1776791114
Parsed 3 topics: Junior · Engineer · United States

    Qualified Candidates (7)

    Aditya Tomar

    medium hireability

    Undergraduate Student @ UC Berkeley

    Previously: Researcher @ PSSG

    US

    • 3rd-year EECS undergrad at UC Berkeley doing GPU systems research at BAIR under Kurt Keutzer — QuantSpec (ICML 2025) and XQuant for LLM inference optimization, SC 2024 Gordon Bell Prize Finalist for scalable GPU supercomputer training
    • Core work is GPU systems/HPC rather than explicit CUDA/Triton kernel writing, but strong adjacent GPU exposure and upcoming NVIDIA Applied DL Research internship (May–Aug 2026)
    • Hireability: MEDIUM — still undergrad (graduating ~2026/2027), committed to NVIDIA internship through Aug 2026; best window is summer 2027 internship or post-graduation full-time

    Daiyaan Arfeen

    medium hireability

    PhD student @ Carnegie Mellon University

    Previously: Deep Learning Architecture Intern @ NVIDIA

    San Francisco, US

    • GPU systems PhD at CMU PDL — papers on GPU utilization (PipeFill, MLSys 2025), LLM serving acceleration (SpecInfer, ASPLOS 2024, 386 citations), and ML cluster scheduling (Sia, SOSP 2023)
    • Work is GPU systems-level rather than explicit CUDA/Triton kernel writing, but demonstrates deep GPU architecture knowledge
    • US-based
    • Hireability: MEDIUM — long publication record spanning 2018-2025 (likely 5-7+ years into the PhD), possibly nearing graduation, but no active job-search signals from pipeline or website

    Gabriele Oliaro

    medium hireability

    CS PhD Student @ Snowflake AI Research

    Previously: Research Scientist Intern @ Snowflake

    • ML systems PhD at CMU with GPU kernel relevance — Korch paper (ASPLOS '24) on optimal kernel orchestration for tensor programs, and FlexFlow C++/CUDA distributed training framework (contributor)
    • Co-authored SpecInfer (415 citations) for LLM inference acceleration on GPUs; h-index 10
    • Based in Pittsburgh, PA / Snowflake internship in San Mateo, CA (US)
    • Hireability: MEDIUM — 4th year PhD, expected graduation 2027, currently on Snowflake AI Research internship (~2 years); approaching final stretch but not yet in the prime transition window

    Haocheng Xi

    medium hireability

    MLSys Researcher @ University of California, Berkeley

    Previously: Research Intern @ Nvidia

    Berkeley, US

    • Strong GPU kernel engineer — CUDA repos pinned (how-to-optim-algorithm-in-cuda, cuda-tensorcore-hgemm), multiple NVIDIA internships on FP8/INT8 training (COAT at ICLR 2025), PhD at UC Berkeley on ML sys/Efficient ML
    • Based in Berkeley, CA
    • Hireability: MEDIUM — 2nd-year PhD student (started 2024), very active on GitHub (commit 2026-04-20), two NVIDIA internships show industry engagement, but likely 2+ years from full-time market; strong intern candidate now

    Hiva Mohammadzadeh

    low hireability

    Machine Learning Engineer | Data Scientist @ IntuigenceAI

    Previously: Machine Learning Engineer @ Algoverse

    San Francisco, US

    • Co-authored KVQuant (NeurIPS 2024, 322 citations) which developed custom CUDA kernels for KV cache quantization achieving ~1.7x speedups on LLaMA-7B; also co-authored Squeezed Attention and SPEED papers on LLM inference acceleration
    • BS EECS UC Berkeley, currently enrolled in Stanford MSCS AI/Systems (2025-2027) while working as ML Engineer at IntuigenceAI in SF
    • Hireability: LOW — only 6-8 months into a 2-year Stanford MS program (graduates 2027); no open-to-work signals; simultaneously working part-time at IntuigenceAI

    Xiuyu Li

    low hireability

    PhD candidate @ Berkeley AI Research (BAIR) at UC Berkeley

    Previously: Research Consultant @ Together AI

    San Francisco, US

    • Strong CUDA/GPU kernel background via TorchSparse and TorchSparse++ (sparse convolution on GPUs, 144+67 citations, MICRO'23/MLSys'22)
    • PhD from UC Berkeley BAIR, now MTS at xAI focused on coding RL/infra — more senior than 'junior' label but has direct GPU kernel CUDA experience
    • US (Bay Area) ✓
    • Hireability: LOW — pipeline signals show she just transitioned from Research Consultant at Together AI → MTS at xAI (scraped 2026-02-05), ~2.5 months into new role

    Zhihao Zhang

    low hireability

    Ph.D. student @ Carnegie Mellon University

    Previously: MS student @ Carnegie Mellon University

    Pittsburgh, US

    • Active CUDA kernel contributor to mirage-project/mirage: implemented Blackwell (SM100) linear kernels with TMA+Epilogue pipelines, MoE kernels with expert-balanced GeMM, and PTX sync optimizations (Oct-Dec 2025)
    • Also authored CUDA kernel impl for TidalDecode sparse attention paper (2024)
    • PhD at CMU Catalyst under Zhihao Jia, Pittsburgh US
    • Hireability: LOW — appears to have just joined Lithos AI startup (lithos-ai/motus PRs merged April 11-21 2026, <2 weeks ago), very likely a new hire

    Runs

    #1 · completed · 0 qualified / 0 found · Apr 21, 5:05 PM