Junior GPU kernel engineers in the US with CUDA/Triton experience

completed · 3 qualified · 1 run · Apr 21, 8:56 PM · junior-gpu-kernel-engineers-in-the-us-with-cudatriton-experi-1776804972

Parsed: 3 topics · Junior · Engineer · US

Qualified Candidates (3)

medium hireability

CS PhD Student @ Snowflake AI Research

Previously: Research Scientist Intern @ Snowflake

Strong GPU kernel match: owns a CUDA kernel repo (a fused softmax-argmax kernel, original work rather than a fork), co-authored the Korch paper on optimal kernel orchestration for tensor programs (ASPLOS 2024), forked the FlashInfer CUDA kernel library, and has multiple CUDA stream/kernel PRs in FlexFlow
Research focus: ML systems, parallel computing, and GPU kernel optimization at the CMU Catalyst lab
US-based (Pittsburgh, PA)
Hireability: MEDIUM. 4th-year CMU PhD student with expected graduation in 2027 (~1.5 years out), currently a Research Intern at Snowflake AI Research; the internship signals industry interest, but the candidate is not yet in final-push job-market mode

medium hireability

Ph.D. student @ Carnegie Mellon University

Previously: MS student @ Carnegie Mellon University

Pittsburgh, US

PhD student at CMU Catalyst (advised by Zhihao Jia) building LLM inference systems
Pinned FlashInfer (CUDA kernel library for LLM serving) on GitHub — direct hands-on CUDA kernel work
OSDI 2025 paper on Mirage, a tensor program superoptimizer (GPU kernel optimization), plus papers at ASPLOS 2024 (SpecInfer) and ICLR 2025 (TidalDecode)
Research expertise: ML Systems
Hireability: MEDIUM — advanced PhD student with strong publication record (ASPLOS/ICLR/ICML/OSDI), website says 'open to collaboration'; LinkedIn profile went private in Jan 2026 (ambiguous signal, possibly nearing graduation)

medium hireability

Ph.D. student @ Carnegie Mellon University

Previously: Research Intern @ Meta

New York, US

LLM inference systems researcher at CMU (Beidi Chen and Zhihao Jia labs)
Core papers: SpecInfer (381 citations), MagicPIG LSH attention (46 citations), Sequoia speculative decoding (69 citations)
Deep GPU memory systems expertise; forks of flash-linear-attention and flex-block-attn suggest Triton familiarity, but public repos are Python-only, with no explicit CUDA/Triton kernel code
Adjacent to kernel engineering rather than a direct kernel author
Based in US
Hireability: MEDIUM. 3rd-year PhD student at CMU (started 2023), not yet in the final-year transition window; CV updated 67 days ago; active, with a Meta FAIR internship in 2025

#1 · completed · 0 qualified / 0 found · Apr 21, 8:56 PM