v2 live test — junior CUDA Triton engineers US

completed · 51 qualified · 1 run · Apr 20, 4:06 PM · v2-live-test-junior-cuda-triton-engineers-us

Parsed: 2 topics · Junior · Engineer · US

Qualified Candidates (51)

Aniruddha Nrusimha

high hireability

PhD candidate@MIT

Previously: Undergrad student @ University of California Berkeley

Boston, US

PhD candidate at MIT (Boston, US), h_index=9, efficient deep learning and quantization
GitHub: qat-pretrain repo in CUDA (quantization-aware training at kernel level), flashformer paper (FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference — direct CUDA kernel writing)
Email: anin@mit.edu
Strong hands-on CUDA kernel work combined with quantization research
Hireability: HIGH — MIT PhD candidate (junior), CUDA kernel author (FlashFormer), active commits as of 2024

Coleman Hooper

high hireability

Graduate Student - ML Systems@University of California, Berkeley

Previously: Research Intern @ NVIDIA

San Francisco, US

Graduate Student at UC Berkeley ML Systems (San Francisco, US), h_index=12
Papers focused on LLM inference optimization: speculative decoding (SPEED, QuantSpec), KV cache reduction, quantization
ML Systems grad student = clearly junior (0-4 years)
Work is at the GPU/systems level for LLM inference
Hireability: HIGH — actively publishing in ML Systems/efficient inference, strong Berkeley pedigree, likely graduating soon

Corbin Robeck

high hireability

Active Triton contributor at Meta (31 commits)
Very deep Triton + PyTorch Inductor work: AMD GPU kernel templates (addmm, persistent matmul), MLIR compiler instrumentation of AMD GPU kernels (own repo: instrument-amdgpu-kernels), Triton extension ecosystem (triton-ext), Triton-distributed, TileGym tutorials
116 total PRs; GitHub account dates to 2015, so career trajectory is not clear from public info
Scope of work (Triton compiler internals, release management) suggests a strong mid-level contributor
US-located (Meta)
Not obviously senior/principal
Strong Triton + GPU kernel fit

Genghan Zhang

high hireability

PhD student@Stanford University (Computer Science)

Previously: Intern @ NVIDIA

PhD student at Stanford, GPU kernel optimization specialist (CUDA + domain-specific languages). h_index 7
Pinned repos include dgSPARSE-Lib (CUDA), AccelOpt (LLM kernel optimization agents), ICML 2025 paper on ML library development
Active GitHub commits as of April 2026
AWS intern experience
Strong GPU kernel background, early-career PhD student = junior tier
US-based (Stanford, CA)

Haocheng Xi

high hireability

MLsys Researcher@University of California, Berkeley

Previously: Research Intern @ NVIDIA

Berkeley, US

PhD student at UC Berkeley (ML Systems / Efficient ML)
Pinned repos include CUDA TensorCore HGEMM and 'how-to-optim-algorithm-in-cuda'
Active CUDA/ML systems work: INT4 training, speculative decoding (TriForce at COLM 2024). h_index 7
Berkeley, CA
Early-stage researcher, clearly junior

Kyle Wang

high hireability

Active Triton contributor (61 commits) at AMD, Santa Clara CA
Recent PRs to triton-lang/triton: scale swizzling for GFX1250, MoE Gluon kernel, predicate support in TDM Gather
Focuses on Triton/MLIR/LLVM for AMD GPU backends
Prior DL training/inference background
Appears mid-level (not senior/principal), US location confirmed

Mingyuan MA

high hireability

Software Engineer, LLM Inference Workload Performance@NVIDIA

Previously: Research Collaborator @ sgl-project

San Francisco, US

Recent Berkeley grad (email: mamingyuan2001@berkeley.edu suggests ~2001 birth, graduated ~2023), now SW Engineer at NVIDIA doing LLM Inference Workload Performance in San Francisco. h_index=5
Active research: kvcached (virtualized KV cache), multi-LLM serving (PRISM)
Website updated Jan 2026 with new papers
Junior profile — recently graduated and in first industry role
US (SF)
Strong GPU/inference systems focus

Muyang Li

high hireability

Doctoral Student@Massachusetts Institute of Technology

Previously: Research Intern @ NVIDIA

Boston, US

PhD student at MIT (Cambridge/Boston, US), h_index=10, MIT Han Lab
Research on efficient deep learning and generative models: SVDQuant (4-bit diffusion, ICLR 2025 Spotlight), Sparse VideoGen (accelerating video diffusion with spatial-temporal sparsity), DistriFusion (distributed diffusion inference, CVPR 2024 Highlight)
Work involves low-level GPU optimization for diffusion models
Twitter: @lmxyy1999
Hireability: HIGH — MIT PhD student (junior), Han Lab pedigree (Song Han group), ICLR/CVPR publications in GPU-efficient inference

Samuel Ginzburg

high hireability

PhD Princeton 2024 (new grad)
Google GPU/ML Perf Engineering on Hopper/Blackwell for Gemini
Prior: Research Scientist at Meta (ML Compilers, PyTorch/Triton)
Triton PRs: AMD 2:4 structured sparsity (large, 2368 lines), DotOpInterface refactor, gluon AMD compilation fixes
Active GPU kernel/compiler work, US-based, early career post-PhD

Sijia Chen

high hireability

Researcher@OpenAI

Previously: Researcher @ Meta

Researcher at OpenAI (previously Meta), based in Sunnyvale CA. h_index=3 — junior researcher level
Expertise squarely on target: GPU Kernel Performance, LLMs, Inference Optimization, Attention Optimization
Published 'Fast and Simplex: 2-Simplicial Attention in Triton' (ICLR 2026 submission) and ParetoQ on low-bit LLM quantization (NeurIPS 2025)
CMU background
Strong Triton + CUDA kernel focus

Connor Holmes

medium hireability

Researcher@OpenAI

Previously: Researcher @ Microsoft

San Francisco, US

Researcher @ OpenAI SF (US), h_index=14, expertise in GPGPU and deep learning
Direct Triton contributor (person node)
GitHub: grnn repo in CUDA, Megatron-LM training work
Pinned CUDA repo confirms hands-on kernel work. 'Researcher' (not Senior Researcher/Staff) suggests 2-5 years post-grad
Hireability: MEDIUM — currently at OpenAI which pays well; GPGPU + Triton contributor is strong fit

Da Yan

medium hireability

Member Of Technical Staff@Anthropic

Previously: Independent Contractor @ OpenAI

New York, US

MTS at Anthropic (New York, US), h_index=10
Triton contributor with 44 commits (direct person node)
Bio: 'AI compute & compilers.' GitHub: turingas (NVIDIA Volta/Turing GPU assembler), CUDA-Winograd (Fast CUDA Kernels for ResNet Inference), gas (C++)
Extremely strong hands-on CUDA/Triton kernel and GPU assembly work. h_index=10 and MTS title suggest 3-6 years post-grad — borderline but qualifies as early-career
Hireability: MEDIUM — at Anthropic (top-tier employer), but CUDA assembler work is exceptionally relevant

Jagrit Digani

medium hireability

Machine Learning Engineer@Apple

Previously: Undergraduate Researcher: Optimization and AI @ Davoyan Research Group

San Francisco, US

ML Engineer at Apple (San Francisco, US)
GPU Programming expertise, h_index 5
Active contributor to Apple MLX framework — merged PRs include NAX refactor, M5 Pro/Max kernel tuning, attention mask fix
Works on low-level kernel optimization for Apple Silicon
Appears early-to-mid career (MLE title, not senior/staff)

Maksim Levental

medium hireability

PhD student at University of Chicago (multi-year), currently at Apple working on MLIR/compiler/accelerator architectures. 44 Triton commits; pinned repos include triton-lang/triton, iree-org/iree, llvm/eudsl, mlir-python-extras
Blog active with MLIR content since 2022
US (Cupertino, CA)
Grad student profile fits junior/early-career criteria despite Apple affiliation
Deep low-level GPU/compiler knowledge

Michael Melesse

medium hireability

AMD engineer in New York, self-describes as 'engineer working on ML kernels, mostly in Triton.' 244 PRs, GitHub since Dec 2015 — roughly 8-9 years of industry experience
Actively contributing to ROCm/aiter flash attention Triton backend
US location and strong Triton kernel skills, but seniority exceeds junior threshold

Mit Kotak

medium hireability

Research Assistant@Massachusetts Institute of Technology

Previously: Scientific Software Research Intern @ University of Illinois Urbana-Champaign

Boston, US

MIT PhD student (Research Assistant at MIT, OpenReview confirms MIT PhD program)
Expertise: equivariance, geometric deep learning, hardware, kernels
GitHub pinned: pycuda, cudagraph-thesis, fast_flops (GPU benchmarking), arraycontext. 126 PRs mostly on e3nn and atomicarchitects/nequix. h_index=4
US (Boston)
Active CUDA/GPU kernel interest (cudagraph thesis, pycuda fork)
Early-career PhD student fits junior profile

Pengzhan Zhao

medium hireability

Active Triton AMD backend contributor (45 commits, 52 PRs)
Recent GPU kernel work on MQA flash attention, MXFP precision FA, GLUON scaled_dot fixes — concrete AMD GPU kernel engineering at AMD in SF Bay Area
US-based
No seniority red flags found; contribution level and scope suggest mid/early career

Prajwal Singhania

medium hireability

Graduate Assistant@University of Maryland

Previously: Research Intern @ Microsoft

College Park, US

PhD student @ UMD (grad student explicitly OK per search criteria)
IIT Kharagpur 2020 dual degree + 3 yrs industry before PhD
HPC/GPU focus: CUDA CNN inference repo, NVSHMEM GPU collective comms, SC24 Gordon Bell finalist paper on scalable LLM training
Expertise in Systems for ML and HPC
US (College Park, MD). h_index=6

Qinghao Hu

medium hireability

Postdoctoral Researcher@MIT

Previously: Research Assistant Professor @ Nanyang Technological University

Boston, US

Postdoc @ MIT (Boston, US) working on ML Systems
Strong Triton signal: pinned repo is Liger-Kernel (Efficient Triton Kernels for LLM Training)
Papers include LServe, DeltaZip, efficient LLM serving/systems work. h_index=14, postdoc level = early-career researcher
Hireability: MEDIUM — postdoc at MIT is strong signal of talent; actively building Triton kernels for LLM training

Runyu Lu

medium hireability

PhD student@University of Michigan

Previously: Huazhong University of Science and Technology

Ann Arbor, US

PhD student @ UMich (SymbioticLab), HUST CS BS 2020
ML Systems focus with GPU/HPC work: TetriServe (ASPLOS'26), flex attention kernel impl, ColossalAI, vLLM contributions
Ann Arbor MI, US. h_index=2, early career
Active researcher publishing at top venues

Shawn Zhong

medium hireability

5th-year PhD student at UW-Madison (ADSL systems group)
Madison, WI — US-based
Deep Triton compiler contributions: Proton profiler (global timestamp/cross-CTA timeline), AMD backend build fixes, frontend float argument passing bug fix — substantive compiler-level work, not just docs. 9 commits across Triton
Early-career systems researcher with GPU tooling expertise

Hanshi Sun

low hireability

Research Scientist@ByteDance

Previously: Teaching Assistant @ Carnegie Mellon University

Bellevue, US

Research Scientist at ByteDance (Bellevue, US), MS CMU
Contributor to Triton-distributed (ByteDance's distributed Triton compiler) and ShadowKV (ICML 2025 Spotlight — KV cache inference)
Also co-authored TriForce (COLM 2024, speculative decoding)
Strong MLSys focus with direct Triton involvement. ~1yr at ByteDance as Research Scientist; likely early-career post-MS
Hireability: LOW — 11 months into current role, not actively job searching

Yilong Zhao

low hireability

Ph.D. student@University of California, Berkeley

Previously: Research Intern @ ByteDance

Berkeley, US

EECS PhD student at UC Berkeley (Berkeley, US), undergrad at SJTU
Direct CUDA kernel work: pinned repos include Atom (MLSys'24, low-bit quantization — CUDA language), Quest (ICML 2024, KV sparsity — CUDA), FlashInfer (kernel library for LLM serving)
Strong hands-on kernel writing at 1.5yr into PhD
Hireability: LOW — early-stage PhD, not actively job searching

Aaryan Singhal

No note

Abhimanyu Rajeshkumar Bambhaniya

No note

Aidan Do

No note

Alex Hu

No note

Benjamin Spector

No note

Bolian Li

PhD candidate@Purdue University

Previously: Applied Scientist Intern @ Amazon

West Lafayette, US

No note

Dongsheng Yang

No note

Dylan Lim

No note

Edenzzzz

No note

Fanjiang Ye

No note

HamidReza Imani

No note

Hyoungwook Nam

No note

Jake Hyun

PhD Student@Cornell University

Previously: Undergraduate Research Intern @ Seoul National University

New York, US

No note

jordan-benjamin

No note

Manan17

No note

MasterJH5574

No note

PKUWZP

No note

Pratik Pramod Fegade

No note

ryanneph

No note

shivam15s

No note

Stuart Sul

No note

vaibhavjindal

No note

xslingcn

No note

Yashas Samaga

No note

yundai424

No note

yyihuang

No note

Zhengyang Wang

No note

Zifan He

No note

Runs

#1 · completed · 53 qualified / 89 found · Apr 20, 4:06 PM