
v2 live test — junior CUDA Triton engineers US

Completed · 51 qualified · 1 run · Apr 20, 4:06 PM · v2-live-test-junior-cuda-triton-engineers-us
Parsed: 2 topics · Junior · Engineer · US

    Qualified Candidates (51)

    Aniruddha Nrusimha

    high hireability

    PhD candidate@MIT

    Previously: Undergrad student @ University of California Berkeley

    Boston, US

    • PhD candidate at MIT (Boston, US), h_index=9, efficient deep learning and quantization
    • GitHub: qat-pretrain repo (CUDA; quantization-aware training at kernel level). Paper: FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference (direct CUDA kernel writing)
    • Email: anin@mit.edu
    • Strong hands-on CUDA kernel work combined with quantization research
    • Hireability: HIGH — MIT PhD candidate (junior), CUDA kernel author (FlashFormer), active commits as of 2024

    Coleman Hooper

    high hireability

    Graduate Student - ML Systems@University of California, Berkeley

    Previously: Research Intern @ NVIDIA

    San Francisco, US

    • Graduate student in ML Systems at UC Berkeley (San Francisco, US), h_index=12
    • Papers focused on LLM inference optimization: speculative decoding (SPEED, QuantSpec), KV cache reduction, quantization
    • As an ML Systems grad student, clearly junior (0-4 years)
    • Work is at the GPU/systems level for LLM inference
    • Hireability: HIGH — actively publishing in ML Systems/efficient inference, strong Berkeley pedigree, likely graduating soon

    Corbin Robeck

    high hireability
    • Active Triton contributor at Meta (31 commits)
    • Very deep Triton + PyTorch Inductor work: AMD GPU kernel templates (addmm, persistent matmul), MLIR compiler instrumentation of AMD GPU kernels (own repo: instrument-amdgpu-kernels), the Triton extension ecosystem (triton-ext), Triton-distributed, and TileGym tutorials
    • 116 total PRs, but the GitHub account dates to 2015; career trajectory is not clear from public info
    • Scope of work (Triton compiler internals, release management) suggests a strong mid-level contributor; US-located (Meta)
    • Not obviously senior/principal
    • Strong Triton + GPU kernel fit

    Genghan Zhang

    high hireability

    PhD Student, Computer Science@Stanford University

    Previously: Intern @ NVIDIA

    • PhD student at Stanford, GPU kernel optimization specialist (CUDA + domain-specific languages). h_index 7
    • Pinned repos include dgSPARSE-Lib (CUDA), AccelOpt (LLM kernel optimization agents), ICML 2025 paper on ML library development
    • Active GitHub commits as of April 2026
    • AWS intern experience
    • Strong GPU kernel background; as an early-career PhD student, junior tier
    • US-based (Stanford, CA)

    Haocheng Xi

    high hireability

    MLsys Researcher@University of California, Berkeley

    Previously: Research Intern @ Nvidia

    Berkeley, US

    • PhD student at UC Berkeley (ML Systems / Efficient ML)
    • Pinned repos include CUDA TensorCore HGEMM and 'how-to-optim-algorithm-in-cuda'
    • Active CUDA/ML systems work: INT4 training, speculative decoding (TriForce at COLM 2024). h_index 7
    • Berkeley, CA
    • Early-stage researcher, clearly junior

    Kyle Wang

    high hireability
    • Active Triton contributor (61 commits) at AMD, Santa Clara, CA
    • Recent PRs to triton-lang/triton: scale swizzling for GFX1250, MoE Gluon kernel, predicate support in TDM Gather
    • Focuses on Triton/MLIR/LLVM for AMD GPU backends
    • Prior DL training/inference background
    • Appears mid-level (not senior/principal), US location confirmed

    Mingyuan MA

    high hireability

    Software Engineer, LLM Inference Workload Performance@NVIDIA

    Previously: Research Collaborator @ sgl-project

    San Francisco, US

    • Recent Berkeley grad (email mamingyuan2001@berkeley.edu suggests a ~2001 birth year and graduation around 2023); now a software engineer at NVIDIA working on LLM Inference Workload Performance in San Francisco. h_index=5
    • Active research: kvcached (virtualized KV cache), multi-LLM serving (PRISM)
    • Website updated Jan 2026 with new papers
    • Junior profile — recently graduated and in first industry role
    • US (SF)
    • Strong GPU/inference systems focus

    Muyang Li

    high hireability

    Doctoral Student@Massachusetts Institute of Technology

    Previously: Research Intern @ NVIDIA

    Boston, US

    • PhD student at MIT (Cambridge/Boston, US), h_index=10, MIT Han Lab
    • Research on efficient deep learning and generative models: SVDQuant (4-bit diffusion, ICLR 2025 Spotlight), Sparse VideoGen (accelerating video diffusion with spatial-temporal sparsity), DistriFusion (distributed diffusion inference, CVPR 2024 Highlight)
    • Work involves low-level GPU optimization for diffusion models
    • Twitter: @lmxyy1999
    • Hireability: HIGH — MIT PhD student (junior), Han Lab pedigree (Song Han group), ICLR/CVPR publications in GPU-efficient inference

    Samuel Ginzburg

    high hireability
    • PhD Princeton 2024 (new grad)
    • GPU/ML performance engineering at Google on Hopper/Blackwell for Gemini
    • Prior: Research Scientist at Meta (ML Compilers, PyTorch/Triton)
    • Triton PRs: AMD 2:4 structured sparsity (large, 2368 lines), DotOpInterface refactor, gluon AMD compilation fixes
    • Active GPU kernel/compiler work, US-based, early career post-PhD

    Sijia Chen

    high hireability

    Researcher@OpenAI

    Previously: Researcher @ Meta

    • Researcher at OpenAI (previously Meta), based in Sunnyvale CA. h_index=3 — junior researcher level
    • Expertise squarely on target: GPU Kernel Performance, LLMs, Inference Optimization, Attention Optimization
    • Published 'Fast and Simplex: 2-Simplicial Attention in Triton' (ICLR 2026 submission) and ParetoQ on low-bit LLM quantization (NeurIPS 2025)
    • CMU background
    • Strong Triton + CUDA kernel focus

    Connor Holmes

    medium hireability

    Researcher@OpenAI

    Previously: Researcher @ Microsoft

    San Francisco, US

    • Researcher @ OpenAI SF (US), h_index=14, expertise in GPGPU and deep learning
    • Direct Triton contributor (person node)
    • GitHub: grnn repo in CUDA, Megatron-LM training work
    • Pinned CUDA repo confirms hands-on kernel work. 'Researcher' (not Senior Researcher/Staff) suggests 2-5 years post-grad
    • Hireability: MEDIUM — currently at OpenAI which pays well; GPGPU + Triton contributor is strong fit

    Da Yan

    medium hireability

    Member Of Technical Staff@Anthropic

    Previously: Independent Contractor @ OpenAI

    New York, US

    • MTS at Anthropic (New York, US), h_index=10
    • Triton contributor with 44 commits (direct person node)
    • Bio: 'AI compute & compilers.' GitHub: turingas (NVIDIA Volta/Turing GPU assembler), CUDA-Winograd (Fast CUDA Kernels for ResNet Inference), gas (C++)
    • Extremely strong hands-on CUDA/Triton kernel and GPU assembly work. h_index=10 and MTS title suggest 3-6 years post-grad — borderline but qualifies as early-career
    • Hireability: MEDIUM — at Anthropic (top-tier employer), but CUDA assembler work is exceptionally relevant

    Jagrit Digani

    medium hireability

    Machine Learning Engineer@Apple

    Previously: Undergraduate Researcher: Optimization and AI @ Davoyan Research Group

    San Francisco, US

    • ML Engineer at Apple (San Francisco, US)
    • GPU Programming expertise, h_index 5
    • Active contributor to Apple MLX framework — merged PRs include NAX refactor, M5 Pro/Max kernel tuning, attention mask fix
    • Works on low-level kernel optimization for Apple Silicon
    • Appears early-to-mid career (MLE title, not senior/staff)

    Maksim Levental

    medium hireability
    • PhD student at University of Chicago (several years in), currently at Apple working on MLIR, compilers, and accelerator architectures
    • 44 Triton commits; pinned repos include triton-lang/triton, iree-org/iree, llvm/eudsl, mlir-python-extras
    • Blog active with MLIR content since 2022
    • US (Cupertino, CA)
    • Grad student profile fits junior/early-career criteria despite Apple affiliation
    • Deep low-level GPU/compiler knowledge

    Michael Melesse

    medium hireability
    • AMD engineer in New York; self-describes as an 'engineer working on ML kernels, mostly in Triton.' 244 PRs; GitHub account since Dec 2015, suggesting roughly 8-9 years of industry experience
    • Actively contributing to ROCm/aiter flash attention Triton backend
    • US location and strong Triton kernel skills, but seniority exceeds junior threshold

    Mit Kotak

    medium hireability

    Research Assistant@Massachusetts Institute of Technology

    Previously: Scientific Software Research Intern @ University of Illinois Urbana-Champaign

    Boston, US

    • MIT PhD student (Research Assistant at MIT, OpenReview confirms MIT PhD program)
    • Expertise: equivariance, geometric deep learning, hardware, kernels
    • GitHub pinned: pycuda, cudagraph-thesis, fast_flops (GPU benchmarking), arraycontext. 126 PRs mostly on e3nn and atomicarchitects/nequix. h_index=4
    • US (Boston)
    • Active CUDA/GPU kernel interest (cudagraph thesis, pycuda fork)
    • Early-career PhD student fits junior profile

    Pengzhan Zhao

    medium hireability
    • Active Triton AMD backend contributor (45 commits, 52 PRs)
    • Recent GPU kernel work on MQA flash attention, MXFP-precision flash attention, and Gluon scaled_dot fixes: concrete GPU kernel engineering at AMD in the SF Bay Area
    • US-based
    • No seniority red flags found; contribution level and scope suggest mid/early career

    Prajwal Singhania

    medium hireability

    Graduate Assistant@University of Maryland

    Previously: Research Intern @ Microsoft

    College Park, US

    • PhD student @ UMD (grad student explicitly OK per search criteria)
    • IIT Kharagpur 2020 dual degree + 3 yrs industry before PhD
    • HPC/GPU focus: CUDA CNN inference repo, NVSHMEM GPU collective comms, SC24 Gordon Bell finalist paper on scalable LLM training
    • Expertise in Systems for ML and HPC
    • US (College Park, MD). h_index=6

    Qinghao Hu

    medium hireability

    Postdoctoral Researcher@MIT

    Previously: Research Assistant Professor @ Nanyang Technological University

    Boston, US

    • Postdoc @ MIT (Boston, US) working on ML Systems
    • Strong Triton signal: pinned repo is Liger-Kernel (Efficient Triton Kernels for LLM Training)
    • Papers include LServe, DeltaZip, efficient LLM serving/systems work. h_index=14, postdoc level = early-career researcher
    • Hireability: MEDIUM — postdoc at MIT is strong signal of talent; actively building Triton kernels for LLM training

    Runyu Lu

    medium hireability

    PhD student@University of Michigan

    ex-Huazhong University of Science and Technology

    Ann Arbor, US

    • PhD student @ UMich (SymbioticLab), HUST CS BS 2020
    • ML Systems focus with GPU/HPC work: TetriServe (ASPLOS'26), a FlexAttention kernel implementation, ColossalAI and vLLM contributions
    • Ann Arbor MI, US. h_index=2, early career
    • Active researcher publishing at top venues

    Shawn Zhong

    medium hireability
    • 5th-year PhD student at UW-Madison (ADSL systems group)
    • Madison, WI — US-based
    • Deep Triton compiler contributions: Proton profiler (global timestamp/cross-CTA timeline), AMD backend build fixes, frontend float argument passing bug fix — substantive compiler-level work, not just docs. 9 commits across Triton
    • Early-career systems researcher with GPU tooling expertise

    Hanshi Sun

    low hireability

    Research Scientist@ByteDance

    Previously: Teaching Assistant @ Carnegie Mellon University

    Bellevue, US

    • Research Scientist at ByteDance (Bellevue, US), MS CMU
    • Contributor to Triton-distributed (ByteDance's distributed Triton compiler) and ShadowKV (ICML 2025 Spotlight — KV cache inference)
    • Also co-authored TriForce (COLM 2024, speculative decoding)
    • Strong MLSys focus with direct Triton involvement. ~1yr at ByteDance as Research Scientist; likely early-career post-MS
    • Hireability: LOW — 11 months into current role, not actively job searching

    Yilong Zhao

    low hireability

    Ph.D. student@University of California, Berkeley

    Previously: Research Intern @ ByteDance

    Berkeley, US

    • EECS PhD student at UC Berkeley (Berkeley, US), undergrad at SJTU
    • Direct CUDA kernel work: pinned repos include Atom (MLSys'24, low-bit quantization — CUDA language), Quest (ICML 2024, KV sparsity — CUDA), FlashInfer (kernel library for LLM serving)
    • Strong hands-on kernel writing 1.5 years into the PhD
    • Hireability: LOW — early-stage PhD, not actively job searching

    Aaryan Singhal

    No note

    Abhimanyu Rajeshkumar Bambhaniya

    No note

    Aidan Do

    No note

    Alex Hu

    No note

    Benjamin Spector

    No note

    Bolian Li

    PhD candidate@Purdue University

    Previously: Applied Scientist Intern @ Amazon

    West Lafayette, US

    No note

    Dongsheng Yang

    No note

    Dylan Lim

    No note

    Edenzzzz

    No note

    Fanjiang Ye

    No note

    HamidReza Imani

    No note

    Hyoungwook Nam

    No note

    Jake Hyun

    PhD Student@Cornell University

    Previously: Undergraduate Research Intern @ Seoul National University

    New York, US

    No note

    jordan-benjamin

    No note

    Manan17

    No note

    MasterJH5574

    No note

    PKUWZP

    No note

    Pratik Pramod Fegade

    No note

    ryanneph

    No note

    shivam15s

    No note

    Stuart Sul

    No note

    vaibhavjindal

    No note

    xslingcn

    No note

    Yashas Samaga

    No note

    yundai424

    No note

    yyihuang

    No note

    Zhengyang Wang

    No note

    Zifan He

    No note

    Runs

    #1 · completed · 53 qualified / 89 found · Apr 20, 4:06 PM