
Neuralace (Sabi) · Systolic Inference Chip Researcher

completed · 101 qualified · 1 run · May 7, 1:34 PM · company-name-neuralace-sabi-locations-usa-europe-china-india-1778160889
Parsed: Neuralace · 6 topics · Researcher · no PhD · USA, Europe, China, India
1. Generating seed nodes · 0 proposed
2. Explored 0 queries · 0/0 done
3. Expanding nodes · queued
4. Qualifying candidates · queued

    Qualified Candidates (100)


    Abhimanyu Rajeshkumar Bambhaniya

    high hireability

    Research Intern@Meta

    Previously: Intern @ Google

    San Francisco, US

    Overall: 35
    Systolic Array Design: 75
    Power-Efficient Chip Architecture: 75
    Multimodal LLM Inference: 30
    Hybrid LUT-GEMM: 20
    DVFS / Power Management: 5
    MX / Microscaled Precision: 5
    Strengths
    Subgraph stationary HW-SW co-design (MLSys 2023) — fixed-topology inference dataflow design
    AlphaICs GLUON AI accelerator tapeout — 16nm TSMC real-chip experience
    Gaps
    No MX4/MX6/MX9 or microscaled numeric format work

    Albert Tseng

    high hireability
    Overall: 20
    MX / Microscaled Precision: 72
    Hybrid LUT-GEMM: 30
    Systolic Array Design: 5
    Multimodal LLM Inference: 5
    Power-Efficient Chip Architecture: 5
    DVFS / Power Management: 0
    Strengths
    MXFP4 paper (AISTATS 2025): trained LLMs with block-scaled FP4, 2x vs FP8
    QuIP# top contributor (36 commits): 2-bit lattice-codebook LLM quantization
    Gaps
    No hardware or ASIC design — work is purely algorithm-level, not silicon

    Changwoo Lee

    high hireability

    Graduate Student Research Assistant@University of Michigan

    Previously: Research Intern @ DeepMind

    Ann Arbor, US

    Overall: 20
    Power-Efficient Chip Architecture: 60
    Multimodal LLM Inference: 20
    Hybrid LUT-GEMM: 15
    DVFS / Power Management: 12
    Systolic Array Design: 10
    MX / Microscaled Precision: 3
    Strengths
    VLSI 2022: 22nm SoC tapeout, 10 TOPS/W multimodal AI chip
    AIMMI 2024: low-power SoC with on-chip MRAM for IoT inference
    Gaps
    No published systolic array or dataflow architecture work

    Coleman Richard Charles Hooper

    high hireability

    Graduate Student - ML Systems@University of California, Berkeley

    Previously: Research Intern @ NVIDIA

    San Francisco, US

    Overall: 38
    MX / Microscaled Precision: 72
    Power-Efficient Chip Architecture: 65
    DVFS / Power Management: 30
    Hybrid LUT-GEMM: 28
    Systolic Array Design: 20
    Multimodal LLM Inference: 12
    Strengths
    FGMP: NVFP4 (FP4 microscaling) mixed-precision quantization, co-authored with NVIDIA chip team
    SqueezeLLM: 3-bit dense-and-sparse GEMM, 2.3x A6000 speedup (ICML 2024)
    Gaps
    No direct systolic array design papers — quantization researcher, not chip architect

    Cong Guo

    high hireability

    Postdoctoral Associate@Duke University

    Previously: Research intern @ Shanghai Qi Zhi Institute

    Durham, US

    Overall: 51
    Systolic Array Design: 80
    MX / Microscaled Precision: 60
    Hybrid LUT-GEMM: 55
    Power-Efficient Chip Architecture: 55
    Multimodal LLM Inference: 50
    DVFS / Power Management: 5
    Strengths
    Transitive Array (ISCA 2025): GEMM accelerator with result reuse — systolic array design
    ANT (MICRO 2022, IEEE Top Picks): adaptive float/int numeric type for low-bit quantization
    Gaps
    No DVFS or power-budget management papers — key axis uncovered

    Haotong Qin

    high hireability

    Postdoctoral Researcher@ETH Zürich

    Previously: Research Scientist @ ByteDance

    Zurich, CH

    Overall: 17
    Multimodal LLM Inference: 40
    MX / Microscaled Precision: 35
    Power-Efficient Chip Architecture: 20
    Systolic Array Design: 5
    Hybrid LUT-GEMM: 0
    DVFS / Power Management: 0
    Strengths
    BiLLM (ICML 2024): 1-bit PTQ for LLMs — top-tier extreme weight compression
    Qwen3-Quantization repo + empirical study — direct work on target model family
    Gaps
    No systolic array or ASIC chip design work whatsoever

    Hongzheng Chen

    high hireability

    Ph.D. Candidate@Cornell University

    Previously: Undergrad student @ Sun Yat-sen University

    Ithaca, US

    Overall: 14
    Power-Efficient Chip Architecture: 25
    Systolic Array Design: 20
    Multimodal LLM Inference: 20
    MX / Microscaled Precision: 8
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    FPGA spatial acceleration for LLM inference (arxiv:2312.15159, 2023)
    Allo: composable HLS accelerator design language (PLDI'24)
    Gaps
    No systolic array ASIC design — all FPGA/HLS, no chip tapeout

    Junyang Lin

    high hireability

    Research Scientist@Qwen

    Previously: Staff Engineer @ Alibaba

    Beijing, CN

    Overall: 18
    Multimodal LLM Inference: 95
    MX / Microscaled Precision: 5
    Power-Efficient Chip Architecture: 5
    Hybrid LUT-GEMM: 0
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Strengths
    QwenLM/Qwen3 Tech Lead — built exact model Neuralace deploys on chip
    Qwen-VL + Qwen-Audio + Qwen2-VL: full vision+speech multimodal pipeline
    Gaps
    No chip design background — no ASIC, systolic array, or Verilog work

    Lian Liu

    high hireability

    PhD, Institute of Computing Technology

    Ashburn, US

    Overall: 22
    Power-Efficient Chip Architecture: 45
    Multimodal LLM Inference: 30
    MX / Microscaled Precision: 30
    Systolic Array Design: 15
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    COMET W4A4KV4 quantization — aggressive LLM compression at ASPLOS 2025
    DNA: Dynamic Neural Network Accelerator — IEEE TC 2025 (top hardware journal)
    Gaps
    No evidence of systolic array architecture design specifically

    Muhammad Umar

    high hireability

    PhD student@Cornell University

    Overall: 8
    MX / Microscaled Precision: 25
    Systolic Array Design: 10
    Power-Efficient Chip Architecture: 10
    Multimodal LLM Inference: 5
    Hybrid LUT-GEMM: 0
    DVFS / Power Management: 0
    Strengths
    FLIQS: mixed-precision FP8+INT quantization, Jouppi (TPU) co-author
    GuardNN: DNN accelerator architecture paper, 74 citations (DAC 2022)
    Gaps
    No published systolic array or fixed-weight inference chip design work

    Muyang Li

    high hireability

    Doctoral Student@Massachusetts Institute of Technology

    Previously: Research Intern @ NVIDIA

    Boston, US

    Overall: 14
    Hybrid LUT-GEMM: 35
    Multimodal LLM Inference: 20
    MX / Microscaled Precision: 20
    Systolic Array Design: 5
    Power-Efficient Chip Architecture: 5
    DVFS / Power Management: 0
    Strengths
    444 commits to nunchaku — primary INT4/INT8 CUDA kernel library author
    SVDQuant (ICLR 2025 Spotlight): 4-bit diffusion model quantization
    Gaps
    No chip design / RTL / ASIC tapeout experience — software-side only

    Pierre Abillama

    high hireability

    Graduate Student Research Assistant, EECS, University of Michigan

    Previously: Intern @ IBM

    Overall: 39
    Power-Efficient Chip Architecture: 82
    DVFS / Power Management: 58
    Multimodal LLM Inference: 40
    MX / Microscaled Precision: 35
    Systolic Array Design: 15
    Hybrid LUT-GEMM: 5
    Strengths
    22nm 25.08 TOPS/W transformer accelerator tapeout — VLSI 2025
    Two-stage task-adaptive power management — DVFS-adjacent chip control
    Gaps
    No explicit systolic array design work found

    Qilin Zheng

    high hireability

    Duke University

    Overall: 22
    Power-Efficient Chip Architecture: 75
    MX / Microscaled Precision: 20
    Multimodal LLM Inference: 15
    Systolic Array Design: 10
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    TFSRAM: 249.8 TOPS/W compute-in-SRAM neuromorphic engine (IEEE TCAS-AI 2024)
    DIANA SoC: end-to-end energy-efficient digital/analog hybrid NN chip (ISSCC 2022)
    Gaps
    Core work is CIM/PIM, not systolic array architecture

    Saleh Ashkboos

    high hireability

    Research Assistant@ETH Zürich

    Previously: Research Intern @ Apple

    Zurich, CH

    Overall: 25
    MX / Microscaled Precision: 85
    Hybrid LUT-GEMM: 28
    Power-Efficient Chip Architecture: 20
    Multimodal LLM Inference: 8
    Systolic Array Design: 5
    DVFS / Power Management: 3
    Strengths
    ICLR26: Microscaling FP4 Quantization paper -- direct MX format work
    Quartet (NeurIPS25): FP4 native training for LLMs
    Gaps
    No systolic array or ASIC chip design experience

    Shulin Zeng

    high hireability

    Postdoc@Tsinghua University

    ex-Tsinghua University

    Beijing, CN

    Overall: 21
    Power-Efficient Chip Architecture: 48
    Multimodal LLM Inference: 22
    Hybrid LUT-GEMM: 20
    MX / Microscaled Precision: 18
    Systolic Array Design: 15
    DVFS / Power Management: 3
    Strengths
    FlightLLM (2024, 121 cites): end-to-end FPGA LLM inference mapping
    FMC-LLM/CD-LLM (2025): 70B+ batched decoding on multi-FPGA
    Gaps
    FPGA only — no ASIC/tapeout or custom systolic array chip design

    William Andrew Simon

    high hireability

    Research Scientist on In-Memory Computing@IBM

    Previously: PhD student @ EPFL - EPF Lausanne

    Zurich, CH

    Overall: 23
    Power-Efficient Chip Architecture: 72
    Multimodal LLM Inference: 32
    Systolic Array Design: 12
    DVFS / Power Management: 8
    MX / Microscaled Precision: 8
    Hybrid LUT-GEMM: 3
    Strengths
    CICC 2025 invited: analog AI hardware for low-latency transformer inference
    BLADE (129 cites): in-cache compute chip for edge AI
    Gaps
    Analog CIM paradigm (PCM/conductance) — skills don't directly map to digital systolic RTL

    Zhekai Zhang

    high hireability
    Overall: 39
    Power-Efficient Chip Architecture: 80
    Systolic Array Design: 65
    Multimodal LLM Inference: 40
    MX / Microscaled Precision: 30
    Hybrid LUT-GEMM: 15
    DVFS / Power Management: 5
    Strengths
    LEGO (HPCA 2025): spatial accelerator auto-generation, 2.4x energy vs. Gemmini
    SpAtten-Chip ASIC tapeout — won DAC 2023 demo competition
    Gaps
    No MX/MSFP format work — uses W4A4/W4A8KV4, not Microsoft MX standard

    Abbas Rahimi

    medium hireability

    Research Staff Member@IBM

    Previously: Postdoctoral Researcher @ UC Berkeley

    Zurich, CH

    Overall: 15
    Power-Efficient Chip Architecture: 62
    Multimodal LLM Inference: 10
    Systolic Array Design: 5
    DVFS / Power Management: 5
    MX / Microscaled Precision: 3
    Hybrid LUT-GEMM: 2
    Strengths
    "Efficient scaling of LLMs with MoE + 3D analog in-memory" (2025)
    5μW HD accelerator ASIC — sub-W AI inference chip tapeout
    Gaps
    No systolic array design work — entirely analog CIM / hyperdimensional paradigm

    Andrea Bejarano-Carbo

    medium hireability

    University of Michigan

    Overall: 14
    Power-Efficient Chip Architecture: 65
    Multimodal LLM Inference: 10
    Systolic Array Design: 5
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 0
    MX / Microscaled Precision: 0
    Strengths
    AIMMI (JSSC 2024): multimodal audio+image SoC, low-power inference
    H.264/AVC accelerator IC (JSSC 2023): algorithm-hardware co-design
    Gaps
    No systolic array architecture work found

    Andrei Panferov

    medium hireability
    Overall: 27
    Hybrid LUT-GEMM: 75
    MX / Microscaled Precision: 72
    Multimodal LLM Inference: 10
    Power-Efficient Chip Architecture: 5
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Strengths
    'Bridging the Gap...Microscaling FP4 Quantization' (2025) — authored
    FLUTE LUT-GEMM: 14 commits + Fast Hadamard Transform kernel PR (merged)
    Gaps
    No systolic array or custom chip design experience

    Andrew W Fitzgibbon

    medium hireability

    Engineering Fellow@Graphcore

    Previously: Partner Researcher @ Microsoft

    Cambridge, GB

    Overall: 21
    MX / Microscaled Precision: 95
    Multimodal LLM Inference: 10
    Power-Efficient Chip Architecture: 10
    Hybrid LUT-GEMM: 5
    Systolic Array Design: 3
    DVFS / Power Management: 0
    Strengths
    graphcore-research/gfloat: implements OCP MX4/MX6/MX9/E8M0 block formats
    IEEE P3109 WG contributor — standards body for ML arithmetic formats
    Gaps
    No systolic array design or fixed-weight chip architecture work

    An Yang

    medium hireability

    Researcher@Alibaba

    Previously: MS student @ Peking University

    Overall: 11
    Multimodal LLM Inference: 65
    Hybrid LUT-GEMM: 0
    Systolic Array Design: 0
    DVFS / Power Management: 0
    MX / Microscaled Precision: 0
    Power-Efficient Chip Architecture: 0
    Strengths
    Qwen3 Technical Report (2025) — core team author, 3857 citations
    Qwen technical report (2023) — confirmed author among 48 contributors
    Gaps
    No evidence of hardware design — zero chip/ASIC/systolic array work

    Arash Fayyazi

    medium hireability

    Principal Performance Engineer@d-Matrix

    Previously: Staff Software Engineer, AI Kernels and Workloads @ d-Matrix

    San Francisco, US

    Overall: 41
    Systolic Array Design: 78
    Power-Efficient Chip Architecture: 74
    Hybrid LUT-GEMM: 62
    MX / Microscaled Precision: 18
    Multimodal LLM Inference: 8
    DVFS / Power Management: 5
    Strengths
    'Sparse Periodic Systolic Dataflow' (2022) — 4.49x energy efficiency on CNN accelerator
    BlendNet/NeuroBlend: binary+fixed-point blended inference engine, 2.5x power reduction
    Gaps
    No MX / microscaled-format (MSFP/MX4/MX6/MX9) work found

    Atefeh Sohrabizadeh

    medium hireability

    Research Scientist@NVIDIA

    Previously: Graduate Student Researcher @ UCLA VAST Lab

    San Francisco, US

    Overall: 20
    Systolic Array Design: 75
    Power-Efficient Chip Architecture: 25
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Multimodal LLM Inference: 5
    MX / Microscaled Precision: 5
    Strengths
    2025: Structured Sparse Matrix Acceleration in Systolic Arrays — core JD topic
    Versatile Systolic Array for CNN on FPGA (2022) — direct systolic array design experience
    Gaps
    No evidence of MX/microscaled numeric formats or LUT-GEMM work

    Bita Darvish Rouhani

    medium hireability

    Researcher@NVIDIA

    Overall: 37
    MX / Microscaled Precision: 98
    Power-Efficient Chip Architecture: 50
    Multimodal LLM Inference: 35
    Systolic Array Design: 30
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    Lead author OCP MX spec — MX4/MX6/MX9 definitive industry standard (block scaling sketched below)
    Microscaling Data Formats (140 citations, 2023) — exact format match for chip
    Gaps
    No systolic array RTL or tapeout — format-layer researcher, not chip designer
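The MX / Microscaled Precision axis scored throughout this list refers to block-scaled number formats in the spirit of the OCP MX spec cited above: each small block of elements shares one power-of-two scale, and each element is stored in a narrow type. Below is a minimal numpy sketch of that block-scaling idea, simplified to INT8 elements; it is an illustration, not the OCP reference implementation or any candidate's code.

```python
# Rough sketch of block-scaled ("microscaling") quantization in the spirit of the
# OCP MX formats: each block of k elements shares one power-of-two scale and each
# element is stored in a narrow integer. Illustration only; the element format is
# simplified to INT8 rather than the MX4/MX6/MX9 element types.
import numpy as np

def mx_quantize(x, k=32, elem_bits=8):
    """Return (per-block scales, integer codes) for x split into blocks of k elements."""
    x = x.reshape(-1, k)
    max_code = 2 ** (elem_bits - 1) - 1
    # Shared per-block scale, restricted to a power of two (like the E8M0 scale in MX).
    amax = np.abs(x).max(axis=1, keepdims=True)
    scale = 2.0 ** np.ceil(np.log2(np.maximum(amax, 1e-30) / max_code))
    codes = np.clip(np.round(x / scale), -max_code, max_code).astype(np.int8)
    return scale, codes

def mx_dequantize(scale, codes):
    # Reconstruct the (approximate) original values from shared scales and codes.
    return (scale * codes).reshape(-1)

x = np.random.default_rng(0).standard_normal(128).astype(np.float32)
scale, codes = mx_quantize(x)
print("max abs error:", np.abs(x - mx_dequantize(scale, codes)).max())
```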

    Byung Hoon Ahn

    medium hireability

    Software Engineer@Apple

    Previously: Research Scientist @ Protopia AI

    San Francisco, US

    Overall: 25
    Systolic Array Design: 75
    Power-Efficient Chip Architecture: 40
    Multimodal LLM Inference: 20
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    MX / Microscaled Precision: 5
    Strengths
    Planaria (MICRO 2020, 162 citations): omni-directional systolic array for DNN inference
    Co-author with Tushar Krishna (Georgia Tech DNN accelerator group)
    Gaps
    No evidence of MX/MSFP microscaled precision work

    Casper Hansen

    medium hireability
    Overall: 19
    Hybrid LUT-GEMM: 58
    Multimodal LLM Inference: 30
    MX / Microscaled Precision: 15
    Power-Efficient Chip Architecture: 8
    Systolic Array Design: 3
    DVFS / Power Management: 0
    Strengths
    AutoAWQ — top open-source W4A16 AWQ quantization library (417 commits)
    AutoAWQ_kernels: fused CUDA dequant+GEMM kernels, LUT-GEMM adjacent
    Gaps
    No chip/ASIC/RTL design experience — purely software stack

    Changhai Man

    medium hireability

    PhD student@Georgia Institute of Technology

    Atlanta, US

    Overall: 29
    Systolic Array Design: 75
    Power-Efficient Chip Architecture: 50
    MX / Microscaled Precision: 30
    Multimodal LLM Inference: 8
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    Multi-bit-width booth vector systolic accelerator (2022, 43 cites) — core DNN chip
    SCALE-Sim TPU (2026) — extends Georgia Tech systolic array simulator for TPU validation
    Gaps
    No MX/MSFP/microscaled format work — multi-bit-width is adjacent but not same

    Charlie Blake

    medium hireability

    AI research engineer@Graphcore

    Previously: MS student @ University of Oxford

    Overall: 15
    MX / Microscaled Precision: 45
    Multimodal LLM Inference: 15
    Power-Efficient Chip Architecture: 15
    Hybrid LUT-GEMM: 5
    Systolic Array Design: 5
    DVFS / Power Management: 5
    Strengths
    FP8 training/inference (NeurIPS 2023) — closest work to MX weight precision
    SparQ Attention (ICML 2024, 77 citations) — bandwidth-efficient LLM inference
    Gaps
    No chip/hardware design experience — pure software/algorithm researcher

    Chenfan Sun

    medium hireability

    Software Engineer@NVIDIA

    Previously: Software Engineer @ Apple

    Seattle, US

    Overall: 20
    MX / Microscaled Precision: 72
    Power-Efficient Chip Architecture: 25
    Systolic Array Design: 15
    Multimodal LLM Inference: 8
    Hybrid LUT-GEMM: 0
    DVFS / Power Management: 0
    Strengths
    NVFP4 paper (2025): co-authored NVIDIA's 4-bit microscaling LLM pretraining format
    Apple ANE patents: compiler-level work on streaming convolutions in neural processor chip
    Gaps
    No systolic array design — compiler/numerics role, not chip architecture

    Cheng Zhang

    medium hireability

    Founding Engineer@AI Sequrity Company

    Previously: Research Intern @ Microsoft

    London, GB

    Overall: 27
    MX / Microscaled Precision: 68
    Power-Efficient Chip Architecture: 42
    Systolic Array Design: 28
    Multimodal LLM Inference: 15
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    LQER (ICML'24) + QERA (ICLR'25): top-venue LLM quantization error reconstruction
    Sub-8-bit LLM inference (EMNLP'23) -- direct MX weight precision relevance
    Gaps
    No explicit systolic array design publications

    Daniel Lo

    medium hireability

    Researcher@Microsoft

    Previously: PhD student @ Cornell University

    Ithaca, US

    Overall: 25
    MX / Microscaled Precision: 80
    Power-Efficient Chip Architecture: 40
    Systolic Array Design: 15
    Multimodal LLM Inference: 10
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 0
    Strengths
    MSFP NeurIPS 2020 (2nd author) — pioneered microscaled FP for FPGA inference
    Computer Architecture + FPGA expertise at Microsoft Research
    Gaps
    No evidence of systolic array design or fixed-weight inference ASIC work

    Dayiheng Liu

    medium hireability

    Researcher@Alibaba

    Previously: Intern @ Microsoft

    Hangzhou, CN

    Overall: 12
    Multimodal LLM Inference: 72
    Hybrid LUT-GEMM: 0
    Systolic Array Design: 0
    DVFS / Power Management: 0
    MX / Microscaled Precision: 0
    Power-Efficient Chip Architecture: 0
    Strengths
    Qwen2.5-Omni co-author — speech+vision+language matches Neuralace's target model exactly
    Qwen2-VL (2,541 citations) — defines vision encoder architecture they will deploy
    Gaps
    Zero chip/hardware experience: no systolic array, MX precision, LUT-GEMM, or DVFS

    DaYou Du

    medium hireability

    PhD student@University of Edinburgh

    Previously: Research Intern @ Microsoft

    Edinburgh, GB

    Overall: 18
    MX / Microscaled Precision: 50
    Multimodal LLM Inference: 20
    Power-Efficient Chip Architecture: 20
    Hybrid LUT-GEMM: 5
    Systolic Array Design: 5
    DVFS / Power Management: 5
    Strengths
    AFPQ (asymmetric FP quant) + STBLLM (1-bit) — deep low-bit weight compression
    BitDecoding (HPCA 2026): tensor core exploitation for low-bit KV cache decoding
    Gaps
    No systolic array or ASIC/chip design experience — GPU software, not RTL

    Dimin Niu

    medium hireability

    Research Scientist@Alibaba

    Previously: Senior / Staff Engineer @ Samsung

    San Francisco, US

    Overall: 27
    Power-Efficient Chip Architecture: 72
    Multimodal LLM Inference: 45
    Systolic Array Design: 20
    Hybrid LUT-GEMM: 10
    DVFS / Power Management: 10
    MX / Microscaled Precision: 5
    Strengths
    H-LLM (ISCA 2025): hardware-dataflow co-design for LLM inference chip
    HD-MoE (ICCAD 2025): MoE LLM inference on 3D-stacked NMP accelerator
    Gaps
    No systolic array work — all compute is near-memory/PIM not systolic

    Eric Sather

    medium hireability

    Technical Lead Manager, Machine Learning@Cerebras Systems

    Previously: Principal Machine Learning Engineer @ Rivian

    San Francisco, US

    Overall: 33
    Multimodal LLM Inference: 70
    Power-Efficient Chip Architecture: 65
    MX / Microscaled Precision: 35
    Systolic Array Design: 20
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 3
    Strengths
    20+ Perceive patents on inference circuits, ternary/discrete weight storage (2018–2023)
    DREAM (NeurIPS 2025): 3.6x speedup on multimodal VLM speculative decoding
    Gaps
    No explicit systolic array design evidence — Perceive/Cerebras are non-systolic architectures

    Fei Sun

    medium hireability

    Software Engineer@Meta

    Previously: Research Scientist @ Alibaba Group

    San Francisco, US

    Overall: 28
    Multimodal LLM Inference: 60
    Power-Efficient Chip Architecture: 55
    Systolic Array Design: 20
    Hybrid LUT-GEMM: 15
    MX / Microscaled Precision: 12
    DVFS / Power Management: 5
    Strengths
    'Generative AI beyond LLMs' (ISPASS 2024) — multimodal inference system analysis
    184QPS/W ISSCC 2022 chip — direct power-efficiency metric in real tapeout
    Gaps
    No evidence of systolic array architecture work specifically

    Geethan Karunaratne

    medium hireability

    Researcher@IBM

    Previously: Postdoctoral Researcher @ IBM

    Zurich, CH

    Overall: 21
    Power-Efficient Chip Architecture: 78
    Systolic Array Design: 18
    Multimodal LLM Inference: 10
    MX / Microscaled Precision: 8
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    HERMES-Core: 14nm CMOS+PCM in-memory chip, ISSCC 2022 — production tapeout
    64-core mixed-signal PCM chip: 63.1 TOPS / 9.76 TOPS/W (Nature Electronics)
    Gaps
    In-memory computing (PCM analog), not systolic array architecture

    Geonhwa Jeong

    medium hireability

    Research Scientist@Meta

    Previously: Graduate Research Assistant @ Georgia Institute of Technology

    San Francisco, US

    Overall: 27
    Systolic Array Design: 85
    Power-Efficient Chip Architecture: 35
    Multimodal LLM Inference: 20
    Hybrid LUT-GEMM: 10
    DVFS / Power Management: 5
    MX / Microscaled Precision: 5
    Strengths
    RASA (ISCA 2021): systolic array matrix engine design for CPU
    MAESTRO: DNN dataflow cost model for spatial/systolic accelerators
    Gaps
    No evidence of MX/MSFP or microscaled weight precision work

    Han Cai

    medium hireability

    AI Research Scientist@NVIDIA

    Previously: Research Intern @ NVIDIA

    Boston, US

    Overall: 13
    Multimodal LLM Inference: 35
    Power-Efficient Chip Architecture: 20
    MX / Microscaled Precision: 15
    Systolic Array Design: 5
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 0
    Strengths
    Jet-Nemotron (NVlabs): 53.6× LLM inference speedup on H100 GPUs
    Once-for-All: hardware-aware NAS across MCU/GPU/FPGA deployment targets
    Gaps
    No chip design / HDL / ASIC / tapeout experience — purely model-level

    Hassan Dbouk

    medium hireability

    Senior Engineer@Qualcomm

    Previously: Graduate Research Assistant @ University of Illinois Urbana-Champaign

    San Francisco, US

    Overall: 19
    Power-Efficient Chip Architecture: 65
    MX / Microscaled Precision: 20
    Multimodal LLM Inference: 15
    Systolic Array Design: 8
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 0
    Strengths
    KeyRAM ISSCC 2020: 0.34 μJ/decision in-memory chip tapeout
    JSSC 2022: energy-delay-accuracy fundamental limits for inference HW
    Gaps
    CIM architecture, not systolic arrays — different design paradigm

    Irem Boybat

    medium hireability

    Research Staff Member@IBM

    Previously: Postdoctoral Researcher @ IBM

    Zurich, CH

    Overall: 20
    Power-Efficient Chip Architecture: 75
    Multimodal LLM Inference: 20
    MX / Microscaled Precision: 12
    Systolic Array Design: 5
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 0
    Strengths
    IBM Zurich AIMC ASIC architect — PCM crossbar chips for DNN inference (VLSI, CICC, IEEE TC)
    ALPINE paper: tight analog-digital co-integration for low-latency inference
    Gaps
    Analog CIM paradigm (PCM crossbars) — no digital systolic array design

    Jianyu Wei

    medium hireability

    PhD student@USTC & MSRA

    CN

    Overall: 38
    Hybrid LUT-GEMM: 93
    Power-Efficient Chip Architecture: 58
    MX / Microscaled Precision: 42
    Systolic Array Design: 15
    Multimodal LLM Inference: 12
    DVFS / Power Management: 5
    Strengths
    T-MAC (EuroSys 2025): LUT-based GEMM for low-bit LLM on CPU/NPU — core T-MAC author
    LUT Tensor Core (ISCA 2025): HW-SW co-design for LUT-based low-bit LLM inference
    Gaps
    No systolic array architecture work (CPU/NPU focused, not custom ASIC)

    Ling Liang

    medium hireability

    Assistant Researcher@Peking University

    Previously: Privacy-preserving computation research @ Alibaba

    CN

    Overall: 20
    Power-Efficient Chip Architecture: 70
    Multimodal LLM Inference: 15
    MX / Microscaled Precision: 15
    Systolic Array Design: 10
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 3
    Strengths
    28nm ISSCC tapeout: 29.2TFLOPS/W BF16, 36.5TOPS/W INT8 — direct W/TOPS metrics
    TranCIM ISSCC 2022: 15.59µJ/Token sparse transformer accelerator chip
    Gaps
    CIM paradigm, not systolic arrays — fundamentally different dataflow

    Mahdi Nazemi

    medium hireability

    Machine Learning Engineer@NVIDIA

    Previously: Machine Learning Researcher @ MatX

    San Francisco, US

    Overall: 45
    Hybrid LUT-GEMM: 75
    Power-Efficient Chip Architecture: 72
    DVFS / Power Management: 55
    MX / Microscaled Precision: 45
    Multimodal LLM Inference: 15
    Systolic Array Design: 5
    Strengths
    BlendNet: hybrid binary+fixed-point inference, 2.5x power reduction on FPGA
    US Patent 18/086,989: hybrid arithmetic/logic processing of neural networks (2023)
    Gaps
    No systolic array design work found

    Markus Nagel

    medium hireability

    Research Scientist (Senior Staff Engineer)@Qualcomm

    Previously: Research Scientist (Staff Engineer) @ Qualcomm

    Amsterdam, NL

    Overall: 15
    MX / Microscaled Precision: 68
    Power-Efficient Chip Architecture: 15
    Multimodal LLM Inference: 8
    Hybrid LUT-GEMM: 0
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Strengths
    FP8 Quantization: The Power of the Exponent — NeurIPS 2022 block-float landmark
    AIMET author — Qualcomm's production AI quantization toolkit
    Gaps
    No systolic array or custom ASIC chip design experience

    Martin G Dixon

    medium hireability

    Director of Engineering@Google

    Previously: Intel Fellow & Vice President @ Intel

    San Francisco, US

    Overall: 29
    Power-Efficient Chip Architecture: 65
    Systolic Array Design: 40
    MX / Microscaled Precision: 35
    DVFS / Power Management: 25
    Hybrid LUT-GEMM: 5
    Multimodal LLM Inference: 5
    Strengths
    "Matrix multiply accumulate instruction" patent (2018, 71 citations) — MMA/AMX core work
    Intel Fellow + SoC Architect — 9 years designing heterogeneous processor systems
    Gaps
    No explicit systolic array DNN inference chip design found

    Martino Dazzi

    medium hireability

    Researcher@Axelera AI

    Previously: Researcher @ IBM

    Overall: 40
    Power-Efficient Chip Architecture: 88
    Hybrid LUT-GEMM: 72
    Systolic Array Design: 40
    MX / Microscaled Precision: 20
    Multimodal LLM Inference: 15
    DVFS / Power Management: 5
    Strengths
    Metis AIPU (ISSCC 2024) — 15TOPS/W real tapeout at Axelera AI
    LUT-based ANN hardware paper (2025) — direct LUT-GEMM relevance
    Gaps
    CIM arrays ≠ systolic arrays — different fixed-weight paradigm

    Mengdi Wang

    medium hireability

    PhD candidate@Institute of Computing Technology, Chinese Academy of Sciences

    Previously: Intern, Department of AI @ Jeejio

    CN

    Overall: 15
    Power-Efficient Chip Architecture: 38
    MX / Microscaled Precision: 22
    Systolic Array Design: 15
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Multimodal LLM Inference: 5
    Strengths
    Real SoC tapeout: NPU in Jeejio JX2/JX3 commercial chips
    MT-DLA: multi-task DNN accelerator, GLSVLSI 2021 Best Paper
    Gaps
    No systolic array design work — generic multi-core NPU, not SA-based

    Minsik Cho

    medium hireability

    Machine Intelligence R&D, AI/ML@Apple

    Previously: Siri R&D, AI/ML @ Apple

    Austin, US

    Overall: 27
    Hybrid LUT-GEMM: 50
    Multimodal LLM Inference: 45
    Power-Efficient Chip Architecture: 30
    MX / Microscaled Precision: 20
    Systolic Array Design: 10
    DVFS / Power Management: 5
    Strengths
    DKM/eDKM: codebook GEMM weight clustering — direct LUT-GEMM foundation
    "LLM in a Flash" (191 citations) — power/memory-constrained LLM inference
    Gaps
    No systolic array or fixed-weight dataflow architecture papers

    Nithesh Kurella

    medium hireability

    Senior Principal ML Architect@d-Matrix

    Previously: Principal ML Architect @ d-Matrix

    San Francisco, US

    Overall: 41
    Power-Efficient Chip Architecture: 72
    DVFS / Power Management: 65
    Multimodal LLM Inference: 50
    MX / Microscaled Precision: 38
    Systolic Array Design: 18
    Hybrid LUT-GEMM: 5
    Strengths
    Corsair (2025): d-Matrix flagship inference chiplet architecture paper co-author
    Patents on in-memory compute chiplets for transformer workloads (2024-2025)
    Gaps
    d-Matrix uses DIMC not systolic arrays — no direct systolic design experience

    Qirui Zhang

    medium hireability

    Postdoc@EECS, University of Michigan

    Ann Arbor, US

    Overall: 41
    Power-Efficient Chip Architecture: 88
    Multimodal LLM Inference: 55
    DVFS / Power Management: 45
    MX / Microscaled Precision: 30
    Systolic Array Design: 20
    Hybrid LUT-GEMM: 5
    Strengths
    VLSI 2025: 25.08 TOPS/W transformer accelerator, mixed precision + power mgmt tapeout
    AIMMI JSSC 2024: audio+image multimodal SoC in 22nm silicon
    Gaps
    No explicit systolic array architecture work found

    Ramyad Hadidi

    medium hireability

    Senior Staff ML Computer Architect@d-Matrix

    Previously: Senior Scientist @ Rain AI

    San Francisco, US

    Overall: 36
    Systolic Array Design: 90
    Power-Efficient Chip Architecture: 85
    DVFS / Power Management: 20
    MX / Microscaled Precision: 10
    Hybrid LUT-GEMM: 5
    Multimodal LLM Inference: 5
    Strengths
    ERIDANUS (2019): 41-cite systolic array DNN inference paper
    MEISSA (2020): scalable systolic matrix multiply architecture
    Gaps
    No MX/MSFP or microscaled precision format research

    Rasoul Shafipour

    medium hireability

    Senior AI and Machine Learning Engineer@NVIDIA

    Previously: AI/ML Research Scientist @ Apple

    Seattle, US

    Overall: 27
    MX / Microscaled Precision: 90
    Power-Efficient Chip Architecture: 25
    Multimodal LLM Inference: 20
    Hybrid LUT-GEMM: 18
    Systolic Array Design: 3
    DVFS / Power Management: 3
    Strengths
    MX paper co-author (arXiv:2310.10537): canonical microscaling standard, 162 citations
    Shared Microexponents (2023): BDR framework defining MX4/MX6/MX9 inference formats
    Gaps
    No systolic array design or ASIC/chip architecture work found

    Ritchie Zhao

    medium hireability

    Senior AI and Machine Learning Engineer@NVIDIA

    Previously: Senior Data Science Manager @ Microsoft

    Redmond, US

    Overall: 33
    MX / Microscaled Precision: 95
    Power-Efficient Chip Architecture: 40
    Systolic Array Design: 35
    Multimodal LLM Inference: 20
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    Co-authored OCP MX spec (2310.10537) — definitive microscaling standard
    Shared Microexponents (ISCA 2023) — microscaled BFP format for inference
    Gaps
    No LUT-GEMM or hybrid LUT/GEMM kernel work found

    Saurabh Adya

    medium hireability

    Apple

    Overall: 25
    MX / Microscaled Precision: 55
    Hybrid LUT-GEMM: 30
    Multimodal LLM Inference: 30
    Power-Efficient Chip Architecture: 20
    Systolic Array Design: 15
    DVFS / Power Management: 0
    Strengths
    eDKM: LLaMA 7B compressed to 3-bit — state-of-art LLM weight clustering
    DKM (ICLR 2022): weight codebooks via differentiable k-means — LUT-adjacent precision
    Gaps
    No systolic array architecture papers or chip tapeout experience

    Saurabh Dash

    medium hireability

    Member of Technical Staff@Cohere

    Previously: Machine Learning Researcher @ Apple

    Toronto, CA

    Overall: 32
    Multimodal LLM Inference: 65
    MX / Microscaled Precision: 60
    Power-Efficient Chip Architecture: 40
    Systolic Array Design: 20
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 0
    Strengths
    PIM/ReRAM chip (IEEE TCAD 2022) — real hardware inference accelerator design
    Hessian-driven mixed-precision on hardware — directly targets efficient weight representation
    Gaps
    No systolic array design work — PIM/ReRAM is a different inference architecture

    Shang Yang

    medium hireability

    PhD student@MIT EECS

    Previously: Intern @ MIT

    Boston, US

    Overall: 19
    Multimodal LLM Inference: 65
    MX / Microscaled Precision: 30
    Power-Efficient Chip Architecture: 12
    Systolic Array Design: 5
    Hybrid LUT-GEMM: 3
    DVFS / Power Management: 0
    Strengths
    AWQ MLSys 2024 Best Paper — W4 activation-aware quantization (1498 citations)
    QServe W4A8KV4 — quantized inference with custom kernel co-design (MLSys 2025)
    Gaps
    No chip/ASIC design — entirely software/systems layer contributions

    Shiyu Li

    medium hireability

    Senior Deep Learning Architect@NVIDIA

    Previously: PhD student @ Duke University

    San Francisco, US

    Overall: 37
    Power-Efficient Chip Architecture: 70
    Systolic Array Design: 60
    MX / Microscaled Precision: 50
    DVFS / Power Management: 20
    Multimodal LLM Inference: 15
    Hybrid LUT-GEMM: 5
    Strengths
    INCA (2023): input-stationary dataflow — canonical systolic array variant
    Block-wise mixed-precision quantization for ReRAM DNN accelerators (TCAD 2024)
    Gaps
    No explicit systolic array chip tapeout or fixed-weight accelerator paper

    Shunan Dong

    medium hireability

    Nanjing University

    Overall: 18
    Power-Efficient Chip Architecture: 45
    Systolic Array Design: 30
    MX / Microscaled Precision: 15
    DVFS / Power Management: 10
    Hybrid LUT-GEMM: 5
    Multimodal LLM Inference: 5
    Strengths
    Self-described 'chip architecture and algorithms' student at Tsinghua
    Ds-open: SystemVerilog sparse GEMM accelerator — real HDL chip design
    Gaps
    No direct evidence of systolic array or fixed-weight inference chip design

    Souvik Kundu

    medium hireability

    Inference and SLM Optimization Lead@Intel

    Previously: Staff Research Scientist @ Intel

    Los Angeles, US

    Overall: 39
    MX / Microscaled Precision: 85
    Systolic Array Design: 50
    Power-Efficient Chip Architecture: 50
    Hybrid LUT-GEMM: 28
    Multimodal LLM Inference: 12
    DVFS / Power Management: 8
    Strengths
    MicroScopiQ (2025): MX microscaling quant + systolic array co-design at NeurIPS
    ShiftAddLLM (NeurIPS 2024): multiplication-free LLM via shift+add, adjacent to LUT-GEMM
    Gaps
    No DVFS or dynamic power management research

    Suvinay Subramanian

    medium hireability

    Software Engineer@Google

    San Francisco, US

    Overall: 33
    Systolic Array Design: 75
    Power-Efficient Chip Architecture: 75
    Multimodal LLM Inference: 20
    DVFS / Power Management: 15
    MX / Microscaled Precision: 10
    Hybrid LUT-GEMM: 5
    Strengths
    TPU v4 co-author — named contributor on Google's systolic array inference chip
    FLAT: 49% energy savings + 1.94x speedup for attention inference hardware
    Gaps
    No MX / microscaled numeric format work (MX4/MX6/MX9) found

    Tianhua Xia

    medium hireability

    PhD student@New York University

    New York, US

    Overall: 27
    Multimodal LLM Inference: 65
    Power-Efficient Chip Architecture: 50
    Systolic Array Design: 20
    MX / Microscaled Precision: 15
    DVFS / Power Management: 8
    Hybrid LUT-GEMM: 5
    Strengths
    PICACHU (ASPLOS 2025) — CGRA spatial array design for LLM ops
    HAAN (DATE 2025) — algorithm-hardware co-design for LLM normalization
    Gaps
    No systolic array work — CGRA design is related but architecturally different

    Valavan Manohararajah

    medium hireability

    Chief Product Architect@Cerebras

    Previously: Distinguished Engineer @ Cerebras

    Vaughan, CA

    Overall: 23
    Power-Efficient Chip Architecture: 50
    Systolic Array Design: 30
    MX / Microscaled Precision: 25
    Hybrid LUT-GEMM: 15
    Multimodal LLM Inference: 10
    DVFS / Power Management: 5
    Strengths
    Chief Product Architect at Cerebras — wafer-scale AI inference silicon
    15 years Intel PSG — FPGA & hardware inference accelerator design
    Gaps
    No explicit systolic array design or fixed-weight architecture papers

    Vikas Chandra

    medium hireability

    Senior Director, AI@Meta

    Previously: Director, Applied ML @ Arm

    San Francisco, US

    Overall: 51
    Power-Efficient Chip Architecture: 78
    MX / Microscaled Precision: 65
    Multimodal LLM Inference: 62
    Systolic Array Design: 52
    DVFS / Power Management: 42
    Hybrid LUT-GEMM: 5
    Strengths
    ARM Senior Director — built ARM ML IP and neural engine silicon
    Heterogeneous Dataflow Accelerators (HPCA 2021, 215 cites) — inference ASIC dataflow
    Gaps
    No explicit systolic array paper — dataflow accelerator work is adjacent

    Vithursan Thangarasa

    medium hireability

    Principal Research Scientist@Cerebras Systems

    Previously: Lead Research Scientist @ Cerebras Systems

    San Francisco, US

    Overall: 15
    Multimodal LLM Inference: 65
    Power-Efficient Chip Architecture: 18
    Hybrid LUT-GEMM: 2
    Systolic Array Design: 2
    DVFS / Power Management: 2
    MX / Microscaled Precision: 2
    Strengths
    MASSV (NeurIPS 2025): multimodal speculative decoding for VLMs
    DREAM (NeurIPS 2025): multimodal speculative decoding with cross-attention fusion
    Gaps
    No evidence of systolic array or fixed-weight chip architecture work

    Wang, Chang

    medium hireability
    Overall: 20
    MX / Microscaled Precision: 78
    Multimodal LLM Inference: 30
    Hybrid LUT-GEMM: 12
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Power-Efficient Chip Architecture: 0
    Strengths
    132 commits to intel/neural-compressor — MXFP4/MXFP8/INT4/NVFP4 all covered
    MXFP8 PRs merged in vllm-project/llm-compressor (active May 2026)
    Gaps
    Zero hardware/ASIC/chip design experience — pure software

    William Brandon

    medium hireability

    PhD student@MIT CSAIL

    Previously: Research Assistant @ MIT Media Lab

    Cambridge, US

    Overall: 18
    Hybrid LUT-GEMM: 92
    Multimodal LLM Inference: 10
    MX / Microscaled Precision: 5
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Power-Efficient Chip Architecture: 0
    Strengths
    FLUTE paper: LUT-GEMM for quantized LLMs — direct axis match (EMNLP 2024)
    Hydra: speculative decoding with GPU-level inference optimization
    Gaps
    No chip design, RTL, or ASIC experience

    Wonsuk Jang

    medium hireability

    Research Scientist Intern@Meta

    Previously: Digital Design Engineer @ Samsung

    San Francisco, US

    Overall: 28
    MX / Microscaled Precision: 68
    Power-Efficient Chip Architecture: 48
    Systolic Array Design: 20
    Hybrid LUT-GEMM: 15
    Multimodal LLM Inference: 12
    DVFS / Power Management: 5
    Strengths
    BlockDialect: FP4 block-wise quantization with MXFP4 comparison (2025)
    DialectFP4: scaled-integer arithmetic for hardware energy efficiency
    Gaps
    No direct systolic array design or tapeout work by Wonsuk personally

    Xin He

    medium hireability
    Overall: 16
    MX / Microscaled Precision: 60
    Multimodal LLM Inference: 22
    Hybrid LUT-GEMM: 15
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Power-Efficient Chip Architecture: 0
    Strengths
    298 commits to intel/neural-compressor — MX/FP8/INT4 format implementation depth
    intel/auto-round contributor — quantization algorithms for LLMs and VLMs
    Gaps
    No chip design or hardware architecture evidence — entirely software-layer

    Yonggan Fu

    medium hireability

    Research Scientist@NVIDIA

    Previously: Research Intern @ NVIDIA

    San Francisco, US

    Overall: 24
    Power-Efficient Chip Architecture: 50
    Multimodal LLM Inference: 30
    Hybrid LUT-GEMM: 25
    MX / Microscaled Precision: 20
    Systolic Array Design: 15
    DVFS / Power Management: 5
    Strengths
    ShiftAddNAS: LUT-adjacent shift-add inference hardware (ICML 2022)
    Auto-NBA: joint bitwidth + accelerator co-search (ICML 2021)
    Gaps
    No systolic array-specific research (output-stationary, weight-stationary designs)

    Zhihang Yuan

    medium hireability

    Algorithm Researcher@Bytedance

    Previously: Researcher @ Infinigence AI

    Beijing, CN

    Overall: 33
    MX / Microscaled Precision: 68
    Multimodal LLM Inference: 52
    Power-Efficient Chip Architecture: 50
    Systolic Array Design: 15
    DVFS / Power Management: 8
    Hybrid LUT-GEMM: 5
    Strengths
    GSQ/GSE group-shared exponent quant — direct MX-format analog
    I-LLM (2025): integer-only inference for fully-quantized LLMs
    Gaps
    No systolic array RTL/architecture work; CiM ≠ systolic dataflow

    Ahmed Hasssan

    low hireability

    MTS Software Development Engineer@AMD

    Previously: Graduate Student Research Assistant @ Cornell Tech

    Pueblo, US

    Overall: 29
    Hybrid LUT-GEMM: 65
    Power-Efficient Chip Architecture: 40
    MX / Microscaled Precision: 30
    Multimodal LLM Inference: 20
    Systolic Array Design: 15
    DVFS / Power Management: 5
    Strengths
    PhD research: LUT-based HW accelerator for DNN inference — direct LUT-GEMM match
    Torch2Chip (MLSys 2024): SW/HW co-design toolkit for quantization + accelerator prototyping
    Gaps
    No systolic array design work found — core Neuralace architecture absent

    Akhil Arunkumar

    low hireability

    Sr. Principal Software Engineer@d-Matrix

    Previously: SoC Performance Architect @ AMD

    San Francisco, US

    Overall: 36
    Power-Efficient Chip Architecture: 72
    Multimodal LLM Inference: 45
    MX / Microscaled Precision: 40
    Systolic Array Design: 30
    DVFS / Power Management: 25
    Hybrid LUT-GEMM: 5
    Strengths
    Corsair (IEEE Micro 2025): co-author on d-Matrix's CIM inference chiplet paper
    d-Matrix DIMC uses Block Floating Point — adjacent to MX block-exponent formats
    Gaps
    CIM architecture (d-Matrix) is fundamentally different from fixed-weight systolic arrays

    Amir Yazdanbakhsh

    low hireability

    Research Scientist@DeepMind

    Previously: Research Scientist @ Google

    San Francisco, US

    Overall: 43
    Systolic Array Design: 72
    Power-Efficient Chip Architecture: 65
    Multimodal LLM Inference: 50
    Hybrid LUT-GEMM: 32
    MX / Microscaled Precision: 30
    DVFS / Power Management: 10
    Strengths
    'Structured Sparse Matrix Acceleration in Systolic Arrays' (2025) — direct match
    FLAT (2022): hardware dataflow paper for attention bottleneck mitigation
    Gaps
    No MX/MSFP/microscaled format-specific work — quantization is post-training, not fixed-point MX

    Carlo Luschi

    low hireability

    VP & Head of Research@Graphcore

    Previously: Director of Research @ Graphcore

    Oxford, GB

    Overall: 28
    MX / Microscaled Precision: 85
    Power-Efficient Chip Architecture: 45
    Multimodal LLM Inference: 20
    Systolic Array Design: 8
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    MXNorm (2026) — MXFP8 block-scale reuse for LLM inference, 2.4x kernel speedup
    Optimal Formats for Weight Quantisation (2025) — weight-precision theory for MX
    Gaps
    No systolic array design — Graphcore IPU is dataflow/BSP, not systolic

    Chi-Chih Chang

    low hireability

    Ph.D. Student@Cornell University

    Previously: Remote Intern @ University of Washington

    Overall: 34
    Systolic Array Design: 75
    MX / Microscaled Precision: 60
    Power-Efficient Chip Architecture: 50
    Multimodal LLM Inference: 10
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    Systolic Sparse Tensor Slices (FPGA 2025) — direct systolic array design for AI
    Power of Negative Zero — custom datatypes for quantized LLM inference
    Gaps
    All hardware work is FPGA-based; no ASIC or custom chip tapeout

    Elias Frantar

    low hireability

    Member of Technical Staff@OpenAI

    Previously: PHD Candidate @ Institute of Science and Technology Austria

    San Francisco, US

    Overall: 21
    MX / Microscaled Precision: 65
    Hybrid LUT-GEMM: 50
    Multimodal LLM Inference: 5
    Power-Efficient Chip Architecture: 5
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Strengths
    GPTQ (ICLR 2023): lead author, gold standard INT4 LLM weight quantization
    Marlin: custom INT4xFP16 inference GEMM kernel, near-ideal 4x speedup
    Gaps
    No hardware design — zero RTL, ASIC, or systolic array experience

    Eric Chung

    low hireability

    VP of AI Computing@NVIDIA

    Previously: GM & Partner Group Engineering Manager @ Microsoft

    Seattle, US

    Overall: 51
    MX / Microscaled Precision: 95
    Systolic Array Design: 88
    Power-Efficient Chip Architecture: 82
    Multimodal LLM Inference: 20
    Hybrid LUT-GEMM: 10
    DVFS / Power Management: 8
    Strengths
    Project Brainwave: fixed-weight systolic array ASIC (ISCA 2018, 748 citations)
    Co-authored OCP MX standard — microscaled data formats for DL (2023)
    Gaps
    No hybrid LUT-GEMM work (FPGA/LUT background is only indirectly related)

    Fan Yang

    low hireability

    Sr. Principal Research Manager@Microsoft

    Previously: Principal Research Manager @ Microsoft

    CN

    Overall: 38
    Hybrid LUT-GEMM: 92
    Power-Efficient Chip Architecture: 55
    MX / Microscaled Precision: 50
    Systolic Array Design: 15
    Multimodal LLM Inference: 10
    DVFS / Power Management: 5
    Strengths
    LUT Tensor Core (ISCA 2025) — SW-HW co-design for LUT-based low-bit LLM inference
    LUT-DLA (HPCA 2025) — lookup-table accelerator for extreme low-bit DNNs
    Gaps
    No systolic array design — LUT-based architecture, not weight-stationary/systolic

    Guangxuan Xiao

    low hireability

    Member of Technical Staff@Thinking Machines Lab

    Previously: Research Intern @ NVIDIA

    USA

    Overall: 26
    MX / Microscaled Precision: 55
    Multimodal LLM Inference: 50
    Power-Efficient Chip Architecture: 25
    Hybrid LUT-GEMM: 20
    Systolic Array Design: 5
    DVFS / Power Management: 0
    Strengths
    SmoothQuant (ICML 2023, 1.5K citations) — W8A8 PTQ leader for LLMs
    QServe W4A8KV4 — system co-design for quantized LLM serving
    Gaps
    No chip/ASIC/RTL experience — entirely software-side quantization

    Han Guo

    low hireability

    Research Intern@Together AI

    Previously: Research Intern @ IBM

    San Francisco, US

    Overall: 23
    Hybrid LUT-GEMM: 90
    MX / Microscaled Precision: 28
    Multimodal LLM Inference: 12
    Power-Efficient Chip Architecture: 5
    Systolic Array Design: 3
    DVFS / Power Management: 0
    Strengths
    FLUTE (EMNLP 2024): lead author of premier LUT-GEMM paper for LLMs
    FLUTE C++ implementation — 2-4x speedup via offline table restructuring
    Gaps
    No hardware/ASIC experience — entirely software stack, no RTL/chip design

    Haoran You

    low hireability

    Research Scientist@Adobe

    Previously: Research Scholar @ SRC Research Scholars Program

    Seattle, US

    Overall: 38
    Hybrid LUT-GEMM: 80
    Power-Efficient Chip Architecture: 65
    Systolic Array Design: 30
    Multimodal LLM Inference: 25
    MX / Microscaled Precision: 20
    DVFS / Power Management: 5
    Strengths
    ShiftAddLLM (NeurIPS 2024) — LUT-GEMM analog, collab w/ Intel & Google DeepMind
    ShiftAdd series 2020–2024: 5-yr program replacing multipliers with shift+add ops
    Gaps
    No systolic array design — accelerator co-design from algorithm side, not RTL/silicon

    Harshit Khaitan

    low hireability

    Director, AI Accelerators@Meta

    Previously: Technical Lead / Manager, TPU HW Design @ Google

    San Francisco, US

    Overall: 38
    Systolic Array Design: 85
    Power-Efficient Chip Architecture: 85
    Multimodal LLM Inference: 30
    DVFS / Power Management: 15
    MX / Microscaled Precision: 10
    Hybrid LUT-GEMM: 5
    Strengths
    TPU paper (6.6K citations) — co-authored canonical systolic array DNN inference paper
    Director, AI Accelerators at Meta — leads MTIA inference chip program
    Gaps
    No MX/MSFP/microscaled precision work found

    H Ekin Sumbul

    low hireability

    Head of IP@Architect Labs

    Previously: Research Scientist @ Meta

    San Francisco, US

    Overall: 26
    Power-Efficient Chip Architecture: 80
    Multimodal LLM Inference: 25
    MX / Microscaled Precision: 20
    Systolic Array Design: 15
    DVFS / Power Management: 10
    Hybrid LUT-GEMM: 5
    Strengths
    617 TOPS/W BNN accelerator in 10nm (ISSCC 2020) — world-class W/TOPS
    3.8 pJ/SOP SNN chip (JSSC 2019, h=24) — proven ASIC power efficiency
    Gaps
    No systolic array design work — expertise is CIM/CNM, a distinct architecture

    Hesham Mostafa

    low hireability

    Researcher@Intel

    Overall: 33
    MX / Microscaled Precision: 87
    Power-Efficient Chip Architecture: 68
    Multimodal LLM Inference: 18
    Systolic Array Design: 12
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    MF-QAT (2026): MXINT/MXFP multi-format QAT for elastic inference — direct MX hit
    Technical Lead ML at d-Matrix, production LLM inference chip experience
    Gaps
    No systolic array work — d-Matrix uses DIMC, not systolic

    Jiaming Tang

    low hireability

    Ph.D. student@MIT

    Previously: Undergraduate researcher @ SJTU EPCC Lab

    Boston, US

    Overall: 28
    Power-Efficient Chip Architecture: 62
    Systolic Array Design: 35
    Hybrid LUT-GEMM: 25
    MX / Microscaled Precision: 25
    Multimodal LLM Inference: 20
    DVFS / Power Management: 3
    Strengths
    Transitive Array (ISCA 2025): multiplication-free GEMM accelerator, 2.31× energy saving
    OliVe (ISCA 2023): hardware-LLM quantization co-design, 4.5× speedup
    Gaps
    No published systolic array or weight-stationary dataflow chip work

    Lukas Cavigelli

    low hireability

    Researcher (Expert/Architect)@Huawei

    Previously: Researcher (Principal Engineer) @ Huawei

    Zurich, CH

    Overall: 54
    Power-Efficient Chip Architecture: 92
    Systolic Array Design: 78
    Hybrid LUT-GEMM: 75
    MX / Microscaled Precision: 58
    DVFS / Power Management: 10
    Multimodal LLM Inference: 8
    Strengths
    Stella Nera: Maddness LUT hardware accelerator, 161 TOp/s/W (ISVLSI 2025)
    Chipmunk: systolically scalable inference chip tapeout, 3.08 Gop/s/mW (CICC 2018)
    Gaps
    No DVFS or dynamic power-budget management papers found

    Manuel Le Gallo

    low hireability

    Staff Research Scientist@IBM

    Previously: PhD student @ ETH Zurich

    Zurich, CH

    Overall: 21
    Power-Efficient Chip Architecture: 82
    Multimodal LLM Inference: 20
    MX / Microscaled Precision: 12
    Systolic Array Design: 8
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 0
    Strengths
    HERMES tapeout: 1.59 TOPS/mm² at 14nm — power-efficient inference chip
    2025 paper: LLM scaling with MoE via 3D analog in-memory computing
    Gaps
    Architecture mismatch — analog CIM (PCM crossbar), not digital systolic arrays

    Mohammad Rastegari

    low hireability

    CEO and Co-Founder@Elastix.AI

    Previously: Distinguished Scientist @ Meta

    Seattle, US

    Overall: 31
    Hybrid LUT-GEMM: 80
    MX / Microscaled Precision: 40
    Power-Efficient Chip Architecture: 35
    Multimodal LLM Inference: 15
    Systolic Array Design: 10
    DVFS / Power Management: 5
    Strengths
    LCNN (2017): seminal lookup-based CNN, directly relevant to LUT-GEMM
    XNOR-Net: 1-bit binary weights — extreme quantization for inference
    Gaps
    No systolic array or fixed-weight ASIC design work

    Norman P Jouppi

    low hireability

    VP, Engineering Fellow@Google

    Previously: Senior Fellow @ HP

    Overall: 45
    Systolic Array Design: 100
    Power-Efficient Chip Architecture: 88
    Multimodal LLM Inference: 35
    DVFS / Power Management: 20
    MX / Microscaled Precision: 20
    Hybrid LUT-GEMM: 5
    Strengths
    TPU v1 (2017) — defines weight-stationary systolic array for DNN inference
    Multi-Modal Systolic Array paper (2024) — direct match to Neuralace chip target
    Gaps
    No evidence of MX / MSFP / microscaled precision format work

    Paul N. Whatmough

    low hireability

    Senior Director, AI Research@Qualcomm

    Previously: Director, AI Research @ Qualcomm

    Boston, US

    Overall: 56
    Power-Efficient Chip Architecture: 95
    Systolic Array Design: 88
    DVFS / Power Management: 72
    MX / Microscaled Precision: 45
    Multimodal LLM Inference: 22
    Hybrid LUT-GEMM: 12
    Strengths
    Sparse Systolic Tensor Array (arXiv:2009.02381) — 16nm tapeout, 16.8 TOPS/W
    ISSCC 2023 12nm chip — 18.1 TFLOPs/W sparse transformer processor
    Gaps
    No OCP MX4/MX6/MX9 standard-compliant microscaling work found

    Raghu Prabhakar

    low hireability

    Engineering@SambaNova Systems

    Previously: Software Engineer @ NVIDIA

    San Francisco, US

    Overall: 37
    Systolic Array Design: 80
    Power-Efficient Chip Architecture: 78
    DVFS / Power Management: 30
    Multimodal LLM Inference: 15
    MX / Microscaled Precision: 15
    Hybrid LUT-GEMM: 5
    Strengths
    2025 papers: systolic array PE design + MatMul on systolic array (direct match)
    SN40L co-architect — 5nm 2.5D AI inference chip (ISSCC 2025)
    Gaps
    SambaNova uses reconfigurable dataflow, not fixed-weight systolic — key architectural mismatch

    Randy Huang

    low hireability

    Principal Engineer@Amazon

    Previously: Principal Engineer, Programmable Solutions Group @ Intel

    San Francisco, US

    Overall: 29
    Power-Efficient Chip Architecture: 68
    Systolic Array Design: 65
    Hybrid LUT-GEMM: 20
    DVFS / Power Management: 10
    Multimodal LLM Inference: 5
    MX / Microscaled Precision: 5
    Strengths
    Annapurna Labs Principal Eng — builds AWS Inferentia (systolic array inference ASIC)
    Intel FPGA Programmable Solutions Group — next-gen FPGA architecture work
    Gaps
    No evidence of MX/microscaled numeric format work

    Sayeh Sharify

    low hireability

    Principal Machine Learning Research Scientist@d-Matrix

    Previously: Co-Founder @ Tartan AI

    San Francisco, US

    Overall: 31
    MX / Microscaled Precision: 95
    Power-Efficient Chip Architecture: 45
    Systolic Array Design: 20
    Multimodal LLM Inference: 15
    Hybrid LUT-GEMM: 8
    DVFS / Power Management: 5
    Strengths
    "Post Training Quantization of LLMs with MX Formats" — MXINT4/8 PTQ (NeurIPS 2024)
    "MF-QAT" (2026) — multi-format MXINT/MXFP QAT for MX inference chips
    Gaps
    No systolic array design — d-Matrix uses DIMC, not systolic architecture

    Se Jung Kwon

    low hireability

    Director@NAVER

    Previously: Leader @ NAVER

    Seoul, KR

    Overall: 31
    Hybrid LUT-GEMM: 97
    MX / Microscaled Precision: 50
    Multimodal LLM Inference: 15
    Power-Efficient Chip Architecture: 12
    Systolic Array Design: 5
    DVFS / Power Management: 5
    Strengths
    LUT-GEMM (ICLR 2024, 184 citations) — invented the exact JD technique (sketched below)
    CodeGEMM (NeurIPS 2025) — 8.93x speedup at 2-bit via codebook-centric GEMM
    Gaps
    No systolic array or chip architecture work — algorithm-only researcher
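For the Hybrid LUT-GEMM axis, the idea behind LUT-GEMM-style kernels is to replace dequantize-then-multiply with table lookups: precompute all signed sums of each small activation group once, then index that table with the weight bit patterns. Below is a minimal numpy sketch of that idea under simplifying assumptions (single-bit, ±1 binary-coded weights with one scale per output row); it is an illustration, not the ICLR 2024 implementation.

```python
# Minimal LUT-style matrix-vector product for binary-coded weights (illustration only).
# For every group of g activations, a 2^g-entry table holds all +/- combinations of
# those activations; each row's g sign bits then index the table instead of doing
# g multiply-accumulates.
import numpy as np

def lut_matvec(bit_weights, scales, x, g=4):
    """bit_weights: (rows, cols) in {-1, +1}; scales: (rows,); x: (cols,)."""
    rows, cols = bit_weights.shape
    assert cols % g == 0
    y = np.zeros(rows)
    for start in range(0, cols, g):
        xg = x[start:start + g]
        # Table of all 2^g signed sums of this activation group.
        table = np.array([sum(((idx >> j) & 1 or -1) * xg[j] for j in range(g))
                          for idx in range(1 << g)])
        # Each row's g sign bits form an index into the table.
        bits = (bit_weights[:, start:start + g] > 0).astype(int)
        idxs = (bits * (1 << np.arange(g))).sum(axis=1)
        y += table[idxs]
    return scales * y

rng = np.random.default_rng(0)
W = rng.choice([-1.0, 1.0], size=(8, 16))
alpha, x = rng.random(8), rng.random(16)
# Matches the dense scaled matmul exactly for +/-1 weights.
assert np.allclose(lut_matvec(W, alpha, x), (alpha[:, None] * W) @ x)
```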

    Sheng Li

    low hireability

    Principal Research Scientist@Google

    Overall: 30
    MX / Microscaled Precision: 72
    Power-Efficient Chip Architecture: 50
    Systolic Array Design: 40
    Multimodal LLM Inference: 10
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    ICML 2024: SF4/E2M1+supernormal 4-bit LLM weight formats — directly relevant to MX precision
    FLIQS: mixed-precision FP/INT quantization search across vision + transformer models
    Gaps
    Research-level analyst/modeler — no ASIC tapeout or silicon design evidence

    Shijie Cao

    low hireability

    Senior Researcher@Microsoft

    Previously: Senior Researcher @ Microsoft Research Asia

    Overall: 38
    Hybrid LUT-GEMM: 95
    MX / Microscaled Precision: 65
    Power-Efficient Chip Architecture: 35
    Systolic Array Design: 15
    Multimodal LLM Inference: 10
    DVFS / Power Management: 5
    Strengths
    T-MAC: LUT-based low-bit LLM inference on CPU/NPU — defining work in LUT-GEMM
    LUT Tensor Core (2025): SW-HW co-design for LUT-based low-bit LLM inference
    Gaps
    No systolic array design or ASIC tapeout publications

    Utkarsh Saxena

    low hireability

    Member of Technical Staff@AMD

    Previously: Graduate Research Assistant @ Purdue University

    San Francisco, US

    Overall: 35
    MX / Microscaled Precision: 90
    Power-Efficient Chip Architecture: 72
    Multimodal LLM Inference: 20
    Systolic Array Design: 15
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    MX microscaling paper (2024) — direct match on precision quantization axis
    ResQ: ICML 2025 Spotlight — mixed-precision LLM quantization expertise
    Gaps
    No systolic array architecture work — CIM is a fundamentally different paradigm

    Xin Wang

    low hireability

    Director, Machine Learning@d-Matrix

    Previously: Principal Scientist & Manager, Machine Learning Research @ Cerebras Systems

    San Francisco, US

    Overall: 33
    MX / Microscaled Precision: 75
    Power-Efficient Chip Architecture: 52
    Multimodal LLM Inference: 40
    Systolic Array Design: 15
    Hybrid LUT-GEMM: 8
    DVFS / Power Management: 5
    Strengths
    Flexpoint NeurIPS 2017 (363 cites) — foundational block floating point for DNN inference
    Block format error bounds paper (2022) — precise BFP/MX-format precision analysis
    Gaps
    No systolic array design papers — d-Matrix uses CIM, not systolic

    Runs

    #1 · completed · 0 qualified / 0 found · May 7, 1:34 PM