
Neuralace (Sabi) · Systolic Inference Chip Researcher

completed · 101 qualified · 1 run · May 7, 1:34 PM · company-name-neuralace-sabi-locations-usa-europe-china-india-1778160889
Parsed: Neuralace · 6 topics · Researcher · no PhD · USA, Europe, China, India
1. Generating seed nodes · 0 proposed
2. Explored 0 queries · 0/0 done
3. Expanding nodes · queued
4. Qualifying candidates · queued

    Qualified Candidates (100)


    Abhimanyu Rajeshkumar Bambhaniya

    high hireability

    Research Intern@Meta

    Previously: Intern @ Google

    San Francisco, US

    Overall: 35
    Systolic Array Design: 75
    Power-Efficient Chip Architecture: 75
    Multimodal LLM Inference: 30
    Hybrid LUT-GEMM: 20
    DVFS / Power Management: 5
    MX / Microscaled Precision: 5
    Strengths
    Subgraph stationary HW-SW co-design (MLSys 2023) — fixed-topology inference dataflow design
    AlphaICs GLUON AI accelerator tapeout — 16nm TSMC real-chip experience
    Gaps
    No MX4/MX6/MX9 or microscaled numeric format work

    Albert Tseng

    high hireability
    Overall: 20
    MX / Microscaled Precision: 72
    Hybrid LUT-GEMM: 30
    Systolic Array Design: 5
    Multimodal LLM Inference: 5
    Power-Efficient Chip Architecture: 5
    DVFS / Power Management: 0
    Strengths
    MXFP4 paper (AISTATS 2025): trained LLMs with block-scaled FP4, 2x vs FP8
    QuIP# top contributor (36 commits): 2-bit lattice-codebook LLM quantization
    Gaps
    No hardware or ASIC design — work is purely algorithm-level, not silicon

    Changwoo Lee

    high hireability

    Graduate Student Research Assistant@University of Michigan

    Previously: Research Intern @ DeepMind

    Ann Arbor, US

    Overall: 20
    Power-Efficient Chip Architecture: 60
    Multimodal LLM Inference: 20
    Hybrid LUT-GEMM: 15
    DVFS / Power Management: 12
    Systolic Array Design: 10
    MX / Microscaled Precision: 3
    Strengths
    VLSI 2022: 22nm SoC tapeout, 10 TOPS/W multimodal AI chip
    AIMMI 2024: low-power SoC with on-chip MRAM for IoT inference
    Gaps
    No published systolic array or dataflow architecture work

    Coleman Richard Charles Hooper

    high hireability

    Graduate Student - ML Systems@University of California, Berkeley

    Previously: Research Intern @ NVIDIA

    San Francisco, US

    Overall: 38
    MX / Microscaled Precision: 72
    Power-Efficient Chip Architecture: 65
    DVFS / Power Management: 30
    Hybrid LUT-GEMM: 28
    Systolic Array Design: 20
    Multimodal LLM Inference: 12
    Strengths
    FGMP: NVFP4 (FP4 microscaling) mixed-precision quantization, co-authored with NVIDIA chip team
    SqueezeLLM: 3-bit dense-and-sparse GEMM, 2.3x A6000 speedup (ICML 2024)
    Gaps
    No direct systolic array design papers — quantization researcher, not chip architect

    Cong Guo

    high hireability

    Postdoctoral Associate@Duke University

    Previously: Research intern @ Shanghai Qi Zhi Institute

    Durham, US

    Overall: 51
    Systolic Array Design: 80
    MX / Microscaled Precision: 60
    Hybrid LUT-GEMM: 55
    Power-Efficient Chip Architecture: 55
    Multimodal LLM Inference: 50
    DVFS / Power Management: 5
    Strengths
    Transitive Array (ISCA 2025): GEMM accelerator with result reuse — systolic array design
    ANT (MICRO 2022, IEEE Top Picks): adaptive float/int numeric type for low-bit quantization
    Gaps
    No DVFS or power-budget management papers — key axis uncovered

    Haotong Qin

    high hireability

    Postdoctoral Researcher@ETH Zürich

    Previously: Research Scientist @ ByteDance

    Zurich, CH

    Overall: 17
    Multimodal LLM Inference: 40
    MX / Microscaled Precision: 35
    Power-Efficient Chip Architecture: 20
    Systolic Array Design: 5
    Hybrid LUT-GEMM: 0
    DVFS / Power Management: 0
    Strengths
    BiLLM (ICML 2024): 1-bit PTQ for LLMs — top-tier extreme weight compression
    Qwen3-Quantization repo + empirical study — direct work on target model family
    Gaps
    No systolic array or ASIC chip design work whatsoever

    Hongzheng Chen

    high hireability

    Ph.D. Candidate@Cornell University

    Previously: Undergrad student @ Sun Yat-sen University

    Ithaca, US

    Overall: 14
    Power-Efficient Chip Architecture: 25
    Systolic Array Design: 20
    Multimodal LLM Inference: 20
    MX / Microscaled Precision: 8
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    FPGA spatial acceleration for LLM inference (arxiv:2312.15159, 2023)
    Allo: composable HLS accelerator design language (PLDI'24)
    Gaps
    No systolic array ASIC design — all FPGA/HLS, no chip tapeout

    Junyang Lin

    high hireability

    Research Scientist@Qwen

    Previously: Staff Engineer @ Alibaba

    Beijing, CN

    Overall: 18
    Multimodal LLM Inference: 95
    MX / Microscaled Precision: 5
    Power-Efficient Chip Architecture: 5
    Hybrid LUT-GEMM: 0
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Strengths
    QwenLM/Qwen3 Tech Lead — built exact model Neuralace deploys on chip
    Qwen-VL + Qwen-Audio + Qwen2-VL: full vision+speech multimodal pipeline
    Gaps
    No chip design background — no ASIC, systolic array, or Verilog work

    Lian Liu

    high hireability

    PhD, Institute of Computing Technology

    Ashburn, US

    Overall: 22
    Power-Efficient Chip Architecture: 45
    Multimodal LLM Inference: 30
    MX / Microscaled Precision: 30
    Systolic Array Design: 15
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    COMET W4A4KV4 quantization — aggressive LLM compression at ASPLOS 2025
    DNA: Dynamic Neural Network Accelerator — IEEE TC 2025 (top hardware journal)
    Gaps
    No evidence of systolic array architecture design specifically

    Muhammad Umar

    high hireability

    PhD student@Cornell University

    Overall: 8
    MX / Microscaled Precision: 25
    Systolic Array Design: 10
    Power-Efficient Chip Architecture: 10
    Multimodal LLM Inference: 5
    Hybrid LUT-GEMM: 0
    DVFS / Power Management: 0
    Strengths
    FLIQS: mixed-precision FP8+INT quantization, Jouppi (TPU) co-author
    GuardNN: DNN accelerator architecture paper, 74 citations (DAC 2022)
    Gaps
    No published systolic array or fixed-weight inference chip design work

    Muyang Li

    high hireability

    Doctoral Student@Massachusetts Institute of Technology

    Previously: Research Intern @ NVIDIA

    Boston, US

    Overall: 14
    Hybrid LUT-GEMM: 35
    Multimodal LLM Inference: 20
    MX / Microscaled Precision: 20
    Systolic Array Design: 5
    Power-Efficient Chip Architecture: 5
    DVFS / Power Management: 0
    Strengths
    444 commits to nunchaku — primary INT4/INT8 CUDA kernel library author
    SVDQuant (ICLR 2025 Spotlight): 4-bit diffusion model quantization
    Gaps
    No chip design / RTL / ASIC tapeout experience — software-side only

    Pierre Abillama

    high hireability

    Graduate Student Research Assistant, EECS, University of Michigan

    Previously: Intern @ IBM

    Overall: 39
    Power-Efficient Chip Architecture: 82
    DVFS / Power Management: 58
    Multimodal LLM Inference: 40
    MX / Microscaled Precision: 35
    Systolic Array Design: 15
    Hybrid LUT-GEMM: 5
    Strengths
    22nm 25.08 TOPS/W transformer accelerator tapeout — VLSI 2025
    Two-stage task-adaptive power management — DVFS-adjacent chip control
    Gaps
    No explicit systolic array design work found

    Qilin Zheng

    high hireability

    Duke University

    Overall: 22
    Power-Efficient Chip Architecture: 75
    MX / Microscaled Precision: 20
    Multimodal LLM Inference: 15
    Systolic Array Design: 10
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    TFSRAM: 249.8 TOPS/W compute-in-SRAM neuromorphic engine (IEEE TCAS-AI 2024)
    DIANA SoC: end-to-end energy-efficient digital/analog hybrid NN chip (ISSCC 2022)
    Gaps
    Core work is CIM/PIM, not systolic array architecture

    Saleh Ashkboos

    high hireability

    Research Assistant@ETH Zürich

    Previously: Research Intern @ Apple

    Zurich, CH

    Overall: 25
    MX / Microscaled Precision: 85
    Hybrid LUT-GEMM: 28
    Power-Efficient Chip Architecture: 20
    Multimodal LLM Inference: 8
    Systolic Array Design: 5
    DVFS / Power Management: 3
    Strengths
    ICLR26: Microscaling FP4 Quantization paper -- direct MX format work
    Quartet (NeurIPS25): FP4 native training for LLMs
    Gaps
    No systolic array or ASIC chip design experience

    Shulin Zeng

    high hireability

    Postdoc@Tsinghua University

    ex-Tsinghua University

    Beijing, CN

    Overall: 21
    Power-Efficient Chip Architecture: 48
    Multimodal LLM Inference: 22
    Hybrid LUT-GEMM: 20
    MX / Microscaled Precision: 18
    Systolic Array Design: 15
    DVFS / Power Management: 3
    Strengths
    FlightLLM (2024, 121 cites): end-to-end FPGA LLM inference mapping
    FMC-LLM/CD-LLM (2025): 70B+ batched decoding on multi-FPGA
    Gaps
    FPGA only — no ASIC/tapeout or custom systolic array chip design

    William Andrew Simon

    high hireability

    Research Scientist on In-Memory Computing@IBM

    Previously: PhD student @ EPFL - EPF Lausanne

    Zurich, CH

    Overall: 23
    Power-Efficient Chip Architecture: 72
    Multimodal LLM Inference: 32
    Systolic Array Design: 12
    DVFS / Power Management: 8
    MX / Microscaled Precision: 8
    Hybrid LUT-GEMM: 3
    Strengths
    CICC 2025 invited: analog AI hardware for low-latency transformer inference
    BLADE (129 cites): in-cache compute chip for edge AI
    Gaps
    Analog CIM paradigm (PCM/conductance) — skills don't directly map to digital systolic RTL

    Zhekai Zhang

    high hireability
    Overall: 39
    Power-Efficient Chip Architecture: 80
    Systolic Array Design: 65
    Multimodal LLM Inference: 40
    MX / Microscaled Precision: 30
    Hybrid LUT-GEMM: 15
    DVFS / Power Management: 5
    Strengths
    LEGO (HPCA 2025): spatial accelerator auto-generation, 2.4x energy vs. Gemmini
    SpAtten-Chip ASIC tapeout — won DAC 2023 demo competition
    Gaps
    No MX/MSFP format work — uses W4A4/W4A8KV4, not Microsoft MX standard

    Abbas Rahimi

    medium hireability

    Research Staff Member@IBM

    Previously: Postdoctoral Researcher @ UC Berkeley

    Zurich, CH

    Overall: 15
    Power-Efficient Chip Architecture: 62
    Multimodal LLM Inference: 10
    Systolic Array Design: 5
    DVFS / Power Management: 5
    MX / Microscaled Precision: 3
    Hybrid LUT-GEMM: 2
    Strengths
    "Efficient scaling of LLMs with MoE + 3D analog in-memory" (2025)
    5μW HD accelerator ASIC — sub-W AI inference chip tapeout
    Gaps
    No systolic array design work — entirely analog CIM / hyperdimensional paradigm

    Andrea Bejarano-Carbo

    medium hireability

    University of Michigan

    Overall: 14
    Power-Efficient Chip Architecture: 65
    Multimodal LLM Inference: 10
    Systolic Array Design: 5
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 0
    MX / Microscaled Precision: 0
    Strengths
    AIMMI (JSSC 2024): multimodal audio+image SoC, low-power inference
    H.264/AVC accelerator IC (JSSC 2023): algorithm-hardware co-design
    Gaps
    No systolic array architecture work found

    Andrei Panferov

    medium hireability
    Overall: 27
    Hybrid LUT-GEMM: 75
    MX / Microscaled Precision: 72
    Multimodal LLM Inference: 10
    Power-Efficient Chip Architecture: 5
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Strengths
    'Bridging the Gap...Microscaling FP4 Quantization' (2025) — authored
    FLUTE LUT-GEMM: 14 commits + Fast Hadamard Transform kernel PR (merged)
    Gaps
    No systolic array or custom chip design experience

    Andrew W Fitzgibbon

    medium hireability

    Engineering Fellow@Graphcore

    Previously: Partner Researcher @ Microsoft

    Cambridge, GB

    Overall: 21
    MX / Microscaled Precision: 95
    Multimodal LLM Inference: 10
    Power-Efficient Chip Architecture: 10
    Hybrid LUT-GEMM: 5
    Systolic Array Design: 3
    DVFS / Power Management: 0
    Strengths
    graphcore-research/gfloat: implements OCP MX4/MX6/MX9/E8M0 block formats
    IEEE P3109 WG contributor — standards body for ML arithmetic formats
    Gaps
    No systolic array design or fixed-weight chip architecture work

    An Yang

    medium hireability

    Researcher@Alibaba

    Previously: MS student @ Peking University

    Overall: 11
    Multimodal LLM Inference: 65
    Hybrid LUT-GEMM: 0
    Systolic Array Design: 0
    DVFS / Power Management: 0
    MX / Microscaled Precision: 0
    Power-Efficient Chip Architecture: 0
    Strengths
    Qwen3 Technical Report (2025) — core team author, 3857 citations
    Qwen technical report (2023) — confirmed author among 48 contributors
    Gaps
    No evidence of hardware design — zero chip/ASIC/systolic array work

    Arash Fayyazi

    medium hireability

    Principal Performance Engineer@d-Matrix

    Previously: Staff Software Engineer, AI Kernels and Workloads @ d-Matrix

    San Francisco, US

    Overall: 41
    Systolic Array Design: 78
    Power-Efficient Chip Architecture: 74
    Hybrid LUT-GEMM: 62
    MX / Microscaled Precision: 18
    Multimodal LLM Inference: 8
    DVFS / Power Management: 5
    Strengths
    'Sparse Periodic Systolic Dataflow' (2022) — 4.49x energy efficiency on CNN accelerator
    BlendNet/NeuroBlend: binary+fixed-point blended inference engine, 2.5x power reduction
    Gaps
    No MX / microscaled-format (MSFP/MX4/MX6/MX9) work found

    Atefeh Sohrabizadeh

    medium hireability

    Research Scientist@NVIDIA

    Previously: Graduate Student Researcher @ UCLA VAST Lab

    San Francisco, US

    Overall: 20
    Systolic Array Design: 75
    Power-Efficient Chip Architecture: 25
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Multimodal LLM Inference: 5
    MX / Microscaled Precision: 5
    Strengths
    2025: Structured Sparse Matrix Acceleration in Systolic Arrays — core JD topic
    Versatile Systolic Array for CNN on FPGA (2022) — direct systolic array design experience
    Gaps
    No evidence of MX/microscaled numeric formats or LUT-GEMM work

    Bita Darvish Rouhani

    medium hireability

    Researcher@NVIDIA

    Overall: 37
    MX / Microscaled Precision: 98
    Power-Efficient Chip Architecture: 50
    Multimodal LLM Inference: 35
    Systolic Array Design: 30
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    Lead author OCP MX spec — MX4/MX6/MX9 definitive industry standard (block scaling sketched below)
    Microscaling Data Formats (140 citations, 2023) — exact format match for chip
    Gaps
    No systolic array RTL or tapeout — format-layer researcher, not chip designer
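The MX / Microscaled Precision axis scored throughout this list refers to block-scaled number formats in the spirit of the OCP MX spec cited above: each small block of elements shares one power-of-two scale, and each element is stored in a narrow type. Below is a minimal numpy sketch of that block-scaling idea, simplified to INT8 elements; it is an illustration, not the OCP reference implementation or any candidate's code.

```python
# Rough sketch of block-scaled ("microscaling") quantization in the spirit of the
# OCP MX formats: each block of k elements shares one power-of-two scale and each
# element is stored in a narrow integer. Illustration only; the element format is
# simplified to INT8 rather than the MX4/MX6/MX9 element types.
import numpy as np

def mx_quantize(x, k=32, elem_bits=8):
    """Return (per-block scales, integer codes) for x split into blocks of k elements."""
    x = x.reshape(-1, k)
    max_code = 2 ** (elem_bits - 1) - 1
    # Shared per-block scale, restricted to a power of two (like the E8M0 scale in MX).
    amax = np.abs(x).max(axis=1, keepdims=True)
    scale = 2.0 ** np.ceil(np.log2(np.maximum(amax, 1e-30) / max_code))
    codes = np.clip(np.round(x / scale), -max_code, max_code).astype(np.int8)
    return scale, codes

def mx_dequantize(scale, codes):
    # Reconstruct the (approximate) original values from shared scales and codes.
    return (scale * codes).reshape(-1)

x = np.random.default_rng(0).standard_normal(128).astype(np.float32)
scale, codes = mx_quantize(x)
print("max abs error:", np.abs(x - mx_dequantize(scale, codes)).max())
```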

    Byung Hoon Ahn

    medium hireability

    Software Engineer@Apple

    Previously: Research Scientist @ Protopia AI

    San Francisco, US

    Overall: 25
    Systolic Array Design: 75
    Power-Efficient Chip Architecture: 40
    Multimodal LLM Inference: 20
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    MX / Microscaled Precision: 5
    Strengths
    Planaria (MICRO 2020, 162 citations): omni-directional systolic array for DNN inference
    Co-author with Tushar Krishna (Georgia Tech DNN accelerator group)
    Gaps
    No evidence of MX/MSFP microscaled precision work

    Casper Hansen

    medium hireability
    Overall: 19
    Hybrid LUT-GEMM: 58
    Multimodal LLM Inference: 30
    MX / Microscaled Precision: 15
    Power-Efficient Chip Architecture: 8
    Systolic Array Design: 3
    DVFS / Power Management: 0
    Strengths
    AutoAWQ — top open-source W4A16 AWQ quantization library (417 commits)
    AutoAWQ_kernels: fused CUDA dequant+GEMM kernels, LUT-GEMM adjacent
    Gaps
    No chip/ASIC/RTL design experience — purely software stack

    Changhai Man

    medium hireability

    PhD student@Georgia Institute of Technology

    Atlanta, US

    Overall: 29
    Systolic Array Design: 75
    Power-Efficient Chip Architecture: 50
    MX / Microscaled Precision: 30
    Multimodal LLM Inference: 8
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    Multi-bit-width booth vector systolic accelerator (2022, 43 cites) — core DNN chip
    SCALE-Sim TPU (2026) — extends Georgia Tech systolic array simulator for TPU validation
    Gaps
    No MX/MSFP/microscaled format work — multi-bit-width is adjacent but not same

    Charlie Blake

    medium hireability

    AI research engineer@Graphcore

    Previously: MS student @ University of Oxford

    Overall: 15
    MX / Microscaled Precision: 45
    Multimodal LLM Inference: 15
    Power-Efficient Chip Architecture: 15
    Hybrid LUT-GEMM: 5
    Systolic Array Design: 5
    DVFS / Power Management: 5
    Strengths
    FP8 training/inference (NeurIPS 2023) — closest work to MX weight precision
    SparQ Attention (ICML 2024, 77 citations) — bandwidth-efficient LLM inference
    Gaps
    No chip/hardware design experience — pure software/algorithm researcher

    Chenfan Sun

    medium hireability

    Software Engineer@NVIDIA

    Previously: Software Engineer @ Apple

    Seattle, US

    Overall: 20
    MX / Microscaled Precision: 72
    Power-Efficient Chip Architecture: 25
    Systolic Array Design: 15
    Multimodal LLM Inference: 8
    Hybrid LUT-GEMM: 0
    DVFS / Power Management: 0
    Strengths
    NVFP4 paper (2025): co-authored NVIDIA's 4-bit microscaling LLM pretraining format
    Apple ANE patents: compiler-level work on streaming convolutions in neural processor chip
    Gaps
    No systolic array design — compiler/numerics role, not chip architecture

    Cheng Zhang

    medium hireability

    Founding Engineer@AI Sequrity Company

    Previously: Research Intern @ Microsoft

    London, GB

    Overall: 27
    MX / Microscaled Precision: 68
    Power-Efficient Chip Architecture: 42
    Systolic Array Design: 28
    Multimodal LLM Inference: 15
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    LQER (ICML'24) + QERA (ICLR'25): top-venue LLM quantization error reconstruction
    Sub-8-bit LLM inference (EMNLP'23) -- direct MX weight precision relevance
    Gaps
    No explicit systolic array design publications

    Daniel Lo

    medium hireability

    Researcher@Microsoft

    Previously: PhD student @ Cornell University

    Ithaca, US

    Overall: 25
    MX / Microscaled Precision: 80
    Power-Efficient Chip Architecture: 40
    Systolic Array Design: 15
    Multimodal LLM Inference: 10
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 0
    Strengths
    MSFP NeurIPS 2020 (2nd author) — pioneered microscaled FP for FPGA inference
    Computer Architecture + FPGA expertise at Microsoft Research
    Gaps
    No evidence of systolic array design or fixed-weight inference ASIC work

    Dayiheng Liu

    medium hireability

    Researcher@Alibaba

    Previously: Intern @ Microsoft

    Hangzhou, CN

    Overall: 12
    Multimodal LLM Inference: 72
    Hybrid LUT-GEMM: 0
    Systolic Array Design: 0
    DVFS / Power Management: 0
    MX / Microscaled Precision: 0
    Power-Efficient Chip Architecture: 0
    Strengths
    Qwen2.5-Omni co-author — speech+vision+language matches Neuralace's target model exactly
    Qwen2-VL (2,541 citations) — defines vision encoder architecture they will deploy
    Gaps
    Zero chip/hardware experience: no systolic array, MX precision, LUT-GEMM, or DVFS

    DaYou Du

    medium hireability

    PhD student@University of Edinburgh

    Previously: Research Intern @ Microsoft

    Edinburgh, GB

    Overall: 18
    MX / Microscaled Precision: 50
    Multimodal LLM Inference: 20
    Power-Efficient Chip Architecture: 20
    Hybrid LUT-GEMM: 5
    Systolic Array Design: 5
    DVFS / Power Management: 5
    Strengths
    AFPQ (asymmetric FP quant) + STBLLM (1-bit) — deep low-bit weight compression
    BitDecoding (HPCA 2026): tensor core exploitation for low-bit KV cache decoding
    Gaps
    No systolic array or ASIC/chip design experience — GPU software, not RTL

    Dimin Niu

    medium hireability

    Research Scientist@Alibaba

    Previously: Senior / Staff Engineer @ Samsung

    San Francisco, US

    Overall: 27
    Power-Efficient Chip Architecture: 72
    Multimodal LLM Inference: 45
    Systolic Array Design: 20
    Hybrid LUT-GEMM: 10
    DVFS / Power Management: 10
    MX / Microscaled Precision: 5
    Strengths
    H-LLM (ISCA 2025): hardware-dataflow co-design for LLM inference chip
    HD-MoE (ICCAD 2025): MoE LLM inference on 3D-stacked NMP accelerator
    Gaps
    No systolic array work — all compute is near-memory/PIM not systolic

    Eric Sather

    medium hireability

    Technical Lead Manager, Machine Learning@Cerebras Systems

    Previously: Principal Machine Learning Engineer @ Rivian

    San Francisco, US

    Overall: 33
    Multimodal LLM Inference: 70
    Power-Efficient Chip Architecture: 65
    MX / Microscaled Precision: 35
    Systolic Array Design: 20
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 3
    Strengths
    20+ Perceive patents on inference circuits, ternary/discrete weight storage (2018–2023)
    DREAM (NeurIPS 2025): 3.6x speedup on multimodal VLM speculative decoding
    Gaps
    No explicit systolic array design evidence — Perceive/Cerebras are non-systolic architectures

    Fei Sun

    medium hireability

    Software Engineer@Meta

    Previously: Research Scientist @ Alibaba Group

    San Francisco, US

    Overall: 28
    Multimodal LLM Inference: 60
    Power-Efficient Chip Architecture: 55
    Systolic Array Design: 20
    Hybrid LUT-GEMM: 15
    MX / Microscaled Precision: 12
    DVFS / Power Management: 5
    Strengths
    'Generative AI beyond LLMs' (ISPASS 2024) — multimodal inference system analysis
    184QPS/W ISSCC 2022 chip — direct power-efficiency metric in real tapeout
    Gaps
    No evidence of systolic array architecture work specifically

    Geethan Karunaratne

    medium hireability

    Researcher@IBM

    Previously: Postdoctoral Researcher @ IBM

    Zurich, CH

    Overall: 21
    Power-Efficient Chip Architecture: 78
    Systolic Array Design: 18
    Multimodal LLM Inference: 10
    MX / Microscaled Precision: 8
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    HERMES-Core: 14nm CMOS+PCM in-memory chip, ISSCC 2022 — production tapeout
    64-core mixed-signal PCM chip: 63.1 TOPS / 9.76 TOPS/W (Nature Electronics)
    Gaps
    In-memory computing (PCM analog), not systolic array architecture

    Geonhwa Jeong

    medium hireability

    Research Scientist@Meta

    Previously: Graduate Research Assistant @ Georgia Institute of Technology

    San Francisco, US

    Overall: 27
    Systolic Array Design: 85
    Power-Efficient Chip Architecture: 35
    Multimodal LLM Inference: 20
    Hybrid LUT-GEMM: 10
    DVFS / Power Management: 5
    MX / Microscaled Precision: 5
    Strengths
    RASA (ISCA 2021): systolic array matrix engine design for CPU
    MAESTRO: DNN dataflow cost model for spatial/systolic accelerators
    Gaps
    No evidence of MX/MSFP or microscaled weight precision work

    Han Cai

    medium hireability

    AI Research Scientist@NVIDIA

    Previously: Research Intern @ NVIDIA

    Boston, US

    Overall: 13
    Multimodal LLM Inference: 35
    Power-Efficient Chip Architecture: 20
    MX / Microscaled Precision: 15
    Systolic Array Design: 5
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 0
    Strengths
    Jet-Nemotron (NVlabs): 53.6× LLM inference speedup on H100 GPUs
    Once-for-All: hardware-aware NAS across MCU/GPU/FPGA deployment targets
    Gaps
    No chip design / HDL / ASIC / tapeout experience — purely model-level

    Hassan Dbouk

    medium hireability

    Senior Engineer@Qualcomm

    Previously: Graduate Research Assistant @ University of Illinois Urbana-Champaign

    San Francisco, US

    Overall: 19
    Power-Efficient Chip Architecture: 65
    MX / Microscaled Precision: 20
    Multimodal LLM Inference: 15
    Systolic Array Design: 8
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 0
    Strengths
    KeyRAM ISSCC 2020: 0.34 μJ/decision in-memory chip tapeout
    JSSC 2022: energy-delay-accuracy fundamental limits for inference HW
    Gaps
    CIM architecture, not systolic arrays — different design paradigm

    Irem Boybat

    medium hireability

    Research Staff Member@IBM

    Previously: Postdoctoral Researcher @ IBM

    Zurich, CH

    Overall: 20
    Power-Efficient Chip Architecture: 75
    Multimodal LLM Inference: 20
    MX / Microscaled Precision: 12
    Systolic Array Design: 5
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 0
    Strengths
    IBM Zurich AIMC ASIC architect — PCM crossbar chips for DNN inference (VLSI, CICC, IEEE TC)
    ALPINE paper: tight analog-digital co-integration for low-latency inference
    Gaps
    Analog CIM paradigm (PCM crossbars) — no digital systolic array design

    Jianyu Wei

    medium hireability

    PhD student@USTC & MSRA

    CN

    Overall: 38
    Hybrid LUT-GEMM: 93
    Power-Efficient Chip Architecture: 58
    MX / Microscaled Precision: 42
    Systolic Array Design: 15
    Multimodal LLM Inference: 12
    DVFS / Power Management: 5
    Strengths
    T-MAC (EuroSys 2025): LUT-based GEMM for low-bit LLM on CPU/NPU — core T-MAC author
    LUT Tensor Core (ISCA 2025): HW-SW co-design for LUT-based low-bit LLM inference
    Gaps
    No systolic array architecture work (CPU/NPU focused, not custom ASIC)

    Ling Liang

    medium hireability

    Assistant Researcher@Peking University

    Previously: Privacy-preserving computation research @ Alibaba

    CN

    Overall: 20
    Power-Efficient Chip Architecture: 70
    Multimodal LLM Inference: 15
    MX / Microscaled Precision: 15
    Systolic Array Design: 10
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 3
    Strengths
    28nm ISSCC tapeout: 29.2TFLOPS/W BF16, 36.5TOPS/W INT8 — direct W/TOPS metrics
    TranCIM ISSCC 2022: 15.59µJ/Token sparse transformer accelerator chip
    Gaps
    CIM paradigm, not systolic arrays — fundamentally different dataflow

    Mahdi Nazemi

    medium hireability

    Machine Learning Engineer@NVIDIA

    Previously: Machine Learning Researcher @ MatX

    San Francisco, US

    Overall: 45
    Hybrid LUT-GEMM: 75
    Power-Efficient Chip Architecture: 72
    DVFS / Power Management: 55
    MX / Microscaled Precision: 45
    Multimodal LLM Inference: 15
    Systolic Array Design: 5
    Strengths
    BlendNet: hybrid binary+fixed-point inference, 2.5x power reduction on FPGA
    US Patent 18/086,989: hybrid arithmetic/logic processing of neural networks (2023)
    Gaps
    No systolic array design work found

    Markus Nagel

    medium hireability

    Research Scientist (Senior Staff Engineer)@Qualcomm

    Previously: Research Scientist (Staff Engineer) @ Qualcomm

    Amsterdam, NL

    Overall: 15
    MX / Microscaled Precision: 68
    Power-Efficient Chip Architecture: 15
    Multimodal LLM Inference: 8
    Hybrid LUT-GEMM: 0
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Strengths
    FP8 Quantization: The Power of the Exponent — NeurIPS 2022 block-float landmark
    AIMET author — Qualcomm's production AI quantization toolkit
    Gaps
    No systolic array or custom ASIC chip design experience

    Martin G Dixon

    medium hireability

    Director of Engineering@Google

    Previously: Intel Fellow & Vice President @ Intel

    San Francisco, US

    Overall: 29
    Power-Efficient Chip Architecture: 65
    Systolic Array Design: 40
    MX / Microscaled Precision: 35
    DVFS / Power Management: 25
    Hybrid LUT-GEMM: 5
    Multimodal LLM Inference: 5
    Strengths
    "Matrix multiply accumulate instruction" patent (2018, 71 citations) — MMA/AMX core work
    Intel Fellow + SoC Architect — 9 years designing heterogeneous processor systems
    Gaps
    No explicit systolic array DNN inference chip design found

    Martino Dazzi

    medium hireability

    Researcher@Axelera AI

    Previously: Researcher @ IBM

    Overall: 40
    Power-Efficient Chip Architecture: 88
    Hybrid LUT-GEMM: 72
    Systolic Array Design: 40
    MX / Microscaled Precision: 20
    Multimodal LLM Inference: 15
    DVFS / Power Management: 5
    Strengths
    Metis AIPU (ISSCC 2024) — 15TOPS/W real tapeout at Axelera AI
    LUT-based ANN hardware paper (2025) — direct LUT-GEMM relevance
    Gaps
    CIM arrays ≠ systolic arrays — different fixed-weight paradigm

    Mengdi Wang

    medium hireability

    PhD candidate@Institute of Computing Technology, Chinese Academy of Sciences

    Previously: Intern, Department of AI @ Jeejio

    CN

    Overall: 15
    Power-Efficient Chip Architecture: 38
    MX / Microscaled Precision: 22
    Systolic Array Design: 15
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Multimodal LLM Inference: 5
    Strengths
    Real SoC tapeout: NPU in Jeejio JX2/JX3 commercial chips
    MT-DLA: multi-task DNN accelerator, GLSVLSI 2021 Best Paper
    Gaps
    No systolic array design work — generic multi-core NPU, not SA-based

    Minsik Cho

    medium hireability

    Machine Intelligence R&D, AI/ML@Apple

    Previously: Siri R&D, AI/ML @ Apple

    Austin, US

    Overall: 27
    Hybrid LUT-GEMM: 50
    Multimodal LLM Inference: 45
    Power-Efficient Chip Architecture: 30
    MX / Microscaled Precision: 20
    Systolic Array Design: 10
    DVFS / Power Management: 5
    Strengths
    DKM/eDKM: codebook GEMM weight clustering — direct LUT-GEMM foundation
    "LLM in a Flash" (191 citations) — power/memory-constrained LLM inference
    Gaps
    No systolic array or fixed-weight dataflow architecture papers

    Nithesh Kurella

    medium hireability

    Senior Principal ML Architect@d-Matrix

    Previously: Principal ML Architect @ d-Matrix

    San Francisco, US

    Overall: 41
    Power-Efficient Chip Architecture: 72
    DVFS / Power Management: 65
    Multimodal LLM Inference: 50
    MX / Microscaled Precision: 38
    Systolic Array Design: 18
    Hybrid LUT-GEMM: 5
    Strengths
    Corsair (2025): d-Matrix flagship inference chiplet architecture paper co-author
    Patents on in-memory compute chiplets for transformer workloads (2024-2025)
    Gaps
    d-Matrix uses DIMC not systolic arrays — no direct systolic design experience

    Qirui Zhang

    medium hireability

    Postdoc@EECS, University of Michigan

    Ann Arbor, US

    Overall: 41
    Power-Efficient Chip Architecture: 88
    Multimodal LLM Inference: 55
    DVFS / Power Management: 45
    MX / Microscaled Precision: 30
    Systolic Array Design: 20
    Hybrid LUT-GEMM: 5
    Strengths
    VLSI 2025: 25.08 TOPS/W transformer accelerator, mixed precision + power mgmt tapeout
    AIMMI JSSC 2024: audio+image multimodal SoC in 22nm silicon
    Gaps
    No explicit systolic array architecture work found

    Ramyad Hadidi

    medium hireability

    Senior Staff ML Computer Architect@d-Matrix

    Previously: Senior Scientist @ Rain AI

    San Francisco, US

    Overall: 36
    Systolic Array Design: 90
    Power-Efficient Chip Architecture: 85
    DVFS / Power Management: 20
    MX / Microscaled Precision: 10
    Hybrid LUT-GEMM: 5
    Multimodal LLM Inference: 5
    Strengths
    ERIDANUS (2019): 41-cite systolic array DNN inference paper
    MEISSA (2020): scalable systolic matrix multiply architecture
    Gaps
    No MX/MSFP or microscaled precision format research

    Rasoul Shafipour

    medium hireability

    Senior AI and Machine Learning Engineer@NVIDIA

    Previously: AI/ML Research Scientist @ Apple

    Seattle, US

    Overall: 27
    MX / Microscaled Precision: 90
    Power-Efficient Chip Architecture: 25
    Multimodal LLM Inference: 20
    Hybrid LUT-GEMM: 18
    Systolic Array Design: 3
    DVFS / Power Management: 3
    Strengths
    MX paper co-author (arXiv:2310.10537): canonical microscaling standard, 162 citations
    Shared Microexponents (2023): BDR framework defining MX4/MX6/MX9 inference formats
    Gaps
    No systolic array design or ASIC/chip architecture work found

    Ritchie Zhao

    medium hireability

    Senior AI and Machine Learning Engineer@NVIDIA

    Previously: Senior Data Science Manager @ Microsoft

    Redmond, US

    Overall: 33
    MX / Microscaled Precision: 95
    Power-Efficient Chip Architecture: 40
    Systolic Array Design: 35
    Multimodal LLM Inference: 20
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    Co-authored OCP MX spec (2310.10537) — definitive microscaling standard
    Shared Microexponents (ISCA 2023) — microscaled BFP format for inference
    Gaps
    No LUT-GEMM or hybrid LUT/GEMM kernel work found

    Saurabh Adya

    medium hireability

    Apple

    Overall: 25
    MX / Microscaled Precision: 55
    Hybrid LUT-GEMM: 30
    Multimodal LLM Inference: 30
    Power-Efficient Chip Architecture: 20
    Systolic Array Design: 15
    DVFS / Power Management: 0
    Strengths
    eDKM: LLaMA 7B compressed to 3-bit — state-of-art LLM weight clustering
    DKM (ICLR 2022): weight codebooks via differentiable k-means — LUT-adjacent precision
    Gaps
    No systolic array architecture papers or chip tapeout experience

    Saurabh Dash

    medium hireability

    Member of Technical Staff@Cohere

    Previously: Machine Learning Researcher @ Apple

    Toronto, CA

    Overall: 32
    Multimodal LLM Inference: 65
    MX / Microscaled Precision: 60
    Power-Efficient Chip Architecture: 40
    Systolic Array Design: 20
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 0
    Strengths
    PIM/ReRAM chip (IEEE TCAD 2022) — real hardware inference accelerator design
    Hessian-driven mixed-precision on hardware — directly targets efficient weight representation
    Gaps
    No systolic array design work — PIM/ReRAM is a different inference architecture

    Shang Yang

    medium hireability

    PhD student@MIT EECS

    Previously: Intern @ MIT

    Boston, US

    Overall: 19
    Multimodal LLM Inference: 65
    MX / Microscaled Precision: 30
    Power-Efficient Chip Architecture: 12
    Systolic Array Design: 5
    Hybrid LUT-GEMM: 3
    DVFS / Power Management: 0
    Strengths
    AWQ MLSys 2024 Best Paper — W4 activation-aware quantization (1498 citations)
    QServe W4A8KV4 — quantized inference with custom kernel co-design (MLSys 2025)
    Gaps
    No chip/ASIC design — entirely software/systems layer contributions

    Shiyu Li

    medium hireability

    Senior Deep Learning Architect@NVIDIA

    Previously: PhD student @ Duke University

    San Francisco, US

    Overall: 37
    Power-Efficient Chip Architecture: 70
    Systolic Array Design: 60
    MX / Microscaled Precision: 50
    DVFS / Power Management: 20
    Multimodal LLM Inference: 15
    Hybrid LUT-GEMM: 5
    Strengths
    INCA (2023): input-stationary dataflow — canonical systolic array variant
    Block-wise mixed-precision quantization for ReRAM DNN accelerators (TCAD 2024)
    Gaps
    No explicit systolic array chip tapeout or fixed-weight accelerator paper

    Shunan Dong

    medium hireability

    Nanjing University

    Overall: 18
    Power-Efficient Chip Architecture: 45
    Systolic Array Design: 30
    MX / Microscaled Precision: 15
    DVFS / Power Management: 10
    Hybrid LUT-GEMM: 5
    Multimodal LLM Inference: 5
    Strengths
    Self-described 'chip architecture and algorithms' student at Tsinghua
    Ds-open: SystemVerilog sparse GEMM accelerator — real HDL chip design
    Gaps
    No direct evidence of systolic array or fixed-weight inference chip design

    Souvik Kundu

    medium hireability

    Inference and SLM Optimization Lead@Intel

    Previously: Staff Research Scientist @ Intel

    Los Angeles, US

    Overall: 39
    MX / Microscaled Precision: 85
    Systolic Array Design: 50
    Power-Efficient Chip Architecture: 50
    Hybrid LUT-GEMM: 28
    Multimodal LLM Inference: 12
    DVFS / Power Management: 8
    Strengths
    MicroScopiQ (2025): MX microscaling quant + systolic array co-design at NeurIPS
    ShiftAddLLM (NeurIPS 2024): multiplication-free LLM via shift+add, adjacent to LUT-GEMM
    Gaps
    No DVFS or dynamic power management research

    Suvinay Subramanian

    medium hireability

    Software Engineer@Google

    San Francisco, US

    Overall: 33
    Systolic Array Design: 75
    Power-Efficient Chip Architecture: 75
    Multimodal LLM Inference: 20
    DVFS / Power Management: 15
    MX / Microscaled Precision: 10
    Hybrid LUT-GEMM: 5
    Strengths
    TPU v4 co-author — named contributor on Google's systolic array inference chip
    FLAT: 49% energy savings + 1.94x speedup for attention inference hardware
    Gaps
    No MX / microscaled numeric format work (MX4/MX6/MX9) found

    Tianhua Xia

    medium hireability

    PhD student@New York University

    New York, US

    Overall: 27
    Multimodal LLM Inference: 65
    Power-Efficient Chip Architecture: 50
    Systolic Array Design: 20
    MX / Microscaled Precision: 15
    DVFS / Power Management: 8
    Hybrid LUT-GEMM: 5
    Strengths
    PICACHU (ASPLOS 2025) — CGRA spatial array design for LLM ops
    HAAN (DATE 2025) — algorithm-hardware co-design for LLM normalization
    Gaps
    No systolic array work — CGRA design is related but architecturally different

    Valavan Manohararajah

    medium hireability

    Chief Product Architect@Cerebras

    Previously: Distinguished Engineer @ Cerebras

    Vaughan, CA

    Overall: 23
    Power-Efficient Chip Architecture: 50
    Systolic Array Design: 30
    MX / Microscaled Precision: 25
    Hybrid LUT-GEMM: 15
    Multimodal LLM Inference: 10
    DVFS / Power Management: 5
    Strengths
    Chief Product Architect at Cerebras — wafer-scale AI inference silicon
    15 years Intel PSG — FPGA & hardware inference accelerator design
    Gaps
    No explicit systolic array design or fixed-weight architecture papers

    Vikas Chandra

    medium hireability

    Senior Director, AI@Meta

    Previously: Director, Applied ML @ Arm

    San Francisco, US

    Overall: 51
    Power-Efficient Chip Architecture: 78
    MX / Microscaled Precision: 65
    Multimodal LLM Inference: 62
    Systolic Array Design: 52
    DVFS / Power Management: 42
    Hybrid LUT-GEMM: 5
    Strengths
    ARM Senior Director — built ARM ML IP and neural engine silicon
    Heterogeneous Dataflow Accelerators (HPCA 2021, 215 cites) — inference ASIC dataflow
    Gaps
    No explicit systolic array paper — dataflow accelerator work is adjacent

    Vithursan Thangarasa

    medium hireability

    Principal Research Scientist@Cerebras Systems

    Previously: Lead Research Scientist @ Cerebras Systems

    San Francisco, US

    Overall: 15
    Multimodal LLM Inference: 65
    Power-Efficient Chip Architecture: 18
    Hybrid LUT-GEMM: 2
    Systolic Array Design: 2
    DVFS / Power Management: 2
    MX / Microscaled Precision: 2
    Strengths
    MASSV (NeurIPS 2025): multimodal speculative decoding for VLMs
    DREAM (NeurIPS 2025): multimodal speculative decoding with cross-attention fusion
    Gaps
    No evidence of systolic array or fixed-weight chip architecture work

    Wang, Chang

    medium hireability
    Overall: 20
    MX / Microscaled Precision: 78
    Multimodal LLM Inference: 30
    Hybrid LUT-GEMM: 12
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Power-Efficient Chip Architecture: 0
    Strengths
    132 commits to intel/neural-compressor — MXFP4/MXFP8/INT4/NVFP4 all covered
    MXFP8 PRs merged in vllm-project/llm-compressor (active May 2026)
    Gaps
    Zero hardware/ASIC/chip design experience — pure software

    William Brandon

    medium hireability

    PhD student@MIT CSAIL

    Previously: Research Assistant @ MIT Media Lab

    Cambridge, US

    Overall: 18
    Hybrid LUT-GEMM: 92
    Multimodal LLM Inference: 10
    MX / Microscaled Precision: 5
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Power-Efficient Chip Architecture: 0
    Strengths
    FLUTE paper: LUT-GEMM for quantized LLMs — direct axis match (EMNLP 2024)
    Hydra: speculative decoding with GPU-level inference optimization
    Gaps
    No chip design, RTL, or ASIC experience

    Wonsuk Jang

    medium hireability

    Research Scientist Intern@Meta

    Previously: Digital Design Engineer @ Samsung

    San Francisco, US

    Overall: 28
    MX / Microscaled Precision: 68
    Power-Efficient Chip Architecture: 48
    Systolic Array Design: 20
    Hybrid LUT-GEMM: 15
    Multimodal LLM Inference: 12
    DVFS / Power Management: 5
    Strengths
    BlockDialect: FP4 block-wise quantization with MXFP4 comparison (2025)
    DialectFP4: scaled-integer arithmetic for hardware energy efficiency
    Gaps
    No direct systolic array design or tapeout work by Wonsuk personally

    Xin He

    medium hireability
    Overall: 16
    MX / Microscaled Precision: 60
    Multimodal LLM Inference: 22
    Hybrid LUT-GEMM: 15
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Power-Efficient Chip Architecture: 0
    Strengths
    298 commits to intel/neural-compressor — MX/FP8/INT4 format implementation depth
    intel/auto-round contributor — quantization algorithms for LLMs and VLMs
    Gaps
    No chip design or hardware architecture evidence — entirely software-layer

    Yonggan Fu

    medium hireability

    Research Scientist@NVIDIA

    Previously: Research Intern @ NVIDIA

    San Francisco, US

    Overall: 24
    Power-Efficient Chip Architecture: 50
    Multimodal LLM Inference: 30
    Hybrid LUT-GEMM: 25
    MX / Microscaled Precision: 20
    Systolic Array Design: 15
    DVFS / Power Management: 5
    Strengths
    ShiftAddNAS: LUT-adjacent shift-add inference hardware (ICML 2022)
    Auto-NBA: joint bitwidth + accelerator co-search (ICML 2021)
    Gaps
    No systolic array-specific research (output-stationary, weight-stationary designs)

    Zhihang Yuan

    medium hireability

    Algorithm Researcher@Bytedance

    Previously: Researcher @ Infinigence AI

    Beijing, CN

    Overall: 33
    MX / Microscaled Precision: 68
    Multimodal LLM Inference: 52
    Power-Efficient Chip Architecture: 50
    Systolic Array Design: 15
    DVFS / Power Management: 8
    Hybrid LUT-GEMM: 5
    Strengths
    GSQ/GSE group-shared exponent quant — direct MX-format analog
    I-LLM (2025): integer-only inference for fully-quantized LLMs
    Gaps
    No systolic array RTL/architecture work; CiM ≠ systolic dataflow

    Ahmed Hasssan

    low hireability

    MTS Software Development Engineer@AMD

    Previously: Graduate Student Research Assistant @ Cornell Tech

    Pueblo, US

    Overall: 29
    Hybrid LUT-GEMM: 65
    Power-Efficient Chip Architecture: 40
    MX / Microscaled Precision: 30
    Multimodal LLM Inference: 20
    Systolic Array Design: 15
    DVFS / Power Management: 5
    Strengths
    PhD research: LUT-based HW accelerator for DNN inference — direct LUT-GEMM match
    Torch2Chip (MLSys 2024): SW/HW co-design toolkit for quantization + accelerator prototyping
    Gaps
    No systolic array design work found — core Neuralace architecture absent

    Akhil Arunkumar

    low hireability

    Sr. Principal Software Engineer@d-Matrix

    Previously: SoC Performance Architect @ AMD

    San Francisco, US

    Overall: 36
    Power-Efficient Chip Architecture: 72
    Multimodal LLM Inference: 45
    MX / Microscaled Precision: 40
    Systolic Array Design: 30
    DVFS / Power Management: 25
    Hybrid LUT-GEMM: 5
    Strengths
    Corsair (IEEE Micro 2025): co-author on d-Matrix's CIM inference chiplet paper
    d-Matrix DIMC uses Block Floating Point — adjacent to MX block-exponent formats
    Gaps
    CIM architecture (d-Matrix) is fundamentally different from fixed-weight systolic arrays

    Amir Yazdanbakhsh

    low hireability

    Research Scientist@DeepMind

    Previously: Research Scientist @ Google

    San Francisco, US

    Overall: 43
    Systolic Array Design: 72
    Power-Efficient Chip Architecture: 65
    Multimodal LLM Inference: 50
    Hybrid LUT-GEMM: 32
    MX / Microscaled Precision: 30
    DVFS / Power Management: 10
    Strengths
    'Structured Sparse Matrix Acceleration in Systolic Arrays' (2025) — direct match
    FLAT (2022): hardware dataflow paper for attention bottleneck mitigation
    Gaps
    No MX/MSFP/microscaled format-specific work — quantization is post-training, not fixed-point MX

    Carlo Luschi

    low hireability

    VP & Head of Research@Graphcore

    Previously: Director of Research @ Graphcore

    Oxford, GB

    Overall: 28
    MX / Microscaled Precision: 85
    Power-Efficient Chip Architecture: 45
    Multimodal LLM Inference: 20
    Systolic Array Design: 8
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    MXNorm (2026) — MXFP8 block-scale reuse for LLM inference, 2.4x kernel speedup
    Optimal Formats for Weight Quantisation (2025) — weight-precision theory for MX
    Gaps
    No systolic array design — Graphcore IPU is dataflow/BSP, not systolic

    Chi-Chih Chang

    low hireability

    Ph.D. Student@Cornell University

    Previously: Remote Intern @ University of Washington

    Overall: 34
    Systolic Array Design: 75
    MX / Microscaled Precision: 60
    Power-Efficient Chip Architecture: 50
    Multimodal LLM Inference: 10
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    Systolic Sparse Tensor Slices (FPGA 2025) — direct systolic array design for AI
    Power of Negative Zero — custom datatypes for quantized LLM inference
    Gaps
    All hardware work is FPGA-based; no ASIC or custom chip tapeout

    Elias Frantar

    low hireability

    Member of Technical Staff@OpenAI

    Previously: PHD Candidate @ Institute of Science and Technology Austria

    San Francisco, US

    Overall: 21
    MX / Microscaled Precision: 65
    Hybrid LUT-GEMM: 50
    Multimodal LLM Inference: 5
    Power-Efficient Chip Architecture: 5
    Systolic Array Design: 0
    DVFS / Power Management: 0
    Strengths
    GPTQ (ICLR 2023): lead author, gold standard INT4 LLM weight quantization
    Marlin: custom INT4xFP16 inference GEMM kernel, near-ideal 4x speedup
    Gaps
    No hardware design — zero RTL, ASIC, or systolic array experience

    Eric Chung

    low hireability

    VP of AI Computing@NVIDIA

    Previously: GM & Partner Group Engineering Manager @ Microsoft

    Seattle, US

    Overall: 51
    MX / Microscaled Precision: 95
    Systolic Array Design: 88
    Power-Efficient Chip Architecture: 82
    Multimodal LLM Inference: 20
    Hybrid LUT-GEMM: 10
    DVFS / Power Management: 8
    Strengths
    Project Brainwave: fixed-weight systolic array ASIC (ISCA 2018, 748 citations)
    Co-authored OCP MX standard — microscaled data formats for DL (2023)
    Gaps
    No hybrid LUT-GEMM work (FPGA/LUT background is only indirectly related)

    Fan Yang

    low hireability

    Sr. Principal Research Manager@Microsoft

    Previously: Principal Research Manager @ Microsoft

    CN

    Overall: 38
    Hybrid LUT-GEMM: 92
    Power-Efficient Chip Architecture: 55
    MX / Microscaled Precision: 50
    Systolic Array Design: 15
    Multimodal LLM Inference: 10
    DVFS / Power Management: 5
    Strengths
    LUT Tensor Core (ISCA 2025) — SW-HW co-design for LUT-based low-bit LLM inference
    LUT-DLA (HPCA 2025) — lookup-table accelerator for extreme low-bit DNNs
    Gaps
    No systolic array design — LUT-based architecture, not weight-stationary/systolic

    Guangxuan Xiao

    low hireability

    Member of Technical Staff@Thinking Machines Lab

    Previously: Research Intern @ NVIDIA

    USA

    Overall: 26
    MX / Microscaled Precision: 55
    Multimodal LLM Inference: 50
    Power-Efficient Chip Architecture: 25
    Hybrid LUT-GEMM: 20
    Systolic Array Design: 5
    DVFS / Power Management: 0
    Strengths
    SmoothQuant (ICML 2023, 1.5K citations) — W8A8 PTQ leader for LLMs
    QServe W4A8KV4 — system co-design for quantized LLM serving
    Gaps
    No chip/ASIC/RTL experience — entirely software-side quantization

    Han Guo

    low hireability

    Research Intern@Together AI

    Previously: Research Intern @ IBM

    San Francisco, US

    Overall: 23
    Hybrid LUT-GEMM: 90
    MX / Microscaled Precision: 28
    Multimodal LLM Inference: 12
    Power-Efficient Chip Architecture: 5
    Systolic Array Design: 3
    DVFS / Power Management: 0
    Strengths
    FLUTE (EMNLP 2024): lead author of premier LUT-GEMM paper for LLMs
    FLUTE C++ implementation — 2-4x speedup via offline table restructuring
    Gaps
    No hardware/ASIC experience — entirely software stack, no RTL/chip design

    Haoran You

    low hireability

    Research Scientist@Adobe

    Previously: Research Scholar @ SRC Research Scholars Program

    Seattle, US

    Overall: 38
    Hybrid LUT-GEMM: 80
    Power-Efficient Chip Architecture: 65
    Systolic Array Design: 30
    Multimodal LLM Inference: 25
    MX / Microscaled Precision: 20
    DVFS / Power Management: 5
    Strengths
    ShiftAddLLM (NeurIPS 2024) — LUT-GEMM analog, collab w/ Intel & Google DeepMind
    ShiftAdd series 2020–2024: 5-yr program replacing multipliers with shift+add ops
    Gaps
    No systolic array design — accelerator co-design from algorithm side, not RTL/silicon

    Harshit Khaitan

    low hireability

    Director, AI Accelerators@Meta

    Previously: Technical Lead / Manager, TPU HW Design @ Google

    San Francisco, US

    Overall: 38
    Systolic Array Design: 85
    Power-Efficient Chip Architecture: 85
    Multimodal LLM Inference: 30
    DVFS / Power Management: 15
    MX / Microscaled Precision: 10
    Hybrid LUT-GEMM: 5
    Strengths
    TPU paper (6.6K citations) — co-authored canonical systolic array DNN inference paper
    Director, AI Accelerators at Meta — leads MTIA inference chip program
    Gaps
    No MX/MSFP/microscaled precision work found

    H Ekin Sumbul

    low hireability

    Head of IP@Architect Labs

    Previously: Research Scientist @ Meta

    San Francisco, US

    Overall: 26
    Power-Efficient Chip Architecture: 80
    Multimodal LLM Inference: 25
    MX / Microscaled Precision: 20
    Systolic Array Design: 15
    DVFS / Power Management: 10
    Hybrid LUT-GEMM: 5
    Strengths
    617 TOPS/W BNN accelerator in 10nm (ISSCC 2020) — world-class W/TOPS
    3.8 pJ/SOP SNN chip (JSSC 2019, h=24) — proven ASIC power efficiency
    Gaps
    No systolic array design work — expertise is CIM/CNM, a distinct architecture

    Hesham Mostafa

    low hireability

    Researcher@Intel

    Overall: 33
    MX / Microscaled Precision: 87
    Power-Efficient Chip Architecture: 68
    Multimodal LLM Inference: 18
    Systolic Array Design: 12
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    MF-QAT (2026): MXINT/MXFP multi-format QAT for elastic inference — direct MX hit
    Technical Lead ML at d-Matrix, production LLM inference chip experience
    Gaps
    No systolic array work — d-Matrix uses DIMC, not systolic

    Jiaming Tang

    low hireability

    Ph.D. student@MIT

    Previously: Undergraduate researcher @ SJTU EPCC Lab

    Boston, US

    Overall: 28
    Power-Efficient Chip Architecture: 62
    Systolic Array Design: 35
    Hybrid LUT-GEMM: 25
    MX / Microscaled Precision: 25
    Multimodal LLM Inference: 20
    DVFS / Power Management: 3
    Strengths
    Transitive Array (ISCA 2025): multiplication-free GEMM accelerator, 2.31× energy saving
    OliVe (ISCA 2023): hardware-LLM quantization co-design, 4.5× speedup
    Gaps
    No published systolic array or weight-stationary dataflow chip work

    Lukas Cavigelli

    low hireability

    Researcher (Expert/Architect)@Huawei

    Previously: Researcher (Principal Engineer) @ Huawei

    Zurich, CH

    Overall: 54
    Power-Efficient Chip Architecture: 92
    Systolic Array Design: 78
    Hybrid LUT-GEMM: 75
    MX / Microscaled Precision: 58
    DVFS / Power Management: 10
    Multimodal LLM Inference: 8
    Strengths
    Stella Nera: Maddness LUT hardware accelerator, 161 TOp/s/W (ISVLSI 2025)
    Chipmunk: systolically scalable inference chip tapeout, 3.08 Gop/s/mW (CICC 2018)
    Gaps
    No DVFS or dynamic power-budget management papers found

    Manuel Le Gallo

    low hireability

    Staff Research Scientist@IBM

    Previously: PhD student @ ETH Zurich

    Zurich, CH

    Overall: 21
    Power-Efficient Chip Architecture: 82
    Multimodal LLM Inference: 20
    MX / Microscaled Precision: 12
    Systolic Array Design: 8
    DVFS / Power Management: 5
    Hybrid LUT-GEMM: 0
    Strengths
    HERMES tapeout: 1.59 TOPS/mm² at 14nm — power-efficient inference chip
    2025 paper: LLM scaling with MoE via 3D analog in-memory computing
    Gaps
    Architecture mismatch — analog CIM (PCM crossbar), not digital systolic arrays

    Mohammad Rastegari

    low hireability

    CEO and Co-Founder@Elastix.AI

    Previously: Distinguished Scientist @ Meta

    Seattle, US

    Overall: 31
    Hybrid LUT-GEMM: 80
    MX / Microscaled Precision: 40
    Power-Efficient Chip Architecture: 35
    Multimodal LLM Inference: 15
    Systolic Array Design: 10
    DVFS / Power Management: 5
    Strengths
    LCNN (2017): seminal lookup-based CNN, directly relevant to LUT-GEMM
    XNOR-Net: 1-bit binary weights — extreme quantization for inference
    Gaps
    No systolic array or fixed-weight ASIC design work

    Norman P Jouppi

    low hireability

    VP, Engineering Fellow@Google

    Previously: Senior Fellow @ HP

    Overall: 45
    Systolic Array Design: 100
    Power-Efficient Chip Architecture: 88
    Multimodal LLM Inference: 35
    DVFS / Power Management: 20
    MX / Microscaled Precision: 20
    Hybrid LUT-GEMM: 5
    Strengths
    TPU v1 (2017) — defines weight-stationary systolic array for DNN inference
    Multi-Modal Systolic Array paper (2024) — direct match to Neuralace chip target
    Gaps
    No evidence of MX / MSFP / microscaled precision format work

    Paul N. Whatmough

    low hireability

    Senior Director, AI Research@Qualcomm

    Previously: Director, AI Research @ Qualcomm

    Boston, US

    Overall: 56
    Power-Efficient Chip Architecture: 95
    Systolic Array Design: 88
    DVFS / Power Management: 72
    MX / Microscaled Precision: 45
    Multimodal LLM Inference: 22
    Hybrid LUT-GEMM: 12
    Strengths
    Sparse Systolic Tensor Array (arXiv:2009.02381) — 16nm tapeout, 16.8 TOPS/W
    ISSCC 2023 12nm chip — 18.1 TFLOPs/W sparse transformer processor
    Gaps
    No OCP MX4/MX6/MX9 standard-compliant microscaling work found

    Raghu Prabhakar

    low hireability

    Engineering@SambaNova Systems

    Previously: Software Engineer @ NVIDIA

    San Francisco, US

    Overall: 37
    Systolic Array Design: 80
    Power-Efficient Chip Architecture: 78
    DVFS / Power Management: 30
    Multimodal LLM Inference: 15
    MX / Microscaled Precision: 15
    Hybrid LUT-GEMM: 5
    Strengths
    2025 papers: systolic array PE design + MatMul on systolic array (direct match)
    SN40L co-architect — 5nm 2.5D AI inference chip (ISSCC 2025)
    Gaps
    SambaNova uses reconfigurable dataflow, not fixed-weight systolic — key architectural mismatch

    Randy Huang

    low hireability

    Principal Engineer@Amazon

    Previously: Principal Engineer, Programmable Solutions Group @ Intel

    San Francisco, US

    Overall: 29
    Power-Efficient Chip Architecture: 68
    Systolic Array Design: 65
    Hybrid LUT-GEMM: 20
    DVFS / Power Management: 10
    Multimodal LLM Inference: 5
    MX / Microscaled Precision: 5
    Strengths
    Annapurna Labs Principal Eng — builds AWS Inferentia (systolic array inference ASIC)
    Intel FPGA Programmable Solutions Group — next-gen FPGA architecture work
    Gaps
    No evidence of MX/microscaled numeric format work

    Sayeh Sharify

    low hireability

    Principal Machine Learning Research Scientist@d-Matrix

    Previously: Co-Founder @ Tartan AI

    San Francisco, US

    Overall: 31
    MX / Microscaled Precision: 95
    Power-Efficient Chip Architecture: 45
    Systolic Array Design: 20
    Multimodal LLM Inference: 15
    Hybrid LUT-GEMM: 8
    DVFS / Power Management: 5
    Strengths
    "Post Training Quantization of LLMs with MX Formats" — MXINT4/8 PTQ (NeurIPS 2024)
    "MF-QAT" (2026) — multi-format MXINT/MXFP QAT for MX inference chips
    Gaps
    No systolic array design — d-Matrix uses DIMC, not systolic architecture

    Se Jung Kwon

    low hireability

    Director@NAVER

    Previously: Leader @ NAVER

    Seoul, KR

    Overall: 31
    Hybrid LUT-GEMM: 97
    MX / Microscaled Precision: 50
    Multimodal LLM Inference: 15
    Power-Efficient Chip Architecture: 12
    Systolic Array Design: 5
    DVFS / Power Management: 5
    Strengths
    LUT-GEMM (ICLR 2024, 184 citations) — invented the exact JD technique (sketched below)
    CodeGEMM (NeurIPS 2025) — 8.93x speedup at 2-bit via codebook-centric GEMM
    Gaps
    No systolic array or chip architecture work — algorithm-only researcher
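For the Hybrid LUT-GEMM axis, the idea behind LUT-GEMM-style kernels is to replace dequantize-then-multiply with table lookups: precompute all signed sums of each small activation group once, then index that table with the weight bit patterns. Below is a minimal numpy sketch of that idea under simplifying assumptions (single-bit, ±1 binary-coded weights with one scale per output row); it is an illustration, not the ICLR 2024 implementation.

```python
# Minimal LUT-style matrix-vector product for binary-coded weights (illustration only).
# For every group of g activations, a 2^g-entry table holds all +/- combinations of
# those activations; each row's g sign bits then index the table instead of doing
# g multiply-accumulates.
import numpy as np

def lut_matvec(bit_weights, scales, x, g=4):
    """bit_weights: (rows, cols) in {-1, +1}; scales: (rows,); x: (cols,)."""
    rows, cols = bit_weights.shape
    assert cols % g == 0
    y = np.zeros(rows)
    for start in range(0, cols, g):
        xg = x[start:start + g]
        # Table of all 2^g signed sums of this activation group.
        table = np.array([sum(((idx >> j) & 1 or -1) * xg[j] for j in range(g))
                          for idx in range(1 << g)])
        # Each row's g sign bits form an index into the table.
        bits = (bit_weights[:, start:start + g] > 0).astype(int)
        idxs = (bits * (1 << np.arange(g))).sum(axis=1)
        y += table[idxs]
    return scales * y

rng = np.random.default_rng(0)
W = rng.choice([-1.0, 1.0], size=(8, 16))
alpha, x = rng.random(8), rng.random(16)
# Matches the dense scaled matmul exactly for +/-1 weights.
assert np.allclose(lut_matvec(W, alpha, x), (alpha[:, None] * W) @ x)
```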

    Sheng Li

    low hireability

    Principal Research Scientist@Google

    Overall: 30
    MX / Microscaled Precision: 72
    Power-Efficient Chip Architecture: 50
    Systolic Array Design: 40
    Multimodal LLM Inference: 10
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    ICML 2024: SF4/E2M1+supernormal 4-bit LLM weight formats — directly relevant to MX precision
    FLIQS: mixed-precision FP/INT quantization search across vision + transformer models
    Gaps
    Research-level analyst/modeler — no ASIC tapeout or silicon design evidence

    Shijie Cao

    low hireability

    Senior Researcher@Microsoft

    Previously: Senior Researcher @ Microsoft Research Asia

    Overall: 38
    Hybrid LUT-GEMM: 95
    MX / Microscaled Precision: 65
    Power-Efficient Chip Architecture: 35
    Systolic Array Design: 15
    Multimodal LLM Inference: 10
    DVFS / Power Management: 5
    Strengths
    T-MAC: LUT-based low-bit LLM inference on CPU/NPU — defining work in LUT-GEMM
    LUT Tensor Core (2025): SW-HW co-design for LUT-based low-bit LLM inference
    Gaps
    No systolic array design or ASIC tapeout publications

    Utkarsh Saxena

    low hireability

    Member of Technical Staff@AMD

    Previously: Graduate Research Assistant @ Purdue University

    San Francisco, US

    Overall: 35
    MX / Microscaled Precision: 90
    Power-Efficient Chip Architecture: 72
    Multimodal LLM Inference: 20
    Systolic Array Design: 15
    Hybrid LUT-GEMM: 5
    DVFS / Power Management: 5
    Strengths
    MX microscaling paper (2024) — direct match on precision quantization axis
    ResQ: ICML 2025 Spotlight — mixed-precision LLM quantization expertise
    Gaps
    No systolic array architecture work — CIM is a fundamentally different paradigm

    Xin Wang

    low hireability

    Director, Machine Learning@d-Matrix

    Previously: Principal Scientist & Manager, Machine Learning Research @ Cerebras Systems

    San Francisco, US

    Overall: 33
    MX / Microscaled Precision: 75
    Power-Efficient Chip Architecture: 52
    Multimodal LLM Inference: 40
    Systolic Array Design: 15
    Hybrid LUT-GEMM: 8
    DVFS / Power Management: 5
    Strengths
    Flexpoint NeurIPS 2017 (363 cites) — foundational block floating point for DNN inference
    Block format error bounds paper (2022) — precise BFP/MX-format precision analysis
    Gaps
    No systolic array design papers — d-Matrix uses CIM, not systolic

    Runs

    #1 · completed · 0 qualified / 0 found · May 7, 1:34 PM