
Video ai researchers with experience in multimodal learning, diffusion models, a…

Completed · 49 qualified · 1 run · Apr 22, 1:25 AM · video-ai-researchers-with-experience-in-multimodal-learning
Parsed 4 topics · Researcher

    Qualified Candidates (49)

    CY

    Ceyuan Yang

    medium hireability

Research Scientist@ByteDance (Seed team)

Previously: Research Scientist @ Shanghai Artificial Intelligence Laboratory

    San Francisco, US

    • Core video diffusion researcher — AnimateDiff (1.2k+ citations, ICLR 2024 Spotlight), LaVie, CameraCtrl, and SparseCtrl are all landmark video diffusion papers directly matching the query
    • Published 'Diffusion Adversarial Post-Training for One-Step Video Generation' (ICLR 2025) showing direct inference optimization expertise
    • Now Research Scientist at ByteDance Seed team (DB outdated: shows Shanghai AI Lab) building flagship video models Seedance 1.0 and Seaweed-7B
    • H-index 29
    • Hireability: MEDIUM — deeply embedded at ByteDance with extensive 2025 output and no open-to-work signals (his website advertises intern hiring, not a personal job search), but he is within the tenure window and ByteDance faces US regulatory headwinds
    CW

    Chaoyang Wang

    medium hireability

    Research Scientist@Snap

    Previously: Graduate Research Assistant @ Carnegie Mellon University

    Los Angeles, US

    • Research Scientist at Snap Creative Vision Team leading 4D reconstruction and generation
    • Strong match across all three query dimensions: video diffusion (VD3D — video diffusion transformers, ICLR 2025, 82 citations; 4Real — 4D scene generation via video diffusion, 48 citations; 4Real-Video 2025), inference/efficiency (DELTAv2: Accelerating Dense 3D Tracking; DELTA: Dense Efficient Long-Range 3D Tracking; LightSpeed: Light and Fast Neural Light Fields on Mobile), and multimodal/text-guided work (language-guided 3D scene editing, text-to-character blendshapes)
    • PhD CMU Robotics Institute, h-index 19
    • Hireability: MEDIUM — Research Scientist at Snap for ~4+ years, active publication output through Dec 2025, no explicit mobility signals but well within the 3-5 year transition window
    DH

    De-An Huang

    medium hireability

Research Scientist@NVIDIA

Previously: PhD student @ Stanford University

    US

    • Research Scientist at NVIDIA with near-perfect match to query
    • Leads NVILA (efficient frontier VLMs), Eagle 2/2.5 (long-context multimodal learning for video), STORM (token-efficient long video understanding), T-Stitch (diffusion sampling acceleration via trajectory stitching), and Efficient Video Diffusion Models
    • PhD Stanford, h-index 38, with 8+ highly-cited 2025 publications across all three query pillars
    • Hireability: MEDIUM — confirmed Research Scientist at NVIDIA (pipeline signals updated LinkedIn title to 'AI Research Scientist' in Jan 2026); no explicit open-to-work signals but active publication cadence and role identity update suggest continued engagement in research market
    DX

    Dejia Xu

    medium hireability

    Research Scientist@Luma AI

    Previously: Research Assistant @ University of Texas at Austin

    San Francisco, US

    • Research Scientist at Luma AI (video generation company)
    • Strong query match: Diffusion4D (video diffusion models, 2024), CamCo (image-to-video generation, 2025 CVPR Highlight), LightGaussian (15x 3DGS compression for 200+ FPS — direct inference optimization, 340 citations). h-index 28, PhD UT Austin (VITA Group)
    • Hireability: MEDIUM — estimated ~2-3 years at Luma AI based on paper timeline (started ~2023-2024), within typical transition window; no explicit open-to-work signals but website states 'open to coffee chats'
    DG

    Denis A Gudovskiy

    medium hireability

    Senior Deep Learning Researcher@Panasonic

    Previously: Senior Wireless Engineer @ Intel

    San Francisco, US

    • Strong multimodal and inference optimization researcher at Panasonic AI Lab (Mountain View, CA)
    • SparseVLM (ICML'25, 113 cites) on visual token sparsification for efficient VLM inference; 2025 paper on shortcutting diffusion/flow models for faster sampling; DFM: Dual Flow Matching (2024); CFLOW-AD (WACV'22, 719 cites) on flow-based generative models
    • Strong match on multimodal learning, diffusion/flow models, and inference optimization — video-specific work not evident. h-index 14
    • Hireability: MEDIUM — long tenure at Panasonic (~7+ years), Jan 2026 LinkedIn title change appears internal (still at Panasonic), no open-to-work signals
    DG

    Difei Gao

    medium hireability

    National U. of Singapore; Institute of Computing Technology, Chinese Academy of Sciences

    Previously: Postdoc @ National U. of Singapore

    SG

    • Strong match: published 'Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation' (292 citations, 2023) directly covering diffusion models for video; 'VideoLLM-online' (CVPR 2024, 97 citations) covers online/streaming video inference optimization; 'Egocentric Video-Language Pretraining' (NeurIPS 2022, 266 citations) covers multimodal video learning
    • H-index 20, based in Singapore at Show Lab NUS
    • Hireability: MEDIUM — long publication history since 2015 (likely postdoc or senior researcher at NUS), not listed in current Show Lab PhD roster, no explicit availability signals, but active publishing through 2025 and within typical transition window for academics
    FI

    Forrest Iandola

    medium hireability

    AI Research Scientist@Meta

    Previously: Head of Perception @ Anduril Industries

    San Francisco, US

    • Video AI researcher at Meta Reality Labs (2022-present) with strong alignment to all three query dimensions: published VEditBench (text-guided video editing, 2025) and EfficientSAM (CVPR 2024); co-authored score distillation/diffusion papers (SteinDreamer, Taming Mode Collapse 2024); inference optimization is his core brand — creator of SqueezeNet (50x parameter reduction), SqueezeBERT, MobileLLM (on-device LLMs)
    • PhD in EECS from UC Berkeley, h-index 26
    • Hireability: MEDIUM — ~4 years at Meta within typical transition window, but no active job-seeking signals (no LinkedIn changes, no website CV updates, no open-to-work bio)
    GL

    Guilin Liu

    medium hireability

    Research Scientist@NVIDIA

    Previously: Research Intern @ Adobe

    San Francisco, US

    • Research Scientist at NVIDIA with deep expertise across all three query dimensions: video AI (Video-to-Video Synthesis 2018, 1351 citations; slow-fast video multimodal LLM 2025), multimodal learning (Eagle/Eagle 2/Eagle 2.5 VLM series), and diffusion models (DiffiT, PYOCO video diffusion, 318 citations). h-index 24, strong publication record at NeurIPS/ECCV/ICCV
    • Hireability: MEDIUM — ~9 years at NVIDIA (long tenure), but pipeline signals show a position_update on website June 2025 suggesting possible role change; no explicit open-to-work signals
    GQ

    Guocheng Gordon Qian

    medium hireability

    Research Scientist@Snap

    Previously: Research Intern @ Snap

    San Francisco, US

    • Senior Research Scientist at Snap Research leading pretraining/post-training for 5B-30B VLM-diffusion models
    • Covers all three query dimensions: video generation with diffusion models (VD3D and AC3D — video diffusion transformers, 80+ citations each), multimodal learning (VLM-diffusion integration, Canvas-to-Image with multimodal controls), and diffusion models broadly (Magic123, ICLR24, 432 citations)
    • Some inference optimization work (GES efficient radiance field rendering, DELTAv2 accelerating 3D tracking, Dr2Net memory-efficient finetuning) but not his primary focus
    • PhD KAUST 2023, h-index 18
    • Hireability: MEDIUM — ~2-3 years at Snap (within transition window); website had moderately recent CV updates (56 days ago); no explicit open-to-work signal on GitHub or LinkedIn
    HW

    Haofan Wang

    medium hireability

    Member of Technical Staff@Lovart AI

    Previously: Senior Research Engineer @ Xiaohongshu

    Singapore, SG

    • Leading diffusion model researcher — InstantID (378 citations, 2024), InstantStyle (172 citations, 2024), EasyControl (2025 FLUX DiT control); founded InstantX open-source generative models team
    • MTS at Lovart AI (video generative AI startup)
    • Multimodal work includes video-language pre-training and CLIP/ECLIP
    • Inference optimization is implicit (InstantStyle 'free lunch' efficiency, EasyControl 'efficient' control) but not a primary focus
    • No PhD (MS CMU)
    • Hireability: MEDIUM — ~2 years at Lovart AI (joined 2024), within transition window; website actively updated through March 2026 with no explicit open-to-work signals
    HS

    Harry Saini

    medium hireability

    Weaver (Founding Research Engineer)@Black Forest Labs

    Previously: Research Engineer @ Stability AI

San Francisco, US

    • Founding Research Engineer at Black Forest Labs; co-authored 'Scaling Rectified Flow Transformers for High-Resolution Image Synthesis' (SD3, 2024) and FLUX.1 Kontext (2025) — core diffusion/flow-matching and multimodal image-text generation work
    • Strong match on diffusion models and multimodal learning; video AI is indirect (image generation background highly transferable to video diffusion); no direct inference optimisation evidence
    • Hireability: MEDIUM — founding team member at BFL (~2 years in role, within transition window), recently relocated from India to San Francisco per pipeline signals, no open-to-work signals detected
    HY

    Hongxu Yin

    medium hireability

    Principal Research Scientist@NVIDIA

    Previously: Senior II / Staff Research Scientist @ NVIDIA

    San Francisco, US

    • Principal Research Scientist & Research Lead at NVIDIA leading the VILA multimodal LLM series — video AI (AutoGaze CVPR 2026: 100x token reduction, 19x speedup on long-form video), multimodal learning (VILA/NVILA/LongVILA, 654+ citations), diffusion-guided generation (Loss-Guided Diffusion Models, 157 citations), and inference optimization (NVILA efficiency work). h-index 38
    • Hireability: MEDIUM — pipeline shows position_update signals Jul 2025 (likely internal promotion at NVIDIA), currently recruiting for own NVIDIA team; no open-to-work signals, but senior enough to be worth a targeted approach
    HL

    Huan Ling

    medium hireability

Research Manager@NVIDIA

Previously: Senior Research Scientist @ NVIDIA

    Toronto, CA

    • Core contributor to video diffusion research: co-authored 'Align your Latents' (1519 citations, foundational video LDM), Sana-Video (ICLR 2026 Oral, efficient Block Linear Diffusion Transformer — strong inference optimization angle), NVIDIA Cosmos/Cosmos-Transfer1 (multimodal physical AI world models), and Gen3C (video generation with camera control). h-index 21, Research Manager at NVIDIA
    • Website says 'Working on a startup and building a research team. We're hiring' — strong signal of career transition
    • Hireability: MEDIUM — promoted to Research Manager at NVIDIA in July 2025 (~9 months in role), but active startup-building signals suggest potential openness to conversations
    KK

    Karsten Kreis

    medium hireability

    Principal Research Scientist@NVIDIA

    Previously: Senior Research Scientist II @ NVIDIA

    Vancouver, CA

    • Principal Research Scientist at NVIDIA Research (Vancouver, H-index 32) with landmark contributions across all three query dimensions: authored 'Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models' (2023, 1507 citations), 'Align Your Steps: Optimizing Sampling Schedules in Diffusion Models' (2024, inference optimization), and 'eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers' (984 citations, multimodal)
    • Deep expertise in score-based and denoising diffusion models
    • Hireability: MEDIUM — senior/stable role at NVIDIA, but pipeline shows a position_update in March 2025 and recent GitHub activity (April 2026), signalling career motion; recent pivot toward protein/molecular design may indicate openness to new directions
    KA

    Kelsey R Allen

    medium hireability

    Senior Research Scientist@DeepMind

    Previously: Research Assistant @ UC Davis/Stanford

    London, GB

    • Senior RSc at DeepMind (London) with recent pivot toward diffusion models (Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models, ICLR 2024) and video AI (Scaling 4D Representations 2024; Direct Motion Models for Assessing Generated Videos 2025)
    • Also has VLM/multimodal papers in 2025
    • Primary background is cognitive science + physics simulation; inference optimisation is absent
    • Co-author with Aleksander Holynski (video generation researcher)
    • Hireability: MEDIUM — Senior RSc at DeepMind with no pipeline mobility signals, established role, no open-to-work indicators
    RG

    Ruiqi Gao

    medium hireability

    Staff Research Scientist@DeepMind

    Previously: Research Scientist @ Google

    San Francisco, US

    • Co-author of Imagen Video (video + diffusion models, 2022) and CAT4D (4D multi-view video diffusion, CVPR 2025)
    • Key inference optimization work: EM Distillation for one-step diffusion (NeurIPS 2024) and On Distillation of Guided Diffusion Models
    • Research at Google DeepMind spans multimodal learning across language, image, video, and 3D
    • Staff RS, h-index 24, based in SF
    • Hireability: MEDIUM — Staff RS at Google DeepMind (4+ years), but removed CV link from public website on April 10, 2026 (12 days ago) after a July 2025 cv_update — a subtle career-motion signal worth noting
    YL

    Yam Levi

    medium hireability

    Research Engineer@Black Forest Labs

    Previously: Data Research @ Stability AI

    Vancouver, CA

    • Founding Research Engineer at Black Forest Labs; co-authored 'Scaling Rectified Flow Transformers for High-Resolution Image Synthesis' (SD3/FLUX foundation paper, ICML 2024 Oral, 2896 citations) — direct evidence of diffusion model expertise and multimodal text-image architecture (MM-DiT bidirectional image+text token flow)
    • Works on the FLUX model series at BFL, which has expanded into video generation
    • Hireability: MEDIUM — founding team member at BFL (~1.5-2 years), title recently upgraded to 'Founding Research Engineer', no open-to-work signals detected
    ZE

    Zion English

    medium hireability

    Research Scientist@Black Forest Labs

    Previously: Machine Learning Engineer @ Stability AI

    Irvine, US

    • Core author on SDXL and SD3 (Scaling Rectified Flow Transformers) at Stability AI — two landmark diffusion model papers
    • Now Research Scientist at Black Forest Labs (co-author with BFL co-founder Andreas Blattmann), working on the FLUX models, which cover text-to-image and inference-optimised generation (FLUX.2 [klein] sub-second inference, 2026)
    • SD3 architecture uses bidirectional multimodal attention between image and text tokens
    • Based in Irvine, US
    • Hireability: MEDIUM — ~1.5-2 years into current role at BFL (founded mid-2024), no explicit open-to-work signals, within typical transition window
    AD

    Alex Dimakis

    low hireability

    Co-Founder and Chief Scientist@Bespoke Labs

    Previously: Professor @ University of Texas at Austin

    San Francisco, US

    • Tenured Full Professor at UC Berkeley EECS + Co-Founder/Chief Scientist at Bespoke Labs
    • Strong diffusion models researcher (h-index 74): Soft Diffusion, Ambient Diffusion, multiple NeurIPS papers on diffusion for inverse problems + DDIM-type sampler analysis
    • Multimodal: DataComp (611 citations) and DataComp-LM (155 citations)
    • Recent video AI work: Warped Diffusion (solving video inverse problems with image diffusion models, 2024), ego-exo viewpoint video papers (2024-2025)
    • Inference optimization is weakest pillar
    • Hireability: LOW — tenured professor and actively running his own startup (Bespoke Labs), no pipeline signals of career motion
    AS

    Alex Schwing

    low hireability

    Associate Professor@University of Illinois Urbana-Champaign

    Previously: Assistant Professor @ University of Illinois Urbana-Champaign

    Urbana-Champaign, US

    • Strong match across all three query dimensions: video AI (CVPR 2024 video object segmentation, ICCV 2023 Tracking Anything), multimodal learning (MMAudio CVPR 2025 multimodal video-to-audio synthesis), and diffusion/inference efficiency (DiT-Air 2025 diffusion architecture efficiency, Variational Rectified Flow ICML 2025)
    • H-index 74, leading vision/ML researcher at UIUC ECE
    • Hireability: LOW — tenured Associate Professor with AMD named fellowship, NSF CAREER Award, and active industry research collaborations (NVIDIA, Adobe, Microsoft); no job-seeking signals
    AH

    Ali Hatamizadeh

    low hireability

    Research Scientist@NVIDIA

    Previously: PhD student @ University of California, Los Angeles

    San Francisco, US

    • Research Scientist at NVIDIA with direct relevance across all 3 query dimensions: diffusion models (DiffiT, ECCV 2024, 122 citations), inference optimization (FasterViT, ICLR 2024, 148 citations), and multimodal/video-applicable vision backbones (MambaVision, CVPR 2025, 333 citations)
    • Also contributing to Mamba/SSM architecture research (Gated Delta Networks, ICLR 2025)
    • H-index 31, based in SF
    • Hireability: LOW — extremely productive at NVIDIA with back-to-back top venue papers (CVPR 2025, ICLR 2025, ICLR 2026), zero mobility signals in pipeline or GitHub, no open-to-work indicators anywhere. Very settled
    AK

    Angjoo Kanazawa

    low hireability

    Assistant Professor@University of California, Berkeley

    Previously: Research Scientist @ Google

    San Francisco, US

    • Top-tier video AI researcher at UC Berkeley (KAIR lab, h-index 55)
    • Directly relevant work across all three query dimensions: video AI (Shape of Motion, MegaSaM CVPR 2025 Best Paper HM, Segment Any Motion), diffusion models (Decentralized Diffusion Models CVPR 2025, Rethinking Score Distillation, State of the Art on Diffusion Models for Visual Computing survey), and inference optimization (NerfAcc efficient NeRF sampling, PlenOctrees real-time rendering)
    • Also Amazon Scholar on Frontier AI & Robotics team, ex-Google Research, ex-Luma AI CTA
    • Hireability: LOW — tenured-track professor at UC Berkeley, Sloan Fellow 2023, PAMI Young Researcher 2024, no pipeline signals of career movement. Exceptional candidate to reach out to but unlikely to move
    AS

    Axel Sauer

    low hireability

    Co-Founder@Black Forest Labs

    Previously: Research Scientist @ Stability AI

    Freiburg, DE

    • Co-Founder of Black Forest Labs (FLUX image generation models) and core author of Stable Diffusion 3 (2273 citations, ICML 2024 Best Paper)
    • ADD (Adversarial Diffusion Distillation) and LADD papers are directly relevant to inference optimisation in diffusion models
    • Image-focused, not video-specific, but his diffusion + inference optimisation expertise is highly query-relevant
    • Hireability: LOW — Co-Founder at BFL, actively building the company; very unlikely to be seeking new roles
    BM

    Ben Mildenhall

    low hireability

    Co-Founder@World Labs

    Previously: Research Scientist @ Google

    San Francisco, US

    • NeRF inventor and co-author of DreamFusion (text-to-3D via diffusion, 3.3K citations) and ReconFusion (3D reconstruction with diffusion priors) — directly relevant to diffusion models and multimodal learning
    • Inference optimization evidenced by MERF (memory-efficient radiance fields for real-time view synthesis) and Baking NeRF
    • Currently Co-Founder at World Labs building 3D world models
    • Hireability: LOW — co-founder of a well-funded startup (World Labs, led by Fei-Fei Li); no open-to-work signals; pipeline shows a single cv_update 8 months ago (Aug 2025), which is not a strong mobility signal
    BP

    Ben Poole

    low hireability

    Senior Staff Research Scientist@DeepMind

    Previously: Research Scientist @ Google

    San Francisco, US

    • Core video AI + diffusion researcher at Google DeepMind: leads GenMedia 3D team, senior author on Veo 3, Imagen Video (2022, 2034 citations), CAT4D (CVPR 2025 Oral), DreamFusion (text-to-3D, 3287 citations), Variational Diffusion Models (1546 citations), and EM Distillation for One-step Diffusion Models (2024, inference optimization)
    • PhD Stanford. h-index 45
    • Hireability: LOW — Senior Staff at DeepMind leading their flagship video generation program (Veo), no LinkedIn or website activity signals, just shipped Veo 3, deeply embedded
    BG

    Bernard Ghanem

    low hireability

    Advisor@CAMEL-AI.org

    Previously: Deputy Director of AI Initiative @ KAUST

    London, GB

    • Full Professor at KAUST and Chair of Center of Excellence for Generative AI; leads IVUL (Image and Video Understanding Lab)
    • Strong match on all three query dimensions: video AI (temporal action detection, video generation), multimodal learning (multimodal egocentric datasets, BOLT for long-form video), and diffusion/inference acceleration (Vivid-ZOO NeurIPS 2024, Adaptive Guidance for diffusion)
    • H-index 82
    • Hireability: LOW — tenured Full Professor at KAUST with multiple senior leadership roles (Chair of CoE for Generative AI, PI of IVUL); no pipeline signals of career transition; DB pre-computed hireability also rated low
    BZ

    Bolei Zhou

    low hireability

    Associate Professor@University of California, Los Angeles

    Previously: Chief AI Scientist @ Coco Robotics

    Los Angeles, US

    • Strong video AI and multimodal researcher at UCLA (h-index 78): Temporal Relation Networks for video understanding, audio-driven video portrait generation, diffusion-based scene generation ('Urban Scene Diffusion' 2024, 'Ctrl-X' 2024), and VLA/multimodal learning work (X-fusion 2025, co-speech gesture generation)
    • Inference optimization is a minor thread (QuantV2X quantization)
    • Hireability: LOW — recently promoted to Associate Professor at UCLA (pipeline shows position_updates May-June 2025), holds NSF CAREER + ONR Young Investigator + Intel Rising Star awards; well-entrenched in academia with no signals of seeking industry roles
    BC

    Brian Curless

    low hireability

    Researcher@Google

    Previously: Professor @ University of Washington

    Seattle, US

    • Active video AI researcher with strong recent output on diffusion models and video synthesis: 'MusicInfuser' (multimodal audio+video diffusion, 2025), 'Generative Inbetweening' (image-to-video keyframe interpolation, 2025), 'ExtraNeRF' (NeRF + diffusion models, 2024), 'HumanNeRF' (free-viewpoint video rendering, 2022, 665 citations), 'FILM' (video frame interpolation, 2022)
    • No inference optimization papers found
    • Hireability: LOW — tenured Professor at UW Allen School (h-index 67), Google Research collaborator; no signals of job search activity
    BC

    Bryan Catanzaro

    low hireability

    Vice President, Applied Deep Learning Research@NVIDIA

    Previously: Senior Researcher @ Baidu

    San Francisco, US

    • VP of Applied Deep Learning Research at NVIDIA (h-index 73); leads multimodal LLM work (NVLM, Eagle VLMs), pioneered Few-shot Video-to-Video Synthesis (2019), authored DiffWave (diffusion model), and co-created cuDNN — the foundational GPU inference library
    • Hits all three pillars: multimodal, diffusion/generative, and inference optimization
    • Hireability: LOW — long-tenured VP at NVIDIA with no pipeline or open-to-work signals
    DL

    Dahua Lin

    low hireability

    Associate Professor@The Chinese University of Hong Kong

    Previously: Assistant Professor @ The Chinese University of Hong Kong

    Hong Kong, HK

    • Prolific video AI + multimodal researcher (h-index 118) with directly relevant work: LaVie (cascaded latent diffusion for video generation, 417 citations), Vchitect-2.0 (parallel transformer for video diffusion, 2025), InternVL3 (open-source multimodal models, 370 citations), PyramidDrop (inference acceleration for vision-language models, 86 citations)
    • Director of CUHK-SenseTime Joint Laboratory
    • Hireability: LOW — tenured Associate Professor at CUHK since 2020 with no signals of career transition; typically this profile stays in academia
    DJ

    David Jacobs

    low hireability

    Professor@University of Maryland

    Previously: Engineering Manager @ Meta

    Bethesda, US

    • Tenured CS professor at UMD (h-index 68) with strong video AI output directly relevant to query: 'Preserve your own correlation: noise prior for video diffusion models' (2023, 318 cites), 'Long video generation with VQGAN+transformer' (2022, 293 cites), CinePile video QA benchmark (2024, 77 cites), plus 2025 papers on multimodal agentic control and video captioning
    • No specific inference optimisation work
    • Hireability: LOW — tenured professor with no pipeline signals of career transition or open-to-work indicators
    DL

    Dominik Lorenz

    low hireability

Researcher@Black Forest Labs

Previously: Researcher @ Stability AI

    Karlsruhe, DE

    • Core member of Robin Rombach's group — co-authored Stable Video Diffusion, Adversarial Diffusion Distillation (1–4 step inference), Latent Diffusion Models (Stable Diffusion), SD3, and multi-modal flow matching (image, video, audio)
    • Now at Black Forest Labs publishing FLUX.1 Kontext and self-supervised multi-modal synthesis (2026)
    • Directly matches query across video AI, diffusion models, and inference optimisation
    • Hireability: LOW — recently joined BFL (2024), actively embedded in the founding group, has followed Rombach/Blattmann/Esser through KIT → Stability AI → BFL; no open-to-work signals
    ER

    Elisa Ricci

    low hireability

    Head of Research Unit@Fondazione Bruno Kessler

    Previously: Associate Professor @ Università di Trento

    Trento, IT

    • Full Professor at UniTrento / Head of Research Unit at FBK (h-index 64)
    • Strong video AI profile: co-authored 'Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis' (2024, 102 cites), multiple diffusion-for-video papers (2023-2024), and 2025 work on multimodal LLMs for video
    • Diffusion models + video + multimodal trifecta clearly hit
    • Inference optimisation less explicit but test-time/training-free methods present
    • Hireability: LOW — Full Professor (confirmed OpenReview) + Head of Research Unit at FBK, no LinkedIn/website activity or job-change signals; tenured academic unlikely to leave
    HL

    Hsin-Ying Lee

    low hireability

    Researcher@stealth mode startup

    Previously: Research Scientist @ Snap

    • Senior researcher (h-index 34, PhD UC Merced 2020) with direct match on all 3 query dimensions: video AI (Panda-70m 2024, VD3D 2025, 4Real 2024), multimodal learning (cross-modality video captioning, Show Me What and Tell Me How), and diffusion models (video diffusion priors, 4D scene generation). 5 years at Snap Research
    • Hireability: LOW — recently left Snap to found or join a stealth startup focused on AI + 3D + architecture; likely building her own venture and unlikely to be open to external roles
    IG

    Igor Gilitschenski

    low hireability

    Assistant Professor@University of Toronto

    Previously: Research Scientist (visiting) @ Toyota Research Institute

    Toronto, CA

    • Lab director at UofT TISL with prolific, current output in video generation (SG-I2V image-to-video 2025, DenseDPO video diffusion 2025, Mind the Time 2025) and diffusion models (SlotDiffusion 68 cites, SPAD 48 cites)
    • Multimodal work via Vid2Robot (video-conditioned cross-attention transformers, 49 cites) and EventCLIP (event-camera + CLIP, 30 cites)
    • Some inference efficiency work (neural pruning 19 cites, efficient latent-space NeRF)
    • H-index 40
    • Hireability: LOW — actively running own research lab as Assistant Professor at UofT, no pipeline signals of career movement
    IK

    Ira Kemelmacher-Shlizerman

    low hireability

    Principal Scientist, Director@Google

    Previously: Senior Staff Research Scientist @ Google

    Seattle, US

    • Prolific video AI and diffusion model researcher (h-index 36): led TryOnDiffusion (2023, 189 cites), Fashion-VDM video diffusion model (2024), MusicInfuser multimodal video+audio generation (2025), and Generative Inbetweening for video models (2024)
    • Directly covers all three query pillars — video AI, diffusion models, and multimodal learning — with active 2024-2025 output
    • Hireability: LOW — tenured professor at UW + Principal Scientist/Director at Google with no mobility signals (no LinkedIn changes, no website activity detected)
    IS

    Ivan Skorokhodov

    low hireability

Researcher@RhodaAI

Previously: Research Scientist @ Snap

    San Francisco, US

    • Core video AI researcher and author of Snap Video (text-to-video spatiotemporal transformers, CVPR 2024, 95 cites), SF-V (single-forward-pass inference optimisation for video diffusion, 2024), Hierarchical Patch Diffusion Models for high-res video (CVPR 2024), and VIMI (multimodal instruction grounding for video generation)
    • Research expertise per DB: 'diffusion models, video generation, autoencoders, generative models'. h-index 21, PhD KAUST 2023
    • Based in Palo Alto, US
    • Hireability: LOW — GitHub bio updated to 'Research at RhodaAI' with pipeline position_update on Jan 15 2026; ~3 months into a new startup role, unlikely to move so soon
    JH

    Jia-Bin Huang

    low hireability

Associate Professor@University of Maryland

Previously: Research Scientist @ Meta

    College Park, US

    • Capital One endowed Associate Professor at UMD with strong video AI + diffusion model track record: PYoCo (noise prior for video diffusion, 318 citations), Latent-Shift (efficient latent diffusion for text-to-video, 151 citations), FlowVid (video-to-video synthesis via optical flow + diffusion)
    • Teaches a Multimodal Foundation Models course
    • Inference-efficiency angle present (Latent-Shift). h-index 69
    • Hireability: LOW — endowed associate professorship at UMD, actively recruiting PhD students for Fall 2026; no open-to-work signals; career firmly in academia
    JR

    Jian Ren

    low hireability

Head of Creative Tech Research@Netflix

Previously: Principal Research Scientist @ Snap Inc.

    Los Angeles, US

    • Former Principal Research Scientist at Snap Inc., now Head of Creative Tech Research at Netflix (joined via Netflix's acquisition of InterPositive, where he was founding CSO)
    • H-index 34
    • Covers all three query dimensions: SnapFusion (text-to-image diffusion on mobile in 2s — inference optimization), Panda-70M (70M video dataset with cross-modality captioning — video AI + multimodal), Snap Video (spatiotemporal transformers for text-to-video), plus BitsFusion (1.99-bit diffusion weight quantization) and EfficientFormer (mobile-speed vision transformers)
    • Hireability: LOW — joined Netflix through acquisition ~11 months ago, now in research leadership role; no open-to-work signals
    JZ

    Jun-Yan Zhu

    low hireability

    Michael B. Donohue Assistant Professor of Computer Science and Robotics@Carnegie Mellon University

    Previously: Research Scientist @ Adobe

    Pittsburgh, US

    • Exceptional match across all three query dimensions: (1) Video AI — MotionStream (ICLR 2026, real-time video generation), Video-to-Video Synthesis (NeurIPS 2018), multi-subject video personalization (CVPR 2025); (2) Diffusion models — SVDQuant (4-bit diffusion quantization, ICLR 2025), Custom Diffusion (CVPR 2023), SDEdit (2K+ citations), img2img-turbo (one-step SD); (3) Inference optimization — SVDQuant (4-bit quantization), GAN Compression, Efficient Spatially Sparse Inference for GANs/diffusion
    • H-index 59 with foundational work on CycleGAN and pix2pix
    • Hireability: LOW — tenure-track assistant professor at CMU running the Generative Intelligence Lab with active PhD students; no job-seeking signals anywhere (no LinkedIn changes, no website activity, neutral GitHub bio)
    KM

    Kevin Patrick Murphy

    low hireability

    Principal Scientist@DeepMind

    Previously: Senior Staff Research Scientist @ Google

    San Francisco, US

    • Exceptional fit: authored VideoBERT (1678 citations, video + multimodal), multiple 2024-2025 diffusion papers including EM Distillation for one-step diffusion (inference optimization), and 'Direct Motion Models for Assessing Generated Videos' (2025)
    • H-index 108
    • Principal Scientist at Google DeepMind, SF
    • Hireability: LOW — long-tenured, extremely established senior researcher at DeepMind with no pipeline signals of career motion (no LinkedIn changes, no website activity)
    KA

    Kfir Aberman

    low hireability

    Founding Member, US office@Decart

    Previously: Principal Research Scientist @ Snap

    San Francisco, US

    • World-class diffusion models researcher (DreamBooth 3.8K citations; HyperDreamBooth for fast inference-time personalization; VideoAlchemy + Multi-subject video personalization 2025)
    • MyVLM shows multimodal experience
    • H-index 30, SF-based, now Founding Member at Decart (video AI company)
    • Hireability: LOW — joined Decart ~3 months ago as Founding Member (high equity/commitment), DB pipeline confirms tenure_months=3 with high confidence
    MR

    Michael Rubinstein

    low hireability

    Principal Scientist@DeepMind

    Previously: Research Scientist @ Google

    Boston, US

    • Senior video AI researcher at Google DeepMind (Principal Scientist / Director)
    • Led Lumiere (2024, space-time video diffusion model), DreamBooth (CVPR 2023 Best Student Paper HM), Muse (text-to-image masked generative transformer), and StyleDrop — strong alignment with video generation, diffusion models, and multimodal learning
    • H-index 48
    • No inference optimization work specifically, but very strong on the video and diffusion dimensions
    • Hireability: LOW — entrenched Principal Scientist/Director at DeepMind, no LinkedIn changes or website activity detected, no open-to-work signals
    NW

    Neal Wadhwa

    low hireability

    Staff Software Engineer@Google

    Previously: Staff Software Engineer @ Google

    New York, US

    • PhD from MIT; H-index 30; Staff SWE at Google Research (NYC)
    • Strong fit on video AI (Phase-based Video Motion Processing, ReCapture 2024/2025 generative video camera controls) and diffusion models (HyperDreamBooth, 273 citations; RealFill image completion)
    • Weaker on inference optimization — no published work there, though his llama.cpp fork suggests practical interest
    • Hireability: LOW — ~9.5 years at Google with no open-to-work signals, no LinkedIn or website activity changes detected; actively publishing in 2025, which suggests he's comfortable where he is
    PE

    Patrick Esser

    low hireability

    PhD student@Heidelberg University

    • Co-founder of Black Forest Labs and co-creator of Stable Diffusion, Latent Diffusion Models, and FLUX.1
    • Directly relevant to all three query dimensions: video synthesis ("Structure and Content-Guided Video Synthesis with Diffusion Models", ICCV 2023), diffusion models (foundational LDM/SD work, SD3 rectified-flow paper, 2024), and inference optimization (LADD — adversarial diffusion distillation for 4-step synthesis, 2024)
    • H-index 19; 4,000+ citations for Taming Transformers alone
    • Hireability: LOW — co-founder of BFL (~50 people, founded 2024), actively shipping FLUX.1 Kontext (2025); unlikely to leave his own company
    RE

    Rahim Entezari

    low hireability

    Applied Scientist@Wayve

    Previously: Research Scientist @ Stability AI

    London, GB

    • SD3 co-author (2592 citations, ICML 2024 best paper) with strong diffusion model experience
    • Also on SD3.5-Flash (fast inference via flow distillation) and Stable Cinemetrics (professional video generation eval, NeurIPS 2025)
    • Multimodal background via DataComp (663 citations)
    • Research expertise: text-to-image/text-to-video, diffusion models, inference optimization, multimodal data curation
    • Based in London at Wayve
    • Hireability: LOW — joined Wayve ~Jan 2026 (~3-4 months ago), already promoted to Senior Applied Scientist; too new to be moving again
    RR

    Robin Rombach

    low hireability

    Researcher@Black Forest Labs

    Previously: Researcher @ Stability AI

    • Original creator of Stable Diffusion / Latent Diffusion Models (LDM) and SDXL; co-authored image-to-video synthesis (CVPR 2021); Stability AI generative-models repo includes Stable Video Diffusion
    • Strong match on diffusion models + multimodal (text-to-image conditioning)
    • Hireability: LOW — co-founder of Black Forest Labs (est. 2024, building FLUX frontier generative models), actively building own company
    SK

    Sumith Kulal

    low hireability

    co-founder and research scientist@Black Forest Labs

    Previously: research scientist @ Stability AI

    • Co-founder of Black Forest Labs and core contributor to FLUX.1 and SD3 (Scaling Rectified Flow Transformers)
    • Strong diffusion model background from Stability AI and Black Forest Labs; PhD from Stanford
    • Primarily image generation (FLUX.1), not video specifically, but foundational to the broader diffusion+multimodal domain the query targets
    • Hireability: LOW — co-founder actively building Black Forest Labs, no open-to-work signals, no LinkedIn changes or recent CV updates
    TD

    Tim Dockhorn

    low hireability

    Co-Founder@Black Forest Labs

    Previously: Research Scientist @ Stability AI

    Waterloo, CA

    • Co-founder & research scientist at Black Forest Labs (FLUX.1)
    • Deep diffusion model expertise: co-authored SDXL (ICLR 2024 Spotlight), Stable Video Diffusion, FLUX/SD3 (ICML 2024 Best Paper), and GENIE (higher-order denoising diffusion solvers — inference optimization)
    • PhD from the University of Waterloo; H-index 12
    • Hits all three query dimensions: video AI (SVD), diffusion models (SDXL/FLUX/SD3), and inference optimization (GENIE, LADD distillation)
    • Hireability: LOW — co-founder of a well-funded AI startup (BFL), actively building FLUX.1; no job-seeking signals and no website activity in 6+ months

    Runs

    #1 · completed · 0 qualified / 0 found · Apr 22, 1:25 AM