
Video ai researchers with experience in multimodal learning, diffusion models, a…

Status: completed · 9 qualified · 1 run · Apr 22, 1:47 AM
Run ID: video-ai-researchers-with-experience-in-multimodal-learning-1776822446
Parsed: OpenAI · 4 topics · Junior · Researcher · United States
Pipeline stages:
    1. Generating seed nodes — 0 proposed
    2. Exploring queries — 0/0 done
    3. Expanding nodes — queued
    4. Qualifying candidates — queued
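The four-stage run view above can be sketched as a small state model. This is a hypothetical reconstruction, not the tool's actual code: the `Stage`/`StageStatus` names and the `advance` helper are assumptions made for illustration, matching only the stage names and statuses shown in the dashboard.

```python
from dataclasses import dataclass
from enum import Enum

class StageStatus(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"

@dataclass
class Stage:
    name: str
    status: StageStatus = StageStatus.QUEUED
    done: int = 0      # progress counter, e.g. "0/0 done"
    total: int = 0

# The four stages shown in the run view, in order
pipeline = [
    Stage("Generating seed nodes", StageStatus.RUNNING),
    Stage("Exploring queries"),
    Stage("Expanding nodes"),
    Stage("Qualifying candidates"),
]

def advance(stages: list[Stage]) -> list[Stage]:
    """Mark the first not-yet-done stage as running; later stages stay queued."""
    for stage in stages:
        if stage.status is not StageStatus.DONE:
            stage.status = StageStatus.RUNNING
            break
    return stages
```

Under this sketch, "Expanding nodes" and "Qualifying candidates" remain `queued` until the earlier stages report `done`, which matches the widget states captured above.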

    Qualified Candidates (9)


    Soo Min Kwon

    high hireability

Graduate Research Assistant @ University of Michigan, Ann Arbor

    Previously: Student Researcher @ Google

    Ann Arbor, US

    • Final-year PhD student at UMich (ECE) specializing in diffusion models for inverse problems and efficient neural network inference
    • Strong query-relevant work: ICLR 2024 Spotlight on latent diffusion models (hard data consistency), NeurIPS 2024 BLAST paper on block-level structured matrices for efficient DNN inference
    • No explicit video AI or multimodal work, but diffusion model expertise directly transfers to video generation; inference optimization work (BLAST) is directly relevant
    • US-based (Ann Arbor), not from excluded companies
    • Hireability: HIGH — final-year PhD student in the prime transition window; Google Research NYC internship completed Nov 2025; 20 recent website changes, including multiple cv_update and position_update signals, indicating active job-market engagement

    Bikram Boote

    medium hireability

Graduate Research Assistant @ University of Illinois Urbana-Champaign

    Previously: Software Development Engineer @ Amazon

    Champaign, US

    • Strong video AI + multimodal researcher at UIUC Rehg Lab: Ego-Exo4D (CVPR 2024, 333 citations), CVPR 2024 Oral on multimodal social interaction modeling, ECCV 2024 point tracking
    • Egocentric vision and hand-object interaction are core expertise
    • No diffusion model or inference optimization work evident
    • US-based (Champaign, IL), PhD student ~yr 2-3, <3 years industry experience fits query
    • Hireability: MEDIUM — 2-3 years into PhD at UIUC, not yet in final-year transition window, no open-to-work signals detected

    Bipasha Sen

    medium hireability

Founder @ Stealth Startup

    Previously: Graduate Research Assistant @ MIT CSAIL

    San Francisco, US

    • Video AI researcher with INR-V (video generation, TMLR 2022), FaceOff (video-to-video face swapping, WACV 2023), diffusion models work (EDMP, 2024), and multimodal research (ConceptGraphs, lipreading)
    • MIT PhD, early-career, no OpenAI/DeepMind/xAI
    • Inference optimization not clearly evidenced
    • Hireability: MEDIUM — currently Founder at stealth startup (low by default), but GitHub bio explicitly states 'looking to work on the next big challenge!', signaling openness to new roles

    Daksh Aggarwal

    medium hireability

AI Research Summer Associate @ Balyasny Asset Management

    Previously: Undergraduate Researcher @ The Fields Institute For Research In Mathematical Sciences

    Austin, US

    • Published 'Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals' (NeurIPS 2025) and a follow-up 'Goal Force' (2026), both directly on video generation
    • Also co-authored 'Self-Correcting Self-Consuming Loops For Generative Model Training' (ICML 2024 + ICLR 2025) on stabilizing generative model training
    • Math/AI PhD student at Brown with clear pivot to video generation and generative models — 4 top-venue papers in 3 years
    • Not at excluded companies (summer intern at Balyasny hedge fund, not OpenAI/DeepMind/xAI)
    • Inference optimization is not covered, but video generation + diffusion training is strong
    • Hireability: MEDIUM — PhD student at Brown, started publishing 2022, likely 3rd-4th year; no explicit job-seeking signals but within normal transition window

    Fiona Ryan

    medium hireability

Graduate Researcher @ Georgia Institute of Technology

    Previously: Student Researcher @ Meta

    Atlanta, US

    • Strong video AI and multimodal learning researcher (Ego4D + Ego-Exo4D co-author, egocentric gaze estimation, audio-visual gaze anticipation, CVPR 2025 x3 including two Highlights)
    • No clear diffusion model work; inference optimization absent — Polar-VL uses LoRA-style parameter updates (adjacent to, but not, inference optimization)
    • US-based (Atlanta), <3 years industry experience (PhD student throughout)
    • Hireability: MEDIUM — just defended PhD dissertation (April 2026), but removed 'looking for postdoc opportunities' statement from website in Nov 2025, suggesting she may already have a role lined up

    Nate Gillman

    medium hireability

Research Intern @ Google

    Previously: Research Intern @ Amazon

    New York, US

    • Video generative modeling PhD student at Brown with NeurIPS 2025 paper on physics-based video generation (Force Prompting), ICML 2024 paper on generative model training (Self-Correcting Self-Consuming Loops), and ICLR 2025 paper on LLM distributions
    • Strong fit for video AI + diffusion models
    • No explicit inference optimization work
    • US-based (NY), <3 years work exp (internships at Google Research 2025, Amazon Science 2024)
    • Not at excluded companies
    • Hireability: MEDIUM — still active PhD student at Brown (latest commit Feb 2026), graduation timeline unclear from available signals

    Xinyu Hu

    medium hireability

Applied Scientist @ Microsoft

    Previously: MS student @ Stanford University

    • Currently building agentic RL for video AI independently (nomadic, SF-based); formerly Applied Scientist & Tech Lead at Microsoft AI on long-horizon multimodal reasoning
    • Strong diffusion models paper (Solving Inverse Problems with Latent Diffusion Models, 157 citations, ICLR adjacent), video generation metric paper (WYSIWYM 2024), and multimodal evaluation work
    • Core expertise: diffusion models, multimodal foundation models, LLMs
    • Stanford MS CME, CA-based
    • Work experience borderline: ~3-4 years post-MS (nominally above the <3-year requirement)
    • Hireability: MEDIUM — building independently (genmini-ai/OpenCanvas startup mode), website quiet for 7+ months; not explicitly on market but may be open given solo-building phase

    Zhengxu Tang

    medium hireability
    • PhD student at UMich (Liyue Shen lab) with strong diffusion model and video AI work: CCS controllable diffusion sampling (NeurIPS 2025 poster), latent space disentanglement in diffusion transformers, and SeqBench benchmarking text-to-video models
    • Solid multimodal learning background via vision-language pre-training
    • Inference optimization is weak (no dedicated speed/efficiency work)
    • Based in Ann Arbor, MI
    • Hireability: MEDIUM — appears 2-3 years into PhD with no explicit graduation signal or job market activity detected

    Zilai Zeng

    medium hireability

Ph.D. Student @ Brown University

    Previously: Research Intern @ ByteDance

    US

    • Video-centric ML researcher at Brown University; uses diffusion models for policy learning (NeurIPS 2024 'Text-Aware Diffusion for Policy Learning') and internet video knowledge for robotic tasks (ICLR 2025)
    • Strong fit on video AI + diffusion models, moderate on multimodal; no inference optimization work visible
    • Interned at ByteDance Seed
    • Hireability: MEDIUM — active PhD student (~year 4-5 based on 2023-2026 publication range), ByteDance internship shows industry interest, but no explicit job-seeking signals detected
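The hireability labels above (HIGH for a final-year student with active website signals, MEDIUM for mid-PhD or founder profiles) suggest a simple signal-weighting rubric. The sketch below is a toy reconstruction under assumed weights — the signal names (`final_year_phd`, `cv_update`, etc.) mirror terms used in the notes, but the scoring logic is hypothetical, not the tool's actual algorithm.

```python
def hireability(signals: dict) -> str:
    """Toy rubric: combine career-stage and job-market signals into a label.

    Weights are illustrative assumptions, not the tool's real scoring.
    """
    score = 0
    if signals.get("final_year_phd"):
        score += 2  # prime transition window
    if signals.get("cv_update") or signals.get("position_update"):
        score += 2  # active job-market engagement on personal site
    if signals.get("open_to_work_statement"):
        score += 2  # explicit openness, e.g. in a GitHub bio
    if signals.get("founder_at_startup"):
        score -= 1  # founders are low-hireability by default
    if score >= 4:
        return "high"
    if score >= 1:
        return "medium"
    return "low"
```

Under this rubric, a final-year PhD with cv_update signals scores high (as with Soo Min Kwon's profile), while a stealth-startup founder with an open-to-work bio nets out to medium (as with Bipasha Sen's).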

    Runs

    #1 · completed · 0 qualified / 0 found · Apr 22, 1:47 AM