Yonatan Bitton

Senior Research Scientist at Google, CS PhD

The Hebrew University of Jerusalem

Biography

I am a Senior Research Scientist at Google Research in Tel Aviv, where I work on multimodal consistency.

My research centers on improving large vision-and-language models. I develop feedback models for text-to-image and text-to-video applications, designed to improve the alignment of generated visuals with their textual prompts. I also work on multimodal factuality, including visual understanding and image- and video-to-text evaluation, with the goal of ensuring that generated text is factually correct and attributable to trustworthy textual or visual sources.

I completed my PhD at the Hebrew University of Jerusalem, Israel, where I had the privilege of being advised by Dr. Roy Schwartz and Dr. Gabriel Stanovsky. A 🎬 recording of my PhD talk, "Bridging Vision and Language with Data: From Perception to Understanding", is available here. I did my MSc at Ben Gurion University with Prof. Michael Elhadad and Prof. Eitan Bachmat.

Download my complete CV.

Education
  • PhD in Computer Science (Vision-and-Language), 2020-2023

    The Hebrew University of Jerusalem, Israel

  • MSc in Computer Science (Natural Language Processing), Magna cum laude, 2018-2019

    Ben Gurion University of the Negev, Israel

  • BSc in Computer Science, 2015-2018

    Ben Gurion University of the Negev, Israel

Students

I've had the opportunity to collaborate with several MSc and PhD students towards their publication goals:

  • Mor Ventura - NL-Eye: Abductive NLI for Images
  • Orr Zohar - Video-STaR: Bootstrapping Weak Video Supervision for Visual Instruction Tuning
  • Hritik Bansal:
    • Video-Con: Robust Video-Language Alignment via Contrast Captions
    • TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation
    • VideoPhy: Evaluating Physical Commonsense in Video Generation (arXiv:2406.03520)
  • Brian Gordon - Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
  • Nitzan Bitton-Guetta:
    • WHOOPS! Commonsense-defying images with text-to-image models
    • Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models
  • Ron Yosef - IRFL: Figurative language and visual metaphors
  • Oren Sultan - ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies
  • Netta Madvil - Read, Look or Listen? Multimodal model analysis

If you want to work together on vision-and-language research, feel free to shoot me an email.

Work Experience

Senior Research Scientist
Google Research
April 2024 – Present, Israel
Advancing multimodal consistency: developing feedback models for text-to-image and text-to-video applications, and enhancing multimodal factuality to ensure the accuracy of text generated from visual sources.
Research Scientist
Google Research
June 2023 – April 2024, Israel
Focused on vision-and-language. Recent work includes image-text alignment, improving text-to-image models, and visual instruction tuning.
Research Intern
Google Research
July 2022 – June 2023, Israel
Cerebra team, Conversational AI. Worked with LLMs (LaMDA, PaLM, Bard, etc.).
Applied Scientist Intern
Amazon Lab126
October 2019 – July 2022, Israel
Visual Fitness - Halo team
Developed a virtual fitness trainer, specializing in 2D/3D pose estimation, action recognition, error correction, on-device deployment and more.
Research Student
IBM Research
June 2017 – October 2019, Israel
Applied data-science and machine-learning methods to fraud detection.

Invited Talks

Bridging Vision and Language with Data: From Perception to Understanding
Commonsense Benchmarks for Vision and Language
q2d: Turning Questions into Dialogs to Teach Models How to Search
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
VASR: Visual Analogies of Situation Recognition

Others

Managing Research
This talk addresses several research-related questions, for example: finding new research ideas, choosing a research topic, staying up to date with new research, working with your supervisors, and more.
AirPal
A platform that connects drone pilots with people in need of drone services.
This project took part in the Starter - Jump course and won first place at the final Demo Day.
Press coverage: telecomnews, israeldefense, sheva7.