Yonatan Bitton

Research Scientist at Google, CS PhD

The Hebrew University of Jerusalem


I am a Research Scientist at Google Research in Tel Aviv, where I work on vision-and-language. I completed my PhD at the Hebrew University of Jerusalem, Israel, where I had the privilege of being advised by Dr. Roy Schwartz and Dr. Gabriel Stanovsky.

The goal of my research is to improve generalization in vision-and-language models. Specifically, I aim to develop models that have stronger compositional abilities, are less biased, and perform better on real-world examples. My recent work and interests include image-text alignment, improving text-to-image models, and visual instruction tuning. See my publications for more details. A 🎬 recording of my PhD talk, "Bridging Vision and Language with Data: From Perception to Understanding", is available here.

I did my MSc with Prof. Michael Elhadad and Prof. Eitan Bachmat at Ben Gurion University.

Download my complete CV.

  • PhD in Computer Science (Vision-and-Language), 2020-2023

    The Hebrew University of Jerusalem, Israel

  • MSc in Computer Science (Natural Language Processing), Magna cum laude, 2018-2019

    Ben Gurion University of the Negev, Israel

  • BSc in Computer Science, 2015-2018

    Ben Gurion University of the Negev, Israel


I have had the opportunity to collaborate with several MSc and early-stage PhD students on their publications:

  • Brian Gordon - Mismatch Quest - Visual and Textual Feedback for Image-Text Misalignment

  • Hritik Bansal - Video-Con - Robust Video-Language Alignment via Contrast Captions

  • Oren Sultan - Analogies research project (in-progress)

  • Nitzan Bitton-Guetta - WHOOPS! - Commonsense-defying images created with text-to-image models

  • Netta Madvil - Read, Look or Listen? - Analysis of multimodal models

  • Ron Yosef - IRFL - Figurative language and visual metaphors

If you want to work together on vision-and-language research, feel free to shoot me an email.

Work Experience

Google Research
Research Scientist
Jun 2023 – Present, Israel
Focusing on vision-and-language. Recent works include image-text alignment, improving text-to-image models, and visual instruction tuning.
Google Research
Research Intern
Jul 2022 – Jun 2023, Israel
Cerebra team, Conversational AI. Worked with LLMs (LaMDA, PaLM, Bard, etc.).
Amazon Lab126
Applied Scientist Intern
Oct 2019 – Jul 2022, Israel
Visual Fitness - Halo team
Developed a virtual fitness trainer, specializing in 2D/3D pose estimation, action recognition, error correction, on-device deployment and more.
IBM Research
Research Student
Jun 2017 – Oct 2019, Israel
Used data science and machine learning methods to detect fraud.

Invited Talks

Bridging Vision and Language with Data: From Perception to Understanding
Commonsense Benchmarks for Vision and Language
q2d: Turning Questions into Dialogs to Teach Models How to Search
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
VASR: Visual Analogies of Situation Recognition


Managing Research
This talk covers several research-related questions, such as finding new research ideas, choosing a research topic, staying up to date with new research, working with your supervisors, and more.
A platform that connects drone pilots with people in need of drone services.
This project took part in the Starter - Jump course and won first place at the final Demo Day.
Press coverage: telecomnews, israeldefense, sheva7.