I am a Research Scientist at Google Research in Tel-Aviv where I work on multimodal consistency.
My research focuses on improving large vision-and-language models. I develop feedback models for text-to-image and text-to-video applications, designed to improve the alignment of visual outputs with their textual prompts. I also work on multimodal factuality, including visual understanding and image-to-text and video-to-text evaluation, with the goal of ensuring that generated text is factually correct and attributable to trustworthy textual or visual sources.
I completed my PhD at the Hebrew University of Jerusalem, Israel, where I had the privilege of being advised by Dr. Roy Schwartz and Dr. Gabriel Stanovsky. A recording 🎬 of my PhD talk, "Bridging Vision and Language with Data: From Perception to Understanding", is available here. I did my MSc with Prof. Michael Elhadad and Prof. Eitan Bachmat at Ben Gurion University.
Download my complete CV.
PhD in Computer Science (Vision-and-Language), 2020-2023
The Hebrew University of Jerusalem, Israel
MSc in Computer Science (Natural Language Processing), Magna cum laude, 2018-2019
Ben Gurion University of the Negev, Israel
BSc in Computer Science, 2015-2018
Ben Gurion University of the Negev, Israel
I've had the opportunity to collaborate with several MSc and PhD students on their publications:
If you want to work together on vision-and-language research, feel free to shoot me an email.