Skip to main content

Showing 1–8 of 8 results for author: Ayan, B K

  1. arXiv:2405.02793  [pdf, other

    cs.CV cs.CL

    ImageInWords: Unlocking Hyper-Detailed Image Descriptions

    Authors: Roopal Garg, Andrea Burns, Burcu Karagol Ayan, Yonatan Bitton, Ceslee Montgomery, Yasumasa Onoe, Andrew Bunner, Ranjay Krishna, Jason Baldridge, Radu Soricut

    Abstract: Despite the longstanding adage "an image is worth a thousand words," creating accurate and hyper-detailed image descriptions for training Vision-Language models remains challenging. Current datasets typically have web-scraped descriptions that are short, low-granularity, and often contain details unrelated to the visual content. As a result, models trained on such data generate descriptions replet… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Webpage (https://google.github.io/imageinwords), GitHub (https://github.com/google/imageinwords), HuggingFace (https://huggingface.co/datasets/google/imageinwords)

  2. arXiv:2402.01787  [pdf, other

    cs.CY cs.AI cs.LG

    Harm Amplification in Text-to-Image Models

    Authors: Susan Hao, Renee Shelby, Yuchi Liu, Hansa Srinivasan, Mukul Bhutani, Burcu Karagol Ayan, Ryan Poplin, Shivani Poddar, Sarah Laszlo

    Abstract: Text-to-image (T2I) models have emerged as a significant advancement in generative AI; however, there exist safety concerns regarding their potential to produce harmful image outputs even when users input seemingly safe prompts. This phenomenon, where T2I models generate harmful representations that were not explicit in the input, poses a potentially greater risk than adversarial prompts, leaving… ▽ More

    Submitted 17 May, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  3. arXiv:2209.06794  [pdf, other

    cs.CV cs.CL

    PaLI: A Jointly-Scaled Multilingual Language-Image Model

    Authors: Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner , et al. (4 additional authors not shown)

    Abstract: Effective scaling and a flexible task interface enable large language models to excel at many tasks. We present PaLI (Pathways Language and Image model), a model that extends this approach to the joint modeling of language and vision. PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages. To train PaL… ▽ More

    Submitted 5 June, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: ICLR 2023 (Notable-top-5%)

  4. arXiv:2206.10789  [pdf, other

    cs.CV cs.LG

    Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

    Authors: Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu

    Abstract: We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge. Parti treats text-to-image generation as a sequence-to-sequence modeling problem, akin to machine translation, with sequences of image tokens as the target outputs rather than text tokens in a… ▽ More

    Submitted 21 June, 2022; originally announced June 2022.

    Comments: Preprint

  5. arXiv:2205.11487  [pdf, other

    cs.CV cs.LG

    Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

    Authors: Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi

    Abstract: We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only c… ▽ More

    Submitted 23 May, 2022; originally announced May 2022.

  6. arXiv:2101.00391  [pdf, other

    cs.CL

    Which Linguist Invented the Lightbulb? Presupposition Verification for Question-Answering

    Authors: Najoung Kim, Ellie Pavlick, Burcu Karagol Ayan, Deepak Ramachandran

    Abstract: Many Question-Answering (QA) datasets contain unanswerable questions, but their treatment in QA systems remains primitive. Our analysis of the Natural Questions (Kwiatkowski et al. 2019) dataset reveals that a substantial portion of unanswerable questions ($\sim$21%) can be explained based on the presence of unverifiable presuppositions. We discuss the shortcomings of current models in handling su… ▽ More

    Submitted 3 September, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: ACL 2021 Camera-ready

  7. arXiv:2005.08469  [pdf, other

    cs.CL

    Text Classification with Few Examples using Controlled Generalization

    Authors: Abhijit Mahabal, Jason Baldridge, Burcu Karagol Ayan, Vincent Perot, Dan Roth

    Abstract: Training data for text classification is often limited in practice, especially for applications with many output classes or involving many related classification problems. This means classifiers must generalize from limited evidence, but the manner and extent of generalization is task dependent. Current practice primarily relies on pre-trained word embeddings to map words unseen in training to sim… ▽ More

    Submitted 18 May, 2020; originally announced May 2020.

    Journal ref: Proceedings of NAACL-HLT 2019

  8. arXiv:2005.08319  [pdf, other

    cs.CL

    Context-Based Quotation Recommendation

    Authors: Ansel MacLaughlin, Tao Chen, Burcu Karagol Ayan, Dan Roth

    Abstract: While composing a new document, anything from a news article to an email or essay, authors often utilize direct quotes from a variety of sources. Although an author may know what point they would like to make, selecting an appropriate quote for the specific context may be time-consuming and difficult. We therefore propose a novel context-aware quote recommendation system which utilizes the content… ▽ More

    Submitted 19 August, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

    Comments: 12 pages, 3 figures