Skip to main content

Showing 1–8 of 8 results for author: Sheynin, S

  1. arXiv:2403.09334  [pdf, other

    cs.CV

    Video Editing via Factorized Diffusion Distillation

    Authors: Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman

    Abstract: We introduce Emu Video Edit (EVE), a model that establishes a new state-of-the art in video editing without relying on any supervised video editing data. To develop EVE we separately train an image editing adapter and a video generation adapter, and attach both to the same text-to-image model. Then, to align the adapters towards video editing we introduce a new unsupervised distillation procedure,… ▽ More

    Submitted 24 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  2. arXiv:2311.10089  [pdf, other

    cs.CV cs.AI cs.LG

    Emu Edit: Precise Image Editing via Recognition and Generation Tasks

    Authors: Shelly Sheynin, Adam Polyak, Uriel Singer, Yuval Kirstain, Amit Zohar, Oron Ashual, Devi Parikh, Yaniv Taigman

    Abstract: Instruction-based image editing holds immense potential for a variety of applications, as it enables users to perform any editing operation using a natural language instruction. However, current models in this domain often struggle with accurately executing user instructions. We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editin… ▽ More

    Submitted 16 November, 2023; originally announced November 2023.

  3. arXiv:2309.02591  [pdf, other

    cs.LG cs.CL cs.CV

    Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

    Authors: Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, Candace Ross, Adam Polyak, Russell Howes, Vasu Sharma, Puxin Xu, Hovhannes Tamoyan, Oron Ashual, Uriel Singer, Shang-Wen Li, Susan Zhang, Richard James, Gargi Ghosh, Yaniv Taigman, Maryam Fazel-Zarandi, Asli Celikyilmaz , et al. (2 additional authors not shown)

    Abstract: We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted fr… ▽ More

    Submitted 5 September, 2023; originally announced September 2023.

  4. arXiv:2301.11280  [pdf, other

    cs.CV cs.AI cs.LG

    Text-To-4D Dynamic Scene Generation

    Authors: Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman

    Abstract: We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model. The dynamic video output generated from the provided text can be viewed from any camera locat… ▽ More

    Submitted 26 January, 2023; originally announced January 2023.

  5. arXiv:2204.02849  [pdf, other

    cs.CV cs.AI cs.CL cs.GR cs.LG

    KNN-Diffusion: Image Generation via Large-Scale Retrieval

    Authors: Shelly Sheynin, Oron Ashual, Adam Polyak, Uriel Singer, Oran Gafni, Eliya Nachmani, Yaniv Taigman

    Abstract: Recent text-to-image models have achieved impressive results. However, since they require large-scale datasets of text-image pairs, it is impractical to train them on new domains where data is scarce or not labeled. In this work, we propose using large-scale retrieval methods, in particular, efficient k-Nearest-Neighbors (kNN), which offers novel capabilities: (1) training a substantially small an… ▽ More

    Submitted 2 October, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

  6. arXiv:2203.13131  [pdf, other

    cs.CV cs.AI cs.CL cs.GR cs.LG

    Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

    Authors: Oran Gafni, Adam Polyak, Oron Ashual, Shelly Sheynin, Devi Parikh, Yaniv Taigman

    Abstract: Recent text-to-image generation methods provide a simple yet exciting conversion capability between text and image domains. While these methods have incrementally improved the generated image fidelity and text relevancy, several pivotal gaps remain unanswered, limiting applicability and quality. We propose a novel text-to-image method that addresses these gaps by (i) enabling a simple control mech… ▽ More

    Submitted 24 March, 2022; originally announced March 2022.

  7. arXiv:2112.05080  [pdf, other

    cs.CV cs.AI

    Locally Shifted Attention With Early Global Integration

    Authors: Shelly Sheynin, Sagie Benaim, Adam Polyak, Lior Wolf

    Abstract: Recent work has shown the potential of transformers for computer vision applications. An image is first partitioned into patches, which are then used as input tokens for the attention mechanism. Due to the expensive quadratic cost of the attention mechanism, either a large patch size is used, resulting in coarse-grained global interactions, or alternatively, attention is applied only on a local re… ▽ More

    Submitted 22 December, 2021; v1 submitted 9 December, 2021; originally announced December 2021.

  8. arXiv:2104.14535  [pdf, other

    cs.CV cs.LG

    A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection

    Authors: Shelly Sheynin, Sagie Benaim, Lior Wolf

    Abstract: Anomaly detection, the task of identifying unusual samples in data, often relies on a large set of training samples. In this work, we consider the setting of few-shot anomaly detection in images, where only a few images are given at training. We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image. We further enhance the representation of o… ▽ More

    Submitted 29 April, 2021; originally announced April 2021.