Skip to main content

Showing 1–19 of 19 results for author: Mosseri, I

  1. arXiv:2407.08674  [pdf, other

    cs.CV

    Still-Moving: Customized Video Generation without Customized Video Data

    Authors: Hila Chefer, Shiran Zada, Roni Paiss, Ariel Ephrat, Omer Tov, Michael Rubinstein, Lior Wolf, Tali Dekel, Tomer Michaeli, Inbar Mosseri

    Abstract: Customizing text-to-image (T2I) models has seen tremendous progress recently, particularly in areas such as personalization, stylization, and conditional generation. However, expanding this progress to video generation is still in its infancy, primarily due to the lack of customized video data. In this work, we introduce Still-Moving, a novel generic framework for customizing a text-to-video (T2V)… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Webpage: https://still-moving.github.io/ | Video: https://www.youtube.com/watch?v=U7UuV_VIjnA

  2. arXiv:2401.12945  [pdf, other

    cs.CV

    Lumiere: A Space-Time Diffusion Model for Video Generation

    Authors: Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri

    Abstract: We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synth… ▽ More

    Submitted 5 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: Webpage: https://lumiere-video.github.io/ | Video: https://www.youtube.com/watch?v=wxLr02Dz2Sc

  3. arXiv:2311.01462  [pdf, other

    cs.CV cs.LG

    Idempotent Generative Network

    Authors: Assaf Shocher, Amil Dravid, Yossi Gandelsman, Inbar Mosseri, Michael Rubinstein, Alexei A. Efros

    Abstract: We propose a new approach for generative modeling based on training a neural network to be idempotent. An idempotent operator is one that can be applied sequentially without changing the result beyond the initial application, namely $f(f(z))=f(z)$. The proposed model $f$ is trained to map a source distribution (e.g, Gaussian noise) to a target distribution (e.g. realistic images) using the followi… ▽ More

    Submitted 2 November, 2023; originally announced November 2023.

  4. arXiv:2306.00966  [pdf, other

    cs.CV

    The Hidden Language of Diffusion Models

    Authors: Hila Chefer, Oran Lang, Mor Geva, Volodymyr Polosukhin, Assaf Shocher, Michal Irani, Inbar Mosseri, Lior Wolf

    Abstract: Text-to-image diffusion models have demonstrated an unparalleled ability to generate high-quality, diverse images from a textual prompt. However, the internal representations learned by these models remain an enigma. In this work, we present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model. This interpretation is obtained by decomposing t… ▽ More

    Submitted 5 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

  5. arXiv:2302.12066  [pdf, other

    cs.CV

    Teaching CLIP to Count to Ten

    Authors: Roni Paiss, Ariel Ephrat, Omer Tov, Shiran Zada, Inbar Mosseri, Michal Irani, Tali Dekel

    Abstract: Large vision-language models (VLMs), such as CLIP, learn rich joint image-text representations, facilitating advances in numerous downstream tasks, including zero-shot classification and text-to-image generation. Nevertheless, existing VLMs exhibit a prominent well-documented limitation - they fail to encapsulate compositional concepts such as counting. We introduce a simple yet effective method t… ▽ More

    Submitted 23 February, 2023; originally announced February 2023.

  6. arXiv:2210.09276  [pdf, other

    cs.CV

    Imagic: Text-Based Real Image Editing with Diffusion Models

    Authors: Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, Michal Irani

    Abstract: Text-conditioned image editing has recently attracted considerable interest. However, most methods are currently either limited to specific editing types (e.g., object overlay, style transfer), or apply to synthetically generated images, or require multiple input images of a common object. In this paper we demonstrate, for the very first time, the ability to apply complex (e.g., non-rigid) text-gu… ▽ More

    Submitted 20 March, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: Project page: https://imagic-editing.github.io/

  7. arXiv:2203.17272  [pdf, other

    cs.CV cs.GR cs.LG

    MyStyle: A Personalized Generative Prior

    Authors: Yotam Nitzan, Kfir Aberman, Qiurui He, Orly Liba, Michal Yarom, Yossi Gandelsman, Inbar Mosseri, Yael Pritch, Daniel Cohen-or

    Abstract: We introduce MyStyle, a personalized deep generative prior trained with a few shots of an individual. MyStyle allows to reconstruct, enhance and edit images of a specific person, such that the output is faithful to the person's key facial characteristics. Given a small reference set of portrait images of a person (~100), we tune the weights of a pretrained StyleGAN face generator to form a local,… ▽ More

    Submitted 6 October, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: SIGGRAPH ASIA 2022, Project webpage: https://mystyle-personalized-prior.github.io/, Video: https://youtu.be/QvOdQR3tlOc

  8. arXiv:2202.12211  [pdf, other

    cs.CV

    Self-Distilled StyleGAN: Towards Generation from Internet Photos

    Authors: Ron Mokady, Michal Yarom, Omer Tov, Oran Lang, Daniel Cohen-Or, Tali Dekel, Michal Irani, Inbar Mosseri

    Abstract: StyleGAN is known to produce high-fidelity images, while also offering unprecedented semantic editing. However, these fascinating abilities have been demonstrated only on a limited set of datasets, which are usually structurally aligned and well curated. In this paper, we show how StyleGAN can be adapted to work on raw uncurated images collected from the Internet. Such image collections impose two… ▽ More

    Submitted 24 February, 2022; originally announced February 2022.

  9. arXiv:2109.01980  [pdf, other

    cs.CV cs.GR cs.LG

    Deep Saliency Prior for Reducing Visual Distraction

    Authors: Kfir Aberman, Junfeng He, Yossi Gandelsman, Inbar Mosseri, David E. Jacobs, Kai Kohlhoff, Yael Pritch, Michael Rubinstein

    Abstract: Using only a model that was trained to predict where people look at images, and no additional training data, we can produce a range of powerful editing effects for reducing distraction in images. Given an image and a mask specifying the region to edit, we backpropagate through a state-of-the-art saliency model to parameterize a differentiable editing operator, such that the saliency within the mas… ▽ More

    Submitted 4 September, 2021; originally announced September 2021.

    Comments: https://deep-saliency-prior.github.io/

  10. Multiclass Permanent Magnets Superstructure for Indoor Localization using Artificial Intelligence

    Authors: Amir Ivry, Elad Fisher, Roger Alimi, Idan Mosseri, Kanna Nahir

    Abstract: Smartphones have become a popular tool for indoor localization and position estimation of users. Existing solutions mainly employ Wi-Fi, RFID, and magnetic sensing techniques to track movements in crowded venues. These are highly sensitive to magnetic clutters and depend on local ambient magnetic fields, which frequently degrades their performance. Also, these techniques often require pre-known ma… ▽ More

    Submitted 14 July, 2021; originally announced July 2021.

    Comments: Accepted to IEEE Transactions on Magnetics

    Journal ref: year 2021

  11. arXiv:2104.13369  [pdf, other

    cs.CV cs.LG cs.NE eess.IV stat.ML

    Explaining in Style: Training a GAN to explain a classifier in StyleSpace

    Authors: Oran Lang, Yossi Gandelsman, Michal Yarom, Yoav Wald, Gal Elidan, Avinatan Hassidim, William T. Freeman, Phillip Isola, Amir Globerson, Michal Irani, Inbar Mosseri

    Abstract: Image classification models can depend on multiple different semantic attributes of the image. An explanation of the decision of the classifier needs to both discover and visualize these properties. Here we present StylEx, a method for doing this, by training a generative model to specifically explain multiple attributes that underlie classifier decisions. A natural source for such attributes is t… ▽ More

    Submitted 1 September, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted to ICCV 2021. Project page: https://explaining-in-style.github.io/, Code: https://github.com/google/explaining-in-style

  12. arXiv:2005.13304  [pdf, other

    cs.DC

    ComPar: Optimized Multi-Compiler for Automatic OpenMP S2S Parallelization

    Authors: Idan Mosseri, Lee-or Alon, Re'em Harel, Gal Oren

    Abstract: Parallelization schemes are essential in order to exploit the full benefits of multi-core architectures. In said architectures, the most comprehensive parallelization API is OpenMP. However, the introduction of correct and optimal OpenMP parallelization to applications is not always a simple task, due to common parallel management pitfalls, architecture heterogeneity and the current necessity for… ▽ More

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: 15 pages

  13. arXiv:2004.06130  [pdf, other

    cs.CV

    SpeedNet: Learning the Speediness in Videos

    Authors: Sagie Benaim, Ariel Ephrat, Oran Lang, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Michal Irani, Tali Dekel

    Abstract: We wish to automatically predict the "speediness" of moving objects in videos---whether they move faster, at, or slower than their "natural" speed. The core component in our approach is SpeedNet---a novel deep network trained to detect if a video is playing at normal rate, or if it is sped up. SpeedNet is trained on a large corpus of natural videos in a self-supervised manner, without requiring an… ▽ More

    Submitted 26 July, 2020; v1 submitted 13 April, 2020; originally announced April 2020.

    Comments: Accepted to CVPR 2020 (oral). Project webpage: http://speednet-cvpr20.github.io

  14. arXiv:2003.06221  [pdf, other

    cs.CV cs.LG

    Semantic Pyramid for Image Generation

    Authors: Assaf Shocher, Yossi Gandelsman, Inbar Mosseri, Michal Yarom, Michal Irani, William T. Freeman, Tali Dekel

    Abstract: We present a novel GAN-based model that utilizes the space of deep features learned by a pre-trained classification model. Inspired by classical image pyramid representations, we construct our model as a Semantic Generation Pyramid -- a hierarchical framework which leverages the continuum of semantic information encapsulated in such deep features; this ranges from low level information contained i… ▽ More

    Submitted 16 March, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

    Journal ref: IEEE Conference on Computer Vision and Pattern Recognition, 2020. CVPR 2020

  15. arXiv:1910.06415  [pdf, other

    cs.DC cs.SE

    BACKUS: Comprehensive High-Performance Research Software Engineering Approach for Simulations in Supercomputing Systems

    Authors: Matan Rusanovsky, Re'em Harel, Lee-or Alon, Idan Mosseri, Harel Levin, Gal Oren

    Abstract: High-Performance Computing (HPC) platforms enable scientific software to achieve breakthroughs in many research fields such as physics, biology, and chemistry, by employing Research Software Engineering (RSE) techniques. These include 1) novel parallelism paradigms such as Shared Memory Parallelism (with e.g. OpenMP 4.5); Distributed Memory Parallelism (with e.g. MPI 4); Hybrid Parallelism which c… ▽ More

    Submitted 14 October, 2019; originally announced October 2019.

    Comments: 19 pages, 4 figures

  16. arXiv:1905.09773  [pdf, other

    cs.CV cs.MM

    Speech2Face: Learning the Face Behind a Voice

    Authors: Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Wojciech Matusik

    Abstract: How much can we infer about a person's looks from the way they speak? In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. We design and train a deep neural network to perform this task using millions of natural Internet/YouTube videos of people speaking. During training, our model learns voice-face correlations that al… ▽ More

    Submitted 23 May, 2019; originally announced May 2019.

    Comments: To appear in CVPR2019. Project page: http://speech2face.github.io

  17. arXiv:1804.03619  [pdf, other

    cs.SD cs.CV eess.AS

    Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation

    Authors: Ariel Ephrat, Inbar Mosseri, Oran Lang, Tali Dekel, Kevin Wilson, Avinatan Hassidim, William T. Freeman, Michael Rubinstein

    Abstract: We present a joint audio-visual model for isolating a single speech signal from a mixture of sounds such as other speakers and background noise. Solving this task using only audio as input is extremely challenging and does not provide an association of the separated speech signals with speakers in the video. In this paper, we present a deep network-based model that incorporates both visual and aud… ▽ More

    Submitted 9 August, 2018; v1 submitted 10 April, 2018; originally announced April 2018.

    Comments: Accepted to SIGGRAPH 2018. Project webpage: https://looking-to-listen.github.io

    Journal ref: ACM Trans. Graph. 37(4): 112:1-112:11 (2018)

  18. arXiv:1711.05139  [pdf, other

    cs.CV

    XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings

    Authors: Amélie Royer, Konstantinos Bousmalis, Stephan Gouws, Fred Bertsch, Inbar Mosseri, Forrester Cole, Kevin Murphy

    Abstract: Style transfer usually refers to the task of applying color and texture information from a specific style image to a given content image while preserving the structure of the latter. Here we tackle the more generic problem of semantic style transfer: given two unpaired collections of images, we aim to learn a mapping between the corpus-level style of each collection, while preserving semantic cont… ▽ More

    Submitted 10 July, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Comments: Domain Adaptation for Visual Understanding at ICML'18

  19. arXiv:1701.04851  [pdf, other

    cs.CV stat.ML

    Synthesizing Normalized Faces from Facial Identity Features

    Authors: Forrester Cole, David Belanger, Dilip Krishnan, Aaron Sarna, Inbar Mosseri, William T. Freeman

    Abstract: We present a method for synthesizing a frontal, neutral-expression image of a person's face given an input face photograph. This is achieved by learning to generate facial landmarks and textures from features extracted from a facial-recognition network. Unlike previous approaches, our encoding feature vector is largely invariant to lighting, pose, and facial expression. Exploiting this invariance,… ▽ More

    Submitted 17 October, 2017; v1 submitted 17 January, 2017; originally announced January 2017.