
Showing 1–18 of 18 results for author: Gandelsman, Y

  1. arXiv:2406.09413

    cs.CV cs.GR cs.LG

    Interpreting the Weight Space of Customized Diffusion Models

    Authors: Amil Dravid, Yossi Gandelsman, Kuan-Chieh Wang, Rameen Abdal, Gordon Wetzstein, Alexei A. Efros, Kfir Aberman

    Abstract: We investigate the space of weights spanned by a large collection of customized diffusion models. We populate this space by creating a dataset of over 60,000 models, each of which is a base model fine-tuned to insert a different person's visual identity. We model the underlying manifold of these weights as a subspace, which we term weights2weights. We demonstrate three immediate applications of th…

    Submitted 17 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Project Page: https://snap-research.github.io/weights2weights

  2. arXiv:2406.04341

    cs.CV

    Interpreting the Second-Order Effects of Neurons in CLIP

    Authors: Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt

    Abstract: We interpret the function of individual neurons in CLIP by automatically describing them using text. Analyzing the direct effects (i.e. the flow from a neuron through the residual stream to the output) or the indirect effects (overall contribution) fails to capture the neurons' function in CLIP. Therefore, we present the "second-order lens", analyzing the effect flowing from a neuron through the l…

    Submitted 23 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: project page: https://yossigandelsman.github.io/clip_neurons/index.html

  3. arXiv:2404.03652

    cs.CV

    The More You See in 2D, the More You Perceive in 3D

    Authors: Xinyang Han, Zelin Gao, Angjoo Kanazawa, Shubham Goel, Yossi Gandelsman

    Abstract: Humans can infer 3D structure from 2D images of an object based on past experience and improve their 3D understanding as they see more images. Inspired by this behavior, we introduce SAP3D, a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images. Given a few unposed images of an object, we adapt a pre-trained view-conditioned diffusion model together with…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Project page: https://sap3d.github.io/

  4. arXiv:2401.10889

    cs.CV cs.AI

    Synthesizing Moving People with 3D Control

    Authors: Boyi Li, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik

    Abstract: In this paper, we present a diffusion model-based framework for animating people from a single image for a given target 3D motion sequence. Our approach has two core components: a) learning priors about invisible parts of the human body and clothing, and b) rendering novel body poses with proper clothing and texture. For the first part, we learn an in-filling diffusion model to hallucinate unseen…

    Submitted 19 January, 2024; originally announced January 2024.

  5. arXiv:2312.01771

    cs.CV

    IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks

    Authors: Jiarui Xu, Yossi Gandelsman, Amir Bar, Jianwei Yang, Jianfeng Gao, Trevor Darrell, Xiaolong Wang

    Abstract: In-context learning allows adapting a model to new tasks given a task description at test time. In this paper, we present IMProv - a generative model that is able to in-context learn visual tasks from multimodal prompts. Given a textual description of a visual task (e.g. "Left: input image, Right: foreground segmentation"), a few input-output visual examples, or both, the model in-context learns t…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Project page: https://jerryxu.net/IMProv

  6. arXiv:2311.01462

    cs.CV cs.LG

    Idempotent Generative Network

    Authors: Assaf Shocher, Amil Dravid, Yossi Gandelsman, Inbar Mosseri, Michael Rubinstein, Alexei A. Efros

    Abstract: We propose a new approach for generative modeling based on training a neural network to be idempotent. An idempotent operator is one that can be applied sequentially without changing the result beyond the initial application, namely $f(f(z))=f(z)$. The proposed model $f$ is trained to map a source distribution (e.g., Gaussian noise) to a target distribution (e.g., realistic images) using the followi…

    Submitted 2 November, 2023; originally announced November 2023.

  7. arXiv:2310.05916

    cs.CV cs.AI

    Interpreting CLIP's Image Representation via Text-Based Decomposition

    Authors: Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt

    Abstract: We investigate the CLIP image encoder by analyzing how individual model components affect the final representation. We decompose the image representation as a sum across individual image patches, model layers, and attention heads, and use CLIP's text representation to interpret the summands. Interpreting the attention heads, we characterize each head's role by automatically finding text representa…

    Submitted 28 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Project page and code: https://yossigandelsman.github.io/clip_decomposition/

  8. arXiv:2309.17123

    cs.CV cs.LG

    Reconstruction of Patient-Specific Confounders in AI-based Radiologic Image Interpretation using Generative Pretraining

    Authors: Tianyu Han, Laura Žigutytė, Luisa Huck, Marc Huppertz, Robert Siepmann, Yossi Gandelsman, Christian Blüthgen, Firas Khader, Christiane Kuhl, Sven Nebelung, Jakob Kather, Daniel Truhn

    Abstract: Detecting misleading patterns in automated diagnostic assistance systems, such as those powered by Artificial Intelligence, is critical to ensuring their reliability, particularly in healthcare. Current techniques for evaluating deep learning models cannot visualize confounding factors at a diagnostic level. Here, we propose a self-conditioned diffusion model termed DiffChest and train it on a dat…

    Submitted 29 September, 2023; originally announced September 2023.

  9. arXiv:2307.05014

    cs.CV cs.LG

    Test-Time Training on Video Streams

    Authors: Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang

    Abstract: Prior work has established test-time training (TTT) as a general framework to further improve a trained model at test time. Before making a prediction on each test instance, the model is trained on the same instance using a self-supervised task, such as image reconstruction with masked autoencoders. We extend TTT to the streaming setting, where multiple test instances - video frames in our case -…

    Submitted 12 July, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: Project website with videos, dataset and code: https://video-ttt.github.io/

  10. arXiv:2306.09346

    cs.CV

    Rosetta Neurons: Mining the Common Units in a Model Zoo

    Authors: Amil Dravid, Yossi Gandelsman, Alexei A. Efros, Assaf Shocher

    Abstract: Do different neural networks, trained for various vision tasks, share some common representations? In this paper, we demonstrate the existence of common features we call "Rosetta Neurons" across a range of models with different architectures, different tasks (generative and discriminative), and different types of supervision (class-supervised, text-supervised, self-supervised). We present an algor…

    Submitted 16 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Project page: https://yossigandelsman.github.io/rosetta_neurons/

  11. arXiv:2209.07522

    cs.CV cs.LG

    Test-Time Training with Masked Autoencoders

    Authors: Yossi Gandelsman, Yu Sun, Xinlei Chen, Alexei A. Efros

    Abstract: Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision. In this paper, we use masked autoencoders for this one-sample learning problem. Empirically, our simple method improves generalization on many visual benchmarks for distribution shifts. Theoretically, we characterize this improvement in terms of the bias-variance trade-…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: Project page: https://yossigandelsman.github.io/ttt_mae/index.html

  12. arXiv:2209.00647

    cs.CV

    Visual Prompting via Image Inpainting

    Authors: Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A. Efros

    Abstract: How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image, consistent with the given examples. We show that posing…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: Project page: https://yossigandelsman.github.io/visual_prompt

  13. arXiv:2203.17272

    cs.CV cs.GR cs.LG

    MyStyle: A Personalized Generative Prior

    Authors: Yotam Nitzan, Kfir Aberman, Qiurui He, Orly Liba, Michal Yarom, Yossi Gandelsman, Inbar Mosseri, Yael Pritch, Daniel Cohen-Or

    Abstract: We introduce MyStyle, a personalized deep generative prior trained with a few shots of an individual. MyStyle allows one to reconstruct, enhance and edit images of a specific person, such that the output is faithful to the person's key facial characteristics. Given a small reference set of portrait images of a person (~100), we tune the weights of a pretrained StyleGAN face generator to form a local,…

    Submitted 6 October, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: SIGGRAPH ASIA 2022, Project webpage: https://mystyle-personalized-prior.github.io/, Video: https://youtu.be/QvOdQR3tlOc

  14. arXiv:2112.05814

    cs.CV

    Deep ViT Features as Dense Visual Descriptors

    Authors: Shir Amir, Yossi Gandelsman, Shai Bagon, Tali Dekel

    Abstract: We study the use of deep features extracted from a pretrained Vision Transformer (ViT) as dense visual descriptors. We observe and empirically demonstrate that such features, when extracted from a self-supervised ViT model (DINO-ViT), exhibit several striking properties, including: (i) the features encode powerful, well-localized semantic information, at high spatial granularity, such as object par…

    Submitted 15 October, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: Revised version - high res figures

  15. arXiv:2109.01980

    cs.CV cs.GR cs.LG

    Deep Saliency Prior for Reducing Visual Distraction

    Authors: Kfir Aberman, Junfeng He, Yossi Gandelsman, Inbar Mosseri, David E. Jacobs, Kai Kohlhoff, Yael Pritch, Michael Rubinstein

    Abstract: Using only a model that was trained to predict where people look at images, and no additional training data, we can produce a range of powerful editing effects for reducing distraction in images. Given an image and a mask specifying the region to edit, we backpropagate through a state-of-the-art saliency model to parameterize a differentiable editing operator, such that the saliency within the mas…

    Submitted 4 September, 2021; originally announced September 2021.

    Comments: https://deep-saliency-prior.github.io/

  16. arXiv:2104.13369

    cs.CV cs.LG cs.NE eess.IV stat.ML

    Explaining in Style: Training a GAN to explain a classifier in StyleSpace

    Authors: Oran Lang, Yossi Gandelsman, Michal Yarom, Yoav Wald, Gal Elidan, Avinatan Hassidim, William T. Freeman, Phillip Isola, Amir Globerson, Michal Irani, Inbar Mosseri

    Abstract: Image classification models can depend on multiple different semantic attributes of the image. An explanation of the decision of the classifier needs to both discover and visualize these properties. Here we present StylEx, a method for doing this, by training a generative model to specifically explain multiple attributes that underlie classifier decisions. A natural source for such attributes is t…

    Submitted 1 September, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted to ICCV 2021. Project page: https://explaining-in-style.github.io/, Code: https://github.com/google/explaining-in-style

  17. arXiv:2003.06221

    cs.CV cs.LG

    Semantic Pyramid for Image Generation

    Authors: Assaf Shocher, Yossi Gandelsman, Inbar Mosseri, Michal Yarom, Michal Irani, William T. Freeman, Tali Dekel

    Abstract: We present a novel GAN-based model that utilizes the space of deep features learned by a pre-trained classification model. Inspired by classical image pyramid representations, we construct our model as a Semantic Generation Pyramid -- a hierarchical framework which leverages the continuum of semantic information encapsulated in such deep features; this ranges from low level information contained i…

    Submitted 16 March, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

    Journal ref: IEEE Conference on Computer Vision and Pattern Recognition, 2020. CVPR 2020

  18. arXiv:1812.00467

    cs.CV cs.LG

    "Double-DIP": Unsupervised Image Decomposition via Coupled Deep-Image-Priors

    Authors: Yossi Gandelsman, Assaf Shocher, Michal Irani

    Abstract: Many seemingly unrelated computer vision tasks can be viewed as a special case of image decomposition into separate layers. For example, image segmentation (separation into foreground and background layers); transparent layer separation (into reflection and transmission layers); image dehazing (separation into a clear image and a haze map), and more. In this paper we propose a unified framework fo…

    Submitted 5 December, 2018; v1 submitted 2 December, 2018; originally announced December 2018.

    Comments: Project page: http://www.wisdom.weizmann.ac.il/~vision/DoubleDIP/
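Entry 6 (Idempotent Generative Network) defines an idempotent operator as one for which a second application changes nothing: $f(f(z))=f(z)$. The sketch below only illustrates that definition. It checks the property numerically for two hand-written functions. It is not the paper's network or training procedure. The helpers `clip_to_unit`, `double`, and `is_idempotent_on` are hypothetical names chosen for this example.

```python
def clip_to_unit(z: list[float]) -> list[float]:
    """Clamp each coordinate to [0, 1]. Clamping twice equals clamping once, so this map is idempotent."""
    return [min(1.0, max(0.0, v)) for v in z]


def double(z: list[float]) -> list[float]:
    """Scale each coordinate by 2. Applying it twice scales by 4, so it is not idempotent for nonzero z."""
    return [2.0 * v for v in z]


def is_idempotent_on(f, z: list[float], tol: float = 1e-12) -> bool:
    """Check f(f(z)) == f(z) for a single input z, up to a small tolerance."""
    once = f(z)
    twice = f(once)
    return all(abs(a - b) <= tol for a, b in zip(once, twice))


z = [-0.5, 0.3, 1.7]
print(clip_to_unit(z))                    # [0.0, 0.3, 1.0]
print(is_idempotent_on(clip_to_unit, z))  # True
print(is_idempotent_on(double, z))        # False
```

A check on a single input can only refute idempotence, not prove it. A function passes here only if $f(f(z))=f(z)$ holds for the specific `z` tested.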