Showing 1–19 of 19 results for author: Huh, M

  1. arXiv:2405.14813

    cs.LG

    Scalable Optimization in the Modular Norm

    Authors: Tim Large, Yang Liu, Minyoung Huh, Hyojin Bahng, Phillip Isola, Jeremy Bernstein

    Abstract: To improve performance in contemporary deep learning, one is interested in scaling up the neural network in terms of both the number and the size of the layers. When ramping up the width of a single layer, graceful scaling of training has been linked to the need to normalize the weights and their updates in the "natural norm" particular to that layer. In this paper, we significantly generalize thi…

    Submitted 23 May, 2024; originally announced May 2024.

  2. arXiv:2405.07987

    cs.LG cs.AI cs.CV cs.NE

    The Platonic Representation Hypothesis

    Authors: Minyoung Huh, Brian Cheung, Tongzhou Wang, Phillip Isola

    Abstract: We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure dis…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Equal contributions

  3. arXiv:2402.16828

    cs.LG cs.AI cs.CV

    Training Neural Networks from Scratch with Parallel Low-Rank Adapters

    Authors: Minyoung Huh, Brian Cheung, Jeremy Bernstein, Phillip Isola, Pulkit Agrawal

    Abstract: The scalability of deep learning models is fundamentally limited by computing resources, memory, and communication. Although methods like low-rank adaptation (LoRA) have reduced the cost of model finetuning, its application in model pre-training remains largely unexplored. This paper explores extending LoRA to model pre-training, identifying the inherent constraints and limitations of standard LoR…

    Submitted 26 February, 2024; originally announced February 2024.

  4. Making Short-Form Videos Accessible with Hierarchical Video Summaries

    Authors: Tess Van Daele, Akhil Iyer, Yuning Zhang, Jalyn C. Derry, Mina Huh, Amy Pavel

    Abstract: Short videos on platforms such as TikTok, Instagram Reels, and YouTube Shorts (i.e. short-form videos) have become a primary source of information and entertainment. Many short-form videos are inaccessible to blind and low vision (BLV) viewers due to their rapid visual changes, on-screen text, and music or meme-audio overlays. In our formative study, 7 BLV viewers who regularly watched short-form…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: To appear at CHI 2024

  5. arXiv:2401.06354

    cs.RO

    Initial Analysis of Data-Driven Haptic Search for the Smart Suction Cup

    Authors: Jungpyo Lee, Sebastian D. Lee, Tae Myung Huh, Hannah S. Stuart

    Abstract: Suction cups offer a useful gripping solution, particularly in industrial robotics and warehouse applications. Vision-based grasp algorithms, like Dex-Net, show promise but struggle to accurately perceive dark or reflective objects, sub-resolution features, and occlusions, resulting in suction cup grip failures. In our prior work, we designed the Smart Suction Cup, which estimates the flow state w…

    Submitted 21 October, 2023; originally announced January 2024.

  6. Haptic search with the Smart Suction Cup on adversarial objects

    Authors: Jungpyo Lee, Sebastian D. Lee, Tae Myung Huh, Hannah S. Stuart

    Abstract: Suction cups are an important gripper type in industrial robot applications, and prior literature focuses on using vision-based planners to improve grasping success in these tasks. Vision-based planners can fail due to adversarial objects or lose generalizability for unseen scenarios, without retraining learned algorithms. We propose haptic exploration to improve suction cup grasping when visual g…

    Submitted 15 January, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted final version to appear in the IEEE Transactions on Robotics

  7. arXiv:2307.07589

    cs.HC

    GenAssist: Making Image Generation Accessible

    Authors: Mina Huh, Yi-Hao Peng, Amy Pavel

    Abstract: Blind and low vision (BLV) creators use images to communicate with sighted audiences. However, creating or retrieving images is challenging for BLV creators as it is difficult to use authoring tools or assess image search results. Thus, creators limit the types of images they create or recruit sighted collaborators. While text-to-image generation models let creators generate high-fidelity images b…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: For accessibility tagged pdf, please refer to the ancillary file

  8. arXiv:2305.08842

    cs.LG cs.AI

    Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks

    Authors: Minyoung Huh, Brian Cheung, Pulkit Agrawal, Phillip Isola

    Abstract: This work examines the challenges of training neural networks using vector quantization using straight-through estimation. We find that a primary cause of training instability is the discrepancy between the model embedding and the code-vector distribution. We identify the factors that contribute to this issue, including the codebook gradient sparsity and the asymmetric nature of the commitment los…

    Submitted 15 May, 2023; originally announced May 2023.

  9. arXiv:2303.00510

    cs.SD cs.AI eess.AS

    A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit

    Authors: Mina Huh, Ruchira Ray, Corey Karnei

    Abstract: Data augmentations are known to improve robustness in speech-processing tasks. In this study, we summarize and compare different data augmentation strategies using S3PRL toolkit. We explore how HuBERT and wav2vec perform using different augmentation techniques (SpecAugment, Gaussian Noise, Speed Perturbation) for Phoneme Recognition (PR) and Automatic Speech Recognition (ASR) tasks. We evaluate mo…

    Submitted 29 March, 2024; v1 submitted 27 February, 2023; originally announced March 2023.

  10. AVscript: Accessible Video Editing with Audio-Visual Scripts

    Authors: Mina Huh, Saelyne Yang, Yi-Hao Peng, Xiang 'Anthony' Chen, Young-Ho Kim, Amy Pavel

    Abstract: Sighted and blind and low vision (BLV) creators alike use videos to communicate with broad audiences. Yet, video editing remains inaccessible to BLV creators. Our formative study revealed that current video editing tools make it difficult to access the visual content, assess the visual quality, and efficiently navigate the timeline. We present AVscript, an accessible text-based video editor. AVscr…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: CHI 2023

  11. arXiv:2209.13032

    cs.CV

    Totems: Physical Objects for Verifying Visual Integrity

    Authors: Jingwei Ma, Lucy Chai, Minyoung Huh, Tongzhou Wang, Ser-Nam Lim, Phillip Isola, Antonio Torralba

    Abstract: We introduce a new approach to image forensics: placing physical refractive objects, which we call totems, into a scene so as to protect any photograph taken of that scene. Totems bend and redirect light rays, thus providing multiple, albeit distorted, views of the scene within a single image. A defender can use these distorted totem pixels to detect if an image has been manipulated. Our approach…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: ECCV 2022 camera ready version; project page https://jingweim.github.io/totems/

  12. "It Feels Like Taking a Gamble": Exploring Perceptions, Practices, and Challenges of Using Makeup and Cosmetics for People with Visual Impairments

    Authors: Franklin Mingzhe Li, Franchesca Spektor, Meng Xia, Mina Huh, Peter Cederberg, Yuqi Gong, Kristen Shinohara, Patrick Carrington

    Abstract: Makeup and cosmetics offer the potential for self-expression and the reshaping of social roles for visually impaired people. However, there exist barriers to conducting a beauty regime because of the reliance on visual information and color variances in makeup. We present a content analysis of 145 YouTube videos to demonstrate visually impaired individuals' unique practices before, during, and aft…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: In CHI Conference on Human Factors in Computing Systems (CHI '22), April 29-May 5, 2022, New Orleans, LA, USA. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3491102.3517490

  13. arXiv:2110.15349

    cs.LG cs.AI cs.CL cs.MA

    Learning to Ground Multi-Agent Communication with Autoencoders

    Authors: Toru Lin, Minyoung Huh, Chris Stauffer, Ser-Nam Lim, Phillip Isola

    Abstract: Communication requires having a common language, a lingua franca, between agents. This language could emerge via a consensus process, but it may require many generations of trial and error. Alternatively, the lingua franca can be given by the environment, where agents ground their language in representations of the observed world. We demonstrate a simple way to ground language in learned represent…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: Project page, code, and videos can be found at https://toruowo.github.io/marl-ae-comm/

  14. arXiv:2105.02345

    cs.RO

    A Multi-Chamber Smart Suction Cup for Adaptive Gripping and Haptic Exploration

    Authors: Tae Myung Huh, Kate Sanders, Michael Danielczuk, Monica Li, Yunliang Chen, Ken Goldberg, Hannah S. Stuart

    Abstract: We present a novel robot end-effector for gripping and haptic exploration. Tactile sensing through suction flow monitoring is applied to a new suction cup design that contains multiple chambers for air flow. Each chamber connects with its own remote pressure transducer, which enables both absolute and differential pressure measures between chambers. By changing the overall vacuum applied to this s…

    Submitted 18 October, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

  15. arXiv:2103.10427

    cs.LG cs.CV

    The Low-Rank Simplicity Bias in Deep Networks

    Authors: Minyoung Huh, Hossein Mobahi, Richard Zhang, Brian Cheung, Pulkit Agrawal, Phillip Isola

    Abstract: Modern deep neural networks are highly over-parameterized compared to the data on which they are trained, yet they often generalize remarkably well. A flurry of recent work has asked: why do deep networks not overfit to their training data? In this work, we make a series of empirical observations that investigate and extend the hypothesis that deeper networks are inductively biased to find solutio…

    Submitted 23 March, 2023; v1 submitted 18 March, 2021; originally announced March 2021.

  16. arXiv:2005.01703

    cs.CV

    Transforming and Projecting Images into Class-conditional Generative Networks

    Authors: Minyoung Huh, Richard Zhang, Jun-Yan Zhu, Sylvain Paris, Aaron Hertzmann

    Abstract: We present a method for projecting an input image into the space of a class-conditional generative neural network. We propose a method that optimizes for transformation to counteract the model biases in generative neural networks. Specifically, we demonstrate that one can solve for image translation, scale, and global color transformation, during the projection optimization to address the object-c…

    Submitted 27 August, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

    Comments: Accepted to ECCV2020 (oral)

  17. Perceived Intensities of Normal and Shear Skin Stimuli using a Wearable Haptic Bracelet

    Authors: Mine Sarac, Tae Myung Huh, Hojung Choi, Mark Cutkosky, Massimiliano Di Luca, Allison M. Okamura

    Abstract: Our aim is to provide effective interaction with virtual objects, despite the lack of co-location of virtual and real-world contacts, while taking advantage of relatively large skin area and ease of mounting on the forearm. We performed two human participant studies to determine the effects of haptic feedback in the normal and shear directions during virtual manipulation using haptic devices worn…

    Submitted 12 April, 2022; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: 8 pages, In Press IEEE Robotic Automation Letters, IEEE International Conference on Robotics and Automation

  18. arXiv:1805.04096

    cs.CV

    Fighting Fake News: Image Splice Detection via Learned Self-Consistency

    Authors: Minyoung Huh, Andrew Liu, Andrew Owens, Alexei A. Efros

    Abstract: Advances in photo editing and manipulation tools have made it significantly easier to create fake imagery. Learning to detect such manipulations, however, remains a challenging problem due to the lack of sufficient amounts of manipulated training data. In this paper, we propose a learning algorithm for detecting visual image manipulations that is trained only using a large dataset of real photogra…

    Submitted 5 September, 2018; v1 submitted 10 May, 2018; originally announced May 2018.

  19. arXiv:1608.08614

    cs.CV cs.AI cs.LG

    What makes ImageNet good for transfer learning?

    Authors: Minyoung Huh, Pulkit Agrawal, Alexei A. Efros

    Abstract: The tremendous success of ImageNet-trained deep features on a wide range of transfer tasks begs the question: what are the properties of the ImageNet dataset that are critical for learning good, general-purpose features? This work provides an empirical investigation of various facets of this question: Is more pre-training data always better? How does feature quality depend on the number of trainin…

    Submitted 10 December, 2016; v1 submitted 30 August, 2016; originally announced August 2016.