Showing 1–26 of 26 results for author: Chou, S

  1. arXiv:2401.12419

    cs.CV

    Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)

    Authors: Shih-Han Chou, Matthew Kowal, Yasmin Niknam, Diana Moyano, Shayaan Mehdi, Richard Pito, Cheng Zhang, Ian Knopke, Sedef Akinli Kocak, Leonid Sigal, Yalda Mohsenzadeh

    Abstract: While progress has been made in the domain of video-language understanding, current state-of-the-art algorithms are still limited in their ability to understand videos at high levels of abstraction, such as news-oriented videos. Alternatively, humans easily amalgamate information from video and language to infer information beyond what is visually observable in the pixels. An example of this is wa…

    Submitted 22 January, 2024; originally announced January 2024.

  2. arXiv:2312.00050

    cs.CR cs.AI cs.LG

    Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift

    Authors: Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang, Qiuling Xu, Guanhong Tao, Guangyu Shen, Siyuan Cheng, Shiqing Ma, Pin-Yu Chen, Tsung-Yi Ho, Xiangyu Zhang

    Abstract: Diffusion models (DM) have become state-of-the-art generative models because of their capability to generate high-quality images from noises without adversarial training. However, they are vulnerable to backdoor attacks as reported by recent studies. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image…

    Submitted 4 February, 2024; v1 submitted 27 November, 2023; originally announced December 2023.

    Comments: AAAI 2024

  3. arXiv:2311.16646

    cs.LG cs.CR

    Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective

    Authors: Ming-Yu Chung, Sheng-Yen Chou, Chia-Mu Yu, Pin-Yu Chen, Sy-Yen Kuo, Tsung-Yi Ho

    Abstract: Dataset distillation offers a potential means to enhance data efficiency in deep learning. Recent studies have shown its ability to counteract backdoor risks present in original training samples. In this study, we delve into the theoretical aspects of backdoor attacks and dataset distillation based on kernel methods. We introduce two new theory-driven trigger pattern generation methods specialized…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 19 pages, 4 figures

  4. arXiv:2306.06874

    cs.CR cs.CV cs.LG

    VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models

    Authors: Sheng-Yen Chou, Pin-Yu Chen, Tsung-Yi Ho

    Abstract: Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising. They are the backbone of many generative AI applications, such as text-to-image conditional generation. However, recent studies have shown that basic unconditional DMs (e.g., DDPM and DDIM) are vulnerable to backdoor injection, a type of output manip…

    Submitted 29 December, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: Accepted by NeurIPS 2023, NeurIPS 2023 BUGS Workshop Oral

  5. arXiv:2303.07545

    cs.CV

    Implicit and Explicit Commonsense for Multi-sentence Video Captioning

    Authors: Shih-Han Chou, James J. Little, Leonid Sigal

    Abstract: Existing dense or paragraph video captioning approaches rely on holistic representations of videos, possibly coupled with learned object/action representations, to condition hierarchical language decoders. However, they fundamentally lack the commonsense knowledge of the world required to reason about progression of events, causality, and even the function of certain objects within a scene. To add…

    Submitted 8 January, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: The paper is under consideration at Computer Vision and Image Understanding Journal

  6. arXiv:2212.05400

    cs.CV cs.CR cs.LG

    How to Backdoor Diffusion Models?

    Authors: Sheng-Yen Chou, Pin-Yu Chen, Tsung-Yi Ho

    Abstract: Diffusion models are state-of-the-art deep learning empowered generative models that are trained based on the principle of learning forward and reverse diffusion processes via progressive noise-addition and denoising. To gain a better understanding of the limitations and potential risks, this paper presents the first study on the robustness of diffusion models against backdoor attacks. Specificall…

    Submitted 8 June, 2023; v1 submitted 10 December, 2022; originally announced December 2022.

    Comments: Accepted by CVPR 2023

  7. arXiv:2204.04090

    cs.LG

    Single-level Adversarial Data Synthesis based on Neural Tangent Kernels

    Authors: Yu-Rong Zhang, Ruei-Yang Su, Sheng Yen Chou, Shan-Hung Wu

    Abstract: Generative adversarial networks (GANs) have achieved impressive performance in data synthesis and have driven the development of many applications. However, GANs are known to be hard to train due to their bilevel objective, which leads to the problems of convergence, mode collapse, and gradient vanishing. In this paper, we propose a new generative model called the generative adversarial N…

    Submitted 20 November, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

  8. arXiv:2204.01483

    cs.CY cs.LG stat.AP

    Assessing dengue fever risk in Costa Rica by using climate variables and machine learning techniques

    Authors: Luis A. Barboza, Shu-Wei Chou, Paola Vásquez, Yury E. García, Juan G. Calvo, Hugo C. Hidalgo, Fabio Sanchez

    Abstract: Dengue fever is a vector-borne disease mostly endemic to tropical and subtropical countries that affect millions every year and is considered a significant burden for public health. Its geographic distribution makes it highly sensitive to climate conditions. Here, we explore the effect of climate variables using the Generalized Additive Model for location, scale, and shape (GAMLSS) and Random Fore…

    Submitted 23 March, 2022; originally announced April 2022.

    Comments: 13 pages, 4 figures

  9. arXiv:2112.01394

    cs.MS cs.PL

    Dynamic Sparse Tensor Algebra Compilation

    Authors: Stephen Chou, Saman Amarasinghe

    Abstract: This paper shows how to generate efficient tensor algebra code that compute on dynamic sparse tensors, which have sparsity structures that evolve over time. We propose a language for precisely specifying recursive, pointer-based data structures, and we show how this language can express a wide range of dynamic data structures that support efficient modification, such as linked lists, binary search…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: 15 pages, 16 figures

  10. arXiv:2110.00186

    cs.MS cs.PL

    An Attempt to Generate Code for Symmetric Tensor Computations

    Authors: Jessica Shi, Stephen Chou, Fredrik Kjolstad, Saman Amarasinghe

    Abstract: This document describes an attempt to develop a compiler-based approach for computations with symmetric tensors. Given a computation and the symmetries of its input tensors, we derive formulas for random access under a storage scheme that eliminates redundancies; construct intermediate representations to describe the loop structure; and translate this information, using the taco tensor algebra com…

    Submitted 30 September, 2021; originally announced October 2021.

  11. arXiv:2011.02164

    cs.CV cs.CL

    An Improved Attention for Visual Question Answering

    Authors: Tanzila Rahman, Shih-Han Chou, Leonid Sigal, Giuseppe Carenini

    Abstract: We consider the problem of Visual Question Answering (VQA). Given an image and a free-form, open-ended, question, expressed in natural language, the goal of VQA system is to provide accurate answer to this question with respect to the image. The task is challenging because it requires simultaneous and intricate understanding of both visual and textual information. Attention, which captures intra-…

    Submitted 3 June, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: 8 pages

  12. Sparse Tensor Transpositions

    Authors: Suzanne Mueller, Willow Ahrens, Stephen Chou, Fredrik Kjolstad, Saman Amarasinghe

    Abstract: We present a new algorithm for transposing sparse tensors called Quesadilla. The algorithm converts the sparse tensor data structure to a list of coordinates and sorts it with a fast multi-pass radix algorithm that exploits knowledge of the requested transposition and the tensors input partial coordinate ordering to provably minimize the number of parallel partial sorting passes. We evaluate bot…

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: This work will be the subject of a brief announcement at the 32nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '20)

  13. arXiv:2004.03466

    eess.IV cs.CV cs.LG

    U-Net Using Stacked Dilated Convolutions for Medical Image Segmentation

    Authors: Shuhang Wang, Szu-Yeu Hu, Eugene Cheah, Xiaohong Wang, Jingchao Wang, Lei Chen, Masoud Baikpour, Arinc Ozturk, Qian Li, Shinn-Huey Chou, Constance D. Lehman, Viksit Kumar, Anthony Samir

    Abstract: This paper proposes a novel U-Net variant using stacked dilated convolutions for medical image segmentation (SDU-Net). SDU-Net adopts the architecture of vanilla U-Net with modifications in the encoder and decoder operations (an operation indicates all the processing for feature maps of the same resolution). Unlike vanilla U-Net which incorporates two standard convolutions in each encoder/decoder…

    Submitted 10 April, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: 8 pages MICCAI

  14. arXiv:2001.03339

    cs.CV

    Visual Question Answering on 360° Images

    Authors: Shih-Han Chou, Wei-Lun Chao, Wei-Sheng Lai, Min Sun, Ming-Hsuan Yang

    Abstract: In this work, we introduce VQA 360, a novel task of visual question answering on 360 images. Unlike a normal field-of-view image, a 360 image captures the entire visual content around the optical center of a camera, demanding more sophisticated spatial understanding and reasoning. To address this problem, we collect the first VQA 360 dataset, containing around 17,000 real-world image-question-answ…

    Submitted 10 January, 2020; originally announced January 2020.

    Comments: Accepted to WACV 2020

  15. Automatic Generation of Efficient Sparse Tensor Format Conversion Routines

    Authors: Stephen Chou, Fredrik Kjolstad, Saman Amarasinghe

    Abstract: This paper shows how to generate code that efficiently converts sparse tensors between disparate storage formats (data layouts) such as CSR, DIA, ELL, and many others. We decompose sparse tensor conversion into three logical phases: coordinate remapping, analysis, and assembly. We then develop a language that precisely describes how different formats group together and order a tensor's nonzeros in…

    Submitted 29 June, 2020; v1 submitted 8 January, 2020; originally announced January 2020.

    Comments: Presented at PLDI 2020

  16. arXiv:1910.01712

    cs.CV

    360-Indoor: Towards Learning Real-World Objects in 360° Indoor Equirectangular Images

    Authors: Shih-Han Chou, Cheng Sun, Wen-Yen Chang, Wan-Ting Hsu, Min Sun, Jianlong Fu

    Abstract: While there are several widely used object detection datasets, current computer vision algorithms are still limited in conventional images. Such images narrow our vision in a restricted region. On the other hand, 360° images provide a thorough sight. In this paper, our goal is to provide a standard dataset to facilitate the vision and machine learning communities in 360° domain. To facilitate the…

    Submitted 3 October, 2019; originally announced October 2019.

  17. arXiv:1812.01808

    cs.IR

    Enriching Article Recommendation with Phrase Awareness

    Authors: Chia-Wei Chen, Sheng-Chuan Chou, Lun-Wei Ku

    Abstract: Recent deep learning methods for recommendation systems are highly sophisticated. For article recommendation task, a neural network encoder which generates a latent representation of the article content would prove useful. However, using raw text with embedding for models could degrade sentence meanings and deteriorate performance. In this paper, we propose PhrecSys (Phrase-based Recommendation Sy…

    Submitted 12 December, 2018; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: AAAI 2019 Workshop on Recommender Systems Meets NLP

  18. arXiv:1812.01269

    cs.SD eess.AS

    Learning to match transient sound events using attentional similarity for few-shot sound recognition

    Authors: Szu-Yu Chou, Kai-Hsiang Cheng, Jyh-Shing Roger Jang, Yi-Hsuan Yang

    Abstract: In this paper, we introduce a novel attentional similarity module for the problem of few-shot sound recognition. Given a few examples of an unseen sound event, a classifier must be quickly adapted to recognize the new sound event without much fine-tuning. The proposed attentional similarity module can be plugged into any metric-based learning method for few-shot learning, allowing the resulting mo…

    Submitted 18 February, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: This is a pre-print version of an ICASSP 2019 paper

  19. arXiv:1804.10112

    cs.MS cs.PL

    Format Abstraction for Sparse Tensor Algebra Compilers

    Authors: Stephen Chou, Fredrik Kjolstad, Saman Amarasinghe

    Abstract: This paper shows how to build a sparse tensor algebra compiler that is agnostic to tensor formats (data layouts). We develop an interface that describes formats in terms of their capabilities and properties, and show how to build a modular code generator where new formats can be added as plugins. We then describe six implementations of the interface that compose to form the dense, CSR/CSF, COO, DI…

    Submitted 11 November, 2018; v1 submitted 23 April, 2018; originally announced April 2018.

    Comments: Presented at OOPSLA 2018

    Journal ref: Proc. ACM Program. Lang. 2, OOPSLA, Article 123 (November 2018)

  20. arXiv:1802.10495

    eess.AS cs.AI cs.MM cs.SD

    Pop Music Highlighter: Marking the Emotion Keypoints

    Authors: Yu-Siang Huang, Szu-Yu Chou, Yi-Hsuan Yang

    Abstract: The goal of music highlight extraction is to get a short consecutive segment of a piece of music that provides an effective representation of the whole piece. In a previous work, we introduced an attention-based convolutional recurrent neural network that uses music emotion classification as a surrogate task for music highlight extraction, for Pop songs. The rationale behind that approach is that…

    Submitted 25 September, 2018; v1 submitted 28 February, 2018; originally announced February 2018.

    Comments: Transactions of the ISMIR vol. 1, no. 1

  21. arXiv:1711.08664

    cs.CV

    Self-view Grounding Given a Narrated 360° Video

    Authors: Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun

    Abstract: Narrated 360° videos are typically provided in many touring scenarios to mimic real-world experience. However, previous work has shown that smart assistance (i.e., providing visual guidance) can significantly help users to follow the Normal Field of View (NFoV) corresponding to the narrative. In this project, we aim at automatically grounding the NFoVs of a 360° video given subtitles of the narrat…

    Submitted 23 November, 2017; originally announced November 2017.

  22. arXiv:1709.04384

    stat.ML cs.LG cs.SD

    Generating Music Medleys via Playing Music Puzzle Games

    Authors: Yu-Siang Huang, Szu-Yu Chou, Yi-Hsuan Yang

    Abstract: Generating music medleys is about finding an optimal permutation of a given set of music clips. Toward this goal, we propose a self-supervised learning task, called the music puzzle game, to train neural network models to learn the sequential patterns in music. In essence, such a game requires machines to correctly sort a few multisecond music fragments. In the training stage, we learn the model b…

    Submitted 16 November, 2017; v1 submitted 13 September, 2017; originally announced September 2017.

    Comments: Accepted at AAAI 2018

  23. arXiv:1705.06560

    cs.CV

    Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization

    Authors: Kuo-Hao Zeng, Shih-Han Chou, Fu-Hsiang Chan, Juan Carlos Niebles, Min Sun

    Abstract: For survival, a living agent must have the ability to assess risk (1) by temporally anticipating accidents before they occur, and (2) by spatially localizing risky regions in the environment to move away from threats. In this paper, we take an agent-centric approach to study the accident anticipation and risky region localization tasks. We propose a novel soft-attention Recurrent Neural Network (R…

    Submitted 18 May, 2017; originally announced May 2017.

  24. arXiv:1704.01280

    cs.SD cs.LG stat.ML

    Revisiting the problem of audio-based hit song prediction using convolutional neural networks

    Authors: Li-Chia Yang, Szu-Yu Chou, Jen-Yu Liu, Yi-Hsuan Yang, Yi-An Chen

    Abstract: Being able to predict whether a song can be a hit has important applications in the music industry. Although it is true that the popularity of a song can be greatly affected by external factors such as social and commercial influences, to which degree audio features computed from musical signals (whom we regard as internal factors) can predict song popularity is an interesting research questio…

    Submitted 5 April, 2017; originally announced April 2017.

    Comments: To appear in the proceedings of 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  25. arXiv:1703.10847

    cs.SD cs.AI

    MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation

    Authors: Li-Chia Yang, Szu-Yu Chou, Yi-Hsuan Yang

    Abstract: Most existing neural network models for music generation use recurrent neural networks. However, the recent WaveNet model proposed by DeepMind shows that convolutional neural networks (CNNs) can also generate realistic musical waveforms in the audio domain. Following this light, we investigate using CNNs for generating melody (a series of MIDI notes) one bar after another in the symbolic domain. I…

    Submitted 18 July, 2017; v1 submitted 31 March, 2017; originally announced March 2017.

    Comments: 8 pages, Accepted to ISMIR (International Society of Music Information Retrieval) Conference 2017

  26. arXiv:1606.07722

    cs.IR cs.AI cs.LG

    Neural Network Based Next-Song Recommendation

    Authors: Kai-Chun Hsu, Szu-Yu Chou, Yi-Hsuan Yang, Tai-Shih Chi

    Abstract: Recently, the next-item/basket recommendation system, which considers the sequential relation between bought items, has drawn attention of researchers. The utilization of sequential patterns has boosted performance on several kinds of recommendation tasks. Inspired by natural language processing (NLP) techniques, we propose a novel neural network (NN) based next-song recommender, CNN-rec, in this…

    Submitted 24 June, 2016; originally announced June 2016.

    Comments: 5 pages, 3 figures, the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016)