subscribe to arXiv mailings

Diffusion Model Patching via Mixture-of-Prompts

Authors: Seokil Ham, Sangmin Woo, Jin-Young Kim, Hyojun Go, Byeongjun Park, Changick Kim

Abstract: We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters. DMP inserts a small, learnable set of prompts into the model's input space while keeping the original model frozen. The effectiveness of DMP is not merely due to the addition of parameters but stems from… ▽ More We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters. DMP inserts a small, learnable set of prompts into the model's input space while keeping the original model frozen. The effectiveness of DMP is not merely due to the addition of parameters but stems from its dynamic gating mechanism, which selects and combines a subset of learnable prompts at every step of the generative process (e.g., reverse denoising steps). This strategy, which we term "mixture-of-prompts", enables the model to draw on the distinct expertise of each prompt, essentially "patching" the model's functionality at every step with minimal yet specialized parameters. Uniquely, DMP enhances the model by further training on the same dataset on which it was originally trained, even in a scenario where significant improvements are typically not expected due to model convergence. Experiments show that DMP significantly enhances the converged FID of DiT-L/2 on FFHQ 256x256 by 10.38%, achieved with only a 1.43% parameter increase and 50K additional training iterations. △ Less

Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: Project page: https://sangminwoo.github.io/DMP/

arXiv:2403.09176 [pdf, other]

Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

Authors: Byeongjun Park, Hyojun Go, Jin-Young Kim, Sangmin Woo, Seokil Ham, Changick Kim

Abstract: Diffusion models have achieved remarkable success across a range of generative tasks. Recent efforts to enhance diffusion model architectures have reimagined them as a form of multi-task learning, where each task corresponds to a denoising task at a specific noise level. While these efforts have focused on parameter isolation and task routing, they fall short of capturing detailed inter-task relat… ▽ More Diffusion models have achieved remarkable success across a range of generative tasks. Recent efforts to enhance diffusion model architectures have reimagined them as a form of multi-task learning, where each task corresponds to a denoising task at a specific noise level. While these efforts have focused on parameter isolation and task routing, they fall short of capturing detailed inter-task relationships and risk losing semantic information, respectively. In response, we introduce Switch Diffusion Transformer (Switch-DiT), which establishes inter-task relationships between conflicting tasks without compromising semantic information. To achieve this, we employ a sparse mixture-of-experts within each transformer block to utilize semantic information and facilitate handling conflicts in tasks through parameter isolation. Additionally, we propose a diffusion prior loss, encouraging similar tasks to share their denoising paths while isolating conflicting ones. Through these, each transformer block contains a shared expert across all tasks, where the common and task-specific denoising paths enable the diffusion model to construct its beneficial way of synergizing denoising tasks. Extensive experiments validate the effectiveness of our approach in improving both image quality and convergence rate, and further analysis demonstrates that Switch-DiT constructs tailored denoising paths across various generation scenarios. △ Less

Submitted 10 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: Project Page: https://byeongjun-park.github.io/Switch-DiT/

arXiv:2311.00428 [pdf, other]

NEO-KD: Knowledge-Distillation-Based Adversarial Training for Robust Multi-Exit Neural Networks

Authors: Seokil Ham, Jungwuk Park, Dong-Jun Han, Jaekyun Moon

Abstract: While multi-exit neural networks are regarded as a promising solution for making efficient inference via early exits, combating adversarial attacks remains a challenging problem. In multi-exit networks, due to the high dependency among different submodels, an adversarial example targeting a specific exit not only degrades the performance of the target exit but also reduces the performance of all o… ▽ More While multi-exit neural networks are regarded as a promising solution for making efficient inference via early exits, combating adversarial attacks remains a challenging problem. In multi-exit networks, due to the high dependency among different submodels, an adversarial example targeting a specific exit not only degrades the performance of the target exit but also reduces the performance of all other exits concurrently. This makes multi-exit networks highly vulnerable to simple adversarial attacks. In this paper, we propose NEO-KD, a knowledge-distillation-based adversarial training strategy that tackles this fundamental challenge based on two key contributions. NEO-KD first resorts to neighbor knowledge distillation to guide the output of the adversarial examples to tend to the ensemble outputs of neighbor exits of clean data. NEO-KD also employs exit-wise orthogonal knowledge distillation for reducing adversarial transferability across different submodels. The result is a significantly improved robustness against adversarial attacks. Experimental results on various datasets/models show that our method achieves the best adversarial accuracy with reduced computation budgets, compared to the baselines relying on existing adversarial training or knowledge distillation techniques for multi-exit networks. △ Less

Submitted 1 November, 2023; originally announced November 2023.

Comments: 10 pages, 4 figures, accepted by 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2307.08169 [pdf, other]

Discovering User Types: Mapping User Traits by Task-Specific Behaviors in Reinforcement Learning

Authors: L. L. Ankile, B. S. Ham, K. Mao, E. Shin, S. Swaroop, F. Doshi-Velez, W. Pan

Abstract: When assisting human users in reinforcement learning (RL), we can represent users as RL agents and study key parameters, called \emph{user traits}, to inform intervention design. We study the relationship between user behaviors (policy classes) and user traits. Given an environment, we introduce an intuitive tool for studying the breakdown of "user types": broad sets of traits that result in the s… ▽ More When assisting human users in reinforcement learning (RL), we can represent users as RL agents and study key parameters, called \emph{user traits}, to inform intervention design. We study the relationship between user behaviors (policy classes) and user traits. Given an environment, we introduce an intuitive tool for studying the breakdown of "user types": broad sets of traits that result in the same behavior. We show that seemingly different real-world environments admit the same set of user types and formalize this observation as an equivalence relation defined on environments. By transferring intervention design between environments within the same equivalence class, we can help rapidly personalize interventions. △ Less

Submitted 16 July, 2023; originally announced July 2023.

arXiv:2306.14384 [pdf, other]

Multitask Learning for Multiple Recognition Tasks: A Framework for Lower-limb Exoskeleton Robot Applications

Authors: Joonhyun Kim, Seongmin Ha, Dongbin Shin, Seoyeon Ham, Jaepil Jang, Wansoo Kim

Abstract: To control the lower-limb exoskeleton robot effectively, it is essential to accurately recognize user status and environmental conditions. Previous studies have typically addressed these recognition challenges through independent models for each task, resulting in an inefficient model development process. In this study, we propose a Multitask learning approach that can address multiple recognition… ▽ More To control the lower-limb exoskeleton robot effectively, it is essential to accurately recognize user status and environmental conditions. Previous studies have typically addressed these recognition challenges through independent models for each task, resulting in an inefficient model development process. In this study, we propose a Multitask learning approach that can address multiple recognition challenges simultaneously. This approach can enhance data efficiency by enabling knowledge sharing between each recognition model. We demonstrate the effectiveness of this approach using Gait phase recognition (GPR) and Terrain classification (TC) as examples, the most conventional recognition tasks in lower-limb exoskeleton robots. We first created a high-performing GPR model that achieved a Root mean square error (RMSE) value of 2.345 $\pm$ 0.08 and then utilized its knowledge-sharing backbone feature network to learn a TC model with an extremely limited dataset. Using a limited dataset for the TC model allows us to validate the data efficiency of our proposed Multitask learning approach. We compared the accuracy of the proposed TC model against other TC baseline models. The proposed model achieved 99.5 $\pm$ 0.044% accuracy with a limited dataset, outperforming other baseline models, demonstrating its effectiveness in terms of data efficiency. Future research will focus on extending the Multitask learning framework to encompass additional recognition tasks. △ Less

Submitted 25 June, 2023; originally announced June 2023.

Comments: Accepted for publication in the Proceedings of the 2023 IEEE International Conference on RO-MAN 2023 BUSAN, 7 pages

arXiv:2110.02797 [pdf, other]

Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs

Authors: Philipp Benz, Soomin Ham, Chaoning Zhang, Adil Karjauv, In So Kweon

Abstract: Convolutional Neural Networks (CNNs) have become the de facto gold standard in computer vision applications in the past years. Recently, however, new model architectures have been proposed challenging the status quo. The Vision Transformer (ViT) relies solely on attention modules, while the MLP-Mixer architecture substitutes the self-attention modules with Multi-Layer Perceptrons (MLPs). Despite t… ▽ More Convolutional Neural Networks (CNNs) have become the de facto gold standard in computer vision applications in the past years. Recently, however, new model architectures have been proposed challenging the status quo. The Vision Transformer (ViT) relies solely on attention modules, while the MLP-Mixer architecture substitutes the self-attention modules with Multi-Layer Perceptrons (MLPs). Despite their great success, CNNs have been widely known to be vulnerable to adversarial attacks, causing serious concerns for security-sensitive applications. Thus, it is critical for the community to know whether the newly proposed ViT and MLP-Mixer are also vulnerable to adversarial attacks. To this end, we empirically evaluate their adversarial robustness under several adversarial attack setups and benchmark them against the widely used CNNs. Overall, we find that the two architectures, especially ViT, are more robust than their CNN models. Using a toy example, we also provide empirical evidence that the lower adversarial robustness of CNNs can be partially attributed to their shift-invariant property. Our frequency analysis suggests that the most robust ViT architectures tend to rely more on low-frequency features compared with CNNs. Additionally, we have an intriguing finding that MLP-Mixer is extremely vulnerable to universal adversarial perturbations. △ Less

Submitted 11 October, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: Code: https://github.com/phibenz/robustness_comparison_vit_mlp-mixer_cnn

arXiv:2009.07139 [pdf, other]

Analyzing the effect of APOE on Alzheimer's disease progression using an event-based model for stratified populations

Authors: Vikram Venkatraghavan, Stefan Klein, Lana Fani, Leontine S. Ham, Henri Vrooman, M. Kamran Ikram, Wiro J. Niessen, Esther E. Bron

Abstract: Alzheimer's disease (AD) is the most common form of dementia and is phenotypically heterogeneous. APOE is a triallelic gene which correlates with phenotypic heterogeneity in AD. In this work, we determined the effect of APOE alleles on the disease progression timeline of AD using a discriminative event-based model (DEBM). Since DEBM is a data-driven model, stratification into smaller disease subgr… ▽ More Alzheimer's disease (AD) is the most common form of dementia and is phenotypically heterogeneous. APOE is a triallelic gene which correlates with phenotypic heterogeneity in AD. In this work, we determined the effect of APOE alleles on the disease progression timeline of AD using a discriminative event-based model (DEBM). Since DEBM is a data-driven model, stratification into smaller disease subgroups would lead to more inaccurate models as compared to fitting the model on the entire dataset. Hence our secondary aim is to propose and evaluate novel approaches in which we split the different steps of DEBM into group-aspecific and group-specific parts, where the entire dataset is used to train the group-aspecific parts and only the data from a specific group is used to train the group-specific parts of the DEBM. We performed simulation experiments to benchmark the accuracy of the proposed approaches and to select the optimal approach. Subsequently, the chosen approach was applied to the baseline data of 417 cognitively normal, 235 mild cognitively impaired who convert to AD within 3 years, and 342 AD patients from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset to gain new insights into the effect of APOE carriership on the disease progression timeline of AD. The presented models could aid understanding of the disease, and in selecting homogeneous group of presymptomatic subjects at-risk of developing symptoms for clinical trials. △ Less

Submitted 15 September, 2020; originally announced September 2020.

arXiv:2008.05833 [pdf]

Experimental demonstrations of unconditional security in a purely classical regime

Authors: Byoung S. Ham

Abstract: So far, unconditional security in key distribution processes has been confined to quantum key distribution (QKD) protocols based on the no-cloning theorem of nonorthogonal bases. Recently, a completely different approach, the unconditionally secured classical key distribution (USCKD), has been proposed for unconditional security in the purely classical regime. Unlike QKD, both classical channels a… ▽ More So far, unconditional security in key distribution processes has been confined to quantum key distribution (QKD) protocols based on the no-cloning theorem of nonorthogonal bases. Recently, a completely different approach, the unconditionally secured classical key distribution (USCKD), has been proposed for unconditional security in the purely classical regime. Unlike QKD, both classical channels and orthogonal bases are key ingredients in USCKD, where unconditional security is provided by deterministic randomness via path superposition-based reversible unitary transformations in a coupled Mach-Zehnder interferometer. Here, the first experimental demonstration of the USCKD protocol is presented. △ Less

Submitted 13 August, 2020; originally announced August 2020.

Comments: 8 pages, 4 figures, 1 table

arXiv:1705.06882 [pdf, other]

QuickTalk: An Association-Free Communication Method for IoT Devices in Proximity

Authors: Seongmin Ham, Jihyung Lee, Kyunghan Lee

Abstract: IoT devices are in general considered to be straightforward to use. However, we find that there are a number of situations where the usability becomes poor. The situations include but not limited to the followings: 1) when initializing an IoT device, 2) when trying to control an IoT device which is initialized and registered by another person, and 3) when trying to control an IoT device out of man… ▽ More IoT devices are in general considered to be straightforward to use. However, we find that there are a number of situations where the usability becomes poor. The situations include but not limited to the followings: 1) when initializing an IoT device, 2) when trying to control an IoT device which is initialized and registered by another person, and 3) when trying to control an IoT device out of many of the same type. We tackle these situations by proposing a new association-free communication method, QuickTalk. QuickTalk lets a user device such as a smartphone pinpoint and activate an IoT device with the help of an IR transmitter and communicate with the pinpointed IoT device through the broadcast channel of WiFi. By the nature of its association-free communication, QuickTalk allows a user device to immediately give a command to a specific IoT device in proximity even when the IoT device is uninitialized, unregistered to the control interface of the user, or registered but being physically confused with others. Our experiments of QuickTalk implemented on Raspberry Pi 2 devices show that the end-to-end delay of QuickTalk is upper bounded by 2.5 seconds and its median is only about 0.74 seconds. We further confirm that even when an IoT device has ongoing data sessions, QuickTalk can still establish a reliable communication channel to the IoT device with little impact to the ongoing sessions. △ Less

Submitted 19 May, 2017; originally announced May 2017.

Comments: 18 pages, 9 figures, IMWUT(Ubicomp) conference

arXiv:1404.0106 [pdf]

Traffic Monitoring Using M2M Communication

Authors: Shiu Kumar, Eun Sik Ham, Seong Ro Lee

Abstract: This paper presents an intelligent traffic monitoring system using wireless vision sensor network that captures and processes the real-time video image to obtain the traffic flow rate and vehicle speeds along different urban roadways. This system will display the traffic states on the front roadways that can guide the drivers to select the right way and avoid potential traffic congestions. On the… ▽ More This paper presents an intelligent traffic monitoring system using wireless vision sensor network that captures and processes the real-time video image to obtain the traffic flow rate and vehicle speeds along different urban roadways. This system will display the traffic states on the front roadways that can guide the drivers to select the right way and avoid potential traffic congestions. On the other hand, it will also monitor the vehicle speeds and store the vehicle details, for those breaking the roadway speed limits, in its database. The real-time traffic data is processed by the Personal Computer (PC) at the sub roadway station and the traffic flow rate data is transmitted to the main roadway station Arduino 3G via email, where the data is extracted and traffic flow rate displayed. △ Less

Submitted 31 March, 2014; originally announced April 2014.

Comments: 2 pages, 2 figures, presented in local conference in Korea South

Journal ref: General Fall Conference of Korea Information and Communications Society (KICS) 2012, Seoul, South Korea, 2012, pp. 233-234

Showing 1–10 of 10 results for author: Ham, S