-
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment
Authors:
Janghwan Lee,
Seongmin Park,
Sukjin Hong,
Minsoo Kim,
Du-Seong Chang,
Jungwook Choi
Abstract:
The rapid advancement of large language models (LLMs) has facilitated their transformation into conversational chatbots that can grasp contextual nuances and generate pertinent sentences, closely mirroring human values through advanced techniques such as instruction tuning and reinforcement learning from human feedback (RLHF). However, the computational efficiency required for LLMs, achieved throu…
▽ More
The rapid advancement of large language models (LLMs) has facilitated their transformation into conversational chatbots that can grasp contextual nuances and generate pertinent sentences, closely mirroring human values through advanced techniques such as instruction tuning and reinforcement learning from human feedback (RLHF). However, the computational efficiency required for LLMs, achieved through techniques like post-training quantization (PTQ), presents challenges such as token-flipping that can impair chatbot performance. In response, we propose a novel preference alignment approach, quantization-aware direct preference optimization (QDPO), that aligns quantized LLMs with their full-precision counterparts, improving conversational abilities. Evaluated on two instruction-tuned LLMs in various languages, QDPO demonstrated superior performance in improving conversational abilities compared to established PTQ and knowledge-distillation fine-tuning techniques, marking a significant step forward in the development of efficient and effective conversational LLMs.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Revisiting Random Walks for Learning on Graphs
Authors:
Jinwoo Kim,
Olga Zaghen,
Ayhan Suleymanzade,
Youngmin Ryou,
Seunghoon Hong
Abstract:
We revisit a simple idea for machine learning on graphs, where a random walk on a graph produces a machine-readable record, and this record is processed by a deep neural network to directly make vertex-level or graph-level predictions. We refer to these stochastic machines as random walk neural networks, and show that we can design them to be isomorphism invariant while capable of universal approx…
▽ More
We revisit a simple idea for machine learning on graphs, where a random walk on a graph produces a machine-readable record, and this record is processed by a deep neural network to directly make vertex-level or graph-level predictions. We refer to these stochastic machines as random walk neural networks, and show that we can design them to be isomorphism invariant while capable of universal approximation of graph functions in probability. A useful finding is that almost any kind of record of random walk guarantees probabilistic invariance as long as the vertices are anonymized. This enables us to record random walks in plain text and adopt a language model to read these text records to solve graph tasks. We further establish a parallelism to message passing neural networks using tools from Markov chain theory, and show that over-smoothing in message passing is alleviated by construction in random walk neural networks, while over-squashing manifests as probabilistic under-reaching. We show that random walk neural networks based on pre-trained language models can solve several hard problems on graphs, such as separating strongly regular graphs where the 3-WL test fails, counting substructures, and transductive classification on arXiv citation network without training. Code is available at https://github.com/jw9730/random-walk.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Wooly Graphs : A Mathematical Framework For Knitting
Authors:
Kathryn Gray,
Brian Bell,
Diana Sieper,
Stephen Kobourov,
Falk Schreiber,
Karsten Klein,
Seokhee Hong
Abstract:
This paper aims to develop a mathematical foundation to model knitting with graphs. We provide a precise definition for knit objects with a knot theoretic component and propose a simple undirected graph, a simple directed graph, and a directed multigraph model for any arbitrary knit object. Using these models, we propose natural categories related to the complexity of knitting structures. We use t…
▽ More
This paper aims to develop a mathematical foundation to model knitting with graphs. We provide a precise definition for knit objects with a knot theoretic component and propose a simple undirected graph, a simple directed graph, and a directed multigraph model for any arbitrary knit object. Using these models, we propose natural categories related to the complexity of knitting structures. We use these categories to explore the hardness of determining whether a knit object of each class exists for a given graph. We show that while this problem is NP-hard in general, under specific cases, there are linear and polynomial time algorithms which take advantage of unique properties of common knitting techniques. This work aims to bridge the gap between textile arts and graph theory, offering a useful and rigorous framework for analyzing knitting objects using their corresponding graphs and for generating knitting objects from graphs.
△ Less
Submitted 3 July, 2024; v1 submitted 29 June, 2024;
originally announced July 2024.
-
Light-weight End-to-End Graph Interest Network for CTR Prediction in E-commerce Search
Authors:
Pai Peng,
Yunqing Jia,
Ziqiang Zhou,
Shuang Hong,
Zichong Xiao
Abstract:
Click-through-rate (CTR) prediction has an essential impact on improving user experience and revenue in e-commerce search. With the development of deep learning, graph-based methods are well exploited to utilize graph structure extracted from user behaviors and other information to help embedding learning. However, most of the previous graph-based methods mainly focus on recommendation scenarios,…
▽ More
Click-through-rate (CTR) prediction has an essential impact on improving user experience and revenue in e-commerce search. With the development of deep learning, graph-based methods are well exploited to utilize graph structure extracted from user behaviors and other information to help embedding learning. However, most of the previous graph-based methods mainly focus on recommendation scenarios, and therefore their graph structures highly depend on item's sequential information from user behaviors, ignoring query's sequential signal and query-item correlation. In this paper, we propose a new approach named Light-weight End-to-End Graph Interest Network (EGIN) to effectively mine users' search interests and tackle previous challenges. (i) EGIN utilizes query and item's correlation and sequential information from the search system to build a heterogeneous graph for better CTR prediction in e-commerce search. (ii) EGIN's graph embedding learning shares the same training input and is jointly trained with CTR prediction, making the end-to-end framework effortless to deploy in large-scale search systems. The proposed EGIN is composed of three parts: query-item heterogeneous graph, light-weight graph sampling, and multi-interest network. The query-item heterogeneous graph captures correlation and sequential information of query and item efficiently by the proposed light-weight graph sampling. The multi-interest network is well designed to utilize graph embedding to capture various similarity relationships between query and item to enhance the final CTR prediction. We conduct extensive experiments on both public and industrial datasets to demonstrate the effectiveness of the proposed EGIN. At the same time, the training cost of graph learning is relatively low compared with the main CTR prediction task, ensuring efficiency in practical applications.
△ Less
Submitted 2 July, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
A Complete Survey on LLM-based AI Chatbots
Authors:
Sumit Kumar Dam,
Choong Seon Hong,
Yu Qiao,
Chaoning Zhang
Abstract:
The past few decades have witnessed an upsurge in data, forming the foundation for data-hungry, learning-based AI technology. Conversational agents, often referred to as AI chatbots, rely heavily on such data to train large language models (LLMs) and generate new content (knowledge) in response to user prompts. With the advent of OpenAI's ChatGPT, LLM-based chatbots have set new standards in the A…
▽ More
The past few decades have witnessed an upsurge in data, forming the foundation for data-hungry, learning-based AI technology. Conversational agents, often referred to as AI chatbots, rely heavily on such data to train large language models (LLMs) and generate new content (knowledge) in response to user prompts. With the advent of OpenAI's ChatGPT, LLM-based chatbots have set new standards in the AI community. This paper presents a complete survey of the evolution and deployment of LLM-based chatbots in various sectors. We first summarize the development of foundational chatbots, followed by the evolution of LLMs, and then provide an overview of LLM-based chatbots currently in use and those in the development phase. Recognizing AI chatbots as tools for generating new knowledge, we explore their diverse applications across various industries. We then discuss the open challenges, considering how the data used to train the LLMs and the misuse of the generated knowledge can cause several issues. Finally, we explore the future outlook to augment their efficiency and reliability in numerous applications. By addressing key milestones and the present-day context of LLM-based chatbots, our survey invites readers to delve deeper into this realm, reflecting on how their next generation will reshape conversational AI.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Lesion-Aware Cross-Phase Attention Network for Renal Tumor Subtype Classification on Multi-Phase CT Scans
Authors:
Kwang-Hyun Uhm,
Seung-Won Jung,
Sung-Hoo Hong,
Sung-Jea Ko
Abstract:
Multi-phase computed tomography (CT) has been widely used for the preoperative diagnosis of kidney cancer due to its non-invasive nature and ability to characterize renal lesions. However, since enhancement patterns of renal lesions across CT phases are different even for the same lesion type, the visual assessment by radiologists suffers from inter-observer variability in clinical practice. Altho…
▽ More
Multi-phase computed tomography (CT) has been widely used for the preoperative diagnosis of kidney cancer due to its non-invasive nature and ability to characterize renal lesions. However, since enhancement patterns of renal lesions across CT phases are different even for the same lesion type, the visual assessment by radiologists suffers from inter-observer variability in clinical practice. Although deep learning-based approaches have been recently explored for differential diagnosis of kidney cancer, they do not explicitly model the relationships between CT phases in the network design, limiting the diagnostic performance. In this paper, we propose a novel lesion-aware cross-phase attention network (LACPANet) that can effectively capture temporal dependencies of renal lesions across CT phases to accurately classify the lesions into five major pathological subtypes from time-series multi-phase CT images. We introduce a 3D inter-phase lesion-aware attention mechanism to learn effective 3D lesion features that are used to estimate attention weights describing the inter-phase relations of the enhancement patterns. We also present a multi-scale attention scheme to capture and aggregate temporal patterns of lesion features at different spatial scales for further improvement. Extensive experiments on multi-phase CT scans of kidney cancer patients from the collected dataset demonstrate that our LACPANet outperforms state-of-the-art approaches in diagnostic accuracy.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
Deep Imbalanced Regression to Estimate Vascular Age from PPG Data: a Novel Digital Biomarker for Cardiovascular Health
Authors:
Guangkun Nie,
Qinghao Zhao,
Gongzheng Tang,
Jun Li,
Shenda Hong
Abstract:
Photoplethysmography (PPG) is emerging as a crucial tool for monitoring human hemodynamics, with recent studies highlighting its potential in assessing vascular aging through deep learning. However, real-world age distributions are often imbalanced, posing significant challenges for deep learning models. In this paper, we introduce a novel, simple, and effective loss function named the Dist Loss t…
▽ More
Photoplethysmography (PPG) is emerging as a crucial tool for monitoring human hemodynamics, with recent studies highlighting its potential in assessing vascular aging through deep learning. However, real-world age distributions are often imbalanced, posing significant challenges for deep learning models. In this paper, we introduce a novel, simple, and effective loss function named the Dist Loss to address deep imbalanced regression tasks. We trained a one-dimensional convolutional neural network (Net1D) incorporating the Dist Loss on the extensive UK Biobank dataset (n=502,389) to estimate vascular age from PPG signals and validate its efficacy in characterizing cardiovascular health. The model's performance was validated on a 40% held-out test set, achieving state-of-the-art results, especially in regions with small sample sizes. Furthermore, we divided the population into three subgroups based on the difference between predicted vascular age and chronological age: less than -10 years, between -10 and 10 years, and greater than 10 years. We analyzed the relationship between predicted vascular age and several cardiovascular events over a follow-up period of up to 10 years, including death, coronary heart disease, and heart failure. Our results indicate that the predicted vascular age has significant potential to reflect an individual's cardiovascular health status. Our code will be available at https://github.com/Ngk03/AI-vascular-age.
△ Less
Submitted 2 July, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
-
Design Optimization of NOMA Aided Multi-STAR-RIS for Indoor Environments: A Convex Approximation Imitated Reinforcement Learning Approach
Authors:
Yu Min Park,
Sheikh Salman Hassan,
Yan Kyaw Tun,
Eui-Nam Huh,
Walid Saad,
Choong Seon Hong
Abstract:
Sixth-generation (6G) networks leverage simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) to overcome the limitations of traditional RISs. STAR-RISs offer 360-degree full-space coverage and optimized transmission and reflection for enhanced network performance and dynamic control of the indoor propagation environment. However, deploying STAR-RISs indoors pr…
▽ More
Sixth-generation (6G) networks leverage simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) to overcome the limitations of traditional RISs. STAR-RISs offer 360-degree full-space coverage and optimized transmission and reflection for enhanced network performance and dynamic control of the indoor propagation environment. However, deploying STAR-RISs indoors presents challenges in interference mitigation, power consumption, and real-time configuration. In this work, a novel network architecture utilizing multiple access points (APs) and STAR-RISs is proposed for indoor communication. An optimization problem encompassing user assignment, access point beamforming, and STAR-RIS phase control for reflection and transmission is formulated. The inherent complexity of the formulated problem necessitates a decomposition approach for an efficient solution. This involves tackling different sub-problems with specialized techniques: a many-to-one matching algorithm is employed to assign users to appropriate access points, optimizing resource allocation. To facilitate efficient resource management, access points are grouped using a correlation-based K-means clustering algorithm. Multi-agent deep reinforcement learning (MADRL) is leveraged to optimize the control of the STAR-RIS. Within the proposed MADRL framework, a novel approach is introduced where each decision variable acts as an independent agent, enabling collaborative learning and decision-making. Additionally, the proposed MADRL approach incorporates convex approximation (CA). This technique utilizes suboptimal solutions from successive convex approximation (SCA) to accelerate policy learning for the agents, thereby leading to faster environment adaptation and convergence. Simulations demonstrate significant network utility improvements compared to baseline approaches.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Effective Rank Analysis and Regularization for Enhanced 3D Gaussian Splatting
Authors:
Junha Hyung,
Susung Hong,
Sungwon Hwang,
Jaeseong Lee,
Jaegul Choo,
Jin-Hwa Kim
Abstract:
3D reconstruction from multi-view images is one of the fundamental challenges in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising technique capable of real-time rendering with high-quality 3D reconstruction. This method utilizes 3D Gaussian representation and tile-based splatting techniques, bypassing the expensive neural field querying. Despite its p…
▽ More
3D reconstruction from multi-view images is one of the fundamental challenges in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising technique capable of real-time rendering with high-quality 3D reconstruction. This method utilizes 3D Gaussian representation and tile-based splatting techniques, bypassing the expensive neural field querying. Despite its potential, 3DGS encounters challenges, including needle-like artifacts, suboptimal geometries, and inaccurate normals, due to the Gaussians converging into anisotropic Gaussians with one dominant variance. We propose using effective rank analysis to examine the shape statistics of 3D Gaussian primitives, and identify the Gaussians indeed converge into needle-like shapes with the effective rank 1. To address this, we introduce effective rank as a regularization, which constrains the structure of the Gaussians. Our new regularization method enhances normal and geometry reconstruction while reducing needle-like artifacts. The approach can be integrated as an add-on module to other 3DGS variants, improving their quality without compromising visual fidelity.
△ Less
Submitted 18 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
DeepJEB: 3D Deep Learning-based Synthetic Jet Engine Bracket Dataset
Authors:
Seongjun Hong,
Yongmin Kwon,
Dongju Shin,
Jangseop Park,
Namwoo Kang
Abstract:
Recent advancements in artificial intelligence (AI) have significantly influenced various fields, including mechanical engineering. Nonetheless, the development of high-quality, diverse datasets for structural analysis still needs to be improved. Although traditional datasets, such as simulated jet engine bracket dataset, are useful, they are constrained by a small number of samples, which must be…
▽ More
Recent advancements in artificial intelligence (AI) have significantly influenced various fields, including mechanical engineering. Nonetheless, the development of high-quality, diverse datasets for structural analysis still needs to be improved. Although traditional datasets, such as simulated jet engine bracket dataset, are useful, they are constrained by a small number of samples, which must be improved for developing robust data-driven surrogate models. This study presents the DeepJEB dataset, which has been created using deep generative models and automated engineering simulation pipelines, to overcome these challenges. Moreover, this study provides comprehensive 3D geometries and their corresponding structural analysis data.
Key experiments validated the effectiveness of the DeepJEB dataset, demonstrating significant improvements in the prediction accuracy and reliability of surrogate models trained on this data. The enhanced dataset showed a broader design space and better generalization capabilities than traditional datasets. These findings highlight the potential of DeepJEB as a benchmark dataset for developing reliable surrogate models in structural engineering. The DeepJEB dataset supports advanced modeling techniques, such as graph neural networks (GNNs) and high-dimensional convolutional networks (CNNs), leveraging node-level field data for precise predictions. This dataset is set to drive innovation in engineering design applications, enabling more accurate and efficient structural performance predictions. The DeepJEB dataset is publicly accessible at: https://www.narnia.ai/dataset
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Deep Reinforcement Learning-based Quadcopter Controller: A Practical Approach and Experiments
Authors:
Truong-Dong Do,
Nguyen Xuan Mung,
Sung Kyung Hong
Abstract:
Quadcopters have been studied for decades thanks to their maneuverability and capability of operating in a variety of circumstances. However, quadcopters suffer from dynamical nonlinearity, actuator saturation, as well as sensor noise that make it challenging and time consuming to obtain accurate dynamic models and achieve satisfactory control performance. Fortunately, deep reinforcement learning…
▽ More
Quadcopters have been studied for decades thanks to their maneuverability and capability of operating in a variety of circumstances. However, quadcopters suffer from dynamical nonlinearity, actuator saturation, as well as sensor noise that make it challenging and time consuming to obtain accurate dynamic models and achieve satisfactory control performance. Fortunately, deep reinforcement learning came and has shown significant potential in system modelling and control of autonomous multirotor aerial vehicles, with recent advancements in deployment, performance enhancement, and generalization. In this paper, an end-to-end deep reinforcement learning-based controller for quadcopters is proposed that is secure for real-world implementation, data-efficient, and free of human gain adjustments. First, a novel actor-critic-based architecture is designed to map the robot states directly to the motor outputs. Then, a quadcopter dynamics-based simulator was devised to facilitate the training of the controller policy. Finally, the trained policy is deployed on a real Crazyflie nano quadrotor platform, without any additional fine-tuning process. Experimental results show that the quadcopter exhibits satisfactory performance as it tracks a given complicated trajectory, which demonstrates the effectiveness and feasibility of the proposed method and signifies its capability in filling the simulation-to-reality gap.
△ Less
Submitted 18 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Scalable Defect Detection via Traversal on Code Graph
Authors:
Zhengyao Liu,
Xitong Zhong,
Xingjing Deng,
Shuo Hong,
Xiang Gao,
Hailong Sun
Abstract:
Detecting defects and vulnerabilities in the early stage has long been a challenge in software engineering. Static analysis, a technique that inspects code without execution, has emerged as a key strategy to address this challenge. Among recent advancements, the use of graph-based representations, particularly Code Property Graph (CPG), has gained traction due to its comprehensive depiction of cod…
▽ More
Detecting defects and vulnerabilities in the early stage has long been a challenge in software engineering. Static analysis, a technique that inspects code without execution, has emerged as a key strategy to address this challenge. Among recent advancements, the use of graph-based representations, particularly Code Property Graph (CPG), has gained traction due to its comprehensive depiction of code structure and semantics. Despite the progress, existing graph-based analysis tools still face performance and scalability issues. The main bottleneck lies in the size and complexity of CPG, which makes analyzing large codebases inefficient and memory-consuming. Also, query rules used by the current tools can be over-specific. Hence, we introduce QVoG, a graph-based static analysis platform for detecting defects and vulnerabilities. It employs a compressed CPG representation to maintain a reasonable graph size, thereby enhancing the overall query efficiency. Based on the CPG, it also offers a declarative query language to simplify the queries. Furthermore, it takes a step forward to integrate machine learning to enhance the generality of vulnerability detection. For projects consisting of 1,000,000+ lines of code, QVoG can complete analysis in approximately 15 minutes, as opposed to 19 minutes with CodeQL.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
A Large Medical Model based on Visual Physiological Monitoring for Public Health
Authors:
Bin Huang,
Changchen Zhao,
Zimeng Liu,
Shenda Hong,
Baochang Zhang,
Wenjin Wang,
Hui Liu
Abstract:
The widespread outbreak of the COVID-19 pandemic has sounded a warning about the globalization challenges in public health. In this context, the establishment of large-scale public health datasets, of medical models, and of decision-making systems with a human-centric approach holds strategic significance. Recently, groundbreaking advancements have emerged in AI methods for physiological signal mo…
▽ More
The widespread outbreak of the COVID-19 pandemic has sounded a warning about the globalization challenges in public health. In this context, the establishment of large-scale public health datasets, of medical models, and of decision-making systems with a human-centric approach holds strategic significance. Recently, groundbreaking advancements have emerged in AI methods for physiological signal monitoring and disease diagnosis based on camera sensors. These approaches, requiring no specialized medical equipment, offer convenient manners of collecting large-scale medical data in response to public health events. Not only do these methods facilitate the acquisition of unbiased datasets, but also enable the development of fair large medical models (LMMs). Therefore, we outline a prospective framework and heuristic vision for a public health large medical model (PHLMM) utilizing visual-based physiological monitoring (VBPM) technology. The PHLMM can be considered as a "convenient and universal" framework for public health, advancing the United Nations' "Sustainable Development Goals 2030", particularly in its promotion of Universal Health Coverage (UHC) in low- and middle-income countries. Furthermore, this paper provides an outlook on the crucial application prospects of PHLMM in response to public health challenges and its significant role in the field of AI for medicine (AI4medicine). In summary, PHLMM serves as a solution for constructing a large-scale medical database and LMM, eliminating the issue of dataset bias and unfairness in AI models. The outcomes will contribute to the establishment of an LMM framework for public health, acting as a crucial bridge for advancing AI4medicine.
△ Less
Submitted 21 April, 2024;
originally announced June 2024.
-
Optimizing Multi-User Semantic Communication via Transfer Learning and Knowledge Distillation
Authors:
Loc X. Nguyen,
Kitae Kim,
Ye Lin Tun,
Sheikh Salman Hassan,
Yan Kyaw Tun,
Zhu Han,
Choong Seon Hong
Abstract:
Semantic communication, notable for ensuring quality of service by jointly optimizing source and channel coding, effectively extracts data semantics, reduces transmission length, and mitigates channel noise. However, most studies overlook multi-user scenarios and resource availability, limiting real-world application. This paper addresses this gap by focusing on downlink communication from a base…
▽ More
Semantic communication, notable for ensuring quality of service by jointly optimizing source and channel coding, effectively extracts data semantics, reduces transmission length, and mitigates channel noise. However, most studies overlook multi-user scenarios and resource availability, limiting real-world application. This paper addresses this gap by focusing on downlink communication from a base station to multiple users with varying computing capacities. Users employ variants of Swin transformer models for source decoding and a simple architecture for channel decoding. We propose a novel training regimen, incorporating transfer learning and knowledge distillation to improve low-computing users' performance. Extensive simulations validate the proposed methods.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
Advancing Ultra-Reliable 6G: Transformer and Semantic Localization Empowered Robust Beamforming in Millimeter-Wave Communications
Authors:
Avi Deb Raha,
Kitae Kim,
Apurba Adhikary,
Mrityunjoy Gain,
Choong Seon Hong
Abstract:
Advancements in 6G wireless technology have elevated the importance of beamforming, especially for attaining ultra-high data rates via millimeter-wave (mmWave) frequency deployment. Although promising, mmWave bands require substantial beam training to achieve precise beamforming. While initial deep learning models that use RGB camera images demonstrated promise in reducing beam training overhead,…
▽ More
Advancements in 6G wireless technology have elevated the importance of beamforming, especially for attaining ultra-high data rates via millimeter-wave (mmWave) frequency deployment. Although promising, mmWave bands require substantial beam training to achieve precise beamforming. While initial deep learning models that use RGB camera images demonstrated promise in reducing beam training overhead, their performance suffers due to sensitivity to lighting and environmental variations. Due to this sensitivity, Quality of Service (QoS) fluctuates, eventually affecting the stability and dependability of networks in dynamic environments. This emphasizes a critical need for more robust solutions. This paper proposes a robust beamforming technique to ensure consistent QoS under varying environmental conditions. An optimization problem has been formulated to maximize users' data rates. To solve the formulated NP-hard optimization problem, we decompose it into two subproblems: the semantic localization problem and the optimal beam selection problem. To solve the semantic localization problem, we propose a novel method that leverages the k-means clustering and YOLOv8 model. To solve the beam selection problem, we propose a novel lightweight hybrid architecture that utilizes various data sources and a weighted entropy-based mechanism to predict the optimal beams. Rapid and accurate beam predictions are needed to maintain QoS. A novel metric, Accuracy-Complexity Efficiency (ACE), has been proposed to quantify this. Six testing scenarios have been developed to evaluate the robustness of the proposed model. Finally, the simulation result demonstrates that the proposed model outperforms several state-of-the-art baselines regarding beam prediction accuracy, received power, and ACE in the developed test scenarios.
△ Less
Submitted 21 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low computational Overhead
Authors:
Minsu Kim,
Walid Saad,
Merouane Debbah,
Choong Seon Hong
Abstract:
The large communication and computation overhead of federated learning (FL) is one of the main challenges facing its practical deployment over resource-constrained clients and systems. In this work, SpaFL: a communication-efficient FL framework is proposed to optimize sparse model structures with low computational overhead. In SpaFL, a trainable threshold is defined for each filter/neuron to prune…
▽ More
The large communication and computation overhead of federated learning (FL) is one of the main challenges facing its practical deployment over resource-constrained clients and systems. In this work, SpaFL: a communication-efficient FL framework is proposed to optimize sparse model structures with low computational overhead. In SpaFL, a trainable threshold is defined for each filter/neuron to prune its all connected parameters, thereby leading to structured sparsity. To optimize the pruning process itself, only thresholds are communicated between a server and clients instead of parameters, thereby learning how to prune. Further, global thresholds are used to update model parameters by extracting aggregated parameter importance. The generalization bound of SpaFL is also derived, thereby proving key insights on the relation between sparsity and performance. Experimental results show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
△ Less
Submitted 1 June, 2024;
originally announced June 2024.
-
Masked Language Modeling Becomes Conditional Density Estimation for Tabular Data Synthesis
Authors:
Seunghwan An,
Gyeongdong Woo,
Jaesung Lim,
ChangHyun Kim,
Sungchul Hong,
Jong-June Jeon
Abstract:
In this paper, our goal is to generate synthetic data for heterogeneous (mixed-type) tabular datasets with high machine learning utility (MLu). Given that the MLu performance relies on accurately approximating the conditional distributions, we focus on devising a synthetic data generation method based on conditional distribution estimation. We propose a novel synthetic data generation method, MaCo…
▽ More
In this paper, our goal is to generate synthetic data for heterogeneous (mixed-type) tabular datasets with high machine learning utility (MLu). Given that the MLu performance relies on accurately approximating the conditional distributions, we focus on devising a synthetic data generation method based on conditional distribution estimation. We propose a novel synthetic data generation method, MaCoDE, by redefining the multi-class classification task of Masked Language Modeling (MLM) as histogram-based non-parametric conditional density estimation. Our proposed method enables estimating conditional densities across arbitrary combinations of target and conditional variables. Furthermore, we demonstrate that our proposed method bridges the theoretical gap between distributional learning and MLM. To validate the effectiveness of our proposed model, we conduct synthetic data generation experiments on 10 real-world datasets. Given the analogy between predicting masked input tokens in MLM and missing data imputation, we also evaluate the performance of multiple imputations on incomplete datasets with various missing data mechanisms. Moreover, our proposed model offers the advantage of enabling adjustments to data privacy levels without requiring re-training.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Data Service Maximization in Integrated Terrestrial-Non-Terrestrial 6G Networks: A Deep Reinforcement Learning Approach
Authors:
Nway Nway Ei,
Kitae Kim,
Yan Kyaw Tun,
Choong Seon Hong
Abstract:
Integrating terrestrial and non-terrestrial networks has emerged as a promising paradigm to fulfill the constantly growing demand for connectivity, low transmission delay, and quality of services (QoS). This integration brings together the strengths of terrestrial and non-terrestrial networks, such as the reliability of terrestrial networks, broad coverage, and service continuity of non-terrestria…
▽ More
Integrating terrestrial and non-terrestrial networks has emerged as a promising paradigm to fulfill the constantly growing demand for connectivity, low transmission delay, and quality of services (QoS). This integration brings together the strengths of terrestrial and non-terrestrial networks, such as the reliability of terrestrial networks, broad coverage, and service continuity of non-terrestrial networks like low earth orbit (LEO) satellites. In this work, we study a data service maximization problem in an integrated terrestrial-non-terrestrial network (I-TNT) where the ground base stations (GBSs) and LEO satellites cooperatively serve the coexisting aerial users (AUs) and ground users (GUs). Then, by considering the spectrum scarcity, interference, and QoS requirements of the users, we jointly optimize the user association, AUE's trajectory, and power allocation. To tackle the formulated mixed-integer non-convex problem, we disintegrate it into two subproblems: 1) user association problem and 2) trajectory and power allocation problem. Since the user association problem is a binary integer programming problem, we use the standard convex optimization method to solve it. Meanwhile, the trajectory and power allocation problem is solved by the deep deterministic policy gradient (DDPG) method to cope with the problem's non-convexity and dynamic network environments. Then, the two subproblems are alternately solved by the proposed iterative algorithm. By comparing with the baselines in the existing literature, extensive simulations are conducted to evaluate the performance of the proposed framework.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Improving SMOTE via Fusing Conditional VAE for Data-adaptive Noise Filtering
Authors:
Sungchul Hong,
Seunghwan An,
Jong-June Jeon
Abstract:
Recent advances in a generative neural network model extend the development of data augmentation methods. However, the augmentation methods based on the modern generative models fail to achieve notable performance for class imbalance data compared to the conventional model, the SMOTE. We investigate the problem of the generative model for imbalanced classification and introduce a framework to enha…
▽ More
Recent advances in a generative neural network model extend the development of data augmentation methods. However, the augmentation methods based on the modern generative models fail to achieve notable performance for class imbalance data compared to the conventional model, the SMOTE. We investigate the problem of the generative model for imbalanced classification and introduce a framework to enhance the SMOTE algorithm using Variational Autoencoders (VAE). Our approach systematically quantifies the density of data points in a low-dimensional latent space using the VAE, simultaneously incorporating information on class labels and classification difficulty. Then, the data points potentially degrading the augmentation are systematically excluded, and the neighboring observations are directly augmented on the data space. Empirical studies on several imbalanced datasets represent that this simple process innovatively improves the conventional SMOTE algorithm over the deep learning models. Consequently, we conclude that the selection of minority data and the interpolation in the data space are beneficial for imbalanced classification problems with a relatively small number of data points.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Enhancing Sufficient Dimension Reduction via Hellinger Correlation
Authors:
Seungbeom Hong,
Ilmun Kim,
Jun Song
Abstract:
In this work, we develop a new theory and method for sufficient dimension reduction (SDR) in single-index models, where SDR is a sub-field of supervised dimension reduction based on conditional independence. Our work is primarily motivated by the recent introduction of the Hellinger correlation as a dependency measure. Utilizing this measure, we develop a method capable of effectively detecting th…
▽ More
In this work, we develop a new theory and method for sufficient dimension reduction (SDR) in single-index models, where SDR is a sub-field of supervised dimension reduction based on conditional independence. Our work is primarily motivated by the recent introduction of the Hellinger correlation as a dependency measure. Utilizing this measure, we develop a method capable of effectively detecting the dimension reduction subspace, complete with theoretical justification. Through extensive numerical experiments, we demonstrate that our proposed method significantly enhances and outperforms existing SDR methods. This improvement is largely attributed to our proposed method's deeper understanding of data dependencies and the refinement of existing SDR techniques.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
$i$REPO: $i$mplicit Reward Pairwise Difference based Empirical Preference Optimization
Authors:
Long Tan Le,
Han Shu,
Tung-Anh Nguyen,
Choong Seon Hong,
Nguyen H. Tran
Abstract:
While astonishingly capable, large Language Models (LLM) can sometimes produce outputs that deviate from human expectations. Such deviations necessitate an alignment phase to prevent disseminating untruthful, toxic, or biased information. Traditional alignment methods based on reinforcement learning often struggle with the identified instability, whereas preference optimization methods are limited…
▽ More
While astonishingly capable, large Language Models (LLM) can sometimes produce outputs that deviate from human expectations. Such deviations necessitate an alignment phase to prevent disseminating untruthful, toxic, or biased information. Traditional alignment methods based on reinforcement learning often struggle with the identified instability, whereas preference optimization methods are limited by their overfitting to pre-collected hard-label datasets. In this paper, we propose a novel LLM alignment framework named $i$REPO, which utilizes implicit Reward pairwise difference regression for Empirical Preference Optimization. Particularly, $i$REPO employs self-generated datasets labelled by empirical human (or AI annotator) preference to iteratively refine the aligned policy through a novel regression-based loss function. Furthermore, we introduce an innovative algorithm backed by theoretical guarantees for achieving optimal results under ideal assumptions and providing a practical performance-gap result without such assumptions. Experimental results with Phi-2 and Mistral-7B demonstrate that $i$REPO effectively achieves self-alignment using soft-label, self-generated responses and the logit of empirical AI annotators. Furthermore, our approach surpasses preference optimization baselines in evaluations using the Language Model Evaluation Harness and Multi-turn benchmarks.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions
Authors:
Victor Agostinelli,
Sanghyun Hong,
Lizhong Chen
Abstract:
A promising approach to preserving model performance in linearized transformers is to employ position-based re-weighting functions. However, state-of-the-art re-weighting functions rely heavily on target sequence lengths, making it difficult or impossible to apply them to autoregressive and simultaneous tasks, where the target and sometimes even the input sequence length are unknown. To address th…
▽ More
A promising approach to preserving model performance in linearized transformers is to employ position-based re-weighting functions. However, state-of-the-art re-weighting functions rely heavily on target sequence lengths, making it difficult or impossible to apply them to autoregressive and simultaneous tasks, where the target and sometimes even the input sequence length are unknown. To address this issue, we propose Learned Proportions (LeaP) and LeaPformers. Our contribution is built on two major components. First, we generalize the dependence on explicit positional representations and sequence lengths into dependence on sequence proportions for re-weighting. Second, we replace static positional representations with dynamic proportions derived via a compact module, enabling more flexible attention concentration patterns. We evaluate LeaPformer against eight representative efficient transformers on the Long-Range Arena benchmark, showing that LeaPformer achieves the best quality-throughput trade-off, as well as LeaPformer to Wikitext-103 autoregressive language modeling and simultaneous speech-to-text translation for two language pairs, achieving competitive results.
△ Less
Submitted 18 May, 2024;
originally announced May 2024.
-
Collaborative Design for Job-Seekers with Autism: A Conceptual Framework for Future Research
Authors:
Sungsoo Ray Hong,
Marcos Zampieri,
Brittany N. Hand,
Vivian Motti,
Dongjun Chung,
Ozlem Uzuner
Abstract:
The success of employment is highly related to a job seeker's capability of communicating and collaborating with others. While leveraging one's network during the job-seeking process is intuitive to the neurotypical, this can be challenging for people with autism. Recent empirical findings have started to show how facilitating collaboration between people with autism and their social surroundings…
▽ More
The success of employment is highly related to a job seeker's capability of communicating and collaborating with others. While leveraging one's network during the job-seeking process is intuitive to the neurotypical, this can be challenging for people with autism. Recent empirical findings have started to show how facilitating collaboration between people with autism and their social surroundings through new design can improve their chances of employment. This work aims to provide actionable guidelines and conceptual frameworks that future researchers and practitioners can apply to improve collaborative design for job-seekers with autism. Built upon the literature on past technological interventions built for supporting job-seekers with autism, we define three major research challenges of (1) communication support, (2) employment stage-wise support, and (3) group work support. For each challenge, we review the current state-of-the-art practices and possible future solutions. We then suggest future designs that can provide breakthroughs from the interdisciplinary lens of human-AI collaboration, health services, group work, accessibility computing, and natural language processing.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
Hard Work Does Not Always Pay Off: Poisoning Attacks on Neural Architecture Search
Authors:
Zachary Coalson,
Huazheng Wang,
Qingyun Wu,
Sanghyun Hong
Abstract:
In this paper, we study the robustness of "data-centric" approaches to finding neural network architectures (known as neural architecture search) to data distribution shifts. To audit this robustness, we present a data poisoning attack, when injected to the training data used for architecture search that can prevent the victim algorithm from finding an architecture with optimal accuracy. We first…
▽ More
In this paper, we study the robustness of "data-centric" approaches to finding neural network architectures (known as neural architecture search) to data distribution shifts. To audit this robustness, we present a data poisoning attack, when injected to the training data used for architecture search that can prevent the victim algorithm from finding an architecture with optimal accuracy. We first define the attack objective for crafting poisoning samples that can induce the victim to generate sub-optimal architectures. To this end, we weaponize existing search algorithms to generate adversarial architectures that serve as our objectives. We also present techniques that the attacker can use to significantly reduce the computational costs of crafting poisoning samples. In an extensive evaluation of our poisoning attack on a representative architecture search algorithm, we show its surprising robustness. Because our attack employs clean-label poisoning, we also evaluate its robustness against label noise. We find that random label-flipping is more effective in generating sub-optimal architectures than our clean-label attack. Our results suggests that care must be taken for the data this emerging approach uses, and future work is needed to develop robust algorithms.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
SVD-AE: Simple Autoencoders for Collaborative Filtering
Authors:
Seoyoung Hong,
Jeongwhan Choi,
Yeon-Chang Lee,
Srijan Kumar,
Noseong Park
Abstract:
Collaborative filtering (CF) methods for recommendation systems have been extensively researched, ranging from matrix factorization and autoencoder-based to graph filtering-based methods. Recently, lightweight methods that require almost no training have been recently proposed to reduce overall computation. However, existing methods still have room to improve the trade-offs among accuracy, efficie…
▽ More
Collaborative filtering (CF) methods for recommendation systems have been extensively researched, ranging from matrix factorization and autoencoder-based to graph filtering-based methods. Recently, lightweight methods that require almost no training have been recently proposed to reduce overall computation. However, existing methods still have room to improve the trade-offs among accuracy, efficiency, and robustness. In particular, there are no well-designed closed-form studies for \emph{balanced} CF in terms of the aforementioned trade-offs. In this paper, we design SVD-AE, a simple yet effective singular vector decomposition (SVD)-based linear autoencoder, whose closed-form solution can be defined based on SVD for CF. SVD-AE does not require iterative training processes as its closed-form solution can be calculated at once. Furthermore, given the noisy nature of the rating matrix, we explore the robustness against such noisy interactions of existing CF methods and our SVD-AE. As a result, we demonstrate that our simple design choice based on truncated SVD can be used to strengthen the noise robustness of the recommendation while improving efficiency. Code is available at https://github.com/seoyoungh/svd-ae.
△ Less
Submitted 7 May, 2024;
originally announced May 2024.
-
Unicorn: U-Net for Sea Ice Forecasting with Convolutional Neural Ordinary Differential Equations
Authors:
Jaesung Park,
Sungchul Hong,
Yoonseo Cho,
Jong-June Jeon
Abstract:
Sea ice at the North Pole is vital to global climate dynamics. However, accurately forecasting sea ice poses a significant challenge due to the intricate interaction among multiple variables. Leveraging the capability to integrate multiple inputs and powerful performances seamlessly, many studies have turned to neural networks for sea ice forecasting. This paper introduces a novel deep architectur…
▽ More
Sea ice at the North Pole is vital to global climate dynamics. However, accurately forecasting sea ice poses a significant challenge due to the intricate interaction among multiple variables. Leveraging the capability to integrate multiple inputs and powerful performances seamlessly, many studies have turned to neural networks for sea ice forecasting. This paper introduces a novel deep architecture named Unicorn, designed to forecast weekly sea ice. Our model integrates multiple time series images within its architecture to enhance its forecasting performance. Moreover, we incorporate a bottleneck layer within the U-Net architecture, serving as neural ordinary differential equations with convolution operations, to capture the spatiotemporal dynamics of latent variables. Through real data analysis with datasets spanning from 1998 to 2021, our proposed model demonstrates significant improvements over state-of-the-art models in the sea ice concentration forecasting task. It achieves an average MAE improvement of 12% compared to benchmark models. Additionally, our method outperforms existing approaches in sea ice extent forecasting, achieving a classification performance improvement of approximately 18%. These experimental results show the superiority of our proposed model.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Deep Learning for Detecting and Early Predicting Chronic Obstructive Pulmonary Disease from Spirogram Time Series: A UK Biobank Study
Authors:
Shuhao Mei,
Yuxi Zhou,
Jiahao Xu,
Yuxuan Wan,
Shan Cao,
Qinghao Zhao,
Shijia Geng,
Junqing Xie,
Shenda Hong
Abstract:
Chronic Obstructive Pulmonary Disease (COPD) is a chronic inflammatory lung condition that causes airflow obstruction. The existing methods can only detect patients who already have COPD based on obvious features shown in the spirogram (In this article, the spirogram specifically involves measuring Volume-Flow curve time series). Early prediction of COPD risk is vital for monitoring COPD disease p…
▽ More
Chronic Obstructive Pulmonary Disease (COPD) is a chronic inflammatory lung condition that causes airflow obstruction. The existing methods can only detect patients who already have COPD based on obvious features shown in the spirogram (In this article, the spirogram specifically involves measuring Volume-Flow curve time series). Early prediction of COPD risk is vital for monitoring COPD disease progression, slowing it down, or even preventing its onset. However, these methods fail to early predict an individual's probability of COPD in the future based on subtle features in the spirogram. To address this gap, for the first time, we propose DeepSpiro, a method based on deep learning for early prediction of future COPD risk. DeepSpiro consists of four parts. First, we construct Volume-Flow curves guided by Time-Volume instability smoothing (SpiroSmoother) to enhance the stability of the original Volume-Flow curves precisely. Second, we extract critical features from the evolution of varied-length key patches (SpiroEncoder) to capture the key temporal evolution from original high-dimensional dynamic sequences to a unified low-dimensional temporal representation. Third, we explain the model based on temporal attention and heterogeneous feature fusion (SpiroExplainer), which integrates information from heterogeneous data such as spirogram and demographic information. Fourth, we predict the risk of COPD based on the evolution of key patch concavity (SpiroPredictor), enabling accurate prediction of the risk of disease in high-risk patients who are not yet diagnosed, for up to 1, 2, 3, 4, 5 years, and beyond. We conduct experiments on the UK Biobank dataset. Results show that DeepSpiro achieves an AUC value of 0.8328 in the task of detecting COPD. In early prediction tasks, high-risk and low-risk groups show significant differences in the future, with a p-value of <0.001.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Learning to Compose: Improving Object Centric Learning by Injecting Compositionality
Authors:
Whie Jung,
Jaehoon Yoo,
Sungjin Ahn,
Seunghoon Hong
Abstract:
Learning compositional representation is a key aspect of object-centric learning as it enables flexible systematic generalization and supports complex visual reasoning. However, most of the existing approaches rely on auto-encoding objective, while the compositionality is implicitly imposed by the architectural or algorithmic bias in the encoder. This misalignment between auto-encoding objective a…
▽ More
Learning compositional representation is a key aspect of object-centric learning as it enables flexible systematic generalization and supports complex visual reasoning. However, most of the existing approaches rely on auto-encoding objective, while the compositionality is implicitly imposed by the architectural or algorithmic bias in the encoder. This misalignment between auto-encoding objective and learning compositionality often results in failure of capturing meaningful object representations. In this study, we propose a novel objective that explicitly encourages compositionality of the representations. Built upon the existing object-centric learning framework (e.g., slot attention), our method incorporates additional constraints that an arbitrary mixture of object representations from two images should be valid by maximizing the likelihood of the composite data. We demonstrate that incorporating our objective to the existing framework consistently improves the objective-centric learning and enhances the robustness to the architectural choices.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild
Authors:
Donggyun Kim,
Seongwoong Cho,
Semin Kim,
Chong Luo,
Seunghoon Hong
Abstract:
Large language models have evolved data-efficient generalists, benefiting from the universal language interface and large-scale pre-training. However, constructing a data-efficient generalist for dense visual prediction presents a distinct challenge due to the variation in label structures across different tasks. Consequently, generalization to unseen dense prediction tasks in the low-data regime…
▽ More
Large language models have evolved data-efficient generalists, benefiting from the universal language interface and large-scale pre-training. However, constructing a data-efficient generalist for dense visual prediction presents a distinct challenge due to the variation in label structures across different tasks. Consequently, generalization to unseen dense prediction tasks in the low-data regime is not straightforward and has received less attention from previous vision generalists. In this study, we explore a universal model that can flexibly adapt to unseen dense label structures with a few examples, enabling it to serve as a data-efficient vision generalist in diverse real-world scenarios. To this end, we base our method on a powerful meta-learning framework and explore several axes to improve its performance and versatility for real-world problems, such as flexible adaptation mechanisms and scalability. We evaluate our model across a spectrum of unseen real-world scenarios where low-shot learning is desirable, including video, 3D, medical, biological, and user-interactive tasks. Equipped with a generic architecture and an effective adaptation mechanism, our model flexibly adapts to all of these tasks with at most 50 labeled images, showcasing a significant advancement over existing data-efficient generalist approaches. Codes are available at https://github.com/GitGyun/chameleon.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
Understanding and Shaping Human-Technology Assemblages in the Age of Generative AI
Authors:
Josh Andres,
Chris Danta,
Andrea Bianchi,
Sungyeon Hong,
Zhuying Li,
Eduardo B. Sandoval,
Charles Martin,
Ned Cooper
Abstract:
Generative AI capabilities are rapidly transforming how we perceive, interact with, and relate to machines. This one-day workshop invites HCI researchers, designers, and practitioners to imaginatively inhabit and explore the possible futures that might emerge from humans combining generative AI capabilities into everyday technologies at massive scale. Workshop participants will craft stories, visu…
▽ More
Generative AI capabilities are rapidly transforming how we perceive, interact with, and relate to machines. This one-day workshop invites HCI researchers, designers, and practitioners to imaginatively inhabit and explore the possible futures that might emerge from humans combining generative AI capabilities into everyday technologies at massive scale. Workshop participants will craft stories, visualisations, and prototypes through scenario-based design to investigate these possible futures, resulting in the production of an open-annotated scenario library and a journal or interactions article to disseminate the findings. We aim to gather the DIS community knowledge to explore, understand and shape the relations this new interaction paradigm is forging between humans, their technologies and the environment in safe, sustainable, enriching, and responsible ways.
△ Less
Submitted 4 May, 2024; v1 submitted 28 April, 2024;
originally announced April 2024.
-
Review of Data-centric Time Series Analysis from Sample, Feature, and Period
Authors:
Chenxi Sun,
Hongyan Li,
Yaliang Li,
Shenda Hong
Abstract:
Data is essential to performing time series analysis utilizing machine learning approaches, whether for classic models or today's large language models. A good time-series dataset is advantageous for the model's accuracy, robustness, and convergence, as well as task outcomes and costs. The emergence of data-centric AI represents a shift in the landscape from model refinement to prioritizing data q…
▽ More
Data is essential to performing time series analysis utilizing machine learning approaches, whether for classic models or today's large language models. A good time-series dataset is advantageous for the model's accuracy, robustness, and convergence, as well as task outcomes and costs. The emergence of data-centric AI represents a shift in the landscape from model refinement to prioritizing data quality. Even though time-series data processing methods frequently come up in a wide range of research fields, it hasn't been well investigated as a specific topic. To fill the gap, in this paper, we systematically review different data-centric methods in time series analysis, covering a wide range of research topics. Based on the time-series data characteristics at sample, feature, and period, we propose a taxonomy for the reviewed data selection methods. In addition to discussing and summarizing their characteristics, benefits, and drawbacks targeting time-series data, we also introduce the challenges and opportunities by proposing recommendations, open problems, and possible research topics.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
Translation of Multifaceted Data without Re-Training of Machine Translation Systems
Authors:
Hyeonseok Moon,
Seungyoon Lee,
Seongtae Hong,
Seungjun Lee,
Chanjun Park,
Heuiseok Lim
Abstract:
Translating major language resources to build minor language resources becomes a widely-used approach. Particularly in translating complex data points composed of multiple components, it is common to translate each component separately. However, we argue that this practice often overlooks the interrelation between components within the same data point. To address this limitation, we propose a nove…
▽ More
Translating major language resources to build minor language resources becomes a widely-used approach. Particularly in translating complex data points composed of multiple components, it is common to translate each component separately. However, we argue that this practice often overlooks the interrelation between components within the same data point. To address this limitation, we propose a novel MT pipeline that considers the intra-data relation in implementing MT for training data. In our MT pipeline, all the components in a data point are concatenated to form a single translation sequence and subsequently reconstructed to the data components after translation. We introduce a Catalyst Statement (CS) to enhance the intra-data relation, and Indicator Token (IT) to assist the decomposition of a translated sequence into its respective data components. Through our approach, we have achieved a considerable improvement in translation quality itself, along with its effectiveness as training data. Compared with the conventional approach that translates each data component separately, our method yields better training data that enhances the performance of the trained model by 2.690 points for the web page ranking (WPR) task, and 0.845 for the question generation (QG) task in the XGLUE benchmark.
△ Less
Submitted 24 April, 2024;
originally announced April 2024.
-
On Modeling Multi-Criteria Decision Making with Uncertain Information using Probabilistic Rules
Authors:
Shengxin Hong,
Xiuyi Fan
Abstract:
Decision-making processes often involve dealing with uncertainty, which is traditionally addressed through probabilistic models. However, in practical scenarios, assessing probabilities reliably can be challenging, compounded by diverse perceptions of probabilistic information among decision makers. To address this variability and accommodate diverse preferences regarding uncertainty, we introduce…
▽ More
Decision-making processes often involve dealing with uncertainty, which is traditionally addressed through probabilistic models. However, in practical scenarios, assessing probabilities reliably can be challenging, compounded by diverse perceptions of probabilistic information among decision makers. To address this variability and accommodate diverse preferences regarding uncertainty, we introduce the Probabilistic Abstract Decision Framework (PADF). PADF offers a structured approach for reasoning across different decision criteria, encompassing the optimistic, pessimistic, and Laplace perspectives, each tailored to distinct perceptions of uncertainty. We illustrate how PADF facilitates the computation of optimal decisions aligned with these criteria by leveraging probabilistic rules. Furthermore, we present strategies for optimizing the computational efficiency of these rules, leveraging appropriate independence assumptions to navigate the extensive search space inherent in PADF. Through these contributions, our framework provides a robust and adaptable tool for effectively navigating the complexities of decision-making under uncertainty.
△ Less
Submitted 20 April, 2024;
originally announced April 2024.
-
FedCCL: Federated Dual-Clustered Feature Contrast Under Domain Heterogeneity
Authors:
Yu Qiao,
Huy Q. Le,
Mengchun Zhang,
Apurba Adhikary,
Chaoning Zhang,
Choong Seon Hong
Abstract:
Federated learning (FL) facilitates a privacy-preserving neural network training paradigm through collaboration between edge clients and a central server. One significant challenge is that the distributed data is not independently and identically distributed (non-IID), typically including both intra-domain and inter-domain heterogeneity. However, recent research is limited to simply using averaged…
▽ More
Federated learning (FL) facilitates a privacy-preserving neural network training paradigm through collaboration between edge clients and a central server. One significant challenge is that the distributed data is not independently and identically distributed (non-IID), typically including both intra-domain and inter-domain heterogeneity. However, recent research is limited to simply using averaged signals as a form of regularization and only focusing on one aspect of these non-IID challenges. Given these limitations, this paper clarifies these two non-IID challenges and attempts to introduce cluster representation to address them from both local and global perspectives. Specifically, we propose a dual-clustered feature contrast-based FL framework with dual focuses. First, we employ clustering on the local representations of each client, aiming to capture intra-class information based on these local clusters at a high level of granularity. Then, we facilitate cross-client knowledge sharing by pulling the local representation closer to clusters shared by clients with similar semantics while pushing them away from clusters with dissimilar semantics. Second, since the sizes of local clusters belonging to the same class may differ for each client, we further utilize clustering on the global side and conduct averaging to create a consistent global signal for guiding each local training in a contrastive manner. Experimental results on multiple datasets demonstrate that our proposal achieves comparable or superior performance gain under intra-domain and inter-domain heterogeneity.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
Logit Calibration and Feature Contrast for Robust Federated Learning on Non-IID Data
Authors:
Yu Qiao,
Chaoning Zhang,
Apurba Adhikary,
Choong Seon Hong
Abstract:
Federated learning (FL) is a privacy-preserving distributed framework for collaborative model training on devices in edge networks. However, challenges arise due to vulnerability to adversarial examples (AEs) and the non-independent and identically distributed (non-IID) nature of data distribution among devices, hindering the deployment of adversarially robust and accurate learning models at the e…
▽ More
Federated learning (FL) is a privacy-preserving distributed framework for collaborative model training on devices in edge networks. However, challenges arise due to vulnerability to adversarial examples (AEs) and the non-independent and identically distributed (non-IID) nature of data distribution among devices, hindering the deployment of adversarially robust and accurate learning models at the edge. While adversarial training (AT) is commonly acknowledged as an effective defense strategy against adversarial attacks in centralized training, we shed light on the adverse effects of directly applying AT in FL that can severely compromise accuracy, especially in non-IID challenges. Given this limitation, this paper proposes FatCC, which incorporates local logit \underline{C}alibration and global feature \underline{C}ontrast into the vanilla federated adversarial training (\underline{FAT}) process from both logit and feature perspectives. This approach can effectively enhance the federated system's robust accuracy (RA) and clean accuracy (CA). First, we propose logit calibration, where the logits are calibrated during local adversarial updates, thereby improving adversarial robustness. Second, FatCC introduces feature contrast, which involves a global alignment term that aligns each local representation with unbiased global features, thus further enhancing robustness and accuracy in federated adversarial environments. Extensive experiments across multiple datasets demonstrate that FatCC achieves comparable or superior performance gains in both CA and RA compared to other baselines.
△ Less
Submitted 10 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…
▽ More
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
△ Less
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models
Authors:
Yuxin Wen,
Leo Marchyok,
Sanghyun Hong,
Jonas Geiping,
Tom Goldstein,
Nicholas Carlini
Abstract:
It is commonplace to produce application-specific models by fine-tuning large pre-trained models using a small bespoke dataset. The widespread availability of foundation model checkpoints on the web poses considerable risks, including the vulnerability to backdoor attacks. In this paper, we unveil a new vulnerability: the privacy backdoor attack. This black-box privacy attack aims to amplify the p…
▽ More
It is commonplace to produce application-specific models by fine-tuning large pre-trained models using a small bespoke dataset. The widespread availability of foundation model checkpoints on the web poses considerable risks, including the vulnerability to backdoor attacks. In this paper, we unveil a new vulnerability: the privacy backdoor attack. This black-box privacy attack aims to amplify the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models, demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
△ Less
Submitted 1 April, 2024;
originally announced April 2024.
-
When Do "More Contexts" Help with Sarcasm Recognition?
Authors:
Ojas Nimase,
Sanghyun Hong
Abstract:
Sarcasm recognition is challenging because it needs an understanding of the true intention, which is opposite to or different from the literal meaning of the words. Prior work has addressed this challenge by developing a series of methods that provide richer $contexts$, e.g., sentiment or cultural nuances, to models. While shown to be effective individually, no study has systematically evaluated t…
▽ More
Sarcasm recognition is challenging because it needs an understanding of the true intention, which is opposite to or different from the literal meaning of the words. Prior work has addressed this challenge by developing a series of methods that provide richer $contexts$, e.g., sentiment or cultural nuances, to models. While shown to be effective individually, no study has systematically evaluated their collective effectiveness. As a result, it remains unclear to what extent additional contexts can improve sarcasm recognition. In this work, we explore the improvements that existing methods bring by incorporating more contexts into a model. To this end, we develop a framework where we can integrate multiple contextual cues and test different approaches. In evaluation with four approaches on three sarcasm recognition benchmarks, we achieve existing state-of-the-art performances and also demonstrate the benefits of sequentially adding more contexts. We also identify inherent drawbacks of using more contexts, highlighting that in the pursuit of even better results, the model may need to adopt societal biases.
△ Less
Submitted 19 March, 2024;
originally announced March 2024.
-
Diffusion Denoising as a Certified Defense against Clean-label Poisoning
Authors:
Sanghyun Hong,
Nicholas Carlini,
Alexey Kurakin
Abstract:
We present a certified defense to clean-label poisoning attacks. These attacks work by injecting a small number of poisoning samples (e.g., 1%) that contain $p$-norm bounded adversarial perturbations into the training data to induce a targeted misclassification of a test-time input. Inspired by the adversarial robustness achieved by $denoised$ $smoothing$, we show how an off-the-shelf diffusion mo…
▽ More
We present a certified defense to clean-label poisoning attacks. These attacks work by injecting a small number of poisoning samples (e.g., 1%) that contain $p$-norm bounded adversarial perturbations into the training data to induce a targeted misclassification of a test-time input. Inspired by the adversarial robustness achieved by $denoised$ $smoothing$, we show how an off-the-shelf diffusion model can sanitize the tampered training data. We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in the test time accuracy. We compare our defense with existing countermeasures against clean-label poisoning, showing that the defense reduces the attack success the most and offers the best model utility. Our results highlight the need for future work on developing stronger clean-label attacks and using our certified yet practical defense as a strong baseline to evaluate these attacks.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence
Authors:
Sunghwan Hong,
Seokju Cho,
Seungryong Kim,
Stephen Lin
Abstract:
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks. In the context of dense matching, many works benefit from one of two forms of aggregation: feature aggregation, which pertains to the alignment of similar features, or cost aggregation, a procedure aimed at instilling coherence in the flow estimates across neighboring pixel…
▽ More
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks. In the context of dense matching, many works benefit from one of two forms of aggregation: feature aggregation, which pertains to the alignment of similar features, or cost aggregation, a procedure aimed at instilling coherence in the flow estimates across neighboring pixels. In this work, we first show that feature aggregation and cost aggregation exhibit distinct characteristics and reveal the potential for substantial benefits stemming from the judicious use of both aggregation processes. We then introduce a simple yet effective architecture that harnesses self- and cross-attention mechanisms to show that our approach unifies feature aggregation and cost aggregation and effectively harnesses the strengths of both techniques. Within the proposed attention layers, the features and cost volume both complement each other, and the attention layers are interleaved through a coarse-to-fine design to further promote accurate correspondence estimation. Finally at inference, our network produces multi-scale predictions, computes their confidence scores, and selects the most confident flow for final prediction. Our framework is evaluated on standard benchmarks for semantic matching, and also applied to geometric matching, where we show that our approach achieves significant improvements compared to existing methods.
△ Less
Submitted 22 April, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
ArgMed-Agents: Explainable Clinical Decision Reasoning with LLM Disscusion via Argumentation Schemes
Authors:
Shengxin Hong,
Liang Xiao,
Xin Zhang,
Jianxia Chen
Abstract:
There are two main barriers to using large language models (LLMs) in clinical reasoning. Firstly, while LLMs exhibit significant promise in Natural Language Processing (NLP) tasks, their performance in complex reasoning and planning falls short of expectations. Secondly, LLMs use uninterpretable methods to make clinical decisions that are fundamentally different from the clinician's cognitive proc…
▽ More
There are two main barriers to using large language models (LLMs) in clinical reasoning. Firstly, while LLMs exhibit significant promise in Natural Language Processing (NLP) tasks, their performance in complex reasoning and planning falls short of expectations. Secondly, LLMs use uninterpretable methods to make clinical decisions that are fundamentally different from the clinician's cognitive processes. This leads to user distrust. In this paper, we present a multi-agent framework called ArgMed-Agents, which aims to enable LLM-based agents to make explainable clinical decision reasoning through interaction. ArgMed-Agents performs self-argumentation iterations via Argumentation Scheme for Clinical Discussion (a reasoning mechanism for modeling cognitive processes in clinical reasoning), and then constructs the argumentation process as a directed graph representing conflicting relationships. Ultimately, use symbolic solver to identify a series of rational and coherent arguments to support decision. We construct a formal model of ArgMed-Agents and present conjectures for theoretical guarantees. ArgMed-Agents enables LLMs to mimic the process of clinical argumentative reasoning by generating explanations of reasoning in a self-directed manner. The setup experiments show that ArgMed-Agents not only improves accuracy in complex clinical decision reasoning problems compared to other prompt methods, but more importantly, it provides users with decision explanations that increase their confidence.
△ Less
Submitted 20 June, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
Authors:
Joseph Cho,
Fachrina Dewi Puspitasari,
Sheng Zheng,
Jingyao Zheng,
Lik-Hang Lee,
Tae-Ho Kim,
Choong Seon Hong,
Chaoning Zhang
Abstract:
The evolution of video generation from text, starting with animating MNIST numbers to simulating the physical world with Sora, has progressed at a breakneck speed over the past seven years. While often seen as a superficial expansion of the predecessor text-to-image generation model, text-to-video generation models are developed upon carefully engineered constituents. Here, we systematically discu…
▽ More
The evolution of video generation from text, starting with animating MNIST numbers to simulating the physical world with Sora, has progressed at a breakneck speed over the past seven years. While often seen as a superficial expansion of the predecessor text-to-image generation model, text-to-video generation models are developed upon carefully engineered constituents. Here, we systematically discuss these elements consisting of but not limited to core building blocks (vision, language, and temporal) and supporting features from the perspective of their contributions to achieving a world model. We employ the PRISMA framework to curate 97 impactful research articles from renowned scientific databases primarily studying video synthesis using text conditions. Upon minute exploration of these manuscripts, we observe that text-to-video generation involves more intricate technologies beyond the plain extension of text-to-image generation. Our additional review into the shortcomings of Sora-generated videos pinpoints the call for more in-depth studies in various enabling aspects of video generation such as dataset, evaluation metric, efficient architecture, and human-controlled generation. Finally, we conclude that the study of the text-to-video generation may still be in its infancy, requiring contribution from the cross-discipline research community towards its advancement as the first step to realize artificial general intelligence (AGI).
△ Less
Submitted 7 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
A 28.6 mJ/iter Stable Diffusion Processor for Text-to-Image Generation with Patch Similarity-based Sparsity Augmentation and Text-based Mixed-Precision
Authors:
Jiwon Choi,
Wooyoung Jo,
Seongyon Hong,
Beomseok Kwon,
Wonhoon Park,
Hoi-Jun Yoo
Abstract:
This paper presents an energy-efficient stable diffusion processor for text-to-image generation. While stable diffusion attained attention for high-quality image synthesis results, its inherent characteristics hinder its deployment on mobile platforms. The proposed processor achieves high throughput and energy efficiency with three key features as solutions: 1) Patch similarity-based sparsity augm…
▽ More
This paper presents an energy-efficient stable diffusion processor for text-to-image generation. While stable diffusion attained attention for high-quality image synthesis results, its inherent characteristics hinder its deployment on mobile platforms. The proposed processor achieves high throughput and energy efficiency with three key features as solutions: 1) Patch similarity-based sparsity augmentation (PSSA) to reduce external memory access (EMA) energy of self-attention score by 60.3 %, leading to 37.8 % total EMA energy reduction. 2) Text-based important pixel spotting (TIPS) to allow 44.8 % of the FFN layer workload to be processed with low-precision activation. 3) Dual-mode bit-slice core (DBSC) architecture to enhance energy efficiency in FFN layers by 43.0 %. The proposed processor is implemented in 28 nm CMOS technology and achieves 3.84 TOPS peak throughput with 225.6 mW average power consumption. In sum, 28.6 mJ/iteration highly energy-efficient text-to-image generation processor can be achieved at MS-COCO dataset.
△ Less
Submitted 14 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Towards Robust Federated Learning via Logits Calibration on Non-IID Data
Authors:
Yu Qiao,
Apurba Adhikary,
Chaoning Zhang,
Choong Seon Hong
Abstract:
Federated learning (FL) is a privacy-preserving distributed management framework based on collaborative model training of distributed devices in edge networks. However, recent studies have shown that FL is vulnerable to adversarial examples (AEs), leading to a significant drop in its performance. Meanwhile, the non-independent and identically distributed (non-IID) challenge of data distribution be…
▽ More
Federated learning (FL) is a privacy-preserving distributed management framework based on collaborative model training of distributed devices in edge networks. However, recent studies have shown that FL is vulnerable to adversarial examples (AEs), leading to a significant drop in its performance. Meanwhile, the non-independent and identically distributed (non-IID) challenge of data distribution between edge devices can further degrade the performance of models. Consequently, both AEs and non-IID pose challenges to deploying robust learning models at the edge. In this work, we adopt the adversarial training (AT) framework to improve the robustness of FL models against adversarial example (AE) attacks, which can be termed as federated adversarial training (FAT). Moreover, we address the non-IID challenge by implementing a simple yet effective logits calibration strategy under the FAT framework, which can enhance the robustness of models when subjected to adversarial attacks. Specifically, we employ a direct strategy to adjust the logits output by assigning higher weights to classes with small samples during training. This approach effectively tackles the class imbalance in the training data, with the goal of mitigating biases between local and global models. Experimental results on three dataset benchmarks, MNIST, Fashion-MNIST, and CIFAR-10 show that our strategy achieves competitive results in natural and robust accuracy compared to several baselines.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Closing the Knowledge Gap in Designing Data Annotation Interfaces for AI-powered Disaster Management Analytic Systems
Authors:
Zinat Ara,
Hossein Salemi,
Sungsoo Ray Hong,
Yasas Senarath,
Steve Peterson,
Amanda Lee Hughes,
Hemant Purohit
Abstract:
Data annotation interfaces predominantly leverage ground truth labels to guide annotators toward accurate responses. With the growing adoption of Artificial Intelligence (AI) in domain-specific professional tasks, it has become increasingly important to help beginning annotators identify how their early-stage knowledge can lead to inaccurate answers, which in turn, helps to ensure quality annotati…
▽ More
Data annotation interfaces predominantly leverage ground truth labels to guide annotators toward accurate responses. With the growing adoption of Artificial Intelligence (AI) in domain-specific professional tasks, it has become increasingly important to help beginning annotators identify how their early-stage knowledge can lead to inaccurate answers, which in turn, helps to ensure quality annotations at scale. To investigate this issue, we conducted a formative study involving eight individuals from the field of disaster management, each possessing varying levels of expertise. The goal was to understand the prevalent factors contributing to disagreements among annotators when classifying Twitter messages related to disasters and to analyze their respective responses. Our analysis identified two primary causes of disagreement between expert and beginner annotators: 1) a lack of contextual knowledge or uncertainty about the situation, and 2) the absence of visual or supplementary cues. Based on these findings, we designed a Context interface, which generates aids that help beginners identify potential mistakes and provide the hidden context of the presented tweet. The summative study compares Context design with two widely used designs in data annotation UI, Highlight and Reasoning-based interfaces. We found significant differences between these designs in terms of attitudinal and behavioral data. We conclude with implications for designing future interfaces aiming at closing the knowledge gap among annotators.
△ Less
Submitted 3 March, 2024;
originally announced March 2024.
-
Collaborative Job Seeking for People with Autism: Challenges and Design Opportunities
Authors:
Zinat Ara,
Amrita Ganguly,
Donna Peppard,
Dongjun Chung,
Slobodan Vucetic,
Vivian Genaro Motti,
Sungsoo Ray Hong
Abstract:
Successful job search results from job seekers' well-shaped social communication. While well-known differences in communication exist between people with autism and neurotypicals, little is known about how people with autism collaborate with their social surroundings to strive in the job market. To better understand the practices and challenges of collaborative job seeking for people with autism,…
▽ More
Successful job search results from job seekers' well-shaped social communication. While well-known differences in communication exist between people with autism and neurotypicals, little is known about how people with autism collaborate with their social surroundings to strive in the job market. To better understand the practices and challenges of collaborative job seeking for people with autism, we interviewed 20 participants including applicants with autism, their social surroundings, and career experts. Through the interviews, we identified social challenges that people with autism face during their job seeking; the social support they leverage to be successful; and the technological limitations that hinder their collaboration. We designed four probes that represent major collaborative features found from the interviews--executive planning, communication, stage-wise preparation, and neurodivergent community formation--and discussed their potential usefulness and impact through three focus groups. We provide implications regarding how our findings can enhance collaborative job seeking experiences for people with autism through new designs.
△ Less
Submitted 3 March, 2024;
originally announced March 2024.
-
Data Interpreter: An LLM Agent For Data Science
Authors:
Sirui Hong,
Yizhang Lin,
Bang Liu,
Bangbang Liu,
Binhao Wu,
Danyang Li,
Jiaqi Chen,
Jiayi Zhang,
Jinlin Wang,
Li Zhang,
Lingyao Zhang,
Min Yang,
Mingchen Zhuge,
Taicheng Guo,
Tuo Zhou,
Wei Tao,
Wenyi Wang,
Xiangru Tang,
Xiangtao Lu,
Xiawu Zheng,
Xinbing Liang,
Yaying Fei,
Yuheng Cheng,
Zongze Xu,
Chenglin Wu
Abstract:
Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a solution de…
▽ More
Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a solution designed to solve with code that emphasizes three pivotal techniques to augment problem-solving in data science: 1) dynamic planning with hierarchical graph structures for real-time data adaptability;2) tool integration dynamically to enhance code proficiency during execution, enriching the requisite expertise;3) logical inconsistency identification in feedback, and efficiency enhancement through experience recording. We evaluate the Data Interpreter on various data science and real-world tasks. Compared to open-source baselines, it demonstrated superior performance, exhibiting significant improvements in machine learning tasks, increasing from 0.86 to 0.95. Additionally, it showed a 26% increase in the MATH dataset and a remarkable 112% improvement in open-ended tasks. The solution will be released at https://github.com/geekan/MetaGPT.
△ Less
Submitted 12 March, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
PAC-FNO: Parallel-Structured All-Component Fourier Neural Operators for Recognizing Low-Quality Images
Authors:
Jinsung Jeon,
Hyundong Jin,
Jonghyun Choi,
Sanghyun Hong,
Dongeun Lee,
Kookjin Lee,
Noseong Park
Abstract:
A standard practice in developing image recognition models is to train a model on a specific image resolution and then deploy it. However, in real-world inference, models often encounter images different from the training sets in resolution and/or subject to natural variations such as weather changes, noise types and compression artifacts. While traditional solutions involve training multiple mode…
▽ More
A standard practice in developing image recognition models is to train a model on a specific image resolution and then deploy it. However, in real-world inference, models often encounter images different from the training sets in resolution and/or subject to natural variations such as weather changes, noise types and compression artifacts. While traditional solutions involve training multiple models for different resolutions or input variations, these methods are computationally expensive and thus do not scale in practice. To this end, we propose a novel neural network model, parallel-structured and all-component Fourier neural operator (PAC-FNO), that addresses the problem. Unlike conventional feed-forward neural networks, PAC-FNO operates in the frequency domain, allowing it to handle images of varying resolutions within a single model. We also propose a two-stage algorithm for training PAC-FNO with a minimal modification to the original, downstream model. Moreover, the proposed PAC-FNO is ready to work with existing image recognition models. Extensively evaluating methods with seven image recognition benchmarks, we show that the proposed PAC-FNO improves the performance of existing baseline models on images with various resolutions by up to 77.1% and various types of natural variations in the images at inference.
△ Less
Submitted 14 March, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Transformers with Attentive Federated Aggregation for Time Series Stock Forecasting
Authors:
Chu Myaet Thwal,
Ye Lin Tun,
Kitae Kim,
Seong-Bae Park,
Choong Seon Hong
Abstract:
Recent innovations in transformers have shown their superior performance in natural language processing (NLP) and computer vision (CV). The ability to capture long-range dependencies and interactions in sequential data has also triggered a great interest in time series modeling, leading to the widespread use of transformers in many time series applications. However, being the most common and cruci…
▽ More
Recent innovations in transformers have shown their superior performance in natural language processing (NLP) and computer vision (CV). The ability to capture long-range dependencies and interactions in sequential data has also triggered a great interest in time series modeling, leading to the widespread use of transformers in many time series applications. However, being the most common and crucial application, the adaptation of transformers to time series forecasting has remained limited, with both promising and inconsistent results. In contrast to the challenges in NLP and CV, time series problems not only add the complexity of order or temporal dependence among input sequences but also consider trend, level, and seasonality information that much of this data is valuable for decision making. The conventional training scheme has shown deficiencies regarding model overfitting, data scarcity, and privacy issues when working with transformers for a forecasting task. In this work, we propose attentive federated transformers for time series stock forecasting with better performance while preserving the privacy of participating enterprises. Empirical results on various stock data from the Yahoo! Finance website indicate the superiority of our proposed scheme in dealing with the above challenges and data heterogeneity in federated learning.
△ Less
Submitted 22 January, 2024;
originally announced February 2024.
-
Retrieval-Augmented Score Distillation for Text-to-3D Generation
Authors:
Junyoung Seo,
Susung Hong,
Wooseok Jang,
Inès Hyeonsu Kim,
Minseop Kwak,
Doyup Lee,
Seungryong Kim
Abstract:
Text-to-3D generation has achieved significant success by incorporating powerful 2D diffusion models, but insufficient 3D prior knowledge also leads to the inconsistency of 3D geometry. Recently, since large-scale multi-view datasets have been released, fine-tuning the diffusion model on the multi-view datasets becomes a mainstream to solve the 3D inconsistency problem. However, it has confronted…
▽ More
Text-to-3D generation has achieved significant success by incorporating powerful 2D diffusion models, but insufficient 3D prior knowledge also leads to the inconsistency of 3D geometry. Recently, since large-scale multi-view datasets have been released, fine-tuning the diffusion model on the multi-view datasets becomes a mainstream to solve the 3D inconsistency problem. However, it has confronted with fundamental difficulties regarding the limited quality and diversity of 3D data, compared with 2D data. To sidestep these trade-offs, we explore a retrieval-augmented approach tailored for score distillation, dubbed ReDream. We postulate that both expressiveness of 2D diffusion models and geometric consistency of 3D assets can be fully leveraged by employing the semantically relevant assets directly within the optimization process. To this end, we introduce novel framework for retrieval-based quality enhancement in text-to-3D generation. We leverage the retrieved asset to incorporate its geometric prior in the variational objective and adapt the diffusion model's 2D prior toward view consistency, achieving drastic improvements in both geometry and fidelity of generated scenes. We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency. Project page is available at https://ku-cvlab.github.io/ReDream/.
△ Less
Submitted 2 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.