-
CLST: Cold-Start Mitigation in Knowledge Tracing by Aligning a Generative Language Model as a Students' Knowledge Tracer
Authors:
Heeseok Jung,
Jaesang Yoo,
Yohaan Yoon,
Yeonju Jang
Abstract:
Knowledge tracing (KT), wherein students' problem-solving histories are used to estimate their current levels of knowledge, has attracted significant interest from researchers. However, most existing KT models were developed with an ID-based paradigm, which exhibits limitations in cold-start performance. These limitations can be mitigated by leveraging the vast quantities of external knowledge pos…
▽ More
Knowledge tracing (KT), wherein students' problem-solving histories are used to estimate their current levels of knowledge, has attracted significant interest from researchers. However, most existing KT models were developed with an ID-based paradigm, which exhibits limitations in cold-start performance. These limitations can be mitigated by leveraging the vast quantities of external knowledge possessed by generative large language models (LLMs). In this study, we propose cold-start mitigation in knowledge tracing by aligning a generative language model as a students' knowledge tracer (CLST) as a framework that utilizes a generative LLM as a knowledge tracer. Upon collecting data from math, social studies, and science subjects, we framed the KT task as a natural language processing task, wherein problem-solving data are expressed in natural language, and fine-tuned the generative LLM using the formatted KT dataset. Subsequently, we evaluated the performance of the CLST in situations of data scarcity using various baseline models for comparison. The results indicate that the CLST significantly enhanced performance with a dataset of fewer than 100 students in terms of prediction, reliability, and cross-domain generalization.
△ Less
Submitted 17 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Speed-up of Data Analysis with Kernel Trick in Encrypted Domain
Authors:
Joon Soo Yoo,
Baek Kyung Song,
Tae Min Ahn,
Ji Won Heo,
Ji Won Yoon
Abstract:
Homomorphic encryption (HE) is pivotal for secure computation on encrypted data, crucial in privacy-preserving data analysis. However, efficiently processing high-dimensional data in HE, especially for machine learning and statistical (ML/STAT) algorithms, poses a challenge. In this paper, we present an effective acceleration method using the kernel method for HE schemes, enhancing time performanc…
▽ More
Homomorphic encryption (HE) is pivotal for secure computation on encrypted data, crucial in privacy-preserving data analysis. However, efficiently processing high-dimensional data in HE, especially for machine learning and statistical (ML/STAT) algorithms, poses a challenge. In this paper, we present an effective acceleration method using the kernel method for HE schemes, enhancing time performance in ML/STAT algorithms within encrypted domains. This technique, independent of underlying HE mechanisms and complementing existing optimizations, notably reduces costly HE multiplications, offering near constant time complexity relative to data dimension. Aimed at accessibility, this method is tailored for data scientists and developers with limited cryptography background, facilitating advanced data analysis in secure environments.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Online Continual Learning of Video Diffusion Models From a Single Video Stream
Authors:
Jason Yoo,
Dylan Green,
Geoff Pleiss,
Frank Wood
Abstract:
Diffusion models have shown exceptional capabilities in generating realistic videos. Yet, their training has been predominantly confined to offline environments where models can repeatedly train on i.i.d. data to convergence. This work explores the feasibility of training diffusion models from a semantically continuous video stream, where correlated video frames sequentially arrive one at a time.…
▽ More
Diffusion models have shown exceptional capabilities in generating realistic videos. Yet, their training has been predominantly confined to offline environments where models can repeatedly train on i.i.d. data to convergence. This work explores the feasibility of training diffusion models from a semantically continuous video stream, where correlated video frames sequentially arrive one at a time. To investigate this, we introduce two novel continual video generative modeling benchmarks, Lifelong Bouncing Balls and Windows 95 Maze Screensaver, each containing over a million video frames generated from navigating stationary environments. Surprisingly, our experiments show that diffusion models can be effectively trained online using experience replay, achieving performance comparable to models trained with i.i.d. samples given the same number of gradient steps.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
Decompose, Enrich, and Extract! Schema-aware Event Extraction using LLMs
Authors:
Fatemeh Shiri,
Van Nguyen,
Farhad Moghimifar,
John Yoo,
Gholamreza Haffari,
Yuan-Fang Li
Abstract:
Large Language Models (LLMs) demonstrate significant capabilities in processing natural language data, promising efficient knowledge extraction from diverse textual sources to enhance situational awareness and support decision-making. However, concerns arise due to their susceptibility to hallucination, resulting in contextually inaccurate content. This work focuses on harnessing LLMs for automate…
▽ More
Large Language Models (LLMs) demonstrate significant capabilities in processing natural language data, promising efficient knowledge extraction from diverse textual sources to enhance situational awareness and support decision-making. However, concerns arise due to their susceptibility to hallucination, resulting in contextually inaccurate content. This work focuses on harnessing LLMs for automated Event Extraction, introducing a new method to address hallucination by decomposing the task into Event Detection and Event Argument Extraction. Moreover, the proposed method integrates dynamic schema-aware augmented retrieval examples into prompts tailored for each specific inquiry, thereby extending and adapting advanced prompting techniques such as Retrieval-Augmented Generation. Evaluation findings on prominent event extraction benchmarks and results from a synthesized benchmark illustrate the method's superior performance compared to baseline approaches.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation
Authors:
Sangyeop Yeo,
Yoojin Jang,
Jaejun Yoo
Abstract:
In this paper, we address the challenge of compressing generative adversarial networks (GANs) for deployment in resource-constrained environments by proposing two novel methodologies: Distribution Matching for Efficient compression (DiME) and Network Interactive Compression via Knowledge Exchange and Learning (NICKEL). DiME employs foundation models as embedding kernels for efficient distribution…
▽ More
In this paper, we address the challenge of compressing generative adversarial networks (GANs) for deployment in resource-constrained environments by proposing two novel methodologies: Distribution Matching for Efficient compression (DiME) and Network Interactive Compression via Knowledge Exchange and Learning (NICKEL). DiME employs foundation models as embedding kernels for efficient distribution matching, leveraging maximum mean discrepancy to facilitate effective knowledge distillation. Simultaneously, NICKEL employs an interactive compression method that enhances the communication between the student generator and discriminator, achieving a balanced and stable compression process. Our comprehensive evaluation on the StyleGAN2 architecture with the FFHQ dataset shows the effectiveness of our approach, with NICKEL & DiME achieving FID scores of 10.45 and 15.93 at compression rates of 95.73% and 98.92%, respectively. Remarkably, our methods sustain generative quality even at an extreme compression rate of 99.69%, surpassing the previous state-of-the-art performance by a large margin. These findings not only demonstrate our methodologies' capacity to significantly lower GANs' computational demands but also pave the way for deploying high-quality GAN models in settings with limited resources. Our code will be released soon.
△ Less
Submitted 19 May, 2024;
originally announced May 2024.
-
Learning to Compose: Improving Object Centric Learning by Injecting Compositionality
Authors:
Whie Jung,
Jaehoon Yoo,
Sungjin Ahn,
Seunghoon Hong
Abstract:
Learning compositional representation is a key aspect of object-centric learning as it enables flexible systematic generalization and supports complex visual reasoning. However, most of the existing approaches rely on auto-encoding objective, while the compositionality is implicitly imposed by the architectural or algorithmic bias in the encoder. This misalignment between auto-encoding objective a…
▽ More
Learning compositional representation is a key aspect of object-centric learning as it enables flexible systematic generalization and supports complex visual reasoning. However, most of the existing approaches rely on auto-encoding objective, while the compositionality is implicitly imposed by the architectural or algorithmic bias in the encoder. This misalignment between auto-encoding objective and learning compositionality often results in failure of capturing meaningful object representations. In this study, we propose a novel objective that explicitly encourages compositionality of the representations. Built upon the existing object-centric learning framework (e.g., slot attention), our method incorporates additional constraints that an arbitrary mixture of object representations from two images should be valid by maximizing the likelihood of the composite data. We demonstrate that incorporating our objective to the existing framework consistently improves the objective-centric learning and enhances the robustness to the architectural choices.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Quantized Context Based LIF Neurons for Recurrent Spiking Neural Networks in 45nm
Authors:
Sai Sukruth Bezugam,
Yihao Wu,
JaeBum Yoo,
Dmitri Strukov,
Bongjin Kim
Abstract:
In this study, we propose the first hardware implementation of a context-based recurrent spiking neural network (RSNN) emphasizing on integrating dual information streams within the neocortical pyramidal neurons specifically Context- Dependent Leaky Integrate and Fire (CLIF) neuron models, essential element in RSNN. We present a quantized version of the CLIF neuron (qCLIF), developed through a har…
▽ More
In this study, we propose the first hardware implementation of a context-based recurrent spiking neural network (RSNN) emphasizing on integrating dual information streams within the neocortical pyramidal neurons specifically Context- Dependent Leaky Integrate and Fire (CLIF) neuron models, essential element in RSNN. We present a quantized version of the CLIF neuron (qCLIF), developed through a hardware-software codesign approach utilizing the sparse activity of RSNN. Implemented in a 45nm technology node, the qCLIF is compact (900um^2) and achieves a high accuracy of 90% despite 8 bit quantization on DVS gesture classification dataset. Our analysis spans a network configuration from 10 to 200 qCLIF neurons, supporting up to 82k synapses within a 1.86 mm^2 footprint, demonstrating scalability and efficiency
△ Less
Submitted 28 April, 2024;
originally announced April 2024.
-
EB-GAME: A Game-Changer in ECG Heartbeat Anomaly Detection
Authors:
JuneYoung Park,
Da Young Kim,
Yunsoo Kim,
Jisu Yoo,
Tae Joon Kim
Abstract:
Cardiologists use electrocardiograms (ECG) for the detection of arrhythmias. However, continuous monitoring of ECG signals to detect cardiac abnormal-ities requires significant time and human resources. As a result, several deep learning studies have been conducted in advance for the automatic detection of arrhythmia. These models show relatively high performance in supervised learning, but are no…
▽ More
Cardiologists use electrocardiograms (ECG) for the detection of arrhythmias. However, continuous monitoring of ECG signals to detect cardiac abnormal-ities requires significant time and human resources. As a result, several deep learning studies have been conducted in advance for the automatic detection of arrhythmia. These models show relatively high performance in supervised learning, but are not applicable in cases with few training examples. This is because abnormal ECG data is scarce compared to normal data in most real-world clinical settings. Therefore, in this study, GAN-based anomaly detec-tion, i.e., unsupervised learning, was employed to address the issue of data imbalance. This paper focuses on detecting abnormal signals in electrocardi-ograms (ECGs) using only labels from normal signals as training data. In-spired by self-supervised vision transformers, which learn by dividing images into patches, and masked auto-encoders, known for their effectiveness in patch reconstruction and solving information redundancy, we introduce the ECG Heartbeat Anomaly Detection model, EB-GAME. EB-GAME was trained and validated on the MIT-BIH Arrhythmia Dataset, where it achieved state-of-the-art performance on this benchmark.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Coreset Selection for Object Detection
Authors:
Hojun Lee,
Suyoung Kim,
Junhoo Lee,
Jaeyoung Yoo,
Nojun Kwak
Abstract:
Coreset selection is a method for selecting a small, representative subset of an entire dataset. It has been primarily researched in image classification, assuming there is only one object per image. However, coreset selection for object detection is more challenging as an image can contain multiple objects. As a result, much research has yet to be done on this topic. Therefore, we introduce a new…
▽ More
Coreset selection is a method for selecting a small, representative subset of an entire dataset. It has been primarily researched in image classification, assuming there is only one object per image. However, coreset selection for object detection is more challenging as an image can contain multiple objects. As a result, much research has yet to be done on this topic. Therefore, we introduce a new approach, Coreset Selection for Object Detection (CSOD). CSOD generates imagewise and classwise representative feature vectors for multiple objects of the same class within each image. Subsequently, we adopt submodular optimization for considering both representativeness and diversity and utilize the representative vectors in the submodular optimization process to select a subset. When we evaluated CSOD on the Pascal VOC dataset, CSOD outperformed random selection by +6.4%p in AP$_{50}$ when selecting 200 images.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
TEGRA -- Scaling Up Terascale Graph Processing with Disaggregated Computing
Authors:
William Shaddix,
Mahyar Samani,
Marjan Fariborz,
S. J. Ben Yoo,
Jason Lowe-Power,
Venkatesh Akella
Abstract:
Graphs are essential for representing relationships in various domains, driving modern AI applications such as graph analytics and neural networks across science, engineering, cybersecurity, transportation, and economics. However, the size of modern graphs are rapidly expanding, posing challenges for traditional CPUs and GPUs in meeting real-time processing demands. As a result, hardware accelerat…
▽ More
Graphs are essential for representing relationships in various domains, driving modern AI applications such as graph analytics and neural networks across science, engineering, cybersecurity, transportation, and economics. However, the size of modern graphs are rapidly expanding, posing challenges for traditional CPUs and GPUs in meeting real-time processing demands. As a result, hardware accelerators for graph processing have been proposed. However, the largest graphs that can be handled by these systems is still modest often targeting Twitter graph(1.4B edges approximately). This paper aims to address this limitation by developing a graph accelerator capable of terascale graph processing. Scale out architectures, architectures where nodes are replicated to expand to larger datasets, are natural for handling larger graphs. We argue that this approach is not appropriate for very large-scale graphs because it leads to under utilization of both memory resources and compute resources. Additionally, vertex and edge processing have different access patterns. Communication overheads also pose further challenges in designing scalable architectures. To overcome these issues, this paper proposes TEGRA, a scale-up architecture for terascale graph processing. TEGRA leverages a composable computing system with disaggregated resources and a communication architecture inspired by Active Messages. By employing direct communication between cores and optimizing memory interconnect utilization, TEGRA effectively reduces communication overhead and improves resource utilization, therefore enabling efficient processing of terascale graphs.
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
End-To-End Self-tuning Self-supervised Time Series Anomaly Detection
Authors:
Boje Deforce,
Meng-Chieh Lee,
Bart Baesens,
Estefanía Serral Asensio,
Jaemin Yoo,
Leman Akoglu
Abstract:
Time series anomaly detection (TSAD) finds many applications such as monitoring environmental sensors, industry KPIs, patient biomarkers, etc. A two-fold challenge for TSAD is a versatile and unsupervised model that can detect various different types of time series anomalies (spikes, discontinuities, trend shifts, etc.) without any labeled data. Modern neural networks have outstanding ability in m…
▽ More
Time series anomaly detection (TSAD) finds many applications such as monitoring environmental sensors, industry KPIs, patient biomarkers, etc. A two-fold challenge for TSAD is a versatile and unsupervised model that can detect various different types of time series anomalies (spikes, discontinuities, trend shifts, etc.) without any labeled data. Modern neural networks have outstanding ability in modeling complex time series. Self-supervised models in particular tackle unsupervised TSAD by transforming the input via various augmentations to create pseudo anomalies for training. However, their performance is sensitive to the choice of augmentation, which is hard to choose in practice, while there exists no effort in the literature on data augmentation tuning for TSAD without labels. Our work aims to fill this gap. We introduce TSAP for TSA "on autoPilot", which can (self-)tune augmentation hyperparameters end-to-end. It stands on two key components: a differentiable augmentation architecture and an unsupervised validation loss to effectively assess the alignment between augmentation type and anomaly type. Case studies show TSAP's ability to effectively select the (discrete) augmentation type and associated (continuous) hyperparameters. In turn, it outperforms established baselines, including SOTA self-supervised models, on diverse TSAD tasks exhibiting different anomaly types.
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
RefQSR: Reference-based Quantization for Image Super-Resolution Networks
Authors:
Hongjae Lee,
Jun-Sang Yoo,
Seung-Won Jung
Abstract:
Single image super-resolution (SISR) aims to reconstruct a high-resolution image from its low-resolution observation. Recent deep learning-based SISR models show high performance at the expense of increased computational costs, limiting their use in resource-constrained environments. As a promising solution for computationally efficient network design, network quantization has been extensively stu…
▽ More
Single image super-resolution (SISR) aims to reconstruct a high-resolution image from its low-resolution observation. Recent deep learning-based SISR models show high performance at the expense of increased computational costs, limiting their use in resource-constrained environments. As a promising solution for computationally efficient network design, network quantization has been extensively studied. However, existing quantization methods developed for SISR have yet to effectively exploit image self-similarity, which is a new direction for exploration in this study. We introduce a novel method called reference-based quantization for image super-resolution (RefQSR) that applies high-bit quantization to several representative patches and uses them as references for low-bit quantization of the rest of the patches in an image. To this end, we design dedicated patch clustering and reference-based quantization modules and integrate them into existing SISR network quantization methods. The experimental results demonstrate the effectiveness of RefQSR on various SISR networks and quantization methods.
△ Less
Submitted 2 April, 2024;
originally announced April 2024.
-
PosterLlama: Bridging Design Ability of Langauge Model to Contents-Aware Layout Generation
Authors:
Jaejung Seol,
Seojun Kim,
Jaejun Yoo
Abstract:
Visual layout plays a critical role in graphic design fields such as advertising, posters, and web UI design. The recent trend towards content-aware layout generation through generative models has shown promise, yet it often overlooks the semantic intricacies of layout design by treating it as a simple numerical optimization. To bridge this gap, we introduce PosterLlama, a network designed for gen…
▽ More
Visual layout plays a critical role in graphic design fields such as advertising, posters, and web UI design. The recent trend towards content-aware layout generation through generative models has shown promise, yet it often overlooks the semantic intricacies of layout design by treating it as a simple numerical optimization. To bridge this gap, we introduce PosterLlama, a network designed for generating visually and textually coherent layouts by reformatting layout elements into HTML code and leveraging the rich design knowledge embedded within language models. Furthermore, we enhance the robustness of our model with a unique depth-based poster augmentation strategy. This ensures our generated layouts remain semantically rich but also visually appealing, even with limited data. Our extensive evaluations across several benchmarks demonstrate that PosterLlama outperforms existing methods in producing authentic and content-aware layouts. It supports an unparalleled range of conditions, including but not limited to unconditional layout generation, element conditional layout generation, layout completion, among others, serving as a highly versatile user manipulation tool.
△ Less
Submitted 2 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Towards Label-Efficient Human Matting: A Simple Baseline for Weakly Semi-Supervised Trimap-Free Human Matting
Authors:
Beomyoung Kim,
Myeong Yeon Yi,
Joonsang Yu,
Young Joon Yoo,
Sung Ju Hwang
Abstract:
This paper presents a new practical training method for human matting, which demands delicate pixel-level human region identification and significantly laborious annotations. To reduce the annotation cost, most existing matting approaches often rely on image synthesis to augment the dataset. However, the unnaturalness of synthesized training images brings in a new domain generalization challenge f…
▽ More
This paper presents a new practical training method for human matting, which demands delicate pixel-level human region identification and significantly laborious annotations. To reduce the annotation cost, most existing matting approaches often rely on image synthesis to augment the dataset. However, the unnaturalness of synthesized training images brings in a new domain generalization challenge for natural images. To address this challenge, we introduce a new learning paradigm, weakly semi-supervised human matting (WSSHM), which leverages a small amount of expensive matte labels and a large amount of budget-friendly segmentation labels, to save the annotation cost and resolve the domain generalization problem. To achieve the goal of WSSHM, we propose a simple and effective training method, named Matte Label Blending (MLB), that selectively guides only the beneficial knowledge of the segmentation and matte data to the matting model. Extensive experiments with our detailed analysis demonstrate our method can substantially improve the robustness of the matting model using a few matte data and numerous segmentation data. Our training method is also easily applicable to real-time models, achieving competitive accuracy with breakneck inference speed (328 FPS on NVIDIA V100 GPU). The implementation code is available at \url{https://github.com/clovaai/WSSHM}.
△ Less
Submitted 1 April, 2024;
originally announced April 2024.
-
HypeBoy: Generative Self-Supervised Representation Learning on Hypergraphs
Authors:
Sunwoo Kim,
Shinhwan Kang,
Fanchen Bu,
Soo Yong Lee,
Jaemin Yoo,
Kijung Shin
Abstract:
Hypergraphs are marked by complex topology, expressing higher-order interactions among multiple nodes with hyperedges, and better capturing the topology is essential for effective representation learning. Recent advances in generative self-supervised learning (SSL) suggest that hypergraph neural networks learned from generative self supervision have the potential to effectively encode the complex…
▽ More
Hypergraphs are marked by complex topology, expressing higher-order interactions among multiple nodes with hyperedges, and better capturing the topology is essential for effective representation learning. Recent advances in generative self-supervised learning (SSL) suggest that hypergraph neural networks learned from generative self supervision have the potential to effectively encode the complex hypergraph topology. Designing a generative SSL strategy for hypergraphs, however, is not straightforward. Questions remain with regard to its generative SSL task, connection to downstream tasks, and empirical properties of learned representations. In light of the promises and challenges, we propose a novel generative SSL strategy for hypergraphs. We first formulate a generative SSL task on hypergraphs, hyperedge filling, and highlight its theoretical connection to node classification. Based on the generative SSL task, we propose a hypergraph SSL method, HypeBoy. HypeBoy learns effective general-purpose hypergraph representations, outperforming 16 baseline methods across 11 benchmark datasets.
△ Less
Submitted 31 March, 2024;
originally announced April 2024.
-
Towards Reverse-Engineering the Brain: Brain-Derived Neuromorphic Computing Approach with Photonic, Electronic, and Ionic Dynamicity in 3D integrated circuits
Authors:
S. J. Ben Yoo,
Luis El-Srouji,
Suman Datta,
Shimeng Yu,
Jean Anne Incorvia,
Alberto Salleo,
Volker Sorger,
Juejun Hu,
Lionel C Kimerling,
Kristofer Bouchard,
Joy Geng,
Rishidev Chaudhuri,
Charan Ranganath,
Randall O'Reilly
Abstract:
The human brain has immense learning capabilities at extreme energy efficiencies and scale that no artificial system has been able to match. For decades, reverse engineering the brain has been one of the top priorities of science and technology research. Despite numerous efforts, conventional electronics-based methods have failed to match the scalability, energy efficiency, and self-supervised lea…
▽ More
The human brain has immense learning capabilities at extreme energy efficiencies and scale that no artificial system has been able to match. For decades, reverse engineering the brain has been one of the top priorities of science and technology research. Despite numerous efforts, conventional electronics-based methods have failed to match the scalability, energy efficiency, and self-supervised learning capabilities of the human brain. On the other hand, very recent progress in the development of new generations of photonic and electronic memristive materials, device technologies, and 3D electronic-photonic integrated circuits (3D EPIC ) promise to realize new brain-derived neuromorphic systems with comparable connectivity, density, energy-efficiency, and scalability. When combined with bio-realistic learning algorithms and architectures, it may be possible to realize an 'artificial brain' prototype with general self-learning capabilities. This paper argues the possibility of reverse-engineering the brain through architecting a prototype of a brain-derived neuromorphic computing system consisting of artificial electronic, ionic, photonic materials, devices, and circuits with dynamicity resembling the bio-plausible molecular, neuro/synaptic, neuro-circuit, and multi-structural hierarchical macro-circuits of the brain based on well-tested computational models. We further argue the importance of bio-plausible local learning algorithms applicable to the neuromorphic computing system that capture the flexible and adaptive unsupervised and self-supervised learning mechanisms central to human intelligence. Most importantly, we emphasize that the unique capabilities in brain-derived neuromorphic computing prototype systems will enable us to understand links between specific neuronal and network-level properties with system-level functioning and behavior.
△ Less
Submitted 28 March, 2024;
originally announced March 2024.
-
LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example
Authors:
Soyeon Yoon,
Kwan Yun,
Kwanggyoon Seo,
Sihun Cha,
Jung Eun Yoo,
Junyong Noh
Abstract:
Recent advances in 3D face stylization have made significant strides in few to zero-shot settings. However, the degree of stylization achieved by existing methods is often not sufficient for practical applications because they are mostly based on statistical 3D Morphable Models (3DMM) with limited variations. To this end, we propose a method that can produce a highly stylized 3D face model with de…
▽ More
Recent advances in 3D face stylization have made significant strides in few to zero-shot settings. However, the degree of stylization achieved by existing methods is often not sufficient for practical applications because they are mostly based on statistical 3D Morphable Models (3DMM) with limited variations. To this end, we propose a method that can produce a highly stylized 3D face model with desired topology. Our methods train a surface deformation network with 3DMM and translate its domain to the target style using a paired exemplar. The network achieves stylization of the 3D face mesh by mimicking the style of the target using a differentiable renderer and directional CLIP losses. Additionally, during the inference process, we utilize a Mesh Agnostic Encoder (MAGE) that takes deformation target, a mesh of diverse topologies as input to the stylization process and encodes its shape into our latent space. The resulting stylized face model can be animated by commonly used 3DMM blend shapes. A set of quantitative and qualitative evaluations demonstrate that our method can produce highly stylized face meshes according to a given style and output them in a desired topology. We also demonstrate example applications of our method including image-based stylized avatar generation, linear interpolation of geometric styles, and facial animation of stylized avatars.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
HourglassNeRF: Casting an Hourglass as a Bundle of Rays for Few-shot Neural Rendering
Authors:
Seunghyeon Seo,
Yeonjin Chang,
Jayeon Yoo,
Seungwoo Lee,
Hojun Lee,
Nojun Kwak
Abstract:
Recent advancements in the Neural Radiance Field (NeRF) have bolstered its capabilities for novel view synthesis, yet its reliance on dense multi-view training images poses a practical challenge. Addressing this, we propose HourglassNeRF, an effective regularization-based approach with a novel hourglass casting strategy. Our proposed hourglass is conceptualized as a bundle of additional rays withi…
▽ More
Recent advancements in the Neural Radiance Field (NeRF) have bolstered its capabilities for novel view synthesis, yet its reliance on dense multi-view training images poses a practical challenge. Addressing this, we propose HourglassNeRF, an effective regularization-based approach with a novel hourglass casting strategy. Our proposed hourglass is conceptualized as a bundle of additional rays within the area between the original input ray and its corresponding reflection ray, by featurizing the conical frustum via Integrated Positional Encoding (IPE). This design expands the coverage of unseen views and enables an adaptive high-frequency regularization based on target pixel photo-consistency. Furthermore, we propose luminance consistency regularization based on the Lambertian assumption, which is known to be effective for training a set of augmented rays under the few-shot setting. Leveraging the inherent property of a Lambertian surface, which retains consistent luminance irrespective of the viewing angle, we assume our proposed hourglass as a collection of flipped diffuse reflection rays and enhance the luminance consistency between the original input ray and its corresponding hourglass, resulting in more physically grounded training framework and performance improvement. Our HourglassNeRF outperforms its baseline and achieves competitive results on multiple benchmarks with sharply rendered fine details. The code will be available.
△ Less
Submitted 16 March, 2024;
originally announced March 2024.
-
Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases
Authors:
Rio Aguina-Kang,
Maxim Gumin,
Do Heon Han,
Stewart Morris,
Seung Jean Yoo,
Aditya Ganeshan,
R. Kenny Jones,
Qiuhong Anna Wei,
Kailiang Fu,
Daniel Ritchie
Abstract:
We present a system for generating indoor scenes in response to text prompts. The prompts are not limited to a fixed vocabulary of scene descriptions, and the objects in generated scenes are not restricted to a fixed set of object categories -- we call this setting indoor scene generation. Unlike most prior work on indoor scene generation, our system does not require a large training dataset of ex…
▽ More
We present a system for generating indoor scenes in response to text prompts. The prompts are not limited to a fixed vocabulary of scene descriptions, and the objects in generated scenes are not restricted to a fixed set of object categories -- we call this setting indoor scene generation. Unlike most prior work on indoor scene generation, our system does not require a large training dataset of existing 3D scenes. Instead, it leverages the world knowledge encoded in pre-trained large language models (LLMs) to synthesize programs in a domain-specific layout language that describe objects and spatial relations between them. Executing such a program produces a specification of a constraint satisfaction problem, which the system solves using a gradient-based optimization scheme to produce object positions and orientations. To produce object geometry, the system retrieves 3D meshes from a database. Unlike prior work which uses databases of category-annotated, mutually-aligned meshes, we develop a pipeline using vision-language models (VLMs) to retrieve meshes from massive databases of un-annotated, inconsistently-aligned meshes. Experimental evaluations show that our system outperforms generative models trained on 3D data for traditional, closed-universe scene generation tasks; it also outperforms a recent LLM-based layout generation method on open-universe scene generation.
△ Less
Submitted 4 February, 2024;
originally announced March 2024.
-
STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models
Authors:
Pum Jun Kim,
Seojun Kim,
Jaejun Yoo
Abstract:
Image generative models have made significant progress in generating realistic and diverse images, supported by comprehensive guidance from various evaluation metrics. However, current video generative models struggle to generate even short video clips, with limited tools that provide insights for improvements. Current video evaluation metrics are simple adaptations of image metrics by switching t…
▽ More
Image generative models have made significant progress in generating realistic and diverse images, supported by comprehensive guidance from various evaluation metrics. However, current video generative models struggle to generate even short video clips, with limited tools that provide insights for improvements. Current video evaluation metrics are simple adaptations of image metrics by switching the embeddings with video embedding networks, which may underestimate the unique characteristics of video. Our analysis reveals that the widely used Frechet Video Distance (FVD) has a stronger emphasis on the spatial aspect than the temporal naturalness of video and is inherently constrained by the input size of the embedding networks used, limiting it to 16 frames. Additionally, it demonstrates considerable instability and diverges from human evaluations. To address the limitations, we propose STREAM, a new video evaluation metric uniquely designed to independently evaluate spatial and temporal aspects. This feature allows comprehensive analysis and evaluation of video generative models from various perspectives, unconstrained by video length. We provide analytical and experimental evidence demonstrating that STREAM provides an effective evaluation tool for both visual and temporal quality of videos, offering insights into area of improvement for video generative models. To the best of our knowledge, STREAM is the first evaluation metric that can separately assess the temporal and spatial aspects of videos. Our code is available at https://github.com/pro2nit/STREAM.
△ Less
Submitted 28 March, 2024; v1 submitted 30 January, 2024;
originally announced March 2024.
-
PillarGen: Enhancing Radar Point Cloud Density and Quality via Pillar-based Point Generation Network
Authors:
Jisong Kim,
Geonho Bang,
Kwangjin Choi,
Minjae Seong,
Jaechang Yoo,
Eunjong Pyo,
Jun Won Choi
Abstract:
In this paper, we present a novel point generation model, referred to as Pillar-based Point Generation Network (PillarGen), which facilitates the transformation of point clouds from one domain into another. PillarGen can produce synthetic point clouds with enhanced density and quality based on the provided input point clouds. The PillarGen model performs the following three steps: 1) pillar encodi…
▽ More
In this paper, we present a novel point generation model, referred to as Pillar-based Point Generation Network (PillarGen), which facilitates the transformation of point clouds from one domain into another. PillarGen can produce synthetic point clouds with enhanced density and quality based on the provided input point clouds. The PillarGen model performs the following three steps: 1) pillar encoding, 2) Occupied Pillar Prediction (OPP), and 3) Pillar to Point Generation (PPG). The input point clouds are encoded using a pillar grid structure to generate pillar features. Then, OPP determines the active pillars used for point generation and predicts the center of points and the number of points to be generated for each active pillar. PPG generates the synthetic points for each active pillar based on the information provided by OPP. We evaluate the performance of PillarGen using our proprietary radar dataset, focusing on enhancing the density and quality of short-range radar data using the long-range radar data as supervision. Our experiments demonstrate that PillarGen outperforms traditional point upsampling methods in quantitative and qualitative measures. We also confirm that when PillarGen is incorporated into bird's eye view object detection, a significant improvement in detection accuracy is achieved.
△ Less
Submitted 8 March, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
Mitigating the Bias in the Model for Continual Test-Time Adaptation
Authors:
Inseop Chung,
Kyomin Hwang,
Jayeon Yoo,
Nojun Kwak
Abstract:
Continual Test-Time Adaptation (CTA) is a challenging task that aims to adapt a source pre-trained model to continually changing target domains. In the CTA setting, a model does not know when the target domain changes, thus facing a drastic change in the distribution of streaming inputs during the test-time. The key challenge is to keep adapting the model to the continually changing target domains…
▽ More
Continual Test-Time Adaptation (CTA) is a challenging task that aims to adapt a source pre-trained model to continually changing target domains. In the CTA setting, a model does not know when the target domain changes, thus facing a drastic change in the distribution of streaming inputs during the test-time. The key challenge is to keep adapting the model to the continually changing target domains in an online manner. We find that a model shows highly biased predictions as it constantly adapts to the chaining distribution of the target data. It predicts certain classes more often than other classes, making inaccurate over-confident predictions. This paper mitigates this issue to improve performance in the CTA scenario. To alleviate the bias issue, we make class-wise exponential moving average target prototypes with reliable target samples and exploit them to cluster the target features class-wisely. Moreover, we aim to align the target distributions to the source distribution by anchoring the target feature to its corresponding source prototype. With extensive experiments, our proposed method achieves noteworthy performance gain when applied on top of existing CTA methods without substantial adaptation time overhead.
△ Less
Submitted 2 March, 2024;
originally announced March 2024.
-
Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation
Authors:
Kihong Kim,
Haneol Lee,
Jihye Park,
Seyeon Kim,
Kwanghee Lee,
Seungryong Kim,
Jaejun Yoo
Abstract:
Generating high-quality videos that synthesize desired realistic content is a challenging task due to their intricate high-dimensionality and complexity of videos. Several recent diffusion-based methods have shown comparable performance by compressing videos to a lower-dimensional latent space, using traditional video autoencoder architecture. However, such method that employ standard frame-wise 2…
▽ More
Generating high-quality videos that synthesize desired realistic content is a challenging task due to their intricate high-dimensionality and complexity of videos. Several recent diffusion-based methods have shown comparable performance by compressing videos to a lower-dimensional latent space, using traditional video autoencoder architecture. However, such method that employ standard frame-wise 2D and 3D convolution fail to fully exploit the spatio-temporal nature of videos. To address this issue, we propose a novel hybrid video diffusion model, called HVDM, which can capture spatio-temporal dependencies more effectively. The HVDM is trained by a hybrid video autoencoder which extracts a disentangled representation of the video including: (i) a global context information captured by a 2D projected latent (ii) a local volume information captured by 3D convolutions with wavelet decomposition (iii) a frequency information for improving the video reconstruction. Based on this disentangled representation, our hybrid autoencoder provide a more comprehensive video latent enriching the generated videos with fine structures and details. Experiments on video generation benchamarks (UCF101, SkyTimelapse, and TaiChi) demonstrate that the proposed approach achieves state-of-the-art video generation quality, showing a wide range of video applications (e.g., long video generation, image-to-video, and video dynamics control).
△ Less
Submitted 3 April, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
A Decoding Scheme with Successive Aggregation of Multi-Level Features for Light-Weight Semantic Segmentation
Authors:
Jiwon Yoo,
Jangwon Lee,
Gyeonghwan Kim
Abstract:
Multi-scale architecture, including hierarchical vision transformer, has been commonly applied to high-resolution semantic segmentation to deal with computational complexity with minimum performance loss. In this paper, we propose a novel decoding scheme for semantic segmentation in this regard, which takes multi-level features from the encoder with multi-scale architecture. The decoding scheme ba…
▽ More
Multi-scale architecture, including hierarchical vision transformer, has been commonly applied to high-resolution semantic segmentation to deal with computational complexity with minimum performance loss. In this paper, we propose a novel decoding scheme for semantic segmentation in this regard, which takes multi-level features from the encoder with multi-scale architecture. The decoding scheme based on a multi-level vision transformer aims to achieve not only reduced computational expense but also higher segmentation accuracy, by introducing successive cross-attention in aggregation of the multi-level features. Furthermore, a way to enhance the multi-level features by the aggregated semantics is proposed. The effort is focused on maintaining the contextual consistency from the perspective of attention allocation and brings improved performance with significantly lower computational cost. Set of experiments on popular datasets demonstrates superiority of the proposed scheme to the state-of-the-art semantic segmentation models in terms of computational cost without loss of accuracy, and extensive ablation studies prove the effectiveness of ideas proposed.
△ Less
Submitted 14 June, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning
Authors:
Jason Yoo,
Yunpeng Liu,
Frank Wood,
Geoff Pleiss
Abstract:
In online continual learning, a neural network incrementally learns from a non-i.i.d. data stream. Nearly all online continual learning methods employ experience replay to simultaneously prevent catastrophic forgetting and underfitting on past data. Our work demonstrates a limitation of this approach: neural networks trained with experience replay tend to have unstable optimization trajectories, i…
▽ More
In online continual learning, a neural network incrementally learns from a non-i.i.d. data stream. Nearly all online continual learning methods employ experience replay to simultaneously prevent catastrophic forgetting and underfitting on past data. Our work demonstrates a limitation of this approach: neural networks trained with experience replay tend to have unstable optimization trajectories, impeding their overall accuracy. Surprisingly, these instabilities persist even when the replay buffer stores all previous training examples, suggesting that this issue is orthogonal to catastrophic forgetting. We minimize these instabilities through a simple modification of the optimization geometry. Our solution, Layerwise Proximal Replay (LPR), balances learning from new and replay data while only allowing for gradual changes in the hidden activation of past data. We demonstrate that LPR consistently improves replay-based online continual learning methods across multiple problem settings, regardless of the amount of available replay memory.
△ Less
Submitted 7 June, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
H2O-SDF: Two-phase Learning for 3D Indoor Reconstruction using Object Surface Fields
Authors:
Minyoung Park,
Mirae Do,
YeonJae Shin,
Jaeseok Yoo,
Jongkwang Hong,
Joongrock Kim,
Chul Lee
Abstract:
Advanced techniques using Neural Radiance Fields (NeRF), Signed Distance Fields (SDF), and Occupancy Fields have recently emerged as solutions for 3D indoor scene reconstruction. We introduce a novel two-phase learning approach, H2O-SDF, that discriminates between object and non-object regions within indoor environments. This method achieves a nuanced balance, carefully preserving the geometric in…
▽ More
Advanced techniques using Neural Radiance Fields (NeRF), Signed Distance Fields (SDF), and Occupancy Fields have recently emerged as solutions for 3D indoor scene reconstruction. We introduce a novel two-phase learning approach, H2O-SDF, that discriminates between object and non-object regions within indoor environments. This method achieves a nuanced balance, carefully preserving the geometric integrity of room layouts while also capturing intricate surface details of specific objects. A cornerstone of our two-phase learning framework is the introduction of the Object Surface Field (OSF), a novel concept designed to mitigate the persistent vanishing gradient problem that has previously hindered the capture of high-frequency details in other methods. Our proposed approach is validated through several experiments that include ablation studies.
△ Less
Submitted 8 March, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Feature Distribution on Graph Topology Mediates the Effect of Graph Convolution: Homophily Perspective
Authors:
Soo Yong Lee,
Sunwoo Kim,
Fanchen Bu,
Jaemin Yoo,
Jiliang Tang,
Kijung Shin
Abstract:
How would randomly shuffling feature vectors among nodes from the same class affect graph neural networks (GNNs)? The feature shuffle, intuitively, perturbs the dependence between graph topology and features (A-X dependence) for GNNs to learn from. Surprisingly, we observe a consistent and significant improvement in GNN performance following the feature shuffle. Having overlooked the impact of A-X…
▽ More
How would randomly shuffling feature vectors among nodes from the same class affect graph neural networks (GNNs)? The feature shuffle, intuitively, perturbs the dependence between graph topology and features (A-X dependence) for GNNs to learn from. Surprisingly, we observe a consistent and significant improvement in GNN performance following the feature shuffle. Having overlooked the impact of A-X dependence on GNNs, the prior literature does not provide a satisfactory understanding of the phenomenon. Thus, we raise two research questions. First, how should A-X dependence be measured, while controlling for potential confounds? Second, how does A-X dependence affect GNNs? In response, we (i) propose a principled measure for A-X dependence, (ii) design a random graph model that controls A-X dependence, (iii) establish a theory on how A-X dependence relates to graph convolution, and (iv) present empirical analysis on real-world graphs that align with the theory. We conclude that A-X dependence mediates the effect of graph convolution, such that smaller dependence improves GNN-based node classification.
△ Less
Submitted 6 June, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Bridging the Domain Gap: A Simple Domain Matching Method for Reference-based Image Super-Resolution in Remote Sensing
Authors:
Jeongho Min,
Yejun Lee,
Dongyoung Kim,
Jaejun Yoo
Abstract:
Recently, reference-based image super-resolution (RefSR) has shown excellent performance in image super-resolution (SR) tasks. The main idea of RefSR is to utilize additional information from the reference (Ref) image to recover the high-frequency components in low-resolution (LR) images. By transferring relevant textures through feature matching, RefSR models outperform existing single image supe…
▽ More
Recently, reference-based image super-resolution (RefSR) has shown excellent performance in image super-resolution (SR) tasks. The main idea of RefSR is to utilize additional information from the reference (Ref) image to recover the high-frequency components in low-resolution (LR) images. By transferring relevant textures through feature matching, RefSR models outperform existing single image super-resolution (SISR) models. However, their performance significantly declines when a domain gap between Ref and LR images exists, which often occurs in real-world scenarios, such as satellite imaging. In this letter, we introduce a Domain Matching (DM) module that can be seamlessly integrated with existing RefSR models to enhance their performance in a plug-and-play manner. To the best of our knowledge, we are the first to explore Domain Matching-based RefSR in remote sensing image processing. Our analysis reveals that their domain gaps often occur in different satellites, and our model effectively addresses these challenges, whereas existing models struggle. Our experiments demonstrate that the proposed DM module improves SR performance both qualitatively and quantitatively for remote sensing super-resolution tasks.
△ Less
Submitted 29 January, 2024;
originally announced January 2024.
-
What, How, and When Should Object Detectors Update in Continually Changing Test Domains?
Authors:
Jayeon Yoo,
Dongkwan Lee,
Inseop Chung,
Donghyun Kim,
Nojun Kwak
Abstract:
It is a well-known fact that the performance of deep learning models deteriorates when they encounter a distribution shift at test time. Test-time adaptation (TTA) algorithms have been proposed to adapt the model online while inferring test data. However, existing research predominantly focuses on classification tasks through the optimization of batch normalization layers or classification heads,…
▽ More
It is a well-known fact that the performance of deep learning models deteriorates when they encounter a distribution shift at test time. Test-time adaptation (TTA) algorithms have been proposed to adapt the model online while inferring test data. However, existing research predominantly focuses on classification tasks through the optimization of batch normalization layers or classification heads, but this approach limits its applicability to various model architectures like Transformers and makes it challenging to apply to other tasks, such as object detection. In this paper, we propose a novel online adaption approach for object detection in continually changing test domains, considering which part of the model to update, how to update it, and when to perform the update. By introducing architecture-agnostic and lightweight adaptor modules and only updating these while leaving the pre-trained backbone unchanged, we can rapidly adapt to new test domains in an efficient way and prevent catastrophic forgetting. Furthermore, we present a practical and straightforward class-wise feature aligning method for object detection to resolve domain shifts. Additionally, we enhance efficiency by determining when the model is sufficiently adapted or when additional adaptation is needed due to changes in the test distribution. Our approach surpasses baselines on widely used benchmarks, achieving improvements of up to 4.9\%p and 7.9\%p in mAP for COCO $\rightarrow$ COCO-corrupted and SHIFT, respectively, while maintaining about 20 FPS or higher.
△ Less
Submitted 12 December, 2023;
originally announced December 2023.
-
ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection
Authors:
Joonhyun Jeong,
Geondo Park,
Jayeon Yoo,
Hyungsik Jung,
Heesu Kim
Abstract:
Open-vocabulary object detection (OVOD) aims to recognize novel objects whose categories are not included in the training set. In order to classify these unseen classes during training, many OVOD frameworks leverage the zero-shot capability of largely pretrained vision and language models, such as CLIP. To further improve generalization on the unseen novel classes, several approaches proposed to a…
▽ More
Open-vocabulary object detection (OVOD) aims to recognize novel objects whose categories are not included in the training set. In order to classify these unseen classes during training, many OVOD frameworks leverage the zero-shot capability of largely pretrained vision and language models, such as CLIP. To further improve generalization on the unseen novel classes, several approaches proposed to additionally train with pseudo region labeling on the external data sources that contain a substantial number of novel category labels beyond the existing training data. Albeit its simplicity, these pseudo-labeling methods still exhibit limited improvement with regard to the truly unseen novel classes that were not pseudo-labeled. In this paper, we present a novel, yet simple technique that helps generalization on the overall distribution of novel classes. Inspired by our observation that numerous novel classes reside within the convex hull constructed by the base (seen) classes in the CLIP embedding space, we propose to synthesize proxy-novel classes approximating novel classes via linear mixup between a pair of base classes. By training our detector with these synthetic proxy-novel classes, we effectively explore the embedding space of novel classes. The experimental results on various OVOD benchmarks such as LVIS and COCO demonstrate superior performance on novel classes compared to the other state-of-the-art methods. Code is available at https://github.com/clovaai/ProxyDet.
△ Less
Submitted 20 February, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
RTPS Attack Dataset Description
Authors:
Dong Young Kim,
Dongsung Kim,
Yuchan Song,
Gang Min Kim,
Min Geun Song,
Jeong Do Yoo,
Huy Kang Kim
Abstract:
This paper explains all about our RTPS datasets. We collect malicious/benign packet data by injecting attack data in an Unmanned Ground Vehicle (UGV) in the normal state. We assembled the testbed, consisting of UGV, Controller, PC, and Router. We collect this dataset in the UGV part of our testbed.
We conducted two types of attack "Command Injection" and "Command Injection with ARP Spoofing" on…
▽ More
This paper explains all about our RTPS datasets. We collect malicious/benign packet data by injecting attack data in an Unmanned Ground Vehicle (UGV) in the normal state. We assembled the testbed, consisting of UGV, Controller, PC, and Router. We collect this dataset in the UGV part of our testbed.
We conducted two types of attack "Command Injection" and "Command Injection with ARP Spoofing" on our testbed. The data collection time is 180, 300, 600, and 1200. The scenario has 30 each on collection time, 240 total. We expect this dataset to contribute to the development of defense technologies like anomaly detection to address security threat issues in ROS2 networks and Fast-DDS implements.
△ Less
Submitted 2 April, 2024; v1 submitted 24 November, 2023;
originally announced November 2023.
-
AI-based Attack Graph Generation
Authors:
Sangbeom Park,
Jaesung Lee,
Jeong Do Yoo,
Min Geun Song,
Hyosun Lee,
Jaewoong Choi,
Chaeyeon Sagong,
Huy Kang Kim
Abstract:
With the advancement of IoT technology, many electronic devices are interconnected through networks, communicating with each other and performing specific roles. However, as numerous devices join networks, the threat of cyberattacks also escalates. Preventing and detecting cyber threats are crucial, and one method of preventing such threats involves using attack graphs. Attack graphs are widely us…
▽ More
With the advancement of IoT technology, many electronic devices are interconnected through networks, communicating with each other and performing specific roles. However, as numerous devices join networks, the threat of cyberattacks also escalates. Preventing and detecting cyber threats are crucial, and one method of preventing such threats involves using attack graphs. Attack graphs are widely used to assess security threats within networks. However, a drawback emerges as the network scales, as generating attack graphs becomes time-consuming. To overcome this limitation, artificial intelligence models can be employed. By utilizing AI models, attack graphs can be created within a short period, approximating optimal outcomes. AI models designed for attack graph generation consist of encoders and decoders, trained using reinforcement learning algorithms. After training the AI models, we confirmed the model's learning effectiveness by observing changes in loss and reward values. Additionally, we compared attack graphs generated by the AI model with those created through conventional methods.
△ Less
Submitted 27 November, 2023; v1 submitted 24 November, 2023;
originally announced November 2023.
-
C-ITS Environment Modeling and Attack Modeling
Authors:
Jaewoong Choi,
Min Geun Song,
Hyosun Lee,
Chaeyeon Sagong,
Sangbeom Park,
Jaesung Lee,
Jeong Do Yoo,
Huy Kang Kim
Abstract:
As technology advances, cities are evolving into smart cities, with the ability to process large amounts of data and the increasing complexity and diversification of various elements within urban areas. Among the core systems of a smart city is the Cooperative-Intelligent Transport Systems (C-ITS). C-ITS is a system where vehicles provide real-time information to drivers about surrounding traffic…
▽ More
As technology advances, cities are evolving into smart cities, with the ability to process large amounts of data and the increasing complexity and diversification of various elements within urban areas. Among the core systems of a smart city is the Cooperative-Intelligent Transport Systems (C-ITS). C-ITS is a system where vehicles provide real-time information to drivers about surrounding traffic conditions, sudden stops, falling objects, and other accident risks through roadside base stations. It consists of road infrastructure, C-ITS centers, and vehicle terminals. However, as smart cities integrate many elements through networks and electronic control, they are susceptible to cybersecurity issues. In the case of cybersecurity problems in C-ITS, there is a significant risk of safety issues arising. This technical document aims to model the C-ITS environment and the services it provides, with the purpose of identifying the attack surface where security incidents could occur in a smart city environment. Subsequently, based on the identified attack surface, the document aims to construct attack scenarios and their respective stages. The document provides a description of the concept of C-ITS, followed by the description of the C-ITS environment model, service model, and attack scenario model defined by us.
△ Less
Submitted 27 November, 2023; v1 submitted 24 November, 2023;
originally announced November 2023.
-
AcTExplore: Active Tactile Exploration of Unknown Objects
Authors:
Amir-Hossein Shahidzadeh,
Seong Jong Yoo,
Pavan Mantripragada,
Chahat Deep Singh,
Cornelia Fermüller,
Yiannis Aloimonos
Abstract:
Tactile exploration plays a crucial role in understanding object structures for fundamental robotics tasks such as grasping and manipulation. However, efficiently exploring such objects using tactile sensors is challenging, primarily due to the large-scale unknown environments and limited sensing coverage of these sensors. To this end, we present AcTExplore, an active tactile exploration method dr…
▽ More
Tactile exploration plays a crucial role in understanding object structures for fundamental robotics tasks such as grasping and manipulation. However, efficiently exploring such objects using tactile sensors is challenging, primarily due to the large-scale unknown environments and limited sensing coverage of these sensors. To this end, we present AcTExplore, an active tactile exploration method driven by reinforcement learning for object reconstruction at scales that automatically explores the object surfaces in a limited number of steps. Through sufficient exploration, our algorithm incrementally collects tactile data and reconstructs 3D shapes of the objects as well, which can serve as a representation for higher-level downstream tasks. Our method achieves an average of 95.97% IoU coverage on unseen YCB objects while just being trained on primitive shapes. Project Webpage: https://prg.cs.umd.edu/AcTExplore
△ Less
Submitted 20 June, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
SHOT: Suppressing the Hessian along the Optimization Trajectory for Gradient-Based Meta-Learning
Authors:
JunHoo Lee,
Jayeon Yoo,
Nojun Kwak
Abstract:
In this paper, we hypothesize that gradient-based meta-learning (GBML) implicitly suppresses the Hessian along the optimization trajectory in the inner loop. Based on this hypothesis, we introduce an algorithm called SHOT (Suppressing the Hessian along the Optimization Trajectory) that minimizes the distance between the parameters of the target and reference models to suppress the Hessian in the i…
▽ More
In this paper, we hypothesize that gradient-based meta-learning (GBML) implicitly suppresses the Hessian along the optimization trajectory in the inner loop. Based on this hypothesis, we introduce an algorithm called SHOT (Suppressing the Hessian along the Optimization Trajectory) that minimizes the distance between the parameters of the target and reference models to suppress the Hessian in the inner loop. Despite dealing with high-order terms, SHOT does not increase the computational complexity of the baseline model much. It is agnostic to both the algorithm and architecture used in GBML, making it highly versatile and applicable to any GBML baseline. To validate the effectiveness of SHOT, we conduct empirical tests on standard few-shot learning tasks and qualitatively analyze its dynamics. We confirm our hypothesis empirically and demonstrate that SHOT outperforms the corresponding baseline. Code is available at: https://github.com/JunHoo-Lee/SHOT
△ Less
Submitted 4 October, 2023;
originally announced October 2023.
-
RADIO: Reference-Agnostic Dubbing Video Synthesis
Authors:
Dongyeun Lee,
Chaewon Kim,
Sangjoon Yu,
Jaejun Yoo,
Gyeong-Moon Park
Abstract:
One of the most challenging problems in audio-driven talking head generation is achieving high-fidelity detail while ensuring precise synchronization. Given only a single reference image, extracting meaningful identity attributes becomes even more challenging, often causing the network to mirror the facial and lip structures too closely. To address these issues, we introduce RADIO, a framework eng…
▽ More
One of the most challenging problems in audio-driven talking head generation is achieving high-fidelity detail while ensuring precise synchronization. Given only a single reference image, extracting meaningful identity attributes becomes even more challenging, often causing the network to mirror the facial and lip structures too closely. To address these issues, we introduce RADIO, a framework engineered to yield high-quality dubbed videos regardless of the pose or expression in reference images. The key is to modulate the decoder layers using latent space composed of audio and reference features. Additionally, we incorporate ViT blocks into the decoder to emphasize high-fidelity details, especially in the lip region. Our experimental results demonstrate that RADIO displays high synchronization without the loss of fidelity. Especially in harsh scenarios where the reference frame deviates significantly from the ground truth, our method outperforms state-of-the-art methods, highlighting its robustness.
△ Less
Submitted 6 November, 2023; v1 submitted 5 September, 2023;
originally announced September 2023.
-
Bespoke Nanoparticle Synthesis and Chemical Knowledge Discovery Via Autonomous Experimentations
Authors:
Hyuk Jun Yoo,
Nayeon Kim,
Heeseung Lee,
Daeho Kim,
Leslie Tiong Ching Ow,
Hyobin Nam,
Chansoo Kim,
Seung Yong Lee,
Kwan-Young Lee,
Donghun Kim,
Sang Soo Han
Abstract:
The optimization of nanomaterial synthesis using numerous synthetic variables is considered to be extremely laborious task because the conventional combinatorial explorations are prohibitively expensive. In this work, we report an autonomous experimentation platform developed for the bespoke design of nanoparticles (NPs) with targeted optical properties. This platform operates in a closed-loop man…
▽ More
The optimization of nanomaterial synthesis using numerous synthetic variables is considered to be extremely laborious task because the conventional combinatorial explorations are prohibitively expensive. In this work, we report an autonomous experimentation platform developed for the bespoke design of nanoparticles (NPs) with targeted optical properties. This platform operates in a closed-loop manner between a batch synthesis module of NPs and a UV- Vis spectroscopy module, based on the feedback of the AI optimization modeling. With silver (Ag) NPs as a representative example, we demonstrate that the Bayesian optimizer implemented with the early stopping criterion can efficiently produce Ag NPs precisely possessing the desired absorption spectra within only 200 iterations (when optimizing among five synthetic reagents). In addition to the outstanding material developmental efficiency, the analysis of synthetic variables further reveals a novel chemistry involving the effects of citrate in Ag NP synthesis. The amount of citrate is a key to controlling the competitions between spherical and plate-shaped NPs and, as a result, affects the shapes of the absorption spectra as well. Our study highlights both capabilities of the platform to enhance search efficiencies and to provide a novel chemical knowledge by analyzing datasets accumulated from the autonomous experimentations.
△ Less
Submitted 1 September, 2023;
originally announced September 2023.
-
Self-Supervision for Tackling Unsupervised Anomaly Detection: Pitfalls and Opportunities
Authors:
Leman Akoglu,
Jaemin Yoo
Abstract:
Self-supervised learning (SSL) is a growing torrent that has recently transformed machine learning and its many real world applications, by learning on massive amounts of unlabeled data via self-generated supervisory signals. Unsupervised anomaly detection (AD) has also capitalized on SSL, by self-generating pseudo-anomalies through various data augmentation functions or external data exposure. In…
▽ More
Self-supervised learning (SSL) is a growing torrent that has recently transformed machine learning and its many real world applications, by learning on massive amounts of unlabeled data via self-generated supervisory signals. Unsupervised anomaly detection (AD) has also capitalized on SSL, by self-generating pseudo-anomalies through various data augmentation functions or external data exposure. In this vision paper, we first underline the importance of the choice of SSL strategies on AD performance, by presenting evidences and studies from the AD literature. Equipped with the understanding that SSL incurs various hyperparameters (HPs) to carefully tune, we present recent developments on unsupervised model selection and augmentation tuning for SSL-based AD. We then highlight emerging challenges and future opportunities; on designing new pretext tasks and augmentation functions for different data modalities, creating novel model selection solutions for systematically tuning the SSL HPs, as well as on capitalizing on the potential of pretrained foundation models on AD through effective density estimation.
△ Less
Submitted 28 August, 2023;
originally announced August 2023.
-
SPANet: Frequency-balancing Token Mixer using Spectral Pooling Aggregation Modulation
Authors:
Guhnoo Yun,
Juhan Yoo,
Kijung Kim,
Jeongho Lee,
Dong Hwan Kim
Abstract:
Recent studies show that self-attentions behave like low-pass filters (as opposed to convolutions) and enhancing their high-pass filtering capability improves model performance. Contrary to this idea, we investigate existing convolution-based models with spectral analysis and observe that improving the low-pass filtering in convolution operations also leads to performance improvement. To account f…
▽ More
Recent studies show that self-attentions behave like low-pass filters (as opposed to convolutions) and enhancing their high-pass filtering capability improves model performance. Contrary to this idea, we investigate existing convolution-based models with spectral analysis and observe that improving the low-pass filtering in convolution operations also leads to performance improvement. To account for this observation, we hypothesize that utilizing optimal token mixers that capture balanced representations of both high- and low-frequency components can enhance the performance of models. We verify this by decomposing visual features into the frequency domain and combining them in a balanced manner. To handle this, we replace the balancing problem with a mask filtering problem in the frequency domain. Then, we introduce a novel token-mixer named SPAM and leverage it to derive a MetaFormer model termed as SPANet. Experimental results show that the proposed method provides a way to achieve this balance, and the balanced representations of both high- and low-frequency components can improve the performance of models on multiple computer vision tasks. Our code is available at $\href{https://doranlyong.github.io/projects/spanet/}{\text{https://doranlyong.github.io/projects/spanet/}}$.
△ Less
Submitted 22 August, 2023;
originally announced August 2023.
-
On-Sensor Data Filtering using Neuromorphic Computing for High Energy Physics Experiments
Authors:
Shruti R. Kulkarni,
Aaron Young,
Prasanna Date,
Narasinga Rao Miniskar,
Jeffrey S. Vetter,
Farah Fahim,
Benjamin Parpillon,
Jennet Dickinson,
Nhan Tran,
Jieun Yoo,
Corrinne Mills,
Morris Swartz,
Petar Maksimovic,
Catherine D. Schuman,
Alice Bean
Abstract:
This work describes the investigation of neuromorphic computing-based spiking neural network (SNN) models used to filter data from sensor electronics in high energy physics experiments conducted at the High Luminosity Large Hadron Collider. We present our approach for developing a compact neuromorphic model that filters out the sensor data based on the particle's transverse momentum with the goal…
▽ More
This work describes the investigation of neuromorphic computing-based spiking neural network (SNN) models used to filter data from sensor electronics in high energy physics experiments conducted at the High Luminosity Large Hadron Collider. We present our approach for developing a compact neuromorphic model that filters out the sensor data based on the particle's transverse momentum with the goal of reducing the amount of data being sent to the downstream electronics. The incoming charge waveforms are converted to streams of binary-valued events, which are then processed by the SNN. We present our insights on the various system design choices - from data encoding to optimal hyperparameters of the training algorithm - for an accurate and compact SNN optimized for hardware deployment. Our results show that an SNN trained with an evolutionary algorithm and an optimized set of hyperparameters obtains a signal efficiency of about 91% with nearly half as many parameters as a deep neural network.
△ Less
Submitted 20 July, 2023;
originally announced July 2023.
-
Hierarchical Spatiotemporal Transformers for Video Object Segmentation
Authors:
Jun-Sang Yoo,
Hongjae Lee,
Seung-Won Jung
Abstract:
This paper presents a novel framework called HST for semi-supervised video object segmentation (VOS). HST extracts image and video features using the latest Swin Transformer and Video Swin Transformer to inherit their inductive bias for the spatiotemporal locality, which is essential for temporally coherent VOS. To take full advantage of the image and video features, HST casts image and video feat…
▽ More
This paper presents a novel framework called HST for semi-supervised video object segmentation (VOS). HST extracts image and video features using the latest Swin Transformer and Video Swin Transformer to inherit their inductive bias for the spatiotemporal locality, which is essential for temporally coherent VOS. To take full advantage of the image and video features, HST casts image and video features as a query and memory, respectively. By applying efficient memory read operations at multiple scales, HST produces hierarchical features for the precise reconstruction of object masks. HST shows effectiveness and robustness in handling challenging scenarios with occluded and fast-moving objects under cluttered backgrounds. In particular, HST-B outperforms the state-of-the-art competitors on multiple popular benchmarks, i.e., YouTube-VOS (85.0%), DAVIS 2017 (85.9%), and DAVIS 2016 (94.0%).
△ Less
Submitted 17 July, 2023;
originally announced July 2023.
-
DSV: An Alignment Validation Loss for Self-supervised Outlier Model Selection
Authors:
Jaemin Yoo,
Yue Zhao,
Lingxiao Zhao,
Leman Akoglu
Abstract:
Self-supervised learning (SSL) has proven effective in solving various problems by generating internal supervisory signals. Unsupervised anomaly detection, which faces the high cost of obtaining true labels, is an area that can greatly benefit from SSL. However, recent literature suggests that tuning the hyperparameters (HP) of data augmentation functions is crucial to the success of SSL-based ano…
▽ More
Self-supervised learning (SSL) has proven effective in solving various problems by generating internal supervisory signals. Unsupervised anomaly detection, which faces the high cost of obtaining true labels, is an area that can greatly benefit from SSL. However, recent literature suggests that tuning the hyperparameters (HP) of data augmentation functions is crucial to the success of SSL-based anomaly detection (SSAD), yet a systematic method for doing so remains unknown. In this work, we propose DSV (Discordance and Separability Validation), an unsupervised validation loss to select high-performing detection models with effective augmentation HPs. DSV captures the alignment between an augmentation function and the anomaly-generating mechanism with surrogate losses, which approximate the discordance and separability of test data, respectively. As a result, the evaluation via DSV leads to selecting an effective SSAD model exhibiting better alignment, which results in high detection accuracy. We theoretically derive the degree of approximation conducted by the surrogate losses and empirically show that DSV outperforms a wide range of baselines on 21 real-world tasks.
△ Less
Submitted 12 July, 2023;
originally announced July 2023.
-
Constrained optimization of sensor placement for nuclear digital twins
Authors:
Niharika Karnik,
Mohammad G. Abdo,
Carlos E. Estrada Perez,
Jun Soo Yoo,
Joshua J. Cogliati,
Richard S. Skifton,
Pattrick Calderoni,
Steven L. Brunton,
Krithika Manohar
Abstract:
The deployment of extensive sensor arrays in nuclear reactors is infeasible due to challenging operating conditions and inherent spatial limitations. Strategically placing sensors within defined spatial constraints is essential for the reconstruction of reactor flow fields and the creation of nuclear digital twins. We develop a data-driven technique that incorporates constraints into an optimizati…
▽ More
The deployment of extensive sensor arrays in nuclear reactors is infeasible due to challenging operating conditions and inherent spatial limitations. Strategically placing sensors within defined spatial constraints is essential for the reconstruction of reactor flow fields and the creation of nuclear digital twins. We develop a data-driven technique that incorporates constraints into an optimization framework for sensor placement, with the primary objective of minimizing reconstruction errors under noisy sensor measurements. The proposed greedy algorithm optimizes sensor locations over high-dimensional grids, adhering to user-specified constraints. We demonstrate the efficacy of optimized sensors by exhaustively computing all feasible configurations for a low-dimensional dynamical system. To validate our methodology, we apply the algorithm to the Out-of-Pile Testing and Instrumentation Transient Water Irradiation System (OPTI-TWIST) prototype capsule. This capsule is electrically heated to emulate the neutronics effect of the nuclear fuel. The TWIST prototype that will eventually be inserted in the Transient Reactor Test facility (TREAT) at the Idaho National Laboratory (INL), serves as a practical demonstration. The resulting sensor-based temperature reconstruction within OPTI-TWIST demonstrates minimized error, provides probabilistic bounds for noise-induced uncertainty, and establishes a foundation for communication between the digital twin and the experimental facility.
△ Less
Submitted 16 February, 2024; v1 submitted 23 June, 2023;
originally announced June 2023.
-
End-to-End Augmentation Hyperparameter Tuning for Self-Supervised Anomaly Detection
Authors:
Jaemin Yoo,
Lingxiao Zhao,
Leman Akoglu
Abstract:
Self-supervised learning (SSL) has emerged as a promising paradigm that presents self-generated supervisory signals to real-world problems, bypassing the extensive manual labeling burden. SSL is especially attractive for unsupervised tasks such as anomaly detection, where labeled anomalies are often nonexistent and costly to obtain. While self-supervised anomaly detection (SSAD) has seen a recent…
▽ More
Self-supervised learning (SSL) has emerged as a promising paradigm that presents self-generated supervisory signals to real-world problems, bypassing the extensive manual labeling burden. SSL is especially attractive for unsupervised tasks such as anomaly detection, where labeled anomalies are often nonexistent and costly to obtain. While self-supervised anomaly detection (SSAD) has seen a recent surge of interest, the literature has failed to treat data augmentation as a hyperparameter. Meanwhile, recent works have reported that the choice of augmentation has significant impact on detection performance. In this paper, we introduce ST-SSAD (Self-Tuning Self-Supervised Anomaly Detection), the first systematic approach to SSAD in regards to rigorously tuning augmentation. To this end, our work presents two key contributions. The first is a new unsupervised validation loss that quantifies the alignment between the augmented training data and the (unlabeled) test data. In principle we adopt transduction, quantifying the extent to which augmentation mimics the true anomaly-generating mechanism, in contrast to augmenting data with arbitrary pseudo anomalies without regard to test data. Second, we present new differentiable augmentation functions, allowing data augmentation hyperparameter(s) to be tuned end-to-end via our proposed validation loss. Experiments on two testbeds with semantic class anomalies and subtle industrial defects show that systematically tuning augmentation offers significant performance gains over current practices.
△ Less
Submitted 21 June, 2023;
originally announced June 2023.
-
Energy-efficient superparamagnetic Ising machine and its application to traveling salesman problems
Authors:
Jia Si,
Shuhan Yang,
Yunuo Cen,
Jiaer Chen,
Zhaoyang Yao,
Dong-Jun Kim,
Kaiming Cai,
Jerald Yoo,
Xuanyao Fong,
Hyunsoo Yang
Abstract:
The growth of artificial intelligence and IoT has created a significant computational load for solving non-deterministic polynomial-time (NP)-hard problems, which are difficult to solve using conventional computers. The Ising computer, based on the Ising model and annealing process, has been highly sought for finding approximate solutions to NP-hard problems by observing the convergence of dynamic…
▽ More
The growth of artificial intelligence and IoT has created a significant computational load for solving non-deterministic polynomial-time (NP)-hard problems, which are difficult to solve using conventional computers. The Ising computer, based on the Ising model and annealing process, has been highly sought for finding approximate solutions to NP-hard problems by observing the convergence of dynamic spin states. However, it faces several challenges, including high power consumption due to artificial spins and randomness emulated by complex circuits, as well as low scalability caused by the rapidly growing connectivity when considering large-scale problems. Here, we present an experimental Ising annealing computer based on superparamagnetic tunnel junctions (SMTJs) with all-to-all connections, which successfully solves a 70-city travelling salesman problem (4761-node Ising problem). By taking advantage of the intrinsic randomness of SMTJs, implementing a proper global annealing scheme, and using an efficient algorithm, our SMTJ-based Ising annealer shows superior performance in terms of power consumption and energy efficiency compared to other Ising schemes. Additionally, our approach provides a promising way to solve complex problems with limited hardware resources. Moreover, we propose a crossbar array architecture for scalable integration using conventional magnetic random access memories. Our results demonstrate that the SMTJ-based Ising annealing computer with high energy efficiency, speed, and scalability is a strong candidate for future unconventional computing schemes.
△ Less
Submitted 20 June, 2023;
originally announced June 2023.
-
TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models
Authors:
Pum Jun Kim,
Yoojin Jang,
Jisu Kim,
Jaejun Yoo
Abstract:
We propose a robust and reliable evaluation metric for generative models by introducing topological and statistical treatments for rigorous support estimation. Existing metrics, such as Inception Score (IS), Frechet Inception Distance (FID), and the variants of Precision and Recall (P&R), heavily rely on supports that are estimated from sample features. However, the reliability of their estimation…
▽ More
We propose a robust and reliable evaluation metric for generative models by introducing topological and statistical treatments for rigorous support estimation. Existing metrics, such as Inception Score (IS), Frechet Inception Distance (FID), and the variants of Precision and Recall (P&R), heavily rely on supports that are estimated from sample features. However, the reliability of their estimation has not been seriously discussed (and overlooked) even though the quality of the evaluation entirely depends on it. In this paper, we propose Topological Precision and Recall (TopP&R, pronounced 'topper'), which provides a systematic approach to estimating supports, retaining only topologically and statistically important features with a certain level of confidence. This not only makes TopP&R strong for noisy features, but also provides statistical consistency. Our theoretical and experimental results show that TopP&R is robust to outliers and non-independent and identically distributed (Non-IID) perturbations, while accurately capturing the true trend of change in samples. To the best of our knowledge, this is the first evaluation metric focused on the robust estimation of the support and provides its statistical consistency under noise.
△ Less
Submitted 24 January, 2024; v1 submitted 13 June, 2023;
originally announced June 2023.
-
Dual policy as self-model for planning
Authors:
Jaesung Yoo,
Fernanda de la Torre,
Guangyu Robert Yang
Abstract:
Planning is a data efficient decision-making strategy where an agent selects candidate actions by exploring possible future states. To simulate future states when there is a high-dimensional action space, the knowledge of one's decision making strategy must be used to limit the number of actions to be explored. We refer to the model used to simulate one's decisions as the agent's self-model. While…
▽ More
Planning is a data efficient decision-making strategy where an agent selects candidate actions by exploring possible future states. To simulate future states when there is a high-dimensional action space, the knowledge of one's decision making strategy must be used to limit the number of actions to be explored. We refer to the model used to simulate one's decisions as the agent's self-model. While self-models are implicitly used widely in conjunction with world models to plan actions, it remains unclear how self-models should be designed. Inspired by current reinforcement learning approaches and neuroscience, we explore the benefits and limitations of using a distilled policy network as the self-model. In such dual-policy agents, a model-free policy and a distilled policy are used for model-free actions and planned actions, respectively. Our results on a ecologically relevant, parametric environment indicate that distilled policy network for self-model stabilizes training, has faster inference than using model-free policy, promotes better exploration, and could learn a comprehensive understanding of its own behaviors, at the cost of distilling a new network apart from the model-free policy.
△ Less
Submitted 11 June, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Classification of Edge-dependent Labels of Nodes in Hypergraphs
Authors:
Minyoung Choe,
Sunwoo Kim,
Jaemin Yoo,
Kijung Shin
Abstract:
A hypergraph is a data structure composed of nodes and hyperedges, where each hyperedge is an any-sized subset of nodes. Due to the flexibility in hyperedge size, hypergraphs represent group interactions (e.g., co-authorship by more than two authors) more naturally and accurately than ordinary graphs. Interestingly, many real-world systems modeled as hypergraphs contain edge-dependent node labels,…
▽ More
A hypergraph is a data structure composed of nodes and hyperedges, where each hyperedge is an any-sized subset of nodes. Due to the flexibility in hyperedge size, hypergraphs represent group interactions (e.g., co-authorship by more than two authors) more naturally and accurately than ordinary graphs. Interestingly, many real-world systems modeled as hypergraphs contain edge-dependent node labels, i.e., node labels that vary depending on hyperedges. For example, on co-authorship datasets, the same author (i.e., a node) can be the primary author in a paper (i.e., a hyperedge) but the corresponding author in another paper (i.e., another hyperedge).
In this work, we introduce a classification of edge-dependent node labels as a new problem. This problem can be used as a benchmark task for hypergraph neural networks, which recently have attracted great attention, and also the usefulness of edge-dependent node labels has been verified in various applications. To tackle this problem, we propose WHATsNet, a novel hypergraph neural network that represents the same node differently depending on the hyperedges it participates in by reflecting its varying importance in the hyperedges. To this end, WHATsNet models the relations between nodes within each hyperedge, using their relative centrality as positional encodings. In our experiments, we demonstrate that WHATsNet significantly and consistently outperforms ten competitors on six real-world hypergraphs, and we also show successful applications of WHATsNet to (a) ranking aggregation, (b) node clustering, and (c) product return prediction.
△ Less
Submitted 5 June, 2023;
originally announced June 2023.
-
Towards Deep Attention in Graph Neural Networks: Problems and Remedies
Authors:
Soo Yong Lee,
Fanchen Bu,
Jaemin Yoo,
Kijung Shin
Abstract:
Graph neural networks (GNNs) learn the representation of graph-structured data, and their expressiveness can be further enhanced by inferring node relations for propagation. Attention-based GNNs infer neighbor importance to manipulate the weight of its propagation. Despite their popularity, the discussion on deep graph attention and its unique challenges has been limited. In this work, we investig…
▽ More
Graph neural networks (GNNs) learn the representation of graph-structured data, and their expressiveness can be further enhanced by inferring node relations for propagation. Attention-based GNNs infer neighbor importance to manipulate the weight of its propagation. Despite their popularity, the discussion on deep graph attention and its unique challenges has been limited. In this work, we investigate some problematic phenomena related to deep graph attention, including vulnerability to over-smoothed features and smooth cumulative attention. Through theoretical and empirical analyses, we show that various attention-based GNNs suffer from these problems. Motivated by our findings, we propose AEROGNN, a novel GNN architecture designed for deep graph attention. AERO-GNN provably mitigates the proposed problems of deep graph attention, which is further empirically demonstrated with (a) its adaptive and less smooth attention functions and (b) higher performance at deep layers (up to 64). On 9 out of 12 node classification benchmarks, AERO-GNN outperforms the baseline GNNs, highlighting the advantages of deep graph attention. Our code is available at https://github.com/syleeheal/AERO-GNN.
△ Less
Submitted 4 June, 2023;
originally announced June 2023.
-
How Transitive Are Real-World Group Interactions? -- Measurement and Reproduction
Authors:
Sunwoo Kim,
Fanchen Bu,
Minyoung Choe,
Jaemin Yoo,
Kijung Shin
Abstract:
Many real-world interactions (e.g., researcher collaborations and email communication) occur among multiple entities. These group interactions are naturally modeled as hypergraphs. In graphs, transitivity is helpful to understand the connections between node pairs sharing a neighbor, and it has extensive applications in various domains. Hypergraphs, an extension of graphs, are designed to represen…
▽ More
Many real-world interactions (e.g., researcher collaborations and email communication) occur among multiple entities. These group interactions are naturally modeled as hypergraphs. In graphs, transitivity is helpful to understand the connections between node pairs sharing a neighbor, and it has extensive applications in various domains. Hypergraphs, an extension of graphs, are designed to represent group relations. However, to the best of our knowledge, there has been no examination regarding the transitivity of real-world group interactions. In this work, we investigate the transitivity of group interactions in real-world hypergraphs. We first suggest intuitive axioms as necessary characteristics of hypergraph transitivity measures. Then, we propose a principled hypergraph transitivity measure HyperTrans, which satisfies all the proposed axioms, with a fast computation algorithm Fast-HyperTrans. After that, we analyze the transitivity patterns in real-world hypergraphs distinguished from those in random hypergraphs. Lastly, we propose a scalable hypergraph generator THera. It reproduces the observed transitivity patterns by leveraging community structures, which are pervasive in real-world hypergraphs. Our code and datasets are available at https://github.com/kswoo97/hypertrans.
△ Less
Submitted 25 October, 2023; v1 submitted 4 June, 2023;
originally announced June 2023.