
SpaceRIS: LEO Satellite Coverage Maximization in 6G Sub-THz Networks by MAPPO DRL and Whale Optimization

Authors: Sheikh Salman Hassan, Yu Min Park, Yan Kyaw Tun, Walid Saad, Zhu Han, Choong Seon Hong

Abstract: Satellite systems face a significant challenge in effectively utilizing limited communication resources to meet the demands of ground network traffic, characterized by asymmetrical spatial distribution and time-varying characteristics. Moreover, the coverage range and signal transmission distance of low Earth orbit (LEO) satellites are restricted by notable propagation attenuation, molecular absorption, and space losses at sub-terahertz (THz) frequencies. This paper introduces a novel approach to maximize LEO satellite coverage by leveraging reconfigurable intelligent surfaces (RISs) within 6G sub-THz networks. The optimization objectives encompass enhancing the end-to-end data rate, optimizing satellite-remote user equipment (RUE) associations, data packet routing within satellite constellations, RIS phase shift, and ground base station (GBS) transmit power (i.e., active beamforming). The formulated joint optimization problem poses significant challenges owing to its time-varying environment, non-convex characteristics, and NP-hard complexity. To address these challenges, we propose a block coordinate descent (BCD) algorithm that integrates balanced K-means clustering, multi-agent proximal policy optimization (MAPPO) deep reinforcement learning (DRL), and whale optimization (WOA) techniques. The performance of the proposed approach is demonstrated through comprehensive simulation results, exhibiting its superiority over existing baseline methods in the literature.

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.13214 [pdf, other]

FedMEKT: Distillation-based Embedding Knowledge Transfer for Multimodal Federated Learning

Authors: Huy Q. Le, Minh N. H. Nguyen, Chu Myaet Thwal, Yu Qiao, Chaoning Zhang, Choong Seon Hong

Abstract: Federated learning (FL) enables a decentralized machine learning paradigm for multiple clients to collaboratively train a generalized global model without sharing their private data. Most existing works simply propose typical FL systems for single-modal data, thus limiting their potential to exploit valuable multimodal data for future personalized applications. Furthermore, the majority of FL approaches still rely on labeled data at the client side, which is limited in real-world applications because users cannot be expected to annotate their own data. In light of these limitations, we propose a novel multimodal FL framework that employs a semi-supervised learning approach to leverage the representations from different modalities. Bringing this concept into a system, we develop a distillation-based multimodal embedding knowledge transfer mechanism, namely FedMEKT, which allows the server and clients to exchange the joint knowledge of their learning models extracted from a small multimodal proxy dataset. Our FedMEKT iteratively updates the generalized global encoders with the joint embedding knowledge from the participating clients. Thereby, to address the modality discrepancy and labeled data constraint in existing FL systems, our proposed FedMEKT comprises local multimodal autoencoder learning, generalized multimodal autoencoder construction, and generalized classifier learning. Through extensive experiments on three multimodal human activity recognition datasets, we demonstrate that FedMEKT achieves superior global encoder performance on linear evaluation and guarantees user privacy for personal data and model parameters while demanding less communication cost than other baselines.

Submitted 6 November, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.10575 [pdf, other]

Boosting Federated Learning Convergence with Prototype Regularization

Authors: Yu Qiao, Huy Q. Le, Choong Seon Hong

Abstract: As a distributed machine learning technique, federated learning (FL) requires clients to collaboratively train a shared model with an edge server without leaking their local data. However, the heterogeneous data distribution among clients often leads to a decrease in model performance. To tackle this issue, this paper introduces a prototype-based regularization strategy to address the heterogeneity in the data distribution. Specifically, the regularization process involves the server aggregating local prototypes from distributed clients to generate a global prototype, which is then sent back to the individual clients to guide their local training. The experimental results on MNIST and Fashion-MNIST show that our proposal achieves improvements of 3.3% and 8.9% in average test accuracy, respectively, compared to the most popular baseline FedAvg. Furthermore, our approach has a fast convergence rate in heterogeneous settings.

Submitted 20 July, 2023; originally announced July 2023.
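
The aggregation loop described in the abstract above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes a prototype is the per-class mean embedding, that the server combines client prototypes with a sample-count-weighted average, and that the local regularizer is a squared Euclidean distance. The function names `local_prototypes`, `aggregate_prototypes`, and `prototype_penalty` are hypothetical.

```python
import numpy as np

def local_prototypes(embeddings, labels, num_classes):
    """Client side: per-class mean embedding and per-class sample counts."""
    dim = embeddings.shape[1]
    protos = np.zeros((num_classes, dim))
    counts = np.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        counts[c] = mask.sum()
        if counts[c] > 0:
            protos[c] = embeddings[mask].mean(axis=0)
    return protos, counts

def aggregate_prototypes(client_protos, client_counts):
    """Server side: count-weighted average per class (assumed aggregation rule)."""
    protos = np.stack(client_protos)            # (clients, classes, dim)
    counts = np.stack(client_counts)            # (clients, classes)
    total = counts.sum(axis=0)                  # (classes,)
    weighted = (protos * counts[:, :, None]).sum(axis=0)
    safe_total = np.where(total > 0, total, 1.0)  # avoid dividing by zero for unseen classes
    return weighted / safe_total[:, None]

def prototype_penalty(embeddings, labels, global_protos):
    """Client side: mean squared distance of each embedding to its class's global prototype."""
    diff = embeddings - global_protos[labels]
    return float((diff ** 2).sum(axis=1).mean())
```

In a training loop, each client would add a weighted `prototype_penalty` term to its usual classification loss; the weighting coefficient is a further assumption not stated in the abstract.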

arXiv:2307.10550 [pdf]

SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer

Authors: Daegyeom Kim, Seongho Hong, Yong-Hoon Choi

Abstract: Expressive speech synthesis models are trained by adding corpora with diverse speakers, various emotions, and different speaking styles to the dataset, in order to control various characteristics of speech and generate the desired voice. In this paper, we propose a style control (SC) VALL-E model based on the neural codec language model (called VALL-E), which follows the structure of the generative pretrained transformer 3 (GPT-3). The proposed SC VALL-E takes input from text sentences and prompt audio and is designed to generate controllable speech by not simply mimicking the characteristics of the prompt audio but by controlling the attributes to produce diverse voices. We identify tokens in the style embedding matrix of the newly designed style network that represent attributes such as emotion, speaking rate, pitch, and voice intensity, and design a model that can control these attributes. To evaluate the performance of SC VALL-E, we conduct comparative experiments with three representative expressive speech synthesis models: global style token (GST) Tacotron2, variational autoencoder (VAE) Tacotron2, and original VALL-E. We measure word error rate (WER), F0 voiced error (FVE), and F0 gross pitch error (F0GPE) as evaluation metrics to assess the accuracy of generated sentences. For comparing the quality of synthesized speech, we measure comparative mean opinion score (CMOS) and similarity mean opinion score (SMOS). To evaluate the style control ability of the generated speech, we observe the changes in F0 and mel-spectrogram by modifying the trained tokens. When using prompt audio that is not present in the training data, SC VALL-E generates a variety of expressive sounds and demonstrates competitive performance compared to the existing models. Our implementation, pretrained models, and audio samples are located on GitHub.

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2307.04036 [pdf, other]

Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations

Authors: Tong Steven Sun, Yuyang Gao, Shubham Khaladkar, Sijia Liu, Liang Zhao, Young-Ho Kim, Sungsoo Ray Hong

Abstract: The local explanation provides heatmaps on images to explain how Convolutional Neural Networks (CNNs) derive their output. Due to its visual straightforwardness, the method has been one of the most popular explainable AI (XAI) methods for diagnosing CNNs. Through our formative study (S1), however, we captured ML engineers' ambivalent perspective about the local explanation as a valuable and indispensable envision in building CNNs versus the process that exhausts them due to the heuristic nature of detecting vulnerability. Moreover, steering the CNNs based on the vulnerability learned from the diagnosis seemed highly challenging. To mitigate the gap, we designed DeepFuse, the first interactive design that realizes the direct feedback loop between a user and CNNs in diagnosing and revising a CNN's vulnerability using local explanations. DeepFuse helps CNN engineers to systematically search for "unreasonable" local explanations and annotate the new boundaries for those identified as unreasonable in a labor-efficient manner. Next, it steers the model based on the given annotation such that the model doesn't introduce similar mistakes. We conducted a two-day study (S2) with 12 experienced CNN engineers. Using DeepFuse, participants made a more accurate and "reasonable" model than the current state-of-the-art. Also, participants found that the way DeepFuse guides case-based reasoning can practically improve their current practice. We provide implications for design that explain how future HCI-driven design can move our practice forward to make XAI-driven insights more actionable.

Submitted 8 July, 2023; originally announced July 2023.

Comments: 32 pages, 6 figures, 5 tables. Accepted for publication in the Proceedings of the ACM on Human-Computer Interaction (PACM HCI), CSCW 2023

arXiv:2307.03402 [pdf, other]

Swin Transformer-Based Dynamic Semantic Communication for Multi-User with Different Computing Capacity

Authors: Loc X. Nguyen, Ye Lin Tun, Yan Kyaw Tun, Minh N. H. Nguyen, Chaoning Zhang, Zhu Han, Choong Seon Hong

Abstract: Semantic communication has gained significant attention from researchers as a promising technique to replace conventional communication in the next generation of communication systems, primarily due to its ability to reduce communication costs. However, little literature has studied its effectiveness in multi-user scenarios, particularly when there are variations in the model architectures used by users and their computing capacities. To address this issue, we explore a semantic communication system that caters to multiple users with different model architectures by using a multi-purpose transmitter at the base station (BS). Specifically, the BS in the proposed framework employs semantic and channel encoders to encode the image for transmission, while the receiver utilizes its local channel and semantic decoder to reconstruct the original image. Our joint source-channel encoder at the BS can effectively extract and compress semantic features for specific users by considering the signal-to-noise ratio (SNR) and computing capacity of the user. Based on the network status, the joint source-channel encoder at the BS can adaptively adjust the length of the transmitted signal. A longer signal ensures more information for high-quality image reconstruction for the user, while a shorter signal helps avoid network congestion. In addition, we propose a hybrid loss function for training, which enhances the perceptual details of reconstructed images. Finally, we conduct a series of extensive evaluations and ablation studies to validate the effectiveness of the proposed system.

Submitted 7 July, 2023; originally announced July 2023.

Comments: 14 pages, 10 figures

arXiv:2307.02663 [pdf, other]

Convergence of Communications, Control, and Machine Learning for Secure and Autonomous Vehicle Navigation

Authors: Tengchan Zeng, Aidin Ferdowsi, Omid Semiari, Walid Saad, Choong Seon Hong

Abstract: Connected and autonomous vehicles (CAVs) can reduce human errors in traffic accidents, increase road efficiency, and execute various tasks ranging from delivery to smart city surveillance. Reaping these benefits requires CAVs to autonomously navigate to target destinations. To this end, each CAV's navigation controller must leverage the information collected by sensors and wireless systems for decision-making on longitudinal and lateral movements. However, enabling autonomous navigation for CAVs requires a convergent integration of communication, control, and learning systems. The goal of this article is to explicitly expose the challenges related to this convergence and propose solutions to address them in two major use cases: uncoordinated and coordinated CAVs. In particular, challenges related to the navigation of uncoordinated CAVs include stable path tracking, robust control against cyber-physical attacks, and adaptive navigation controller design. Meanwhile, when multiple CAVs coordinate their movements during navigation, fundamental problems such as stable formation, fast collaborative learning, and distributed intrusion detection are analyzed. For both cases, solutions using the convergence of communication theory, control theory, and machine learning are proposed to enable effective and secure CAV navigation. Preliminary simulation results are provided to show the merits of the proposed solutions.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 3 figures and 7 pages

arXiv:2306.16692 [pdf, other]

Performance Evaluation of Transport Protocols and Roadmap to a High-Performance Transport Design for Immersive Applications

Authors: Inayat Ali, Seungwoo Hong, Pyung-koo Park, Tae Yeon Kim

Abstract: Immersive technologies such as virtual reality (VR), augmented reality (AR), and holograms will change users' digital experience. These immersive technologies have a multitude of applications, including telesurgeries, teleconferencing, Internet shopping, computer games, etc. Holographic-type communication (HTC) is a type of augmented reality media that provides an immersive experience to Internet users. However, HTC has different characteristics and network requirements, and the existing network architecture and transport protocols may not be able to cope with the stringent network requirements of HTC. Therefore, in this paper, we provide an in-depth and critical study of the transport protocols for HTC. We also discuss the characteristics and the network requirements for HTC. Based on the performance evaluation of the existing transport protocols, we propose a roadmap to design new high-performance transport protocols for immersive applications.

Submitted 30 June, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: Accepted in The 14th International Conference on Ubiquitous and Future Networks (ICUFN 2023), Paris France, July 4-7 2023

arXiv:2306.15902 [pdf, other]

Individual and Structural Graph Information Bottlenecks for Out-of-Distribution Generalization

Authors: Ling Yang, Jiayi Zheng, Heyuan Wang, Zhongyi Liu, Zhilin Huang, Shenda Hong, Wentao Zhang, Bin Cui

Abstract: Out-of-distribution (OOD) graph generalization is critical for many real-world applications. Existing methods neglect to discard spurious or noisy features of inputs, which are irrelevant to the label. Besides, they mainly conduct instance-level class-invariant graph learning and fail to utilize the structural class relationships between graph instances. In this work, we endeavor to address these issues in a unified framework, dubbed Individual and Structural Graph Information Bottlenecks (IS-GIB). To remove class-spurious features caused by distribution shifts, we propose Individual Graph Information Bottleneck (I-GIB), which discards irrelevant information by minimizing the mutual information between the input graph and its embeddings. To leverage the structural intra- and inter-domain correlations, we propose Structural Graph Information Bottleneck (S-GIB). Specifically, for a batch of graphs with multiple domains, S-GIB first computes the pair-wise input-input, embedding-embedding, and label-label correlations. Then it minimizes the mutual information between input graph and embedding pairs while maximizing the mutual information between embedding and label pairs. The critical insight of S-GIB is to simultaneously discard spurious features and learn invariant features from a high-order perspective by maintaining class relationships under multiple distributional shifts. Notably, we unify the proposed I-GIB and S-GIB to form our complementary framework IS-GIB. Extensive experiments conducted on both node- and graph-level tasks consistently demonstrate the superior generalization ability of IS-GIB. The code is available at https://github.com/YangLing0818/GraphOOD.

Submitted 27 June, 2023; originally announced June 2023.

Comments: Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE)

arXiv:2306.15697 [pdf, other]

doi 10.1088/1402-4896/ad2d28

Foliation, topology and nucleon charge profiles in hypersphere soliton model

Authors: Soon-Tae Hong

Abstract: In the hypersphere soliton model (HSM), we study the geometrical inner structures and the ensuing charge distributions of the nucleons by exploiting the aspect of the HSM where the hypersphere soliton is described by an extended object possessing the parameter $λ$ $(0\le λ<\infty)$ which corresponds to the radial distance from the center of $S^{3}$ to the foliation leaves of the hypersphere soliton. To do this, we investigate the foliation and topology related to geometry on a hypersphere described by $(μ,θ,φ)$. Exploiting the so-called scanning algorithm, we study geometrical relations between the spherical shell foliation leaf on a northern hemi-hypersphere $S^{3}_{+}$ and that on a flat equatorial solid sphere $E^{3}$ which contains the center of $S^{3}$. We then elucidate the physical meaning of $μ$ in $S^{3}$ of radius $λ$ by showing that $μ$ plays the role of an auxiliary angle to fix the radius $λ\sin μ$ of the $S^{2}$ spherical shell sharing the center of $S^{3}(=S^{2}\times S^{1})$, at a given angle $μ$. Next, using the charge density profiles of nucleons with $μ$ dependence, we construct the nucleon fractional charges of spherically symmetric and nontrivial distributions. In the HSM we note that the proton and neutron charges do not leak out from the hypersphere soliton, and the positive and negative charges in the neutron are confined inside and outside its core, respectively. Explicitly, we predict the fractional volumes and charges of the neutron. The proton and neutron are shown to be described by a topological structure of two Hopf-linked Möbius strip type twist circles in $S^{3}$. We also note that the characteristic ratio of the hypersphere volume to the corresponding solid sphere one is given by a geometrical invariant related to hyper-compactness.

Submitted 2 May, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: 15 pages, 3 figures, some modifications and corrections

Journal ref: Physica Scripta 99 (2024) 045301

arXiv:2306.15577 [pdf, ps, other]

Retrospective: A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing

Authors: Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, Kiyoung Choi

Abstract: Our ISCA 2015 paper provides a new programmable processing-in-memory (PIM) architecture and system design that can accelerate key data-intensive applications, with a focus on graph processing workloads. Our major idea was to completely rethink the system, including the programming model, data partitioning mechanisms, system support, instruction set architecture, along with near-memory execution units and their communication architecture, such that an important workload can be accelerated at a maximum level using a distributed system of well-connected near-memory accelerators. We built our accelerator system, Tesseract, using 3D-stacked memories with logic layers, where each logic layer contains general-purpose processing cores and cores communicate with each other using a message-passing programming model. Cores could be specialized for graph processing (or any other application to be accelerated). To our knowledge, our paper was the first to completely design a near-memory accelerator system from scratch such that it is both generally programmable and specifically customizable to accelerate important applications, with a case study on major graph processing workloads. Ensuing work in academia and industry showed that similar approaches to system design can greatly benefit both graph processing workloads and other applications, such as machine learning, for which ideas from Tesseract seem to have been influential. This short retrospective provides a brief analysis of our ISCA 2015 paper and its impact. We briefly describe the major ideas and contributions of the work, discuss later works that built on it or were influenced by it, and make some educated guesses on what the future may bring on PIM and accelerator systems.

Submitted 27 June, 2023; originally announced June 2023.

Comments: Selected to the 50th Anniversary of ISCA (ACM/IEEE International Symposium on Computer Architecture), Commemorative Issue, 2023

arXiv:2306.14385 [pdf, other]

Calibration of Wideband LFM Radars based on Sliding Window Algorithm

Authors: Hyung-Woo Kim, Jin-woo Kim, Jin-ha Kim, JaeYoung Choi, Sangpyo Hong, Byungkwan Kim

Abstract: This paper addresses the challenges of wideband signal beamforming in radar systems and proposes a new calibration method. Depending on operating conditions, the frequency-dependent characteristics of the system can change, and amplitude, phase, and time delay errors can be generated. The proposed method is based on the concept of a sliding window algorithm for linear frequency modulated (LFM) signals. To calibrate the frequency-dependent errors from the transceiver and the time delay error from true time delay elements, the proposed method utilizes the characteristic of the LFM signal. The LFM signal changes its frequency linearly with time, so the frequency-domain characteristics of the hardware are presented in time. Therefore, by applying a matched filter to a part of the LFM signal, the frequency-dependent characteristics can be monitored and calibrated. The proposed method is compared with the conventional matched filter based calibration results and verified by simulation results and beampatterns. Since the proposed method utilizes the LFM signal as a calibration tone, it can be applied to any beamforming system, not only LFM radars.

Submitted 25 June, 2023; originally announced June 2023.

Comments: 11 pages

arXiv:2306.14289 [pdf, other]

Faster Segment Anything: Towards Lightweight SAM for Mobile Applications

Authors: Chaoning Zhang, Dongshen Han, Yu Qiao, Jung Uk Kim, Sung-Ho Bae, Seungkyu Lee, Choong Seon Hong

Abstract: Segment Anything Model (SAM) has attracted significant attention due to its impressive zero-shot transfer performance and high versatility for numerous vision applications (like image editing with fine-grained control). Many such applications need to be run on resource-constrained edge devices, like mobile phones. In this work, we aim to make SAM mobile-friendly by replacing the heavyweight image encoder with a lightweight one. A naive way to train such a new SAM as in the original SAM paper leads to unsatisfactory performance, especially when limited training sources are available. We find that this is mainly caused by the coupled optimization of the image encoder and mask decoder, motivated by which we propose decoupled distillation. Concretely, we distill the knowledge from the heavy image encoder (ViT-H in the original SAM) to a lightweight image encoder, which can be automatically compatible with the mask decoder in the original SAM. The training can be completed on a single GPU in less than one day, and the resulting lightweight SAM is termed MobileSAM, which is more than 60 times smaller yet performs on par with the original SAM. In terms of inference speed, on a single GPU MobileSAM runs at around 10ms per image: 8ms on the image encoder and 4ms on the mask decoder. With superior performance, our MobileSAM is around 5 times faster than the concurrent FastSAM and 7 times smaller, making it more suitable for mobile applications. The code for our project is provided at https://github.com/ChaoningZhang/MobileSAM, with a demo showing that MobileSAM can run relatively smoothly on CPU.

Submitted 1 July, 2023; v1 submitted 25 June, 2023; originally announced June 2023.

Comments: First work to make SAM lightweight for mobile applications

arXiv:2306.09591 [pdf, other]

A Vision-based Autonomous Perching Approach for Nano Aerial Vehicles

Authors: Truong-Dong Do, Sung Kyung Hong

Abstract: Over the past decades, quadcopters have been investigated due to their mobility and flexibility to operate in a wide range of environments. They have been used in various areas, including surveillance and monitoring. During a mission, drones do not have to remain active once they have reached a target location. To conserve energy and maintain a static position, it is possible to perch and stop the motors in such situations. Achieving a reliable and highly accurate perching method remains a challenging and promising problem. In this paper, a vision-based autonomous perching approach for nano quadcopters onto a predefined perching target on horizontal surfaces is proposed. First, a perching target with a small marker inside a larger one is designed to improve detection capability at a variety of ranges. Second, a monocular camera is used to calculate the relative poses of the flying vehicle from the markers detected. Then, a Kalman filter is applied to determine the pose more reliably, especially when measurement data is missing. Next, we introduce an algorithm for merging the pose data from multiple markers. Finally, the poses are sent to the perching planner to conduct the real flight test to align the drone with the target's center and steer it there. Based on the experimental results, the approach proved to be effective and feasible. The drone can successfully perch on the center of markers within two centimeters of precision.

Submitted 15 June, 2023; originally announced June 2023.

Comments: 6 pages, 6 figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:2304.14838
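
The Kalman filtering step mentioned in the abstract above, which keeps a pose estimate alive when a marker detection is missing, can be illustrated with the sketch below. It is not the authors' filter: it assumes a single position axis, a constant-velocity motion model, and illustrative noise values, and the function name `kalman_track` and its parameters `dt`, `q`, and `r` are hypothetical. A missing measurement is represented by `None`, in which case only the prediction step runs.

```python
import numpy as np

def kalman_track(measurements, dt=0.1, q=1e-3, r=1e-2):
    """Filter a 1-D position sequence; None entries mark missing detections."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity motion model (assumed)
    H = np.array([[1.0, 0.0]])              # only position is observed
    Q = q * np.eye(2)                       # process noise covariance (illustrative)
    R = np.array([[r]])                     # measurement noise covariance (illustrative)
    x = np.zeros((2, 1))                    # state: [position, velocity]
    P = np.eye(2)                           # state covariance
    estimates = []
    for z in measurements:
        # Prediction always runs, even when no marker was detected.
        x = F @ x
        P = F @ P @ F.T + Q
        if z is not None:
            # Correction runs only when a measurement is available.
            y = np.array([[z]]) - H @ x
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ y
            P = (np.eye(2) - K @ H) @ P
        estimates.append(float(x[0, 0]))
    return estimates
```

Calling `kalman_track([1.0, 1.0, None, 1.0])` returns one estimate per time step, including the step where the detection was missing.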

arXiv:2306.09333 [pdf, other]

doi 10.1126/science.adi7877

Dynamics of magnetization at infinite temperature in a Heisenberg spin chain

Authors: Eliott Rosenberg, Trond Andersen, Rhine Samajdar, Andre Petukhov, Jesse Hoke, Dmitry Abanin, Andreas Bengtsson, Ilya Drozdov, Catherine Erickson, Paul Klimov, Xiao Mi, Alexis Morvan, Matthew Neeley, Charles Neill, Rajeev Acharya, Richard Allen, Kyle Anderson, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Juan Atalaya, Joseph Bardin, A. Bilmes, Gina Bortoli, et al. (156 additional authors not shown)

Abstract: Understanding universal aspects of quantum dynamics is an unresolved problem in statistical mechanics. In particular, the spin dynamics of the 1D Heisenberg model were conjectured to belong to the Kardar-Parisi-Zhang (KPZ) universality class based on the scaling of the infinite-temperature spin-spin correlation function. In a chain of 46 superconducting qubits, we study the probability distribution, $P(\mathcal{M})$, of the magnetization transferred across the chain's center. The first two moments of $P(\mathcal{M})$ show superdiffusive behavior, a hallmark of KPZ universality. However, the third and fourth moments rule out the KPZ conjecture and allow for evaluating other theories. Our results highlight the importance of studying higher moments in determining dynamic universality classes and provide key insights into universal behavior in quantum systems.

Submitted 4 April, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Journal ref: Science 384, 48-53 (2024)

arXiv:2306.07713 [pdf, other]

Robustness of SAM: Segment Anything Under Corruptions and Beyond

Authors: Yu Qiao, Chaoning Zhang, Taegoo Kang, Donghun Kim, Chenshuang Zhang, Choong Seon Hong

Abstract: Segment anything model (SAM), as the name suggests, is claimed to be capable of cutting out any object and demonstrates impressive zero-shot transfer performance with the guidance of prompts. However, there is currently a lack of comprehensive evaluation regarding its robustness under various corruptions. Understanding the robustness of SAM across different corruption scenarios is crucial for its real-world deployment. Prior works show that SAM is biased towards texture (style) rather than shape, motivated by which we start by investigating its robustness against style transfer, which is synthetic corruption. Next, interpreting the effects of synthetic corruption as style changes, we proceed to conduct a comprehensive evaluation of its robustness against 15 types of common corruption. These corruptions mainly fall into categories such as digital, noise, weather, and blur, and within each corruption category, we explore 5 severity levels to simulate real-world corruption scenarios. Beyond the corruptions, we further assess the robustness of SAM against local occlusion and local adversarial patch attacks. To the best of our knowledge, our work is the first of its kind to evaluate the robustness of SAM under style change, local occlusion, and local adversarial patch attacks. Given that patch attacks visible to human eyes are easily detectable, we further assess its robustness against global adversarial attacks that are imperceptible to human eyes.
Overall, this work provides a comprehensive empirical study of the robustness of SAM, evaluating its performance under various corruptions and extending the assessment to critical aspects such as local occlusion, local adversarial patch attacks, and global adversarial attacks. These evaluations yield valuable insights into the practical applicability and effectiveness of SAM in addressing real-world challenges.

Submitted 4 September, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

Comments: The first work evaluates the robustness of SAM under various corruptions such as style transfer, local occlusion, and adversarial patch attack

arXiv:2306.06211 [pdf, other]

A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering

Authors: Chaoning Zhang, Fachrina Dewi Puspitasari, Sheng Zheng, Chenghao Li, Yu Qiao, Taegoo Kang, Xinru Shan, Chenshuang Zhang, Caiyan Qin, Francois Rameau, Lik-Hang Lee, Sung-Ho Bae, Choong Seon Hong

Abstract: Segment anything model (SAM) developed by Meta AI Research has recently attracted significant attention. Trained on a large segmentation dataset of over 1 billion masks, SAM is capable of segmenting any object on a certain image. In the original SAM work, the authors turned to zero-shot transfer tasks (like edge detection) for evaluating the performance of SAM. Recently, numerous works have attempted to investigate the performance of SAM in various scenarios to recognize and segment objects. Moreover, numerous projects have emerged to show the versatility of SAM as a foundation model by combining it with other models, like Grounding DINO, Stable Diffusion, ChatGPT, etc. With the relevant papers and projects increasing exponentially, it is challenging for the readers to catch up with the development of SAM. To this end, this work conducts the first yet comprehensive survey on SAM. This is an ongoing project and we intend to update the manuscript on a regular basis. Therefore, readers are welcome to contact us if they complete new works related to SAM so that we can include them in our next version.

Submitted 3 July, 2023; v1 submitted 12 May, 2023; originally announced June 2023.

Comments: First survey on Segment Anything Model (SAM), work under progress

arXiv:2306.02866 [pdf, other]

Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance

Authors: Jinwoo Kim, Tien Dat Nguyen, Ayhan Suleymanzade, Hyeokjun An, Seunghoon Hong

Abstract: We present a novel framework to overcome the limitations of equivariant architectures in learning functions with group symmetries. In contrast to equivariant architectures, we use an arbitrary base model such as an MLP or a transformer and symmetrize it to be equivariant to the given group by employing a small equivariant network that parameterizes the probabilistic distribution underlying the symmetrization. The distribution is end-to-end trained with the base model which can maximize performance while reducing sample complexity of symmetrization. We show that this approach ensures not only equivariance to the given group but also universal approximation capability in expectation. We implement our method on various base models, including patch-based transformers that can be initialized from pretrained vision transformers, and test them for a wide range of symmetry groups including permutation and Euclidean groups and their combinations. Empirical tests show competitive results against tailored equivariant architectures, suggesting the potential for learning equivariant functions for diverse groups using a non-equivariant universal base architecture. We further show evidence of enhanced learning in symmetric modalities, like graphs, when pretrained from non-symmetric modalities, like vision. Code is available at https://github.com/jw9730/lps.

Submitted 13 April, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 32 pages, 11 figures

arXiv:2306.00912 [pdf, other]

Introduction to Generalized Global Symmetries in QFT and Particle Physics

Authors: T. Daniel Brennan, Sungwoo Hong

Abstract: Generalized symmetries (also known as categorical symmetries) is a newly developing technique for studying quantum field theories. It has given us new insights into the structure of QFT and many new powerful tools that can be applied to the study of particle phenomenology. In these notes we give an exposition to the topic of generalized/categorical symmetries for high energy phenomenologists although the topics covered may be useful to the broader physics community. Here we describe generalized symmetries without the use of category theory and pay particular attention to the introduction of discrete symmetries and their gauging.

Submitted 2 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 74 pages plus appendices

arXiv:2305.17816 [pdf, other]

Josephson parametric amplifier with Chebyshev gain profile and high saturation

Authors: Ryan Kaufman, Theodore White, Mark I. Dykman, Andrea Iorio, George Stirling, Sabrina Hong, Alex Opremcak, Andreas Bengtsson, Lara Faoro, Joseph C. Bardin, Tim Burger, Robert Gasca, Ofer Naaman

Abstract: We demonstrate a Josephson parametric amplifier design with a band-pass impedance matching network based on a third-order Chebyshev prototype. We measured eight amplifiers operating at 4.6 GHz that exhibit gains of 20 dB with less than 1 dB gain ripple and up to 500 MHz bandwidth. The amplifiers further achieve high output saturation powers around -73 dBm based on the use of rf-SQUID arrays as their nonlinear element. We characterize the system readout efficiency and its signal-to-noise ratio near saturation using a Sycamore processor, finding the data consistent with near quantum limited noise performance of the amplifiers. In addition, we measure the amplifiers' intermodulation distortion in two-tone experiments as a function of input power and inter-tone detuning, and observe excess distortion at small detuning with a pronounced dip as a function of signal power, which we interpret in terms of power-dependent dielectric losses.

Submitted 28 May, 2023; originally announced May 2023.

Comments: 14 pages, 10 figures

arXiv:2305.17701 [pdf, other]

KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application

Authors: Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Gunhee Kim, Jung-Woo Ha

Abstract: Large language models (LLMs) learn not only natural text generation abilities but also social biases against different demographic groups from real-world data. This poses a critical risk when deploying LLM-based applications. Existing research and resources are not readily applicable in South Korea due to the differences in language and culture, both of which significantly affect the biases and targeted demographic groups. This limitation requires localized social bias datasets to ensure the safe and effective deployment of LLMs. To this end, we present KoSBi, a new social bias dataset of 34k pairs of contexts and sentences in Korean covering 72 demographic groups in 15 categories. We find that through filtering-based moderation, social biases in generated content can be reduced by 16.47%p on average for HyperCLOVA (30B and 82B), and GPT-3.

Submitted 29 May, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: 17 pages, 8 figures, 12 tables, ACL 2023

arXiv:2305.17696 [pdf, other]

SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration

Authors: Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Meeyoung Cha, Yejin Choi, Byoung Pil Kim, Gunhee Kim, Eun-Ju Lee, Yong Lim, Alice Oh, Sangchul Park, Jung-Woo Ha

Abstract: The potential social harms that large language models pose, such as generating offensive content and reinforcing biases, are steeply rising. Existing works focus on coping with this concern while interacting with ill-intentioned users, such as those who explicitly make hate speech or elicit harmful responses. However, discussions on sensitive issues can become toxic even if the users are well-intentioned. For safer models in such scenarios, we present the Sensitive Questions and Acceptable Response (SQuARe) dataset, a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses. The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines. Experiments show that acceptable response generation significantly improves for HyperCLOVA and GPT-3, demonstrating the efficacy of this dataset.

Submitted 28 May, 2023; originally announced May 2023.

Comments: 19 pages, 10 figures, ACL 2023

arXiv:2305.15060 [pdf, other]

Who Wrote this Code? Watermarking for Code Generation

Authors: Taehyun Lee, Seokhee Hong, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, Gunhee Kim

Abstract: Since the remarkable generation performance of large language models raised ethical and legal concerns, approaches to detect machine-generated text by embedding watermarks are being developed. However, we discover that the existing works fail to function appropriately in code generation tasks due to the task's nature of having low entropy. Extending a logit-modifying watermark method, we propose Selective WatErmarking via Entropy Thresholding (SWEET), which enhances detection ability and mitigates code quality degeneration by removing low-entropy segments at generating and detecting watermarks. Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines, including post-hoc detection methods, in detecting machine-generated code text. Our code is available in https://github.com/hongcheki/sweet-watermark.

Submitted 3 July, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: To be presented at ACL 2024

arXiv:2305.14330 [pdf, other]

DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation

Authors: Susung Hong, Junyoung Seo, Heeseong Shin, Sunghwan Hong, Seungryong Kim

Abstract: In the paradigm of AI-generated content (AIGC), there has been increasing attention to transferring knowledge from pre-trained text-to-image (T2I) models to text-to-video (T2V) generation. Despite their effectiveness, these frameworks face challenges in maintaining consistent narratives and handling shifts in scene composition or object placement from a single abstract user prompt. Exploring the ability of large language models (LLMs) to generate time-dependent, frame-by-frame prompts, this paper introduces a new framework, dubbed DirecT2V. DirecT2V leverages instruction-tuned LLMs as directors, enabling the inclusion of time-varying content and facilitating consistent video generation. To maintain temporal consistency and prevent mapping the value to a different object, we equip a diffusion model with a novel value mapping method and dual-softmax filtering, which do not require any additional training. The experimental results validate the effectiveness of our framework in producing visually coherent and storyful videos from abstract user prompts, successfully addressing the challenges of zero-shot video generation.

Submitted 6 February, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: The code and demo will be available at https://github.com/KU-CVLAB/DirecT2V

arXiv:2305.12240 [pdf, other]

Bridging Active Exploration and Uncertainty-Aware Deployment Using Probabilistic Ensemble Neural Network Dynamics

Authors: Taekyung Kim, Jungwi Mun, Junwon Seo, Beomsu Kim, Seongil Hong

Abstract: In recent years, learning-based control in robotics has gained significant attention due to its capability to address complex tasks in real-world environments. With the advances in machine learning algorithms and computational capabilities, this approach is becoming increasingly important for solving challenging control problems in robotics by learning unknown or partially known robot dynamics. Active exploration, in which a robot directs itself to states that yield the highest information gain, is essential for efficient data collection and minimizing human supervision. Similarly, uncertainty-aware deployment has been a growing concern in robotic control, as uncertain actions informed by the learned model can lead to unstable motions or failure. However, active exploration and uncertainty-aware deployment have been studied independently, and there is limited literature that seamlessly integrates them. This paper presents a unified model-based reinforcement learning framework that bridges these two tasks in the robotics control domain. Our framework uses a probabilistic ensemble neural network for dynamics learning, allowing the quantification of epistemic uncertainty via Jensen-Renyi Divergence. The two opposing tasks of exploration and deployment are optimized through state-of-the-art sampling-based MPC, resulting in efficient collection of training data and successful avoidance of uncertain state-action spaces. We conduct experiments on both autonomous vehicles and wheeled robots, showing promising results for both exploration and deployment.

Submitted 28 May, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

Comments: 2023 Robotics: Science and Systems (RSS). Project page: https://taekyung.me/rss2023-bridging

arXiv:2305.09888 [pdf, other]

doi 10.3847/1538-4357/acd852

Cluster-counterpart Voids: Void Identification from Galaxy Density Field

Authors: Junsup Shim, Changbom Park, Juhan Kim, Sungwook E. Hong

Abstract: We identify cosmic voids from galaxy density fields under the theory of void-cluster correspondence. We extend the previous novel void-identification method developed for the matter density field to the galaxy density field for practical applications. From cosmological N-body simulations, we construct galaxy number- and mass-weighted density fields to identify cosmic voids that are counterparts of galaxy clusters of specific mass. The parameters for the cluster-counterpart void identification such as Gaussian smoothing scale, density threshold, and core volume fraction are found for galaxy density fields. We achieve about $60$--$67\%$ of completeness and reliability for identifying the voids of corresponding cluster mass above $3\times10^{14}h^{-1}M_{\odot}$ from a galaxy sample with the mean number density, $\bar{n}=4.4\times10^{-3} (h^{-1}{\rm Mpc})^{-3}$. When the mean density is increased to $\bar{n}=10^{-2} (h^{-1}{\rm Mpc})^{-3}$, the detection rate is enhanced by $\sim2$--$7\%$ depending on the `mass scale' of voids. We find that the detectability is insensitive to the density weighting scheme applied to generate the density field. Our result demonstrates that we can apply this method to the galaxy redshift survey data to identify cosmic voids corresponding statistically to the galaxy clusters in a given mass range.

Submitted 16 May, 2023; originally announced May 2023.

Comments: 10 pages, 5 figures, 1 table. Accepted for publication in ApJ

arXiv:2305.06657 [pdf, other]

On Practical Robust Reinforcement Learning: Practical Uncertainty Set and Double-Agent Algorithm

Authors: Ukjo Hwang, Songnam Hong

Abstract: Robust reinforcement learning (RRL) aims at seeking a robust policy to optimize the worst case performance over an uncertainty set of Markov decision processes (MDPs). This set contains some perturbed MDPs from a nominal MDP (N-MDP) that generate samples for training, which reflects some potential mismatches between training (i.e., N-MDP) and true environments. In this paper we present an elaborated uncertainty set by excluding some implausible MDPs from the existing sets. Under this uncertainty set, we develop a sample-based RRL algorithm (named ARQ-Learning) for tabular setting and characterize its finite-time error bound. Also, it is proved that ARQ-Learning converges as fast as the standard Q-Learning and robust Q-Learning while ensuring better robustness. We introduce an additional pessimistic agent which can tackle the major bottleneck for the extension of ARQ-Learning into the cases with larger or continuous state spaces. Incorporating this idea into RL algorithms, we propose double-agent algorithms for model-free RRL. Via experiments, we demonstrate the effectiveness of the proposed algorithms.

Submitted 19 November, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

arXiv:2305.06131 [pdf, other]

Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era

Authors: Chenghao Li, Chaoning Zhang, Atish Waghwase, Lik-Hang Lee, Francois Rameau, Yang Yang, Sung-Ho Bae, Choong Seon Hong

Abstract: Generative AI (AIGC, a.k.a. AI generated content) has made significant progress in recent years, with text-guided content generation being the most practical as it facilitates interaction between human instructions and AIGC. Due to advancements in text-to-image and 3D modeling technologies (like NeRF), text-to-3D has emerged as a nascent yet highly active research field. Our work conducts the first comprehensive survey and follows up on subsequent research progress in the overall field, aiming to help readers interested in this direction quickly catch up with its rapid development. First, we introduce 3D data representations, including both Euclidean and non-Euclidean data. Building on this foundation, we introduce various foundational technologies and summarize how recent work combines these foundational technologies to achieve satisfactory text-to-3D results. Additionally, we present mainstream baselines and research directions in recent text-to-3D technology, including fidelity, efficiency, consistency, controllability, diversity, and applicability. Furthermore, we summarize the usage of text-to-3D technology in various applications, including avatar generation, texture generation, shape editing, and scene generation.

Submitted 10 June, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

arXiv:2305.03864 [pdf]

Revisiting contrast mechanism of lateral piezoresponse force microscopy

Authors: Jaegyu Kim, Seongwoo Cho, Jiwon Yeom, Seongmun Eom, Seungbum Hong

Abstract: Piezoresponse force microscopy (PFM) has been widely used for nanoscale analysis of piezoelectric properties and ferroelectric domains. Although PFM is useful because of its simple and nondestructive features, PFM measurements can be obscured by non-piezoelectric effects that could affect the PFM signals or lead to ferroelectric-like behaviors in non-ferroelectric materials. Many studies have addressed related technical issues, but they have primarily focused on vertical PFM. Here, we investigate significant discrepancies of lateral PFM signals between the trace and the retrace scans, which are proportional to the scan angle and the cantilever lateral tilting discrepancy. The discrepancies of PFM signals are analyzed based on intrinsic and extrinsic components, including out-of-plane piezoresponse, electrostatic force, and other factors. Our research will contribute to accurate PFM measurements for visualization of ferroelectric in-plane polarization distributions.

Submitted 5 May, 2023; originally announced May 2023.

Comments: 38 pages, 18 figures

arXiv:2305.02904 [pdf, other]

Quantum Enhanced Probes of Magnetic Circular Dichroism

Authors: Chengyun Hua, Claire E. Marvinney, Seongjin Hong, Matthew Feldman, Yun-Yi Pai, Michael Chilcote, Joshua Rabinowitz, Raphael C. Pooser, Alberto Marino, Benjamin J. Lawrie

Abstract: Magneto-optical microscopies, including optical measurements of magnetic circular dichroism, are increasingly ubiquitous tools for probing spin-orbit coupling, charge-carrier g-factors, and chiral excitations in matter, but the minimum detectable signal in classical magnetic circular dichroism measurements is fundamentally limited by the shot-noise limit of the optical readout field. Here, we use a two-mode squeezed light source to improve the minimum detectable signal in magnetic circular dichroism measurements by 3 dB compared with state-of-the-art classical measurements, even with relatively lossy samples like terbium gallium garnet. We also identify additional opportunities for improvement in quantum-enhanced magneto-optical microscopies, and we demonstrate the importance of these approaches for environmentally sensitive materials and for low temperature measurements where increased optical power can introduce unacceptable thermal perturbations.

Submitted 4 May, 2023; originally announced May 2023.

arXiv:2305.01943 [pdf, other]

Cosmological Parameter Constraints from the SDSS Density and Momentum Power Spectra

Authors: Stephen Appleby, Motonari Tonegawa, Changbom Park, Sungwook E. Hong, Juhan Kim, Yongmin Yoon

Abstract: We extract the galaxy density and momentum power spectra from a subset of early-type galaxies in the SDSS DR7 main galaxy catalog. Using galaxy distance information inferred from the improved fundamental plane described in \citet{Yoon_2020}, we reconstruct the peculiar velocities of the galaxies and generate number density and density-weighted velocity fields, from which we extract the galaxy density and momentum power spectra. We compare the measured values to the theoretical expectation of the same statistics, assuming an input $Λ$CDM model and using a third-order perturbative expansion. After validating our analysis pipeline with a series of mock data sets, we apply our methodology to the SDSS data and arrive at constraints $fσ_{8} = 0.471_{-0.080}^{+0.077}$ and $b_{1}σ_{8} = 0.920_{-0.070}^{+0.070}$ at a mean redshift $\bar{z} = 0.04$. Our result is consistent with the Planck cosmological best fit parameters for the $Λ$CDM model. The momentum power spectrum is found to be strongly contaminated by small scale velocity dispersion, which suppresses power by $\sim {\cal O}(30\%)$ on intermediate scales $k \sim 0.05 \, h \, {\rm Mpc}^{-1}$.

Submitted 2 October, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

Comments: 15 figures, 4 tables, accepted for ApJ

arXiv:2305.00278 [pdf, other]

Segment Anything Model (SAM) Meets Glass: Mirror and Transparent Objects Cannot Be Easily Detected

Authors: Dongsheng Han, Chaoning Zhang, Yu Qiao, Maryam Qamar, Yuna Jung, SeungKyu Lee, Sung-Ho Bae, Choong Seon Hong

Abstract: Meta AI Research has recently released SAM (Segment Anything Model), which is trained on a large segmentation dataset of over 1 billion masks. As a foundation model in the field of computer vision, SAM has gained attention for its impressive performance in generic object segmentation. Despite its strong capability in a wide range of zero-shot transfer tasks, it remains unknown whether SAM can detect things in challenging setups like transparent objects. In this work, we perform an empirical evaluation of two glass-related challenging scenarios: mirror and transparent objects. We found that SAM often fails to detect the glass in both scenarios, which raises concerns about deploying SAM in safety-critical situations that involve various forms of glass.

Submitted 29 April, 2023; originally announced May 2023.

arXiv:2305.00206 [pdf, other]

doi 10.3847/1538-4357/acd185

Tomographic Alcock-Paczynski Test with Redshift-Space Correlation Function: Evidence for the Dark Energy Equation of State Parameter w>-1

Authors: Fuyu Dong, Changbom Park, Sungwook E. Hong, Juhan Kim, Ho Seong Hwang, Hyunbae Park, Stephen Appleby

Abstract: The apparent shape of galaxy clustering depends on the adopted cosmology used to convert observed redshift to comoving distance, the $r(z)$ relation, as it changes the line elements along and across the line of sight differently. The Alcock-Paczyński (AP) test exploits this property to constrain the expansion history of the universe. We present an extensive review of past studies on the AP test. We adopt an extended AP test method introduced by Park et al. (2019), which uses the full shape of the redshift-space two-point correlation function (CF) as the standard shape, and apply it to the SDSS DR7, BOSS, and eBOSS LRG samples covering the redshift range up to $z=0.8$. We calibrate the test against the nonlinear cosmology-dependent systematic evolution of the CF shape using the Multiverse simulations. We focus on examining whether or not the flat $Λ$CDM `concordance' model is consistent with observation. We constrain the flat $w$CDM model to have $w=-0.892_{-0.050}^{+0.045}$ and $Ω_m=0.282_{-0.023}^{+0.024}$ from our AP test alone, which is significantly tighter than the constraints from the BAO or SNe I$a$ methods by a factor of 3 - 6. When the AP test result is combined with the recent BAO and SNe I$a$ results, we obtain $w=-0.903_{-0.023}^{+0.023}$ and $Ω_m=0.285_{-0.009}^{+0.014}$. This is in strong tension with the flat $Λ$CDM model with $w=-1$, at the $4.2σ$ level. Consistency with $w=-1$ is obtained only when the Planck CMB observation is combined. It remains to be seen whether this tension between observations of the galaxy distribution at low redshifts and the CMB anisotropy at the decoupling epoch becomes greater in future studies and leads us to a new paradigm of cosmology.

Submitted 29 April, 2023; originally announced May 2023.

Comments: 21 pages, 11 figures, accepted by ApJ

arXiv:2304.14838 [pdf, other]

Vision-based Target Pose Estimation with Multiple Markers for the Perching of UAVs

Authors: Truong-Dong Do, Nguyen Xuan-Mung, Sung-Kyung Hong

Abstract: Autonomous Nano Aerial Vehicles have become increasingly popular in surveillance and monitoring operations due to their efficiency and maneuverability. Once a target location has been reached, drones do not have to remain active during the mission. In such situations the vehicle can perch and stop its motors to conserve energy, as well as maintain a static position in unfavorable flying conditions. In the perching target estimation phase, the stability and accuracy of a visual camera with markers is a significant challenge. A large marker can be detected quickly from afar, but as the drone approaches, the marker soon falls outside the camera view. In this paper, a vision-based target pose estimation method using multiple markers is proposed to deal with the above-mentioned problems. First, a perching target with a small marker inside a larger one is designed to improve detection capability at wide and close ranges. Second, the relative poses of the flying vehicle are calculated from detected markers using a monocular camera. Next, a Kalman filter is applied to provide a more stable and reliable pose estimation, especially when the measurement data is missing due to unexpected reasons. Finally, we introduce an algorithm for merging the pose data from multiple markers. The poses are then sent to the position controller to align the drone with the marker's center and steer it to perch on the target. The experimental results demonstrated the effectiveness and feasibility of the adopted approach. The drone can perch successfully onto the center of the markers with the attached 25mm-diameter rounded magnet.

Submitted 25 April, 2023; originally announced April 2023.

Comments: 5 pages, 6 figures, 2 tables
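
The abstract above mentions a Kalman filter that keeps the pose estimate stable when marker measurements are missing. A minimal sketch of that idea follows, assuming a one-dimensional constant-velocity model; the function name `kalman_track`, the noise values, and the synthetic trajectory are illustrative assumptions, not the authors' implementation. When a frame has no detection, the filter runs only its prediction step.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-4, r=1e-2):
    """Constant-velocity Kalman filter over one pose coordinate.

    A measurement of None stands for a frame in which no marker was detected
    (hypothetical convention for this sketch).
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])  # transition for state [position, velocity]
    H = np.array([[1.0, 0.0]])             # only the position is observed
    Q = q * np.eye(2)                      # assumed process noise
    R = np.array([[r]])                    # assumed measurement noise
    x = np.zeros((2, 1))                   # initial state guess
    P = 10.0 * np.eye(2)                   # large initial uncertainty
    estimates = []
    for z in measurements:
        # Prediction step: always executed.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: skipped when the measurement is missing.
        if z is not None:
            y = np.array([[z]]) - H @ x
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ y
            P = (np.eye(2) - K @ H) @ P
        estimates.append(float(x[0, 0]))
    return estimates

# Synthetic target moving at 0.5 units per frame, with frames 10-12 undetected.
true_positions = [0.5 * t for t in range(1, 31)]
observed = [None if 10 <= t <= 12 else p
            for t, p in zip(range(1, 31), true_positions)]
estimates = kalman_track(observed)
```

During the three undetected frames the estimate is carried forward by the velocity learned from earlier frames, which is the behaviour the abstract relies on for stable pose output.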

arXiv:2304.13878 [pdf, other]

doi 10.1126/science.adh9932

Stable Quantum-Correlated Many Body States through Engineered Dissipation

Authors: X. Mi, A. A. Michailidis, S. Shabani, K. C. Miao, P. V. Klimov, J. Lloyd, E. Rosenberg, R. Acharya, I. Aleiner, T. I. Andersen, M. Ansmann, F. Arute, K. Arya, A. Asfaw, J. Atalaya, J. C. Bardin, A. Bengtsson, G. Bortoli, A. Bourassa, J. Bovaird, L. Brill, M. Broughton, B. B. Buckley, D. A. Buell, T. Burger, et al. (142 additional authors not shown)

Abstract: Engineered dissipative reservoirs have the potential to steer many-body quantum systems toward correlated steady states useful for quantum simulation of high-temperature superconductivity or quantum magnetism. Using up to 49 superconducting qubits, we prepared low-energy states of the transverse-field Ising model through coupling to dissipative auxiliary qubits. In one dimension, we observed long-range quantum correlations and a ground-state fidelity of 0.86 for 18 qubits at the critical point. In two dimensions, we found mutual information that extends beyond nearest neighbors. Lastly, by coupling the system to auxiliaries emulating reservoirs with different chemical potentials, we explored transport in the quantum Heisenberg model. Our results establish engineered dissipation as a scalable alternative to unitary evolution for preparing entangled many-body states on noisy quantum processors.

Submitted 5 April, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

Journal ref: Science 383, 1332-1337 (2024)

arXiv:2304.11119 [pdf, other]

Phase transition in Random Circuit Sampling

Authors: A. Morvan, B. Villalonga, X. Mi, S. Mandrà, A. Bengtsson, P. V. Klimov, Z. Chen, S. Hong, C. Erickson, I. K. Drozdov, J. Chau, G. Laun, R. Movassagh, A. Asfaw, L. T. A. N. Brandão, R. Peralta, D. Abanin, R. Acharya, R. Allen, T. I. Andersen, K. Anderson, M. Ansmann, F. Arute, K. Arya, J. Atalaya, et al. (160 additional authors not shown)

Abstract: Undesired coupling to the surrounding environment destroys long-range correlations on quantum processors and hinders the coherent evolution in the nominally available computational space. This incoherent noise is an outstanding challenge to fully leverage the computation power of near-term quantum processors. It has been shown that benchmarking Random Circuit Sampling (RCS) with Cross-Entropy Benchmarking (XEB) can provide a reliable estimate of the effective size of the Hilbert space coherently available. The extent to which the presence of noise can trivialize the outputs of a given quantum algorithm, i.e. making it spoofable by a classical computation, is an unanswered question. Here, by implementing an RCS algorithm we demonstrate experimentally that there are two phase transitions observable with XEB, which we explain theoretically with a statistical model. The first is a dynamical transition as a function of the number of cycles and is the continuation of the anti-concentration point in the noiseless case. The second is a quantum phase transition controlled by the error per cycle; to identify it analytically and experimentally, we create a weak link model which allows varying the strength of noise versus coherent evolution. Furthermore, by presenting an RCS experiment with 67 qubits at 32 cycles, we demonstrate that the computational cost of our experiment is beyond the capabilities of existing classical supercomputers, even when accounting for the inevitable presence of noise. Our experimental and theoretical work establishes the existence of transitions to a stable computationally complex phase that is reachable with current quantum processors.

Submitted 21 December, 2023; v1 submitted 21 April, 2023; originally announced April 2023.
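
For readers unfamiliar with XEB, the commonly used linear cross-entropy estimator can be sketched as follows. This is a toy illustration under stated assumptions, not the experiment's analysis code: the ideal output distribution is replaced by a Porter-Thomas-like stand-in (normalized exponential weights), and the function name `linear_xeb` and all sizes are hypothetical.

```python
import numpy as np

def linear_xeb(ideal_probs, samples):
    """Linear XEB estimate: D * mean(p_ideal(x_i)) - 1, where D = len(ideal_probs)."""
    ideal_probs = np.asarray(ideal_probs, dtype=float)
    dim = ideal_probs.size
    return dim * ideal_probs[np.asarray(samples)].mean() - 1.0

rng = np.random.default_rng(0)
n_qubits = 12                      # small toy size, far below the experiment's scale
dim = 2 ** n_qubits

# Assumed stand-in for the ideal bitstring probabilities of a random circuit.
weights = rng.exponential(size=dim)
ideal_probs = weights / weights.sum()

n_samples = 20000
# Samples from a perfect device follow the ideal distribution ...
noiseless = rng.choice(dim, size=n_samples, p=ideal_probs)
# ... while a fully depolarized device returns uniformly random bitstrings.
fully_noisy = rng.integers(0, dim, size=n_samples)

f_noiseless = linear_xeb(ideal_probs, noiseless)   # close to 1
f_noisy = linear_xeb(ideal_probs, fully_noisy)     # close to 0
```

The estimator is near 1 when samples follow the ideal distribution and near 0 when they are uniformly random, which is why it serves as a proxy for circuit fidelity.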

arXiv:2304.10477 [pdf, other]

Location Privacy Protection Game against Adversary through Multi-user Cooperative Obfuscation

Authors: Shu Hong, Lingjie Duan

Abstract: In location-based services (LBSs), it is promising for users to crowdsource and share their Point-of-Interest (PoI) information with each other in a common cache to reduce query frequency and preserve location privacy. Yet most studies on multi-user privacy preservation overlook the opportunity of leveraging their service flexibility. This paper is the first to study multiple users' strategic cooperation against an adversary's optimal inference attack, by leveraging mutual service flexibility. We formulate the multi-user privacy cooperation against the adversary as a max-min adversarial game and solve it in a linear program. Unlike the vast literature, even if a user finds the cached information useful, we prove it beneficial to still query the platform to further confuse the adversary. As the linear program's computational complexity still increases superlinearly with the number of users' possible locations, we propose a binary obfuscation scheme in two opposite spatial directions to achieve guaranteed performance with only constant complexity. Perhaps surprisingly, a user with a greater service flexibility should query with a less obfuscated location to add confusion. Finally, we provide guidance on the optimal query sequence among LBS users. Simulation results show that our crowdsourced privacy protection scheme greatly improves users' privacy as compared with existing approaches.

Submitted 17 February, 2023; originally announced April 2023.

Comments: Online technical report for a forthcoming paper in IEEE Transactions on Mobile Computing (TMC)

arXiv:2304.09522 [pdf, other]

doi 10.1364/OE.489688

Optomechanically induced optical trapping system based on photonic crystal cavities

Authors: Manuel Monterrosas-Romero, Seyed K. Alavi, Ester M. Koistinen, Sungkun Hong

Abstract: Optical trapping has proven to be a valuable experimental technique for precisely controlling small dielectric objects. However, due to their very nature, conventional optical traps are diffraction limited and require high intensities to confine the dielectric objects. In this work, we propose a novel optical trap based on dielectric photonic crystal nanobeam cavities, which overcomes the limitations of conventional optical traps by significant factors. This is achieved by exploiting an optomechanically induced backaction mechanism between a dielectric nanoparticle and the cavities. We perform numerical simulations to show that our trap can fully levitate a submicron-scale dielectric particle with a trap width as narrow as 56 nm. It achieves a high trap stiffness, and therefore a high Q-frequency product for the particle's motion, while reducing the optical absorption by a factor of 43 compared with conventional optical tweezers. Moreover, we show that multiple laser tones can further be used to create a complex, dynamic potential landscape with feature sizes well below the diffraction limit. The presented optical trapping system offers new opportunities for precision sensing and fundamental quantum experiments based on levitated particles.

Submitted 19 April, 2023; originally announced April 2023.

arXiv:2304.08192 [pdf, other]

The Universe is worth $64^3$ pixels: Convolution Neural Network and Vision Transformers for Cosmology

Authors: Se Yeon Hwang, Cristiano G. Sabiu, Inkyu Park, Sungwook E. Hong

Abstract: We present a novel approach for estimating cosmological parameters, $Ω_m$, $σ_8$, $w_0$, and one derived parameter, $S_8$, from 3D lightcone data of dark matter halos in redshift space covering a sky area of $40^\circ \times 40^\circ$ and redshift range of $0.3 < z < 0.8$, binned to $64^3$ voxels. Using two deep learning algorithms, Convolutional Neural Network (CNN) and Vision Transformer (ViT), we compare their performance with the standard two-point correlation function (2pcf). Our results indicate that CNN yields the best performance, while ViT also demonstrates significant potential in predicting cosmological parameters. By combining the outcomes of ViT, CNN, and 2pcf, we achieved a substantial reduction in error compared to the 2pcf alone. To better understand the inner workings of the machine learning algorithms, we employed the Grad-CAM method to investigate the sources of essential information in activation maps of the CNN and ViT. Our findings suggest that the algorithms focus on different parts of the density field and redshift depending on which parameter they are predicting. This proof-of-concept work paves the way for incorporating deep learning methods to estimate cosmological parameters from large-scale structures, potentially leading to tighter constraints and improved understanding of the Universe.

Submitted 2 November, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: 23 pages, 10 figures

arXiv:2304.06954 [pdf]

doi 10.1021/acs.nanolett.2c04425

Electrical transport properties driven by unique bonding configuration in gamma-GeSe

Authors: Jeongsu Jang, Joonho Kim, Dongchul Sung, Jong Hyuk Kim, Joong-Eon Jung, Sol Lee, Jinsub Park, Chaewoon Lee, Heesun Bae, Seongil Im, Kibog Park, Young Jai Choi, Suklyun Hong, Kwanpyo Kim

Abstract: Group-IV monochalcogenides have recently shown great potential for their thermoelectric, ferroelectric, and other intriguing properties. The electrical properties of group-IV monochalcogenides exhibit a strong dependence on the chalcogen type. For example, GeTe exhibits high doping concentration, whereas S/Se-based chalcogenides are semiconductors with sizable bandgaps. Here, we investigate the electrical and thermoelectric properties of gamma-GeSe, a recently identified polymorph of GeSe. gamma-GeSe exhibits high electrical conductivity (~10^6 S/m) and a relatively low Seebeck coefficient (9.4 µV/K at room temperature) owing to its high p-doping level (5 × 10^21 cm^-3), which is in stark contrast to other known GeSe polymorphs. Elemental analysis and first-principles calculations confirm that the abundant formation of Ge vacancies leads to the high p-doping concentration. The magnetoresistance measurements also reveal weak-antilocalization because of spin-orbit coupling in the crystal. Our results demonstrate that gamma-GeSe is a unique polymorph in which the modified local bonding configuration leads to substantially different physical properties.

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2304.06488 [pdf, other]

One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era

Authors: Chaoning Zhang, Chenshuang Zhang, Chenghao Li, Yu Qiao, Sheng Zheng, Sumit Kumar Dam, Mengchun Zhang, Jung Uk Kim, Seong Tae Kim, Jinwoo Choi, Gyeong-Moon Park, Sung-Ho Bae, Lik-Hang Lee, Pan Hui, In So Kweon, Choong Seon Hong

Abstract: OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT from various aspects. According to Google Scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI.

Submitted 4 April, 2023; originally announced April 2023.

Comments: A Survey on ChatGPT and GPT-4, 29 pages. Feedback is appreciated (chaoningzhang1990@gmail.com)

arXiv:2304.04609 [pdf]

Inverse design of artificial skins

Authors: Zhiguang Liu, Minkun Cai, Shenda Hong, Junli Shi, Sai Xie, Chang Liu, Huifeng Du, James D. Morin, Gang Li, Wang Liu, Hong Wang, Ke Tang, Nicholas X. Fang, Chuan Fei Guo

Abstract: Mimicking the perceptual functions of human cutaneous mechanoreceptors, artificial skins or flexible pressure sensors can transduce tactile stimuli to quantitative electrical signals. Conventional methods to design such devices follow a forward structure-to-property routine based on trial-and-error experiments/simulations, which take months or longer to determine one solution valid for one specific material. Target-oriented inverse design that shows far higher output efficiency has proven effective in other fields, but is still absent for artificial skins because of the difficulties in acquiring big data. Here, we report a property-to-structure inverse design of artificial skins based on small dataset machine learning, exhibiting a comprehensive efficiency at least four orders of magnitude higher than the conventional routine. The inverse routine can predict, within hours, hundreds of solutions that overcome the intrinsic signal saturation problem for linear response, and the solutions are valid for a variety of materials. Our results demonstrate that the inverse design enabled by a small dataset is an efficient and powerful tool to target multifarious applications of artificial skins, which can potentially advance the fields of intelligent robots, advanced healthcare, and human-machine interfaces.

Submitted 10 April, 2023; originally announced April 2023.

arXiv:2304.04555 [pdf, other]

Neural Diffeomorphic Non-uniform B-spline Flows

Authors: Seongmin Hong, Se Young Chun

Abstract: Normalizing flows have successfully modeled complex probability distributions as invertible transformations of a simple base distribution. However, there are often applications that require more than invertibility. For instance, the computation of energies and forces in physics requires the second derivatives of the transformation to be well-defined and continuous. Smooth normalizing flows employ infinitely differentiable transformations, but at the price of slow non-analytic inverse transforms. In this work, we propose diffeomorphic non-uniform B-spline flows that are at least twice continuously differentiable while bi-Lipschitz continuous, enabling efficient parametrization while retaining analytic inverse transforms based on a sufficient condition for diffeomorphism. Firstly, we investigate the sufficient condition for $C^{k-2}$-diffeomorphic non-uniform $k$th-order B-spline transformations. Then, we derive an analytic inverse transformation of the non-uniform cubic B-spline transformation for neural diffeomorphic non-uniform B-spline flows. Lastly, we performed experiments on solving the force matching problem in Boltzmann generators, demonstrating that our $C^2$-diffeomorphic non-uniform B-spline flows yielded solutions better than previous spline flows and faster than smooth normalizing flows. Our source code is publicly available at https://github.com/smhongok/Non-uniform-B-spline-Flow.

Submitted 11 April, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

Comments: Accepted to AAAI 2023

arXiv:2304.03132 [pdf]

Do All Asians Look the Same?: A Comparative Analysis of the East Asian Facial Color Desires using Instagram

Authors: Jaeyoun You, Sojeong Park, Seok-Kyeong Hong, Bongwon Suh

Abstract: Selfies represent people's desires, and social media platforms like Instagram have been flooded with them. This study uses selfie data to examine how people's desires for ideal facial representations vary by region, particularly in East Asia. Through the analysis, we aim to refute the assumption that "all Asians prefer identical visuals," which is a subset of the prevalent Western belief that "all Asians look the same." Our findings, reinforced by postcolonial interpretations, dispute those assumptions. We propose a strategy for resolving the mismatch between real-world desires and the Western beauty market's views. We expect that the disparity between hegemonic color schemes and the augmented skin colors shown by our results may facilitate the study of color and Asian identity.

Submitted 6 April, 2023; originally announced April 2023.

arXiv:2304.01950 [pdf, other]

MP-FedCL: Multiprototype Federated Contrastive Learning for Edge Intelligence

Authors: Yu Qiao, Md. Shirajum Munir, Apurba Adhikary, Huy Q. Le, Avi Deb Raha, Chaoning Zhang, Choong Seon Hong

Abstract: Federated learning-assisted edge intelligence enables privacy protection in modern intelligent services. However, non-independent and identically distributed (non-IID) data among edge clients can impair the local model performance. The existing single prototype-based strategy represents a class by using the mean of the feature space. However, feature spaces are usually not clustered, and a single prototype may not represent a class well. Motivated by this, this paper proposes a multi-prototype federated contrastive learning approach (MP-FedCL) which demonstrates the effectiveness of using a multi-prototype strategy over a single-prototype under non-IID settings, including both label and feature skewness. Specifically, a multi-prototype computation strategy based on k-means is first proposed to capture different embedding representations for each class space, using multiple prototypes ($k$ centroids) to represent a class in the embedding space. In each global round, the computed multiple prototypes and their respective model parameters are sent to the edge server for aggregation into a global prototype pool, which is then sent back to all clients to guide their local training. Finally, local training for each client minimizes their own supervised learning tasks and learns from shared prototypes in the global prototype pool through supervised contrastive learning, which encourages them to learn knowledge related to their own class from others and reduces the absorption of unrelated knowledge in each global iteration. Experimental results on MNIST, Digit-5, Office-10, and DomainNet show that our method outperforms multiple baselines, with an average test accuracy improvement of about 4.6% and 10.4% under feature and label non-IID distributions, respectively.

Submitted 11 October, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

Comments: Accepted by IEEE Internet of Things
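
The multi-prototype idea in the abstract above (representing each class by $k$ k-means centroids instead of a single mean) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper names `kmeans` and `class_prototypes`, the deterministic farthest-point initialisation, and the synthetic 2-D embeddings; the paper's actual pipeline, including server-side aggregation and contrastive training, is not reproduced.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        for j in range(k):
            members = points[assignment == j]
            if len(members) > 0:           # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def class_prototypes(embeddings, labels, k):
    """Return {class_id: (k, dim) array of prototypes} for one client's embeddings."""
    return {int(c): kmeans(embeddings[labels == c], k) for c in np.unique(labels)}

rng = np.random.default_rng(0)

def blob(center):
    return rng.normal(loc=center, scale=0.1, size=(50, 2))

# Each class occupies two separate regions of a toy embedding space.
embeddings = np.vstack([blob((0, 0)), blob((10, 10)), blob((0, 10)), blob((10, 0))])
labels = np.array([0] * 100 + [1] * 100)
prototypes = class_prototypes(embeddings, labels, k=2)
```

In this toy data the single per-class means of both classes fall near (5, 5) and are therefore indistinguishable, whereas the two prototypes per class recover the separate regions each class actually occupies.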

arXiv:2303.17719 [pdf, other]

Why is the winner the best?

Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Sharib Ali, Vincent Andrearczyk, Marc Aubreville, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano, Jorge Bernal, Sebastian Bodenstedt, Alessandro Casella, Veronika Cheplygina, Marie Daum, Marleen de Bruijne, Adrien Depeursinge, Reuben Dorent, Jan Egger, David G. Ellis, Sandy Engelhardt, Melanie Ganz, et al. (100 additional authors not shown)

Abstract: International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multi-center study with all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. Statistical analyses performed based on comprehensive descriptions of the submitted algorithms linked to their rank as well as the underlying participation strategies revealed common characteristics of winning solutions. These typically include the use of multi-task learning (63%) and/or multi-stage pipelines (61%), and a focus on augmentation (100%), image preprocessing (97%), data curation (79%), and postprocessing (66%). The "typical" lead of a winning team is a computer scientist with a doctoral degree, five years of experience in biomedical image analysis, and four years of experience in deep learning. Two core general development strategies stood out for highly-ranked teams: the reflection of the metrics in the method design and the focus on analyzing and handling failure cases. According to the organizers, 43% of the winning algorithms exceeded the state of the art but only 11% completely solved the respective domain problem. The insights of our study could help researchers (1) improve algorithm development strategies when approaching new problems, and (2) focus on open research questions revealed by this work.

Submitted 30 March, 2023; originally announced March 2023.

Comments: accepted to CVPR 2023

arXiv:2303.17565 [pdf, other]

doi 10.1103/PhysRevResearch.5.043202

Context Aware Fidelity Estimation

Authors: Dripto M. Debroy, Elie Genois, Jonathan A. Gross, Wojciech Mruczkiewicz, Kenny Lee, Sabrina Hong, Zijun Chen, Vadim Smelyanskiy, Zhang Jiang

Abstract: We present Context Aware Fidelity Estimation (CAFE), a framework for benchmarking quantum operations that offers several practical advantages over existing methods such as Randomized Benchmarking (RB) and Cross-Entropy Benchmarking (XEB). In CAFE, a gate or a subcircuit from some target experiment is repeated n times before being measured. By using a subcircuit, we account for effects from spatial and temporal circuit context. Since coherent errors accumulate quadratically while incoherent errors grow linearly, we can separate them by fitting the measured fidelity as a function of n. One can additionally interleave the subcircuit with dynamical decoupling sequences to remove certain coherent error sources from the characterization when desired. We have used CAFE to experimentally validate our single- and two-qubit unitary characterizations by measuring fidelity against estimated unitaries. In numerical simulations, we find CAFE produces fidelity estimates at least as accurate as Interleaved RB while using significantly fewer resources. We also introduce a compact formulation for preparing an arbitrary two-qubit state with a single entangling operation, and use it to present a concrete example using CAFE to study CZ gates in parallel on a Sycamore processor.

Submitted 30 March, 2023; originally announced March 2023.

arXiv:2303.15413 [pdf, other]

Debiasing Scores and Prompts of 2D Diffusion for View-consistent Text-to-3D Generation

Authors: Susung Hong, Donghoon Ahn, Seungryong Kim

Abstract: Existing score-distilling text-to-3D generation techniques, despite their considerable promise, often encounter the view inconsistency problem. One of the most notable issues is the Janus problem, where the most canonical view of an object (e.g., face or head) appears in other views. In this work, we explore existing frameworks for score-distilling text-to-3D generation and identify the main causes of the view inconsistency problem: the embedded bias of 2D diffusion models. Based on these findings, we propose two approaches to debias the score-distillation frameworks for view-consistent text-to-3D generation. Our first approach, called score debiasing, involves cutting off the score estimated by 2D diffusion models and gradually increasing the truncation value throughout the optimization process. Our second approach, called prompt debiasing, identifies conflicting words between user prompts and view prompts using a language model, and adjusts the discrepancy between view prompts and the viewing direction of an object. Our experimental results show that our methods improve the realism of the generated 3D objects by significantly reducing artifacts and achieve a good trade-off between faithfulness to the 2D diffusion models and 3D consistency with little overhead. Our project page is available at https://susunghong.github.io/Debiased-Score-Distillation-Sampling/.

Submitted 19 December, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: Accepted to NeurIPS 2023. Project Page: https://susunghong.github.io/Debiased-Score-Distillation-Sampling/

arXiv:2303.15314 [pdf]

Development of a thorium coating on an aluminium substrate by using electrodeposition method and alpha spectroscopy

Authors: Dal-Ho Moon, Vivek Chavan, Vasant Bhoraskar, Yeong Hoon Jeong, Jung Ho Park, Su-Jeong Suh, Seung-Woo Hong

Abstract: A thin coating of thorium on aluminium substrates with the areal density of 110 to 130 μg/cm² is developed over a circular area of 22 mm diameter by using the electrodeposition method. An electrodeposition system is fabricated to consist of three components: an anode made of a platinum mesh, a cylindrical-shape vessel to contain the thorium solution, and a cathode in the form of a circular aluminium plate. The aluminium plate is mounted horizontally, and the platinum mesh is connected to an axial rod of an electric motor, mounted vertically and normal to the plane of the aluminium. The electrolyte solution is prepared by dissolving a known-weight thorium nitrate powder in 0.8 M HNO3 and isopropanol. The system is operated either in constant voltage (CV) or constant current (CC) mode. Under the electric field between the anode and cathode, thorium ions were deposited on the aluminium substrate mounted on the cathode. In the CV mode at 320, 360, and 400 V and in the CC mode at 15 mA, thorium films were formed over a circular area of the aluminium substrate. The areal density of thorium coating was measured by detecting emitted alpha particles. The areal density of thorium varied from 80 to 130 μg/cm² by changing the deposition time from 10 to 60 min. The results from the CV mode and CC mode are compared, and the radial dependence in the measured areal density is discussed for different modes of the electric field.
The developed thorium coatings are to be used in the in-house development of particle detectors, fast neutron converters, targets for thorium fission experiments, and other purposes.

Submitted 11 March, 2023; originally announced March 2023.

Comments: 11 pages, 5 figures, 1 table

arXiv:2303.14969 [pdf, other]

Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

Authors: Donggyun Kim, Jinwoo Kim, Seongwoong Cho, Chong Luo, Seunghoon Hong

Abstract: Dense prediction tasks are a fundamental class of problems in computer vision. As supervised methods suffer from high pixel-wise labeling cost, a few-shot learning solution that can learn any dense task from a few labeled images is desired. Yet, current few-shot learning methods target a restricted set of tasks such as semantic segmentation, presumably due to challenges in designing a general and unified model that is able to flexibly and efficiently adapt to arbitrary tasks of unseen semantics. We propose Visual Token Matching (VTM), a universal few-shot learner for arbitrary dense prediction tasks. It employs non-parametric matching on patch-level embedded tokens of images and labels that encapsulates all tasks. Also, VTM flexibly adapts to any task with a tiny amount of task-specific parameters that modulate the matching algorithm. We implement VTM as a powerful hierarchical encoder-decoder architecture involving ViT backbones where token matching is performed at multiple feature hierarchies. We experiment with VTM on a challenging variant of the Taskonomy dataset and observe that it robustly few-shot learns various unseen dense prediction tasks. Surprisingly, it is competitive with fully supervised baselines using only 10 labeled examples of novel tasks (0.004% of full supervision) and sometimes outperforms using 0.1% of full supervision. Codes are available at https://github.com/GitGyun/visual_token_matching.

Submitted 27 March, 2023; originally announced March 2023.

Showing 151–200 of 1,164 results for author: Hong, S