Confabulation: The Surprising Value of Large Language Model Hallucinations

Authors: Peiqi Sui, Eamon Duede, Sophie Wu, Richard Jean So

Abstract: This paper presents a systematic defense of large language model (LLM) hallucinations or 'confabulations' as a potential resource instead of a categorically negative pitfall. The standard view is that confabulations are inherently problematic and AI research should eliminate this flaw. In this paper, we argue and empirically demonstrate that measurable semantic characteristics of LLM confabulations mirror a human propensity to utilize increased narrativity as a cognitive resource for sense-making and communication. In other words, it has potential value. Specifically, we analyze popular hallucination benchmarks and reveal that hallucinated outputs display increased levels of narrativity and semantic coherence relative to veridical outputs. This finding reveals a tension in our usually dismissive understandings of confabulation. It suggests, counter-intuitively, that the tendency for LLMs to confabulate may be intimately associated with a positive capacity for coherent narrative-text generation.

Submitted 25 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: Forthcoming at ACL2024 main conference. 1 figure

arXiv:2406.02223 [pdf, other]

doi 10.1109/ICASSP49357.2023.10097143

SMCL: Saliency Masked Contrastive Learning for Long-tailed Recognition

Authors: Sanglee Park, Seung-won Hwang, Jungmin So

Abstract: Real-world data often follow a long-tailed distribution with a high imbalance in the number of samples between classes. The problem with training from imbalanced data is that some background features, common to all classes, can be unobserved in classes with scarce samples. As a result, this background correlates to biased predictions into "major" classes. In this paper, we propose saliency masked contrastive learning, a new method that uses saliency masking and contrastive learning to mitigate the problem and improve the generalizability of a model. Our key idea is to mask the important part of an image using saliency detection and use contrastive learning to move the masked image towards minor classes in the feature space, so that background features present in the masked image are no longer correlated with the original class. Experiment results show that our method achieves state-of-the-art level performance on benchmark long-tailed datasets.

Submitted 4 June, 2024; originally announced June 2024.

Comments: accepted at ICASSP 2023

arXiv:2406.01801 [pdf, other]

Fearless Stochasticity in Expectation Propagation

Authors: Jonathan So, Richard E. Turner

Abstract: Expectation propagation (EP) is a family of algorithms for performing approximate inference in probabilistic models. The updates of EP involve the evaluation of moments -- expectations of certain functions -- which can be estimated from Monte Carlo (MC) samples. However, the updates are not robust to MC noise when performed naively, and various prior works have attempted to address this issue in different ways. In this work, we provide a novel perspective on the moment-matching updates of EP; namely, that they perform natural-gradient-based optimisation of a variational objective. We use this insight to motivate two new EP variants, with updates that are particularly well-suited to MC estimation; they remain stable and are most sample-efficient when estimated with just a single sample. These new variants combine the benefits of their predecessors and address key weaknesses. In particular, they are easier to tune, offer an improved speed-accuracy trade-off, and do not rely on the use of debiasing estimators. We demonstrate their efficacy on a variety of probabilistic inference tasks.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.16088 [pdf, ps, other]

Estimating the normal-inverse-Wishart distribution

Authors: Jonathan So

Abstract: The normal-inverse-Wishart (NIW) distribution is commonly used as a prior distribution for the mean and covariance parameters of a multivariate normal distribution. The family of NIW distributions is also a minimal exponential family. In this short note we describe a convergent procedure for converting from mean parameters to natural parameters in the NIW family, or -- equivalently -- for performing maximum likelihood estimation of the natural parameters given observed sufficient statistics. This is needed, for example, when using a NIW base family in expectation propagation.

Submitted 3 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

arXiv:2403.17428 [pdf, other]

Aligning Large Language Models for Enhancing Psychiatric Interviews through Symptom Delineation and Summarization

Authors: Jae-hee So, Joonhwan Chang, Eunji Kim, Junho Na, JiYeon Choi, Jy-yong Sohn, Byung-Hoon Kim, Sang Hui Chu

Abstract: Recent advancements in Large Language Models (LLMs) have accelerated their usage in various domains. Given that psychiatric interviews are goal-oriented and structured dialogues between the professional interviewer and the interviewee, they are one of the most underexplored areas where LLMs can contribute substantial value. Here, we explore the use of LLMs for enhancing psychiatric interviews, by analyzing counseling data from North Korean defectors with traumatic events and mental health issues. Specifically, we investigate whether LLMs can (1) delineate the part of the conversation that suggests psychiatric symptoms and name the symptoms, and (2) summarize stressors and symptoms, based on the interview dialogue transcript. Here, the transcript data was labeled by mental health experts for training and evaluation of LLMs. Our experimental results show that appropriately prompted LLMs can achieve high performance on both the symptom delineation task and the summarization task. This research contributes to the nascent field of applying LLMs to psychiatric interviews and demonstrates their potential effectiveness in aiding mental health practitioners.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.00299 [pdf, ps, other]

Universal Auto-encoder Framework for MIMO CSI Feedback

Authors: Jinhyun So, Hyukjoon Kwon

Abstract: Existing auto-encoder (AE)-based channel state information (CSI) frameworks have focused on a specific configuration of user equipment (UE) and base station (BS), and thus the input and output sizes of the AE are fixed. However, in the real-world scenario, the input and output sizes may vary depending on the number of antennas of the BS and UE and the allocated resource block in the frequency dimension. A naive approach to support the different input and output sizes is to use multiple AE models, which is impractical for the UE due to the limited HW resources. In this paper, we propose a universal AE framework that can support different input sizes and multiple compression ratios. The proposed AE framework significantly reduces the HW complexity while providing comparable performance in terms of compression ratio-distortion trade-off compared to the naive and state-of-the-art approaches.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 7 pages, 11 figures

arXiv:2401.00025 [pdf, other]

Any-point Trajectory Modeling for Policy Learning

Authors: Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel

Abstract: Learning from demonstration is a powerful method for teaching robots new skills, and having more demonstration data often improves policy learning. However, the high cost of collecting demonstration data is a significant bottleneck. Videos, as a rich data source, contain knowledge of behaviors, physics, and semantics, but extracting control-specific information from them is challenging due to the lack of action labels. In this work, we introduce a novel framework, Any-point Trajectory Modeling (ATM), that utilizes video demonstrations by pre-training a trajectory model to predict future trajectories of arbitrary points within a video frame. Once trained, these trajectories provide detailed control guidance, enabling the learning of robust visuomotor policies with minimal action-labeled data. Across over 130 language-conditioned tasks we evaluated in both simulation and the real world, ATM outperforms strong video pre-training baselines by 80% on average. Furthermore, we show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology. Visualizations and code are available at: https://xingyu-lin.github.io/atm.

Submitted 12 July, 2024; v1 submitted 28 December, 2023; originally announced January 2024.

Comments: 18 pages, 15 figures

arXiv:2312.03517 [pdf, other]

FRDiff : Feature Reuse for Universal Training-free Acceleration of Diffusion Models

Authors: Junhyuk So, Jungwon Lee, Eunhyeok Park

Abstract: The substantial computational costs of diffusion models, especially due to the repeated denoising steps necessary for high-quality image generation, present a major obstacle to their widespread adoption. While several studies have attempted to address this issue by reducing the number of score function evaluations (NFE) using advanced ODE solvers without fine-tuning, the decreased number of denoising iterations misses the opportunity to update fine details, resulting in noticeable quality degradation. In our work, we introduce an advanced acceleration technique that leverages the temporal redundancy inherent in diffusion models. Reusing feature maps with high temporal similarity opens up a new opportunity to save computation resources without compromising output quality. To realize the practical benefits of this intuition, we conduct an extensive analysis and propose a novel method, FRDiff. FRDiff is designed to harness the advantages of both reduced NFE and feature reuse, achieving a Pareto frontier that balances fidelity and latency trade-offs in various generative tasks.

Submitted 2 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Work in progress. Project page : https://jungwon-lee.github.io/Project_FRDiff/

arXiv:2311.16849 [pdf, other]

Identifiable Feature Learning for Spatial Data with Nonlinear ICA

Authors: Hermanni Hälvä, Jonathan So, Richard E. Turner, Aapo Hyvärinen

Abstract: Recently, nonlinear ICA has surfaced as a popular alternative to the many heuristic models used in deep representation learning and disentanglement. An advantage of nonlinear ICA is that a sophisticated identifiability theory has been developed; in particular, it has been proven that the original components can be recovered under sufficiently strong latent dependencies. Despite this general theory, practical nonlinear ICA algorithms have so far been mainly limited to data with one-dimensional latent dependencies, especially time-series data. In this paper, we introduce a new nonlinear ICA framework that employs $t$-process (TP) latent components which apply naturally to data with higher-dimensional dependency structures, such as spatial and spatio-temporal data. In particular, we develop a new learning and inference algorithm that extends variational inference methods to handle the combination of a deep neural network mixing function with the TP prior, and employs the method of inducing points for computational efficacy. On the theoretical side, we show that such TP independent components are identifiable under very general conditions. Further, Gaussian Process (GP) nonlinear ICA is established as a limit of the TP Nonlinear ICA model, and we prove that the identifiability of the latent components at this GP limit is more restricted. Namely, those components are identifiable if and only if they have distinctly different covariance kernels. Our algorithm and identifiability theorems are explored on simulated spatial data and real world spatio-temporal data.

Submitted 28 November, 2023; originally announced November 2023.

Comments: Work under review

arXiv:2310.11837 [pdf, other]

Optimising Distributions with Natural Gradient Surrogates

Authors: Jonathan So, Richard E. Turner

Abstract: Natural gradient methods have been used to optimise the parameters of probability distributions in a variety of settings, often resulting in fast-converging procedures. Unfortunately, for many distributions of interest, computing the natural gradient has a number of challenges. In this work we propose a novel technique for tackling such issues, which involves reframing the optimisation as one with respect to the parameters of a surrogate distribution, for which computing the natural gradient is easy. We give several examples of existing methods that can be interpreted as applying this technique, and propose a new method for applying it to a wide variety of problems. Our method expands the set of distributions that can be efficiently targeted with natural gradients. Furthermore, it is fast, easy to understand, simple to implement using standard autodiff software, and does not require lengthy model-specific derivations. We demonstrate our method on maximum likelihood estimation and variational inference tasks.

Submitted 4 March, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

Journal ref: PMLR 238 (2024):2224-2232

arXiv:2308.13327 [pdf, other]

3D Face Alignment Through Fusion of Head Pose Information and Features

Authors: Jaehyun So, Youngjoon Han

Abstract: The ability of humans to infer head poses from face shapes, and vice versa, indicates a strong correlation between the two. Accordingly, recent studies on face alignment have employed head pose information to predict facial landmarks in computer vision tasks. In this study, we propose a novel method that employs head pose information to improve face alignment performance by fusing said information with the feature maps of a face alignment network, rather than simply using it to initialize facial landmarks. Furthermore, the proposed network structure performs robust face alignment through a dual-dimensional network using multidimensional features represented by 2D feature maps and a 3D heatmap. For effective dense face alignment, we also propose a prediction method for facial geometric landmarks through training based on knowledge distillation using predicted keypoints. We experimentally assessed the correlation between the predicted facial landmarks and head pose information, as well as variations in the accuracy of facial landmarks with respect to the quality of head pose information. In addition, we demonstrated the effectiveness of the proposed method through a competitive performance comparison with state-of-the-art methods on the AFLW2000-3D, AFLW, and BIWI datasets.

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2307.03567 [pdf, other]

SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks

Authors: Xingyu Lin, John So, Sashwat Mahalingam, Fangchen Liu, Pieter Abbeel

Abstract: The existing internet-scale image and video datasets cover a wide range of everyday objects and tasks, bringing the potential of learning policies that generalize in diverse scenarios. Prior works have explored visual pre-training with different self-supervised objectives. Still, the generalization capabilities of the learned policies and the advantages over well-tuned baselines remain unclear from prior studies. In this work, we present a focused study of the generalization capabilities of the pre-trained visual representations at the categorical level. We identify the key bottleneck in using a frozen pre-trained visual backbone for policy learning and then propose SpawnNet, a novel two-stream architecture that learns to fuse pre-trained multi-layer representations into a separate network to learn a robust policy. Through extensive simulated and real experiments, we show significantly better categorical generalization compared to prior approaches in imitation learning settings. Open-sourced code and videos can be found on our website: https://xingyu-lin.github.io/spawnnet.

Submitted 21 October, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

arXiv:2306.02316 [pdf, other]

Temporal Dynamic Quantization for Diffusion Models

Authors: Junhyuk So, Jungwon Lee, Daehyun Ahn, Hyungjun Kim, Eunhyeok Park

Abstract: The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility. However, high storage and computation demands, resulting from the model size and iterative generation, hinder its use on mobile devices. Existing quantization techniques struggle to maintain performance even in 8-bit precision due to the diffusion model's unique property of temporal variation in activation. We introduce a novel quantization method that dynamically adjusts the quantization interval based on time step information, significantly improving output quality. Unlike conventional dynamic quantization techniques, our approach has no computational overhead during inference and is compatible with both post-training quantization (PTQ) and quantization-aware training (QAT). Our extensive experiments demonstrate substantial improvements in output quality with the quantized diffusion model across various datasets.

Submitted 11 December, 2023; v1 submitted 4 June, 2023; originally announced June 2023.
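
The idea of a quantization interval that depends on the diffusion time step can be illustrated with a minimal sketch. This is not the authors' method: it assumes a hypothetical max-abs calibration that stores one step size per time step in a lookup table, so that inference only reads a precomputed value. The names `calibrate_intervals` and `quantize` are illustrative.

```python
def calibrate_intervals(activations_by_step, num_bits=8):
    """Return one quantization step size per time step (assumed max-abs calibration)."""
    qmax = 2 ** (num_bits - 1) - 1
    intervals = {}
    for t, acts in activations_by_step.items():
        max_abs = max(abs(x) for x in acts)
        # Guard against an all-zero calibration batch.
        intervals[t] = max(max_abs, 1e-12) / qmax
    return intervals


def quantize(values, step, num_bits=8):
    """Symmetric uniform quantization followed by dequantization."""
    qmax = 2 ** (num_bits - 1) - 1
    out = []
    for x in values:
        q = round(x / step)
        q = max(-qmax - 1, min(qmax, q))  # clip to the signed integer range
        out.append(q * step)
    return out


# Toy calibration data: activations have a wider range at time step 1.
calib = {0: [-1.0, 0.5, 1.0], 1: [-10.0, 4.0]}
steps = calibrate_intervals(calib)
dequantized = {t: quantize(acts, steps[t]) for t, acts in calib.items()}
```

With a single shared interval, the narrow-range time step would lose resolution; a per-step table avoids that at the cost of storing one scalar per step.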

arXiv:2210.14721 [pdf, other]

Sim-to-Real via Sim-to-Seg: End-to-end Off-road Autonomous Driving Without Real Data

Authors: John So, Amber Xie, Sunggoo Jung, Jeffrey Edlund, Rohan Thakker, Ali Agha-mohammadi, Pieter Abbeel, Stephen James

Abstract: Autonomous driving is complex, requiring sophisticated 3D scene understanding, localization, mapping, and control. Rather than explicitly modelling and fusing each of these components, we instead consider an end-to-end approach via reinforcement learning (RL). However, collecting exploration driving data in the real world is impractical and dangerous. While training in simulation and deploying visual sim-to-real techniques has worked well for robot manipulation, deploying beyond controlled workspace viewpoints remains a challenge. In this paper, we address this challenge by presenting Sim2Seg, a re-imagining of RCAN that crosses the visual reality gap for off-road autonomous driving, without using any real-world data. This is done by learning to translate randomized simulation images into simulated segmentation and depth maps, subsequently enabling real-world images to also be translated. This allows us to train an end-to-end RL policy in simulation, and directly deploy in the real world. Our approach, which can be trained in 48 hours on 1 GPU, can perform as well as a classical perception and control stack that took thousands of engineering hours over several months to build. We hope this work motivates future end-to-end autonomous driving research.

Submitted 25 October, 2022; originally announced October 2022.

Comments: CoRL 2022 Paper

arXiv:2208.08812 [pdf, other]

Automatic laser steering for middle ear surgery

Authors: Jae-Hun So, Jérôme Szewczyk, Brahim Tamadazte

Abstract: This paper deals with the control of a laser spot in the context of minimally invasive surgery of the middle ear, e.g., cholesteatoma removal. More precisely, our work is concerned with the exhaustive burring of residual infected cells after primary mechanical resection of the pathological tissues. Since the latter cannot guarantee the treatment of all the infected tissues, the remaining infected cells cause regeneration of the disease in 20%-25% of cases, which require a second surgery 12-18 months later. To tackle such a complex surgery, we have developed a robotic platform that consists of the combination of a macro-scale system (7 degrees of freedom (DoFs) robotic arm) and a micro-scale flexible system (2 DoFs) which operates inside the middle ear cavity. To be able to treat the residual cholesteatoma regions, we proposed a method to automatically generate optimal laser scanning trajectories inside the regions and between them. The trajectories are tracked using an image-based control scheme. The proposed method and materials were validated experimentally using the lab-made robotic platform. The obtained results in terms of accuracy and behaviour meet perfectly the laser surgery requirements.

Submitted 18 August, 2022; originally announced August 2022.

Comments: 7 pages, 8 figures, conference

arXiv:2206.08743 [pdf, other]

Learning Fair Representation via Distributional Contrastive Disentanglement

Authors: Changdae Oh, Heeji Won, Junhyuk So, Taero Kim, Yewon Kim, Hosik Choi, Kyungwoo Song

Abstract: Learning fair representation is crucial for achieving fairness or debiasing sensitive information. Most existing works rely on adversarial representation learning to inject some invariance into representation. However, adversarial learning methods are known to suffer from relatively unstable training, and this might harm the balance between fairness and predictiveness of representation. We propose a new approach, learning FAir Representation via distributional CONtrastive Variational AutoEncoder (FarconVAE), which induces the latent space to be disentangled into sensitive and nonsensitive parts. We first construct the pair of observations with different sensitive attributes but with the same labels. Then, FarconVAE enforces each non-sensitive latent to be closer, while sensitive latents to be far from each other and also far from the non-sensitive latent by contrasting their distributions. We provide a new type of contrastive loss motivated by Gaussian and Student-t kernels for distributional contrastive learning with theoretical analysis. Besides, we adopt a new swap-reconstruction loss to boost the disentanglement further. FarconVAE shows superior performance on fairness, pretrained model debiasing, and domain generalization tasks from various modalities, including tabular, image, and text.

Submitted 17 June, 2022; originally announced June 2022.

Comments: Accepted by KDD 2022 (Research Track)

arXiv:2206.00820 [pdf, other]

NIPQ: Noise proxy-based Integrated Pseudo-Quantization

Authors: Juncheol Shin, Junhyuk So, Sein Park, Seungyeop Kang, Sungjoo Yoo, Eunhyeok Park

Abstract: Straight-through estimator (STE), which enables the gradient flow over the non-differentiable function via approximation, has been favored in studies related to quantization-aware training (QAT). However, STE incurs unstable convergence during QAT, resulting in notable quality degradation in low precision. Recently, pseudo-quantization training has been proposed as an alternative approach to updating the learnable parameters using the pseudo-quantization noise instead of STE. In this study, we propose a novel noise proxy-based integrated pseudo-quantization (NIPQ) that enables unified support of pseudo-quantization for both activation and weight by integrating the idea of truncation on the pseudo-quantization framework. NIPQ updates all of the quantization parameters (e.g., bit-width and truncation boundary) as well as the network parameters via gradient descent without STE instability. According to our extensive experiments, NIPQ outperforms existing quantization algorithms in various vision and language applications by a large margin.

Submitted 1 July, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

arXiv:2203.03897 [pdf, other]

Geodesic Multi-Modal Mixup for Robust Fine-Tuning

Authors: Changdae Oh, Junhyuk So, Hoyoon Byun, YongTaek Lim, Minchul Shin, Jong-June Jeon, Kyungwoo Song

Abstract: Pre-trained multi-modal models, such as CLIP, provide transferable embeddings and show promising results in diverse applications. However, the analysis of learned multi-modal embeddings is relatively unexplored, and the embedding transferability can be improved. In this work, we observe that CLIP holds separated embedding subspaces for two different modalities, and then we investigate it through the lens of uniformity-alignment to measure the quality of learned representation. Both theoretically and empirically, we show that CLIP retains poor uniformity and alignment even after fine-tuning. Such a lack of alignment and uniformity might restrict the transferability and robustness of embeddings. To this end, we devise a new fine-tuning method for robust representation equipping better alignment and uniformity. First, we propose a Geodesic Multi-Modal Mixup that mixes the embeddings of image and text to generate hard negative samples on the hypersphere. Then, we fine-tune the model on hard negatives as well as original negatives and positives with contrastive loss. Based on the theoretical analysis about hardness guarantee and limiting behavior, we justify the use of our method. Extensive experiments on retrieval, calibration, few- or zero-shot classification (under distribution shift), embedding arithmetic, and image captioning further show that our method provides transferable representations, enabling robust model adaptation on diverse tasks. Code: https://github.com/changdaeoh/multimodal-mixup

Submitted 6 November, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: To appear at NeurIPS 2023
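
The geodesic mixup named in the abstract above can be pictured as spherical interpolation between a unit-norm image embedding and a unit-norm text embedding. The following is a minimal sketch under that assumption, not the authors' implementation (their repository is linked in the abstract); the function name `geodesic_mixup`, the mixing weight `lam`, and the convention that `lam=1` returns the first vector are all hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def geodesic_mixup(a, b, lam):
    """Interpolate along the great circle between a and b.

    lam=1 returns the direction of a, lam=0 the direction of b
    (a hypothetical convention chosen for this sketch).
    """
    a, b = normalize(a), normalize(b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    theta = math.acos(dot)      # angle between the two embeddings
    s = math.sin(theta)
    if s < 1e-8:
        if dot > 0:             # vectors (nearly) coincide
            return a
        raise ValueError("antipodal vectors: the geodesic is not unique")
    wa = math.sin(lam * theta) / s
    wb = math.sin((1.0 - lam) * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

# Toy 2-D "image" and "text" embeddings; the result stays on the unit circle.
mixed = geodesic_mixup([1.0, 0.0], [0.0, 2.0], 0.5)
print(mixed)
```

Unlike a plain weighted average of the two vectors, this interpolation keeps the mixed point at unit norm, which is why it suits embeddings that live on a hypersphere.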

arXiv:2202.01267 [pdf, other]

FedSpace: An Efficient Federated Learning Framework at Satellites and Ground Stations

Authors: Jinhyun So, Kevin Hsieh, Behnaz Arzani, Shadi Noghabi, Salman Avestimehr, Ranveer Chandra

Abstract: Large-scale deployments of low Earth orbit (LEO) satellites collect massive amount of Earth imageries and sensor data, which can empower machine learning (ML) to address global challenges such as real-time disaster navigation and mitigation. However, it is often infeasible to download all the high-resolution images and train these ML models on the ground because of limited downlink bandwidth, sparse connectivity, and regularization constraints on the imagery resolution. To address these challenges, we leverage Federated Learning (FL), where ground stations and satellites collaboratively train a global ML model without sharing the captured images on the satellites. We show fundamental challenges in applying existing FL algorithms among satellites and ground stations, and we formulate an optimization problem which captures a unique trade-off between staleness and idleness. We propose a novel FL framework, named FedSpace, which dynamically schedules model aggregation based on the deterministic and time-varying connectivity according to satellite orbits. Extensive numerical evaluations based on real-world satellite images and satellite networks show that FedSpace reduces the training time by 1.7 days (38.6%) over the state-of-the-art FL algorithms.

Submitted 2 February, 2022; originally announced February 2022.

arXiv:2110.02177 [pdf, other]

Secure Aggregation for Buffered Asynchronous Federated Learning

Authors: Jinhyun So, Ramy E. Ali, Başak Güler, A. Salman Avestimehr

Abstract: Federated learning (FL) typically relies on synchronous training, which is slow due to stragglers. While asynchronous training handles stragglers efficiently, it does not ensure privacy due to the incompatibility with the secure aggregation protocols. A buffered asynchronous training protocol known as FedBuff has been proposed recently which bridges the gap between synchronous and asynchronous training to mitigate stragglers and to also ensure privacy simultaneously. FedBuff allows the users to send their updates asynchronously while ensuring privacy by storing the updates in a trusted execution environment (TEE) enabled private buffer. TEEs, however, have limited memory which limits the buffer size. Motivated by this limitation, we develop a buffered asynchronous secure aggregation (BASecAgg) protocol that does not rely on TEEs. The conventional secure aggregation protocols cannot be applied in the buffered asynchronous setting since the buffer may have local models corresponding to different rounds and hence the masks that the users use to protect their models may not cancel out. BASecAgg addresses this challenge by carefully designing the masks such that they cancel out even if they correspond to different rounds. Our convergence analysis and experiments show that BASecAgg almost has the same convergence guarantees as FedBuff without relying on TEEs.

Submitted 5 October, 2021; originally announced October 2021.

Comments: arXiv admin note: substantial overlap with arXiv:2109.14236
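
Why the masks must cancel can be seen in a toy version of conventional pairwise-mask secure aggregation within a single round. This is a hedged sketch, not BASecAgg or FedBuff: the modulus `Q`, the helper names, and the use of Python's `random` module as the mask source are illustrative assumptions (a real protocol would derive masks from agreed keys with a cryptographic generator).

```python
import random

Q = 2**31 - 1  # hypothetical modulus for the modular arithmetic

def pairwise_masks(num_users, dim, seed=0):
    """Return a dict mapping each pair (i, j), i < j, to a shared mask vector.

    Uses a seeded toy RNG, which is NOT cryptographically secure.
    """
    rng = random.Random(seed)
    return {(i, j): [rng.randrange(Q) for _ in range(dim)]
            for i in range(num_users) for j in range(i + 1, num_users)}

def mask_update(i, update, masks, num_users):
    """User i adds masks shared with higher-indexed users, subtracts the rest."""
    out = list(update)
    for j in range(num_users):
        if j == i:
            continue
        m = masks[(min(i, j), max(i, j))]
        sign = 1 if i < j else -1
        out = [(x + sign * y) % Q for x, y in zip(out, m)]
    return out

def aggregate(masked_updates):
    """Server sums the masked updates; each pairwise mask cancels in the sum."""
    return [sum(col) % Q for col in zip(*masked_updates)]

updates = [[1, 2], [3, 4], [5, 6]]
masks = pairwise_masks(3, 2)
masked = [mask_update(i, u, masks, 3) for i, u in enumerate(updates)]
print(aggregate(masked))  # [9, 12]
```

Cancellation here relies on both members of a pair using the same mask. If the buffered updates were masked in different rounds with different masks, the sum would no longer reduce to the plain aggregate, which is the failure mode the abstract describes.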

arXiv:2109.14236 [pdf, other]

LightSecAgg: a Lightweight and Versatile Design for Secure Aggregation in Federated Learning

Authors: Jinhyun So, Chaoyang He, Chien-Sheng Yang, Songze Li, Qian Yu, Ramy E. Ali, Basak Guler, Salman Avestimehr

Abstract: Secure model aggregation is a key component of federated learning (FL) that aims at protecting the privacy of each user's individual model while allowing for their global aggregation. It can be applied to any aggregation-based FL approach for training a global or personalized model. Model aggregation needs to also be resilient against likely user dropouts in FL systems, making its design substantially more complex. State-of-the-art secure aggregation protocols rely on secret sharing of the random-seeds used for mask generations at the users to enable the reconstruction and cancellation of those belonging to the dropped users. The complexity of such approaches, however, grows substantially with the number of dropped users. We propose a new approach, named LightSecAgg, to overcome this bottleneck by changing the design from "random-seed reconstruction of the dropped users" to "one-shot aggregate-mask reconstruction of the active users via mask encoding/decoding". We show that LightSecAgg achieves the same privacy and dropout-resiliency guarantees as the state-of-the-art protocols while significantly reducing the overhead for resiliency against dropped users. We also demonstrate that, unlike existing schemes, LightSecAgg can be applied to secure aggregation in the asynchronous FL setting. Furthermore, we provide a modular system design and optimized on-device parallelization for scalable implementation, by enabling computational overlapping between model training and on-device encoding, as well as improving the speed of concurrent receiving and sending of chunked masks. We evaluate LightSecAgg via extensive experiments for training diverse models on various datasets in a realistic FL system with large number of users and demonstrate that LightSecAgg significantly reduces the total training time.

Submitted 1 February, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: This paper is accepted to the 5th MLSys Conference, Santa Clara, CA, USA, 2022

arXiv:2106.09620 [pdf, other]

Disentangling Identifiable Features from Noisy Data with Structured Nonlinear ICA

Authors: Hermanni Hälvä, Sylvain Le Corff, Luc Lehéricy, Jonathan So, Yongjie Zhu, Elisabeth Gassiat, Aapo Hyvarinen

Abstract: We introduce a new general identifiable framework for principled disentanglement referred to as Structured Nonlinear Independent Component Analysis (SNICA). Our contribution is to extend the identifiability theory of deep generative models for a very broad class of structured models. While previous works have shown identifiability for specific classes of time-series models, our theorems extend this to more general temporal structures as well as to models with more complex structures such as spatial dependencies. In particular, we establish the major result that identifiability for this framework holds even in the presence of noise of unknown distribution. Finally, as an example of our framework's flexibility, we introduce the first nonlinear ICA model for time-series that combines the following very useful properties: it accounts for both nonstationarity and autocorrelation in a fully unsupervised setting; performs dimensionality reduction; models hidden states; and enables principled estimation and inference by variational maximum-likelihood.

Submitted 27 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

Comments: Accepted for publication at NeurIPS 2021

arXiv:2106.03328 [pdf, other]

Securing Secure Aggregation: Mitigating Multi-Round Privacy Leakage in Federated Learning

Authors: Jinhyun So, Ramy E. Ali, Basak Guler, Jiantao Jiao, Salman Avestimehr

Abstract: Secure aggregation is a critical component in federated learning (FL), which enables the server to learn the aggregate model of the users without observing their local models. Conventionally, secure aggregation algorithms focus only on ensuring the privacy of individual users in a single training round. We contend that such designs can lead to significant privacy leakages over multiple training rounds, due to partial user selection/participation at each round of FL. In fact, we show that the conventional random user selection strategies in FL lead to leaking users' individual models within number of rounds that is linear in the number of users. To address this challenge, we introduce a secure aggregation framework, Multi-RoundSecAgg, with multi-round privacy guarantees. In particular, we introduce a new metric to quantify the privacy guarantees of FL over multiple training rounds, and develop a structured user selection strategy that guarantees the long-term privacy of each user (over any number of training rounds). Our framework also carefully accounts for the fairness and the average number of participating users at each round. Our experiments on MNIST and CIFAR-10 datasets in the IID and the non-IID settings demonstrate the performance improvement over the baselines, both in terms of privacy protection and test accuracy.

Submitted 27 July, 2023; v1 submitted 7 June, 2021; originally announced June 2021.

Journal ref: AAAI 2023

arXiv:2011.05530 [pdf, other]

On Polynomial Approximations for Privacy-Preserving and Verifiable ReLU Networks

Authors: Ramy E. Ali, Jinhyun So, A. Salman Avestimehr

Abstract: Outsourcing deep neural networks (DNNs) inference tasks to an untrusted cloud raises data privacy and integrity concerns. While there are many techniques to ensure privacy and integrity for polynomial-based computations, DNNs involve non-polynomial computations. To address these challenges, several privacy-preserving and verifiable inference techniques have been proposed based on replacing the non-polynomial activation functions such as the rectified linear unit (ReLU) function with polynomial activation functions. Such techniques usually require polynomials with integer coefficients or polynomials over finite fields. Motivated by such requirements, several works proposed replacing the ReLU function with the square function. In this work, we empirically show that the square function is not the best degree-2 polynomial that can replace the ReLU function even when restricting the polynomials to have integer coefficients. We instead propose a degree-2 polynomial activation function with a first order term and empirically show that it can lead to much better models. Our experiments on the CIFAR and Tiny ImageNet datasets on various architectures such as VGG-16 show that our proposed function improves the test accuracy by up to 10.4% compared to the square function.

Submitted 6 February, 2024; v1 submitted 10 November, 2020; originally announced November 2020.
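
To make the comparison in the abstract above concrete, the sketch below evaluates ReLU, the square function, and a degree-2 polynomial with a first-order term on a few inputs. The abstract does not state the proposed coefficients, so `x**2 + x` (integer coefficients `a=1`, `b=1`) is purely an illustrative assumption, not the authors' function.

```python
def relu(x):
    """Rectified linear unit: non-polynomial, so hard to evaluate privately."""
    return max(0.0, x)

def square(x):
    """Square activation: degree-2, but symmetric about zero."""
    return x * x

def poly2(x, a=1, b=1):
    """Degree-2 polynomial a*x^2 + b*x; the integer a, b are hypothetical."""
    return a * x * x + b * x

# Print each activation side by side on a small grid of inputs.
for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(x, relu(x), square(x), poly2(x))
```

One visible difference: the square function maps `x` and `-x` to the same value, while the polynomial with a linear term does not, so it retains the sign information that ReLU also depends on.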

arXiv:2011.01963 [pdf, ps, other]

A Scalable Approach for Privacy-Preserving Collaborative Machine Learning

Authors: Jinhyun So, Basak Guler, A. Salman Avestimehr

Abstract: We consider a collaborative learning scenario in which multiple data-owners wish to jointly train a logistic regression model, while keeping their individual datasets private from the other parties. We propose COPML, a fully-decentralized training framework that achieves scalability and privacy-protection simultaneously. The key idea of COPML is to securely encode the individual datasets to distribute the computation load effectively across many parties and to perform the training computations as well as the model updates in a distributed manner on the securely encoded data. We provide the privacy analysis of COPML and prove its convergence. Furthermore, we experimentally demonstrate that COPML can achieve significant speedup in training over the benchmark protocols. Our protocol provides strong statistical privacy guarantees against colluding parties (adversaries) with unbounded computational power, while achieving up to $16\times$ speedup in the training time against the benchmark protocols.

Submitted 3 November, 2020; originally announced November 2020.

arXiv:2010.14282 [pdf, other]

Active Learning for Human-in-the-Loop Customs Inspection

Authors: Sundong Kim, Tung-Duong Mai, Sungwon Han, Sungwon Park, Thi Nguyen Duc Khanh, Jaechan So, Karandeep Singh, Meeyoung Cha

Abstract: We study the human-in-the-loop customs inspection scenario, where an AI-assisted algorithm supports customs officers by recommending a set of imported goods to be inspected. If the inspected items are fraudulent, the officers can levy extra duties. The formed logs are then used as additional training data for successive iterations. Choosing to inspect suspicious items first leads to an immediate gain in customs revenue, yet such inspections may not bring new insights for learning dynamic traffic patterns. On the other hand, inspecting uncertain items can help acquire new knowledge, which will be used as a supplementary training resource to update the selection systems. Based on multiyear customs datasets obtained from three countries, we demonstrate that some degree of exploration is necessary to cope with domain shifts in trade data. The results show that a hybrid strategy of selecting likely fraudulent and uncertain items will eventually outperform the exploitation-only strategy.

Submitted 23 February, 2022; v1 submitted 27 October, 2020; originally announced October 2020.

Comments: To Appear at IEEE TKDE

ACM Class: H.4.0

arXiv:2010.10177 [pdf, other]

Sparse Gaussian Process Variational Autoencoders

Authors: Matthew Ashman, Jonathan So, Will Tebbutt, Vincent Fortuin, Michael Pearce, Richard E. Turner

Abstract: Large, multi-dimensional spatio-temporal datasets are omnipresent in modern science and engineering. An effective framework for handling such data are Gaussian process deep generative models (GP-DGMs), which employ GP priors over the latent variables of DGMs. Existing approaches for performing inference in GP-DGMs do not support sparse GP approximations based on inducing points, which are essential for the computational efficiency of GPs, nor do they handle missing data -- a natural occurrence in many spatio-temporal datasets -- in a principled manner. We address these shortcomings with the development of the sparse Gaussian process variational autoencoder (SGP-VAE), characterised by the use of partial inference networks for parameterising sparse GP approximations. Leveraging the benefits of amortised variational inference, the SGP-VAE enables inference in multi-output sparse GPs on previously unobserved data with no additional training. The SGP-VAE is evaluated in a variety of experiments where it outperforms alternative approaches including multi-output GPs and structured VAEs.

Submitted 23 October, 2020; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: 19 pages, 6 figures

arXiv:2008.10400 [pdf, other]

An Ensemble of Simple Convolutional Neural Network Models for MNIST Digit Recognition

Authors: Sanghyeon An, Minjun Lee, Sanglee Park, Heerin Yang, Jungmin So

Abstract: We report that a very high accuracy on the MNIST test set can be achieved by using simple convolutional neural network (CNN) models. We use three different models with 3x3, 5x5, and 7x7 kernel size in the convolution layers. Each model consists of a set of convolution layers followed by a single fully connected layer. Every convolution layer uses batch normalization and ReLU activation, and pooling is not used. Rotation and translation is used to augment training data, which is frequently used in most image classification tasks. A majority voting using the three models independently trained on the training data set can achieve up to 99.87% accuracy on the test set, which is one of the state-of-the-art results. A two-layer ensemble, a heterogeneous ensemble of three homogeneous ensemble networks, can achieve up to 99.91% test accuracy. The results can be reproduced by using the code at: https://github.com/ansh941/MnistSimpleCNN

Submitted 4 October, 2020; v1 submitted 12 August, 2020; originally announced August 2020.

Comments: 10 pages, 12 figures, 7 tables
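
The majority-voting step mentioned in the abstract above can be sketched as follows. This is an illustrative assumption about how per-model label predictions might be combined, not the code from the authors' repository; the function name `majority_vote` and the tie-breaking rule (fall back to the first model when all three disagree) are hypothetical, and the sketch assumes exactly three models.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine three per-model label lists into one label per sample.

    predictions: list of three equal-length lists, one per model.
    If all three models disagree, the first model's label is used
    (a hypothetical tie-breaking rule).
    """
    combined = []
    for votes in zip(*predictions):
        label, count = Counter(votes).most_common(1)[0]
        # With three voters, count == 1 means every model chose a different label.
        combined.append(label if count > 1 else votes[0])
    return combined

# Toy predictions from three models on four samples
preds_3x3 = [7, 2, 1, 0]
preds_5x5 = [7, 2, 7, 4]
preds_7x7 = [7, 3, 1, 9]
print(majority_vote([preds_3x3, preds_5x5, preds_7x7]))  # [7, 2, 1, 0]
```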

arXiv:2007.13518 [pdf, other]

FedML: A Research Library and Benchmark for Federated Machine Learning

Authors: Chaoyang He, Songze Li, Jinhyun So, Xiao Zeng, Mi Zhang, Hongyi Wang, Xiaoyang Wang, Praneeth Vepakomma, Abhishek Singh, Hang Qiu, Xinghua Zhu, Jianzong Wang, Li Shen, Peilin Zhao, Yan Kang, Yang Liu, Ramesh Raskar, Qiang Yang, Murali Annavaram, Salman Avestimehr

Abstract: Federated learning (FL) is a rapidly growing research field in machine learning. However, existing FL libraries cannot adequately support diverse algorithmic development; inconsistent dataset and model usage make fair algorithm comparison challenging. In this work, we introduce FedML, an open research library and benchmark to facilitate FL algorithm development and fair performance comparison. FedML supports three computing paradigms: on-device training for edge devices, distributed computing, and single-machine simulation. FedML also promotes diverse algorithmic research with flexible and generic API design and comprehensive reference baseline implementations (optimizer, models, and datasets). We hope FedML could provide an efficient and reproducible means for developing and evaluating FL algorithms that would benefit the FL research community. We maintain the source code, documents, and user community at https://fedml.ai.

Submitted 8 November, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

Comments: This is FedML white paper V3. Homepage: https://fedml.ai; GitHub: https://github.com/FedML-AI/FedML; In V3, More advanced algorithms and IoT device training are supported, please check here: https://github.com/FedML-AI/FedML/blob/master/fedml_iot/

arXiv:2007.11115 [pdf, ps, other]

Byzantine-Resilient Secure Federated Learning

Authors: Jinhyun So, Basak Guler, A. Salman Avestimehr

Abstract: Secure federated learning is a privacy-preserving framework to improve machine learning models by training over large volumes of data collected by mobile users. This is achieved through an iterative process where, at each iteration, users update a global model using their local datasets. Each user then masks its local model via random keys, and the masked models are aggregated at a central server to compute the global model for the next iteration. As the local models are protected by random masks, the server cannot observe their true values. This presents a major challenge for the resilience of the model against adversarial (Byzantine) users, who can manipulate the global model by modifying their local models or datasets. Towards addressing this challenge, this paper presents the first single-server Byzantine-resilient secure aggregation framework (BREA) for secure federated learning. BREA is based on an integrated stochastic quantization, verifiable outlier detection, and secure model aggregation approach to guarantee Byzantine-resilience, privacy, and convergence simultaneously. We provide theoretical convergence and privacy guarantees and characterize the fundamental trade-offs in terms of the network size, user dropouts, and privacy protection. Our experiments demonstrate convergence in the presence of Byzantine users, and comparable accuracy to conventional federated learning benchmarks.

Submitted 20 February, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

arXiv:2002.04156 [pdf, ps, other]

Turbo-Aggregate: Breaking the Quadratic Aggregation Barrier in Secure Federated Learning

Authors: Jinhyun So, Basak Guler, A. Salman Avestimehr

Abstract: Federated learning is a distributed framework for training machine learning models over the data residing at mobile devices, while protecting the privacy of individual users. A major bottleneck in scaling federated learning to a large number of users is the overhead of secure model aggregation across many users. In particular, the overhead of the state-of-the-art protocols for secure model aggregation grows quadratically with the number of users. In this paper, we propose the first secure aggregation framework, named Turbo-Aggregate, that in a network with $N$ users achieves a secure aggregation overhead of $O(N\log{N})$, as opposed to $O(N^2)$, while tolerating up to a user dropout rate of $50\%$. Turbo-Aggregate employs a multi-group circular strategy for efficient model aggregation, and leverages additive secret sharing and novel coding techniques for injecting aggregation redundancy in order to handle user dropouts while guaranteeing user privacy. We experimentally demonstrate that Turbo-Aggregate achieves a total running time that grows almost linear in the number of users, and provides up to $40\times$ speedup over the state-of-the-art protocols with up to $N=200$ users. Our experiments also demonstrate the impact of model size and bandwidth on the performance of Turbo-Aggregate.

Submitted 20 February, 2021; v1 submitted 10 February, 2020; originally announced February 2020.
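
Additive secret sharing, one of the building blocks named in the abstract above, can be sketched in a few lines. This shows only the generic primitive, not Turbo-Aggregate's multi-group circular strategy or its coding scheme; the modulus `Q` and the function names `share` and `reconstruct` are hypothetical choices for illustration.

```python
import secrets

Q = 2**61 - 1  # hypothetical modulus; all arithmetic is done mod Q

def share(value, n):
    """Split value into n additive shares that sum to value mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n - 1)]
    # The last share is chosen so that the total equals the secret.
    shares.append((value - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares mod Q."""
    return sum(shares) % Q

parts = share(42, 5)
print(reconstruct(parts))  # 42
```

Any proper subset of the shares is uniformly distributed and reveals nothing about the secret. The flip side is that a single missing share prevents reconstruction, which is consistent with the abstract's point that extra redundancy is needed to tolerate user dropouts.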

arXiv:1902.00641 [pdf, ps, other]

CodedPrivateML: A Fast and Privacy-Preserving Framework for Distributed Machine Learning

Authors: Jinhyun So, Basak Guler, A. Salman Avestimehr

Abstract: How to train a machine learning model while keeping the data private and secure? We present CodedPrivateML, a fast and scalable approach to this critical problem. CodedPrivateML keeps both the data and the model information-theoretically private, while allowing efficient parallelization of training across distributed workers. We characterize CodedPrivateML's privacy threshold and prove its convergence for logistic (and linear) regression. Furthermore, via extensive experiments on Amazon EC2, we demonstrate that CodedPrivateML provides significant speedup over cryptographic approaches based on multi-party computing (MPC).

Submitted 20 February, 2021; v1 submitted 2 February, 2019; originally announced February 2019.

arXiv:1808.07617 [pdf, ps, other]

Tomlinson-Harashima Precoding-Aided Multi-Antenna Non-Orthogonal Multiple Access

Authors: Jungho So, Youngchul Sung, Yong H. Lee

Abstract: In this paper, Tomlinson-Harashima Precoding (THP) is considered for multi-user multiple-input single-output (MU-MISO) non-orthogonal multiple access (NOMA) downlink. Under the hierarchical structure in which multiple clusters each with two users are formed and served in the spatial domain and users in each cluster are served in the power domain, THP is applied to eliminate the inter-cluster interference (ICI) to the strong users and enlarge the dimension of the beam design space for mitigation of ICI to weak users as compared to conventional zero-forcing (ZF) inter-cluster beamforming. With the enlarged beam design space, two beam design algorithms for THP-aided MISO-NOMA are proposed. The first is a greedy sequential beam design with user scheduling, and the second is the joint beam redesign and power allocation. The two design problems lead to non-convex optimization problems. An efficient algorithm is proposed to solve the non-convex optimization problems based on successive convex approximation (SCA). Numerical results show that the proposed user scheduling and two beam design methods based on THP yield noticeable gain over existing methods.

Submitted 22 August, 2018; originally announced August 2018.

Comments: 13 pages, 6 figures, double-column, submitted to IEEE J. Sel. Topics in Signal Process

arXiv:1510.07369 [pdf, ps, other]

Enhancing Non-Orthogonal Multiple Access By Forming Relaying Broadcast Channels

Authors: Jungho So, Youngchul Sung

Abstract: In this paper, using relaying broadcast channels (RBCs) as component channels for non-orthogonal multiple access (NOMA) is proposed to enhance the performance of NOMA in single-input single-output (SISO) cellular downlink systems. To analyze the performance of the proposed scheme, an achievable rate region of a RBC with compress-and-forward (CF) relaying is newly derived based on the recent work of noisy network coding (NNC). Based on the analysis of the achievable rate region of a RBC with decode-and-forward (DF) relaying, CF relaying, or CF relaying with dirty-paper coding (DPC) at the transmitter, the overall system performance of NOMA equipped with RBC component channels is investigated. It is shown that NOMA with RBC-DF yields marginal gain and NOMA with RBC-CF/DPC yields drastic gain over the simple NOMA based on broadcast component channels in a practical system setup. By going beyond simple broadcast channel (BC)/successive interference cancellation (SIC) to advanced multi-terminal encoding including DPC and CF/NNC, far larger gains can be obtained for NOMA.

Submitted 26 October, 2015; originally announced October 2015.

Comments: 29 pages, 5 figures, submitted to IEEE Transactions on Communications

arXiv:1508.04562 [pdf, other]

Fast, Flexible Models for Discovering Topic Correlation across Weakly-Related Collections

Authors: Jingwei Zhang, Aaron Gerow, Jaan Altosaar, James Evans, Richard Jean So

Abstract: Weak topic correlation across document collections with different numbers of topics in individual collections presents challenges for existing cross-collection topic models. This paper introduces two probabilistic topic models, Correlated LDA (C-LDA) and Correlated HDP (C-HDP). These address problems that can arise when analyzing large, asymmetric, and potentially weakly-related collections. Topic correlations in weakly-related collections typically lie in the tail of the topic distribution, where they would be overlooked by models unable to fit large numbers of topics. To efficiently model this long tail for large-scale analysis, our models implement a parallel sampling algorithm based on the Metropolis-Hastings and alias methods (Yuan et al., 2015). The models are first evaluated on synthetic data, generated to simulate various collection-level asymmetries. We then present a case study of modeling over 300k documents in collections of sciences and humanities research from JSTOR.

Submitted 19 August, 2015; originally announced August 2015.

Comments: EMNLP 2015

arXiv:1406.3404 [pdf, ps, other]

doi 10.1109/LSP.2014.2364180

Pilot Signal Design for Massive MIMO Systems: A Received Signal-To-Noise-Ratio-Based Approach

Authors: Jungho So, Donggun Kim, Yuni Lee, Youngchul Sung

Abstract: In this paper, the pilot signal design for massive MIMO systems to maximize the training-based received signal-to-noise ratio (SNR) is considered under two channel models: block Gauss-Markov and block independent and identically distributed (i.i.d.) channel models. First, it is shown that under the block Gauss-Markov channel model, the optimal pilot design problem reduces to a semi-definite programming (SDP) problem, which can be solved numerically by a standard convex optimization tool. Second, under the block i.i.d. channel model, an optimal solution is obtained in closed form. Numerical results show that the proposed method yields noticeably better performance than other existing pilot design methods in terms of received SNR.

Submitted 12 June, 2014; originally announced June 2014.

Comments: 5 pages, double column, 1 figure. Submitted to IEEE Signal Processing Letters

Showing 1–36 of 36 results for author: So, J