Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation

Authors: JoonHo Lee, Jae Oh Woo, Juree Seok, Parisa Hassanzadeh, Wooseok Jang, JuYoun Son, Sima Didari, Baruch Gutow, Heng Hao, Hankyu Moon, Wenjun Hu, Yeong-Dae Kwon, Taehee Lee, Seungjai Min

Abstract: Assessing response quality to instructions in language models is vital but challenging due to the complexity of human language across different contexts. This complexity often results in ambiguous or inconsistent interpretations, making accurate assessment difficult. To address this issue, we propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for the quality of paired responses based on Bayesian approximation. Trained with preference datasets, our uncertainty-enabled proxy not only scores rewards for responses but also evaluates their inherent uncertainty. Empirical results demonstrate significant benefits of incorporating the proposed proxy into language model training. Our method boosts the instruction following capability of language models by refining data curation for training and improving policy optimization objectives, thereby surpassing existing methods by a large margin on benchmarks such as Vicuna and MT-bench. These findings highlight that our proposed approach substantially advances language model training and paves a new way of harnessing uncertainty within language models.

Submitted 19 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

Comments: Accepted to ICML 2024

arXiv:2307.10062 [pdf, other]

Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples

Authors: JoonHo Lee, Jae Oh Woo, Hankyu Moon, Kwonho Lee

Abstract: Deploying deep visual models can lead to performance drops due to the discrepancies between source and target distributions. Several approaches leverage labeled source data to estimate target domain accuracy, but accessing labeled source data is often prohibitively difficult due to data confidentiality or resource limitations on serving devices. Our work proposes a new framework to estimate model accuracy on unlabeled target data without access to source data. We investigate the feasibility of using pseudo-labels for accuracy estimation and evolve this idea into adopting recent advances in source-free domain adaptation algorithms. Our approach measures the disagreement rate between the source hypothesis and the target pseudo-labeling function, adapted from the source hypothesis. We mitigate the impact of erroneous pseudo-labels that may arise due to a high ideal joint hypothesis risk by employing adaptive adversarial perturbation on the input of the target model. Our proposed source-free framework effectively addresses the challenging distribution shift scenarios and outperforms existing methods requiring source data and labels for training.

Submitted 19 July, 2023; originally announced July 2023.

Comments: Accepted to ICCV 2023

arXiv:2208.04278 [pdf, other]

Self-Supervised Contrastive Representation Learning for 3D Mesh Segmentation

Authors: Ayaan Haque, Hankyu Moon, Heng Hao, Sima Didari, Jae Oh Woo, Patrick Bangert

Abstract: 3D deep learning is a growing field of interest due to the vast amount of information stored in 3D formats. Triangular meshes are an efficient representation for irregular, non-uniform 3D objects. However, meshes are often challenging to annotate due to their high geometrical complexity. Specifically, creating segmentation masks for meshes is tedious and time-consuming. Therefore, it is desirable to train segmentation networks with limited-labeled data. Self-supervised learning (SSL), a form of unsupervised representation learning, is a growing alternative to fully-supervised learning which can decrease the burden of supervision for training. We propose SSL-MeshCNN, a self-supervised contrastive learning method for pre-training CNNs for mesh segmentation. We take inspiration from traditional contrastive learning frameworks to design a novel contrastive learning algorithm specifically for meshes. Our preliminary experiments show promising results in reducing the heavy labeled data requirement needed for mesh segmentation by at least 33%.

Submitted 21 December, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

Comments: AAAI 2023

arXiv:2201.09815 [pdf, other]

Analytic Mutual Information in Bayesian Neural Networks

Authors: Jae Oh Woo

Abstract: Bayesian neural networks have successfully designed and optimized a robust neural network model in many application problems, including uncertainty quantification. However, with its recent success, information-theoretic understanding about the Bayesian neural network is still at an early stage. Mutual information is an example of an uncertainty measure in a Bayesian neural network to quantify epistemic uncertainty. Still, no analytic formula is known to describe it, one of the fundamental information measures to understand the Bayesian deep learning framework. In this paper, we derive the analytical formula of the mutual information between model parameters and the predictive output by leveraging the notion of the point process entropy. Then, as an application, we discuss the parameter estimation of the Dirichlet distribution and show its practical application in the active learning uncertainty measures by demonstrating that our analytical formula can improve the performance of active learning further in practice.

Submitted 18 June, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

arXiv:2106.08599 [pdf, other]

PatchNet: Unsupervised Object Discovery based on Patch Embedding

Authors: Hankyu Moon, Heng Hao, Sima Didari, Jae Oh Woo, Patrick Bangert

Abstract: We demonstrate that frequently appearing objects can be discovered by training randomly sampled patches from a small number of images (100 to 200) by self-supervision. Key to this approach is the pattern space, a latent space of patterns that represents all possible sub-images of the given image data. The distance structure in the pattern space captures the co-occurrence of patterns due to the frequent objects. The pattern space embedding is learned by minimizing the contrastive loss between randomly generated adjacent patches. To prevent the embedding from learning the background, we modulate the contrastive loss by color-based object saliency and background dissimilarity. The learned distance structure serves as object memory, and the frequent objects are simply discovered by clustering the pattern vectors from the random patches sampled for inference. Our image representation based on image patches naturally handles the position and scale invariance property that is crucial to multi-object discovery. The method has been proven surprisingly effective, and successfully applied to finding multiple human faces and bodies from natural images.

Submitted 16 June, 2021; originally announced June 2021.

ACM Class: I.2.10; I.4.10; I.5.3

arXiv:2105.14559 [pdf, other]

Active Learning in Bayesian Neural Networks with Balanced Entropy Learning Principle

Authors: Jae Oh Woo

Abstract: Acquiring labeled data is challenging in many machine learning applications with limited budgets. Active learning gives a procedure to select the most informative data points and improve data efficiency by reducing the cost of labeling. The info-max learning principle maximizing mutual information such as BALD has been successful and widely adapted in various active learning applications. However, this pool-based specific objective inherently introduces a redundant selection and further requires a high computational cost for batch selection. In this paper, we design and propose a new uncertainty measure, Balanced Entropy Acquisition (BalEntAcq), which captures the information balance between the uncertainty of underlying softmax probability and the label variable. To do this, we approximate each marginal distribution by Beta distribution. Beta approximation enables us to formulate BalEntAcq as a ratio between an augmented entropy and the marginalized joint entropy. The closed-form expression of BalEntAcq facilitates parallelization by estimating two parameters in each marginal Beta distribution. BalEntAcq is a purely standalone measure without requiring any relational computations with other data points. Nevertheless, BalEntAcq captures a well-diversified selection near the decision boundary with a margin, unlike other existing uncertainty measures such as BALD, Entropy, or Mean Standard Deviation (MeanSD). Finally, we demonstrate that our balanced entropy learning principle with BalEntAcq consistently outperforms well-known linearly scalable active learning methods, including a recently proposed PowerBALD, a simple but diversified version of BALD, by showing experimental results obtained from MNIST, CIFAR-100, SVHN, and TinyImageNet datasets.

Submitted 15 April, 2023; v1 submitted 30 May, 2021; originally announced May 2021.

Journal ref: International Conference on Learning Representations 2023

arXiv:2103.05109 [pdf, other]

Highly Efficient Representation and Active Learning Framework and Its Application to Imbalanced Medical Image Classification

Authors: Heng Hao, Hankyu Moon, Sima Didari, Jae Oh Woo, Patrick Bangert

Abstract: We propose a highly data-efficient active learning framework for image classification. Our novel framework combines: (1) unsupervised representation learning of a Convolutional Neural Network and (2) the Gaussian Process (GP) method, in sequence to achieve highly data and label efficient classifications. Moreover, both elements are less sensitive to the prevalent and challenging class imbalance issue, thanks to the (1) feature learned without labels and (2) the Bayesian nature of GP. The GP-provided uncertainty estimates enable active learning by ranking samples based on the uncertainty and selectively labeling samples showing higher uncertainty. We apply this novel combination to the severely imbalanced case of COVID-19 chest X-ray classification and the Nerthus colonoscopy classification. We demonstrate that only about 10% of the labeled data is needed to reach the accuracy from training all available labels. We also applied our model architecture and proposed framework to a broader class of datasets with expected success.

Submitted 20 June, 2022; v1 submitted 24 February, 2021; originally announced March 2021.

Comments: Published in NeurIPS Data-Centric AI workshop

arXiv:1712.00913 [pdf, ps, other]

doi 10.1016/j.disc.2019.03.002

Majorization and Rényi Entropy Inequalities via Sperner Theory

Authors: Mokshay Madiman, Liyao Wang, Jae Oh Woo

Abstract: A natural link between the notions of majorization and strongly Sperner posets is elucidated. It is then used to obtain a variety of consequences, including new Rényi entropy inequalities for sums of independent, integer-valued random variables.

Submitted 13 November, 2018; v1 submitted 4 December, 2017; originally announced December 2017.

Comments: Introduction was completely rewritten and there are numerous corrections. Expansion of background on Sperner theory, and several references are added

Journal ref: Discrete Mathematics (AEGT 2017 Special issue edited by S. Cioaba, R. Coulter, E. Fiorini, Q. Xiang, F. Pfender), vol. 342, no. 10, pp. 2911-2923, October 2019

arXiv:1711.00881 [pdf, other]

On the Steady State of Continuous Time Stochastic Opinion Dynamics with Power Law Confidence

Authors: Jae Oh Woo, François Baccelli, Sriram Vishwanath

Abstract: This paper introduces a class of non-linear and continuous-time opinion dynamics models with additive noise and state-dependent interaction rates between agents. The model features interaction rates which are proportional to a negative power of opinion distances. We establish a non-local partial differential equation for the distribution of opinion distances and use Mellin transforms to provide an explicit formula for the stationary solution of the latter, when it exists. Our approach leads to new qualitative and quantitative results on this type of dynamics. To the best of our knowledge, these Mellin transform results are the first quantitative results on the equilibria of opinion dynamics with distance-dependent interaction rates. The closed-form expressions for this class of dynamics are obtained for the two-agent case. However, the results can be used in mean-field models featuring several agents whose interaction rates depend on the empirical average of their opinions. The technique also applies to linear dynamics, namely with a constant interaction rate, on an interaction graph.

Submitted 12 December, 2020; v1 submitted 2 November, 2017; originally announced November 2017.

arXiv:1710.00812 [pdf, ps, other]

doi 10.1137/18M1185570

Entropy Inequalities for Sums in Prime Cyclic Groups

Authors: Mokshay Madiman, Liyao Wang, Jae Oh Woo

Abstract: Lower bounds for the Rényi entropies of sums of independent random variables taking values in cyclic groups of prime order under permutations are established. The main ingredients of our approach are extended rearrangement inequalities in prime cyclic groups building on Lev (2001), and notions of stochastic ordering. Several applications are developed, including to discrete entropy power inequalities, the Littlewood-Offord problem, and counting solutions of certain linear systems.

Submitted 26 November, 2020; v1 submitted 2 October, 2017; originally announced October 2017.

Comments: 25 pages

Journal ref: SIAM J. Discrete Math., 35(3), pp. 1628-1649, 2021

arXiv:1701.02261 [pdf, other]

An Analytical Framework for Modeling a Spatially Repulsive Cellular Network

Authors: Chang-Sik Choi, Jae Oh Woo, Jeffrey G. Andrews

Abstract: We propose a new cellular network model that captures both deterministic and random aspects of base station deployments. Namely, the base station locations are modeled as the superposition of two independent stationary point processes: a random shifted grid with intensity $λ_g$ and a Poisson point process (PPP) with intensity $λ_p$. Grid and PPP deployments are special cases with $λ_p \to 0$ and $λ_g \to 0$, with actual deployments in between these two extremes, as we demonstrate with deployment data. Assuming that each user is associated with the base station that provides the strongest average received signal power, we obtain the probability that a typical user is associated with either a grid or PPP base station. Assuming Rayleigh fading channels, we derive the expression for the coverage probability of the typical user, resulting in the following observations. First, the association and the coverage probability of the typical user are fully characterized as functions of intensity ratio $ρ_λ = λ_p/λ_g$. Second, the user association is biased towards the base stations located on a grid. Finally, the proposed model predicts the coverage probability of the actual deployment with great accuracy.

Submitted 29 September, 2017; v1 submitted 9 January, 2017; originally announced January 2017.

Comments: Submitted to IEEE Transactions on Communications

arXiv:1407.5383 [pdf, other]

doi 10.3390/e16105339

Redundancy of Exchangeable Estimators

Authors: Narayana P. Santhanam, Anand D. Sarwate, Jae Oh Woo

Abstract: Exchangeable random partition processes are the basis for Bayesian approaches to statistical inference in large alphabet settings. On the other hand, the notion of the pattern of a sequence provides an information-theoretic framework for data compression in large alphabet scenarios. Because data compression and parameter estimation are intimately related, we study the redundancy of Bayes estimators coming from Poisson-Dirichlet priors (or "Chinese restaurant processes") and the Pitman-Yor prior. This provides an understanding of these estimators in the setting of unknown discrete alphabets from the perspective of universal compression. In particular, we identify relations between alphabet sizes and sample sizes where the redundancy is small, thereby characterizing useful regimes for these estimators.

Submitted 20 October, 2014; v1 submitted 21 July, 2014; originally announced July 2014.

Comments: 18 pages

Showing 1–12 of 12 results for author: Woo, J O