
Deflated Dynamics Value Iteration

Authors: Jongmin Lee, Amin Rakhsha, Ernest K. Ryu, Amir-massoud Farahmand

Abstract: The Value Iteration (VI) algorithm is an iterative procedure to compute the value function of a Markov decision process, and is the basis of many reinforcement learning (RL) algorithms as well. As the error convergence rate of VI as a function of iteration $k$ is $O(γ^k)$, it is slow when the discount factor $γ$ is close to $1$. To accelerate the computation of the value function, we propose Deflated Dynamics Value Iteration (DDVI). DDVI uses matrix splitting and matrix deflation techniques to effectively remove (deflate) the top $s$ dominant eigen-structure of the transition matrix $\mathcal{P}^π$. We prove that this leads to a $\tilde{O}(γ^k |λ_{s+1}|^k)$ convergence rate, where $λ_{s+1}$ is the $(s+1)$-th largest eigenvalue of the dynamics matrix. We then extend DDVI to the RL setting and present the Deflated Dynamics Temporal Difference (DDTD) algorithm. We empirically show the effectiveness of the proposed algorithms.

Submitted 15 July, 2024; originally announced July 2024.
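The DDVI abstract describes splitting and deflating the transition matrix. The following is a minimal sketch of the simplest case only, assuming rank $s = 1$, policy evaluation, and a known row-stochastic $\mathcal{P}^π$. Its top eigenvalue is $1$, with right eigenvector $\mathbf{1}$ and left eigenvector the stationary distribution $π$, so $E = \mathbf{1}π^\top$ deflates it. Rewriting $V = r + γ\mathcal{P}^πV$ as $(I - γE)V = r + γ(\mathcal{P}^π - E)V$ yields an iteration whose error contracts by $γ(\mathcal{P}^π - E)$, with spectral radius $γ|λ_2|$. The function names, the rank-1 choice, and the least-squares computation of $π$ are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def stationary_distribution(P):
    """Left Perron vector pi of a row-stochastic P (pi @ P == pi, pi.sum() == 1)."""
    n = P.shape[0]
    # Stack pi^T (I - P) = 0 with the normalization sum(pi) = 1; solve by least squares.
    A = np.vstack([(np.eye(n) - P).T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def value_iteration(P, r, gamma, iters):
    """Plain policy-evaluation VI: V <- r + gamma * P V; error shrinks like gamma**k."""
    V = np.zeros_like(r)
    for _ in range(iters):
        V = r + gamma * P @ V
    return V

def rank1_deflated_vi(P, r, gamma, iters):
    """Rank-1 deflated iteration (illustrative): split P = E + (P - E) with E = 1 pi^T,
    keep gamma * E on the left-hand side, and iterate on the deflated remainder."""
    n = P.shape[0]
    E = np.outer(np.ones(n), stationary_distribution(P))  # removes the eigenvalue 1
    M = np.eye(n) - gamma * E                               # eigenvalues 1 and 1 - gamma
    V = np.zeros_like(r)
    for _ in range(iters):
        V = np.linalg.solve(M, r + gamma * (P - E) @ V)
    return V

rng = np.random.default_rng(0)
n, gamma = 6, 0.99
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)  # make every row sum to 1
r = rng.random(n)
V_star = np.linalg.solve(np.eye(n) - gamma * P, r)

vi_err = np.abs(value_iteration(P, r, gamma, 50) - V_star).max()
ddvi_err = np.abs(rank1_deflated_vi(P, r, gamma, 50) - V_star).max()
print(f"VI error: {vi_err:.2e}  deflated error: {ddvi_err:.2e}")
```

The extra linear solve costs nothing asymptotically here because $E(\mathcal{P}^π - E) = \mathbf{1}π^\top\mathcal{P}^π - \mathbf{1}π^\top = 0$, so $(I - γE)^{-1}(\mathcal{P}^π - E) = \mathcal{P}^π - E$. With $γ = 0.99$, plain VI is still far from $V^\star$ after 50 steps while the deflated iteration has essentially converged.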

arXiv:2405.03958 [pdf, other]

Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

Authors: Joo Young Choi, Jaesung R. Park, Inkyu Park, Jaewoong Cho, Albert No, Ernest K. Ryu

Abstract: Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations applied to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers, without changing or tuning the other parts of the U-Net architecture, improves the image generation quality. For example, a drop-in addition of LoRA conditioning to the EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79.

Submitted 6 May, 2024; originally announced May 2024.
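The abstract does not spell out the exact form of the LoRA conditioning. As a rough illustration only, the sketch below assumes a query/key/value projection whose output is the base projection plus a LoRA update, where the per-rank scales of the update are a linear function of the conditioning (time or class) embedding. The class name, the linear map `U`, and the zero initialization of `B` are all assumptions. Zero-initializing `B` makes the layer reproduce the unmodified projection at the start, which is one way to make such an addition "drop-in".

```python
import numpy as np

class ConditionedLoRALinear:
    """Hypothetical sketch of a LoRA-conditioned projection (not the paper's exact form):
    y = x W^T + ((x A^T) * s(c)) B^T, where s(c) = c U^T gives one scale per LoRA rank."""

    def __init__(self, d_in, d_out, rank, d_cond, rng):
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)     # base projection weight
        self.A = rng.standard_normal((rank, d_in)) / np.sqrt(d_in)      # LoRA down-projection
        self.B = np.zeros((d_out, rank))                                 # LoRA up-projection, zero-initialized
        self.U = rng.standard_normal((rank, d_cond)) / np.sqrt(d_cond)  # conditioning -> per-rank scales

    def __call__(self, x, c):
        # x: (batch, d_in) features; c: (batch, d_cond) time and/or class embedding
        scale = c @ self.U.T               # (batch, rank)
        update = (x @ self.A.T) * scale    # (batch, rank), modulated by the conditioning
        return x @ self.W.T + update @ self.B.T

rng = np.random.default_rng(0)
q_proj = ConditionedLoRALinear(d_in=8, d_out=8, rank=2, d_cond=4, rng=rng)
x = rng.standard_normal((3, 8))
c = rng.standard_normal((3, 4))
y = q_proj(x, c)  # equals x @ q_proj.W.T until B moves away from zero during training
```

In an attention block, one such layer would stand in for each of the q, k, and v projections; the rest of the U-Net is left as it was.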

arXiv:2403.17199 [pdf, other]

Extracting Social Support and Social Isolation Information from Clinical Psychiatry Notes: Comparing a Rule-based NLP System and a Large Language Model

Authors: Braja Gopal Patra, Lauren A. Lepow, Praneet Kasi Reddy Jagadeesh Kumar, Veer Vekaria, Mohit Manoj Sharma, Prakash Adekkanattu, Brian Fennessy, Gavin Hynes, Isotta Landi, Jorge A. Sanchez-Ruiz, Euijung Ryu, Joanna M. Biernacka, Girish N. Nadkarni, Ardesheer Talati, Myrna Weissman, Mark Olfson, J. John Mann, Alexander W. Charney, Jyotishman Pathak

Abstract: Background: Social support (SS) and social isolation (SI) are social determinants of health (SDOH) associated with psychiatric outcomes. In electronic health records (EHRs), individual-level SS/SI is typically documented as narrative clinical notes rather than structured coded data. Natural language processing (NLP) algorithms can automate the otherwise labor-intensive process of data extraction. Data and Methods: Psychiatric encounter notes from Mount Sinai Health System (MSHS, n=300) and Weill Cornell Medicine (WCM, n=225) were annotated to establish a gold standard corpus. A rule-based system (RBS) involving lexicons and a large language model (LLM) using FLAN-T5-XL were developed to identify mentions of SS and SI and their subcategories (e.g., social network, instrumental support, and loneliness). Results: For extracting SS/SI, the RBS obtained higher macro-averaged F-scores than the LLM at both MSHS (0.89 vs. 0.65) and WCM (0.85 vs. 0.82). For extracting subcategories, the RBS also outperformed the LLM at both MSHS (0.90 vs. 0.62) and WCM (0.82 vs. 0.81). Discussion and Conclusion: Unexpectedly, the RBS outperformed the LLM across all metrics. Intensive review demonstrates that this finding is due to the divergent approaches taken by the RBS and the LLM. The RBS was designed and refined to follow the same specific rules as the gold standard annotations. Conversely, the LLM was more inclusive with categorization and conformed to common English-language understanding.
Both approaches offer advantages and are made available open-source for future testing.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 2 figures, 3 tables
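The results in the abstract above are reported as macro-averaged F-scores. As a reminder of what that averaging does, here is a minimal sketch that computes F1 for each label and takes the unweighted mean, so rare and frequent labels count equally. The label names and toy predictions are made up, and the abstract does not state exactly which label set the authors averaged over.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 scores (each label counts equally)."""
    scores = []
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Hypothetical note-level labels: social support (SS), social isolation (SI), or neither.
gold = ["SS", "SS", "SI", "none"]
pred = ["SS", "SI", "SI", "none"]
print(round(macro_f1(gold, pred, ["SS", "SI", "none"]), 4))  # 0.7778
```

Here SS and SI each get F1 = 2/3 and "none" gets 1, so the macro average is 7/9.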

arXiv:2403.04616 [pdf, other]

Modeling reputation-based behavioral biases in school choice

Authors: Jon Kleinberg, Sigal Oren, Emily Ryu, Éva Tardos

Abstract: A fundamental component in the theoretical school choice literature is the problem a student faces in deciding which schools to apply to. Recent models have considered a set of schools of different selectiveness and a student who is unsure of their strength and can apply to at most $k$ schools. Such models assume that the student cares solely about maximizing the quality of the school that they attend, but experience suggests that students' decisions are also influenced by a set of behavioral biases based on reputational effects: a subjective reputational benefit when admitted to a selective school, whether or not they attend; and a subjective loss based on disappointment when rejected. Guided by these observations, and inspired by recent behavioral economics work on loss aversion relative to expectations, we propose a behavioral model by which a student chooses schools to balance these behavioral effects with the quality of the school they attend. Our main results show that a student's choices change in dramatic ways when these reputation-based behavioral biases are taken into account. In particular, where a rational applicant spreads their applications evenly, a biased student applies very sparsely to highly selective schools, such that above a certain threshold they apply to only an absolute constant number of schools even as their budget of applications grows to infinity.
Consequently, a biased student underperforms a rational student even when the rational student is restricted to a sufficiently large upper bound on applications and the biased student can apply to arbitrarily many. Our analysis shows that the reputation-based model is rich enough to cover a range of different ways that biased students cope with fear of rejection, including not just targeting less selective schools, but also occasionally applying to schools that are too selective, compared to rational students.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 22 pages, 8 figures

arXiv:2403.03937 [pdf, ps, other]

Settling the Competition Complexity of Additive Buyers over Independent Items

Authors: Mahsa Derakhshan, Emily Ryu, S. Matthew Weinberg, Eric Xue

Abstract: The competition complexity of an auction setting is the number of additional bidders needed such that the simple mechanism of selling items separately (with additional bidders) achieves greater revenue than the optimal but complex (randomized, prior-dependent, Bayesian-truthful) mechanism without the additional bidders. Our main result settles the competition complexity of $n$ bidders with additive values over $m < n$ independent items at $Θ(\sqrt{nm})$. The $O(\sqrt{nm})$ upper bound is due to [BW19], and our main result improves the prior lower bound of $Ω(\ln n)$ to $Ω(\sqrt{nm})$. Our main result follows from an explicit construction of a Bayesian IC auction for $n$ bidders with additive values over $m<n$ independent items drawn from the Equal Revenue curve truncated at $\sqrt{nm}$ ($\mathcal{ER}_{\le \sqrt{nm}}$), which achieves revenue that exceeds $\text{SRev}_{n+\sqrt{nm}}(\mathcal{ER}_{\le \sqrt{nm}}^m)$. Along the way, we show that the competition complexity of $n$ bidders with additive values over $m$ independent items is exactly equal to the minimum $c$ such that $\text{SRev}_{n+c}(\mathcal{ER}_{\le p}^m) \geq \text{Rev}_n(\mathcal{ER}_{\le p}^m)$ for all $p$ (that is, some truncated Equal Revenue witnesses the worst-case competition complexity).
Interestingly, we also show that the untruncated Equal Revenue curve does not witness the worst-case competition complexity when $n > m$: $\text{SRev}_n(\mathcal{ER}^m) = nm+O_m(\ln (n)) \leq \text{SRev}_{n+O_m(\ln (n))}(\mathcal{ER}^m)$, and therefore our result can only follow by considering all possible truncations.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 50 pages
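The construction above uses the Equal Revenue distribution truncated at $p$, written $\mathcal{ER}_{\le p}$. A minimal sketch of its defining property, assuming the standard form $\Pr[X \ge t] = 1/t$ for $1 \le t \le p$ with the leftover mass $1/p$ placed at $p$: a single bidder offered any take-it-or-leave-it price $t$ in that range pays $t \cdot (1/t) = 1$ in expectation. The sampler name is hypothetical, and this only illustrates the one-bidder, one-item case, not $\text{SRev}_n$ or $\text{Rev}_n$.

```python
import numpy as np

def sample_truncated_er(p, size, rng):
    """Equal Revenue distribution truncated at p: Pr[X >= t] = 1/t for 1 <= t <= p,
    with the remaining mass 1/p placed exactly at p (inverse-CDF sampling)."""
    u = 1.0 - rng.random(size)   # uniform on (0, 1], so 1/u is finite and >= 1
    return np.minimum(1.0 / u, p)

rng = np.random.default_rng(0)
x = sample_truncated_er(p=20.0, size=200_000, rng=rng)
for t in (1.0, 2.0, 5.0, 10.0, 20.0):
    # Expected revenue of posting price t to one bidder: t * Pr[X >= t], about 1 for every t.
    print(t, round(t * np.mean(x >= t), 3))
```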

arXiv:2402.11867 [pdf, other]

LoRA Training in the NTK Regime has No Spurious Local Minima

Authors: Uijeong Jang, Jason D. Lee, Ernest K. Ryu

Abstract: Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLM), but our theoretical understanding of LoRA has been limited. In this work, we theoretically analyze LoRA fine-tuning in the neural tangent kernel (NTK) regime with $N$ data points, showing: (i) full fine-tuning (without LoRA) admits a low-rank solution of rank $r\lesssim \sqrt{N}$; (ii) using LoRA with rank $r\gtrsim \sqrt{N}$ eliminates spurious local minima, allowing gradient descent to find the low-rank solutions; (iii) the low-rank solution found using LoRA generalizes well.

Submitted 28 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: 23 pages

arXiv:2310.18297 [pdf, other]

Image Clustering Conditioned on Text Criteria

Authors: Sehyun Kwon, Jaeseung Park, Minkyu Kim, Jaewoong Cho, Ernest K. Ryu, Kangwook Lee

Abstract: Classical clustering methods do not provide users with direct control of the clustering results, and the clustering results may not be consistent with the relevant criterion that a user has in mind. In this work, we present a new methodology for performing image clustering based on user-specified text criteria by leveraging modern vision-language models and large language models. We call our method Image Clustering Conditioned on Text Criteria (IC|TC), and it represents a different paradigm of image clustering. IC|TC requires a minimal and practical degree of human intervention and grants the user significant control over the clustering results in return. Our experiments show that IC|TC can effectively cluster images with various criteria, such as human action, physical location, or the person's mood, while significantly outperforming baselines.

Submitted 21 February, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

arXiv:2307.02770 [pdf, other]

Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback

Authors: TaeHo Yoon, Kibeom Myoung, Keon Lee, Jaewoong Cho, Albert No, Ernest K. Ryu

Abstract: Diffusion models have recently shown remarkable success in high-quality image generation. Sometimes, however, a pre-trained diffusion model exhibits partial misalignment in the sense that the model can generate good images but occasionally outputs undesirable ones. If so, we simply need to prevent the generation of the bad images; we call this task censoring. In this work, we present censored generation with a pre-trained diffusion model using a reward model trained on minimal human feedback. We show that censoring can be accomplished with extreme human feedback efficiency and that labels generated with a mere few minutes of human feedback are sufficient. Code available at: https://github.com/tetrzim/diffusion-human-feedback.

Submitted 30 October, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Published in NeurIPS 2023

arXiv:2305.16569 [pdf, ps, other]

Accelerating Value Iteration with Anchoring

Authors: Jongmin Lee, Ernest K. Ryu

Abstract: Value Iteration (VI) is foundational to the theory and practice of modern reinforcement learning, and it is known to converge at a $\mathcal{O}(γ^k)$-rate, where $γ$ is the discount factor. Surprisingly, however, the optimal rate for the VI setup was not known, and finding a general acceleration mechanism has been an open problem. In this paper, we present the first accelerated VI for both the Bellman consistency and optimality operators. Our method, called Anc-VI, is based on an \emph{anchoring} mechanism (distinct from Nesterov's acceleration), and it reduces the Bellman error faster than standard VI. In particular, Anc-VI exhibits a $\mathcal{O}(1/k)$-rate for $γ\approx 1$ or even $γ=1$, while standard VI has rate $\mathcal{O}(1)$ for $γ\ge 1-1/k$, where $k$ is the iteration count. We also provide a complexity lower bound matching the upper bound up to a constant factor of $4$, thereby establishing optimality of the accelerated rate of Anc-VI. Finally, we show that the anchoring mechanism provides the same benefit in the approximate VI and Gauss--Seidel VI setups as well.

Submitted 28 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Journal ref: Neural Information Processing Systems (NeurIPS) 2023
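The Anc-VI abstract describes an anchoring mechanism that pulls every iterate back toward the starting point. A minimal sketch for policy evaluation, assuming the anchored update $V^k = β_k V^0 + (1-β_k)\,T V^{k-1}$ with $T V = r + γ\mathcal{P}V$ and $β_k = 1/\sum_{i=0}^{k}γ^{-2i}$, which reduces to the Halpern-style weight $1/(k+1)$ at $γ = 1$. This weight schedule is our assumption; consult the paper for the exact anchor coefficients. The function names are illustrative.

```python
import numpy as np

def anchor_weight(k, gamma):
    """Assumed anchor weight beta_k = 1 / sum_{i=0}^{k} gamma**(-2 i), in closed form."""
    if gamma == 1.0:
        return 1.0 / (k + 1)  # Halpern-style weight
    # Multiply numerator and denominator of the geometric sum by gamma**(2k+2) for stability.
    return gamma ** (2 * k) * (1 - gamma ** 2) / (1 - gamma ** (2 * k + 2))

def anchored_vi(P, r, gamma, iters, V0=None):
    """Anchored policy evaluation: mix each Bellman update with the starting point V0."""
    V0 = np.zeros_like(r) if V0 is None else V0
    V = V0.copy()
    for k in range(1, iters + 1):
        beta = anchor_weight(k, gamma)
        V = beta * V0 + (1 - beta) * (r + gamma * P @ V)  # T V = r + gamma P V
    return V

rng = np.random.default_rng(1)
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(4)
gamma = 0.9
V_star = np.linalg.solve(np.eye(4) - gamma * P, r)
V = anchored_vi(P, r, gamma, 500)
bellman_error = np.abs(r + gamma * P @ V - V).max()  # the quantity Anc-VI's rate bounds
```

For $γ < 1$ the weights decay geometrically, so the anchor's pull fades and the iterate still converges to $V^\star$; the anchor matters most when $γ$ is at or near $1$.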

arXiv:2304.13995 [pdf, other]

Rotation and Translation Invariant Representation Learning with Implicit Neural Representations

Authors: Sehyun Kwon, Joo Young Choi, Ernest K. Ryu

Abstract: In many computer vision applications, images are acquired with arbitrary or random rotations and translations, and in such setups, it is desirable to obtain semantic representations disentangled from the image orientation. Examples of such applications include semiconductor wafer defect inspection, plankton microscope images, and inference on single-particle cryo-electron microscopy (cryo-EM) micrographs. In this work, we propose Invariant Representation Learning with Implicit Neural Representation (IRL-INR), which uses an implicit neural representation (INR) with a hypernetwork to obtain semantic representations disentangled from the orientation of the image. We show that IRL-INR can effectively learn disentangled semantic representations on more complex images compared to those considered in prior works and show that these semantic representations synergize well with SCAN to produce state-of-the-art unsupervised clustering results.

Submitted 12 June, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

arXiv:2302.03239 [pdf, ps, other]

Calibrated Recommendations for Users with Decaying Attention

Authors: Jon Kleinberg, Emily Ryu, Éva Tardos

Abstract: Recommendation systems capable of providing diverse sets of results are a focus of increasing importance, with motivations ranging from fairness to novelty and other aspects of optimizing user experience. One form of diversity of recent interest is calibration, the notion that personalized recommendations should reflect the full distribution of a user's interests, rather than a single predominant category -- for instance, a user who mainly reads entertainment news but also wants to keep up with news on the environment and the economy would prefer to see a mixture of these genres, not solely entertainment news. Existing work has formulated calibration as a subset selection problem; this line of work observes that the formulation requires the unrealistic assumption that all recommended items receive equal consideration from the user, but leaves as an open question the more realistic setting in which user attention decays as they move down the list of results. In this paper, we consider calibration with decaying user attention under two different models. In both models, there is a set of underlying genres that items can belong to. In the first setting, where items are represented by fine-grained mixtures of genre percentages, we provide a $(1-1/e)$-approximation algorithm by extending techniques for constrained submodular optimization. In the second setting, where items are coarsely binned into a single genre each, we surpass the $(1-1/e)$ barrier imposed by submodular maximization and give a $2/3$-approximate greedy algorithm.
Our work thus addresses the problem of capturing ordering effects due to decaying attention, allowing for the extension of near-optimal calibration from recommendation sets to recommendation lists.

Submitted 12 July, 2024; v1 submitted 6 February, 2023; originally announced February 2023.

Comments: 31 pages, 1 figure, 17th International Symposium on Algorithmic Game Theory (SAGT 2024). This paper incorporates and supersedes our earlier paper arXiv:2203.00233

arXiv:2203.00233 [pdf, ps, other]

Ordered Submodularity and its Applications to Diversifying Recommendations

Authors: Jon Kleinberg, Emily Ryu, Éva Tardos

Abstract: A fundamental task underlying many important optimization problems, from influence maximization to sensor placement to content recommendation, is to select the optimal group of $k$ items from a larger set. Submodularity has been very effective in allowing approximation algorithms for such subset selection problems. However, in several applications, we are interested not only in the elements of a set, but also the order in which they appear, breaking the assumption that all selected items receive equal consideration. One such category of applications involves the presentation of search results, product recommendations, news articles, and other content, due to the well-documented phenomenon that humans pay greater attention to higher-ranked items. As a result, optimization in content presentation for diversity, user coverage, calibration, or other objectives more accurately represents a sequence selection problem, to which traditional submodularity approximation results no longer apply. Although extensions of submodularity to sequences have been proposed, none is designed to model settings where items contribute based on their position in a ranked list, and hence they are not able to express these types of optimization problems. In this paper, we aim to address this modeling gap. Here, we propose a new formalism of ordered submodularity that captures these ordering problems in content presentation, and more generally a category of optimization problems over ranked sequences in which different list positions contribute differently to the objective function.
We analyze the natural ordered analogue of the greedy algorithm and show that it provides a $2$-approximation. We also show that this bound is tight, establishing that our new framework is conceptually and quantitatively distinct from previous formalisms of set and sequence submodularity.

Submitted 1 March, 2022; originally announced March 2022.

Comments: 17 pages
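The abstract above analyzes "the natural ordered analogue of the greedy algorithm". A minimal sketch of that idea: fill list positions one at a time, each time appending the item with the largest marginal gain for the sequence built so far. The objective below is a hypothetical toy (each genre counts once, weighted by the attention of the highest-ranked item covering it), chosen in the spirit of the decaying-attention setting; we do not verify that it satisfies the paper's formal definition of ordered submodularity, and the item, genre, and weight values are made up.

```python
def ordered_greedy(items, objective, k):
    """Build a ranked list position by position, appending the max-marginal-gain item."""
    seq = []
    for _ in range(k):
        candidates = [x for x in items if x not in seq]
        if not candidates:
            break
        # max() returns the first candidate among ties, so input order breaks ties.
        best = max(candidates, key=lambda x: objective(seq + [x]) - objective(seq))
        seq.append(best)
    return seq

ATTENTION = [1.0, 0.5, 0.25]  # hypothetical attention weight of list positions 0, 1, 2
GENRES = {"a": {"ent"}, "b": {"ent", "env"}, "c": {"econ"}, "d": {"env"}}
VALUE = {"ent": 3.0, "env": 2.0, "econ": 2.0}

def attention_coverage(seq):
    """Toy objective: each genre counts once, weighted by its best-placed covering item."""
    total = 0.0
    for genre, value in VALUE.items():
        weights = [ATTENTION[i] for i, item in enumerate(seq) if genre in GENRES[item]]
        if weights:
            total += value * max(weights)
    return total

print(ordered_greedy(list(GENRES), attention_coverage, k=3))  # ['b', 'c', 'a']
```

Item `b` wins the top slot by covering two genres at full attention; `c` then adds the only uncovered genre at half attention, and the last slot adds nothing, so the tie is broken by input order.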

arXiv:2202.11910 [pdf, other]

Robust Probabilistic Time Series Forecasting

Authors: TaeHo Yoon, Youngsuk Park, Ernest K. Ryu, Yuyang Wang

Abstract: Probabilistic time series forecasting has played a critical role in decision-making processes due to its capability to quantify uncertainties. Deep forecasting models, however, can be prone to input perturbations, and the notion of such perturbations, together with that of robustness, has not even been completely established in the regime of probabilistic forecasting. In this work, we propose a framework for robust probabilistic time series forecasting. First, we generalize the concept of adversarial input perturbations, based on which we formulate the concept of robustness in terms of bounded Wasserstein deviation. Then we extend the randomized smoothing technique to attain robust probabilistic forecasters with theoretical robustness certificates against certain classes of adversarial perturbations. Lastly, extensive experiments demonstrate that our methods are empirically effective in enhancing the forecast quality under additive adversarial attacks and the forecast consistency when supplemented with noisy observations.

Submitted 24 February, 2022; originally announced February 2022.

Comments: AISTATS 2022 camera ready version
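The abstract mentions extending randomized smoothing to probabilistic forecasters. The sketch below shows only the generic Monte Carlo idea under stated assumptions: perturb the input history with Gaussian noise, run the forecaster on each perturbed input, and pool the resulting sample paths, which draws from the mixture of the perturbed forecast distributions. The toy random-walk forecaster, the pooling representation, and all names are hypothetical; no robustness certificate is computed here.

```python
import numpy as np

def toy_forecaster(history, rng, n_paths, horizon=3):
    """Hypothetical probabilistic forecaster: Gaussian random walk from the last value."""
    steps = rng.normal(0.0, 1.0, size=(n_paths, horizon))
    return history[-1] + np.cumsum(steps, axis=1)

def smoothed_sample_paths(forecaster, history, sigma, n_noise, n_paths, rng):
    """Monte Carlo randomized smoothing: forecast from Gaussian-perturbed inputs and pool
    all sample paths (samples from the mixture of the perturbed forecasts)."""
    pooled = []
    for _ in range(n_noise):
        noisy_history = history + rng.normal(0.0, sigma, size=history.shape)
        pooled.append(forecaster(noisy_history, rng, n_paths))
    return np.concatenate(pooled, axis=0)

rng = np.random.default_rng(0)
history = np.array([0.0, 1.0, 2.0])
paths = smoothed_sample_paths(toy_forecaster, history, sigma=0.5, n_noise=400, n_paths=50, rng=rng)
print(paths.shape)  # (20000, 3)
```

For this toy forecaster, the first-step distribution of the smoothed forecast is centered at the last observation with variance $1 + σ^2 = 1.25$: the input noise widens the forecast slightly in exchange for less sensitivity to any single perturbed input.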

arXiv:2202.02981 [pdf, other]

Neural Tangent Kernel Analysis of Deep Narrow Neural Networks

Authors: Jongmin Lee, Joo Young Choi, Ernest K. Ryu, Albert No

Abstract: The tremendous recent progress in analyzing the training dynamics of overparameterized neural networks has primarily focused on wide networks and therefore does not sufficiently address the role of depth in deep learning. In this work, we present the first trainability guarantee of infinitely deep but narrow neural networks. We study the infinite-depth limit of a multilayer perceptron (MLP) with a specific initialization and establish a trainability guarantee using the NTK theory. We then extend the analysis to an infinitely deep convolutional neural network (CNN) and perform brief experiments.

Submitted 27 June, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Journal ref: Published in International Conference on Machine Learning, 2022

arXiv:2201.09077 [pdf, other]

LTC-GIF: Attracting More Clicks on Feature-length Sports Videos

Authors: Ghulam Mujtaba, Jaehyuk Choi, Eun-Seok Ryu

Abstract: This paper proposes a lightweight method to attract users and increase views of the video by presenting personalized artistic media, i.e., static thumbnails and animated GIFs. This method analyzes lightweight thumbnail containers (LTC) using computational resources of the client device to recognize personalized events from full-length sports videos. In addition, instead of processing the entire video, small video segments are processed to generate artistic media. This makes the proposed approach more computationally efficient compared to the baseline approaches that create artistic media using the entire video. The proposed method retrieves and uses thumbnail containers and video segments, which reduces the required transmission bandwidth as well as the amount of locally stored data used during artistic media generation. In extensive experiments on the Nvidia Jetson TX2, the computational complexity of the proposed method was 3.57 times lower than that of the state-of-the-art (SoA) method. In the qualitative assessment, GIFs generated using the proposed method received 1.02 higher overall ratings compared to the SoA method. To the best of our knowledge, this is the first technique that uses LTC to generate artistic media while providing lightweight and high-performance services even on resource-constrained devices.

Submitted 22 January, 2022; originally announced January 2022.

arXiv:2201.09049 [pdf, other]

doi 10.1109/ACCESS.2022.3209275

LTC-SUM: Lightweight Client-driven Personalized Video Summarization Framework Using 2D CNN

Authors: Ghulam Mujtaba, Adeel Malik, Eun-Seok Ryu

Abstract: This paper proposes a novel lightweight thumbnail container-based summarization (LTC-SUM) framework for full feature-length videos. This framework generates a personalized keyshot summary for concurrent users by using the computational resources of the end-user device. State-of-the-art methods that acquire and process entire video data to generate video summaries are highly computationally intensive. In this regard, the proposed LTC-SUM method uses lightweight thumbnails to handle the complex process of detecting events. This significantly reduces computational complexity and improves communication and storage efficiency by resolving computational and privacy bottlenecks in resource-constrained end-user devices. These improvements were achieved by designing a lightweight 2D CNN model to extract features from thumbnails, which helped select and retrieve only a handful of specific segments. Extensive quantitative experiments on a set of 18 full feature-length videos (approximately 32.9 h in duration) showed that the proposed method is significantly more computationally efficient than state-of-the-art methods on the same end-user device configurations. Joint qualitative assessments of the results of 56 participants showed that participants gave higher ratings to the summaries generated using the proposed method. To the best of our knowledge, this is the first attempt at designing a fully client-driven personalized keyshot video summarization framework using thumbnail containers for feature-length videos.

Submitted 4 October, 2022; v1 submitted 22 January, 2022; originally announced January 2022.

Comments: 14

Journal ref: in IEEE Access, vol. 10, pp. 103041-103055, 2022

arXiv:2112.09379 [pdf]

Enhanced Frame and Event-Based Simulator and Event-Based Video Interpolation Network

Authors: Adam Radomski, Andreas Georgiou, Thomas Debrunner, Chenghan Li, Luca Longinotti, Minwon Seo, Moosung Kwak, Chang-Woo Shin, Paul K. J. Park, Hyunsurk Eric Ryu, Kynan Eng

Abstract: Fast neuromorphic event-based vision sensors (Dynamic Vision Sensor, DVS) can be combined with slower conventional frame-based sensors to enable higher-quality inter-frame interpolation than traditional methods relying on fixed motion approximations using, e.g., optical flow. In this work, we present a new, advanced event simulator that can produce realistic scenes recorded by a camera rig with an arbitrary number of sensors located at fixed offsets. It includes a new configurable frame-based image sensor model with realistic image quality reduction effects, and an extended DVS model with more accurate characteristics. We use our simulator to train a novel reconstruction model designed for end-to-end reconstruction of high-fps video. Unlike previously published methods, our method does not require the frame and DVS cameras to have the same optics, positions, or camera resolutions. It is also not limited to objects at a fixed distance from the sensor. We show that data generated by our simulator can be used to train our new model, leading to reconstructed images on public datasets of equivalent or better quality than the state of the art. We also show our sensor generalizing to data recorded by real sensors.

Submitted 17 December, 2021; originally announced December 2021.

Comments: 10 pages, 19 figures
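
The interpolation setting above relies on DVS events. As an illustration only, the sketch below uses the common idealized DVS model, in which a pixel fires one event each time its log intensity moves by a fixed contrast threshold away from its value at the last event. It is not the paper's extended DVS model, and the function name `dvs_events` and the parameters `threshold` and `eps` are hypothetical.

```python
import numpy as np

def dvs_events(frames, threshold=0.2, eps=1e-3):
    """Idealized DVS: emit (t, y, x, polarity) whenever log intensity
    changes by a multiple of `threshold` since the pixel's last event."""
    ref = np.log(frames[0] + eps)  # per-pixel reference log intensity
    events = []
    for t in range(1, len(frames)):
        diff = np.log(frames[t] + eps) - ref
        # number of whole threshold crossings at each pixel
        n = np.floor(np.abs(diff) / threshold).astype(int)
        for y, x in zip(*np.nonzero(n)):
            pol = 1 if diff[y, x] > 0 else -1
            count = int(n[y, x])
            events.extend([(t, int(y), int(x), pol)] * count)
            # move the reference by the amount that was "spent" on events
            ref[y, x] += pol * count * threshold
    return events
```

For example, a pixel that jumps from 0.5 to 1.0 changes its log intensity by about 0.69, which yields three positive events at a threshold of 0.2; the remainder stays in the reference and contributes to later events.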

arXiv:2104.09644 [pdf]

Neural Language Models with Distant Supervision to Identify Major Depressive Disorder from Clinical Notes

Authors: Bhavani Singh Agnikula Kshatriya, Nicolas A Nunez, Manuel Gardea-Resendez, Euijung Ryu, Brandon J Coombes, Sunyang Fu, Mark A Frye, Joanna M Biernacka, Yanshan Wang

Abstract: Major depressive disorder (MDD) is a prevalent psychiatric disorder that is associated with significant healthcare burden worldwide. Phenotyping of MDD can help early diagnosis and consequently may have significant advantages in patient management. In prior research, MDD phenotypes have been extracted from structured Electronic Health Records (EHR) or from Electroencephalographic (EEG) data using traditional machine learning models. However, MDD phenotypic information is also documented in free-text EHR data, such as clinical notes. While clinical notes may provide more accurate phenotyping information, natural language processing (NLP) algorithms must be developed to abstract such information. Recent advancements in NLP have produced state-of-the-art neural language models, such as the Bidirectional Encoder Representations from Transformers (BERT) model, a transformer-based model that can be pre-trained on a corpus of unlabeled text and then fine-tuned on specific tasks. However, such neural language models have been underutilized in clinical NLP tasks due to the lack of large training datasets. In the literature, researchers have used the distant supervision paradigm to train machine learning models on clinical text classification tasks to mitigate the lack of annotated training data. It is still unknown whether this paradigm is effective for neural language models.
In this paper, we propose to leverage neural language models in a distant supervision paradigm to identify MDD phenotypes from clinical notes. The experimental results indicate that our proposed approach is effective in identifying MDD phenotypes and that Bio-Clinical BERT, a BERT model specific to clinical data, achieved the best performance in comparison with conventional machine learning models.

Submitted 19 April, 2021; originally announced April 2021.

arXiv:2102.07541 [pdf, other]

WGAN with an Infinitely Wide Generator Has No Spurious Stationary Points

Authors: Albert No, TaeHo Yoon, Sehyun Kwon, Ernest K. Ryu

Abstract: Generative adversarial networks (GAN) are a widely used class of deep generative models, but their minimax training dynamics are not understood very well. In this work, we show that GANs with a 2-layer infinite-width generator and a 2-layer finite-width discriminator trained with stochastic gradient ascent-descent have no spurious stationary points. We then show that when the width of the generator is finite but wide, there are no spurious stationary points within a ball whose radius becomes arbitrarily large (to cover the entire parameter space) as the width goes to infinity.

Submitted 9 June, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

Comments: Published at ICML 2021

arXiv:1906.12141 [pdf, other]

MGOS: A Library for Molecular Geometry and its Operating System

Authors: Deok-Soo Kim, Joonghyun Ryu, Youngsong Cho, Mokwon Lee, Jehyun Cha, Chanyoung Song, Sangwha Kim, Roman A Laskowski, Kokichi Sugihara, Jong Bhak, Seong Eon Ryu

Abstract: The geometry of atomic arrangement underpins the structural understanding of molecules in many fields. However, no general framework of mathematical/computational theory for the geometry of atomic arrangement exists. Here we present "Molecular Geometry (MG)" as a theoretical framework accompanied by "MG Operating System (MGOS)" which consists of callable functions implementing the MG theory. MG allows researchers to model complicated molecular structure problems in terms of elementary yet standard notions of volume, area, etc. and MGOS frees them from the hard and tedious task of developing/implementing geometric algorithms so that they can focus more on their primary research issues. MG facilitates simpler modeling of molecular structure problems; MGOS functions can be conveniently embedded in application programs for the efficient and accurate solution of geometric queries involving atomic arrangements. The use of MGOS in problems involving spherical entities is akin to the use of math libraries in general purpose programming languages in science and engineering.

Submitted 28 June, 2019; originally announced June 2019.

arXiv:1905.10899 [pdf, other]

ODE Analysis of Stochastic Gradient Methods with Optimism and Anchoring for Minimax Problems

Authors: Ernest K. Ryu, Kun Yuan, Wotao Yin

Abstract: Despite remarkable empirical success, the training dynamics of generative adversarial networks (GANs), which involve solving a minimax game using stochastic gradients, are still poorly understood. In this work, we analyze last-iterate convergence of simultaneous gradient descent (simGD) and its variants under the assumption of convex-concavity, guided by a continuous-time analysis with differential equations. First, we show that simGD, as is, converges with stochastic subgradients under strict convexity in the primal variable. Second, we generalize optimistic simGD to accommodate an optimism rate separate from the learning rate and show its convergence with full gradients. Finally, we present anchored simGD, a new method, and show convergence with stochastic subgradients.

Submitted 11 October, 2020; v1 submitted 26 May, 2019; originally announced May 2019.
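
To make the difference between simGD and optimistic simGD concrete, the sketch below runs both on the bilinear game f(x, y) = xy, where plain simGD is known to spiral outward. It assumes one common form of the optimistic update with a separate optimism rate β, z_{k+1} = z_k - α G(z_k) - β (G(z_k) - G(z_{k-1})), using full gradients. The step sizes and the function names `sim_gd` and `optimistic_sim_gd` are illustrative assumptions, not the paper's exact scheme or step-size conditions, and anchored simGD is not shown.

```python
import numpy as np

def G(z):
    # Descent direction operator for f(x, y) = x * y:
    # gradient in x (to minimize) and negated gradient in y (to maximize).
    x, y = z
    return np.array([y, -x])

def sim_gd(z0, alpha, iters):
    # Simultaneous gradient descent-ascent on both players.
    z = np.array(z0, dtype=float)
    for _ in range(iters):
        z = z - alpha * G(z)
    return z

def optimistic_sim_gd(z0, alpha, beta, iters):
    # Optimistic simGD with learning rate alpha and optimism rate beta.
    z = np.array(z0, dtype=float)
    g_prev = G(z)  # treat z_{-1} = z_0, so the first step is plain simGD
    for _ in range(iters):
        g = G(z)
        z = z - alpha * g - beta * (g - g_prev)
        g_prev = g
    return z
```

On this game each simGD step multiplies the distance to the saddle point (0, 0) by sqrt(1 + α²), so it diverges for any α > 0, whereas with α = β = 0.1 the optimistic correction pulls the iterates toward (0, 0).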

arXiv:1905.05406 [pdf, other]

Plug-and-Play Methods Provably Converge with Properly Trained Denoisers

Authors: Ernest K. Ryu, Jialin Liu, Sicheng Wang, Xiaohan Chen, Zhangyang Wang, Wotao Yin

Abstract: Plug-and-play (PnP) is a non-convex framework that integrates modern denoising priors, such as BM3D or deep learning-based denoisers, into ADMM or other proximal algorithms. An advantage of PnP is that one can use pre-trained denoisers when there is not sufficient data for end-to-end training. Although PnP has been recently studied extensively with great empirical success, theoretical analysis addressing even the most basic question of convergence has been insufficient. In this paper, we theoretically establish convergence of PnP-FBS and PnP-ADMM, without using diminishing stepsizes, under a certain Lipschitz condition on the denoisers. We then propose real spectral normalization, a technique for training deep learning-based denoisers to satisfy the proposed Lipschitz condition. Finally, we present experimental results validating the theory.

Submitted 14 May, 2019; originally announced May 2019.

Comments: Published in the International Conference on Machine Learning, 2019
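
PnP-FBS replaces the proximal step of forward-backward splitting with a denoiser D, giving x_{k+1} = D(x_k - α ∇f(x_k)). The sketch below assumes a simple denoising data term f(x) = ½‖x - y‖² and uses a 3-tap moving average as a toy stand-in for BM3D or a trained network. The names `smooth_denoiser` and `pnp_fbs` are hypothetical, and the toy denoiser is chosen only because it is nonexpansive, which makes the iteration a contraction here; the sketch does not implement the paper's Lipschitz condition or real spectral normalization.

```python
import numpy as np

def smooth_denoiser(v):
    # Toy stand-in for a learned denoiser: a nonexpansive 3-tap moving average.
    return np.convolve(v, [0.25, 0.5, 0.25], mode="same")

def pnp_fbs(y, denoiser, alpha=0.5, iters=200):
    """Plug-and-play forward-backward splitting for f(x) = 0.5 * ||x - y||^2."""
    x = y.copy()
    step = np.inf
    for _ in range(iters):
        grad = x - y                           # forward (gradient) step on f
        x_new = denoiser(x - alpha * grad)     # denoiser replaces the prox step
        step = np.linalg.norm(x_new - x)       # size of the last update
        x = x_new
    return x, step
```

With α = 0.5 the forward step x ↦ (1 - α)x + αy has Lipschitz constant 0.5, and the moving average has operator norm at most 1, so the composed map contracts by at least a factor of 0.5 per iteration and converges to a unique fixed point.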

arXiv:1511.05498 [pdf, ps, other]

Feasibility Study of Stochastic Streaming with 4K UHD Video Traces

Authors: Joongheon Kim, Eun-Seok Ryu

Abstract: This paper studies the feasibility of stochastic video streaming algorithms with up-to-date 4K ultra-high-definition (UHD) video traces. In previous work, various stochastic video streaming algorithms were proposed that maximize time-average video streaming quality subject to queue stability, based on queue-backlog length information. The performance improvements of these stochastic algorithms were verified with traditional MPEG test sequences, but there has been no study of how much better they perform on up-to-date 4K UHD video traces. This paper therefore evaluates the stochastic streaming algorithms with 4K UHD video traces and verifies that they perform better than queue-independent algorithms, as expected.

Submitted 17 November, 2015; originally announced November 2015.

Comments: Presented at the International Conference on ICT Convergence (ICTC), Jeju Island, Korea, 28 - 30 October 2015
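
Queue-backlog-based streaming algorithms of this kind are commonly formulated with Lyapunov drift-plus-penalty control. The sketch below is a generic version of that idea, not the specific algorithms evaluated in the paper: in each slot it picks the quality mode that maximizes V·quality - Q·bits, where Q is the current queue backlog. The mode table `MODES`, the Poisson channel service, the weight `V`, and the function names are hypothetical, and a queue-independent policy that always streams at maximum quality is included for comparison.

```python
import numpy as np

# Hypothetical per-slot streaming modes: (quality score, bits added to the queue)
MODES = [(1.0, 2.0), (2.0, 4.0), (3.0, 8.0)]

def drift_plus_penalty(service, slots, V, rng):
    """Queue-aware policy: trade quality against current backlog Q."""
    Q = 0.0
    total_quality = 0.0
    for _ in range(slots):
        # maximize V * quality - Q * bits (larger V favors quality)
        quality, bits = max(MODES, key=lambda m: V * m[0] - Q * m[1])
        total_quality += quality
        Q = max(Q + bits - rng.poisson(service), 0.0)
    return total_quality / slots, Q

def max_quality(service, slots, rng):
    """Queue-independent policy: always stream the highest-quality mode."""
    Q = 0.0
    quality, bits = MODES[-1]
    for _ in range(slots):
        Q = max(Q + bits - rng.poisson(service), 0.0)
    return quality, Q
```

With a mean service of 5 bits per slot, the maximum-quality policy adds 8 bits per slot and its backlog grows without bound, while the queue-aware rule drops to cheaper modes once Q passes the switching points (2.5 and 5 when V = 10) and keeps the backlog small.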

arXiv:1208.2239 [pdf, other]

Stochastic Kronecker Graph on Vertex-Centric BSP

Authors: Ernest Ryu, Sean Choi

Abstract: Recently, Stochastic Kronecker Graph (SKG), a network generation model, and vertex-centric BSP, a graph processing framework like Pregel, have attracted much attention in the network analysis community. Unfortunately, the two are not well suited to each other, so an implementation of SKG on vertex-centric BSP must be done either serially or in an unnatural manner. In this paper, we present a new network generation model, which we call Poisson Stochastic Kronecker Graph (PSKG), that generates edges according to the Poisson distribution. The advantage of PSKG is that it is easily parallelizable on vertex-centric BSP, requires no communication between computational nodes, and yet retains all the desired properties of SKG.

Submitted 10 August, 2012; originally announced August 2012.
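
The abstract states only that PSKG generates edges according to the Poisson distribution. The sketch below assumes one plausible construction: with P the k-fold Kronecker power of a 2×2 initiator matrix `theta`, each ordered pair (u, v) independently receives a Poisson(P[u, v]) number of edges. This is an illustrative assumption, not the paper's exact model, and the names `pskg_edges_for_vertex` and `pskg` are hypothetical. Each vertex uses its own random stream and never reads another vertex's state, which mirrors the no-communication property claimed for vertex-centric BSP.

```python
import numpy as np

def pskg_edges_for_vertex(u, theta, k, rng):
    """Sample the out-edges of vertex u in a 2**k-vertex Poisson Kronecker graph."""
    bits = [(u >> (k - 1 - level)) & 1 for level in range(k)]  # MSB first
    row_sums = theta.sum(axis=1)
    # Row sum of the Kronecker power = product of initiator row sums over u's bits.
    mean_degree = np.prod([row_sums[b] for b in bits])
    degree = rng.poisson(mean_degree)
    edges = []
    for _ in range(degree):
        # Pick the target bit by bit with probability theta[b, j] / row_sums[b].
        v = 0
        for b in bits:
            v = (v << 1) | int(rng.choice(2, p=theta[b] / row_sums[b]))
        edges.append((u, v))
    return edges

def pskg(theta, k, seed=0):
    edges = []
    for u in range(2 ** k):
        rng = np.random.default_rng([seed, u])  # independent stream per vertex
        edges.extend(pskg_edges_for_vertex(u, theta, k, rng))
    return edges
```

The design relies on a standard identity: independent Poisson counts over all targets v are equivalent to drawing a Poisson total with mean equal to the row sum and then distributing it multinomially in proportion to P[u, v], so each vertex never materializes its 2^k-entry row. As a consequence, this sketch can produce multi-edges and self-loops.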
