subscribe to arXiv mailings

Purrfect Pitch: Exploring Musical Interval Learning through Multisensory Interfaces

Authors: Sam Chin, Cathy Mengying Fang, Nikhil Singh, Ibrahim Ibrahim, Joe Paradiso, Pattie Maes

Abstract: We introduce Purrfect Pitch, a system consisting of a wearable haptic device and a custom-designed learning interface for musical ear training. We focus on the ability to identify musical intervals (sequences of two musical notes), which is a perceptually ambiguous task that usually requires strenuous rote training. With our system, the user would hear a sequence of two tones while simultaneously… ▽ More We introduce Purrfect Pitch, a system consisting of a wearable haptic device and a custom-designed learning interface for musical ear training. We focus on the ability to identify musical intervals (sequences of two musical notes), which is a perceptually ambiguous task that usually requires strenuous rote training. With our system, the user would hear a sequence of two tones while simultaneously receiving two corresponding vibrotactile stimuli on the back. Providing haptic feedback along the back makes the auditory distance between the two tones more salient, and the back-worn design is comfortable and unobtrusive. During training, the user receives multi-sensory feedback from our system and inputs their guessed interval value on our web-based learning interface. They see a green (otherwise red) screen for a correct guess with the correct interval value. Our study with 18 participants shows that our system enables novice learners to identify intervals more accurately and consistently than those who only received audio feedback, even after the haptic feedback is removed. We also share further insights on how to design a multisensory learning system. △ Less

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2403.00026 [pdf, other]

Learning to Deliver: a Foundation Model for the Montreal Capacitated Vehicle Routing Problem

Authors: Samuel J. K. Chin, Matthias Winkenbach, Akash Srivastava

Abstract: In this paper, we present the Foundation Model for the Montreal Capacitated Vehicle Routing Problem (FM-MCVRP), a novel Deep Learning (DL) model that approximates high-quality solutions to a variant of the Capacitated Vehicle Routing Problem (CVRP) that characterizes many real-world applications. The so-called Montreal Capacitated Vehicle Routing Problem (MCVRP), first formally described by Bengio… ▽ More In this paper, we present the Foundation Model for the Montreal Capacitated Vehicle Routing Problem (FM-MCVRP), a novel Deep Learning (DL) model that approximates high-quality solutions to a variant of the Capacitated Vehicle Routing Problem (CVRP) that characterizes many real-world applications. The so-called Montreal Capacitated Vehicle Routing Problem (MCVRP), first formally described by Bengio et al. (2021), is defined on a fixed and finite graph, which is analogous to a city. Each MCVRP instance is essentially the sub-graph connecting a randomly sampled subset of the nodes in the fixed graph, which represent a set of potential addresses in a real-world delivery problem on a given day. Our work exploits this problem structure to frame the MCVRP as an analogous Natural Language Processing (NLP) task. Specifically, we leverage a Transformer architecture embedded in a Large Language Model (LLM) framework to train our model in a supervised manner on computationally inexpensive, sub-optimal MCVRP solutions obtained algorithmically. Through comprehensive computational experiments, we show that FM-MCVRP produces better MCVRP solutions than the training data and generalizes to larger sized problem instances not seen during training. Even when compared to near-optimal solutions from state-of-the-art heuristics, FM-MCVRP yields competitive results despite being trained on inferior data. For instance, for 400-customer problems, FM-MCVRP solutions on average fall within 2% of the benchmark. Our results further demonstrate that unlike prior works in the literature, FM-MCVRP is a unified model, which performs consistently and reliably on a range of problem instance sizes and parameter values such as the vehicle capacity. △ Less

Submitted 28 February, 2024; originally announced March 2024.

arXiv:2402.04020 [pdf]

doi 10.1177/03611981231170632

Examining Rail Transportation Route of Crude Oil in the United States Using Crowdsourced Social Media Data

Authors: Yuandong Liu, Majbah Uddin, Shih-Miao Chin, Ho-Ling Hwang, Jiaoli Chen

Abstract: Safety issues associated with transporting crude oil by rail have been a concern since the boom of US domestic shale oil production in 2012. During the last decade, over 300 crude oil by rail incidents have occurred in the US. Some of them have caused adverse consequences including fire and hazardous materials leakage. However, only limited information on the routes of crude-on-rail and their asso… ▽ More Safety issues associated with transporting crude oil by rail have been a concern since the boom of US domestic shale oil production in 2012. During the last decade, over 300 crude oil by rail incidents have occurred in the US. Some of them have caused adverse consequences including fire and hazardous materials leakage. However, only limited information on the routes of crude-on-rail and their associated risks is available to the public. To this end, this study proposes an unconventional way to reconstruct the crude-on-rail routes using geotagged photos harvested from the Flickr website. The proposed method links the geotagged photos of crude oil trains posted online with national railway networks to identify potential railway segments those crude oil trains were traveling on. A shortest path-based method was then applied to infer the complete crude-on-rail routes, by utilizing the confirmed railway segments as well as their movement direction information. Validation of the inferred routes was performed using a public map and official crude oil incident data. Results suggested that the inferred routes based on geotagged photos have high coverage, with approximately 96% of the documented crude oil incidents aligned with the reconstructed crude-on-rail network. The inferred crude oil train routes were found to pass through many metropolitan areas with dense populations, who are exposed to potential risk. This finding could improve situation awareness for policymakers and transportation planners. In addition, with the inferred routes, this study establishes a good foundation for future crude oil train risk analysis along the rail route. △ Less

Submitted 12 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Journal ref: Transportation Research Record, 2678(1), 218-228 (2024)

arXiv:2402.00654 [pdf]

doi 10.1016/j.eswa.2023.122478

Improving the accuracy of freight mode choice models: A case study using the 2017 CFS PUF data set and ensemble learning techniques

Authors: Diyi Liu, Hyeonsup Lim, Majbah Uddin, Yuandong Liu, Lee D. Han, Ho-ling Hwang, Shih-Miao Chin

Abstract: The US Census Bureau has collected two rounds of experimental data from the Commodity Flow Survey, providing shipment-level characteristics of nationwide commodity movements, published in 2012 (i.e., Public Use Microdata) and in 2017 (i.e., Public Use File). With this information, data-driven methods have become increasingly valuable for understanding detailed patterns in freight logistics. In thi… ▽ More The US Census Bureau has collected two rounds of experimental data from the Commodity Flow Survey, providing shipment-level characteristics of nationwide commodity movements, published in 2012 (i.e., Public Use Microdata) and in 2017 (i.e., Public Use File). With this information, data-driven methods have become increasingly valuable for understanding detailed patterns in freight logistics. In this study, we used the 2017 Commodity Flow Survey Public Use File data set to explore building a high-performance freight mode choice model, considering three main improvements: (1) constructing local models for each separate commodity/industry category; (2) extracting useful geographical features, particularly the derived distance of each freight mode between origin/destination zones; and (3) applying additional ensemble learning methods such as stacking or voting to combine results from local and unified models for improved performance. The proposed method achieved over 92% accuracy without incorporating external information, an over 19% increase compared to directly fitting Random Forests models over 10,000 samples. Furthermore, SHAP (Shapely Additive Explanations) values were computed to explain the outputs and major patterns obtained from the proposed model. The model framework could enhance the performance and interpretability of existing freight mode choice models. △ Less

Submitted 12 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Journal ref: Expert Systems with Applications, 240, 122478 (2024)

arXiv:2211.02681 [pdf, ps, other]

Deep Distance Sensitivity Oracles

Authors: Davin Jeong, Allison Gunby-Mann, Sarel Cohen, Maximilian Katzmann, Chau Pham, Arnav Bhakta, Tobias Friedrich, Sang Chin

Abstract: One of the most fundamental graph problems is finding a shortest path from a source to a target node. While in its basic forms the problem has been studied extensively and efficient algorithms are known, it becomes significantly harder as soon as parts of the graph are susceptible to failure. Although one can recompute a shortest replacement path after every outage, this is rather inefficient both… ▽ More One of the most fundamental graph problems is finding a shortest path from a source to a target node. While in its basic forms the problem has been studied extensively and efficient algorithms are known, it becomes significantly harder as soon as parts of the graph are susceptible to failure. Although one can recompute a shortest replacement path after every outage, this is rather inefficient both in time and/or storage. One way to overcome this problem is to shift computational burden from the queries into a pre-processing step, where a data structure is computed that allows for fast querying of replacement paths, typically referred to as a Distance Sensitivity Oracle (DSO). While DSOs have been extensively studied in the theoretical computer science community, to the best of our knowledge this is the first work to construct DSOs using deep learning techniques. We show how to use deep learning to utilize a combinatorial structure of replacement paths. More specifically, we utilize the combinatorial structure of replacement paths as a concatenation of shortest paths and use deep learning to find the pivot nodes for stitching shortest paths into replacement paths. △ Less

Submitted 18 October, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2007.11495 by other authors

arXiv:2205.07795 [pdf, other]

Referring Expressions with Rational Speech Act Framework: A Probabilistic Approach

Authors: Hieu Le, Taufiq Daryanto, Fabian Zhafransyah, Derry Wijaya, Elizabeth Coppock, Sang Chin

Abstract: This paper focuses on a referring expression generation (REG) task in which the aim is to pick out an object in a complex visual scene. One common theoretical approach to this problem is to model the task as a two-agent cooperative scheme in which a `speaker' agent would generate the expression that best describes a targeted area and a `listener' agent would identify the target. Several recent REG… ▽ More This paper focuses on a referring expression generation (REG) task in which the aim is to pick out an object in a complex visual scene. One common theoretical approach to this problem is to model the task as a two-agent cooperative scheme in which a `speaker' agent would generate the expression that best describes a targeted area and a `listener' agent would identify the target. Several recent REG systems have used deep learning approaches to represent the speaker/listener agents. The Rational Speech Act framework (RSA), a Bayesian approach to pragmatics that can predict human linguistic behavior quite accurately, has been shown to generate high quality and explainable expressions on toy datasets involving simple visual scenes. Its application to large scale problems, however, remains largely unexplored. This paper applies a combination of the probabilistic RSA framework and deep learning approaches to larger datasets involving complex visual scenes in a multi-step process with the aim of generating better-explained expressions. We carry out experiments on the RefCOCO and RefCOCO+ datasets and compare our approach with other end-to-end deep learning approaches as well as a variation of RSA to highlight our key contribution. Experimental results show that while achieving lower accuracy than SOTA deep learning methods, our approach outperforms similar RSA approach in human comprehension and has an advantage over end-to-end deep learning under limited data scenario. Lastly, we provide a detailed analysis on the expression generation process with concrete examples, thus providing a systematic view on error types and deficiencies in the generation process and identifying possible areas for future improvements. △ Less

Submitted 16 May, 2022; originally announced May 2022.

arXiv:2202.08490 [pdf]

Dynamic Object Comprehension: A Framework For Evaluating Artificial Visual Perception

Authors: Scott Y. L. Chin, Bradley R. Quinton

Abstract: Augmented and Mixed Reality are emerging as likely successors to the mobile internet. However, many technical challenges remain. One of the key requirements of these systems is the ability to create a continuity between physical and virtual worlds, with the user's visual perception as the primary interface medium. Building this continuity requires the system to develop a visual understanding of th… ▽ More Augmented and Mixed Reality are emerging as likely successors to the mobile internet. However, many technical challenges remain. One of the key requirements of these systems is the ability to create a continuity between physical and virtual worlds, with the user's visual perception as the primary interface medium. Building this continuity requires the system to develop a visual understanding of the physical world. While there has been significant recent progress in computer vision and AI techniques such as image classification and object detection, success in these areas has not yet led to the visual perception required for these critical MR and AR applications. A significant issue is that current evaluation criteria are insufficient for these applications. To motivate and evaluate progress in this emerging area, there is a need for new metrics. In this paper we outline limitations of current evaluation criteria and propose new criteria. △ Less

Submitted 17 February, 2022; originally announced February 2022.

Comments: Submitted to IEEE International Conference in Image Processing 2022

arXiv:2009.09632 [pdf]

Detecting Sound Events Using Convolutional Macaron Net With Pseudo Strong Labels

Authors: Teck Kai Chan, Cheng Siong Chin

Abstract: In this paper, we propose addressing the lack of strongly labeled data by using pseudo strongly labeled data approximated using Convolutive Nonnegative Matrix Factorization. Using this set of data, we then train a novel architecture called the Convolutional Macaron Net (CMN), which combines Convolutional Neural Network (CNN) with MN, in a semi-supervised manner. Instead of training only a single m… ▽ More In this paper, we propose addressing the lack of strongly labeled data by using pseudo strongly labeled data approximated using Convolutive Nonnegative Matrix Factorization. Using this set of data, we then train a novel architecture called the Convolutional Macaron Net (CMN), which combines Convolutional Neural Network (CNN) with MN, in a semi-supervised manner. Instead of training only a single model or using the Mean-teacher approach, we train two different CMNs synchronously using a curriculum consistency cost and a curriculum interpolated consistency cost. In the inference stage, one of the models will provide the frame-level prediction while the other model will provide the clip-level prediction. Our system outperforms the baseline system of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge Task 4 by a margin of over 10% based on our proposed framework. By comparing with the top submission of the DCASE 2019 challenge, our system accuracy is also higher by 1.8%. On the other hand, as compared to the top submission of DCASE 2020, our accuracy is also marginally higher by 0.3%, even with fewer Transformer encoding layers. Our system remains robust on unseen YouTube evaluation dataset and has a winning margin of 0.6% and 6.3% against the top submission of DCASE 2019 and the baseline system. △ Less

Submitted 2 August, 2021; v1 submitted 21 September, 2020; originally announced September 2020.

Comments: Updated

arXiv:2002.07065 [pdf]

Acoustic Scene Classification Using Bilinear Pooling on Time-liked and Frequency-liked Convolution Neural Network

Authors: Xing Yong Kek, Cheng Siong Chin, Ye Li

Abstract: The current methodology in tackling Acoustic Scene Classification (ASC) task can be described in two steps, preprocessing of the audio waveform into log-mel spectrogram and then using it as the input representation for Convolutional Neural Network (CNN). This paradigm shift occurs after DCASE 2016 where this framework model achieves the state-of-the-art result in ASC tasks on the (ESC-50) dataset… ▽ More The current methodology in tackling Acoustic Scene Classification (ASC) task can be described in two steps, preprocessing of the audio waveform into log-mel spectrogram and then using it as the input representation for Convolutional Neural Network (CNN). This paradigm shift occurs after DCASE 2016 where this framework model achieves the state-of-the-art result in ASC tasks on the (ESC-50) dataset and achieved an accuracy of 64.5%, which constitute to 20.5% improvement over the baseline model, and DCASE 2016 dataset with an accuracy of 90.0% (development) and 86.2% (evaluation), which constitute a 6.4% and 9% improvements with respect to the baseline system. In this paper, we explored the use of harmonic and percussive source separation (HPSS) to split the audio into harmonic audio and percussive audio, which has received popularity in the field of music information retrieval (MIR). Although works have been done in using HPSS as input representation for CNN model in ASC task, this paper further investigate the possibility on leveraging the separated harmonic component and percussive component by curating 2 CNNs which tries to understand harmonic audio and percussive audio in their natural form, one specialized in extracting deep features in time biased domain and another specialized in extracting deep features in frequency biased domain, respectively. The deep features extracted from these 2 CNNs will then be combined using bilinear pooling. Hence, presenting a two-stream time and frequency CNN architecture approach in classifying acoustic scene. The model is being evaluated on DCASE 2019 sub task 1a dataset and scored an average of 65% on development dataset, Kaggle Leadership Private and Public board. △ Less

Submitted 13 February, 2020; originally announced February 2020.

Comments: inclusion in conference proceedings 2019 IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2019), Xiamen

arXiv:2001.11367 [pdf, other]

Application of Seq2Seq Models on Code Correction

Authors: Shan Huang, Xiao Zhou, Sang Chin

Abstract: We apply various seq2seq models on programming language correction tasks on Juliet Test Suite for C/C++ and Java of Software Assurance Reference Datasets(SARD), and achieve 75\%(for C/C++) and 56\%(for Java) repair rates on these tasks. We introduce Pyramid Encoder in these seq2seq models, which largely increases the computational efficiency and memory efficiency, while remain similar repair rate… ▽ More We apply various seq2seq models on programming language correction tasks on Juliet Test Suite for C/C++ and Java of Software Assurance Reference Datasets(SARD), and achieve 75\%(for C/C++) and 56\%(for Java) repair rates on these tasks. We introduce Pyramid Encoder in these seq2seq models, which largely increases the computational efficiency and memory efficiency, while remain similar repair rate to their non-pyramid counterparts. We successfully carry out error type classification task on ITC benchmark examples (with only 685 code instances) using transfer learning with models pre-trained on Juliet Test Suite, pointing out a novel way of processing small programing language datasets. △ Less

Submitted 4 August, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

arXiv:2001.07874 [pdf]

Non-Negative Matrix Factorization-Convolutional Neural Network (NMF-CNN) For Sound Event Detection

Authors: Teck Kai Chan, Cheng Siong Chin, Ye Li

Abstract: The main scientific question of this year DCASE challenge, Task 4 - Sound Event Detection in Domestic Environments, is to investigate the types of data (strongly labeled synthetic data, weakly labeled data, unlabeled in domain data) required to achieve the best performing system. In this paper, we proposed a deep learning model that integrates Non-Negative Matrix Factorization (NMF) with Convoluti… ▽ More The main scientific question of this year DCASE challenge, Task 4 - Sound Event Detection in Domestic Environments, is to investigate the types of data (strongly labeled synthetic data, weakly labeled data, unlabeled in domain data) required to achieve the best performing system. In this paper, we proposed a deep learning model that integrates Non-Negative Matrix Factorization (NMF) with Convolutional Neural Network (CNN). The key idea of such integration is to use NMF to provide an approximate strong label to the weakly labeled data. Such integration was able to achieve a higher event-based F1-score as compared to the baseline system (Evaluation Dataset: 30.39% vs. 23.7%, Validation Dataset: 31% vs. 25.8%). By comparing the validation results with other participants, the proposed system was ranked 8th among 19 teams (inclusive of the baseline system) in this year Task 4 challenge. △ Less

Submitted 21 January, 2020; originally announced January 2020.

Comments: 5 pages, 1 figure, 2 tables

arXiv:1906.01026 [pdf, other]

NodeDrop: A Condition for Reducing Network Size without Effect on Output

Authors: Louis Jensen, Jacob Harer, Sang Chin

Abstract: Determining an appropriate number of features for each layer in a neural network is an important and difficult task. This task is especially important in applications on systems with limited memory or processing power. Many current approaches to reduce network size either utilize iterative procedures, which can extend training time significantly, or require very careful tuning of algorithm paramet… ▽ More Determining an appropriate number of features for each layer in a neural network is an important and difficult task. This task is especially important in applications on systems with limited memory or processing power. Many current approaches to reduce network size either utilize iterative procedures, which can extend training time significantly, or require very careful tuning of algorithm parameters to achieve reasonable results. In this paper we propose NodeDrop, a new method for eliminating features in a network. With NodeDrop, we define a condition to identify and guarantee which nodes carry no information, and then use regularization to encourage nodes to meet this condition. We find that NodeDrop drastically reduces the number of features in a network while maintaining high performance, reducing the number of parameters by a factor of 114x for a VGG like network on CIFAR10 without a drop in accuracy. △ Less

Submitted 12 June, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

arXiv:1902.02434 [pdf, other]

A Scale Invariant Flatness Measure for Deep Network Minima

Authors: Akshay Rangamani, Nam H. Nguyen, Abhishek Kumar, Dzung Phan, Sang H. Chin, Trac D. Tran

Abstract: It has been empirically observed that the flatness of minima obtained from training deep networks seems to correlate with better generalization. However, for deep networks with positively homogeneous activations, most measures of sharpness/flatness are not invariant to rescaling of the network parameters, corresponding to the same function. This means that the measure of flatness/sharpness can be… ▽ More It has been empirically observed that the flatness of minima obtained from training deep networks seems to correlate with better generalization. However, for deep networks with positively homogeneous activations, most measures of sharpness/flatness are not invariant to rescaling of the network parameters, corresponding to the same function. This means that the measure of flatness/sharpness can be made as small or as large as possible through rescaling, rendering the quantitative measures meaningless. In this paper we show that for deep networks with positively homogenous activations, these rescalings constitute equivalence relations, and that these equivalence relations induce a quotient manifold structure in the parameter space. Using this manifold structure and an appropriate metric, we propose a Hessian-based measure for flatness that is invariant to rescaling. We use this new measure to confirm the proposition that Large-Batch SGD minima are indeed sharper than Small-Batch SGD minima. △ Less

Submitted 6 February, 2019; originally announced February 2019.

arXiv:1812.08808 [pdf, other]

Reducing Sampling Ratios Improves Bagging in Sparse Regression

Authors: Luoluo Liu, Sang Peter Chin, Trac D. Tran

Abstract: Bagging, a powerful ensemble method from machine learning, improves the performance of unstable predictors. Although the power of Bagging has been shown mostly in classification problems, we demonstrate the success of employing Bagging in sparse regression over the baseline method (L1 minimization). The framework employs the generalized version of the original Bagging with various bootstrap ratios… ▽ More Bagging, a powerful ensemble method from machine learning, improves the performance of unstable predictors. Although the power of Bagging has been shown mostly in classification problems, we demonstrate the success of employing Bagging in sparse regression over the baseline method (L1 minimization). The framework employs the generalized version of the original Bagging with various bootstrap ratios. The performance limits associated with different choices of bootstrap sampling ratio L/m and number of estimates K is analyzed theoretically. Simulation shows that the proposed method yields state-of-the-art recovery performance, outperforming L1 minimization and Bolasso in the challenging case of low levels of measurements. A lower L/m ratio (60% - 90%) leads to better performance, especially with a small number of measurements. With the reduced sampling rate, SNR improves over the original Bagging by up to 24%. With a properly chosen sampling ratio, a reasonably small number of estimates K = 30 gives satisfying result, even though increasing K is discovered to always improve or at least maintain the performance. △ Less

Submitted 1 May, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1810.03743

arXiv:1810.03743 [pdf, other]

JOBS: Joint-Sparse Optimization from Bootstrap Samples

Authors: Luoluo Liu, Sang Peter Chin, Trac D. Tran

Abstract: Classical signal recovery based on $\ell_1$ minimization solves the least squares problem with all available measurements via sparsity-promoting regularization. In practice, it is often the case that not all measurements are available or required for recovery. Measurements might be corrupted/missing or they arrive sequentially in streaming fashion. In this paper, we propose a global sparse recover… ▽ More Classical signal recovery based on $\ell_1$ minimization solves the least squares problem with all available measurements via sparsity-promoting regularization. In practice, it is often the case that not all measurements are available or required for recovery. Measurements might be corrupted/missing or they arrive sequentially in streaming fashion. In this paper, we propose a global sparse recovery strategy based on subsets of measurements, named JOBS, in which multiple measurements vectors are generated from the original pool of measurements via bootstrapping, and then a joint-sparse constraint is enforced to ensure support consistency among multiple predictors. The final estimate is obtained by averaging over the $K$ predictors. The performance limits associated with different choices of number of bootstrap samples $L$ and number of estimates $K$ is analyzed theoretically. Simulation results validate some of the theoretical analysis, and show that the proposed method yields state-of-the-art recovery performance, outperforming $\ell_1$ minimization and a few other existing bootstrap-based techniques in the challenging case of low levels of measurements and is preferable over other bagging-based methods in the streaming setting since it performs better with small $K$ and $L$ for data-sets with large sizes. △ Less

Submitted 10 December, 2018; v1 submitted 8 October, 2018; originally announced October 2018.

arXiv:1803.00232 [pdf, other]

DRUNET: A Dilated-Residual U-Net Deep Learning Network to Digitally Stain Optic Nerve Head Tissues in Optical Coherence Tomography Images

Authors: Sripad Krishna Devalla, Prajwal K. Renukanand, Bharathwaj K. Sreedhar, Shamira Perera, Jean-Martial Mari, Khai Sing Chin, Tin A. Tun, Nicholas G. Strouthidis, Tin Aung, Alexandre H. Thiery, Michael J. A. Girard

Abstract: Given that the neural and connective tissues of the optic nerve head (ONH) exhibit complex morphological changes with the development and progression of glaucoma, their simultaneous isolation from optical coherence tomography (OCT) images may be of great interest for the clinical diagnosis and management of this pathology. A deep learning algorithm was designed and trained to digitally stain (i.e.… ▽ More Given that the neural and connective tissues of the optic nerve head (ONH) exhibit complex morphological changes with the development and progression of glaucoma, their simultaneous isolation from optical coherence tomography (OCT) images may be of great interest for the clinical diagnosis and management of this pathology. A deep learning algorithm was designed and trained to digitally stain (i.e. highlight) 6 ONH tissue layers by capturing both the local (tissue texture) and contextual information (spatial arrangement of tissues). The overall dice coefficient (mean of all tissues) was $0.91 \pm 0.05$ when assessed against manual segmentations performed by an expert observer. We offer here a robust segmentation framework that could be extended for the automated parametric study of the ONH tissues. △ Less

Submitted 1 March, 2018; originally announced March 2018.

arXiv:1708.03735 [pdf, other]

Sparse Coding and Autoencoders

Authors: Akshay Rangamani, Anirbit Mukherjee, Amitabh Basu, Tejaswini Ganapathy, Ashish Arora, Sang Chin, Trac D. Tran

Abstract: In "Dictionary Learning" one tries to recover incoherent matrices $A^* \in \mathbb{R}^{n \times h}$ (typically overcomplete and whose columns are assumed to be normalized) and sparse vectors $x^* \in \mathbb{R}^h$ with a small support of size $h^p$ for some $0 <p < 1$ while having access to observations $y \in \mathbb{R}^n$ where $y = A^*x^*$. In this work we undertake a rigorous analysis of wheth… ▽ More In "Dictionary Learning" one tries to recover incoherent matrices $A^* \in \mathbb{R}^{n \times h}$ (typically overcomplete and whose columns are assumed to be normalized) and sparse vectors $x^* \in \mathbb{R}^h$ with a small support of size $h^p$ for some $0 <p < 1$ while having access to observations $y \in \mathbb{R}^n$ where $y = A^*x^*$. In this work we undertake a rigorous analysis of whether gradient descent on the squared loss of an autoencoder can solve the dictionary learning problem. The "Autoencoder" architecture we consider is a $\mathbb{R}^n \rightarrow \mathbb{R}^n$ mapping with a single ReLU activation layer of size $h$. Under very mild distributional assumptions on $x^*$, we prove that the norm of the expected gradient of the standard squared loss function is asymptotically (in sparse code dimension) negligible for all points in a small neighborhood of $A^*$. This is supported with experimental evidence using synthetic data. We also conduct experiments to suggest that $A^*$ is a local minimum. Along the way we prove that a layer of ReLU gates can be set up to automatically recover the support of the sparse codes. This property holds independent of the loss function. We believe that it could be of independent interest. △ Less

Submitted 20 October, 2017; v1 submitted 11 August, 2017; originally announced August 2017.

Comments: In this new version of the paper with a small change in the distributional assumptions we are actually able to prove the asymptotic criticality of a neighbourhood of the ground truth dictionary for even just the standard squared loss of the ReLU autoencoder (unlike the regularized loss in the older version)

arXiv:1705.05998 [pdf, other]

Automatic Vertebra Labeling in Large-Scale 3D CT using Deep Image-to-Image Network with Message Passing and Sparsity Regularization

Authors: Dong Yang, Tao Xiong, Daguang Xu, Qiangui Huang, David Liu, S. Kevin Zhou, Zhoubing Xu, JinHyeong Park, Mingqing Chen, Trac D. Tran, Sang Peter Chin, Dimitris Metaxas, Dorin Comaniciu

Abstract: Automatic localization and labeling of vertebra in 3D medical images plays an important role in many clinical tasks, including pathological diagnosis, surgical planning and postoperative assessment. However, the unusual conditions of pathological cases, such as the abnormal spine curvature, bright visual imaging artifacts caused by metal implants, and the limited field of view, increase the diffic… ▽ More Automatic localization and labeling of vertebra in 3D medical images plays an important role in many clinical tasks, including pathological diagnosis, surgical planning and postoperative assessment. However, the unusual conditions of pathological cases, such as the abnormal spine curvature, bright visual imaging artifacts caused by metal implants, and the limited field of view, increase the difficulties of accurate localization. In this paper, we propose an automatic and fast algorithm to localize and label the vertebra centroids in 3D CT volumes. First, we deploy a deep image-to-image network (DI2IN) to initialize vertebra locations, employing the convolutional encoder-decoder architecture together with multi-level feature concatenation and deep supervision. Next, the centroid probability maps from DI2IN are iteratively evolved with the message passing schemes based on the mutual relation of vertebra centroids. Finally, the localization results are refined with sparsity regularization. The proposed method is evaluated on a public dataset of 302 spine CT volumes with various pathologies. Our method outperforms other state-of-the-art methods in terms of localization accuracy. The run time is around 3 seconds on average per case. To further boost the performance, we retrain the DI2IN on additional 1000+ 3D CT volumes from different patients. To the best of our knowledge, this is the first time more than 1000 3D CT volumes with expert annotation are adopted in experiments for the anatomic landmark detection tasks. Our experimental results show that training with such a large dataset significantly improves the performance and the overall identification rate, for the first time by our knowledge, reaches 90 %. △ Less

Submitted 16 May, 2017; originally announced May 2017.

arXiv:1702.06061 [pdf, ps, other]

doi 10.1088/1751-8121/aa908d

Generalized Coherence Concurrence and Path distinguishability

Authors: Seungbeom Chin

Abstract: We propose a new family of coherence monotones, named the \emph{generalized coherence concurrence} (or coherence $k$-concurrence), which is an analogous concept to the generalized entanglement concurrence. The coherence $k$-concurrence of a state is nonzero if and only if the coherence number (a recently introduced discrete coherence monotone) of the state is not smaller than $k$, and a state can… ▽ More We propose a new family of coherence monotones, named the \emph{generalized coherence concurrence} (or coherence $k$-concurrence), which is an analogous concept to the generalized entanglement concurrence. The coherence $k$-concurrence of a state is nonzero if and only if the coherence number (a recently introduced discrete coherence monotone) of the state is not smaller than $k$, and a state can be converted to a state with nonzero entanglement $k$-concurrence via incoherent operations if and only if the state has nonzero coherence $k$-concurrence. We apply the coherence concurrence family to the problem of wave-particle duality in multi-path interference phenomena. We obtain a sharper equation for path distinguishability (which witness the duality) than the known value and show that the amount of each concurrence for the quanton state determines the number of slits which are identified unambiguously. △ Less

Submitted 18 June, 2017; v1 submitted 20 February, 2017; originally announced February 2017.

Comments: 9 pages, Revtex 4.1; (v3) new analysis on the application of generalized coherence concurrence to multi-path interference problems, former title "Generalized Coherence Concurrence and Coherence Number", (v4) minor error corrections

Journal ref: J. Phys. A: Math. Theor. 50 475302 (2017)

arXiv:1702.03219 [pdf, other]

doi 10.1103/PhysRevA.96.042336

Coherence number as a discrete quantum resource

Authors: Seungbeom Chin

Abstract: We introduce a new discrete coherence monotone named the \emph{coherence number}, which is a generalization of the coherence rank to mixed states. After defining the coherence number in a similar manner to the Schmidt number in entanglement theory, we present a necessary and sufficient condition of the coherence number for a coherent state to be converted to an entangled state of nonzero $k$-concu… ▽ More We introduce a new discrete coherence monotone named the \emph{coherence number}, which is a generalization of the coherence rank to mixed states. After defining the coherence number in a similar manner to the Schmidt number in entanglement theory, we present a necessary and sufficient condition of the coherence number for a coherent state to be converted to an entangled state of nonzero $k$-concurrence (a member of the generalized concurrence family with $2\le k \le d$). It also turns out that the coherence number is a useful measure to understand the process of Grover search algorithm of $N$ items. We show that the coherence number remains $N$ and falls abruptly when the success probability of the searching process becomes maximal. This phenomenon motivates us to analyze the depletion pattern of $C_c^{(N)}$ (the last member of the generalized coherence concurrence, nonzero when the coherence number is $N$), which turns out to be an optimal resource for the process since it is completely consumed to finish the searching task. △ Less

Submitted 8 June, 2017; v1 submitted 10 February, 2017; originally announced February 2017.

Comments: 10 pages, Revtex 4.1; (v3) new analysis on the aspect of coherence number as a resource for the Grover algorithm, former title "Conversion of Coherence into Entanglement $ k $-concurrence" ; (v4) minor corrections and clarifications, 1 figure added

Journal ref: Phys. Rev. A 96, 042336 (2017)

arXiv:1411.5005 [pdf, ps, other]

Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data

Authors: Alina Oprea, Zhou Li, Ting-Fang Yen, Sang Chin, Sumayah Alrwais

Abstract: Recent years have seen the rise of more sophisticated attacks including advanced persistent threats (APTs) which pose severe risks to organizations and governments by targeting confidential proprietary information. Additionally, new malware strains are appearing at a higher rate than ever before. Since many of these malware are designed to evade existing security products, traditional defenses dep… ▽ More Recent years have seen the rise of more sophisticated attacks including advanced persistent threats (APTs) which pose severe risks to organizations and governments by targeting confidential proprietary information. Additionally, new malware strains are appearing at a higher rate than ever before. Since many of these malware are designed to evade existing security products, traditional defenses deployed by most enterprises today, e.g., anti-virus, firewalls, intrusion detection systems, often fail at detecting infections at an early stage. We address the problem of detecting early-stage infection in an enterprise setting by proposing a new framework based on belief propagation inspired from graph theory. Belief propagation can be used either with "seeds" of compromised hosts or malicious domains (provided by the enterprise security operation center -- SOC) or without any seeds. In the latter case we develop a detector of C&C communication particularly tailored to enterprises which can detect a stealthy compromise of only a single host communicating with the C&C server. We demonstrate that our techniques perform well on detecting enterprise infections. We achieve high accuracy with low false detection and false negative rates on two months of anonymized DNS logs released by Los Alamos National Lab (LANL), which include APT infection attacks simulated by LANL domain experts. We also apply our algorithms to 38TB of real-world web proxy logs collected at the border of a large enterprise. Through careful manual investigation in collaboration with the enterprise SOC, we show that our techniques identified hundreds of malicious domains overlooked by state-of-the-art security products. △ Less

Submitted 24 November, 2014; v1 submitted 18 November, 2014; originally announced November 2014.

arXiv:1410.8068 [pdf]

Health Information Search Behavior on the Web: A Pilot Study

Authors: Shanu Sushmita, Si-Chi Chin

Abstract: Searching health information on web has become an integral part of today's world, and many people turn to the Web for healthcare information and healthcare assessment. Our pilot study investigates users' preferences for the type of search results (image, news, video, etc.), and investigates users' ability to accurately interpret online health information for the purpose of self diagnosis. The prel… ▽ More Searching health information on web has become an integral part of today's world, and many people turn to the Web for healthcare information and healthcare assessment. Our pilot study investigates users' preferences for the type of search results (image, news, video, etc.), and investigates users' ability to accurately interpret online health information for the purpose of self diagnosis. The preliminary results reveal that blog and news articles are most sought by users when searching online information and there exist challenges in the use of online health information for self-diagnosis. △ Less

Submitted 27 October, 2014; originally announced October 2014.

arXiv:1401.7043 [pdf, ps, other]

Randomized Minmax Regret for Combinatorial Optimization Under Uncertainty

Authors: Andrew Mastin, Patrick Jaillet, Sang Chin

Abstract: The minmax regret problem for combinatorial optimization under uncertainty can be viewed as a zero-sum game played between an optimizing player and an adversary, where the optimizing player selects a solution and the adversary selects costs with the intention of maximizing the regret of the player. The existing minmax regret model considers only deterministic solutions/strategies, and minmax regre… ▽ More The minmax regret problem for combinatorial optimization under uncertainty can be viewed as a zero-sum game played between an optimizing player and an adversary, where the optimizing player selects a solution and the adversary selects costs with the intention of maximizing the regret of the player. The existing minmax regret model considers only deterministic solutions/strategies, and minmax regret versions of most polynomial solvable problems are NP-hard. In this paper, we consider a randomized model where the optimizing player selects a probability distribution (corresponding to a mixed strategy) over solutions and the adversary selects costs with knowledge of the player's distribution, but not its realization. We show that under this randomized model, the minmax regret version of any polynomial solvable combinatorial problem becomes polynomial solvable. This holds true for both the interval and discrete scenario representations of uncertainty. Using the randomized model, we show new proofs of existing approximation algorithms for the deterministic model based on primal-dual approaches. Finally, we prove that minmax regret problems are NP-hard under general convex uncertainty. △ Less

Submitted 19 September, 2014; v1 submitted 27 January, 2014; originally announced January 2014.

arXiv:1306.2094 [pdf]

Predicting Risk-of-Readmission for Congestive Heart Failure Patients: A Multi-Layer Approach

Authors: Kiyana Zolfaghar, Nele Verbiest, Jayshree Agarwal, Naren Meadem, Si-Chi Chin, Senjuti Basu Roy, Ankur Teredesai, David Hazel, Paul Amoroso, Lester Reed

Abstract: Mitigating risk-of-readmission of Congestive Heart Failure (CHF) patients within 30 days of discharge is important because such readmissions are not only expensive but also critical indicator of provider care and quality of treatment. Accurately predicting the risk-of-readmission may allow hospitals to identify high-risk patients and eventually improve quality of care by identifying factors that c… ▽ More Mitigating risk-of-readmission of Congestive Heart Failure (CHF) patients within 30 days of discharge is important because such readmissions are not only expensive but also critical indicator of provider care and quality of treatment. Accurately predicting the risk-of-readmission may allow hospitals to identify high-risk patients and eventually improve quality of care by identifying factors that contribute to such readmissions in many scenarios. In this paper, we investigate the problem of predicting risk-of-readmission as a supervised learning problem, using a multi-layer classification approach. Earlier contributions inadequately attempted to assess a risk value for 30 day readmission by building a direct predictive model as opposed to our approach. We first split the problem into various stages, (a) at risk in general (b) risk within 60 days (c) risk within 30 days, and then build suitable classifiers for each stage, thereby increasing the ability to accurately predict the risk using multiple layers of decision. The advantage of our approach is that we can use different classification models for the subtasks that are more suited for the respective problems. Moreover, each of the subtasks can be solved using different features and training data leading to a highly confident diagnosis or risk compared to a one-shot single layer approach. An experimental evaluation on actual hospital patient record data from Multicare Health Systems shows that our model is significantly better at predicting risk-of-readmission of CHF patients within 30 days after discharge compared to prior attempts. △ Less

Submitted 9 June, 2013; originally announced June 2013.

Showing 1–24 of 24 results for author: Chin, S