arXiv:2406.13578 [pdf, other]

Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration

Authors: Han-Cheng Yu, Yu-An Shih, Kin-Man Law, Kai-Yu Hsieh, Yu-Chen Cheng, Hsin-Chih Ho, Zih-An Lin, Wen-Chuan Hsu, Yao-Chung Fan

Abstract: In this paper, we tackle the task of distractor generation (DG) for multiple-choice questions. Our study introduces two key designs. First, we propose \textit{retrieval augmented pretraining}, which involves refining the language model pretraining to align it more closely with the downstream task of DG. Second, we explore the integration of knowledge graphs to enhance the performance of DG. Through experiments with benchmarking datasets, we show that our models significantly outperform the state-of-the-art results. Our best-performing model advances the F1@3 score from 14.80 to 16.47 in MCQ dataset and from 15.92 to 16.50 in Sciq dataset.

Submitted 19 June, 2024; originally announced June 2024.

Comments: Findings at ACL 2024

arXiv:2402.06173 [pdf, other]

SMC Is All You Need: Parallel Strong Scaling

Authors: Xinzhu Liang, Joseph M. Lukens, Sanjaya Lohani, Brian T. Kirby, Thomas A. Searles, Kody J. H. Law

Abstract: The Bayesian posterior distribution can only be evaluated up to a constant of proportionality, which makes simulation and consistent estimation challenging. Classical consistent Bayesian methods such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) have unbounded time complexity requirements. We develop a fully parallel sequential Monte Carlo (pSMC) method which provably delivers parallel strong scaling, i.e. the time complexity (and per-node memory) remains bounded if the number of asynchronous processes is allowed to grow. More precisely, the pSMC has a theoretical convergence rate of Mean Square Error (MSE)$ = O(1/NP)$, where $N$ denotes the number of communicating samples in each processor and $P$ denotes the number of processors. In particular, for suitably-large problem-dependent $N$, as $P \rightarrow \infty$ the method converges to infinitesimal accuracy MSE$=O(\varepsilon^2)$ with a fixed finite time-complexity Cost$=O(1)$ and with no efficiency leakage, i.e. computational complexity Cost$=O(\varepsilon^{-2})$. A number of Bayesian inference problems are taken into consideration to compare the pSMC and MCMC methods.

Submitted 2 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 23 pages, 17 figures

arXiv:2402.02111 [pdf, other]

Accelerating Look-ahead in Bayesian Optimization: Multilevel Monte Carlo is All you Need

Authors: Shangda Yang, Vitaly Zankin, Maximilian Balandat, Stefan Scherer, Kevin Carlberg, Neil Walton, Kody J. H. Law

Abstract: We leverage multilevel Monte Carlo (MLMC) to improve the performance of multi-step look-ahead Bayesian optimization (BO) methods that involve nested expectations and maximizations. Often these expectations must be computed by Monte Carlo (MC). The complexity rate of naive MC degrades for nested operations, whereas MLMC is capable of achieving the canonical MC convergence rate for this type of problem, independently of dimension and without any smoothness assumptions. Our theoretical study focuses on the approximation improvements for two- and three-step look-ahead acquisition functions, but, as we discuss, the approach is generalizable in various ways, including beyond the context of BO. Our findings are verified numerically and the benefits of MLMC for BO are illustrated on several benchmark examples. Code is available at https://github.com/Shangda-Yang/MLMCBO.

Submitted 25 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

Comments: Preprint ICML 2024

arXiv:2302.02506 [pdf]

Generating Dispatching Rules for the Interrupting Swap-Allowed Blocking Job Shop Problem Using Graph Neural Network and Reinforcement Learning

Authors: Vivian W. H. Wong, Sang Hun Kim, Junyoung Park, Jinkyoo Park, Kincho H. Law

Abstract: The interrupting swap-allowed blocking job shop problem (ISBJSSP) is a complex scheduling problem that is able to model many manufacturing planning and logistics applications realistically by addressing both the lack of storage capacity and unforeseen production interruptions. Subjected to random disruptions due to machine malfunction or maintenance, industry production settings often choose to adopt dispatching rules to enable adaptive, real-time re-scheduling, rather than traditional methods that require costly re-computation on the new configuration every time the problem condition changes dynamically. To generate dispatching rules for the ISBJSSP problem, we introduce a dynamic disjunctive graph formulation characterized by nodes and edges subjected to continuous deletions and additions. This formulation enables the training of an adaptive scheduler utilizing graph neural networks and reinforcement learning. Furthermore, a simulator is developed to simulate interruption, swapping, and blocking in the ISBJSSP setting. Employing a set of reported benchmark instances, we conduct a detailed experimental study on ISBJSSP instances with a range of machine shutdown probabilities to show that the scheduling policies generated can outperform or are at least as competitive as existing dispatching rules with predetermined priority. This study shows that the ISBJSSP, which requires real-time adaptive solutions, can be scheduled efficiently with the proposed method when production interruptions occur with random machine shutdowns.

Submitted 28 September, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: 14 pages, 10 figures. Supplementary Material not included

arXiv:2208.12830 [pdf, other]

Mixtures of Gaussian Process Experts with SMC$^2$

Authors: Teemu Härkönen, Sara Wade, Kody Law, Lassi Roininen

Abstract: Gaussian processes are a key component of many flexible statistical and machine learning models. However, they exhibit cubic computational complexity and high memory constraints due to the need of inverting and storing a full covariance matrix. To circumvent this, mixtures of Gaussian process experts have been considered where data points are assigned to independent experts, reducing the complexity by allowing inference based on smaller, local covariance matrices. Moreover, mixtures of Gaussian process experts substantially enrich the model's flexibility, allowing for behaviors such as non-stationarity, heteroscedasticity, and discontinuities. In this work, we construct a novel inference approach based on nested sequential Monte Carlo samplers to simultaneously infer both the gating network and Gaussian process expert parameters. This greatly improves inference compared to importance sampling, particularly in settings when a stationary Gaussian process is inappropriate, while still being thoroughly parallelizable.

Submitted 26 August, 2022; originally announced August 2022.

arXiv:2208.07243 [pdf, other]

Exponential Concentration in Stochastic Approximation

Authors: Kody Law, Neil Walton, Shangda Yang

Abstract: We analyze the behavior of stochastic approximation algorithms where iterates, in expectation, progress towards an objective at each step. When progress is proportional to the step size of the algorithm, we prove exponential concentration bounds. These tail-bounds contrast asymptotic normality results, which are more frequently associated with stochastic approximation. The methods that we develop rely on a geometric ergodicity proof. This extends a result on Markov chains due to Hajek (1982) to the area of stochastic approximation algorithms. We apply our results to several different Stochastic Approximation algorithms, specifically Projected Stochastic Gradient Descent, Kiefer-Wolfowitz and Stochastic Frank-Wolfe algorithms. When applicable, our results prove faster $O(1/t)$ and linear convergence rates for Projected Stochastic Gradient Descent with a non-vanishing gradient.

Submitted 24 March, 2024; v1 submitted 15 August, 2022; originally announced August 2022.

Comments: 35 pages, 11 Figures

arXiv:2205.04721 [pdf, other]

doi 10.1007/s11263-022-01627-3

Efficient Burst Raw Denoising with Variance Stabilization and Multi-frequency Denoising Network

Authors: Dasong Li, Yi Zhang, Ka Lung Law, Xiaogang Wang, Hongwei Qin, Hongsheng Li

Abstract: With the growing popularity of smartphones, capturing high-quality images is of vital importance to smartphones. The cameras of smartphones have small apertures and small sensor cells, which lead to noisy images in low-light environments. Denoising based on a burst of multiple frames generally outperforms single frame denoising but at a larger computational cost. In this paper, we propose an efficient yet effective burst denoising system. We adopt a three-stage design: noise prior integration, multi-frame alignment and multi-frame denoising. First, we integrate noise prior by pre-processing raw signals into a variance-stabilization space, which allows using a small-scale network to achieve competitive performance. Second, we observe that it is essential to adopt an explicit alignment for burst denoising, but it is not necessary to integrate a learning-based method to perform multi-frame alignment. Instead, we resort to a conventional and efficient alignment method and combine it with our multi-frame denoising network. At last, we propose a denoising strategy that processes multiple frames sequentially. Sequential denoising avoids filtering a large number of frames by decomposing multiple frames denoising into several efficient sub-network denoising. As for each sub-network, we propose an efficient multi-frequency denoising network to remove noise of different frequencies. Our three-stage design is efficient and shows strong performance on burst denoising. Experiments on synthetic and real raw datasets demonstrate that our method outperforms state-of-the-art methods, with less computational cost. Furthermore, the low complexity and high-quality performance make deployment on smartphones possible.
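
Illustrative note (not from the paper): the abstract above mentions mapping raw signals into a variance-stabilization space but does not say which transform is used. One classical choice for Poisson-Gaussian sensor noise is the generalized Anscombe transform, sketched below. The function name and the parameters `gain`, `sigma`, and `mu` are assumptions made for this sketch only.

```python
import math

def generalized_anscombe(z, gain, sigma, mu=0.0):
    """Generalized Anscombe transform for z = gain * Poisson + N(mu, sigma^2).

    After the transform the noise variance is approximately 1 for
    sufficiently bright pixels, independent of the signal level.
    This is a textbook transform, assumed here for illustration; it is
    not claimed to be the one used in the paper.
    """
    arg = gain * z + 0.375 * gain ** 2 + sigma ** 2 - gain * mu
    # Clamp at zero so very dark, noisy pixels do not produce a complex root.
    return (2.0 / gain) * math.sqrt(max(arg, 0.0))

# With gain = 1 and no Gaussian component this reduces to the plain
# Anscombe transform 2 * sqrt(z + 3/8).
stabilized = generalized_anscombe(4.0, gain=1.0, sigma=0.0)
```

The appeal of such a transform is that a denoiser operating on the stabilized signal sees roughly unit-variance noise everywhere, which is consistent with the abstract's remark that a small network can then be competitive.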

Submitted 10 May, 2022; originally announced May 2022.

Comments: Accepted for publication in International Journal of Computer Vision

Journal ref: IJCV 2022

arXiv:2203.13718 [pdf, other]

doi 10.1016/j.commatsci.2022.111985

Digital Fingerprinting of Microstructures

Authors: Michael D. White, Alexander Tarakanov, Christopher P. Race, Philip J. Withers, Kody J. H. Law

Abstract: Finding efficient means of fingerprinting microstructural information is a critical step towards harnessing data-centric machine learning approaches. A statistical framework is systematically developed for compressed characterisation of a population of images, which includes some classical computer vision methods as special cases. The focus is on materials microstructure. The ultimate purpose is to rapidly fingerprint sample images in the context of various high-throughput design/make/test scenarios. This includes, but is not limited to, quantification of the disparity between microstructures for quality control, classifying microstructures, predicting materials properties from image data and identifying potential processing routes to engineer new materials with specific properties. Here, we consider microstructure classification and utilise the resulting features over a range of related machine learning tasks, namely supervised, semi-supervised, and unsupervised learning. The approach is applied to two distinct datasets to illustrate various aspects and some recommendations are made based on the findings. In particular, methods that leverage transfer learning with convolutional neural networks (CNNs), pretrained on the ImageNet dataset, are generally shown to outperform other methods. Additionally, dimensionality reduction of these CNN-based fingerprints is shown to have negligible impact on classification accuracy for the supervised learning approaches considered. In situations where there is a large dataset with only a handful of images labelled, graph-based label propagation to unlabelled data is shown to be favourable over discarding unlabelled data and performing supervised learning. In particular, label propagation by Poisson learning is shown to be highly effective at low label rates.

Submitted 22 January, 2024; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2111.14358 [pdf, other]

IDR: Self-Supervised Image Denoising via Iterative Data Refinement

Authors: Yi Zhang, Dasong Li, Ka Lung Law, Xiaogang Wang, Hongwei Qin, Hongsheng Li

Abstract: The lack of large-scale noisy-clean image pairs restricts supervised denoising methods' deployment in actual applications. While existing unsupervised methods are able to learn image denoising without ground-truth clean images, they either show poor performance or work under impractical settings (e.g., paired noisy images). In this paper, we present a practical unsupervised image denoising method to achieve state-of-the-art denoising performance. Our method only requires single noisy images and a noise model, which is easily accessible in practical raw image denoising. It performs two steps iteratively: (1) Constructing a noisier-noisy dataset with random noise from the noise model; (2) training a model on the noisier-noisy dataset and using the trained model to refine noisy images to obtain the targets used in the next round. We further approximate our full iterative method with a fast algorithm for more efficient training while keeping its original high performance. Experiments on real-world, synthetic, and correlated noise show that our proposed unsupervised denoising approach has superior performances over existing unsupervised methods and competitive performance with supervised methods. In addition, we argue that existing denoising datasets are of low quality and contain only a small number of scenes. To evaluate raw image denoising performance in real-world applications, we build a high-quality raw image dataset SenseNoise-500 that contains 500 real-life scenes. The dataset can serve as a strong benchmark for better evaluating raw image denoising. Code and dataset will be released at https://github.com/zhangyi-3/IDR

Submitted 22 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: CVPR2022; code & dataset: https://github.com/zhangyi-3/IDR

arXiv:2104.05237 [pdf, other]

Neural Camera Simulators

Authors: Hao Ouyang, Zifan Shi, Chenyang Lei, Ka Lung Law, Qifeng Chen

Abstract: We present a controllable camera simulator based on deep neural networks to synthesize raw image data under different camera settings, including exposure time, ISO, and aperture. The proposed simulator includes an exposure module that utilizes the principle of modern lens designs for correcting the luminance level. It also contains a noise module using the noise level function and an aperture module with adaptive attention to simulate the side effects on noise and defocus blur. To facilitate the learning of a simulator model, we collect a dataset of 10,000 raw images of 450 scenes with different exposure settings. Quantitative experiments and qualitative comparisons show that our approach outperforms relevant baselines in raw data synthesis on multiple cameras. Furthermore, the camera simulator enables various applications, including large-aperture enhancement, HDR, auto exposure, and data augmentation for training local feature detectors. Our work represents the first attempt to simulate a camera sensor's behavior leveraging both the advantage of traditional raw sensor features and the power of data-driven deep learning.

Submitted 9 August, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: Accepted to CVPR2021

arXiv:2101.08993 [pdf]

Automatic Volumetric Segmentation of Additive Manufacturing Defects with 3D U-Net

Authors: Vivian Wen Hui Wong, Max Ferguson, Kincho H. Law, Yung-Tsun Tina Lee, Paul Witherell

Abstract: Segmentation of additive manufacturing (AM) defects in X-ray Computed Tomography (XCT) images is challenging, due to the poor contrast, small sizes and variation in appearance of defects. Automatic segmentation can, however, provide quality control for additive manufacturing. Over recent years, three-dimensional convolutional neural networks (3D CNNs) have performed well in the volumetric segmentation of medical images. In this work, we leverage techniques from the medical imaging domain and propose training a 3D U-Net model to automatically segment defects in XCT images of AM samples. This work not only contributes to the use of machine learning for AM defect detection but also demonstrates for the first time 3D volumetric segmentation in AM. We train and test with three variants of the 3D U-Net on an AM dataset, achieving a mean intersection over union (IoU) value of 88.4%.
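
Illustrative note (not from the paper): the segmentation score quoted in the abstract above is the metric usually called intersection over union. The sketch below shows how a mean IoU over binary masks is commonly computed, assuming each mask is a flattened sequence of 0/1 voxel labels; the function names and the empty-mask convention are choices made for this sketch, not details taken from the paper.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (flat sequences of 0/1)."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0

def mean_iou(pairs):
    """Average IoU over an iterable of (prediction, ground truth) pairs."""
    pairs = list(pairs)
    return sum(iou(p, t) for p, t in pairs) / len(pairs)

# Two toy volumes flattened to 4 voxels each.
example_pairs = [
    ([1, 1, 0, 0], [1, 0, 1, 0]),  # 1 shared voxel out of 3 marked
    ([0, 1, 1, 0], [0, 1, 1, 0]),  # identical masks
]
score = mean_iou(example_pairs)
```

Because defects occupy only a small fraction of an XCT volume, an overlap-based score like this is more informative than plain voxel accuracy, which would be high even for a model that predicts no defects at all.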

Submitted 22 January, 2021; originally announced January 2021.

Comments: Accepted by AAAI 2020 Spring Symposia

Journal ref: AAAI 2020 Spring Symposia, Stanford, CA, USA, Mar 23-25, 2020

arXiv:2101.05808 [pdf, other]

doi 10.1016/j.cpc.2021.108019

Materials Fingerprinting Classification

Authors: Adam Spannaus, Kody J. H. Law, Piotr Luszczek, Farzana Nasrin, Cassie Putman Micucci, Peter K. Liaw, Louis J. Santodonato, David J. Keffer, Vasileios Maroulas

Abstract: Significant progress in many classes of materials could be made with the availability of experimentally-derived large datasets composed of atomic identities and three-dimensional coordinates. Methods for visualizing the local atomic structure, such as atom probe tomography (APT), which routinely generate datasets comprised of millions of atoms, are an important step in realizing this goal. However, state-of-the-art APT instruments generate noisy and sparse datasets that provide information about elemental type, but obscure atomic structures, thus limiting their subsequent value for materials discovery. The application of a materials fingerprinting process, a machine learning algorithm coupled with topological data analysis, provides an avenue by which here-to-fore unprecedented structural information can be extracted from an APT dataset. As a proof of concept, the material fingerprint is applied to high-entropy alloy APT datasets containing body-centered cubic (BCC) and face-centered cubic (FCC) crystal structures. A local atomic configuration centered on an arbitrary atom is assigned a topological descriptor, with which it can be characterized as a BCC or FCC lattice with near perfect accuracy, despite the inherent noise in the dataset. This successful identification of a fingerprint is a crucial first step in the development of algorithms which can extract more nuanced information, such as chemical ordering, from existing datasets of complex materials.

Submitted 14 January, 2021; originally announced January 2021.

arXiv:2006.13309 [pdf, other]

doi 10.1007/s10994-023-06491-x

Fast Deep Mixtures of Gaussian Process Experts

Authors: Clement Etienam, Kody Law, Sara Wade, Vitaly Zankin

Abstract: Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context, allowing not only the mean function but the entire density of the output to change with the inputs. Sparse Gaussian processes (GP) have shown promise as a leading candidate for the experts in such models, and in this article, we propose to design the gating network for selecting the experts from such mixtures of sparse GPs using a deep neural network (DNN). Furthermore, a fast one pass algorithm called Cluster-Classify-Regress (CCR) is leveraged to approximate the maximum a posteriori (MAP) estimator extremely quickly. This powerful combination of model and algorithm together delivers a novel method which is flexible, robust, and extremely efficient. In particular, the method is able to outperform competing methods in terms of accuracy and uncertainty quantification. The cost is competitive on low-dimensional and small data sets, but is significantly lower for higher-dimensional and big data sets. Iteratively maximizing the distribution of experts given allocations and allocations given experts does not provide significant improvement, which indicates that the algorithm achieves a good approximation to the local MAP estimator very fast. This insight can be useful also in the context of other mixture of experts models.

Submitted 30 November, 2023; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: 22 pages, 28 figures, to be published in Machine Learning journal

Journal ref: Machine Learning (2024)

arXiv:1912.12197 [pdf, ps, other]

Experimental Demonstration of Learned Time-Domain Digital Back-Propagation

Authors: Eric Sillekens, Wenting Yi, Daniel Semrau, Alessandro Ottino, Boris Karanov, Sujie Zhou, Kevin Law, Jack Chen, Domanic Lavery, Lidia Galdino, Polina Bayvel, Robert I. Killey

Abstract: We present the first experimental demonstration of learned time-domain digital back-propagation (DBP), in 64-GBd dual-polarization 64-QAM signal transmission over 1014 km. Performance gains were comparable to those obtained with conventional, higher complexity, frequency-domain DBP.

Submitted 23 December, 2019; originally announced December 2019.

arXiv:1910.06391 [pdf, other]

doi 10.5281/zenodo.3996808

Building Information Modeling and Classification by Visual Learning At A City Scale

Authors: Qian Yu, Chaofeng Wang, Barbaros Cetiner, Stella X. Yu, Frank Mckenna, Ertugrul Taciroglu, Kincho H. Law

Abstract: In this paper, we provide two case studies to demonstrate how artificial intelligence can empower civil engineering. In the first case, a machine learning-assisted framework, BRAILS, is proposed for city-scale building information modeling. Building information modeling (BIM) is an efficient way of describing buildings, which is essential to architecture, engineering, and construction. Our proposed framework employs deep learning techniques to extract visual information of buildings from satellite/street view images. Further, a novel machine learning (ML)-based statistical tool, SURF, is proposed to discover the spatial patterns in building metadata. The second case focuses on the task of soft-story building classification. Soft-story buildings are a type of building prone to collapse during a moderate or severe earthquake. Hence, identifying and retrofitting such buildings is vital in the current earthquake preparedness efforts. For this task, we propose an automated deep learning-based procedure for identifying soft-story buildings from street view images at a regional scale. We also create a large-scale building image database and a semi-automated image labeling approach that effectively annotates new database entries. Through extensive computational experiments, we demonstrate the effectiveness of the proposed method.

Submitted 20 July, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:1905.06220 [pdf, other]

Cluster, Classify, Regress: A General Method For Learning Discountinous Functions

Authors: David E. Bernholdt, Mark R. Cianciosa, Clement Etienam, David L. Green, Kody J. H. Law, J. M. Park

Abstract: This paper presents a method for solving the supervised learning problem in which the output is highly nonlinear and discontinuous. It is proposed to solve this problem in three stages: (i) cluster the pairs of input-output data points, resulting in a label for each point; (ii) classify the data, where the corresponding label is the output; and finally (iii) perform one separate regression for each class, where the training data corresponds to the subset of the original input-output pairs which have that label according to the classifier. It has not yet been proposed to combine these 3 fundamental building blocks of machine learning in this simple and powerful fashion. This can be viewed as a form of deep learning, where any of the intermediate layers can itself be deep. The utility and robustness of the methodology is illustrated on some toy problems, including one example problem arising from simulation of plasma fusion in a tokamak.
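
Illustrative note (not from the paper): the three stages described in the abstract above can be sketched on a one-dimensional toy problem. The abstract does not fix the building blocks, so this sketch assumes k-means for clustering, a 1-nearest-neighbour rule for classification, and a least-squares line per class for regression. All function names and the toy data are hypothetical.

```python
from statistics import mean

def kmeans(points, k, iters=50):
    """Plain k-means on tuples; returns one cluster label per point."""
    # Simple deterministic initialisation: k points spread evenly by index.
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(mean(dim) for dim in zip(*members))
    return labels

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx

def ccr_fit(xs, ys, k=2):
    # (i) Cluster the input-output pairs to obtain a label per point.
    labels = kmeans(list(zip(xs, ys)), k)

    # (ii) Classify: predict the cluster label from the input alone (1-NN).
    def classify(x):
        return min(zip(xs, labels), key=lambda pair: abs(pair[0] - x))[1]

    # (iii) One regression per class, trained on the points that the
    # classifier assigns to that class.
    models = {}
    for c in set(labels):
        idx = [i for i, x in enumerate(xs) if classify(x) == c]
        models[c] = fit_line([xs[i] for i in idx], [ys[i] for i in idx])

    def predict(x):
        a, b = models[classify(x)]
        return a * x + b
    return predict

# Toy target with a jump of 10 at x = 0.5.
xs = [i / 20 for i in range(20)]
ys = [2 * x if x < 0.5 else 2 * x + 10 for x in xs]
predict = ccr_fit(xs, ys)
```

A single global regressor would smear the jump at x = 0.5, whereas here each branch gets its own fit and the classifier decides which one applies to a new input.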

Submitted 16 May, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

Comments: 12 files,6 figures

arXiv:1903.03989 [pdf]

Uncertainty Propagation in Deep Neural Network Using Active Subspace

Authors: Weiqi Ji, Zhuyin Ren, Chung K. Law

Abstract: The inputs of a deep neural network (DNN) from real-world data usually come with uncertainties. Yet, it is challenging to propagate the uncertainty in the input features to the DNN predictions at a low computational cost. This work employs a gradient-based subspace method and response surface technique to accelerate the uncertainty propagation in DNN. Specifically, the active subspace method is employed to identify the most important subspace in the input features using the gradient of the DNN output to the inputs. Then the response surface within that low-dimensional subspace can be efficiently built, and the uncertainty of the prediction can be acquired by evaluating the computationally cheap response surface instead of the DNN models. In addition, the subspace can help explain the adversarial examples. The approach is demonstrated on the MNIST dataset with a convolutional neural network. Code is available at: https://github.com/jiweiqi/nnsubspace.

Submitted 11 January, 2020; v1 submitted 10 March, 2019; originally announced March 2019.

Comments: Add link to github repo
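
For orientation, the active subspace construction mentioned in the abstract above is usually written as follows. This is the standard textbook formulation, not taken from the paper itself; the symbols $f$ (DNN output), $\rho$ (input density), $M$ (number of gradient samples), $r$ (subspace dimension) and $g$ (response surface) are notation assumed here.

```latex
% Gradient covariance matrix and its Monte Carlo estimate
C = \mathbb{E}_{x \sim \rho}\!\left[ \nabla_x f(x)\, \nabla_x f(x)^{\top} \right]
  \approx \frac{1}{M} \sum_{i=1}^{M} \nabla_x f(x_i)\, \nabla_x f(x_i)^{\top}

% Eigendecomposition, eigenvalues sorted in decreasing order
C = W \Lambda W^{\top}, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \qquad
\lambda_1 \ge \dots \ge \lambda_n \ge 0

% Keep the first r eigenvectors W_1 and fit a cheap surrogate on y = W_1^T x
W = \begin{bmatrix} W_1 & W_2 \end{bmatrix}, \qquad
y = W_1^{\top} x, \qquad
f(x) \approx g\!\left( W_1^{\top} x \right)
```

Uncertainty is then propagated by sampling $x \sim \rho$ and evaluating $g(W_1^{\top} x)$ instead of the full network.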

arXiv:1808.02518 [pdf, other]

Detection and Segmentation of Manufacturing Defects with Convolutional Neural Networks and Transfer Learning

Authors: Max Ferguson, Ronay Ak, Yung-Tsun Tina Lee, Kincho H. Law

Abstract: Quality control is a fundamental component of many manufacturing processes, especially those involving casting or welding. However, manual quality control procedures are often time-consuming and error-prone. In order to meet the growing demand for high-quality products, the use of intelligent visual inspection systems is becoming essential in production lines. Recently, Convolutional Neural Networks (CNNs) have shown outstanding performance in both image classification and localization tasks. In this article, a system is proposed for the identification of casting defects in X-ray images, based on the Mask Region-based CNN architecture. The proposed defect detection system simultaneously performs defect detection and segmentation on input images, making it suitable for a range of defect detection tasks. It is shown that training the network to simultaneously perform defect detection and defect instance segmentation results in a higher defect detection accuracy than training on defect detection alone. Transfer learning is leveraged to reduce the training data demands and increase the prediction accuracy of the trained model. More specifically, the model is first trained with two large openly-available image datasets before finetuning on a relatively small metal casting X-ray dataset. The accuracy of the trained model exceeds state-of-the-art performance on the GRIMA database of X-ray images (GDXray) Castings dataset and is fast enough to be used in a production setting. The system also performs well on the GDXray Welds dataset. A number of in-depth studies are conducted to explore how transfer learning, multi-task learning, and multi-class learning influence the performance of the trained system.

Submitted 2 September, 2018; v1 submitted 7 August, 2018; originally announced August 2018.

arXiv:1805.08551 [pdf]

Robust Model Predictive Control for Autonomous Vehicles/Self Driving Cars

Authors: Che Kun Law, Darshit Dalal, Stephen Shearrow

Abstract: A robust Model Predictive Control (MPC) approach for controlling front steering of an autonomous vehicle is presented in this paper. We present various approaches to increase the robustness of model predictive control by using weight tuning, a successive on-line linearization of a nonlinear vehicle model to track position error and successive on-line linearization to track velocity error. Results of the effectiveness of each method in terms of accuracy and computational load are discussed.

Submitted 22 May, 2018; originally announced May 2018.

Comments: 12 pages, 9 figures

arXiv:1802.04520

Learning Robust and Adaptive Real-World Continuous Control Using Simulation and Transfer Learning

Authors: M Ferguson, K. H. Law

Abstract: We use model-free reinforcement learning, extensive simulation, and transfer learning to develop a continuous control algorithm that has good zero-shot performance in a real physical environment. We train a simulated agent to act optimally across a set of similar environments, each with dynamics drawn from a prior distribution. We propose that the agent is able to adjust its actions almost immediately, based on a small set of observations. This robust and adaptive behavior is enabled by using a policy gradient algorithm with a Long Short-Term Memory (LSTM) function approximation. Finally, we train an agent to navigate a two-dimensional environment with uncertain dynamics and noisy observations. We demonstrate that this agent has good zero-shot performance in a real physical environment. Our preliminary results indicate that the agent is able to infer the environmental dynamics after only a few timesteps, and adjust its actions accordingly.

Submitted 8 March, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

Comments: The paper has several technical errors. Rather than correct these errors, we have chosen to significantly reformulate the work

arXiv:1701.01657 [pdf, other]

Autonomous Multirobot Excavation for Lunar Applications

Authors: Jekanthan Thangavelautham, Kenneth Law, Terence Fu, Nader Abu El Samid, Alexander D. S. Smith, Gabriele M. T. D'Eleuterio

Abstract: In this paper, a control approach called Artificial Neural Tissue (ANT) is applied to multirobot excavation for lunar base preparation tasks including clearing landing pads and burying of habitat modules. We show, for the first time, a team of autonomous robots excavating a terrain to match a given 3D blueprint. Constructing mounds around landing pads will provide physical shielding from debris during launch/landing. Burying human habitat modules under 0.5 m of lunar regolith is expected to provide both radiation shielding and maintain temperatures of $-25\,^{\circ}$C. This minimizes base life-support complexity and reduces launch mass. ANT is compelling for a lunar mission because it doesn't require a team of astronauts for excavation and it requires minimal supervision. The robot teams are shown to autonomously interpret blueprints, excavate and prepare sites for a lunar base. Because little pre-programmed knowledge is provided, the controllers discover creative techniques. ANT evolves techniques such as slot-dozing that would otherwise require excavation experts. This is critical in making an excavation mission feasible when it is prohibitively expensive to send astronauts. The controllers evolve elaborate negotiation behaviors to work in close quarters. These and other techniques such as concurrent evolution of the controller and team size are shown to tackle the problem of antagonism, when too many robots interfere, reducing the overall efficiency or, worse, resulting in gridlock. While many challenges remain with this technology, our work shows a compelling pathway for field testing this approach.

Submitted 6 January, 2017; originally announced January 2017.

Comments: 38 pages, 32 figures, archive of journal article, in Robotica, 2017

arXiv:1606.06504 [pdf, ps, other]

doi 10.1109/TSP.2017.2695448

Transmit Beamforming for Interference Exploitation in the Underlay Cognitive Radio Z-channel

Authors: Ka Lung Law, Christos Masouros, Marius Pesavento

Abstract: This paper introduces novel transmit beamforming approaches for the cognitive radio (CR) Z-channel. The proposed transmission schemes exploit non-causal information about the interference at the SBS to re-design the CR beamforming optimization problem. This is done with the objective to improve the quality of service (QoS) of secondary users by taking advantage of constructive interference in the secondary link. The beamformers are designed to minimize the worst secondary user's symbol error probability (SEP) under constraints on the instantaneous total transmit power, and the power of the instantaneous interference in the primary link. The problem is formulated as a bivariate probabilistic constrained programming (BPCP) problem. We show that the BPCP problem can be transformed for practical SEPs into a convex optimization problem that can be solved, e.g., by the barrier method. A computationally efficient tight approximate approach is also developed to compute the near-optimal solutions. Simulation results and analysis show that the average computational complexity per downlink frame of the proposed approximate problem is comparable to that of the conventional CR downlink beamforming problem. In addition, both the proposed methods offer significant performance improvements as compared to the conventional CR downlink beamforming, while guaranteeing the QoS of primary users on an instantaneous basis, in contrast to the average QoS guarantees of conventional beamformers.

Submitted 21 June, 2016; originally announced June 2016.

arXiv:1502.04861 [pdf, ps, other]

doi 10.1109/TSP.2015.2423255

Rank-Two Beamforming and Power Allocation in Multicasting Relay Networks

Authors: Adrian Schad, Ka L. Law, Marius Pesavento

Abstract: In this paper, we propose a novel single-group multicasting relay beamforming scheme. We assume a source that transmits common messages via multiple amplify-and-forward relays to multiple destinations. To increase the number of degrees of freedom in the beamforming design, the relays process two received signals jointly and transmit the Alamouti space-time block code over two different beams. Furthermore, in contrast to the existing relay multicasting scheme of the literature, we take into account the direct links from the source to the destinations. We aim to maximize the lowest received quality-of-service by choosing the proper relay weights and the ideal distribution of the power resources in the network. To solve the corresponding optimization problem, we propose an iterative algorithm which solves sequences of convex approximations of the original non-convex optimization problem. Simulation results demonstrate significant performance improvements of the proposed methods as compared with the existing relay multicasting scheme of the literature and an algorithm based on the popular semidefinite relaxation technique.

Submitted 17 February, 2015; originally announced February 2015.

arXiv:1502.04488 [pdf, ps, other]

doi 10.1109/TSP.2015.2455516

General Rank Multiuser Downlink Beamforming With Shaping Constraints Using Real-valued OSTBC

Authors: Ka Lung Law, Xin Wen, Minh Thanh Vu, Marius Pesavento

Abstract: In this paper we consider optimal multiuser downlink beamforming in the presence of a massive number of arbitrary quadratic shaping constraints. We combine beamforming with full-rate high dimensional real-valued orthogonal space time block coding (OSTBC) to increase the number of beamforming weight vectors and associated degrees of freedom in the beamformer design. The original multi-constraint beamforming problem is converted into a convex optimization problem using semidefinite relaxation (SDR) which can be solved efficiently. In contrast to conventional (rank-one) beamforming approaches in which an optimal beamforming solution can be obtained only when the SDR solution (after rank reduction) exhibits the rank-one property, in our approach optimality is guaranteed when a rank of eight is not exceeded. We show that our approach can incorporate up to 79 additional shaping constraints for which an optimal beamforming solution is guaranteed as compared to a maximum of two additional constraints that bound the conventional rank-one downlink beamforming designs. Simulation results demonstrate the flexibility of our proposed beamformer design.

Submitted 17 February, 2015; v1 submitted 16 February, 2015; originally announced February 2015.

Showing 1–24 of 24 results for author: Law, K