-
3D Adaptive Structural Convolution Network for Domain-Invariant Point Cloud Recognition
Authors:
Younggun Kim,
Beomsik Cho,
Seonghoon Ryoo,
Soomok Lee
Abstract:
Adapting deep learning networks for point cloud data recognition in self-driving vehicles faces challenges due to the variability in datasets and sensor technologies, emphasizing the need for adaptive techniques to maintain accuracy across different conditions. In this paper, we introduce the 3D Adaptive Structural Convolution Network (3D-ASCN), a cutting-edge framework for 3D point cloud recognition. It combines 3D convolution kernels, a structural tree structure, and adaptive neighborhood sampling for effective geometric feature extraction. This method obtains domain-invariant features and demonstrates robust, adaptable performance on a variety of point cloud datasets, ensuring compatibility across diverse sensor configurations without the need for parameter adjustments. This highlights its potential to significantly enhance the reliability and efficiency of self-driving vehicle technology.
Submitted 5 July, 2024;
originally announced July 2024.
-
Modeling Kinematic Uncertainty of Tendon-Driven Continuum Robots via Mixture Density Networks
Authors:
Jordan Thompson,
Brian Y. Cho,
Daniel S. Brown,
Alan Kuntz
Abstract:
Tendon-driven continuum robot kinematic models are frequently computationally expensive, inaccurate due to unmodeled effects, or both. In particular, unmodeled effects produce uncertainties that arise during the robot's operation that lead to variability in the resulting geometry. We propose a novel solution to these issues through the development of a Gaussian mixture kinematic model. We train a mixture density network to output a Gaussian mixture model representation of the robot geometry given the current tendon displacements. This model computes a probability distribution that is more representative of the true distribution of geometries at a given configuration than a model that outputs a single geometry, while also reducing the computation time. We demonstrate one use of this model through a trajectory optimization method that explicitly reasons about the workspace uncertainty to minimize the probability of collision.
Submitted 5 April, 2024;
originally announced April 2024.
-
Accounting for Hysteresis in the Forward Kinematics of Nonlinearly-Routed Tendon-Driven Continuum Robots via a Learned Deep Decoder Network
Authors:
Brian Y. Cho,
Daniel S. Esser,
Jordan Thompson,
Bao Thach,
Robert J. Webster III,
Alan Kuntz
Abstract:
Tendon-driven continuum robots have been gaining popularity in medical applications due to their ability to curve around complex anatomical structures, potentially reducing the invasiveness of surgery. However, accurate modeling is required to plan and control the movements of these flexible robots. Physics-based models have limitations due to unmodeled effects, leading to mismatches between model prediction and actual robot shape. Recently proposed learning-based methods have been shown to overcome some of these limitations but do not account for hysteresis, a significant source of error for these robots. To overcome these challenges, we propose a novel deep decoder neural network that predicts the complete shape of tendon-driven robots using point clouds as the shape representation, conditioned on prior configurations to account for hysteresis. We evaluate our method on a physical tendon-driven robot and show that our network model accurately predicts the robot's shape, significantly outperforming a state-of-the-art physics-based model and a learning-based model that does not account for hysteresis.
Submitted 4 April, 2024;
originally announced April 2024.
-
Software-Defined Cryptography: A Design Feature of Cryptographic Agility
Authors:
Jihoon Cho,
Changhoon Lee,
Eunkyung Kim,
Jieun Lee,
Beumjin Cho
Abstract:
Cryptographic agility, or crypto-agility, is a design feature that enables agile updates to new cryptographic algorithms and standards without the need to modify or replace the surrounding infrastructure. This paper examines the prerequisites for crypto-agility and proposes its desired design feature. More specifically, we investigate the design characteristics of widely deployed cybersecurity paradigms, i.e., zero trust, and apply its design feature to crypto-agility, achieving greater visibility and automation in cryptographic management.
Submitted 2 April, 2024;
originally announced April 2024.
-
The Bid Picture: Auction-Inspired Multi-player Generative Adversarial Networks Training
Authors:
Joo Yong Shim,
Jean Seong Bjorn Choe,
Jong-Kook Kim
Abstract:
This article proposes auction-inspired multi-player generative adversarial networks training, which mitigates the mode collapse problem of GANs. Mode collapse occurs when an over-fitted generator generates a limited range of samples, often concentrating on a small subset of the data distribution. Despite the restricted diversity of generated samples, the discriminator can still be deceived into classifying these samples as real samples from the actual distribution. In the absence of external standards, a model cannot recognize its failure during the training phase. We extend the two-player game of generative adversarial networks to a multi-player game. During training, the value of each model is determined by the bids submitted by other players in an auction-like process.
Submitted 20 March, 2024;
originally announced March 2024.
-
Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams
Authors:
Brian Cho,
Kyra Gan,
Nathan Kallus
Abstract:
We propose a novel nonparametric sequential test for composite hypotheses for means of multiple data streams. Our proposed method, peeking with expectation-based averaged capital (PEAK), builds upon the testing-by-betting framework and provides a non-asymptotic $α$-level test across any stopping time. Our contributions are two-fold: (1) we propose a novel betting scheme and provide theoretical guarantees on type-I error control, power, and asymptotic growth rate/$e$-power in the setting of a single data stream; (2) we introduce PEAK, a generalization of this betting scheme to multiple streams, that (i) avoids using wasteful union bounds via averaging, (ii) is a test of power one under mild regularity conditions on the sampling scheme of the streams, and (iii) reduces computational overhead when applying the testing-as-betting approaches for pure-exploration bandit problems. We illustrate the practical benefits of PEAK using both synthetic and real-world HeartSteps datasets. Our experiments show that PEAK provides up to an 85% reduction in the number of samples before stopping compared to existing stopping rules for pure-exploration bandit problems, and matches the performance of state-of-the-art sequential tests while improving upon computational complexity.
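As a rough illustration of the testing-by-betting framework mentioned in this abstract, the sketch below runs a generic single-stream betting test. It is not the PEAK betting scheme: the function name `betting_test`, the fixed bet size `lam`, and the assumption that observations lie in [0, 1] are all illustrative choices. The bettor starts with wealth 1, multiplies it by `1 + lam * (x - m0)` after each observation, and rejects the null "mean equals `m0`" once wealth reaches `1 / alpha`. Under the null the wealth is a non-negative martingale, so Ville's inequality bounds the false-rejection probability by `alpha` at any stopping time.

```python
def betting_test(stream, m0, alpha=0.05, lam=0.5):
    """Generic betting test for H0: mean == m0, observations in [0, 1].

    Returns the 1-based index of the observation at which H0 is rejected,
    or None if the stream ends without a rejection.
    Illustrative sketch only; `lam` is a fixed, hand-picked bet size.
    """
    if not 0.0 < m0 < 1.0:
        raise ValueError("m0 must lie strictly between 0 and 1")
    # The bet must keep wealth non-negative for every x in [0, 1].
    if not -1.0 / (1.0 - m0) <= lam <= 1.0 / m0:
        raise ValueError("lam would allow negative wealth")

    wealth = 1.0
    for t, x in enumerate(stream, start=1):
        # Wealth grows when x lands on the side of m0 that we bet on.
        wealth *= 1.0 + lam * (x - m0)
        if wealth >= 1.0 / alpha:
            return t
    return None
```

With a positive `lam` the test only gains wealth when observations exceed `m0`, so it targets the one-sided alternative that the mean is larger than `m0`; a stream that sits exactly at `m0` never triggers a rejection.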
Submitted 2 June, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training
Authors:
Zihao Deng,
Benjamin Ghaemmaghami,
Ashish Kumar Singh,
Benjamin Cho,
Leo Orshansky,
Mattan Erez,
Michael Orshansky
Abstract:
Modern DNN-based recommendation systems rely on training-derived embeddings of sparse features. Input sparsity makes obtaining high-quality embeddings for rarely-occurring categories harder as their representations are updated infrequently. We demonstrate a training-time technique to produce superior embeddings via effective cross-category learning and theoretically explain its surprising effectiveness. The scheme, termed the multi-layer embeddings training (MLET), trains embeddings using factorization of the embedding layer, with an inner dimension higher than the target embedding dimension. For inference efficiency, MLET converts the trained two-layer embedding into a single-layer one thus keeping inference-time model size unchanged.
The empirical superiority of MLET is puzzling, as its search space is not larger than that of the single-layer embedding. The strong dependence of MLET on the inner dimension is even more surprising. We develop a theory that explains both of these behaviors by showing that MLET creates an adaptive update mechanism modulated by the singular vectors of embeddings. When tested on multiple state-of-the-art recommendation models for click-through rate (CTR) prediction tasks, MLET consistently produces better models, especially for rare items. At constant model quality, MLET allows the embedding dimension, and with it the model size, to be reduced by up to 16x, and by 5.8x on average, across the models.
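The conversion from a two-layer embedding to a single-layer one described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `w1`, `w2`, `matmul`, and `two_layer_lookup`, the tiny sizes, and the random values standing in for trained weights are all hypothetical. With a table `w1` of shape (categories x inner dimension) and a linear layer `w2` of shape (inner dimension x target dimension), the product `w1 @ w2` is a single table whose rows match the two-layer lookup.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x d)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def two_layer_lookup(w1, w2, idx):
    """Embedding of category `idx` through the factorized (two-layer) table."""
    row = w1[idx]
    return [sum(row[k] * w2[k][j] for k in range(len(w2))) for j in range(len(w2[0]))]


# Hypothetical sizes: 6 categories, inner dimension 8, target dimension 2.
rng = random.Random(0)
n_categories, inner_dim, target_dim = 6, 8, 2
w1 = [[rng.gauss(0.0, 1.0) for _ in range(inner_dim)] for _ in range(n_categories)]
w2 = [[rng.gauss(0.0, 1.0) for _ in range(target_dim)] for _ in range(inner_dim)]

# Collapse once after training; inference then only stores n_categories x target_dim.
collapsed = matmul(w1, w2)
```

After the collapse, `collapsed[idx]` gives the same vector as `two_layer_lookup(w1, w2, idx)`, so the inner dimension affects training only and leaves the inference-time table size unchanged.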
Submitted 27 September, 2023;
originally announced September 2023.
-
Efficient and Accurate Mapping of Subsurface Anatomy via Online Trajectory Optimization for Robot Assisted Surgery
Authors:
Brian Y. Cho,
Alan Kuntz
Abstract:
Robotic surgical subtask automation has the potential to reduce the per-patient workload of human surgeons. There are a variety of surgical subtasks that require geometric information of subsurface anatomy, such as the location of tumors, which necessitates accurate and efficient surgical sensing. In this work, we propose an automated sensing method that maps 3D subsurface anatomy to provide such geometric knowledge. We model the anatomy via a Bayesian Hilbert map-based probabilistic 3D occupancy map. Using the 3D occupancy map, we plan sensing paths on the surface of the anatomy via a graph search algorithm, $A^*$ search, with a cost function that enables the trajectories generated to balance between exploration of unsensed regions and refining the existing probabilistic understanding. We demonstrate the performance of our proposed method by comparing it against 3 different methods in several anatomical environments including a real-life CT scan dataset. The experimental results show that our method efficiently detects relevant subsurface anatomy with shorter trajectories than the comparison methods, and the resulting occupancy map achieves high accuracy.
Submitted 18 September, 2023;
originally announced September 2023.
-
Rubric-Specific Approach to Automated Essay Scoring with Augmentation Training
Authors:
Brian Cho,
Youngbin Jang,
Jaewoong Yoon
Abstract:
Neural-based approaches to automatic evaluation of subjective responses have shown superior performance and efficiency compared to traditional rule-based and feature-engineering-oriented solutions. However, it remains unclear whether the suggested neural solutions are sufficient replacements for human raters, as we find that recent works do not properly account for rubric items that are essential for automated essay scoring during model training and validation. In this paper, we propose a series of data augmentation operations that train and test an automated scoring model to learn features and functions overlooked by previous works while still achieving state-of-the-art performance on the Automated Student Assessment Prize dataset.
Submitted 6 September, 2023;
originally announced September 2023.
-
Towards Understanding of Deepfake Videos in the Wild
Authors:
Beomsang Cho,
Binh M. Le,
Jiwon Kim,
Simon Woo,
Shahroz Tariq,
Alsharif Abuadbba,
Kristen Moore
Abstract:
Deepfakes have become a growing concern in recent years, prompting researchers to develop benchmark datasets and detection algorithms to tackle the issue. However, existing datasets suffer from significant drawbacks that hamper their effectiveness. Notably, these datasets fail to encompass the latest deepfake videos produced by state-of-the-art methods that are being shared across various platforms. This limitation impedes the ability to keep pace with the rapid evolution of generative AI techniques employed in real-world deepfake production. Our contribution in this IRB-approved study is to bridge this knowledge gap through an in-depth analysis of current real-world deepfakes. We first present the largest, most diverse, and most recent deepfake dataset (RWDF-23) collected from the wild to date, consisting of 2,000 deepfake videos collected from 4 platforms (Reddit, YouTube, TikTok, and Bilibili), covering 4 different languages and created in 21 countries. By expanding the dataset's scope beyond the previous research, we capture a broader range of real-world deepfake content, reflecting the ever-evolving landscape of online platforms. Also, we conduct a comprehensive analysis encompassing various aspects of deepfakes, including creators, manipulation strategies, purposes, and real-world content production methods. This allows us to gain valuable insights into the nuances and characteristics of deepfakes in different contexts. Lastly, in addition to the video content, we also collect viewer comments and interactions, enabling us to explore the engagements of internet users with deepfake content. By considering this rich contextual information, we aim to provide a holistic understanding of the evolving deepfake phenomenon and its impact on online platforms.
Submitted 6 September, 2023; v1 submitted 4 September, 2023;
originally announced September 2023.
-
DeformerNet: Learning Bimanual Manipulation of 3D Deformable Objects
Authors:
Bao Thach,
Brian Y. Cho,
Shing-Hei Ho,
Tucker Hermans,
Alan Kuntz
Abstract:
Applications in fields ranging from home care to warehouse fulfillment to surgical assistance require robots to reliably manipulate the shape of 3D deformable objects. Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape. Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models. We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the manipulated object and a point cloud of the goal shape to learn a low-dimensional representation of the object shape. This shape embedding enables the robot to learn a visual servo controller that computes the desired robot end-effector action to iteratively deform the object toward the target shape. We demonstrate both in simulation and on a physical robot that DeformerNet reliably generalizes to object shapes and material stiffness not seen during training, including ex vivo chicken muscle tissue. Crucially, using DeformerNet, the robot successfully accomplishes three surgical sub-tasks: retraction (moving tissue aside to access a site underneath it), tissue wrapping (a sub-task in procedures like aortic stent placements), and connecting two tubular pieces of tissue (a sub-task in anastomosis).
Submitted 19 February, 2024; v1 submitted 8 May, 2023;
originally announced May 2023.
-
AIROGS: Artificial Intelligence for RObust Glaucoma Screening Challenge
Authors:
Coen de Vente,
Koenraad A. Vermeer,
Nicolas Jaccard,
He Wang,
Hongyi Sun,
Firas Khader,
Daniel Truhn,
Temirgali Aimyshev,
Yerkebulan Zhanibekuly,
Tien-Dung Le,
Adrian Galdran,
Miguel Ángel González Ballester,
Gustavo Carneiro,
Devika R G,
Hrishikesh P S,
Densen Puthussery,
Hong Liu,
Zekang Yang,
Satoshi Kondo,
Satoshi Kasai,
Edward Wang,
Ashritha Durvasula,
Jónathan Heras,
Miguel Ángel Zapata,
Teresa Araújo
, et al. (11 additional authors not shown)
Abstract:
The early detection of glaucoma is essential in preventing visual impairment. Artificial intelligence (AI) can be used to analyze color fundus photographs (CFPs) in a cost-effective manner, making glaucoma screening more accessible. While AI models for glaucoma screening from CFPs have shown promising results in laboratory settings, their performance decreases significantly in real-world scenarios due to the presence of out-of-distribution and low-quality images. To address this issue, we propose the Artificial Intelligence for Robust Glaucoma Screening (AIROGS) challenge. This challenge includes a large dataset of around 113,000 images from about 60,000 patients and 500 different screening centers, and encourages the development of algorithms that are robust to ungradable and unexpected input data. We evaluated solutions from 14 teams in this paper, and found that the best teams performed similarly to a set of 20 expert ophthalmologists and optometrists. The highest-scoring team achieved an area under the receiver operating characteristic curve of 0.99 (95% CI: 0.98-0.99) for detecting ungradable images on-the-fly. Additionally, many of the algorithms showed robust performance when tested on three other publicly available datasets. These results demonstrate the feasibility of robust AI-enabled glaucoma screening.
Submitted 10 February, 2023; v1 submitted 3 February, 2023;
originally announced February 2023.
-
A repeated unknown game: Decentralized task offloading in vehicular fog computing
Authors:
Byungjin Cho,
Yu Xiao
Abstract:
Offloading computation to nearby edge/fog computing nodes, including the ones carried by moving vehicles, e.g., vehicular fog nodes (VFN), has proved to be a promising approach for enabling low-latency and compute-intensive mobility applications, such as cooperative and autonomous driving. This work considers vehicular fog computing scenarios where the clients of computation offloading services try to minimize their own costs while deciding which VFNs to offload their tasks to. We focus on decentralized multi-agent decision-making in a repeated unknown game where each agent, e.g., service client, can observe only its own action and realized cost. In other words, each agent is unaware of the game composition or even the existence of opponents. We apply a completely uncoupled learning rule to generalize the decentralized decision-making algorithm presented in \cite{Cho2021} to the multi-agent case. The multi-agent solution proposed in this work can capture the unknown offloading cost variations susceptible to resource congestion under an adversarial framework where each agent may perform implicit cost estimation and make a suitable resource choice, adapting to the dynamics associated with volatile supply and demand. According to the evaluation via simulation, this work reveals that such individual perturbations for robustness to uncertainty and adaptation to dynamicity ensure a certain level of optimality in terms of social welfare, e.g., converging the actual sequence of play with unknown and asymmetric attributes and lowering the corresponding cost in social welfare due to the self-interested behaviors of agents.
Submitted 20 May, 2023; v1 submitted 3 September, 2022;
originally announced September 2022.
-
Planning Sensing Sequences for Subsurface 3D Tumor Mapping
Authors:
Brian Y. Cho,
Tucker Hermans,
Alan Kuntz
Abstract:
Surgical automation has the potential to enable increased precision and reduce the per-patient workload of overburdened human surgeons. An effective automation system must be able to sense and map subsurface anatomy, such as tumors, efficiently and accurately. In this work, we present a method that plans a sequence of sensing actions to map the 3D geometry of subsurface tumors. We leverage a sequential Bayesian Hilbert map to create a 3D probabilistic occupancy model that represents the likelihood that any given point in the anatomy is occupied by a tumor, conditioned on sensor readings. We iteratively update the map, utilizing Bayesian optimization to determine sensing poses that explore unsensed regions of anatomy and exploit the knowledge gained by previous sensing actions. We demonstrate our method's efficiency and accuracy in three anatomical scenarios including a liver tumor scenario generated from a real patient's CT scan. The results show that our proposed method significantly outperforms comparison methods in terms of efficiency while detecting subsurface tumors with high accuracy.
Submitted 12 October, 2021;
originally announced October 2021.
-
Learning Visual Shape Control of Novel 3D Deformable Objects from Partial-View Point Clouds
Authors:
Bao Thach,
Brian Y. Cho,
Alan Kuntz,
Tucker Hermans
Abstract:
If robots could reliably manipulate the shape of 3D deformable objects, they could find applications in fields ranging from home care to warehouse fulfillment to surgical assistance. Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape. Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models. We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the object being manipulated and a point cloud of the goal shape to learn a low-dimensional representation of the object shape. This shape embedding enables the robot to learn to define a visual servo controller that provides Cartesian pose changes to the robot end-effector causing the object to deform towards its target shape. Crucially, we demonstrate both in simulation and on a physical robot that DeformerNet reliably generalizes to object shapes and material stiffness not seen during training and outperforms comparison methods for both the generic shape control and the surgical task of retraction.
Submitted 18 April, 2022; v1 submitted 9 October, 2021;
originally announced October 2021.
-
Learning-based decentralized offloading decision making in an adversarial environment
Authors:
Byungjin Cho,
Yu Xiao
Abstract:
Vehicular fog computing (VFC) pushes the cloud computing capability to the distributed fog nodes at the edge of the Internet, enabling compute-intensive and latency-sensitive computing services for vehicles through task offloading. However, a heterogeneous mobility environment introduces uncertainties in terms of resource supply and demand, which are inevitable bottlenecks for the optimal offloading decision. Also, these uncertainties bring extra challenges to task offloading under the oblivious adversary attack and data privacy risks. In this article, we develop a new adversarial online learning algorithm with bandit feedback based on the adversarial multi-armed bandit theory, to enable scalable and low-complexity offloading decision making. Specifically, we focus on optimizing fog node selection with the aim of minimizing the offloading service costs in terms of delay and energy. The key is to implicitly tune the exploration bonus in the selection process and the assessment rules of the designed algorithm, taking into account volatile resource supply and demand. We theoretically prove that the input-size dependent selection rule makes it possible to choose a suitable fog node without exploring the sub-optimal actions, and that an appropriate score patching rule makes it possible to adapt quickly to evolving circumstances; together these reduce variance and bias simultaneously, thereby achieving a better exploitation-exploration balance. Simulation results verify the effectiveness and robustness of the proposed algorithm.
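To make the adversarial multi-armed bandit setting concrete, the sketch below implements textbook EXP3 in its loss-based form for choosing among fog nodes. It is not the algorithm proposed in the paper (which adds its own selection and score patching rules); the class name `Exp3`, the learning rate `eta`, the helper `run`, and the fixed per-node costs used in the simulation are assumptions made for illustration. Each round the learner samples a node from an exponential-weights distribution, observes only the cost of that node (bandit feedback), and updates the chosen node's weight with an importance-weighted cost estimate.

```python
import math
import random


class Exp3:
    """Textbook EXP3 (loss version) over `n_arms` candidate fog nodes."""

    def __init__(self, n_arms, eta=0.1, seed=0):
        self.log_w = [0.0] * n_arms  # log-weights, for numerical stability
        self.eta = eta
        self.rng = random.Random(seed)

    def probabilities(self):
        m = max(self.log_w)
        w = [math.exp(v - m) for v in self.log_w]
        total = sum(w)
        return [x / total for x in w]

    def select(self):
        p = self.probabilities()
        arm = self.rng.choices(range(len(p)), weights=p)[0]
        return arm, p[arm]

    def update(self, arm, prob, cost):
        # Importance-weighted estimate: only the chosen arm's cost is observed.
        self.log_w[arm] -= self.eta * cost / prob


def run(costs, rounds=500, seed=0):
    """Simulate offloading against fixed per-node costs in [0, 1]."""
    learner = Exp3(len(costs), seed=seed)
    for _ in range(rounds):
        arm, prob = learner.select()
        learner.update(arm, prob, costs[arm])
    return learner.probabilities()


final_probs = run([0.9, 0.2, 0.7])
```

In this toy run the costs are stationary, so the distribution concentrates on the cheapest node; the same update applies unchanged when costs vary from round to round, which is the case the adversarial setting is meant to cover.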
Submitted 6 September, 2021; v1 submitted 26 April, 2021;
originally announced April 2021.
-
Accelerating Bandwidth-Bound Deep Learning Inference with Main-Memory Accelerators
Authors:
Benjamin Y. Cho,
Jeageun Jung,
Mattan Erez
Abstract:
DL inference queries play an important role in diverse internet services and a large fraction of datacenter cycles are spent on processing DL inference queries. Specifically, the matrix-matrix multiplication (GEMM) operations of fully-connected MLP layers dominate many inference tasks. We find that the GEMM operations for datacenter DL inference tasks are memory bandwidth bound, contrary to common assumptions: (1) strict query latency constraints force small-batch operation, which limits reuse and increases bandwidth demands; and (2) large and colocated models require reading the large weight matrices from main memory, again requiring high bandwidth without offering reuse opportunities. We demonstrate the large potential of accelerating these small-batch GEMMs with processing in the main CPU memory. We develop a novel GEMM execution flow and corresponding memory-side address-generation logic that exploits GEMM locality and enables long-running PIM kernels despite the complex address-mapping functions employed by the CPU that would otherwise destroy locality. Our evaluation of StepStone variants at the channel, device, and within-device PIM levels, along with optimizations that balance parallelism benefits with data-distribution overheads demonstrate $12\times$ better minimum latency than a CPU and $2.8\times$ greater throughput for strict query latency constraints. End-to-end performance analysis of recent recommendation and language models shows that StepStone PIM outperforms a fast CPU (by up to $16\times$) and prior main-memory acceleration approaches (by up to $2.4\times$ compared to the best prior approach).
Submitted 30 November, 2020;
originally announced December 2020.
-
Training with Multi-Layer Embeddings for Model Reduction
Authors:
Benjamin Ghaemmaghami,
Zihao Deng,
Benjamin Cho,
Leo Orshansky,
Ashish Kumar Singh,
Mattan Erez,
Michael Orshansky
Abstract:
Modern recommendation systems rely on real-valued embeddings of categorical features. Increasing the dimension of embedding vectors improves model accuracy but comes at a high cost to model size. We introduce a multi-layer embedding training (MLET) architecture that trains embeddings via a sequence of linear layers to derive superior embedding accuracy vs. model size trade-off.
Our approach is fundamentally based on the ability of factorized linear layers to produce embeddings superior to those of a single linear layer. We focus on the analysis and implementation of a two-layer scheme. Harnessing recent results on the dynamics of backpropagation in linear neural networks, we explain the ability to obtain superior multi-layer embeddings via their tendency to have lower effective rank. We show that substantial advantages are obtained in the regime where the width of the hidden layer is much larger than that of the final embedding (d). Crucially, at the conclusion of training, we convert the two-layer solution into a single-layer one: as a result, the inference-time model size scales as d.
We prototype the MLET scheme within Facebook's PyTorch-based open-source Deep Learning Recommendation Model. We show that it allows reducing d by 4-8X, with a corresponding improvement in memory footprint, at given model accuracy. The experiments are run on two publicly available click-through-rate prediction benchmarks (Criteo-Kaggle and Avazu). The runtime cost of MLET is 25%, on average.
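The conversion from a two-layer solution to a single-layer one follows from the fact that a product of two linear maps is itself a linear map. The sketch below illustrates only that step, in plain Python; the sizes `n`, `k`, `d`, the random weights, and the helper names are hypothetical and are not the authors' implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def collapse_two_layer_embedding(w1, w2):
    """Fold an (n x k) layer and a (k x d) layer into one (n x d) table."""
    return matmul(w1, w2)


def lookup_two_layer(w1, w2, index):
    """Training-time lookup: select a row of w1, then project it through w2."""
    return matmul([w1[index]], w2)[0]


random.seed(0)
n, k, d = 6, 16, 4  # hypothetical sizes, with hidden width k larger than d
w1 = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n)]
w2 = [[random.gauss(0, 1) for _ in range(d)] for _ in range(k)]

# After training, only this n x d table needs to be stored for inference.
table = collapse_two_layer_embedding(w1, w2)
```

Looking up row `i` of `table` gives the same vector as `lookup_two_layer(w1, w2, i)`, which is why the wider hidden layer adds no inference-time storage in this sketch.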
Submitted 9 June, 2020;
originally announced June 2020.
-
Fast and resilient manipulation planning for target retrieval in clutter
Authors:
Changjoo Nam,
Jinhwi Lee,
Sang Hun Cheong,
Brian Y. Cho,
ChangHwan Kim
Abstract:
This paper presents a task and motion planning (TAMP) framework for a robotic manipulator to retrieve a target object from clutter. We consider a configuration of objects in a confined space with a density so high that no collision-free path to the target exists. The robot must relocate some objects to retrieve the target without collisions. For fast completion of object rearrangement, the robot aims to minimize the number of pick-and-place actions, which often determines the efficiency of a TAMP framework.
We propose a task planner that incorporates motion planning to generate executable plans and aims to minimize the number of pick-and-place actions. In addition to fully known and static environments, our method can deal with uncertain and dynamic situations caused by occluded views. Our method is shown to reduce the number of pick-and-place actions compared to baseline methods (e.g., a reduction of at least 28.0% in a known static environment with 20 objects).
Submitted 24 March, 2020;
originally announced March 2020.
-
Where to relocate?: Object rearrangement inside cluttered and confined environments for robotic manipulation
Authors:
Sang Hun Cheong,
Brian Y. Cho,
Jinhwi Lee,
ChangHwan Kim,
Changjoo Nam
Abstract:
We present an algorithm that determines where to relocate objects inside a cluttered and confined space while rearranging objects to retrieve a target object. Although methods that decide what to remove have been proposed, planning for the placement of removed objects inside a workspace has not received much attention. Rather, removed objects are often placed outside the workspace, which incurs additional laborious work (e.g., motion planning and execution of the manipulator and the mobile base, perception of other areas). Some other methods manipulate objects only inside the workspace but without a guiding principle, so the rearrangement becomes inefficient.
In this work, we consider both monotone (each object is moved only once) and non-monotone rearrangement problems, which have been shown to be NP-hard. Once the sequence of objects to be relocated is given by any existing algorithm, our method aims to minimize the number of pick-and-place actions needed to place the objects until the target becomes accessible. From extensive experiments, we show that our method reduces the number of pick-and-place actions and the total execution time (by up to 23.1% and 28.1%, respectively) compared to baseline methods while achieving higher success rates.
Submitted 24 March, 2020;
originally announced March 2020.
-
Fides: Managing Data on Untrusted Infrastructure
Authors:
Sujaya Maiyya,
Danny Hyun Bum Cho,
Divyakant Agrawal,
Amr El Abbadi
Abstract:
Significant amounts of data are currently being stored and managed on third-party servers. It is impractical for many small-scale enterprises to own private datacenters; hence, renting third-party servers is a viable solution for such businesses. But the increasing number of malicious attacks, both internal and external, as well as buggy software on third-party servers, is causing clients to lose their trust in these external infrastructures. While small enterprises cannot avoid using external infrastructures, they need the right set of protocols to manage their data on untrusted infrastructures. In this paper, we propose TFCommit, a novel atomic commitment protocol that executes transactions on data stored across multiple untrusted servers. To our knowledge, TFCommit is the first atomic commitment protocol to execute transactions in an untrusted environment without using expensive Byzantine replication. Using TFCommit, we propose an auditable data management system, Fides, residing completely on untrustworthy infrastructure. As an auditable system, Fides guarantees the detection of potentially malicious failures occurring on untrusted servers using tamper-resistant logs with the support of cryptographic techniques. The experimental evaluation demonstrates the scalability and the relatively low overhead of our approach, which allows executing transactions on untrusted infrastructure.
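The abstract does not say how the logs are protected, so the following is only a generic illustration of how cryptographic hashing can make log tampering detectable during an audit. It assumes a simple SHA-256 hash chain; the entry layout and the helper names `append` and `verify` are hypothetical and should not be read as the Fides or TFCommit design.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """Hash an entry together with the hash of its predecessor."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log, payload):
    """Append a payload, chaining it to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev_hash,
                "hash": entry_hash(prev_hash, payload)})


def verify(log):
    """Return True only if every entry still matches its recorded hash chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"txn": 1, "decision": "commit"})
append(log, {"txn": 2, "decision": "abort"})
print(verify(log))
```

A chain like this detects modification of earlier entries but does not prevent it, and an auditor still needs a trusted copy of the latest hash to notice truncation or a fully rewritten log.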
Submitted 19 January, 2020;
originally announced January 2020.
-
RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing
Authors:
Liu Ke,
Udit Gupta,
Carole-Jean Wu,
Benjamin Youngjae Cho,
Mark Hempstead,
Brandon Reagen,
Xuan Zhang,
David Brooks,
Vikas Chandra,
Utku Diril,
Amin Firoozshahian,
Kim Hazelwood,
Bill Jia,
Hsien-Hsin S. Lee,
Meng Li,
Bert Maher,
Dheevatsa Mudigere,
Maxim Naumov,
Martin Schatz,
Mikhail Smelyanskiy,
Xiaodong Wang
Abstract:
Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns that pose a fundamental challenge to accelerate. This paper proposes a lightweight, commodity DRAM compliant, near-memory processing solution to accelerate personalized recommendation inference. The in-depth characterization of production-grade recommendation models shows that embedding operations with high model-, operator- and data-level parallelism lead to memory bandwidth saturation, limiting recommendation inference performance. We propose RecNMP which provides a scalable solution to improve system throughput, supporting a broad range of sparse embedding models. RecNMP is specifically tailored to production environments with heavy co-location of operators on a single server. Several hardware/software co-optimization techniques such as memory-side caching, table-aware packet scheduling, and hot entry profiling are studied, resulting in up to 9.8x memory latency speedup over a highly-optimized baseline. Overall, RecNMP offers 4.2x throughput improvement and 45.8% memory energy savings.
Submitted 30 December, 2019;
originally announced December 2019.
-
Near Data Acceleration with Concurrent Host Access
Authors:
Benjamin Y. Cho,
Yongkee Kwon,
Sangkug Lym,
Mattan Erez
Abstract:
Near-data accelerators (NDAs) that are integrated with main memory have the potential for significant power and performance benefits. Fully realizing these benefits requires the large available memory capacity to be shared between the host and the NDAs in a way that permits both regular memory access by some applications and accelerating others with an NDA, avoids copying data, enables collaborative processing, and simultaneously offers high performance for both host and NDA. We identify and solve new challenges in this context: mitigating row-locality interference from host to NDAs, reducing read/write-turnaround overhead caused by fine-grain interleaving of host and NDA requests, architecting a memory layout that supports the locality required for NDAs and sophisticated address interleaving for host performance, and supporting both packetized and traditional memory interfaces. We demonstrate our approach in a simulated system that consists of a multi-core CPU and NDA-enabled DDR4 memory modules. We show that our mechanisms enable effective and efficient concurrent access using a set of microbenchmarks, and then demonstrate the potential of the system for the important stochastic variance-reduced gradient (SVRG) algorithm.
Submitted 30 November, 2020; v1 submitted 17 August, 2019;
originally announced August 2019.
-
Deep-neural-network based sinogram synthesis for sparse-view CT image reconstruction
Authors:
Hoyeon Lee,
Jongha Lee,
Hyeongseok Kim,
Byungchul Cho,
Seungryong Cho
Abstract:
Recently, a number of approaches to low-dose computed tomography (CT) have been developed and deployed in commercial CT scanners. Tube current reduction, combined with advanced image reconstruction algorithms, is perhaps the most actively explored technology. Sparse data sampling is another viable option for low-dose CT, and sparse-view CT has been of particular interest among researchers in the CT community. Since analytic image reconstruction algorithms would lead to severe image artifacts, various iterative algorithms have been developed for reconstructing images from sparsely view-sampled projection data. However, iterative algorithms take much longer to compute than analytic algorithms, and the resulting images are usually prone to different types of artifacts that depend heavily on the reconstruction parameters. Interpolation methods have also been used to fill in the missing data in the sinogram of sparse-view CT, thus providing synthetically full data for analytic image reconstruction. In this work, we introduce a deep-neural-network-enabled sinogram synthesis method for sparse-view CT and show that it outperforms existing interpolation methods as well as the iterative image reconstruction approach.
Submitted 5 March, 2018; v1 submitted 1 March, 2018;
originally announced March 2018.
-
Co-primary Spectrum Sharing for Inter-operator Device-to-Device Communication
Authors:
Byungjin Cho,
Konstantinos Koufos,
Riku Jäntti,
Seong-Lyun Kim
Abstract:
The business potential of device-to-device (D2D) communication, including public safety and vehicular communications, will be realized only if direct communication between devices subscribed to different mobile operators (OPs) is supported. One possible way to implement inter-operator D2D communication is to use the licensed spectrum of the OPs, i.e., OPs agree to share spectrum in a co-primary manner, and inter-operator D2D communication is allocated over spectral resources contributed by both parties. In this paper, we consider a spectrum sharing scenario where a number of OPs construct a spectrum pool dedicated to supporting inter-operator D2D communication. OPs negotiate, in the form of a non-cooperative game, how much spectrum each OP contributes to the spectrum pool. OPs submit proposals to each other in parallel until a consensus is reached. When every OP has a concave utility function on the box-constrained region, we identify the conditions guaranteeing the existence of a unique equilibrium point. We show that the iterative algorithm based on the OPs' best responses might not converge to the equilibrium point because each OP myopically overreacts to the responses of the other OPs, while the Jacobi-play strategy update algorithm can converge with an appropriate selection of the update parameter. Using the Jacobi-play update algorithm, we illustrate that asymmetric OPs contribute unequal amounts of resources to the spectrum pool; however, all participating OPs may experience significant performance gains compared to the scheme without spectrum sharing.
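The contrast between plain best-response iteration and a relaxed update can be seen on a toy two-player game. The sketch below is not the paper's model: the quadratic utility, the constants `a` and `b`, and the starting shares are hypothetical, and it assumes the Jacobi-play update takes the common relaxed form x ← x + step · (best response − x), applied by both players in parallel.

```python
def clip(value, low=0.0, high=1.0):
    """Project a value onto the box constraint [low, high]."""
    return max(low, min(high, value))


def best_response(other_share, a=1.0, b=1.5):
    """Maximizer on [0, 1] of the hypothetical concave utility
    -(x - (a - b * other_share)) ** 2."""
    return clip(a - b * other_share)


def play(shares, step, rounds=50):
    """Parallel updates: step=1.0 is plain best response, step<1.0 is relaxed."""
    x1, x2 = shares
    history = [(x1, x2)]
    for _ in range(rounds):
        r1, r2 = best_response(x2), best_response(x1)
        # Both players move toward their best response at the same time.
        x1, x2 = x1 + step * (r1 - x1), x2 + step * (r2 - x2)
        history.append((x1, x2))
    return history


plain = play((0.2, 0.2), step=1.0)
relaxed = play((0.2, 0.2), step=0.5)
print("plain best response, last two rounds:", plain[-2], plain[-1])
print("relaxed update, last round:", relaxed[-1])
```

In this toy game the unique equilibrium is a / (1 + b) = 0.4 for both players. With `step=1.0` each player overreacts to the other and the shares bounce between the box limits, while `step=0.5` shrinks the distance to the equilibrium every round.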
Submitted 7 November, 2016;
originally announced November 2016.
-
Modeling the Interference Generated from Car Base Stations towards Indoor Femto-cells
Authors:
Byungjin Cho,
Konstantinos Koufos,
Kalle Ruttik,
Riku Jäntti
Abstract:
In future wireless networks, a significant number of users will be vehicular. One promising solution to improve the capacity for these vehicular users is to employ moving relays or car base stations. The system forms a cell inside the vehicle and then uses a rooftop antenna for back-hauling to overcome the vehicular penetration loss. In this paper, we develop a model for the aggregate interference distribution generated from moving/parked cars to indoor users in order to study whether indoor femto-cells can coexist on the same spectrum with vehicular communications. Since spectrum authorization for vehicular communications is open at the moment, we consider two spectrum sharing scenarios: (i) communication from antennas mounted on the roof of the vehicles to the infrastructure network utilizes the same spectrum as indoor femto-cells; (ii) in-vehicle communication utilizes the same spectrum as indoor femto-cells, while vehicle-to-infrastructure (V2I) communication is allocated to a different spectrum. Based on our findings, we suggest that V2I and indoor femto-cells should be allocated to different spectrum. The reason is that roof-mounted antennas facing the indoor cells generate unacceptable interference levels. On the other hand, in-vehicle communication and indoor cells can share the spectrum thanks to the vehicle body isolation and the lower transmit power levels that can be used inside the vehicle.
Submitted 26 May, 2015;
originally announced May 2015.
-
Spectrum Allocation for Multi-Operator Device-to-Device Communication
Authors:
Byungjin Cho,
Konstantinos Koufos,
Riku Jäntti,
Zexian Li,
Mikko A. Uusitalo
Abstract:
In order to harvest the business potential of device-to-device (D2D) communication, direct communication between devices subscribed to different mobile operators should be supported. This would also support meeting requirements resulting from D2D relevant scenarios, like vehicle-to-vehicle communication. In this paper, we propose to allocate the multi-operator D2D communication over dedicated cellular spectral resources contributed from both operators. Ideally, the operators should negotiate about the amount of spectrum to contribute, without revealing proprietary information to each other and/or to other parties. One possible way to do that is to use the sequence of operators' best responses, i.e., the operators make offers about the amount of spectrum to contribute using a sequential updating procedure until reaching consensus. Besides spectrum allocation, we need a mode selection scheme for the multi-operator D2D users. We use a stochastic geometry framework to capture the impact of mode selection on the distribution of D2D users and assess the performance of the best response iteration algorithm. With the performance metrics considered in the paper, we show that the best response iteration has a unique Nash equilibrium that can be reached from any initial strategy. In general, asymmetric operators would contribute unequal amounts of spectrum for multi-operator D2D communication. Provided that the multi-operator D2D density is not negligible, we show that both operators may experience significant performance gains as compared to the scheme without spectrum sharing.
Submitted 14 February, 2015;
originally announced February 2015.