-
Accelerator-as-a-Service in Public Clouds: An Intra-Host Traffic Management View for Performance Isolation in the Wild
Authors:
Jiechen Zhao,
Ran Shu,
Katie Lim,
Zewen Fan,
Thomas Anderson,
Mingyu Gao,
Natalie Enright Jerger
Abstract:
I/O devices in public clouds have integrated increasing numbers of hardware accelerators, e.g., AWS Nitro, Azure FPGA and Nvidia BlueField. However, such specialized compute (1) is not explicitly accessible to cloud users with performance guarantees and (2) cannot be leveraged simultaneously by both providers and users, unlike general-purpose compute (e.g., CPUs). Through ten observations, we show that the fundamental difficulty of democratizing accelerators is insufficient performance isolation support. The key obstacles to enforcing accelerator isolation are (1) too many unknown traffic patterns in public clouds and (2) too many possible contention sources in the datapath. In this work, instead of scheduling such complex traffic on-the-fly and augmenting isolation support on each system component, we propose to model traffic as network flows and proactively re-shape the traffic to avoid unpredictable contention. We discuss the implications of our findings on the design of future I/O management stacks and device interfaces.
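The abstract proposes treating accelerator traffic as network flows and reshaping it proactively rather than scheduling it on-the-fly. As a loose illustration of what per-flow reshaping can look like, the sketch below uses a classic token-bucket shaper; the token bucket itself, the `TokenBucket` class, and its `rate`/`burst` parameters are assumptions chosen for illustration, not the paper's mechanism.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical per-flow shaper: sustained `rate` bytes/s, bursts up to `burst` bytes."""
    rate: float   # sustained rate in bytes per second (assumed unit)
    burst: float  # bucket depth in bytes
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self) -> None:
        # Start with a full bucket so an idle flow may send one burst immediately.
        self.tokens = self.burst

    def release_time(self, now: float, size: float) -> float:
        """Return the earliest time a request of `size` bytes may be issued to the device."""
        if size > self.burst:
            raise ValueError("request larger than bucket depth")
        # Credit tokens for time elapsed since the last accounted instant. If earlier
        # requests are still queued (now < self.last), this subtracts their debt.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return now
        # Delay until the deficit is paid back at the sustained rate.
        wait = (size - self.tokens) / self.rate
        self.tokens = 0.0
        self.last = now + wait
        return now + wait
```

For example, with `TokenBucket(rate=1000.0, burst=500.0)`, two back-to-back 500-byte requests at time 0 are released at 0.0 and 0.5 seconds, so a bursty tenant is smoothed before it reaches the shared device.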
Submitted 14 July, 2024;
originally announced July 2024.
-
Beehive: A Flexible Network Stack for Direct-Attached Accelerators
Authors:
Katie Lim,
Matthew Giordano,
Theano Stavrinos,
Pratyush Patel,
Jacob Nelson,
Irene Zhang,
Baris Kasikci,
Tom Anderson
Abstract:
Direct-attached accelerators, where application accelerators are directly connected to the datacenter network via a hardware network stack, offer substantial benefits in terms of reduced latency, CPU overhead, and energy use. However, a key challenge is that modern datacenter network stacks are complex, with interleaved protocol layers, network management functions, and virtualization support. To operators, network feature agility, diagnostics, and manageability are often considered just as important as raw performance. By contrast, existing hardware network stacks only support basic protocols and are often difficult to extend since they use fixed processing pipelines.
We propose Beehive, a new, open-source FPGA network stack for direct-attached accelerators designed to enable flexible and adaptive construction of complex network functionality in hardware. Application and network protocol elements are modularized as tiles over a network-on-chip substrate. Elements can be added or scaled up/down to match workload characteristics with minimal effort or changes to other elements. Flexible diagnostics and control are integral, with tooling to ensure deadlock safety. Our implementation interoperates with standard Linux TCP and UDP clients, with a 4x improvement in end-to-end remote procedure call tail latency for Linux UDP clients versus a CPU-attached accelerator.
Submitted 30 May, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Detecting Concrete Visual Tokens for Multimodal Machine Translation
Authors:
Braeden Bowen,
Vipin Vijayan,
Scott Grigsby,
Timothy Anderson,
Jeremy Gwinnup
Abstract:
The challenge of visual grounding and masking in multimodal machine translation (MMT) systems has encouraged varying approaches to the detection and selection of visually-grounded text tokens for masking. We introduce new methods for detection of visually and contextually relevant (concrete) tokens from source sentences, including detection with natural language processing (NLP), detection with object detection, and a joint detection-verification technique. We also introduce new methods for selection of detected tokens, including shortest $n$ tokens, longest $n$ tokens, and all detected concrete tokens. We utilize the GRAM MMT architecture to train models against synthetically collated multimodal datasets of source images with masked sentences, showing performance improvements and improved usage of visual context during translation tasks over the baseline model.
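The three token-selection strategies named in the abstract (shortest n, longest n, and all detected concrete tokens) can be sketched as a small function over an already-detected token list. Measuring token length in characters and the helper name `select_tokens` are assumptions; the detectors themselves (NLP, object detection, detection-verification) are not shown.

```python
from typing import List


def select_tokens(detected: List[str], strategy: str, n: int = 0) -> List[str]:
    """Pick which detected concrete tokens to mask (hypothetical helper)."""
    if strategy == "all":
        return list(detected)
    if strategy not in ("shortest", "longest"):
        raise ValueError(f"unknown strategy: {strategy}")
    # sorted() is stable even with reverse=True, so ties keep their sentence order.
    ranked = sorted(detected, key=len, reverse=(strategy == "longest"))
    return ranked[:n]
```

For the detected tokens `["dog", "frisbee", "park", "man"]`, the shortest-2 selection is `["dog", "man"]` and the longest-1 selection is `["frisbee"]`.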
Submitted 5 March, 2024;
originally announced March 2024.
-
Adding Multimodal Capabilities to a Text-only Translation Model
Authors:
Vipin Vijayan,
Braeden Bowen,
Scott Grigsby,
Timothy Anderson,
Jeremy Gwinnup
Abstract:
While most current work in multimodal machine translation (MMT) uses the Multi30k dataset for training and evaluation, we find that the resulting models overfit to the Multi30k dataset to an extreme degree. Consequently, these models perform very badly when evaluated against typical text-only testing sets such as the WMT newstest datasets. In order to perform well on both Multi30k and typical text-only datasets, we use a performant text-only machine translation (MT) model as the starting point of our MMT model. We add vision-text adapter layers connected via gating mechanisms to the MT model, and incrementally transform the MT model into an MMT model by 1) pre-training using vision-based masking of the source text and 2) fine-tuning on Multi30k.
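The abstract says the vision-text adapters are connected to the MT model "via gating mechanisms" but does not give the gate's form. One common choice, assumed here, is a scalar tanh gate initialized at zero, so the model initially reproduces the text-only MT output exactly and then learns how much visual signal to admit. `gated_residual` and its arguments are hypothetical.

```python
import math
from typing import List


def gated_residual(hidden: List[float], adapter_out: List[float], gate: float) -> List[float]:
    """Blend a vision adapter's output into a text hidden state via a scalar tanh gate."""
    if len(hidden) != len(adapter_out):
        raise ValueError("dimension mismatch")
    g = math.tanh(gate)  # gate = 0.0 gives g = 0.0, i.e. the unmodified MT model
    return [h + g * a for h, a in zip(hidden, adapter_out)]
```

Starting the gate at zero is what lets the incremental MT-to-MMT transformation begin from the pretrained text-only model without disturbing it.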
Submitted 5 March, 2024;
originally announced March 2024.
-
The Case for Evaluating Multimodal Translation Models on Text Datasets
Authors:
Vipin Vijayan,
Braeden Bowen,
Scott Grigsby,
Timothy Anderson,
Jeremy Gwinnup
Abstract:
A good evaluation framework should evaluate multimodal machine translation (MMT) models by measuring 1) their use of visual information to aid in the translation task and 2) their ability to translate complex sentences such as done for text-only machine translation. However, most current work in MMT is evaluated against the Multi30k testing sets, which do not measure these properties. Namely, the use of visual information by the MMT model cannot be shown directly from the Multi30k test set results, and the sentences in Multi30k are image captions, i.e., short, descriptive sentences, as opposed to the complex sentences that typical text-only machine translation models are evaluated against.
Therefore, we propose that MMT models be evaluated using 1) the CoMMuTE evaluation framework, which measures the use of visual information by MMT models, 2) the text-only WMT news translation task test sets, which evaluate translation performance against complex sentences, and 3) the Multi30k test sets, for measuring MMT model performance against a real MMT dataset. Finally, we evaluate recent MMT models trained solely on the Multi30k dataset with our proposed evaluation framework and demonstrate the dramatic drop in performance on text-only testing sets compared to recent text-only MT models.
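CoMMuTE-style evaluation is contrastive: the model scores an image-consistent and an image-inconsistent translation, and accuracy is how often it prefers the right one. The sketch below assumes the scores (e.g. log-probabilities of each candidate given the source and the image) are already computed; `contrastive_accuracy` is a hypothetical helper, and this reflects our understanding of the framework rather than its reference implementation.

```python
from typing import Sequence, Tuple


def contrastive_accuracy(scores: Sequence[Tuple[float, float]]) -> float:
    """Fraction of examples where the image-consistent translation scores higher.

    Each pair is (score_correct, score_incorrect); ties count as failures.
    """
    if not scores:
        raise ValueError("no examples")
    wins = sum(1 for good, bad in scores if good > bad)
    return wins / len(scores)
```

If each ambiguous sentence appears twice, with the two images swapping which translation is correct (assumed here), a model that ignores the image assigns identical scores in both cases and therefore cannot exceed 50%, which is why this metric isolates the use of visual information.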
Submitted 5 March, 2024;
originally announced March 2024.
-
MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading
Authors:
Abdallah Dib,
Luiz Gustavo Hafemann,
Emeline Got,
Trevor Anderson,
Amin Fadaeinejad,
Rafael M. O. Cruz,
Marc-Andre Carbonneau
Abstract:
Reconstructing an avatar from a portrait image has many applications in multimedia, but remains a challenging research problem. Extracting reflectance maps and geometry from one image is ill-posed: recovering geometry is a one-to-many mapping problem, and reflectance and light are difficult to disentangle. Accurate geometry and reflectance can be captured under the controlled conditions of a light stage, but it is costly to acquire large datasets in this fashion. Moreover, training solely with this type of data leads to poor generalization with in-the-wild images. This motivates the introduction of MoSAR, a method for 3D avatar generation from monocular images. We propose a semi-supervised training scheme that improves generalization by learning from both light stage and in-the-wild datasets. This is achieved using a novel differentiable shading formulation. We show that our approach effectively disentangles the intrinsic face parameters, producing relightable avatars. As a result, MoSAR estimates a richer set of skin reflectance maps, and generates more realistic avatars than existing state-of-the-art methods. We also introduce a new dataset, named FFHQ-UV-Intrinsics, the first public dataset providing intrinsic face attributes at scale (diffuse, specular, ambient occlusion and translucency maps) for a total of 10k subjects. The project website and the dataset are available at the following link: https://ubisoft-laforge.github.io/character/mosar/
Submitted 21 December, 2023; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Efficient Transformer Knowledge Distillation: A Performance Review
Authors:
Nathan Brown,
Ashton Williamson,
Tahj Anderson,
Logan Lawrence
Abstract:
As pretrained transformer language models continue to achieve state-of-the-art performance, the Natural Language Processing community has pushed for advances in model compression and efficient attention mechanisms to address high computational requirements and limited input sequence length. Despite these separate efforts, no investigation has been done into the intersection of these two fields. In this work, we provide an evaluation of model compression via knowledge distillation on efficient attention transformers. We provide cost-performance trade-offs for the compression of state-of-the-art efficient attention architectures and the gains made in performance in comparison to their full attention counterparts. Furthermore, we introduce a new long-context Named Entity Recognition dataset, GONERD, to train and test the performance of NER models on long sequences. We find that distilled efficient attention transformers can preserve a significant amount of original model performance, preserving up to 98.6% across short-context tasks (GLUE, SQuAD, CoNLL-2003), up to 94.6% across long-context Question-and-Answering tasks (HotpotQA, TriviaQA), and up to 98.8% on long-context Named Entity Recognition (GONERD), while decreasing inference times by up to 57.8%. We find that, for most models on most tasks, performing knowledge distillation is an effective method to yield high-performing efficient attention models with low costs.
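The abstract does not state which distillation objective was used. For orientation, the sketch below shows the widely used soft-target loss from Hinton et al.: a temperature-scaled KL divergence between teacher and student distributions, mixed with ordinary cross-entropy on the gold label. The `temperature` and `alpha` defaults and the helper names are assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student: List[float], teacher: List[float], label: int,
                      temperature: float = 2.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label)."""
    p_t = softmax(teacher, temperature)
    p_s = softmax(student, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student, 1.0)[label])
    # The T^2 factor keeps soft-target gradients comparable in scale across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

When the student's logits match the teacher's and `alpha=1.0`, the loss is zero; with `alpha=0.0` it reduces to plain cross-entropy.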
Submitted 22 November, 2023;
originally announced November 2023.
-
Towards Mobility Data Science (Vision Paper)
Authors:
Mohamed Mokbel,
Mahmoud Sakr,
Li Xiong,
Andreas Züfle,
Jussara Almeida,
Taylor Anderson,
Walid Aref,
Gennady Andrienko,
Natalia Andrienko,
Yang Cao,
Sanjay Chawla,
Reynold Cheng,
Panos Chrysanthis,
Xiqi Fei,
Gabriel Ghinita,
Anita Graser,
Dimitrios Gunopulos,
Christian Jensen,
Joon-Seok Kim,
Kyoung-Sook Kim,
Peer Kröger,
John Krumm,
Johannes Lauer,
Amr Magdy,
Mario Nascimento
, et al. (23 additional authors not shown)
Abstract:
Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art and describe open challenges for the research community in the coming years.
Submitted 7 March, 2024; v1 submitted 21 June, 2023;
originally announced July 2023.
-
Observation of high-energy neutrinos from the Galactic plane
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
J. A. Aguilar,
M. Ahlers,
M. Ahrens,
J. M. Alameddine,
A. A. Alves Jr.,
N. M. Amin,
K. Andeen,
T. Anderson,
G. Anton,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
S. Axani,
X. Bai,
A. Balagopal V.,
S. W. Barwick,
V. Basu,
S. Baur,
R. Bay,
J. J. Beatty,
K. -H. Becker,
J. Becker Tjus
, et al. (364 additional authors not shown)
Abstract:
The origin of high-energy cosmic rays, atomic nuclei that continuously impact Earth's atmosphere, has been a mystery for over a century. Due to deflection in interstellar magnetic fields, cosmic rays from the Milky Way arrive at Earth from random directions. However, near their sources and during propagation, cosmic rays interact with matter and produce high-energy neutrinos. We search for neutrino emission using machine learning techniques applied to ten years of data from the IceCube Neutrino Observatory. We identify neutrino emission from the Galactic plane at the $4.5\sigma$ level of significance, by comparing diffuse emission models to a background-only hypothesis. The signal is consistent with modeled diffuse emission from the Galactic plane, but could also arise from a population of unresolved point sources.
Submitted 10 July, 2023;
originally announced July 2023.
-
Agile Development of Linux Schedulers with Ekiben
Authors:
Samantha Miller,
Anirudh Kumar,
Tanay Vakharia,
Tom Anderson,
Ang Chen,
Danyang Zhuo
Abstract:
Kernel task scheduling is important for application performance, adaptability to new hardware, and complex user requirements. However, developing, testing, and debugging new scheduling algorithms in Linux, the most widely used cloud operating system, is slow and difficult. We developed Ekiben, a framework for high-velocity development of Linux kernel schedulers. Ekiben schedulers are written in safe Rust, and the system supports live upgrade of new scheduling policies into the kernel, userspace debugging, and bidirectional communication with applications. A scheduler implemented with Ekiben achieved near-identical performance (within 1% on average) to the default Linux scheduler CFS on a wide range of benchmarks. Ekiben is also able to support a range of research schedulers, specifically the Shinjuku scheduler, a locality-aware scheduler, and the Arachne core arbiter, with good performance.
Submitted 26 June, 2023;
originally announced June 2023.
-
Large Language Models Based Automatic Synthesis of Software Specifications
Authors:
Shantanu Mandal,
Adhrik Chethan,
Vahid Janfaza,
S M Farabi Mahmud,
Todd A Anderson,
Javier Turek,
Jesmin Jahan Tithi,
Abdullah Muzahid
Abstract:
Software configurations play a crucial role in determining the behavior of software systems. In order to ensure safe and error-free operation, it is necessary to identify the correct configurations, along with their valid bounds and rules, which are commonly referred to as software specifications. As software systems grow in complexity and scale, the number of configurations and associated specifications required to ensure the correct operation can become large and prohibitively difficult to manipulate manually. Due to the fast pace of software development, it is often the case that correct software specifications are not thoroughly checked or validated within the software itself. Rather, they are frequently discussed and documented in a variety of external sources, including software manuals, code comments, and online discussion forums. Therefore, it is hard for the system administrator to know the correct specifications of configurations due to the lack of clarity, organization, and a centralized, unified source to consult. To address this challenge, we propose SpecSyn, a framework that leverages a state-of-the-art large language model to automatically synthesize software specifications from natural language sources. Our approach formulates software specification synthesis as a sequence-to-sequence learning problem and investigates the extraction of specifications from large contextual texts. This is the first work that uses a large language model for end-to-end specification synthesis from natural language texts. Empirical results demonstrate that our system outperforms the prior state-of-the-art specification synthesis tool by 21% in terms of F1 score and can find specifications from single as well as multiple sentences.
Submitted 17 April, 2023;
originally announced April 2023.
-
Remote Procedure Call as a Managed System Service
Authors:
Jingrong Chen,
Yongji Wu,
Shihan Lin,
Yechen Xu,
Xinhao Kong,
Thomas Anderson,
Matthew Lentz,
Xiaowei Yang,
Danyang Zhuo
Abstract:
Remote Procedure Call (RPC) is a widely used abstraction for cloud computing. The programmer specifies type information for each remote procedure, and a compiler generates stub code linked into each application to marshal and unmarshal arguments into message buffers. Increasingly, however, application and service operations teams need a high degree of visibility and control over the flow of RPCs between services, leading many installations to use sidecars or service mesh proxies for manageability and policy flexibility. These sidecars typically involve inspection and modification of RPC data that the stub compiler had just carefully assembled, adding needless overhead. Further, upgrading diverse application RPC stubs to use advanced hardware capabilities such as RDMA or DPDK is a long and involved process, and often incompatible with sidecar policy control.
In this paper, we propose, implement, and evaluate a novel approach, where RPC marshalling and policy enforcement are done as a system service rather than as a library linked into each application. Applications specify type information to the RPC system as before, while the RPC service executes policy engines and arbitrates resource use, and then marshals data customized to the underlying network hardware capabilities. Our system, mRPC, also supports live upgrades so that both policy and marshalling code can be updated transparently to application code. Compared with using a sidecar, mRPC speeds up a standard microservice benchmark, DeathStarBench, by up to 2.5$\times$ while having a higher level of policy flexibility and availability.
Submitted 14 April, 2023;
originally announced April 2023.
-
Hybrid Computing for Interactive Datacenter Applications
Authors:
Pratyush Patel,
Katie Lim,
Kushal Jhunjhunwalla,
Ashlie Martinez,
Max Demoulin,
Jacob Nelson,
Irene Zhang,
Thomas Anderson
Abstract:
Field-Programmable Gate Arrays (FPGAs) are more energy efficient and cost effective than CPUs for a wide variety of datacenter applications. Yet, for latency-sensitive and bursty workloads, this advantage can be difficult to harness due to high FPGA spin-up costs. We propose that a hybrid FPGA and CPU computing framework can harness the energy efficiency benefits of FPGAs for such workloads at reasonable cost. Our key insight is to use FPGAs for the stable-state workload and CPUs for short-term workload bursts. Using this insight, we design Spork, a lightweight hybrid scheduler that can realize these energy efficiency and cost benefits in practice. Depending on the desired objective, Spork can trade off energy efficiency for cost reduction and vice versa. It is parameterized with key differences between FPGAs and CPUs in terms of power draw, performance, cost, and spin-up latency. We vary this parameter space and analyze various application and worker configurations on production and synthetic traces. Our evaluation of cloud workloads shows that energy-optimized Spork is not only more energy efficient but also cheaper than homogeneous platforms: for short application requests with tight deadlines, it is 1.53x more energy efficient and 2.14x cheaper than using only FPGAs. Relative to an idealized version of an existing cost-optimized hybrid scheduler, energy-optimized Spork provides 1.2-2.4x higher energy efficiency at comparable cost, while cost-optimized Spork provides 1.1-2x higher energy efficiency at 1.06-1.2x lower cost.
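The key insight (FPGAs for the stable-state load, CPUs for short bursts) can be illustrated with a toy provisioning rule. In this sketch, which is an assumption and not Spork's actual policy, FPGAs are sized to the minimum load seen over a trailing window and CPUs absorb any excess; `plan_workers`, the window rule, and the capacities are hypothetical, and the spin-up latency, power, and cost parameters Spork models are ignored.

```python
import math
from typing import List, Tuple


def plan_workers(rates: List[float], fpga_capacity: float, cpu_capacity: float,
                 window: int) -> List[Tuple[int, int]]:
    """Return (fpgas, cpus) per interval under a hypothetical baseline/burst split.

    FPGAs cover the minimum load over the trailing `window` intervals (the
    stable-state part); CPUs absorb whatever exceeds the FPGAs' capacity.
    """
    plan = []
    for i, rate in enumerate(rates):
        recent = rates[max(0, i - window + 1): i + 1]
        baseline = min(recent)
        # Round down so FPGAs are never provisioned above the stable load.
        fpgas = math.floor(baseline / fpga_capacity)
        excess = max(0.0, rate - fpgas * fpga_capacity)
        cpus = math.ceil(excess / cpu_capacity)
        plan.append((fpgas, cpus))
    return plan
```

With `rates=[100, 100, 300, 100]`, `fpga_capacity=50`, `cpu_capacity=20`, and `window=2`, the plan keeps two FPGAs throughout and adds ten CPUs only for the burst interval.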
Submitted 10 April, 2023;
originally announced April 2023.
-
Improved Quantum Query Complexity on Easier Inputs
Authors:
Noel T. Anderson,
Jay-U Chung,
Shelby Kimmel,
Da-Yeon Koh,
Xiaohan Ye
Abstract:
Quantum span program algorithms for function evaluation sometimes have reduced query complexity when promised that the input has a certain structure. We design a modified span program algorithm to show these improvements persist even without a promise ahead of time, and we extend this approach to the more general problem of state conversion. As an application, we prove exponential and superpolynomial quantum advantages in average query complexity for several search problems, generalizing Montanaro's Search with Advice [Montanaro, TQC 2010].
Submitted 1 April, 2024; v1 submitted 28 February, 2023;
originally announced March 2023.
-
Synthesizing Programs with Continuous Optimization
Authors:
Shantanu Mandal,
Todd A. Anderson,
Javier Turek,
Justin Gottschlich,
Abdullah Muzahid
Abstract:
Automatic software generation based on some specification is known as program synthesis. Most existing approaches formulate program synthesis as a search problem with discrete parameters. In this paper, we present a novel formulation of program synthesis as a continuous optimization problem and use a state-of-the-art evolutionary approach, known as Covariance Matrix Adaptation Evolution Strategy, to solve the problem. We then propose a mapping scheme to convert the continuous formulation into actual programs. We compare our system, called GENESYS, with several recent program synthesis techniques (in both discrete and continuous domains) and show that GENESYS synthesizes more programs within a fixed time budget than those existing schemes. For example, for programs of length 10, GENESYS synthesizes 28% more programs than those existing schemes within the same time budget.
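To make the idea of mapping a continuous search space onto discrete programs concrete, here is one simple decoding rule: each program slot owns a block of real values, one per DSL function, and the slot takes the function with the largest value. Both the argmax rule and the five-function `DSL` list are hypothetical stand-ins, not GENESYS's actual mapping scheme or language; the CMA-ES optimizer that would search over `x` is omitted.

```python
from typing import List, Sequence

# Hypothetical DSL of list-manipulation functions; not GENESYS's actual DSL.
DSL = ["SORT", "REVERSE", "HEAD", "TAIL", "SUM"]


def decode_program(x: Sequence[float], length: int) -> List[str]:
    """Map a continuous vector of size length * len(DSL) to a discrete program.

    Each slot owns one contiguous block of len(DSL) reals and is assigned the
    DSL function whose value in that block is largest (argmax).
    """
    k = len(DSL)
    if len(x) != length * k:
        raise ValueError("vector size does not match program length")
    program = []
    for slot in range(length):
        block = x[slot * k:(slot + 1) * k]
        best = max(range(k), key=lambda j: block[j])
        program.append(DSL[best])
    return program
```

An evolutionary optimizer would propose vectors `x`, decode each one, and score the resulting program on the specification's input-output examples.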
Submitted 3 April, 2023; v1 submitted 1 November, 2022;
originally announced November 2022.
-
Learning When to Say "I Don't Know"
Authors:
Nicholas Kashani Motlagh,
Jim Davis,
Tim Anderson,
Jeremy Gwinnup
Abstract:
We propose a new Reject Option Classification technique to identify and remove regions of uncertainty in the decision space for a given neural classifier and dataset. Such existing formulations employ a learned rejection (remove)/selection (keep) function and require either a known cost for rejecting examples or strong constraints on the accuracy or coverage of the selected examples. We consider an alternative formulation by instead analyzing the complementary reject region and employing a validation set to learn per-class softmax thresholds. The goal is to maximize the accuracy of the selected examples subject to a natural randomness allowance on the rejected examples (rejecting more incorrect than correct predictions). We provide results showing the benefits of the proposed method over naïvely thresholding calibrated/uncalibrated softmax scores with 2-D points, imagery, and text classification datasets using state-of-the-art pretrained models. Source code is available at https://github.com/osu-cvl/learning-idk.
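A simplified sketch of learning per-class softmax thresholds on a validation set, under the constraint stated in the abstract (the rejected region must contain more incorrect than correct predictions), follows. The exhaustive scan over observed scores and the name `learn_class_thresholds` are assumptions; the paper's actual search procedure may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def learn_class_thresholds(preds: List[Tuple[int, float, bool]]) -> Dict[int, float]:
    """Per-class thresholds from (predicted_class, max_softmax_score, is_correct) triples.

    Examples scoring below their class's threshold are rejected. A threshold is
    allowed only if the rejected set holds more incorrect than correct predictions;
    among allowed thresholds, keep the one maximizing accuracy on kept examples.
    """
    by_class: Dict[int, List[Tuple[float, bool]]] = defaultdict(list)
    for cls, score, ok in preds:
        by_class[cls].append((score, ok))

    thresholds = {}
    for cls, items in by_class.items():
        # Threshold 0.0 keeps everything; its accuracy is the baseline to beat.
        best_t, best_acc = 0.0, sum(ok for _, ok in items) / len(items)
        for t in sorted({s for s, _ in items}):
            rejected = [ok for s, ok in items if s < t]
            kept = [ok for s, ok in items if s >= t]
            n_correct_rej = sum(rejected)
            if n_correct_rej >= len(rejected) - n_correct_rej:
                continue  # violates the "reject more incorrect than correct" allowance
            acc = sum(kept) / len(kept)
            if acc > best_acc:
                best_t, best_acc = t, acc
        thresholds[cls] = best_t
    return thresholds
```

At test time, a prediction of class `c` with score `s` would be rejected when `s < thresholds[c]`.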
Submitted 15 February, 2023; v1 submitted 11 September, 2022;
originally announced September 2022.
-
Graph Neural Networks for Low-Energy Event Classification & Reconstruction in IceCube
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
N. Aggarwal,
J. A. Aguilar,
M. Ahlers,
M. Ahrens,
J. M. Alameddine,
A. A. Alves Jr.,
N. M. Amin,
K. Andeen,
T. Anderson,
G. Anton,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
S. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
V. Basu,
R. Bay,
J. J. Beatty,
K. -H. Becker
, et al. (359 additional authors not shown)
Abstract:
IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of data from IceCube. Reconstructing and classifying events is a challenge due to the irregular detector geometry, inhomogeneous scattering and absorption of light in the ice and, below 100 GeV, the relatively low number of signal photons produced per event. To address this challenge, it is possible to represent IceCube events as point cloud graphs and use a Graph Neural Network (GNN) as the classification and reconstruction method. The GNN is capable of distinguishing neutrino events from cosmic-ray backgrounds, classifying different neutrino event types, and reconstructing the deposited energy, direction and interaction vertex. Based on simulation, we provide a comparison in the 1-100 GeV energy range to the current state-of-the-art maximum likelihood techniques used in current IceCube analyses, including the effects of known systematic uncertainties. For neutrino event classification, the GNN increases the signal efficiency by 18% at a fixed false positive rate (FPR), compared to current IceCube methods. Alternatively, the GNN offers a reduction of the FPR by over a factor of 8 (to below half a percent) at a fixed signal efficiency. For the reconstruction of energy, direction, and interaction vertex, the resolution improves by an average of 13%-20% compared to current maximum likelihood techniques in the energy range of 1-30 GeV. The GNN, when run on a GPU, is capable of processing IceCube events at nearly double the median IceCube trigger rate of 2.7 kHz, which opens the possibility of using low-energy neutrinos in online searches for transient events.
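The abstract represents each event as a point cloud graph for the GNN but does not say how edges are formed. A common construction, assumed here, connects every sensor hit to its k nearest neighbors in feature space. The feature tuple (x, y, z, t), the use of raw Euclidean distance, and the name `knn_edges` are illustrative assumptions; real pipelines normalize features first, and the GNN itself is not shown.

```python
import math
from typing import List, Sequence, Tuple


def knn_edges(hits: Sequence[Sequence[float]], k: int) -> List[Tuple[int, int]]:
    """Connect each hit to its k nearest neighbors (directed edges i -> j).

    `hits` are per-pulse feature vectors, e.g. (x, y, z, t). Brute-force
    O(n^2 log n) search; ties in distance are broken by the neighbor index.
    """
    edges = []
    for i, a in enumerate(hits):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(hits) if j != i)
        edges.extend((i, j) for _, j in dists[:k])
    return edges
```

Because the graph is built from whichever sensors fired, this representation sidesteps the irregular detector geometry mentioned in the abstract.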
Submitted 11 October, 2022; v1 submitted 7 September, 2022;
originally announced September 2022.
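A minimal sketch of turning an event into a point cloud graph, assuming each detected pulse becomes a node with hypothetical (x, y, z, t) features and each node is linked to its k spatially nearest neighbours. The k-nearest-neighbour rule, the feature layout and the function name `knn_edges` are illustrative assumptions, not the graph definition used in the paper.

```python
import math

def knn_edges(points, k):
    """Return directed edges (i, j) linking each hit i to its k nearest hits j.

    points: list of (x, y, z, t) tuples, one per sensor hit (hypothetical layout).
    Only the spatial coordinates (x, y, z) are used to measure distance.
    """
    edges = []
    for i, p in enumerate(points):
        # Sort all other hits by Euclidean distance; ties break on index.
        dists = sorted(
            (math.dist(p[:3], q[:3]), j) for j, q in enumerate(points) if j != i
        )
        for _, j in dists[:k]:
            edges.append((i, j))
    return edges

# Usage: four made-up hits; the far-away fourth hit still gets k neighbours.
hits = [(0, 0, 0, 10), (1, 0, 0, 12), (0, 2, 0, 15), (10, 10, 10, 30)]
graph = knn_edges(hits, k=1)
```

A GNN would then pass messages along these edges, using the per-node features (including the time t) as inputs.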
-
Scalable Tail Latency Estimation for Data Center Networks
Authors:
Kevin Zhao,
Prateesh Goyal,
Mohammad Alizadeh,
Thomas E. Anderson
Abstract:
In this paper, we consider how to provide fast estimates of flow-level tail latency performance for very large scale data center networks. Network tail latency is often a crucial metric for cloud application performance and can be affected by a wide variety of factors, including network load, inter-rack traffic skew, traffic burstiness, flow size distributions, oversubscription, and topology asymmetry. Network simulators such as ns-3 and OMNeT++ can provide accurate answers, but are very hard to parallelize, taking hours or days to answer what-if questions for a single configuration at even moderate scale. Recent work with MimicNet has shown how to use machine learning to improve simulation performance, but at the cost of a long training step per configuration, and with assumptions about workload and topology uniformity that typically do not hold in practice.
We address this gap by developing a set of techniques to provide fast performance estimates for large scale networks with general traffic matrices and topologies. A key step is to decompose the problem into a large number of parallel independent single-link simulations; we carefully combine these link-level simulations to produce accurate estimates of end-to-end flow level performance distributions for the entire network. Like MimicNet, we exploit symmetry where possible to gain additional speedups, but without relying on machine learning, so there is no training delay. On large-scale networks where ns-3 takes 11 to 27 hours to simulate five seconds of network behavior, our techniques run in one to two minutes with 99th percentile accuracy within 9% for flow completion times.
Submitted 30 September, 2022; v1 submitted 2 May, 2022;
originally announced May 2022.
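To illustrate the decomposition idea, the sketch below treats each link's simulated queueing delays as an empirical distribution and estimates an end-to-end tail percentile by summing independently drawn per-link samples along a path. The independence assumption, the Monte Carlo combination and all names (`path_delay_samples`, the link labels) are hypothetical simplifications; the paper describes its own, more careful way of combining link-level results.

```python
import math
import random

def percentile(values, q):
    """Nearest-rank percentile of a list of numbers, for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def path_delay_samples(link_samples, path, n, rng):
    """Draw n end-to-end delays for a path.

    link_samples maps a (hypothetical) link name to delays observed in that
    link's own simulation. Each draw sums one independently chosen sample per
    link, which assumes the link delays are independent.
    """
    return [sum(rng.choice(link_samples[link]) for link in path) for _ in range(n)]

# Usage with made-up per-link delay samples (microseconds).
rng = random.Random(42)
links = {"host-tor": [1.0, 1.2, 1.5], "tor-agg": [2.0, 2.5, 9.0], "agg-tor": [2.0, 3.0, 4.0]}
delays = path_delay_samples(links, ["host-tor", "tor-agg", "agg-tor"], 10_000, rng)
p99 = percentile(delays, 99)
```

Because every per-link simulation is independent of the others, they can run in parallel; only the cheap combination step touches all links.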
-
Minimizing Trust with Exclusively-Used Physically-Isolated Hardware
Authors:
Zhihao Yao,
Seyed Mohammadjavad Seyed Talebi,
Mingyi Chen,
Ardalan Amiri Sani,
Thomas Anderson
Abstract:
Smartphone owners often need to run security-critical programs on the same device as other untrusted and potentially malicious programs. This requires users to trust hardware and system software to correctly sandbox malicious programs, trust that is often misplaced.
Our goal is to minimize the number and complexity of hardware and software components that a smartphone owner needs to trust to withstand adversarial inputs. We present a multi-domain hardware design composed of statically-partitioned, physically-isolated trust domains. We introduce a few simple, formally-verified hardware components to enable a program to gain provably exclusive and simultaneous access to both computation and I/O on a temporary basis. To manage this hardware, we present OctopOS, an OS composed of mutually distrustful subsystems.
We present a prototype of this machine (hardware and OS) on a CPU-FPGA board and show that it incurs a small hardware cost compared to modern SoCs. For security-critical programs, we show that this machine significantly reduces the required trust compared to mainstream TEEs while achieving decent performance. For normal programs, performance is similar to a legacy machine.
Submitted 20 October, 2022; v1 submitted 15 March, 2022;
originally announced March 2022.
-
Optimal Congestion Control for Time-varying Wireless Links
Authors:
Prateesh Goyal,
Mohammad Alizadeh,
Thomas E. Anderson
Abstract:
Modern networks exhibit a high degree of variability in link rates. Cellular network bandwidth inherently varies with receiver motion and orientation, while class-based packet scheduling in datacenter and service provider networks induces high variability in available capacity for network tenants. Recent work has proposed numerous congestion control protocols to cope with this variability, offering different tradeoffs between link utilization and queuing delay. In this paper, we develop a formal model of congestion control over time-varying links, and we use this model to derive a bound on the performance of any congestion control protocol running over a time-varying link with a given distribution of rate variation. Using the insights from this analysis, we derive an optimal control law that offers a smooth tradeoff between link utilization and queuing delay. We compare the performance of this control law to several existing control algorithms on cellular link traces to show that there is significant room for optimization.
Submitted 9 February, 2022;
originally announced February 2022.
-
Treehouse: A Case For Carbon-Aware Datacenter Software
Authors:
Thomas Anderson,
Adam Belay,
Mosharaf Chowdhury,
Asaf Cidon,
Irene Zhang
Abstract:
The end of Dennard scaling and the slowing of Moore's Law have put the energy use of datacenters on an unsustainable path. Datacenters are already a significant fraction of worldwide electricity use, with application demand scaling at a rapid rate. We argue that substantial reductions in the carbon intensity of datacenter computing are possible with a software-centric approach: by making energy and carbon visible to application developers on a fine-grained basis, by modifying system APIs so that informed trade-offs can be made between performance and carbon emissions, and by raising the level of application programming to allow for flexible use of more energy-efficient means of compute and storage. We also lay out a research agenda for systems software to reduce the carbon footprint of datacenter computing.
Submitted 6 January, 2022;
originally announced January 2022.
-
Change of human mobility during COVID-19: A United States case study
Authors:
Justin Elarde,
Joon-Seok Kim,
Hamdi Kavak,
Andreas Züfle,
Taylor Anderson
Abstract:
With the onset of COVID-19 and the resulting shelter-in-place guidelines, combined with remote working practices, human mobility in 2020 was dramatically affected. Existing studies typically examine whether mobility in specific localities increases or decreases at specific points in time and relate these changes to certain pandemic and policy events. In this paper, we study mobility change in the US through a five-step process using mobility footprint data. (Step 1) We propose the delta Time Spent in Public Places (Delta-TSPP) as a measure to quantify daily changes in mobility for each US county from 2019 to 2020. (Step 2) We conduct Principal Component Analysis (PCA) to reduce the Delta-TSPP time series of each county to lower-dimensional latent components of change in mobility. (Step 3) We conduct clustering analysis to find counties that exhibit similar latent components. (Step 4) We investigate local and global spatial autocorrelation for each component. (Step 5) We conduct correlation analysis to investigate how various population characteristics and behavior correlate with mobility patterns. Results show that by describing each county as a linear combination of three latent components, we can explain 59% of the variation in mobility trends across all US counties. Specifically, change in mobility in 2020 for US counties can be explained as a combination of three latent components: 1) long-term reduction in mobility, 2) no change in mobility, and 3) short-term reduction in mobility. We observe significant correlations between the three latent components of mobility change and various population characteristics, including political leaning, population, COVID-19 cases and deaths, and unemployment. We find that our analysis provides a comprehensive understanding of mobility change in response to the COVID-19 pandemic.
Submitted 18 September, 2021;
originally announced September 2021.
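Steps 1 and 2 can be sketched as follows, assuming Delta-TSPP is the plain day-by-day difference between 2020 and 2019 time spent in public places (the abstract does not give the exact formula) and extracting only the leading principal component by power iteration on the day-by-day covariance matrix. Function names are hypothetical; a real analysis would use a numerical library and keep three components.

```python
import math

def delta_tspp(tspp_2019, tspp_2020):
    # Hypothetical definition: change between matching days of the two years.
    return [after - before for before, after in zip(tspp_2019, tspp_2020)]

def leading_component(rows, iters=200):
    """Leading principal direction of the rows (one row per county, one column per day)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance between days.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the top eigenvector unless the
    # starting vector happens to be orthogonal to it.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along the current direction
        v = [x / norm for x in w]
    return v

# Usage: three made-up counties, two days each.
counties = [delta_tspp([5, 5], [3, 6]), delta_tspp([4, 4], [1, 2]), delta_tspp([6, 6], [6, 6])]
component = leading_component(counties)
```

Each county's time series can then be described by its projection onto this and the following components, which is the representation the clustering in Step 3 operates on.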
-
Understanding the factors driving the opioid epidemic using machine learning
Authors:
Sachin Gavali,
Chuming Chen,
Julie Cowart,
Xi Peng,
Shanshan Ding,
Cathy Wu,
Tammy Anderson
Abstract:
In recent years, the US has experienced an opioid epidemic with an unprecedented number of drug overdose deaths. Research finds that such overdose deaths are linked to neighborhood-level traits, providing an opportunity to identify effective interventions. Typically, techniques such as Ordinary Least Squares (OLS) or Maximum Likelihood Estimation (MLE) are used to document neighborhood-level factors significant in explaining such adverse outcomes. These techniques are, however, less equipped to ascertain non-linear relationships between confounding factors. Hence, in this study we apply machine learning based techniques to identify opioid risks of neighborhoods in Delaware and explore the correlation of these factors using Shapley Additive explanations (SHAP). We discovered that factors related to neighborhood environment, followed by education and then crime, were highly correlated with higher opioid risk. We also explored the change in these correlations over the years to understand the changing dynamics of the epidemic. Furthermore, we discovered that, as the epidemic has shifted from legal (i.e., prescription opioids) to illegal (e.g., heroin and fentanyl) drugs in recent years, the correlation of environment, crime and health related variables with opioid risk has increased significantly while the correlation of economic and socio-demographic variables has decreased. The correlation of education related factors has been higher from the start and has increased slightly in recent years, suggesting a need for increased awareness about the opioid epidemic.
Submitted 6 December, 2021; v1 submitted 16 August, 2021;
originally announced August 2021.
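SHAP attributions are based on Shapley values. For a model with only a handful of features, the exact Shapley values can be computed by enumerating feature coalitions, as in the sketch below; features outside a coalition are replaced by a baseline value. The baseline-replacement convention, the toy models and the function name are assumptions for illustration; the SHAP library uses faster approximations for realistic models.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x), measured relative to model(baseline)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their real value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Usage with a made-up two-feature risk score.
attributions = shapley_values(lambda z: 2 * z[0] + 3 * z[1], [1, 1], [0, 0])
```

The attributions always sum to model(x) minus model(baseline), which is what makes them usable for comparing how much each neighborhood factor contributes to a predicted risk.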
-
SWP: Microsecond Network SLOs Without Priorities
Authors:
Kevin Zhao,
Prateesh Goyal,
Mohammad Alizadeh,
Thomas E. Anderson
Abstract:
The increasing use of cloud computing for latency-sensitive applications has sparked renewed interest in providing tight bounds on network tail latency. Achieving this in practice at reasonable network utilization has proved elusive, due to a combination of highly bursty application demand, faster link speeds, and heavy-tailed message sizes. While priority scheduling can be used to reduce tail latency for some traffic, this comes at a cost of much worse delay behavior for all other traffic on the network. Most operators choose to run their networks at very low average utilization, despite the added cost, and yet still suffer poor tail behavior.
This paper takes a different approach. We build a system, swp, to help operators (and network designers) understand and control tail latency without relying on priority scheduling. As network workload changes, swp is designed to give real-time advice on the network switch configurations needed to maintain tail latency objectives for each traffic class. The core of swp is an efficient model for simulating the combined effect of traffic characteristics, end-to-end congestion control, and switch scheduling on service-level objectives (SLOs), along with an optimizer that adjusts the switch-level scheduling weights assigned to each class. Using simulation across a diverse set of workloads with different SLOs, we show that to meet the same SLOs as swp, FIFO would require 65% greater link capacity, and 79% more for scenarios with tight SLOs on bursty traffic classes.
Submitted 2 March, 2021; v1 submitted 1 March, 2021;
originally announced March 2021.
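swp tunes the scheduling weight given to each traffic class at a switch. As a point of reference, the sketch below shows how per-class weights turn into bandwidth shares under deficit round robin, one common weighted scheduler. The abstract does not say which weighted discipline swp targets, and the class names and quanta here are hypothetical; the sketch says nothing about swp's latency model or optimizer.

```python
from collections import deque

def drr_schedule(queues, quanta, rounds):
    """Deficit round robin over per-class packet queues.

    queues: dict class -> deque of packet sizes in bytes (consumed in place)
    quanta: dict class -> bytes of credit added per round (the class weight)
    Returns the transmitted packets as (class, size) in order.
    """
    deficit = {c: 0 for c in queues}
    sent = []
    for _ in range(rounds):
        for c, q in queues.items():
            if not q:
                deficit[c] = 0  # idle classes do not bank credit
                continue
            deficit[c] += quanta[c]
            # Send head-of-line packets while the accumulated credit covers them.
            while q and q[0] <= deficit[c]:
                size = q.popleft()
                deficit[c] -= size
                sent.append((c, size))
            if not q:
                deficit[c] = 0
    return sent

# Usage: a latency-sensitive class weighted 3:1 against a bulk class.
queues = {"latency": deque([100] * 10), "bulk": deque([100] * 10)}
sent = drr_schedule(queues, {"latency": 300, "bulk": 100}, rounds=2)
```

With both classes backlogged, the latency class gets three times the bytes of the bulk class; an optimizer like the one described above would search over such weights rather than over strict priorities.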
-
A Convolutional Neural Network based Cascade Reconstruction for the IceCube Neutrino Observatory
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
J. A. Aguilar,
M. Ahlers,
M. Ahrens,
C. Alispach,
A. A. Alves Jr.,
N. M. Amin,
R. An,
K. Andeen,
T. Anderson,
I. Ansseau,
G. Anton,
C. Argüelles,
S. Axani,
X. Bai,
A. Balagopal V.,
A. Barbano,
S. W. Barwick,
B. Bastian,
V. Basu,
V. Baum,
S. Baur,
R. Bay
, et al. (343 additional authors not shown)
Abstract:
Continued improvements to existing reconstruction methods are vital to the success of high-energy physics experiments, such as the IceCube Neutrino Observatory. In IceCube, further challenges arise because the detector is situated at the geographic South Pole, where computational resources are limited. However, to perform real-time analyses and to issue alerts to telescopes around the world, powerful and fast reconstruction methods are desired. Deep neural networks can be extremely powerful, and their usage is computationally inexpensive once the networks are trained. These characteristics make a deep learning-based approach an excellent candidate for application in IceCube. A reconstruction method based on convolutional architectures and hexagonally shaped kernels is presented. The presented method is robust to systematic uncertainties in the simulation and has been tested on experimental data. Compared to standard reconstruction methods in IceCube, it can improve reconstruction accuracy while reducing the time necessary to run the reconstruction by two to three orders of magnitude.
Submitted 26 July, 2021; v1 submitted 27 January, 2021;
originally announced January 2021.
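IceCube strings are laid out on a roughly hexagonal grid, which makes hexagonally shaped kernels a natural fit. A minimal sketch of one radius-1 hexagonal convolution over a sparse grid addressed by axial coordinates (q, r) follows; the coordinate scheme, the sparse dictionary representation and zero padding for missing positions are assumptions for illustration, not the paper's implementation.

```python
# Axial-coordinate offsets of a cell and its six hexagonal neighbours.
HEX_OFFSETS = [(0, 0), (1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_conv(grid, weights, bias=0.0):
    """Apply one radius-1 hexagonal kernel to a sparse hex grid.

    grid: dict (q, r) -> value at that string position (missing cells count as 0)
    weights: 7 floats, one per entry of HEX_OFFSETS
    Returns dict (q, r) -> output for every occupied input cell.
    """
    out = {}
    for (q, r) in grid:
        acc = bias
        for w, (dq, dr) in zip(weights, HEX_OFFSETS):
            acc += w * grid.get((q + dq, r + dr), 0.0)
        out[(q, r)] = acc
    return out

# Usage: three adjacent strings plus one isolated string, all-ones kernel.
grid = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 3.0, (5, 5): 4.0}
features = hex_conv(grid, [1.0] * 7)
```

Stacking such layers with non-linearities gives a convolutional network whose receptive fields follow the detector's hexagonal layout instead of a square pixel grid.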
-
Leveraging Unknown Structure in Quantum Query Algorithms
Authors:
Noel T. Anderson,
Jay-U Chung,
Shelby Kimmel
Abstract:
Quantum span program algorithms for function evaluation commonly have reduced query complexity when promised that the input has a certain structure. We design a modified span program algorithm to show these speed-ups persist even without having a promise ahead of time, and we extend this approach to the more general problem of state conversion. For example, there is a span program algorithm that decides whether two vertices are connected in an $n$-vertex graph with $O(n^{3/2})$ queries in general, but with $O(\sqrt{k}n)$ queries if promised that, if there is a path, there is one with at most $k$ edges. Our algorithm uses $\tilde{O}(\sqrt{k}n)$ queries to solve this problem if there is a path with at most $k$ edges, without knowing $k$ ahead of time.
Submitted 10 June, 2021; v1 submitted 2 December, 2020;
originally announced December 2020.
-
H2O-Net: Self-Supervised Flood Segmentation via Adversarial Domain Adaptation and Label Refinement
Authors:
Peri Akiva,
Matthew Purri,
Kristin Dana,
Beth Tellman,
Tyler Anderson
Abstract:
Accurate flood detection in near real time via high-resolution, high-latency satellite imagery is essential to prevent loss of lives by providing quick and actionable information. Instruments and sensors useful for flood detection are only available on low-resolution, low-latency satellites with region revisit periods of up to 16 days, making flood alerting systems that use such satellites unreliable. This work presents H2O-Net, a self-supervised deep learning method to segment floods from satellite and aerial imagery by bridging the domain gap between low- and high-latency satellites and by coarse-to-fine label refinement. H2O-Net learns to synthesize signals highly correlated with water presence as a domain adaptation step for semantic segmentation in high-resolution satellite imagery. Our work also proposes a self-supervision mechanism, which does not require any hand annotation, used during training to generate high-quality ground truth data. We demonstrate that H2O-Net outperforms state-of-the-art semantic segmentation methods on satellite imagery by 10% and 12% in pixel accuracy and mIoU, respectively, for the task of flood segmentation. We emphasize the generalizability of our model by transferring model weights trained on satellite imagery to drone imagery, a highly different sensor and domain.
Submitted 11 October, 2020;
originally announced October 2020.
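For reference, the two metrics reported above can be computed as follows for a binary mask, assuming label 1 marks water and 0 marks background. mIoU is taken here as the mean of per-class intersection-over-union over the classes that occur in either mask; this is the common definition, not necessarily the exact evaluation code used in the paper.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    pairs = [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    return sum(p == t for p, t in pairs) / len(pairs)

def mean_iou(pred, truth, classes=(0, 1)):
    """Mean intersection-over-union across classes present in either mask."""
    ious = []
    for c in classes:
        inter = union = 0
        for prow, trow in zip(pred, truth):
            for p, t in zip(prow, trow):
                inter += (p == c and t == c)
                union += (p == c or t == c)
        if union:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Usage on a made-up 2x2 tile: one background pixel mislabelled as water.
pred = [[1, 1], [0, 0]]
truth = [[1, 0], [0, 0]]
acc, miou = pixel_accuracy(pred, truth), mean_iou(pred, truth)
```

Pixel accuracy is dominated by the majority class, whereas mIoU penalizes errors on the (often rare) water class more heavily, which is why segmentation work usually reports both.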
-
Frequency Regulation with Heterogeneous Energy Resources: A Realization using Distributed Control
Authors:
Tor Anderson,
Manasa Muralidharan,
Priyank Srivastava,
Hamed Valizadeh Haghi,
Jorge Cortes,
Jan Kleissl,
Sonia Martinez,
Byron Washom
Abstract:
This paper presents one of the first real-life demonstrations of coordinated and distributed resource control for secondary frequency response in a power distribution grid. We conduct a series of tests with up to 69 heterogeneous active devices consisting of air handling units, unidirectional and bidirectional electric vehicle charging stations, a battery energy storage system, and 107 passive devices consisting of building loads and photovoltaic generators. Actuation commands for the test devices are obtained by solving an economic dispatch problem at every regulation instant using distributed ratio-consensus, primal-dual, and Newton-like algorithms. The distributed control setup consists of a set of Raspberry Pi end-points exchanging messages via an Ethernet switch. The problem formulation minimizes the sum of device costs while tracking the setpoints provided by the system operator. We demonstrate accurate and fast real-time distributed computation of the optimization solution and effective tracking of the regulation signal by measuring physical device outputs over 40-minute time horizons. We also perform an economic benefit analysis which confirms eligibility to participate in an ancillary services market and demonstrates up to $53K of potential annual revenue for the selected population of devices.
Submitted 4 February, 2021; v1 submitted 15 July, 2020;
originally announced July 2020.
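Ratio consensus, one of the distributed algorithms named above, lets every node learn a network-wide average while only exchanging messages with its neighbours, even on a directed communication graph. A minimal sketch, assuming a strongly connected graph in which every node keeps a share for itself and splits its running sums equally among itself and its out-neighbours (so the update is column stochastic); node labels and values are hypothetical.

```python
def ratio_consensus(values, out_neighbors, iters=100):
    """Every node's ratio y_i / z_i converges to the average of the initial values.

    values: dict node -> initial local value
    out_neighbors: dict node -> list of nodes it sends to (excluding itself)
    """
    y = dict(values)
    z = {i: 1.0 for i in values}
    for _ in range(iters):
        new_y = {i: 0.0 for i in values}
        new_z = {i: 0.0 for i in values}
        for i in values:
            # Split y and z equally among the node itself and its out-neighbours.
            targets = [i] + out_neighbors[i]
            share = 1.0 / len(targets)
            for j in targets:
                new_y[j] += share * y[i]
                new_z[j] += share * z[i]
        y, z = new_y, new_z
    return {i: y[i] / z[i] for i in values}

# Usage: three devices on a directed ring 0 -> 1 -> 2 -> 0.
estimates = ratio_consensus({0: 3.0, 1: 6.0, 2: 9.0}, {0: [1], 1: [2], 2: [0]})
```

Neither y nor z alone converges to the average on an unbalanced digraph, but their ratio does, which is what makes the method suitable for one-way message links. How the dispatch problem's quantities are mapped onto y and z is not shown here.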
-
High Velocity Kernel File Systems with Bento
Authors:
Samantha Miller,
Kaiyuan Zhang,
Mengqi Chen,
Ryan Jennings,
Ang Chen,
Danyang Zhuo,
Tom Anderson
Abstract:
High development velocity is critical for modern systems. This is especially true for Linux file systems which are seeing increased pressure from new storage devices and new demands on storage systems. However, high velocity Linux kernel development is challenging due to the ease of introducing bugs, the difficulty of testing and debugging, and the lack of support for redeployment without service disruption. Existing approaches to high-velocity development of file systems for Linux have major downsides, such as the high performance penalty for FUSE file systems, slowing the deployment cycle for new file system functionality.
We propose Bento, a framework for high velocity development of Linux kernel file systems. It enables file systems written in safe Rust to be installed in the Linux kernel, with errors largely sandboxed to the file system. Bento file systems can be replaced with no disruption to running applications, allowing daily or weekly upgrades in a cloud server setting. Bento also supports userspace debugging. We implement a simple file system using Bento and show that it performs similarly to VFS-native ext4 on a variety of benchmarks and outperforms a FUSE version by 7x on 'git clone'. We also show that we can dynamically add file provenance tracking to a running kernel file system with only 15ms of service interruption.
Submitted 8 February, 2021; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Broad Area Search and Detection of Surface-to-Air Missile Sites Using Spatial Fusion of Component Object Detections from Deep Neural Networks
Authors:
Alan B. Cannaday II,
Curt H. Davis,
Grant J. Scott,
Blake Ruprecht,
Derek T. Anderson
Abstract:
Here we demonstrate how Deep Neural Network (DNN) detections of multiple constitutive or component objects that are part of a larger, more complex, and encompassing feature can be spatially fused to improve the search, detection, and retrieval (ranking) of the larger complex feature. First, scores computed from a spatial clustering algorithm are normalized to a reference space so that they are independent of image resolution and DNN input chip size. Then, multi-scale DNN detections from various component objects are fused to improve the detection and retrieval of DNN detections of a larger complex feature. We demonstrate the utility of this approach for broad area search and detection of Surface-to-Air Missile (SAM) sites that have a very low occurrence rate (only 16 sites) over a ~90,000 km² study area in SE China. The results demonstrate that spatial fusion of multi-scale component-object DNN detections can reduce the detection error rate of SAM sites by more than 85% while still maintaining a 100% recall. The novel spatial fusion approach demonstrated here can be easily extended to a wide variety of other challenging object search and detection problems in large-scale remote sensing image datasets.
Submitted 20 July, 2020; v1 submitted 23 March, 2020;
originally announced March 2020.
-
Introducing Fuzzy Layers for Deep Learning
Authors:
Stanton R. Price,
Steven R. Price,
Derek T. Anderson
Abstract:
Many state-of-the-art technologies developed in recent years have been influenced by machine learning to some extent. Most popular at the time of this writing are artificial intelligence methodologies that fall under the umbrella of deep learning. Deep learning has been shown across many applications to be extremely powerful and capable of handling problems that possess great complexity and difficulty. In this work, we introduce a new layer to deep learning: the fuzzy layer. Traditionally, a neural network architecture is composed of an input layer, some combination of hidden layers, and an output layer. We propose the introduction of fuzzy layers into the deep learning architecture to exploit the powerful aggregation properties expressed through fuzzy methodologies, such as the Choquet and Sugeno fuzzy integrals. To date, fuzzy approaches to deep learning have applied various fusion strategies at the decision level to aggregate outputs from state-of-the-art pre-trained models, e.g., AlexNet, VGG16, GoogLeNet, Inception-v3, ResNet-18, etc. While these strategies have been shown to improve accuracy for image classification tasks, none have explored the use of fuzzified intermediate, or hidden, layers. Herein, we present a new deep learning strategy that incorporates fuzzy strategies into the deep learning architecture, focused on the application of semantic segmentation using per-pixel classification. Experiments are conducted on a benchmark data set as well as a data set collected via an unmanned aerial system at a U.S. Army test site for the task of automatic road segmentation, and preliminary results are promising.
Submitted 21 February, 2020;
originally announced March 2020.
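The Choquet integral mentioned above aggregates a set of inputs with respect to a fuzzy measure over subsets of those inputs. A minimal sketch of the discrete Choquet integral follows: inputs are sorted in decreasing order and each is weighted by the increase in the measure when it joins the running subset. The abstract does not say how the paper's fuzzy layer parameterizes or learns the measure, so here the measure is simply a caller-supplied function (an assumption), and the source names are hypothetical.

```python
def choquet(inputs, measure):
    """Discrete Choquet integral.

    inputs: dict source -> value in [0, 1]
    measure: function frozenset(sources) -> [0, 1], monotone, with
             measure(frozenset()) == 0 and measure(all sources) == 1.
    """
    ordered = sorted(inputs, key=inputs.get, reverse=True)
    total, prev_g, subset = 0.0, 0.0, frozenset()
    for src in ordered:
        subset = subset | {src}
        g = measure(subset)
        # Weight each value by the measure gained when its source is added.
        total += inputs[src] * (g - prev_g)
        prev_g = g
    return total

# Usage: three hypothetical feature maps at one pixel.
x = {"a": 0.2, "b": 0.8, "c": 0.5}
as_max = choquet(x, lambda s: 1.0 if s else 0.0)        # optimistic measure
as_min = choquet(x, lambda s: 1.0 if len(s) == 3 else 0.0)  # pessimistic measure
as_mean = choquet(x, lambda s: len(s) / 3)              # additive measure
```

Different measures recover max, min and mean as special cases, which is the flexibility that motivates using such integrals as aggregation operators inside a network.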
-
Talek: Private Group Messaging with Hidden Access Patterns
Authors:
Raymond Cheng,
William Scott,
Elisaweta Masserova,
Irene Zhang,
Vipul Goyal,
Thomas Anderson,
Arvind Krishnamurthy,
Bryan Parno
Abstract:
Talek is a private group messaging system that sends messages through potentially untrustworthy servers, while hiding both data content and the communication patterns among its users. Talek explores a new point in the design space of private messaging; it guarantees access sequence indistinguishability, which is among the strongest guarantees in the space, while assuming an anytrust threat model, which is only slightly weaker than the strongest threat model currently found in related work. Our results suggest that this is a pragmatic point in the design space, since it supports strong privacy and good performance: we demonstrate a 3-server Talek cluster that achieves throughput of 9,433 messages/second for 32,000 active users with 1.7-second end-to-end latency. To achieve its security goals without coordination between clients, Talek relies on information-theoretic private information retrieval. To achieve good performance and minimize server-side storage, Talek introduces new techniques and optimizations that may be of independent interest, e.g., a novel use of blocked cuckoo hashing and support for private notifications. The latter provide a private, efficient mechanism for users to learn, without polling, which logs have new messages.
Submitted 15 December, 2020; v1 submitted 22 January, 2020;
originally announced January 2020.
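Talek relies on information-theoretic private information retrieval (IT-PIR). The sketch below shows a classic XOR-based multi-server scheme from the IT-PIR literature: the client splits a one-hot selection vector into random XOR shares, one per server, so that any group of servers missing at least one member sees only uniformly random bits, matching the anytrust assumption. It assumes equal-length blocks, and the function names are hypothetical; it is not Talek's protocol, which adds blocked cuckoo hashing, private notifications and other optimizations.

```python
import secrets

def make_queries(n_blocks, index, n_servers):
    """Split a one-hot vector selecting `index` into n_servers random XOR shares."""
    shares = [[secrets.randbits(1) for _ in range(n_blocks)]
              for _ in range(n_servers - 1)]
    last = [0] * n_blocks
    last[index] = 1
    for share in shares:
        last = [a ^ b for a, b in zip(last, share)]
    return shares + [last]

def answer(database, query):
    """Server side: XOR together every block whose query bit is set."""
    result = bytes(len(database[0]))
    for block, bit in zip(database, query):
        if bit:
            result = bytes(a ^ b for a, b in zip(result, block))
    return result

def reconstruct(responses):
    """Client side: XOR all server responses to recover the selected block."""
    out = responses[0]
    for r in responses[1:]:
        out = bytes(a ^ b for a, b in zip(out, r))
    return out

# Usage: three servers each hold a full replica of four equal-length blocks.
db = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
queries = make_queries(len(db), 2, n_servers=3)
block = reconstruct([answer(db, q) for q in queries])
```

Every block except the requested one is XORed an even number of times across the servers and cancels out, so the client recovers exactly the block it asked for.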
-
Extending the Morphological Hit-or-Miss Transform to Deep Neural Networks
Authors:
Muhammad Aminul Islam,
Bryce Murray,
Andrew Buck,
Derek T. Anderson,
Grant Scott,
Mihail Popescu,
James Keller
Abstract:
While most deep learning architectures are built on convolution, alternative foundations like morphology are being explored for purposes like interpretability and its connection to the analysis and processing of geometric structures. The morphological hit-or-miss operation has the advantage that it takes into account both foreground and background information when evaluating target shape in an image. Herein, we identify limitations in existing hit-or-miss neural definitions and we formulate an optimization problem to learn the transform relative to deeper architectures. To this end, we model the semantically important condition that the intersection of the hit and miss structuring elements (SEs) should be empty and we present a way to express Don't Care (DNC), which is important for denoting regions of an SE that are not relevant to detecting a target pattern. Our analysis shows that convolution, in fact, acts like a hit-or-miss transform through semantic interpretation of its filter differences. On these premises, we introduce an extension that outperforms conventional convolution on benchmark data. Quantitative experiments are provided on synthetic and benchmark data, showing that the direct encoding hit-or-miss transform provides better interpretability on learned shapes consistent with objects, whereas our morphologically inspired generalized convolution yields higher classification accuracy. Last, qualitative hit and miss filter visualizations are provided for a single morphological layer.
Submitted 27 September, 2020; v1 submitted 4 December, 2019;
originally announced December 2019.
-
Assise: Performance and Availability via NVM Colocation in a Distributed File System
Authors:
Thomas E. Anderson,
Marco Canini,
Jongyul Kim,
Dejan Kostić,
Youngjin Kwon,
Simon Peter,
Waleed Reda,
Henry N. Schuh,
Emmett Witchel
Abstract:
The adoption of very low latency persistent memory modules (PMMs) upends the long-established model of disaggregated file system access. Instead, by colocating computation and PMM storage, we can provide applications much higher I/O performance, sub-second application failover, and strong consistency. To demonstrate this, we built the Assise distributed file system, based on a persistent, replicated coherence protocol for managing a set of server-colocated PMMs as a fast, crash-recoverable cache between applications and slower disaggregated storage, such as SSDs. Unlike disaggregated file systems, Assise maximizes locality for all file IO by carrying out IO on colocated PMM whenever possible and minimizes coherence overhead by maintaining consistency at IO operation granularity, rather than at fixed block sizes.
We compare Assise to Ceph/Bluestore, NFS, and Octopus on a cluster with Intel Optane DC PMMs and SSDs for common cloud applications and benchmarks, such as LevelDB, Postfix, and FileBench. We find that Assise improves write latency up to 22x, throughput up to 56x, fail-over time up to 103x, and scales up to 6x better than its counterparts, while providing stronger consistency semantics. Assise promises to beat the MinuteSort world record by 1.5x.
Submitted 1 June, 2020; v1 submitted 6 October, 2019;
originally announced October 2019.
-
Backpressure Flow Control
Authors:
Prateesh Goyal,
Preey Shah,
Kevin Zhao,
Georgios Nikolaidis,
Mohammad Alizadeh,
Thomas E. Anderson
Abstract:
Effective congestion control for data center networks is becoming increasingly challenging with a growing amount of latency sensitive traffic, much fatter links, and extremely bursty traffic. Widely deployed algorithms, such as DCTCP and DCQCN, are still far from optimal in many plausible scenarios, particularly for tail latency. Many operators compensate by running their networks at low average utilization, dramatically increasing costs.
In this paper, we argue that we have reached the practical limits of end-to-end congestion control. Instead, we propose, implement, and evaluate a new congestion control architecture called Backpressure Flow Control (BFC). BFC provides per-hop per-flow flow control, but with bounded state, constant-time switch operations, and careful use of buffers. We demonstrate BFC's feasibility by implementing it on Tofino2, a state-of-the-art P4-based programmable hardware switch. In simulation, we show that BFC achieves near optimal throughput and tail latency behavior even under challenging conditions such as high network load and incast cross traffic. Compared to existing end-to-end schemes, BFC achieves 2.3-60x lower tail latency for short flows and 1.6-5x better average completion time for long flows.
Submitted 29 March, 2021; v1 submitted 21 September, 2019;
originally announced September 2019.
-
Out the Window: A Crowd-Sourced Dataset for Activity Classification in Security Video
Authors:
Gregory Castanon,
Nathan Shnidman,
Tim Anderson,
Jeffrey Byrne
Abstract:
The Out the Window (OTW) dataset is a crowdsourced activity dataset containing 5,668 instances of 17 activities from the NIST Activities in Extended Video (ActEV) challenge. These videos are crowdsourced from workers on the Amazon Mechanical Turk using a novel scenario acting strategy, which collects multiple instances of natural activities per scenario. Turkers are instructed to lean their mobile device against an upper story window overlooking an outdoor space, walk outside to perform a scenario involving people, vehicles and objects, and finally upload the video to us for annotation. Performance evaluation for activity classification on VIRAT Ground 2.0 shows that the OTW dataset provides an 8.3% improvement in mean classification accuracy, and a 12.5% improvement on the most challenging activities involving people with vehicles.
Submitted 15 September, 2019; v1 submitted 28 August, 2019;
originally announced August 2019.
-
Learning Fitness Functions for Machine Programming
Authors:
Shantanu Mandal,
Todd A. Anderson,
Javier S. Turek,
Justin Gottschlich,
Shengtian Zhou,
Abdullah Muzahid
Abstract:
The problem of automatic software generation is known as Machine Programming. In this work, we propose a framework based on genetic algorithms to solve this problem. Although genetic algorithms have been used successfully for many problems, one criticism is that hand-crafting its fitness function, the test that aims to effectively guide its evolution, can be notably challenging. Our framework presents a novel approach to learn the fitness function using neural networks to predict values of ideal fitness functions. We also augment the evolutionary process with a minimally intrusive search heuristic. This heuristic improves the framework's ability to discover correct programs from ones that are approximately correct and does so with negligible computational overhead. We compare our approach with several state-of-the-art program synthesis methods and demonstrate that it finds more correct programs with fewer candidate program generations.
Submitted 23 January, 2021; v1 submitted 22 August, 2019;
originally announced August 2019.
-
Recognizing Image Objects by Relational Analysis Using Heterogeneous Superpixels and Deep Convolutional Features
Authors:
Alex Yang,
Charlie T. Veal,
Derek T. Anderson,
Grant J. Scott
Abstract:
Superpixel-based methodologies have become increasingly popular in computer vision, especially when the computation is too expensive in time or memory to perform with a large number of pixels or features. However, rarely is superpixel segmentation examined within the context of deep convolutional neural network architectures. This paper presents a novel neural architecture that exploits the superpixel feature space. The visual feature space is organized using superpixels to provide the neural network with a substructure of the images. As the superpixels associate the visual feature space with parts of the objects in an image, the visual feature space is transformed into a structured vector representation per superpixel. It is shown that it is feasible to learn superpixel features using capsules and it is potentially beneficial to perform image analysis in such a structured manner. This novel deep learning architecture is examined in the context of an image classification task, highlighting explicit interpretability (explainability) of the network's decision making. The results are compared against a baseline deep neural model, as well as among superpixel capsule networks with a variety of hyperparameter settings.
Submitted 1 August, 2019;
originally announced August 2019.
-
Fusion of heterogeneous bands and kernels in hyperspectral image processing
Authors:
Muhammad Aminul Islam,
Derek T. Anderson,
John E. Ball,
Nicolas H. Younan
Abstract:
Hyperspectral imaging is a powerful technology that is plagued by large dimensionality. Herein, we explore a way to combat that hindrance via non-contiguous and contiguous (simpler to realize in a sensor) band grouping for dimensionality reduction. Our approach is different in that it is flexible and follows a well-studied process of visual clustering in high-dimensional spaces. Specifically, we extend the improved visual assessment of cluster tendency and clustering in ordered dissimilarity data unsupervised clustering algorithms for supervised hyperspectral learning. In addition, we propose a way to extract diverse features via the use of different proximity metrics (ways to measure the similarity between bands) and kernel functions. The discovered features are fused with $l_{\infty}$-norm multiple kernel learning. Experiments are conducted on two benchmark datasets and our results are compared to related work. Results on these datasets indicate that whether bands should be grouped contiguously is application specific, but heterogeneous features and kernels usually lead to a performance gain.
Submitted 22 May, 2019;
originally announced May 2019.
-
Enabling Explainable Fusion in Deep Learning with Fuzzy Integral Neural Networks
Authors:
Muhammad Aminul Islam,
Derek T. Anderson,
Anthony J. Pinar,
Timothy C. Havens,
Grant Scott,
James M. Keller
Abstract:
Information fusion is an essential part of numerous engineering systems and biological functions, e.g., human cognition. Fusion occurs at many levels, ranging from the low-level combination of signals to the high-level aggregation of heterogeneous decision-making processes. While the last decade has witnessed an explosion of research in deep learning, fusion in neural networks has not seen the same revolution. Specifically, most neural fusion approaches are ad hoc, are not understood, are distributed versus localized, and/or explainability is low (if present at all). Herein, we prove that the fuzzy Choquet integral (ChI), a powerful nonlinear aggregation function, can be represented as a multi-layer network, referred to hereafter as ChIMP. We also put forth an improved ChIMP (iChIMP) that leads to a stochastic gradient descent-based optimization in light of the exponential number of ChI inequality constraints. An additional benefit of ChIMP/iChIMP is that it enables eXplainable AI (XAI). Synthetic validation experiments are provided and iChIMP is applied to the fusion of a set of heterogeneous architecture deep models in remote sensing. We show an improvement in model accuracy and our previously established XAI indices shed light on the quality of our data, model, and its decisions.
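For reference, the discrete Choquet integral that ChIMP represents as a network can be computed directly as below. This is the standard textbook definition, not the ChIMP/iChIMP network. It assumes the fuzzy measure g is supplied as a lookup table over all non-empty subsets of the n sources, normalized to g(all sources) = 1 and monotone. That table has 2^n - 1 entries, which hints at why the constraints grow exponentially. `build_measure` is a hypothetical helper added for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence

# Fuzzy measure g as a table keyed by subsets of source indices (assumed layout).
Measure = Dict[FrozenSet[int], float]


def build_measure(n: int, value_of: Callable[[FrozenSet[int]], float]) -> Measure:
    """Hypothetical helper: tabulate g over every non-empty subset of {0, ..., n-1}."""
    return {
        frozenset(s): value_of(frozenset(s))
        for k in range(1, n + 1)
        for s in combinations(range(n), k)
    }


def choquet_integral(h: Sequence[float], g: Measure) -> float:
    """Discrete Choquet integral of inputs h with respect to fuzzy measure g."""
    # Visit sources in order of decreasing input value.
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    total, g_prev = 0.0, 0.0  # g(empty set) = 0 by definition
    chosen = set()
    for i in order:
        chosen.add(i)
        g_cur = g[frozenset(chosen)]
        # Each input is weighted by the increase in measure it contributes.
        total += h[i] * (g_cur - g_prev)
        g_prev = g_cur
    return total


if __name__ == "__main__":
    h = [0.2, 0.9, 0.5]
    w = [0.2, 0.3, 0.5]
    additive = build_measure(3, lambda s: sum(w[i] for i in s))
    print(choquet_integral(h, additive))  # weighted mean, approximately 0.56
```

The choice of g determines the aggregation. With g equal to 1 on every non-empty subset the integral reduces to max(h). With g nonzero only on the full set it reduces to min(h). With an additive g it becomes a weighted mean. Learning g is therefore learning which aggregation operator to use.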
Submitted 10 May, 2019;
originally announced May 2019.
-
Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation
Authors:
Brian Thompson,
Huda Khayrallah,
Antonios Anastasopoulos,
Arya D. McCarthy,
Kevin Duh,
Rebecca Marvin,
Paul McNamee,
Jeremy Gwinnup,
Tim Anderson,
Philipp Koehn
Abstract:
To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation. We find that freezing any single component during continued training has minimal impact on performance, and that performance is surprisingly good when a single component is adapted while holding the rest of the model fixed. We also find that continued training does not move the model very far from the out-of-domain model, compared to a sensitivity analysis metric, suggesting that the out-of-domain model can provide a good generic initialization for the new domain.
Submitted 15 January, 2019; v1 submitted 13 September, 2018;
originally announced September 2018.
-
State-of-the-art and gaps for deep learning on limited training data in remote sensing
Authors:
John E. Ball,
Derek T. Anderson,
Pan Wei
Abstract:
Deep learning usually requires big data, with respect to both volume and variety. However, most remote sensing applications only have limited training data, of which a small subset is labeled. Herein, we review three state-of-the-art approaches in deep learning to combat this challenge. The first topic is transfer learning, in which some aspects of one domain, e.g., features, are transferred to another domain. The next is unsupervised learning, e.g., autoencoders, which operate on unlabeled data. The last is generative adversarial networks, which can generate realistic-looking data that can fool both deep learning networks and humans. The aim of this article is to raise awareness of this dilemma, to direct the reader to existing work and to highlight current gaps that need solving.
Submitted 11 July, 2018;
originally announced July 2018.
-
A Graphical Interactive Debugger for Distributed Systems
Authors:
Doug Woos,
Zachary Tatlock,
Michael D. Ernst,
Thomas E. Anderson
Abstract:
Designing and debugging distributed systems is notoriously difficult. The correctness of a distributed system is largely determined by its handling of failure scenarios. The sequence of events leading to a bug can be long and complex, and it is likely to include message reorderings and failures. On single-node systems, interactive debuggers enable stepping through an execution of the program, but they lack the ability to easily simulate failure scenarios and control the order in which messages are delivered.
Oddity is a graphical, interactive debugger for distributed systems. It brings the power of traditional step-through debugging (fine-grained control and observation of a program as it executes) to distributed systems. It also enables exploratory testing, in which an engineer examines and perturbs the behavior of a system in order to better understand it, perhaps without a specific bug in mind. A programmer can directly control message and failure interleaving. Oddity supports time travel, allowing a developer to explore multiple branching executions of a system within a single debugging session. Above all, Oddity encourages distributed systems thinking: rather than assuming the normal case and attaching failure handling as an afterthought, distributed systems should be developed around the certainty of message loss and node failure.
Graduate and undergraduate students used Oddity in two distributed systems classes. Usage tracking and qualitative surveys showed that students found Oddity useful for both debugging and exploratory testing.
Submitted 13 June, 2018;
originally announced June 2018.
-
Volur: Concurrent Edge/Core Route Control in Data Center Networks
Authors:
Qiao Zhang,
Danyang Zhuo,
Vincent Liu,
Petr Lapukhov,
Simon Peter,
Arvind Krishnamurthy,
Thomas Anderson
Abstract:
A perennial question in computer networks is where to place functionality among components of a distributed computer system. In data centers, one option is to move all intelligence to the edge, essentially relegating switches and middleboxes, regardless of their programmability, to simple static routing policies. Another is to add more intelligence to the middle of the network in the hopes that it can handle any issue that arises.
This paper presents an architecture, called Volur, that provides a third option by facilitating the co-existence of an intelligent network with an intelligent edge. The key architectural principle of Volur is predictability of the network. We describe the key design requirements, and show through case studies how our approach facilitates more democratic innovation of all parts of the network. We also demonstrate the practicality of our architecture by describing how to implement the architecture on top of existing hardware and by deploying a prototype on top of a large production data center.
Submitted 18 April, 2018;
originally announced April 2018.
-
Fusion of an Ensemble of Augmented Image Detectors for Robust Object Detection
Authors:
Pan Wei,
John E. Ball,
Derek T. Anderson
Abstract:
A significant challenge in object detection is accurate identification of an object's position in image space, and one algorithm with one set of parameters is usually not enough; the fusion of multiple algorithms and/or parameters can lead to more robust results. Herein, a new computational intelligence fusion approach based on the dynamic analysis of agreement among object detection outputs is proposed. Furthermore, we propose an image augmentation strategy that is applied online rather than only during training. Experiments comparing the results both with and without fusion are presented. We demonstrate that the augmented and fused combination results are the best, with respect to higher accuracy rates and reduction of outlier influences. The approach is demonstrated in the context of cone, pedestrian and box detection for Advanced Driver Assistance Systems (ADAS) applications.
Submitted 17 March, 2018;
originally announced March 2018.
-
Measuring Conflict in a Multi-Source Environment as a Normal Measure
Authors:
Pan Wei,
John E. Ball,
Derek T. Anderson,
Archit Harsh,
Christopher Archibald
Abstract:
In a multi-source environment, each source has its own credibility. If there is no external knowledge about credibility, then we can use the information provided by the sources to assess their credibility. In this paper, we propose a way to measure conflict in a multi-source environment as a normal measure. We examine our algorithm using three simulated examples of increasing conflict and one experimental example. The results demonstrate that the proposed measure can represent conflict in a meaningful way, similar to what a human might expect, and from it we can identify conflict within our sources.
Submitted 12 March, 2018;
originally announced March 2018.
-
Multi-Sensor Conflict Measurement and Information Fusion
Authors:
Pan Wei,
John E. Ball,
Derek T. Anderson
Abstract:
In sensing applications where multiple sensors observe the same scene, fusing sensor outputs can provide improved results. However, if some of the sensors are providing lower quality outputs, the fused results can be degraded. In this work, a multi-sensor conflict measure is proposed which estimates multi-sensor conflict by representing each sensor output as interval-valued information and examining the sensor output overlaps on all possible n-tuple sensor combinations. The conflict is based on the sizes of the intervals and how many sensor output values lie in these intervals. In this work, conflict is defined in terms of how little the outputs from multiple sensors overlap. That is, high degrees of overlap mean low sensor conflict, while low degrees of overlap mean high conflict. This work is a preliminary step towards a robust conflict and sensor fusion framework. In addition, a sensor fusion algorithm is proposed based on a weighted sum of sensor outputs, where the weights for each sensor diminish as the conflict measure increases. The proposed methods can be utilized to (1) assess a measure of multi-sensor conflict, and (2) improve sensor output fusion by lessening the weighting for sensors with high conflict. Using this measure, a simulated example is given to explain the mechanics of calculating the conflict measure, and stereo camera 3D outputs are analyzed and fused. In the stereo camera case, the sensor output is corrupted by additive impulse noise, DC offset, and Gaussian noise. Impulse noise is common in sensors due to intermittent interference, a DC offset represents a sensor bias or registration error, and Gaussian noise represents a sensor output with low SNR. The results show that sensor output fusion based on the conflict measure achieves improved accuracy over a simple averaging fusion strategy.
Submitted 12 March, 2018;
originally announced March 2018.
-
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
Authors:
John E. Ball,
Derek T. Anderson,
Chee Seng Chan
Abstract:
In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, inevitably RS draws from many of the same theories as CV; e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL.
Submitted 24 September, 2017; v1 submitted 1 September, 2017;
originally announced September 2017.
-
HiFrames: High Performance Data Frames in a Scripting Language
Authors:
Ehsan Totoni,
Wajih Ul Hassan,
Todd A. Anderson,
Tatiana Shpeisman
Abstract:
Data frames in scripting languages are essential abstractions for processing structured data. However, existing data frame solutions are either not distributed (e.g., Pandas in Python) and therefore have limited scalability, or they are not tightly integrated with array computations (e.g., Spark SQL). This paper proposes a novel compiler-based approach where we integrate data frames into the High Performance Analytics Toolkit (HPAT) to build HiFrames. It provides expressive and flexible data frame APIs which are tightly integrated with array operations. HiFrames then automatically parallelizes and compiles relational operations along with other array computations in end-to-end data analytics programs, and generates efficient MPI/C++ code. We demonstrate that HiFrames is significantly faster than alternatives such as Spark SQL on clusters, without forcing the programmer to switch to embedded SQL for part of the program. HiFrames is 3.6x to 70x faster than Spark SQL for basic relational operations, and can be up to 20,000x faster for advanced analytics operations, such as weighted moving averages (WMA), that the map-reduce paradigm cannot handle effectively. HiFrames is also 5x faster than Spark SQL for TPCx-BB Q26 on 64 nodes of the Cori supercomputer.
Submitted 7 April, 2017;
originally announced April 2017.
-
Weight Design of Distributed Approximate Newton Algorithms for Constrained Optimization
Authors:
Tor Anderson,
Chin-Yao Chang,
Sonia Martinez
Abstract:
Motivated by economic dispatch and linearly-constrained resource allocation problems, this paper proposes a novel Distributed Approx-Newton algorithm that approximates the standard Newton optimization method. A main property of this distributed algorithm is that it only requires agents to exchange constant-size communication messages. The convergence of this algorithm is discussed and rigorously analyzed. In addition, we aim to address the problem of designing communication topologies and weightings that are optimal for second-order methods. To this end, we propose an effective approximation which is loosely based on completing the square to address the NP-hard bilinear optimization involved in the design. Simulations demonstrate that our proposed weight design applied to the Distributed Approx-Newton algorithm has a superior convergence property compared to existing weighted and distributed first-order gradient descent methods.
Submitted 22 March, 2017;
originally announced March 2017.