subscribe to arXiv mailings

Adversarial Attacks on Reinforcement Learning Agents for Command and Control

Authors: Ahaan Dabholkar, James Z. Hare, Mark Mittrick, John Richardson, Nicholas Waytowich, Priya Narayanan, Saurabh Bagchi

Abstract: Given the recent impact of Deep Reinforcement Learning in training agents to win complex games like StarCraft and DoTA(Defense Of The Ancients) - there has been a surge in research for exploiting learning based techniques for professional wargaming, battlefield simulation and modeling. Real time strategy games and simulators have become a valuable resource for operational planning and military res… ▽ More Given the recent impact of Deep Reinforcement Learning in training agents to win complex games like StarCraft and DoTA(Defense Of The Ancients) - there has been a surge in research for exploiting learning based techniques for professional wargaming, battlefield simulation and modeling. Real time strategy games and simulators have become a valuable resource for operational planning and military research. However, recent work has shown that such learning based approaches are highly susceptible to adversarial perturbations. In this paper, we investigate the robustness of an agent trained for a Command and Control task in an environment that is controlled by an active adversary. The C2 agent is trained on custom StarCraft II maps using the state of the art RL algorithms - A3C and PPO. We empirically show that an agent trained using these algorithms is highly susceptible to noise injected by the adversary and investigate the effects these perturbations have on the performance of the trained agent. Our work highlights the urgent need to develop more robust training algorithms especially for critical arenas like the battlefield. △ Less

Submitted 1 July, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: Accepted to appear in the Journal Of Defense Modeling and Simulation (JDMS)

arXiv:2401.08965 [pdf, other]

Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices

Authors: Lei Xun, Jonathon Hare, Geoff V. Merrett

Abstract: Deep neural network (DNN) inference is increasingly being executed on mobile and embedded platforms due to several key advantages in latency, privacy and always-on availability. However, due to limited computing resources, efficient DNN deployment on mobile and embedded platforms is challenging. Although many hardware accelerators and static model compression methods were proposed by previous work… ▽ More Deep neural network (DNN) inference is increasingly being executed on mobile and embedded platforms due to several key advantages in latency, privacy and always-on availability. However, due to limited computing resources, efficient DNN deployment on mobile and embedded platforms is challenging. Although many hardware accelerators and static model compression methods were proposed by previous works, at system runtime, multiple applications are typically executed concurrently and compete for hardware resources. This raises two main challenges: Runtime Hardware Availability and Runtime Application Variability. Previous works have addressed these challenges through either dynamic neural networks that contain sub-networks with different performance trade-offs or runtime hardware resource management. In this thesis, we proposed a combined method, a system was developed for DNN performance trade-off management, combining the runtime trade-off opportunities in both algorithms and hardware to meet dynamically changing application performance targets and hardware constraints in real time. We co-designed novel Dynamic Super-Networks to maximise runtime system-level performance and energy efficiency on heterogeneous hardware platforms. Compared with SOTA, our experimental results using ImageNet on the GPU of Jetson Xavier NX show our model is 2.4x faster for similar ImageNet Top-1 accuracy, or 5.1% higher accuracy at similar latency. We also designed a hierarchical runtime resource manager that tunes both dynamic neural networks and DVFS at runtime. Compared with the Linux DVFS governor schedutil, our runtime approach achieves up to a 19% energy reduction and a 9% latency reduction in single model deployment scenario, and an 89% energy reduction and a 23% latency reduction in a two concurrent model deployment scenario. △ Less

Submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted at Design, Automation & Test in Europe Conference (DATE) 2024, PhD Forum

arXiv:2401.08943 [pdf, other]

Fluid Dynamic DNNs for Reliable and Adaptive Distributed Inference on Edge Devices

Authors: Lei Xun, Mingyu Hu, Hengrui Zhao, Amit Kumar Singh, Jonathon Hare, Geoff V. Merrett

Abstract: Distributed inference is a popular approach for efficient DNN inference at the edge. However, traditional Static and Dynamic DNNs are not distribution-friendly, causing system reliability and adaptability issues. In this paper, we introduce Fluid Dynamic DNNs (Fluid DyDNNs), tailored for distributed inference. Distinct from Static and Dynamic DNNs, Fluid DyDNNs utilize a novel nested incremental t… ▽ More Distributed inference is a popular approach for efficient DNN inference at the edge. However, traditional Static and Dynamic DNNs are not distribution-friendly, causing system reliability and adaptability issues. In this paper, we introduce Fluid Dynamic DNNs (Fluid DyDNNs), tailored for distributed inference. Distinct from Static and Dynamic DNNs, Fluid DyDNNs utilize a novel nested incremental training algorithm to enable independent and combined operation of its sub-networks, enhancing system reliability and adaptability. Evaluation on embedded Arm CPUs with a DNN model and the MNIST dataset, shows that in scenarios of single device failure, Fluid DyDNNs ensure continued inference, whereas Static and Dynamic DNNs fail. When devices are fully operational, Fluid DyDNNs can operate in either a High-Accuracy mode and achieve comparable accuracy with Static DNNs, or in a High-Throughput mode and achieve 2.5x and 2x throughput compared with Static and Dynamic DNNs, respectively. △ Less

Submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted at Design, Automation & Test in Europe Conference (DATE) 2024

arXiv:2401.04290 [pdf, other]

StarCraftImage: A Dataset For Prototyping Spatial Reasoning Methods For Multi-Agent Environments

Authors: Sean Kulinski, Nicholas R. Waytowich, James Z. Hare, David I. Inouye

Abstract: Spatial reasoning tasks in multi-agent environments such as event prediction, agent type identification, or missing data imputation are important for multiple applications (e.g., autonomous surveillance over sensor networks and subtasks for reinforcement learning (RL)). StarCraft II game replays encode intelligent (and adversarial) multi-agent behavior and could provide a testbed for these tasks;… ▽ More Spatial reasoning tasks in multi-agent environments such as event prediction, agent type identification, or missing data imputation are important for multiple applications (e.g., autonomous surveillance over sensor networks and subtasks for reinforcement learning (RL)). StarCraft II game replays encode intelligent (and adversarial) multi-agent behavior and could provide a testbed for these tasks; however, extracting simple and standardized representations for prototyping these tasks is laborious and hinders reproducibility. In contrast, MNIST and CIFAR10, despite their extreme simplicity, have enabled rapid prototyping and reproducibility of ML methods. Following the simplicity of these datasets, we construct a benchmark spatial reasoning dataset based on StarCraft II replays that exhibit complex multi-agent behaviors, while still being as easy to use as MNIST and CIFAR10. Specifically, we carefully summarize a window of 255 consecutive game states to create 3.6 million summary images from 60,000 replays, including all relevant metadata such as game outcome and player races. We develop three formats of decreasing complexity: Hyperspectral images that include one channel for every unit type (similar to multispectral geospatial images), RGB images that mimic CIFAR10, and grayscale images that mimic MNIST. We show how this dataset can be used for prototyping spatial reasoning methods. All datasets, code for extraction, and code for dataset loading can be found at https://starcraftdata.davidinouye.com △ Less

Submitted 8 January, 2024; originally announced January 2024.

Comments: Published in CVPR 23'

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023

arXiv:2311.04740 [pdf, other]

Enhancing Multi-Agent Coordination through Common Operating Picture Integration

Authors: Peihong Yu, Bhoram Lee, Aswin Raghavan, Supun Samarasekara, Pratap Tokekar, James Zachary Hare

Abstract: In multi-agent systems, agents possess only local observations of the environment. Communication between teammates becomes crucial for enhancing coordination. Past research has primarily focused on encoding local information into embedding messages which are unintelligible to humans. We find that using these messages in agent's policy learning leads to brittle policies when tested on out-of-distri… ▽ More In multi-agent systems, agents possess only local observations of the environment. Communication between teammates becomes crucial for enhancing coordination. Past research has primarily focused on encoding local information into embedding messages which are unintelligible to humans. We find that using these messages in agent's policy learning leads to brittle policies when tested on out-of-distribution initial states. We present an approach to multi-agent coordination, where each agent is equipped with the capability to integrate its (history of) observations, actions and messages received into a Common Operating Picture (COP) and disseminate the COP. This process takes into account the dynamic nature of the environment and the shared mission. We conducted experiments in the StarCraft2 environment to validate our approach. Our results demonstrate the efficacy of COP integration, and show that COP-based training leads to robust policies compared to state-of-the-art Multi-Agent Reinforcement Learning (MARL) methods when faced with out-of-distribution initial states. △ Less

Submitted 8 November, 2023; originally announced November 2023.

Comments: accepted to OODWorkshop@CoRL23; please see https://openreview.net/forum?id=fADcJl0B0P for the paper

arXiv:2211.05624 [pdf, other]

Improving the Robustness of Neural Multiplication Units with Reversible Stochasticity

Authors: Bhumika Mistry, Katayoun Farrahi, Jonathon Hare

Abstract: Multilayer Perceptrons struggle to learn certain simple arithmetic tasks. Specialist neural modules for arithmetic can outperform classical architectures with gains in extrapolation, interpretability and convergence speeds, but are highly sensitive to the training range. In this paper, we show that Neural Multiplication Units (NMUs) are unable to reliably learn tasks as simple as multiplying two i… ▽ More Multilayer Perceptrons struggle to learn certain simple arithmetic tasks. Specialist neural modules for arithmetic can outperform classical architectures with gains in extrapolation, interpretability and convergence speeds, but are highly sensitive to the training range. In this paper, we show that Neural Multiplication Units (NMUs) are unable to reliably learn tasks as simple as multiplying two inputs when given different training ranges. Causes of failure are linked to inductive and input biases which encourage convergence to solutions in undesirable optima. A solution, the stochastic NMU (sNMU), is proposed to apply reversible stochasticity, encouraging avoidance of such optima whilst converging to the true solution. Empirically, we show that stochasticity provides improved robustness with the potential to improve learned representations of upstream networks for numerical and image tasks. △ Less

Submitted 10 November, 2022; originally announced November 2022.

Comments: 26 pages (10 page main body)

arXiv:2206.02525 [pdf, other]

Dynamic DNNs Meet Runtime Resource Management on Mobile and Embedded Platforms

Authors: Lei Xun, Bashir M. Al-Hashimi, Jonathon Hare, Geoff V. Merrett

Abstract: Deep neural network (DNN) inference is increasingly being executed on mobile and embedded platforms due to low latency and better privacy. However, efficient deployment on these platforms is challenging due to the intensive computation and memory access. We propose a holistic system design for DNN performance and energy optimisation, combining the trade-off opportunities in both algorithms and har… ▽ More Deep neural network (DNN) inference is increasingly being executed on mobile and embedded platforms due to low latency and better privacy. However, efficient deployment on these platforms is challenging due to the intensive computation and memory access. We propose a holistic system design for DNN performance and energy optimisation, combining the trade-off opportunities in both algorithms and hardware. The system can be viewed as three abstract layers: the device layer contains heterogeneous computing resources; the application layer has multiple concurrent workloads; and the runtime resource management layer monitors the dynamically changing algorithms' performance targets as well as hardware resources and constraints, and tries to meet them by tuning the algorithm and hardware at the same time. Moreover, We illustrate the runtime approach through a dynamic version of 'once-for-all network' (namely Dynamic-OFA), which can scale the ConvNet architecture to fit heterogeneous computing resources efficiently and has good generalisation for different model architectures such as Transformer. Compared to the state-of-the-art Dynamic DNNs, our experimental results using ImageNet on a Jetson Xavier NX show that the Dynamic-OFA is up to 3.5x (CPU), 2.4x (GPU) faster for similar ImageNet Top-1 accuracy, or 3.8% (CPU), 5.1% (GPU) higher accuracy at similar latency. Furthermore, compared with Linux governor (e.g. performance, schedutil), our runtime approach reduces the energy consumption by 16.5% at similar latency. △ Less

Submitted 6 June, 2022; v1 submitted 17 May, 2022; originally announced June 2022.

Comments: Accepted as a presentation at Fourth UK Mobile, Wearable and Ubiquitous Systems Research Symposium (MobiUK 2022)

arXiv:2205.05784 [pdf, other]

Learning to Guide Multiple Heterogeneous Actors from a Single Human Demonstration via Automatic Curriculum Learning in StarCraft II

Authors: Nicholas Waytowich, James Hare, Vinicius G. Goecks, Mark Mittrick, John Richardson, Anjon Basak, Derrik E. Asher

Abstract: Traditionally, learning from human demonstrations via direct behavior cloning can lead to high-performance policies given that the algorithm has access to large amounts of high-quality data covering the most likely scenarios to be encountered when the agent is operating. However, in real-world scenarios, expert data is limited and it is desired to train an agent that learns a behavior policy gener… ▽ More Traditionally, learning from human demonstrations via direct behavior cloning can lead to high-performance policies given that the algorithm has access to large amounts of high-quality data covering the most likely scenarios to be encountered when the agent is operating. However, in real-world scenarios, expert data is limited and it is desired to train an agent that learns a behavior policy general enough to handle situations that were not demonstrated by the human expert. Another alternative is to learn these policies with no supervision via deep reinforcement learning, however, these algorithms require a large amount of computing time to perform well on complex tasks with high-dimensional state and action spaces, such as those found in StarCraft II. Automatic curriculum learning is a recent mechanism comprised of techniques designed to speed up deep reinforcement learning by adjusting the difficulty of the current task to be solved according to the agent's current capabilities. Designing a proper curriculum, however, can be challenging for sufficiently complex tasks, and thus we leverage human demonstrations as a way to guide agent exploration during training. In this work, we aim to train deep reinforcement learning agents that can command multiple heterogeneous actors where starting positions and overall difficulty of the task are controlled by an automatically-generated curriculum from a single human demonstration. Our results show that an agent trained via automated curriculum learning can outperform state-of-the-art deep reinforcement learning baselines and match the performance of the human expert in a simulated command and control task in StarCraft II modeled over a real military scenario. △ Less

Submitted 11 May, 2022; originally announced May 2022.

Comments: Submitted to the 2022 SPIE Defense + Commercial Sensing (DCS) Conference on "Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications IV"

ACM Class: I.2.6; I.2.11

arXiv:2202.07052 [pdf, other]

Orthogonalising gradients to speed up neural network optimisation

Authors: Mark Tuddenham, Adam Prügel-Bennett, Jonathan Hare

Abstract: The optimisation of neural networks can be sped up by orthogonalising the gradients before the optimisation step, ensuring the diversification of the learned representations. We orthogonalise the gradients of the layer's components/filters with respect to each other to separate out the intermediate representations. Our method of orthogonalisation allows the weights to be used more flexibly, in con… ▽ More The optimisation of neural networks can be sped up by orthogonalising the gradients before the optimisation step, ensuring the diversification of the learned representations. We orthogonalise the gradients of the layer's components/filters with respect to each other to separate out the intermediate representations. Our method of orthogonalisation allows the weights to be used more flexibly, in contrast to restricting the weights to an orthogonalised sub-space. We tested this method on ImageNet and CIFAR-10 resulting in a large decrease in learning time, and also obtain a speed-up on the semi-supervised learning BarlowTwins. We obtain similar accuracy to SGD without fine-tuning and better accuracy for naïvely chosen hyper-parameters. △ Less

Submitted 14 February, 2022; originally announced February 2022.

arXiv:2201.08142 [pdf, other]

Physically Embodied Deep Image Optimisation

Authors: Daniela Mihai, Jonathon Hare

Abstract: Physical sketches are created by learning programs to control a drawing robot. A differentiable rasteriser is used to optimise sets of drawing strokes to match an input image, using deep networks to provide an encoding for which we can compute a loss. The optimised drawing primitives can then be translated into G-code commands which command a robot to draw the image using drawing instruments such… ▽ More Physical sketches are created by learning programs to control a drawing robot. A differentiable rasteriser is used to optimise sets of drawing strokes to match an input image, using deep networks to provide an encoding for which we can compute a loss. The optimised drawing primitives can then be translated into G-code commands which command a robot to draw the image using drawing instruments such as pens and pencils on a physical support medium. △ Less

Submitted 20 January, 2022; originally announced January 2022.

Journal ref: 5th Workshop on Machine Learning for Creativity and Design of the Neural Information Processing Systems (NeurIPS) 2021 Conference

arXiv:2110.08203 [pdf, other]

Shared Visual Representations of Drawing for Communication: How do different biases affect human interpretability and intent?

Authors: Daniela Mihai, Jonathon Hare

Abstract: We present an investigation into how representational losses can affect the drawings produced by artificial agents playing a communication game. Building upon recent advances, we show that a combination of powerful pretrained encoder networks, with appropriate inductive biases, can lead to agents that draw recognisable sketches, whilst still communicating well. Further, we start to develop an appr… ▽ More We present an investigation into how representational losses can affect the drawings produced by artificial agents playing a communication game. Building upon recent advances, we show that a combination of powerful pretrained encoder networks, with appropriate inductive biases, can lead to agents that draw recognisable sketches, whilst still communicating well. Further, we start to develop an approach to help automatically analyse the semantic content being conveyed by a sketch and demonstrate that current approaches to inducing perceptual biases lead to a notion of objectness being a key feature despite the agent training being self-supervised. △ Less

Submitted 20 January, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

Journal ref: 3rd Workshop on Shared Visual Representations in Human and Machine Intelligence (SVRHM 2021) of the Neural Information Processing Systems (NeurIPS) conference

arXiv:2110.05177 [pdf, other]

Learning Division with Neural Arithmetic Logic Modules

Authors: Bhumika Mistry, Katayoun Farrahi, Jonathon Hare

Abstract: To achieve systematic generalisation, it first makes sense to master simple tasks such as arithmetic. Of the four fundamental arithmetic operations (+,-,$\times$,$÷$), division is considered the most difficult for both humans and computers. In this paper we show that robustly learning division in a systematic manner remains a challenge even at the simplest level of dividing two numbers. We propose… ▽ More To achieve systematic generalisation, it first makes sense to master simple tasks such as arithmetic. Of the four fundamental arithmetic operations (+,-,$\times$,$÷$), division is considered the most difficult for both humans and computers. In this paper we show that robustly learning division in a systematic manner remains a challenge even at the simplest level of dividing two numbers. We propose two novel approaches for division which we call the Neural Reciprocal Unit (NRU) and the Neural Multiplicative Reciprocal Unit (NMRU), and present improvements for an existing division module, the Real Neural Power Unit (Real NPU). Experiments in learning division with input redundancy on 225 different training sets, find that our proposed modifications to the Real NPU obtains an average success of 85.3$\%$ improving over the original by 15.1$\%$. In light of the suggestion above, our NMRU approach can further improve the success to 91.6$\%$. △ Less

Submitted 12 October, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: 28 pages, 24 figures. New experiments included(Section 7 and Appendix G)

arXiv:2109.09495 [pdf, other]

GhostShiftAddNet: More Features from Energy-Efficient Operations

Authors: Jia Bi, Jonathon Hare, Geoff V. Merrett

Abstract: Deep convolutional neural networks (CNNs) are computationally and memory intensive. In CNNs, intensive multiplication can have resource implications that may challenge the ability for effective deployment of inference on resource-constrained edge devices. This paper proposes GhostShiftAddNet, where the motivation is to implement a hardware-efficient deep network: a multiplication-free CNN with few… ▽ More Deep convolutional neural networks (CNNs) are computationally and memory intensive. In CNNs, intensive multiplication can have resource implications that may challenge the ability for effective deployment of inference on resource-constrained edge devices. This paper proposes GhostShiftAddNet, where the motivation is to implement a hardware-efficient deep network: a multiplication-free CNN with fewer redundant features. We introduce a new bottleneck block, GhostSA, that converts all multiplications in the block to cheap operations. The bottleneck uses an appropriate number of bit-shift filters to process intrinsic feature maps, then applies a series of transformations that consist of bit-wise shifts with addition operations to generate more feature maps that fully learn to capture information underlying intrinsic features. We schedule the number of bit-shift and addition operations for different hardware platforms. We conduct extensive experiments and ablation studies with desktop and embedded (Jetson Nano) devices for implementation and measurements. We demonstrate the proposed GhostSA block can replace bottleneck blocks in the backbone of state-of-the-art networks architectures and gives improved performance on image classification benchmarks. Further, our GhostShiftAddNet can achieve higher classification accuracy with fewer FLOPs and parameters (reduced by up to 3x) than GhostNet. When compared to GhostNet, inference latency on the Jetson Nano is improved by 1.3x and 2x on the GPU and CPU respectively. Code is available open-source on \url{https://github.com/JIABI/GhostShiftAddNet}. △ Less

Submitted 3 February, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

Journal ref: The 32nd British Machine Vision Conference BMVC 2021

arXiv:2107.12021 [pdf, other]

Language Models as Zero-shot Visual Semantic Learners

Authors: Yue Jiao, Jonathon Hare, Adam Prügel-Bennett

Abstract: Visual Semantic Embedding (VSE) models, which map images into a rich semantic embedding space, have been a milestone in object recognition and zero-shot learning. Current approaches to VSE heavily rely on static word em-bedding techniques. In this work, we propose a Visual Se-mantic Embedding Probe (VSEP) designed to probe the semantic information of contextualized word embeddings in visual semant… ▽ More Visual Semantic Embedding (VSE) models, which map images into a rich semantic embedding space, have been a milestone in object recognition and zero-shot learning. Current approaches to VSE heavily rely on static word em-bedding techniques. In this work, we propose a Visual Se-mantic Embedding Probe (VSEP) designed to probe the semantic information of contextualized word embeddings in visual semantic understanding tasks. We show that the knowledge encoded in transformer language models can be exploited for tasks requiring visual semantic understanding.The VSEP with contextual representations can distinguish word-level object representations in complicated scenes as a compositional zero-shot learner. We further introduce a zero-shot setting with VSEPs to evaluate a model's ability to associate a novel word with a novel visual category. We find that contextual representations in language mod-els outperform static word embeddings, when the compositional chain of object is short. We notice that current visual semantic embedding models lack a mutual exclusivity bias which limits their performance. △ Less

Submitted 26 July, 2021; originally announced July 2021.

arXiv:2107.11991 [pdf, other]

What Remains of Visual Semantic Embeddings

Authors: Yue Jiao, Jonathon Hare, Adam Prügel-Bennett

Abstract: Zero shot learning (ZSL) has seen a surge in interest over the decade for its tight links with the mechanism making young children recognize novel objects. Although different paradigms of visual semantic embedding models are designed to align visual features and distributed word representations, it is unclear to what extent current ZSL models encode semantic information from distributed word repre… ▽ More Zero shot learning (ZSL) has seen a surge in interest over the decade for its tight links with the mechanism making young children recognize novel objects. Although different paradigms of visual semantic embedding models are designed to align visual features and distributed word representations, it is unclear to what extent current ZSL models encode semantic information from distributed word representations. In this work, we introduce the split of tiered-ImageNet to the ZSL task, in order to avoid the structural flaws in the standard ImageNet benchmark. We build a unified framework for ZSL with contrastive learning as pre-training, which guarantees no semantic information leakage and encourages linearly separable visual features. Our work makes it fair for evaluating visual semantic embedding models on a ZSL setting in which semantic inference is decisive. With this framework, we show that current ZSL models struggle with encoding semantic relationships from word analogy and word hierarchy. Our analyses provide motivation for exploring the role of context language representations in ZSL tasks. △ Less

Submitted 26 July, 2021; originally announced July 2021.

arXiv:2107.08199 [pdf, other]

Dynamic Transformer for Efficient Machine Translation on Embedded Devices

Authors: Hishan Parry, Lei Xun, Amin Sabet, Jia Bi, Jonathon Hare, Geoff V. Merrett

Abstract: The Transformer architecture is widely used for machine translation tasks. However, its resource-intensive nature makes it challenging to implement on constrained embedded devices, particularly where available hardware resources can vary at run-time. We propose a dynamic machine translation model that scales the Transformer architecture based on the available resources at any particular time. The… ▽ More The Transformer architecture is widely used for machine translation tasks. However, its resource-intensive nature makes it challenging to implement on constrained embedded devices, particularly where available hardware resources can vary at run-time. We propose a dynamic machine translation model that scales the Transformer architecture based on the available resources at any particular time. The proposed approach, 'Dynamic-HAT', uses a HAT SuperTransformer as the backbone to search for SubTransformers with different accuracy-latency trade-offs at design time. The optimal SubTransformers are sampled from the SuperTransformer at run-time, depending on latency constraints. The Dynamic-HAT is tested on the Jetson Nano and the approach uses inherited SubTransformers sampled directly from the SuperTransformer with a switching time of <1s. Using inherited SubTransformers results in a BLEU score loss of <1.5% because the SubTransformer configuration is not retrained from scratch after sampling. However, to recover this loss in performance, the dimensions of the design space can be reduced to tailor it to a family of target hardware. The new reduced design space results in a BLEU score increase of approximately 1% for sub-optimal models from the original design space, with a wide range for performance scaling between 0.356s - 1.526s for the GPU and 2.9s - 7.31s for the CPU. △ Less

Submitted 30 July, 2021; v1 submitted 17 July, 2021; originally announced July 2021.

Comments: Accepted at MLCAD 2021

arXiv:2106.11208 [pdf, other]

Temporal Early Exits for Efficient Video Object Detection

Authors: Amin Sabet, Jonathon Hare, Bashir Al-Hashimi, Geoff V. Merrett

Abstract: Transferring image-based object detectors to the domain of video remains challenging under resource constraints. Previous efforts utilised optical flow to allow unchanged features to be propagated, however, the overhead is considerable when working with very slowly changing scenes from applications such as surveillance. In this paper, we propose temporal early exits to reduce the computational com… ▽ More Transferring image-based object detectors to the domain of video remains challenging under resource constraints. Previous efforts utilised optical flow to allow unchanged features to be propagated, however, the overhead is considerable when working with very slowly changing scenes from applications such as surveillance. In this paper, we propose temporal early exits to reduce the computational complexity of per-frame video object detection. Multiple temporal early exit modules with low computational overhead are inserted at early layers of the backbone network to identify the semantic differences between consecutive frames. Full computation is only required if the frame is identified as having a semantic change to previous frames; otherwise, detection results from previous frames are reused. Experiments on CDnet show that our method significantly reduces the computational complexity and execution of per-frame video object detection up to $34 \times$ compared to existing methods with an acceptable reduction of 2.2\% in mAP. △ Less

Submitted 21 June, 2021; originally announced June 2021.

arXiv:2106.02067 [pdf, other]

Learning to Draw: Emergent Communication through Sketching

Authors: Daniela Mihai, Jonathon Hare

Abstract: Evidence that visual communication preceded written language and provided a basis for it goes back to prehistory, in forms such as cave and rock paintings depicting traces of our distant ancestors. Emergent communication research has sought to explore how agents can learn to communicate in order to collaboratively solve tasks. Existing research has focused on language, with a learned communication… ▽ More Evidence that visual communication preceded written language and provided a basis for it goes back to prehistory, in forms such as cave and rock paintings depicting traces of our distant ancestors. Emergent communication research has sought to explore how agents can learn to communicate in order to collaboratively solve tasks. Existing research has focused on language, with a learned communication channel transmitting sequences of discrete tokens between the agents. In this work, we explore a visual communication channel between agents that are allowed to draw with simple strokes. Our agents are parameterised by deep neural networks, and the drawing procedure is differentiable, allowing for end-to-end training. In the framework of a referential communication game, we demonstrate that agents can not only successfully learn to communicate by drawing, but with appropriate inductive biases, can do so in a fashion that humans can interpret. We hope to encourage future research to consider visual communication as a more flexible and directly interpretable alternative of training collaborative agents. △ Less

Submitted 9 November, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

arXiv:2105.03596 [pdf, other]

Dynamic-OFA: Runtime DNN Architecture Switching for Performance Scaling on Heterogeneous Embedded Platforms

Authors: Wei Lou, Lei Xun, Amin Sabet, Jia Bi, Jonathon Hare, Geoff V. Merrett

Abstract: Mobile and embedded platforms are increasingly required to efficiently execute computationally demanding DNNs across heterogeneous processing elements. At runtime, the available hardware resources to DNNs can vary considerably due to other concurrently running applications. The performance requirements of the applications could also change under different scenarios. To achieve the desired performa… ▽ More Mobile and embedded platforms are increasingly required to efficiently execute computationally demanding DNNs across heterogeneous processing elements. At runtime, the available hardware resources to DNNs can vary considerably due to other concurrently running applications. The performance requirements of the applications could also change under different scenarios. To achieve the desired performance, dynamic DNNs have been proposed in which the number of channels/layers can be scaled in real time to meet different requirements under varying resource constraints. However, the training process of such dynamic DNNs can be costly, since platform-aware models of different deployment scenarios must be retrained to become dynamic. This paper proposes Dynamic-OFA, a novel dynamic DNN approach for state-of-the-art platform-aware NAS models (i.e. Once-for-all network (OFA)). Dynamic-OFA pre-samples a family of sub-networks from a static OFA backbone model, and contains a runtime manager to choose different sub-networks under different runtime environments. As such, Dynamic-OFA does not need the traditional dynamic DNN training pipeline. Compared to the state-of-the-art, our experimental results using ImageNet on a Jetson Xavier NX show that the approach is up to 3.5x (CPU), 2.4x (GPU) faster for similar ImageNet Top-1 accuracy, or 3.8% (CPU), 5.1% (GPU) higher accuracy at similar latency. △ Less

Submitted 11 May, 2021; v1 submitted 8 May, 2021; originally announced May 2021.

Comments: Accepted at CVPR ECV Workshop 2021

arXiv:2103.16194 [pdf, other]

Differentiable Drawing and Sketching

Authors: Daniela Mihai, Jonathon Hare

Abstract: We present a bottom-up differentiable relaxation of the process of drawing points, lines and curves into a pixel raster. Our approach arises from the observation that rasterising a pixel in an image given parameters of a primitive can be reformulated in terms of the primitive's distance transform, and then relaxed to allow the primitive's parameters to be learned. This relaxation allows end-to-end… ▽ More We present a bottom-up differentiable relaxation of the process of drawing points, lines and curves into a pixel raster. Our approach arises from the observation that rasterising a pixel in an image given parameters of a primitive can be reformulated in terms of the primitive's distance transform, and then relaxed to allow the primitive's parameters to be learned. This relaxation allows end-to-end differentiable programs and deep networks to be learned and optimised and provides several building blocks that allow control over how a compositional drawing process is modelled. We emphasise the bottom-up nature of our proposed approach, which allows for drawing operations to be composed in ways that can mimic the physical reality of drawing rather than being tied to, for example, approaches in modern computer graphics. With the proposed approach we demonstrate how sketches can be generated by directly optimising against photographs and how auto-encoders can be built to transform rasterised handwritten digits into vectors without supervision. Extensive experimental results highlight the power of this approach under different modelling assumptions for drawing tasks. △ Less

Submitted 19 July, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

arXiv:2101.11134 [pdf, other]

Open-domain Topic Identification of Out-of-domain Utterances using Wikipedia

Authors: A. Augustin, A. Papangelis, M. Kotti, P. Vougiouklis, J. Hare, N. Braunschweiler

Abstract: Users of spoken dialogue systems (SDS) expect high quality interactions across a wide range of diverse topics. However, the implementation of SDS capable of responding to every conceivable user utterance in an informative way is a challenging problem. Multi-domain SDS must necessarily identify and deal with out-of-domain (OOD) utterances to generate appropriate responses as users do not always kno… ▽ More Users of spoken dialogue systems (SDS) expect high quality interactions across a wide range of diverse topics. However, the implementation of SDS capable of responding to every conceivable user utterance in an informative way is a challenging problem. Multi-domain SDS must necessarily identify and deal with out-of-domain (OOD) utterances to generate appropriate responses as users do not always know in advance what domains the SDS can handle. To address this problem, we extend the current state-of-the-art in multi-domain SDS by estimating the topic of OOD utterances using external knowledge representation from Wikipedia. Experimental results on real human-to-human dialogues showed that our approach does not degrade domain prediction performance when compared to the base model. But more significantly, our joint training achieves more accurate predictions of the nearest Wikipedia article by up to about 30% when compared to the benchmarks. △ Less

Submitted 26 January, 2021; originally announced January 2021.

arXiv:2101.10253 [pdf, other]

The emergence of visual semantics through communication games

Authors: Daniela Mihai, Jonathon Hare

Abstract: The emergence of communication systems between agents which learn to play referential signalling games with realistic images has attracted a lot of attention recently. The majority of work has focused on using fixed, pretrained image feature extraction networks which potentially bias the information the agents learn to communicate. In this work, we consider a signalling game setting in which a `se… ▽ More The emergence of communication systems between agents which learn to play referential signalling games with realistic images has attracted a lot of attention recently. The majority of work has focused on using fixed, pretrained image feature extraction networks which potentially bias the information the agents learn to communicate. In this work, we consider a signalling game setting in which a `sender' agent must communicate the information about an image to a `receiver' who must select the correct image from many distractors. We investigate the effect of the feature extractor's weights and of the task being solved on the visual semantics learned by the models. We first demonstrate to what extent the use of pretrained feature extraction networks inductively bias the visual semantics conveyed by emergent communication channel and quantify the visual semantics that are induced. We then go on to explore ways in which inductive biases can be introduced to encourage the emergence of semantically meaningful communication without the need for any form of supervised pretraining of the visual feature extractor. We impose various augmentations to the input images and additional tasks in the game with the aim to induce visual representations which capture conceptual properties of images. Through our experiments, we demonstrate that communication systems which capture visual semantics can be learned in a completely self-supervised manner by playing the right types of game. Our work bridges a gap between emergent communication research and self-supervised feature learning. △ Less

Submitted 25 January, 2021; originally announced January 2021.

Comments: arXiv admin note: text overlap with arXiv:1911.05546

arXiv:2101.09530 [pdf, other]

A Primer for Neural Arithmetic Logic Modules

Authors: Bhumika Mistry, Katayoun Farrahi, Jonathon Hare

Abstract: Neural Arithmetic Logic Modules have become a growing area of interest, though remain a niche field. These modules are neural networks which aim to achieve systematic generalisation in learning arithmetic and/or logic operations such as $\{+, -, \times, ÷, \leq, \textrm{AND}\}$ while also being interpretable. This paper is the first in discussing the current state of progress of this field, explai… ▽ More Neural Arithmetic Logic Modules have become a growing area of interest, though remain a niche field. These modules are neural networks which aim to achieve systematic generalisation in learning arithmetic and/or logic operations such as $\{+, -, \times, ÷, \leq, \textrm{AND}\}$ while also being interpretable. This paper is the first in discussing the current state of progress of this field, explaining key works, starting with the Neural Arithmetic Logic Unit (NALU). Focusing on the shortcomings of the NALU, we provide an in-depth analysis to reason about design choices of recent modules. A cross-comparison between modules is made on experiment setups and findings, where we highlight inconsistencies in a fundamental experiment causing the inability to directly compare across papers. To alleviate the existing inconsistencies, we create a benchmark which compares all existing arithmetic NALMs. We finish by providing a novel discussion of existing applications for NALU and research directions requiring further exploration. △ Less

Submitted 8 August, 2022; v1 submitted 23 January, 2021; originally announced January 2021.

Comments: JMLR Accepted Version, 58 pages

Journal ref: Journal of Machine Learning Research 23 (2022) 1-58

arXiv:2012.01938 [pdf, other]

Quasi-Newton's method in the class gradient defined high-curvature subspace

Authors: Mark Tuddenham, Adam Prügel-Bennett, Jonathan Hare

Abstract: Classification problems using deep learning have been shown to have a high-curvature subspace in the loss landscape equal in dimension to the number of classes. Moreover, this subspace corresponds to the subspace spanned by the logit gradients for each class. An obvious strategy to speed up optimisation would be to use Newton's method in the high-curvature subspace and stochastic gradient descent… ▽ More Classification problems using deep learning have been shown to have a high-curvature subspace in the loss landscape equal in dimension to the number of classes. Moreover, this subspace corresponds to the subspace spanned by the logit gradients for each class. An obvious strategy to speed up optimisation would be to use Newton's method in the high-curvature subspace and stochastic gradient descent in the co-space. We show that a naive implementation actually slows down convergence and we speculate why this might be. △ Less

Submitted 28 November, 2020; originally announced December 2020.

Journal ref: OPT2020: 12th Annual Workshop on Optimization for Machine Learning

arXiv:2011.10669 [pdf, other]

A General Framework for Distributed Inference with Uncertain Models

Authors: James Z. Hare, Cesar A. Uribe, Lance Kaplan, Ali Jadbabaie

Abstract: This paper studies the problem of distributed classification with a network of heterogeneous agents. The agents seek to jointly identify the underlying target class that best describes a sequence of observations. The problem is first abstracted to a hypothesis-testing framework, where we assume that the agents seek to agree on the hypothesis (target class) that best matches the distribution of obs… ▽ More This paper studies the problem of distributed classification with a network of heterogeneous agents. The agents seek to jointly identify the underlying target class that best describes a sequence of observations. The problem is first abstracted to a hypothesis-testing framework, where we assume that the agents seek to agree on the hypothesis (target class) that best matches the distribution of observations. Non-Bayesian social learning theory provides a framework that solves this problem in an efficient manner by allowing the agents to sequentially communicate and update their beliefs for each hypothesis over the network. Most existing approaches assume that agents have access to exact statistical models for each hypothesis. However, in many practical applications, agents learn the likelihood models based on limited data, which induces uncertainty in the likelihood function parameters. In this work, we build upon the concept of uncertain models to incorporate the agents' uncertainty in the likelihoods by identifying a broad set of parametric distribution that allows the agents' beliefs to converge to the same result as a centralized approach. Furthermore, we empirically explore extensions to non-parametric models to provide a generalized framework of uncertain models in non-Bayesian social learning. △ Less

Submitted 20 November, 2020; originally announced November 2020.

arXiv:2010.02634 [pdf, other]

How Convolutional Neural Network Architecture Biases Learned Opponency and Colour Tuning

Authors: Ethan Harris, Daniela Mihai, Jonathon Hare

Abstract: Recent work suggests that changing Convolutional Neural Network (CNN) architecture by introducing a bottleneck in the second layer can yield changes in learned function. To understand this relationship fully requires a way of quantitatively comparing trained networks. The fields of electrophysiology and psychophysics have developed a wealth of methods for characterising visual systems which permit… ▽ More Recent work suggests that changing Convolutional Neural Network (CNN) architecture by introducing a bottleneck in the second layer can yield changes in learned function. To understand this relationship fully requires a way of quantitatively comparing trained networks. The fields of electrophysiology and psychophysics have developed a wealth of methods for characterising visual systems which permit such comparisons. Inspired by these methods, we propose an approach to obtaining spatial and colour tuning curves for convolutional neurons, which can be used to classify cells in terms of their spatial and colour opponency. We perform these classifications for a range of CNNs with different depths and bottleneck widths. Our key finding is that networks with a bottleneck show a strong functional organisation: almost all cells in the bottleneck layer become both spatially and colour opponent, cells in the layer following the bottleneck become non-opponent. The colour tuning data can further be used to form a rich understanding of how colour is encoded by a network. As a concrete demonstration, we show that shallower networks without a bottleneck learn a complex non-linear colour system, whereas deeper networks with tight bottlenecks learn a simple channel opponent code in the bottleneck layer. We further develop a method of obtaining a hue sensitivity curve for a trained CNN which enables high level insights that complement the low level findings from the colour tuning data. We go on to train a series of networks under different conditions to ascertain the robustness of the discussed results. Ultimately, our methods and findings coalesce with prior art, strengthening our ability to interpret trained CNNs and furthering our understanding of the connection between architecture and learned representation. Code for all experiments is available at https://github.com/ecs-vlc/opponency. △ Less

Submitted 6 October, 2020; originally announced October 2020.

Comments: Final version; Accepted for publication in Neural Computation

arXiv:2008.07922 [pdf, other]

Linear Disentangled Representations and Unsupervised Action Estimation

Authors: Matthew Painter, Jonathon Hare, Adam Prugel-Bennett

Abstract: Disentangled representation learning has seen a surge in interest over recent times, generally focusing on new models which optimise one of many disparate disentanglement metrics. Symmetry Based Disentangled Representation learning introduced a robust mathematical framework that defined precisely what is meant by a "linear disentangled representation". This framework determined that such represent… ▽ More Disentangled representation learning has seen a surge in interest over recent times, generally focusing on new models which optimise one of many disparate disentanglement metrics. Symmetry Based Disentangled Representation learning introduced a robust mathematical framework that defined precisely what is meant by a "linear disentangled representation". This framework determined that such representations would depend on a particular decomposition of the symmetry group acting on the data, showing that actions would manifest through irreducible group representations acting on independent representational subspaces. Caselles-Dupre et al [2019] subsequently proposed the first model to induce and demonstrate a linear disentangled representation in a VAE model. In this work we empirically show that linear disentangled representations are not generally present in standard VAE models and that they instead require altering the loss landscape to induce them. We proceed to show that such representations are a desirable property with regard to classical disentanglement metrics. Finally we propose a method to induce irreducible representations which forgoes the need for labelled action sequences, as was required by prior work. We explore a number of properties of this method, including the ability to learn from action sequences without knowledge of intermediate states and robustness under visual noise. We also demonstrate that it can successfully learn 4 independent symmetries directly from pixels. △ Less

Submitted 15 December, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

Journal ref: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

arXiv:2002.12047 [pdf, other]

FMix: Enhancing Mixed Sample Data Augmentation

Authors: Ethan Harris, Antonia Marcu, Matthew Painter, Mahesan Niranjan, Adam Prügel-Bennett, Jonathon Hare

Abstract: Mixed Sample Data Augmentation (MSDA) has received increasing attention in recent years, with many successful variants such as MixUp and CutMix. By studying the mutual information between the function learned by a VAE on the original data and on the augmented data we show that MixUp distorts learned functions in a way that CutMix does not. We further demonstrate this by showing that MixUp acts as… ▽ More Mixed Sample Data Augmentation (MSDA) has received increasing attention in recent years, with many successful variants such as MixUp and CutMix. By studying the mutual information between the function learned by a VAE on the original data and on the augmented data we show that MixUp distorts learned functions in a way that CutMix does not. We further demonstrate this by showing that MixUp acts as a form of adversarial training, increasing robustness to attacks such as Deep Fool and Uniform Noise which produce examples similar to those generated by MixUp. We argue that this distortion prevents models from learning about sample specific features in the data, aiding generalisation performance. In contrast, we suggest that CutMix works more like a traditional augmentation, improving performance by preventing memorisation without distorting the data distribution. However, we argue that an MSDA which builds on CutMix to include masks of arbitrary shape, rather than just square, could further prevent memorisation whilst preserving the data distribution in the same way. To this end, we propose FMix, an MSDA that uses random binary masks obtained by applying a threshold to low frequency images sampled from Fourier space. These random masks can take on a wide range of shapes and can be generated for use with one, two, and three dimensional data. FMix improves performance over MixUp and CutMix, without an increase in training time, for a number of models across a range of data sets and problem settings, obtaining a new single model state-of-the-art result on CIFAR-10 without external data. Finally, we show that a consequence of the difference between interpolating MSDA such as MixUp and masking MSDA such as FMix is that the two can be combined to improve performance even further. Code for all experiments is provided at https://github.com/ecs-vlc/FMix . △ Less

Submitted 28 February, 2021; v1 submitted 27 February, 2020; originally announced February 2020.

Comments: Code available at https://github.com/ecs-vlc/FMix

arXiv:1911.05546 [pdf, other]

Avoiding hashing and encouraging visual semantics in referential emergent language games

Authors: Daniela Mihai, Jonathon Hare

Abstract: There has been an increasing interest in the area of emergent communication between agents which learn to play referential signalling games with realistic images. In this work, we consider the signalling game setting of Havrylov and Titov and investigate the effect of the feature extractor's weights and of the task being solved on the visual semantics learned or captured by the models. We impose v… ▽ More There has been an increasing interest in the area of emergent communication between agents which learn to play referential signalling games with realistic images. In this work, we consider the signalling game setting of Havrylov and Titov and investigate the effect of the feature extractor's weights and of the task being solved on the visual semantics learned or captured by the models. We impose various augmentation to the input images and additional tasks in the game with the aim to induce visual representations which capture conceptual properties of images. Through our set of experiments, we demonstrate that communication systems which capture visual semantics can be learned in a completely self-supervised manner by playing the right types of game. △ Less

Submitted 13 November, 2019; originally announced November 2019.

Comments: 4 pages, presented at Emergent Communication: Towards Natural Language workshop (NeurIPS 2019)

arXiv:1910.11251 [pdf, other]

Non-Bayesian Social Learning with Gaussian Uncertain Models

Authors: James Z. Hare, Cesar Uribe, Lance Kaplan, Ali Jadbabaie

Abstract: Non-Bayesian social learning theory provides a framework for distributed inference of a group of agents interacting over a social network by sequentially communicating and updating beliefs about the unknown state of the world through likelihood updates from their observations. Typically, likelihood models are assumed known precisely. However, in many situations the models are generated from sparse… ▽ More Non-Bayesian social learning theory provides a framework for distributed inference of a group of agents interacting over a social network by sequentially communicating and updating beliefs about the unknown state of the world through likelihood updates from their observations. Typically, likelihood models are assumed known precisely. However, in many situations the models are generated from sparse training data due to lack of data availability, high cost of collection/calibration, limits within the communications network, and/or the high dynamics of the operational environment. Recently, social learning theory was extended to handle those model uncertainties for categorical models. In this paper, we introduce the theory of Gaussian uncertain models and study the properties of the beliefs generated by the network of agents. We show that even with finite amounts of training data, non-Bayesian social learning can be achieved and all agents in the network will converge to a consensus belief that provably identifies the best estimate for the state of the world given the set of prior information. △ Less

Submitted 24 October, 2019; originally announced October 2019.

arXiv:1910.11086 [pdf, other]

Spatial and Colour Opponency in Anatomically Constrained Deep Networks

Authors: Ethan Harris, Daniela Mihai, Jonathon Hare

Abstract: Colour vision has long fascinated scientists, who have sought to understand both the physiology of the mechanics of colour vision and the psychophysics of colour perception. We consider representations of colour in anatomically constrained convolutional deep neural networks. Following ideas from neuroscience, we classify cells in early layers into groups relating to their spectral and spatial func… ▽ More Colour vision has long fascinated scientists, who have sought to understand both the physiology of the mechanics of colour vision and the psychophysics of colour perception. We consider representations of colour in anatomically constrained convolutional deep neural networks. Following ideas from neuroscience, we classify cells in early layers into groups relating to their spectral and spatial functionality. We show the emergence of single and double opponent cells in our networks and characterise how the distribution of these cells changes under the constraint of a retinal bottleneck. Our experiments not only open up a new understanding of how deep networks process spatial and colour information, but also provide new tools to help understand the black box of deep learning. The code for all experiments is avaialable at \url{https://github.com/ecs-vlc/opponency}. △ Less

Submitted 14 October, 2019; originally announced October 2019.

arXiv:1910.10795 [pdf, other]

doi 10.1145/3419755

POSE.R: Prediction-based Opportunistic Sensing for Resilient and Efficient Sensor Networks

Authors: James Z. Hare, Junnan Song, Shalabh Gupta, Thomas A. Wettergren

Abstract: The paper presents a distributed algorithm, called Prediction-based Opportunistic Sensing for Resilient and Efficient Sensor Networks (POSE.R), where the sensor nodes utilize predictions of the targets positions to probabilistically control their multi-modal operating states to track the target. There are two desired features of the algorithm: energy-efficiency and resilience. If the target is tra… ▽ More The paper presents a distributed algorithm, called Prediction-based Opportunistic Sensing for Resilient and Efficient Sensor Networks (POSE.R), where the sensor nodes utilize predictions of the targets positions to probabilistically control their multi-modal operating states to track the target. There are two desired features of the algorithm: energy-efficiency and resilience. If the target is traveling through a high node density area, then an optimal sensor selection approach is employed that maximizes a joint cost function of remaining energy and geometric diversity around the targets position. This provides energy-efficiency and increases the network lifetime while preventing redundant nodes from tracking the target. On the other hand, if the target is traveling through a low node density area or in a coverage gap (e.g., formed by node failures or non-uniform node deployment), then a potential game is played amongst the surrounding nodes to optimally expand their sensing ranges via minimizing energy consumption and maximizing target coverage. This provides resilience, that is the self-healing capability to track the target in the presence of low node densities and coverage gaps. The algorithm is comparatively evaluated against existing approaches through Monte Carlo simulations which demonstrate its superiority in terms of tracking performance, network-resilience and network-lifetime. △ Less

Submitted 2 September, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

Journal ref: ACM Transactions on Sensor Networks, Vol. 17, Issue 1, Article 5, pp. 1-41, 2020

arXiv:1910.09281 [pdf, other]

Dealing with Sparse Rewards in Reinforcement Learning

Authors: Joshua Hare

Abstract: Successfully navigating a complex environment to obtain a desired outcome is a difficult task, that up to recently was believed to be capable only by humans. This perception has been broken down over time, especially with the introduction of deep reinforcement learning, which has greatly increased the difficulty of tasks that can be automated. However, for traditional reinforcement learning agents… ▽ More Successfully navigating a complex environment to obtain a desired outcome is a difficult task, that up to recently was believed to be capable only by humans. This perception has been broken down over time, especially with the introduction of deep reinforcement learning, which has greatly increased the difficulty of tasks that can be automated. However, for traditional reinforcement learning agents this requires an environment to be able to provide frequent extrinsic rewards, which are not known or accessible for many real-world environments. This project aims to explore and contrast existing reinforcement learning solutions that circumnavigate the difficulties of an environment that provide sparse rewards. Different reinforcement solutions will be implemented over a several video game environments with varying difficulty and varying frequency of rewards, as to properly investigate the applicability of these solutions. This project introduces a novel reinforcement learning solution by combining aspects of two existing state of the art sparse reward solutions, curiosity driven exploration and unsupervised auxiliary tasks. △ Less

Submitted 11 November, 2019; v1 submitted 21 October, 2019; originally announced October 2019.

arXiv:1909.09228 [pdf, other]

doi 10.1109/TSP.2020.3006755

Non-Bayesian Social Learning with Uncertain Models

Authors: James Z. Hare, Cesar A. Uribe, Lance Kaplan, Ali Jadbabaie

Abstract: Non-Bayesian social learning theory provides a framework that models distributed inference for a group of agents interacting over a social network. In this framework, each agent iteratively forms and communicates beliefs about an unknown state of the world with their neighbors using a learning rule. Existing approaches assume agents have access to precise statistical models (in the form of likelih… ▽ More Non-Bayesian social learning theory provides a framework that models distributed inference for a group of agents interacting over a social network. In this framework, each agent iteratively forms and communicates beliefs about an unknown state of the world with their neighbors using a learning rule. Existing approaches assume agents have access to precise statistical models (in the form of likelihoods) for the state of the world. However in many situations, such models must be learned from finite data. We propose a social learning rule that takes into account uncertainty in the statistical models using second-order probabilities. Therefore, beliefs derived from uncertain models are sensitive to the amount of past evidence collected for each hypothesis. We characterize how well the hypotheses can be tested on a social network, as consistent or not with the state of the world. We explicitly show the dependency of the generated beliefs with respect to the amount of prior evidence. Moreover, as the amount of prior evidence goes to infinity, learning occurs and is consistent with traditional social learning theory. △ Less

Submitted 27 September, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

arXiv:1909.04255 [pdf, other]

Non-Bayesian Social Learning with Uncertain Models over Time-Varying Directed Graphs

Authors: César A. Uribe, James Z. Hare, Lance Kaplan, Ali Jadbabaie

Abstract: We study the problem of non-Bayesian social learning with uncertain models, in which a network of agents seek to cooperatively identify the state of the world based on a sequence of observed signals. In contrast with the existing literature, we focus our attention on the scenario where the statistical models held by the agents about possible states of the world are built from finite observations.… ▽ More We study the problem of non-Bayesian social learning with uncertain models, in which a network of agents seek to cooperatively identify the state of the world based on a sequence of observed signals. In contrast with the existing literature, we focus our attention on the scenario where the statistical models held by the agents about possible states of the world are built from finite observations. We show that existing non-Bayesian social learning approaches may select a wrong hypothesis with non-zero probability under these conditions. Therefore, we propose a new algorithm to iteratively construct a set of beliefs that indicate whether a certain hypothesis is supported by the empirical evidence. This new algorithm can be implemented over time-varying directed graphs, with non{-}doubly stochastic weights. △ Less

Submitted 9 September, 2019; originally announced September 2019.

Comments: To appear at CDC2019

arXiv:1906.06565 [pdf, other]

Deep Set Prediction Networks

Authors: Yan Zhang, Jonathon Hare, Adam Prügel-Bennett

Abstract: Current approaches for predicting sets from feature vectors ignore the unordered nature of sets and suffer from discontinuity issues as a result. We propose a general model for predicting sets that properly respects the structure of sets and avoids this problem. With a single feature vector as input, we show that our model is able to auto-encode point sets, predict the set of bounding boxes of obj… ▽ More Current approaches for predicting sets from feature vectors ignore the unordered nature of sets and suffer from discontinuity issues as a result. We propose a general model for predicting sets that properly respects the structure of sets and avoids this problem. With a single feature vector as input, we show that our model is able to auto-encode point sets, predict the set of bounding boxes of objects in an image, and predict the set of attributes of these objects. △ Less

Submitted 24 April, 2020; v1 submitted 15 June, 2019; originally announced June 2019.

Comments: Appendix C contains an erratum

Journal ref: Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

arXiv:1906.02795 [pdf, other]

FSPool: Learning Set Representations with Featurewise Sort Pooling

Authors: Yan Zhang, Jonathon Hare, Adam Prügel-Bennett

Abstract: Traditional set prediction models can struggle with simple datasets due to an issue we call the responsibility problem. We introduce a pooling method for sets of feature vectors based on sorting features across elements of the set. This can be used to construct a permutation-equivariant auto-encoder that avoids this responsibility problem. On a toy dataset of polygons and a set version of MNIST, w… ▽ More Traditional set prediction models can struggle with simple datasets due to an issue we call the responsibility problem. We introduce a pooling method for sets of feature vectors based on sorting features across elements of the set. This can be used to construct a permutation-equivariant auto-encoder that avoids this responsibility problem. On a toy dataset of polygons and a set version of MNIST, we show that such an auto-encoder produces considerably better reconstructions and representations. Replacing the pooling function in existing set encoders with FSPool improves accuracy and convergence speed on a variety of datasets. △ Less

Submitted 1 May, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

Comments: Published at International Conference on Learning Representations (ICLR) 2020

arXiv:1901.03665 [pdf, other]

A Biologically Inspired Visual Working Memory for Deep Networks

Authors: Ethan Harris, Mahesan Niranjan, Jonathon Hare

Abstract: The ability to look multiple times through a series of pose-adjusted glimpses is fundamental to human vision. This critical faculty allows us to understand highly complex visual scenes. Short term memory plays an integral role in aggregating the information obtained from these glimpses and informing our interpretation of the scene. Computational models have attempted to address glimpsing and visua… ▽ More The ability to look multiple times through a series of pose-adjusted glimpses is fundamental to human vision. This critical faculty allows us to understand highly complex visual scenes. Short term memory plays an integral role in aggregating the information obtained from these glimpses and informing our interpretation of the scene. Computational models have attempted to address glimpsing and visual attention but have failed to incorporate the notion of memory. We introduce a novel, biologically inspired visual working memory architecture that we term the Hebb-Rosenblatt memory. We subsequently introduce a fully differentiable Short Term Attentive Working Memory model (STAWM) which uses transformational attention to learn a memory over each image it sees. The state of our Hebb-Rosenblatt memory is embedded in STAWM as the weights space of a layer. By projecting different queries through this layer we can obtain goal-oriented latent representations for tasks including classification and visual reconstruction. Our model obtains highly competitive classification performance on MNIST and CIFAR-10. As demonstrated through the CelebA dataset, to perform reconstruction the model learns to make a sequence of updates to a canvas which constitute a parts-based representation. Classification with the self supervised representation obtained from MNIST is shown to be in line with the state of the art models (none of which use a visual attention mechanism). Finally, we show that STAWM can be trained under the dual constraints of classification and reconstruction to provide an interpretable visual sketchpad which helps open the 'black-box' of deep learning. △ Less

Submitted 9 January, 2019; originally announced January 2019.

arXiv:1812.03928 [pdf, other]

Learning Representations of Sets through Optimized Permutations

Authors: Yan Zhang, Jonathon Hare, Adam Prügel-Bennett

Abstract: Representations of sets are challenging to learn because operations on sets should be permutation-invariant. To this end, we propose a Permutation-Optimisation module that learns how to permute a set end-to-end. The permuted set can be further processed to learn a permutation-invariant representation of that set, avoiding a bottleneck in traditional set models. We demonstrate our model's ability t… ▽ More Representations of sets are challenging to learn because operations on sets should be permutation-invariant. To this end, we propose a Permutation-Optimisation module that learns how to permute a set end-to-end. The permuted set can be further processed to learn a permutation-invariant representation of that set, avoiding a bottleneck in traditional set models. We demonstrate our model's ability to learn permutations and set representations with either explicit or implicit supervision on four datasets, on which we achieve state-of-the-art results: number sorting, image mosaics, classification from image mosaics, and visual question answering. △ Less

Submitted 14 January, 2019; v1 submitted 10 December, 2018; originally announced December 2018.

Comments: Published in ICLR 2019

arXiv:1809.03363 [pdf, ps, other]

Torchbearer: A Model Fitting Library for PyTorch

Authors: Ethan Harris, Matthew Painter, Jonathon Hare

Abstract: We introduce torchbearer, a model fitting library for pytorch aimed at researchers working on deep learning or differentiable programming. The torchbearer library provides a high level metric and callback API that can be used for a wide range of applications. We also include a series of built in callbacks that can be used for: model persistence, learning rate decay, logging, data visualization and… ▽ More We introduce torchbearer, a model fitting library for pytorch aimed at researchers working on deep learning or differentiable programming. The torchbearer library provides a high level metric and callback API that can be used for a wide range of applications. We also include a series of built in callbacks that can be used for: model persistence, learning rate decay, logging, data visualization and more. The extensive documentation includes an example library for deep learning and dynamic programming problems and can be found at http://torchbearer.readthedocs.io. The code is licensed under the MIT License and available at https://github.com/ecs-vlc/torchbearer. △ Less

Submitted 10 September, 2018; originally announced September 2018.

Comments: 5 pages

arXiv:1803.07116 [pdf, other]

Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata

Authors: Lucie-Aimée Kaffee, Hady Elsahar, Pavlos Vougiouklis, Christophe Gravier, Frédérique Laforest, Jonathon Hare, Elena Simperl

Abstract: While Wikipedia exists in 287 languages, its content is unevenly distributed among them. In this work, we investigate the generation of open domain Wikipedia summaries in underserved languages using structured data from Wikidata. To this end, we propose a neural network architecture equipped with copy actions that learns to generate single-sentence and comprehensible textual summaries from Wikidat… ▽ More While Wikipedia exists in 287 languages, its content is unevenly distributed among them. In this work, we investigate the generation of open domain Wikipedia summaries in underserved languages using structured data from Wikidata. To this end, we propose a neural network architecture equipped with copy actions that learns to generate single-sentence and comprehensible textual summaries from Wikidata triples. We demonstrate the effectiveness of the proposed approach by evaluating it against a set of baselines on two languages of different natures: Arabic, a morphological rich language with a larger vocabulary than English, and Esperanto, a constructed language known for its easy acquisition. △ Less

Submitted 29 April, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

Comments: NAACL HTL 2018

arXiv:1802.05766 [pdf, other]

Learning to Count Objects in Natural Images for Visual Question Answering

Authors: Yan Zhang, Jonathon Hare, Adam Prügel-Bennett

Abstract: Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art… ▽ More Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%. △ Less

Submitted 15 February, 2018; originally announced February 2018.

Comments: Published in ICLR 2018

arXiv:1711.00155 [pdf, other]

doi 10.1016/j.websem.2018.07.002

Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples

Authors: Pavlos Vougiouklis, Hady Elsahar, Lucie-Aimée Kaffee, Christoph Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl

Abstract: Most people do not interact with Semantic Web data directly. Unless they have the expertise to understand the underlying technology, they need textual or visual interfaces to help them make sense of it. We explore the problem of generating natural language summaries for Semantic Web data. This is non-trivial, especially in an open-domain context. To address this problem, we explore the use of neur… ▽ More Most people do not interact with Semantic Web data directly. Unless they have the expertise to understand the underlying technology, they need textual or visual interfaces to help them make sense of it. We explore the problem of generating natural language summaries for Semantic Web data. This is non-trivial, especially in an open-domain context. To address this problem, we explore the use of neural networks. Our system encodes the information from a set of triples into a vector of fixed dimensionality and generates a textual summary by conditioning the output on the encoded vector. We train and evaluate our models on two corpora of loosely aligned Wikipedia snippets and DBpedia and Wikidata triples with promising results. △ Less

Submitted 31 October, 2017; originally announced November 2017.

Showing 1–43 of 43 results for author: Hare, J