UltraCortex: Submillimeter Ultra-High Field 9.4 T1 Brain MR Image Collection and Manual Cortical Segmentations

Authors: Lucas Mahler, Julius Steiglechner, Benjamin Bender, Tobias Lindig, Dana Ramadan, Jonas Bause, Florian Birk, Rahel Heule, Edyta Charyasz, Michael Erb, Vinod Jangir Kumar, Gisela E Hagberg, Pascal Martin, Gabriele Lohmann, Klaus Scheffler

Abstract: The UltraCortex repository (https://www.ultracortex.org) houses magnetic resonance imaging data of the human brain obtained at an ultra-high field strength of 9.4 T. It contains 86 structural MR images with spatial resolutions ranging from 0.6 to 0.8 mm. Additionally, the repository includes segmentations of 12 brains into gray and white matter compartments. These segmentations have been independently validated by two expert neuroradiologists, thus establishing them as a reliable gold standard. This resource provides researchers with access to high-quality brain imaging data and validated segmentations, facilitating neuroimaging studies and advancing our understanding of brain structure and function. Existing repositories do not accommodate field strengths beyond 7 T, nor do they offer validated segmentations, underscoring the significance of this new resource.

Submitted 5 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.20078 [pdf]

NeRF View Synthesis: Subjective Quality Assessment and Objective Metrics Evaluation

Authors: Pedro Martin, Antonio Rodrigues, Joao Ascenso, Maria Paula Queluz

Abstract: Neural radiance fields (NeRF) are a groundbreaking computer vision technology that enables the generation of high-quality, immersive visual content from multiple viewpoints. This capability holds significant advantages for applications such as virtual/augmented reality, 3D modelling and content creation for the film and entertainment industry. However, the evaluation of NeRF methods poses several challenges, including a lack of comprehensive datasets, reliable assessment methodologies, and objective quality metrics. This paper addresses the problem of NeRF quality assessment thoroughly by conducting a rigorous subjective quality assessment test that considers several scene classes and recently proposed NeRF view synthesis methods. Additionally, the performance of a wide range of state-of-the-art conventional and learning-based full-reference 2D image and video quality assessment metrics is evaluated against the scores obtained in the subjective study. The experimental results are analyzed in depth, providing a comparative evaluation of several NeRF methods and objective quality metrics across different classes of visual scenes, including real and synthetic content for front-face and 360-degree camera trajectories.

Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.
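The study above scores objective metrics by how well they agree with subjective scores. The abstract does not say which agreement measures are used; a common choice in quality assessment is Pearson (PLCC) and Spearman (SROCC) correlation. The sketch below computes both in plain Python, assuming one metric value and one mean opinion score (MOS) per test video. The data values are hypothetical, and the nonlinear (logistic) mapping often fitted before PLCC is omitted.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

mos = [1.2, 2.5, 3.1, 4.0, 4.6]        # hypothetical mean opinion scores
psnr = [20.1, 24.3, 27.0, 30.2, 33.9]  # hypothetical metric outputs (dB)
print(pearson(psnr, mos), spearman(psnr, mos))
```

SROCC only checks whether a metric orders the videos like the viewers did, while PLCC also rewards a linear relationship, which is why both are usually reported.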

arXiv:2405.15338 [pdf, other]

SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation

Authors: Xinlei Niu, Jing Zhang, Christian Walder, Charles Patrick Martin

Abstract: We present SoundLoCD, a novel text-to-sound generation framework, which incorporates a LoRA-based conditional discrete contrastive latent diffusion model. Unlike recent large-scale sound generation models, our model can be efficiently trained under limited computational resources. The integration of a contrastive learning strategy further enhances the connection between text conditions and the generated outputs, resulting in coherent and high-fidelity performance. Our experiments demonstrate that SoundLoCD outperforms the baseline with greatly reduced computational resources. A comprehensive ablation study further validates the contribution of each component within SoundLoCD. Demo page: https://XinleiNIU.github.io/demo-SoundLoCD/

Submitted 24 May, 2024; originally announced May 2024.
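SoundLoCD is described as LoRA-based. As a reminder of what a LoRA adapter does in general (this is not SoundLoCD's code), the sketch below keeps a pre-trained weight W frozen and learns only a rank-r update scaled by alpha / r, with B initialised to zero so that the adapted layer starts out identical to the base layer. The class and method names are hypothetical.

```python
import random

def matvec(M, v):
    """Plain-Python matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pre-trained weight, d_out x d_in
        # A: r x d_in, small random init; B: d_out x r, zero init
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def __call__(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))  # low-rank path: x -> A x -> B A x
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Only A and B are trained; W stays frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1, alpha=2.0)
print(layer([1.0, 1.0]), layer.trainable_params())
```

For a 1024 x 1024 layer with r = 8, such an adapter trains 2 x 8 x 1024 = 16,384 parameters instead of 1,048,576, which illustrates why LoRA reduces the training cost of fine-tuning.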

arXiv:2405.04592 [pdf]

Integrating knowledge-guided symbolic regression and model-based design of experiments to automate process flow diagram development

Authors: Alexander W. Rogers, Amanda Lane, Cesar Mendoza, Simon Watson, Adam Kowalski, Philip Martin, Dongda Zhang

Abstract: New products must be formulated rapidly to succeed in the global formulated product market; however, key product indicators (KPIs) can be complex, poorly understood functions of the chemical composition and processing history. Consequently, scale-up must currently undergo expensive trial-and-error campaigns. To accelerate process flow diagram (PFD) optimisation and knowledge discovery, this work proposed a novel digital framework to automatically quantify process mechanisms by integrating symbolic regression (SR) within model-based design of experiments (MBDoE). In each iteration, SR proposed a Pareto front of interpretable mechanistic expressions, and then MBDoE designed a new experiment to discriminate between them while balancing PFD optimisation. To investigate the framework's performance, a new process model capable of simulating general formulated product synthesis was constructed to generate in-silico data for different case studies. The framework could effectively discover ground-truth process mechanisms within a few iterations, indicating its great potential for use within the general chemical industry for digital manufacturing and product innovation.

Submitted 7 May, 2024; originally announced May 2024.
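In the framework above, SR returns a Pareto front of candidate expressions. The abstract does not specify the objectives; assuming the usual pair of expression complexity and fitting error (both minimised), the front can be extracted as below. The candidate expressions and their scores are invented for illustration.

```python
def pareto_front(candidates):
    """Keep (name, complexity, error) tuples not dominated by any other candidate.

    A candidate is dominated if another one is no worse in both objectives
    and strictly better in at least one. Both objectives are minimised.
    """
    front = []
    for name, c, e in candidates:
        dominated = any(
            (c2 <= c and e2 <= e) and (c2 < c or e2 < e)
            for _, c2, e2 in candidates
        )
        if not dominated:
            front.append((name, c, e))
    return sorted(front, key=lambda t: t[1])  # order by increasing complexity

# Hypothetical SR candidates: (expression, complexity, error)
cands = [
    ("k*x", 3, 0.40),
    ("k*x**2", 5, 0.10),
    ("exp(k*x)", 5, 0.35),
    ("k*x**2 + c", 7, 0.09),
    ("k*x**2 + c*x + d", 11, 0.09),
]
for expr, comp, err in pareto_front(cands):
    print(expr, comp, err)
```

Each surviving expression is a different accuracy/simplicity trade-off, which is what gives a subsequent experiment-design step several rival mechanisms to discriminate between.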

arXiv:2405.01886 [pdf, other]

Aloe: A Family of Fine-tuned Open Healthcare LLMs

Authors: Ashwin Kumar Gururajan, Enrique Lopez-Cuena, Jordi Bayarri-Planas, Adrian Tormos, Daniel Hinjos, Pablo Bernabeu-Perez, Anna Arias-Duart, Pablo Agustin Martin-Torres, Lucia Urcelay-Ganzabal, Marta Gonzalez-Mallo, Sergio Alvarez-Napagao, Eduard Ayguadé-Parra, Ulises Cortés, Dario Garcia-Gasulla

Abstract: As the capabilities of Large Language Models (LLMs) in healthcare and medicine continue to advance, there is a growing need for competitive open-source models that can safeguard public interest. With the increasing availability of highly competitive open base models, the impact of continued pre-training is increasingly uncertain. In this work, we explore the role of instruct tuning, model merging, alignment, red teaming and advanced inference schemes as means to improve current open models. To that end, we introduce the Aloe family, a set of open medical LLMs highly competitive within its scale range. Aloe models are trained on the current best base models (Mistral, LLaMA 3), using a new custom dataset which combines public data sources improved with synthetic Chain of Thought (CoT). Aloe models undergo an alignment phase, becoming one of the first few policy-aligned open healthcare LLMs using Direct Preference Optimization, setting a new standard for ethical performance in healthcare LLMs. Model evaluation expands to include various bias and toxicity datasets, a dedicated red teaming effort, and a much-needed risk assessment for healthcare LLMs. Finally, to explore the limits of current LLMs in inference, we study several advanced prompt engineering strategies to boost performance across benchmarks, yielding state-of-the-art results for open healthcare 7B LLMs, unprecedented at this scale.

Submitted 3 May, 2024; originally announced May 2024.

Comments: Five appendices
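Aloe's alignment phase uses Direct Preference Optimization. For reference, the standard per-pair DPO objective compares how much the policy raises the log-probability of the preferred answer relative to a frozen reference model with how much it raises that of the rejected answer. This is a generic sketch, not Aloe's training code; beta = 0.1 is an arbitrary choice, and the inputs are assumed to be summed sequence log-probabilities.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (chosen w, rejected l) pair.

    loss = -log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)])
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(m)) == softplus(-m) == log(1 + exp(-m)); for very negative m,
    # exp(-m) would overflow, and softplus(-m) is then effectively -m.
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin

# Policy identical to the reference: the loss equals log 2.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy favours the chosen answer more than the reference does: lower loss.
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

Because only log-probability differences against the reference enter the loss, no separate reward model has to be trained, which is the main practical appeal of DPO.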

arXiv:2404.15637 [pdf, other]

HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts

Authors: Xinlei Niu, Jing Zhang, Charles Patrick Martin

Abstract: We introduce HybridVC, a voice conversion (VC) framework built upon a pre-trained conditional variational autoencoder (CVAE) that combines the strengths of a latent model with contrastive learning. HybridVC supports text and audio prompts, enabling more flexible voice style conversion. HybridVC models a latent distribution conditioned on speaker embeddings acquired by a pretrained speaker encoder and optimises style text embeddings to align with the speaker style information through contrastive learning in parallel. Therefore, HybridVC can be efficiently trained under limited computational resources. Our experiments demonstrate HybridVC's superior training efficiency and its capability for advanced multi-modal voice style conversion. This underscores its potential for widespread applications such as user-defined personalised voice in various social media platforms. A comprehensive ablation study further validates the effectiveness of our method.

Submitted 24 April, 2024; originally announced April 2024.
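HybridVC aligns style text embeddings with speaker embeddings through contrastive learning. The exact loss is not given in the abstract; a common choice is an InfoNCE objective in which each text embedding should be most similar to the speaker embedding at the same batch index. The sketch below assumes cosine similarity, a temperature of 0.1, and a one-directional (text-to-speaker) loss.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(text_emb, speaker_emb, temperature=0.1):
    """Mean InfoNCE loss; text_emb[i] and speaker_emb[i] form the positive pair,
    and every other speaker embedding in the batch serves as a negative."""
    n = len(text_emb)
    total = 0.0
    for i in range(n):
        logits = [cosine(text_emb[i], speaker_emb[j]) / temperature for j in range(n)]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
        total += log_z - logits[i]  # cross-entropy with the matching index as target
    return total / n

aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(aligned, aligned), info_nce(aligned, swapped))
```

Minimising this loss pulls each text embedding towards its own speaker's embedding and away from the others in the batch, so a text prompt can later stand in for an audio reference.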

arXiv:2404.12356 [pdf, other]

Improving the interpretability of GNN predictions through conformal-based graph sparsification

Authors: Pablo Sanchez-Martin, Kinaan Aamir Khan, Isabel Valera

Abstract: Graph Neural Networks (GNNs) have achieved state-of-the-art performance in solving graph classification tasks. However, most GNN architectures aggregate information from all nodes and edges in a graph, regardless of their relevance to the task at hand, thus hindering the interpretability of their predictions. In contrast to prior work, in this paper we propose a GNN training approach that jointly i) finds the most predictive subgraph by removing edges and/or nodes (without making assumptions about the subgraph structure) while ii) optimizing the performance of the graph classification task. To that end, we rely on reinforcement learning to solve the resulting bi-level optimization with a reward function based on conformal predictions to account for the current in-training uncertainty of the classifier. Our empirical results on nine different graph classification datasets show that our method competes in performance with baselines while relying on significantly sparser subgraphs, leading to more interpretable GNN-based predictions.

Submitted 18 April, 2024; originally announced April 2024.
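The reward in the GNN paper is based on conformal predictions, but the abstract does not describe how it is built. The sketch below only shows standard split conformal prediction for a classifier, which produces prediction sets whose size is a natural uncertainty signal; using that size inside a reward would be an assumption, not the authors' stated design. The nonconformity score is assumed to be s = 1 - p(true class).

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold qhat from a held-out calibration set.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity
    when the calibration set is too small for the requested coverage.
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # every class will be included
    return scores[k - 1]

def prediction_set(probs, qhat):
    """Classes whose nonconformity score 1 - p is within the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]

# Hypothetical calibration outputs of a two-class classifier (true class 0).
cal = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4], [0.95, 0.05],
       [0.85, 0.15], [0.75, 0.25], [0.65, 0.35], [0.55, 0.45]]
qhat = conformal_threshold(cal, [0] * len(cal), alpha=0.15)
print(qhat, prediction_set([0.6, 0.4], qhat))
```

Under exchangeability, such sets contain the true class with probability at least 1 - alpha; larger sets indicate a less certain classifier.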

arXiv:2403.17776 [pdf, other]

Exploring the Boundaries of Ambient Awareness in Twitter

Authors: Pablo Sanchez-Martin, Sonja Utz, Isabel Valera

Abstract: Ambient awareness refers to the ability of social media users to obtain knowledge about who knows what (i.e., users' expertise) in their network, by simply being exposed to other users' content (e.g., tweets on Twitter). Previous work, based on user surveys, reveals that individuals self-report ambient awareness only for parts of their networks. However, it is unclear whether it is their limited cognitive capacity or the limited exposure to diagnostic tweets (i.e., online content) that prevents people from developing ambient awareness for their complete network. In this work, we focus on in-wall ambient awareness (IWAA) in Twitter and conduct a two-step data-driven analysis that allows us to explore to what extent IWAA is likely, or even possible. First, we rely on reactions (e.g., likes) as strong evidence of users being aware of experts in Twitter. Unfortunately, such strong evidence can only be measured for active users, who represent the minority in the network. Thus, to study the boundaries of IWAA to a larger extent, in the second part of our analysis, we instead focus on the passive exposure to content generated by other users, which we refer to as in-wall visibility. This analysis shows that, in line with prior work [levordashka2016ambient], IWAA is plausible only for a subset of users, while for the majority it is unlikely, if possible at all, to develop IWAA. We hope that our methodology paves the way for the emergence of data-driven approaches for the study of ambient awareness.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2309.08048 [pdf, other]

Padding Aware Neurons

Authors: Dario Garcia-Gasulla, Victor Gimenez-Abalos, Pablo Martin-Torres

Abstract: Convolutional layers are a fundamental component of most image-related models. These layers often implement a static padding policy by default (e.g., zero padding) to control the scale of the internal representations and to allow kernel activations centered on the border regions. In this work we identify Padding Aware Neurons (PANs), a type of filter that is found in most (if not all) convolutional models trained with static padding. PANs focus on the characterization and recognition of input border location, introducing a spatial inductive bias into the model (e.g., how close to the input's border a pattern typically is). We propose a method to identify PANs through their activations, and explore their presence in several popular pre-trained models, finding between dozens and hundreds of PANs in every model explored. We discuss and illustrate different types of PANs, their kernels and behaviour. To understand their relevance, we test their impact on model performance, and find padding and PANs to induce strong and characteristic biases in the data. Finally, we discuss whether or not PANs are desirable, as well as the potential side effects of their presence in the context of model performance, generalisation, efficiency and safety.

Submitted 14 September, 2023; originally announced September 2023.

Comments: In 4th Visual Inductive Priors for Data-Efficient Deep Learning Workshop, ICCV 2023
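Why can static padding give filters access to border location? With zero padding, even a plain 3 x 3 summing filter responds differently at the corners, edges and interior of a perfectly uniform image, so position information leaks into the activations. The sketch below illustrates only this effect; it is not the authors' PAN identification method.

```python
def conv2d_same_zero_pad(img, kernel):
    """'Same'-size 2D cross-correlation with zero padding (odd, square kernel)."""
    h, w, k = len(img), len(img[0]), len(kernel)
    p = k // 2  # padding width on each side
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - p, j + dj - p
                    # Positions outside the image read the padded value 0.
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * img[y][x]
            out[i][j] = acc
    return out

flat = [[1.0] * 5 for _ in range(5)]      # featureless input
box = [[1.0] * 3 for _ in range(3)]       # 3 x 3 summing filter
for row in conv2d_same_zero_pad(flat, box):
    print(row)
```

On the constant input, the response is 4 at corners, 6 along edges and 9 in the interior, so a downstream unit can tell how close it is to the border without any image content.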

arXiv:2309.03671 [pdf, other]

Dataset Generation and Bonobo Classification from Weakly Labelled Videos

Authors: Pierre-Etienne Martin

Abstract: This paper presents a bonobo detection and classification pipeline built from commonly used machine learning methods. This application is motivated by the need to test bonobos in their enclosure using touch screen devices without human assistance. This work introduces a newly acquired dataset based on bonobo recordings generated semi-automatically. The recordings are weakly labelled and fed to a macaque detector in order to spatially detect the individual present in the video. Handcrafted features coupled with different classification algorithms and deep-learning methods using a ResNet architecture are investigated for bonobo identification. Performance is compared in terms of classification accuracy on the splits of the database using different data separation methods. We demonstrate the importance of data preparation and how an incorrect data separation can lead to misleadingly good results. Finally, after a meaningful separation of the data, the best classification performance is obtained using a fine-tuned ResNet model and reaches 75% accuracy.

Submitted 7 September, 2023; originally announced September 2023.

Comments: IntelliSys 2023 paper
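The bonobo study stresses that a wrong data separation can inflate accuracy. The abstract does not state the separation rule; one common safeguard, assumed here, is to split by source recording so that near-identical frames from one video never appear in both the training and test sets. The function name and the (video_id, clip_index) sample format are hypothetical.

```python
import random

def group_split(samples, group_of, test_fraction=0.25, seed=0):
    """Assign whole groups (e.g. source videos) to either train or test, never both."""
    groups = sorted({group_of(s) for s in samples})
    random.Random(seed).shuffle(groups)  # shuffle groups, not individual clips
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [s for s in samples if group_of(s) not in test_groups]
    test = [s for s in samples if group_of(s) in test_groups]
    return train, test

# Hypothetical samples: (video_id, clip_index) pairs, three clips per video.
clips = [(f"vid{v}", c) for v in range(8) for c in range(3)]
train, test = group_split(clips, group_of=lambda s: s[0])
print(sorted({s[0] for s in test}))
```

A random split over individual clips would place sibling clips of the same video on both sides, letting a model score well by recognising the recording rather than the individual.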

arXiv:2308.02534 [pdf, other]

Exploring the Role of Explainability in AI-Assisted Embryo Selection

Authors: Lucia Urcelay, Daniel Hinjos, Pablo A. Martin-Torres, Marta Gonzalez, Marta Mendez, Salva Cívico, Sergio Álvarez-Napagao, Dario Garcia-Gasulla

Abstract: In Vitro Fertilization is among the most widespread treatments for infertility. One of its main challenges is the evaluation and selection of embryos for implantation, a process with large inter- and intra-clinician variability. Deep-learning-based methods are gaining attention, but their opaque nature compromises their acceptance in the clinical context, where transparency in decision making is key. In this paper we analyze current work on the explainability of AI-assisted embryo analysis models and identify its limitations. We also discuss how these models could be integrated into the clinical context as decision support systems, considering the needs of clinicians and patients. Finally, we propose guidelines for increasing interpretability and trustworthiness, pushing this technology forward towards established clinical practice.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2306.11840 [pdf, ps, other]

A C++20 Interface for MPI 4.0

Authors: Ali Can Demiralp, Philipp Martin, Niko Sakic, Marcel Krüger, Tim Gerrits

Abstract: We present a modern C++20 interface for MPI 4.0. The interface utilizes recent language features to ease development of MPI applications. An aggregate reflection system enables generation of MPI data types from user-defined classes automatically. Immediate and persistent operations are mapped to futures, which can be chained to describe sequential asynchronous operations and task graphs in a concise way. This work introduces the prominent features of the interface with examples. We further measure its performance overhead with respect to the raw C interface.

Submitted 20 June, 2023; originally announced June 2023.

Comments: To appear in SC '22: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

arXiv:2306.02568 [pdf, other]

Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming

Authors: Xinlei Niu, Christian Walder, Jing Zhang, Charles Patrick Martin

Abstract: We propose the stochastic optimal path, which solves the classical optimal path problem by a probability-softening solution. This unified approach transforms a wide range of dynamic programming (DP) problems into directed acyclic graphs in which all paths follow a Gibbs distribution. We show the equivalence of the Gibbs distribution to a message-passing algorithm by the properties of the Gumbel distribution and give all the ingredients required for variational Bayesian inference of a latent path, namely Bayesian dynamic programming (BDP). We demonstrate the usage of BDP in the latent space of variational autoencoders (VAEs) and propose the BDP-VAE, which captures structured sparse optimal paths as latent variables. This enables end-to-end training for generative tasks in which models rely on unobserved structural information. Finally, we validate the behavior of our approach and showcase its applicability in two real-world applications: text-to-speech and singing voice synthesis. Our implementation code is available at https://github.com/XinleiNIU/LatentOptimalPathsBayesianDP.

Submitted 25 June, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

Comments: Accepted by ICML 2024
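The stochastic optimal path replaces the max in a DP recursion with a soft maximum, so that paths through the DAG follow a Gibbs distribution, P(path) proportional to exp(score / tau). The sketch below samples from such a distribution using the plain log-sum-exp forward recursion and exact backward sampling; it is not the paper's Gumbel-based message passing. It assumes the nodes are listed in topological order with the source first and the sink last, and that every other node has at least one incoming edge; the function names are hypothetical.

```python
import math
import random

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_partition(nodes, edges, tau=1.0):
    """edges: dict (u, v) -> score. Returns log Z(v), the log of the sum over
    all source->v paths of exp(path score / tau), plus the predecessor lists."""
    preds = {v: [] for v in nodes}
    for (u, v), s in edges.items():
        preds[v].append((u, s))
    log_z = {nodes[0]: 0.0}
    for v in nodes[1:]:  # topological order: predecessors are already done
        log_z[v] = logsumexp([log_z[u] + s / tau for u, s in preds[v]])
    return log_z, preds

def sample_path(nodes, edges, tau=1.0, rng=random):
    """Draw one source->sink path from the Gibbs distribution, walking backwards."""
    log_z, preds = log_partition(nodes, edges, tau)
    path = [nodes[-1]]
    while path[-1] != nodes[0]:
        v = path[-1]
        # P(predecessor u | v) = exp(log Z(u) + s(u, v) / tau - log Z(v))
        weights = [math.exp(log_z[u] + s / tau - log_z[v]) for u, s in preds[v]]
        path.append(rng.choices([u for u, _ in preds[v]], weights=weights)[0])
    return path[::-1]

nodes = ["s", "a", "b", "t"]
edges = {("s", "a"): 1.0, ("s", "b"): 0.0, ("a", "t"): 0.0, ("b", "t"): 0.0}
print(sample_path(nodes, edges, rng=random.Random(0)))
```

As tau approaches 0, tau * log Z at the sink approaches the best path score, and sampling concentrates on the single optimal path, recovering the classical hard DP solution.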

arXiv:2305.03176 [pdf]

NeRF-QA: Neural Radiance Fields Quality Assessment Database

Authors: Pedro Martin, António Rodrigues, João Ascenso, Maria Paula Queluz

Abstract: This short paper proposes a new database, NeRF-QA, containing 48 videos synthesized with seven NeRF-based methods, along with their perceived quality scores resulting from subjective assessment tests; for the video selection, both real and synthetic 360-degree scenes were considered. This database will allow evaluating the suitability of existing objective quality metrics for NeRF-based synthesized views, as well as the development of new quality metrics specific to this case.

Submitted 4 May, 2023; originally announced May 2023.

arXiv:2303.16960 [pdf, ps, other]

Boltzmann Distribution on "Short" Integer Partitions with Power Parts: Limit Laws and Sampling

Authors: Jean C. Peyen, Leonid V. Bogachev, Paul P. Martin

Abstract: The paper is concerned with the asymptotic analysis of a family of Boltzmann (multiplicative) distributions over the set $\check{\varLambda}^{q}$ of strict integer partitions (i.e., with unequal parts) into perfect $q$-th powers. A combinatorial link is provided via a suitable conditioning by fixing the partition weight (the sum of parts) and length (the number of parts), leading to uniform distribution on the corresponding subspaces of partitions. The Boltzmann measure is calibrated through the hyper-parameters $\langle N\rangle$ and $\langle M\rangle$ controlling the expected weight and length, respectively. We study "short" partitions, where the parameter $\langle M\rangle$ is either fixed or grows slower than for typical plain (unconstrained) partitions. For this model, we obtain a variety of limit theorems including the asymptotics of the cumulative cardinality in the case of fixed $\langle M\rangle$ and a limit shape result in the case of slow growth of $\langle M\rangle$. In both cases, we also characterize the joint distribution of the weight and length, as well as the growth of the smallest and largest parts. Using these results we construct suitable sampling algorithms and analyse their performance.

Submitted 29 March, 2023; originally announced March 2023.

Comments: 62 pages, 5 figures, 4 tables

MSC Class: 05A17 (Primary); 05A16; 60C05; 68Q87; 82B10 (Secondary)
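Under a Boltzmann (multiplicative) measure on strict partitions, each admissible part is present or absent independently, which makes sampling straightforward. The sketch assumes the common parameterisation in which part k**q is present with probability x**(k**q) * z / (1 + x**(k**q) * z), so that the probability of a partition is proportional to x**weight * z**length; conditioning on weight n and length m then gives the uniform distribution mentioned in the abstract. Here x and z are set by hand rather than calibrated from $\langle N\rangle$ and $\langle M\rangle$ as in the paper, and plain rejection sampling stands in for the authors' sampling algorithms.

```python
import random

def boltzmann_strict_qpower(x, z, q, max_base, rng=random):
    """One draw from the multiplicative measure on strict partitions into q-th powers.

    Part k**q (k = 1..max_base) is included independently with probability
    t / (1 + t), where t = z * x**(k**q). Parts with k > max_base are never drawn.
    """
    parts = []
    for k in range(1, max_base + 1):
        t = z * x ** (k ** q)
        if rng.random() < t / (1.0 + t):
            parts.append(k ** q)
    return sorted(parts, reverse=True)

def sample_conditioned(n, m, x, z, q, rng=random, max_tries=100_000):
    """Rejection sampling: keep the first draw with weight n and length m.

    Every accepted partition has probability proportional to x**n * z**m, so
    accepted draws are uniform over strict q-th-power partitions of n into m
    parts. Truncating at max_base is harmless because (max_base)**q > n.
    """
    max_base = int(round(n ** (1.0 / q))) + 1
    for _ in range(max_tries):
        lam = boltzmann_strict_qpower(x, z, q, max_base, rng)
        if sum(lam) == n and len(lam) == m:
            return lam
    return None  # no acceptance within the budget

rng = random.Random(1)
print(sample_conditioned(50, 3, x=0.98, z=0.3, q=2, rng=rng))
```

Rejection becomes very wasteful when the target weight and length are far from what x and z produce on average, which is why calibrating the Boltzmann parameters to the target, as the paper does, matters for efficient sampling.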

arXiv:2302.02755 [pdf, other]

Fine-Grained Action Detection with RGB and Pose Information using Two Stream Convolutional Networks

Authors: Leonard Hacker, Finn Bartels, Pierre-Etienne Martin

Abstract: As participants of the MediaEval 2022 Sport Task, we propose a two-stream network approach for the classification and detection of table tennis strokes. Each stream is a succession of 3D Convolutional Neural Network (CNN) blocks using attention mechanisms. Each stream processes different 4D inputs. Our method utilizes raw RGB data and pose information computed with the MMPose toolbox. The pose information is treated as an image by applying the pose either on a black background or on the original RGB frame it has been computed from. Best performance is obtained by feeding raw RGB data to one stream, Pose + RGB (PRGB) information to the other stream and applying late fusion on the features. The approaches were evaluated on the provided TTStroke-21 data sets. We can report an improvement in stroke classification, reaching 87.3% accuracy, while the detection does not outperform the baseline but still reaches an IoU of 0.349 and mAP of 0.110.

Submitted 6 February, 2023; originally announced February 2023.

Comments: Working note paper of the sport task of MediaEval 2022 in Bergen, Norway, 12-13 Jan 2023
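Detection results in these MediaEval papers are reported with IoU and mAP. For temporal detection, the IoU between a predicted and a ground-truth stroke segment is the overlap length divided by the union length; a minimal helper follows. Whether boundaries are frame indices or seconds, and how detections are matched to ground truth, are left open here as assumptions.

```python
def temporal_iou(a, b):
    """IoU of two 1D segments given as (start, end) with start <= end."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))  # overlap length (0 if disjoint)
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((0, 10), (5, 15)))  # half of each segment overlaps
```

Segments that merely touch at a boundary score 0, and identical segments score 1.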

arXiv:2302.02752 [pdf, other]

Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms

Authors: Pierre-Etienne Martin

Abstract: This paper presents the baseline method proposed for the Sports Video task of the MediaEval 2022 benchmark. This task proposes two subtasks: stroke classification from trimmed videos, and stroke detection from untrimmed videos. This baseline addresses both subtasks. We propose two types of 3D-CNN architectures to solve the two subtasks. Both 3D-CNNs use spatio-temporal convolutions and attention mechanisms. The architectures and the training process are tailored to the addressed subtask. This baseline method is shared publicly online to help the participants in their investigation and possibly alleviate some aspects of the task such as video processing, training method, evaluation and submission routine. The baseline method reaches 86.4% accuracy with our v2 model for the classification subtask. For the detection subtask, the baseline reaches a mAP of 0.131 and IoU of 0.515 with our v1 model.

Submitted 6 February, 2023; originally announced February 2023.

Comments: Baseline paper for the sport Task of MediaEval 2022

arXiv:2302.00129 [pdf, other]

Universal Topological Regularities of Syntactic Structures: Decoupling Efficiency from Optimization

Authors: Fermín Moscoso del Prado Martín

Abstract: Human syntactic structures are usually represented as graphs. Much research has focused on the mapping between such graphs and linguistic sequences, but less attention has been paid to the shapes of the graphs themselves: their topologies. This study investigates how the topologies of syntactic graphs reveal traces of the processes that led to their emergence. I report a new universal regularity in syntactic structures: their topology is communicatively efficient above chance. The pattern holds, without exception, for all 124 languages studied, across linguistic families and modalities (spoken, written, and signed). This pattern can arise from a process optimizing for communicative efficiency or, alternatively, by construction, as a by-product of a sublinear preferential attachment process reflecting language production mechanisms known from psycholinguistics. This dual explanation shows how communicative efficiency, per se, does not require optimization. Of the two options, efficiency without optimization offers the better explanation for the new pattern.

Submitted 31 January, 2023; originally announced February 2023.

Comments: 30 pages, 7 figures
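The second explanation above attributes the efficient topology to sublinear preferential attachment. As a generic illustration only (the paper's exact generative process, and how attachment maps onto syntactic dependencies, are not given in the abstract), the sketch grows a random tree in which each new node attaches to an existing node with probability proportional to (degree + 1) ** alpha; alpha < 1 makes the preference sublinear. The function name and the +1 offset are assumptions.

```python
import random

def sublinear_pa_tree(n, alpha=0.5, rng=random):
    """Grow a tree on nodes 0..n-1 and return its (parent, child) edges.

    Each new node attaches to one existing node v chosen with probability
    proportional to (degree(v) + 1) ** alpha. The +1 gives the initial,
    degree-0 node a nonzero weight; alpha < 1 makes attachment sublinear.
    """
    degree = [0] * n
    edges = []
    for new in range(1, n):
        weights = [(degree[v] + 1) ** alpha for v in range(new)]
        target = rng.choices(range(new), weights=weights)[0]
        edges.append((target, new))
        degree[target] += 1
        degree[new] += 1
    return edges

edges = sublinear_pa_tree(20, alpha=0.5, rng=random.Random(0))
print(edges)
```

With alpha = 1 this reduces to ordinary (linear) preferential attachment, which yields heavy hubs; lowering alpha weakens the rich-get-richer effect and produces more evenly branching trees.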

arXiv:2301.13576 [pdf, other]

Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022

Authors: Pierre-Etienne Martin, Jordan Calandre, Boris Mansencal, Jenny Benois-Pineau, Renaud Péteri, Laurent Mascarilla, Julien Morlier

Abstract: Sports video analysis is a widespread research topic. Its applications are very diverse, such as event detection during a match, video summarization, or fine-grained movement analysis of athletes. As part of the MediaEval 2022 benchmarking initiative, this task aims at detecting and classifying subtle movements from sport videos. We focus on recordings of table tennis matches. Conducted since 2019, this task provides a classification challenge from untrimmed videos recorded under natural conditions with known temporal boundaries for each stroke. Since 2021, the task also provides a stroke detection challenge from unannotated, untrimmed videos. This year, the training, validation, and test sets are enhanced to ensure that all strokes are represented in each dataset. The dataset is now similar to the one used in [1, 2]. This research is intended to build tools for coaches and athletes who want to further evaluate their sport performances.

Submitted 31 January, 2023; originally announced January 2023.

Comments: MediaEval 2022 Workshop, Jan 2023, Bergen, Norway. arXiv admin note: substantial text overlap with arXiv:2112.11384

arXiv:2212.08484 [pdf, other]

Emergent communication enhances foraging behaviour in evolved swarms controlled by Spiking Neural Networks

Authors: Cristian Jimenez Romero, Alper Yegenoglu, Aarón Pérez Martín, Sandra Diaz-Pier, Abigail Morrison

Abstract: Social insects such as ants communicate via pheromones, which allows them to coordinate their activity and solve complex tasks as a swarm, e.g., foraging for food. This behavior was shaped through evolutionary processes. In computational models, self-coordination in swarms has been implemented using probabilistic or simple action rules to shape the decision of each agent and the collective behavior. However, manually tuned decision rules may limit the behavior of the swarm. In this work, we investigate the emergence of self-coordination and communication in evolved swarms without defining any explicit rule. We evolve a swarm of agents representing an ant colony. We use an evolutionary algorithm to optimize a spiking neural network (SNN) which serves as an artificial brain to control the behavior of each agent. The goal of the evolved colony is to find optimal ways to forage for food and return it to the nest in the shortest amount of time. In the evolutionary phase, the ants are able to learn to collaborate by depositing pheromone near food piles and near the nest to guide other ants. The pheromone usage is not manually encoded into the network; instead, this behavior is established through the optimization procedure. We observe that pheromone-based communication enables the ants to perform better in comparison to colonies where communication via pheromone did not emerge. We assess the foraging performance by comparing the SNN-based model to a rule-based system.
Our results show that the SNN-based model can efficiently complete the foraging task in a short amount of time. Our approach illustrates that self-coordination via pheromones emerges as a result of the network optimization. This work serves as a proof of concept for the possibility of creating complex applications utilizing SNNs as underlying architectures for multi-agent interactions where communication and self-coordination are desired.

Submitted 8 September, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

Comments: 27 pages, 16 figures

arXiv:2210.09291 [pdf, other]

Embodying the Glitch: Perspectives on Generative AI in Dance Practice

Authors: Benedikte Wallace, Charles P. Martin

Abstract: What role does the break from realism play in the potential for generative artificial intelligence as a creative tool? Through exploration of glitch, we examine the prospective value of these artefacts in creative practice. This paper describes findings from an exploration of AI-generated "mistakes" when using movement produced by a generative deep learning model as an inspiration source in dance composition.

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2209.14030 [pdf, other]

doi 10.4204/EPTCS.371.15

Monitoring ROS2: from Requirements to Autonomous Robots

Authors: Ivan Perez, Anastasia Mavridou, Tom Pressburger, Alexander Will, Patrick J. Martin

Abstract: Runtime verification (RV) has the potential to enable the safe operation of safety-critical systems that are too complex to formally verify, such as Robot Operating System 2 (ROS2) applications. Writing correct monitors can itself be complex, and errors in the monitoring subsystem threaten the mission as a whole. This paper provides an overview of a formal approach to generating runtime monitors for autonomous robots from requirements written in a structured natural language. Our approach integrates the Formal Requirement Elicitation Tool (FRET) with Copilot, a runtime verification framework, through the Ogma integration tool. FRET is used to specify requirements with unambiguous semantics, which are then automatically translated into temporal logic formulae. Ogma generates monitor specifications from the FRET output, which are compiled into hard real-time C99. To facilitate integration of the monitors in ROS2, we have extended Ogma to generate ROS2 packages defining monitoring nodes, which run the monitors when new data becomes available, and publish the results of any violations. The goal of our approach is to treat the generated ROS2 packages as black boxes and integrate them into larger ROS2 systems with minimal effort.

Submitted 28 September, 2022; originally announced September 2022.

Comments: In Proceedings FMAS2022 ASYDE2022, arXiv:2209.13181

ACM Class: D.2.1; D.2.4; I.2.9;

Journal ref: EPTCS 371, 2022, pp. 208-216
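The abstract describes monitors that run whenever new data becomes available and report violations. The Python sketch below only illustrates that behaviour for a single hypothetical requirement, "always (obstacle_close implies speed <= limit)"; the class name, signal names, and the requirement itself are assumptions. It is not FRET, Ogma, or Copilot output, which the abstract describes as hard real-time C99 wrapped in ROS2 packages.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SpeedMonitor:
    """Hypothetical monitor for: always (obstacle_close -> speed <= limit).

    Each call to step() consumes one sample, mimicking a monitoring node
    that runs when new data arrives."""
    limit: float
    step_count: int = 0
    violations: List[Tuple[int, float]] = field(default_factory=list)

    def step(self, obstacle_close: bool, speed: float) -> bool:
        ok = (not obstacle_close) or speed <= self.limit
        if not ok:
            # Record (sample index, offending speed); a ROS2 node would publish this instead.
            self.violations.append((self.step_count, speed))
        self.step_count += 1
        return ok
```

For example, feeding the samples `(False, 2.0), (True, 0.4), (True, 0.9), (False, 1.0)` with `limit=0.5` flags only the third sample, because high speed is allowed whenever no obstacle is close.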

arXiv:2208.02758 [pdf, other]

Learning Interaction Variables and Kernels from Observations of Agent-Based Systems

Authors: Jinchao Feng, Mauro Maggioni, Patrick Martin, Ming Zhong

Abstract: Dynamical systems across many disciplines are modeled as interacting particles or agents, with interaction rules that depend on a very small number of variables (e.g., pairwise distances, pairwise differences of phases), which are functions of the state of pairs of agents. Yet, these interaction rules can generate self-organized dynamics, with complex emergent behaviors (clustering, flocking, swarming, etc.). We propose a learning technique that, given observations of states and velocities along trajectories of the agents, yields both the variables upon which the interaction kernel depends and the interaction kernel itself, in a nonparametric fashion. This yields an effective dimension reduction which avoids the curse of dimensionality from the high-dimensional observation data (states and velocities of all the agents). We demonstrate the learning capability of our method on a variety of first-order interacting systems.

Submitted 4 August, 2022; originally announced August 2022.
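For readers unfamiliar with the setting, a common form of first-order interacting system is $\dot{x}_i = \frac{1}{N}\sum_{j} \phi(\lVert x_j - x_i \rVert)(x_j - x_i)$, where the kernel $\phi$ depends only on pairwise distance. The Python sketch below simulates such a system with forward Euler to produce the kind of state/velocity trajectories the abstract says the method learns from. The model form, the choice of pairwise distance as the only interaction variable, and all function names are assumptions; the learning method itself is not shown.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def velocities(xs: List[Point], phi: Callable[[float], float]) -> List[Point]:
    """First-order model: v_i = (1/N) * sum_j phi(|x_j - x_i|) * (x_j - x_i)."""
    n = len(xs)
    vs = []
    for i, (xi, yi) in enumerate(xs):
        vx = vy = 0.0
        for j, (xj, yj) in enumerate(xs):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            w = phi(math.hypot(dx, dy))  # kernel depends only on pairwise distance
            vx += w * dx
            vy += w * dy
        vs.append((vx / n, vy / n))
    return vs


def simulate(xs: List[Point], phi: Callable[[float], float],
             dt: float = 0.01, steps: int = 100) -> List[Tuple[List[Point], List[Point]]]:
    """Forward-Euler integration; returns one (states, velocities) pair per step."""
    traj = []
    for _ in range(steps):
        vs = velocities(xs, phi)
        traj.append((list(xs), vs))
        xs = [(x + dt * vx, y + dt * vy) for (x, y), (vx, vy) in zip(xs, vs)]
    return traj
```

With a constant kernel `phi(r) = 1`, each velocity reduces to `centroid - x_i`, so the agents contract toward a centroid that stays fixed, which makes a convenient sanity check.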

arXiv:2204.08460 [pdf, other]

3D Convolutional Networks for Action Recognition: Application to Sport Gesture Recognition

Authors: Pierre-Etienne Martin, J Benois-Pineau, R Péteri, A Zemmari, J Morlier

Abstract: 3D convolutional networks are a good means to perform tasks such as video segmentation into coherent spatio-temporal chunks and classification of them with regard to a target taxonomy. In this chapter, we are interested in the classification of continuous video takes with repeatable actions, such as strokes of table tennis. Filmed in a free, markerless, ecological environment, these videos represent a challenge from both the segmentation and the classification points of view. The 3D convnets are an efficient tool for solving these problems with window-based approaches.

Submitted 13 April, 2022; originally announced April 2022.

Comments: Multi-faceted Deep Learning, 2021

arXiv:2202.13822 [pdf, other]

doi 10.3389/fncom.2022.885207

Exploring hyper-parameter spaces of neuroscience models on high performance computers with Learning to Learn

Authors: Alper Yegenoglu, Anand Subramoney, Thorsten Hater, Cristian Jimenez-Romero, Wouter Klijn, Aaron Perez Martin, Michiel van der Vlag, Michael Herty, Abigail Morrison, Sandra Diaz-Pier

Abstract: Neuroscience models commonly have a high number of degrees of freedom, and only specific regions within the parameter space are able to produce dynamics of interest. This makes the development of tools and strategies to efficiently find these regions highly important for advancing brain research. Exploring the high-dimensional parameter space using numerical simulations has been a frequently used technique in recent years in many areas of computational neuroscience. High performance computing (HPC) can today provide a powerful infrastructure to speed up explorations and increase our general understanding of the model's behavior in reasonable times.

Submitted 28 February, 2022; originally announced February 2022.

arXiv:2202.09977 [pdf, other]

RTGNN: A Novel Approach to Model Stochastic Traffic Dynamics

Authors: Ke Sun, Stephen Chaves, Paul Martin, Vijay Kumar

Abstract: Modeling stochastic traffic dynamics is critical to developing self-driving cars. Because it is difficult to develop first-principles models of cars driven by humans, there is great potential for using data-driven approaches in developing traffic dynamical models. While there is extensive literature on this subject, previous works mainly address the prediction accuracy of data-driven models. Moreover, it is often difficult to apply these models to common planning frameworks since they fail to meet the assumptions therein. In this work, we propose a new stochastic traffic model, Recurrent Traffic Graph Neural Network (RTGNN), by enforcing additional structures on the model so that the proposed model can be seamlessly integrated with existing motion planning algorithms. RTGNN is a Markovian model and is able to infer future traffic states conditioned on the motion of the ego vehicle. Specifically, RTGNN uses a definition of the traffic state that includes the state of all players in a local region and is therefore able to make joint predictions for all agents of interest. Meanwhile, we explicitly model the hidden states of agents, "intentions," as part of the traffic state to reflect the inherent partial observability of traffic dynamics. The above-mentioned properties are critical for integrating RTGNN with motion planning algorithms coupling prediction and decision making. Despite the additional structures, we show that RTGNN is able to achieve state-of-the-art accuracy through comparisons with other similar works.

Submitted 20 February, 2022; originally announced February 2022.

Comments: Accepted by ICRA 2022

arXiv:2112.12074 [pdf, other]

Spatio-Temporal CNN baseline method for the Sports Video Task of MediaEval 2021 benchmark

Authors: Pierre-Etienne Martin

Abstract: This paper presents the baseline method proposed for the Sports Video task of the MediaEval 2021 benchmark. The task comprises stroke detection and stroke classification subtasks, and this baseline addresses both. The spatio-temporal CNN architecture and the training process of the model are tailored to the addressed subtask. The method is intended to help participants solve the task and is not meant to reach state-of-the-art performance. Still, for the detection subtask, the baseline performs better than the other participants' submissions, which stresses the difficulty of such a task.

Submitted 16 December, 2021; originally announced December 2021.

Journal ref: MediaEval 2021, Dec 2021, Online, Germany

arXiv:2112.12073 [pdf, other]

Two Stream Network for Stroke Detection in Table Tennis

Authors: Anam Zahra, Pierre-Etienne Martin

Abstract: This paper presents a method for detecting table tennis strokes in videos. The method relies on a two-stream Convolutional Neural Network that processes, in parallel, the RGB stream and its computed optical flow. The method has been developed as part of the MediaEval 2021 benchmark for the Sport task. Our contribution did not outperform the provided baseline on the test set but performed the best among the other participants with regard to the mAP metric.

Submitted 16 December, 2021; originally announced December 2021.

Comments: MediaEval 2021, Dec 2021, Online, Germany

arXiv:2112.11384 [pdf, other]

Sports Video: Fine-Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2021

Authors: Pierre-Etienne Martin, Jordan Calandre, Boris Mansencal, Jenny Benois-Pineau, Renaud Péteri, Laurent Mascarilla, Julien Morlier

Abstract: Sports video analysis is a prevalent research topic due to the variety of application areas, ranging from multimedia intelligent devices with user-tailored digests up to analysis of athletes' performance. The Sports Video task is part of the MediaEval 2021 benchmark. This task tackles fine-grained action detection and classification from videos. The focus is on recordings of table tennis games. Running since 2019, the task has offered a classification challenge from untrimmed video recorded in natural conditions with known temporal boundaries for each stroke. This year, the dataset is extended and offers, in addition, a detection challenge from untrimmed videos without annotations. This work aims at creating tools for sports coaches and players in order to analyze sports performance. Movement analysis and player profiling may be built upon such technology to enrich the training experience of athletes and improve their performance.

Submitted 16 December, 2021; originally announced December 2021.

Comments: MediaEval 2021, Dec 2021, Online, Germany

arXiv:2110.14690 [pdf, other]

VACA: Design of Variational Graph Autoencoders for Interventional and Counterfactual Queries

Authors: Pablo Sanchez-Martin, Miriam Rateike, Isabel Valera

Abstract: In this paper, we introduce VACA, a novel class of variational graph autoencoders for causal inference in the absence of hidden confounders, when only observational data and the causal graph are available. Without making any parametric assumptions, VACA mimics the necessary properties of a Structural Causal Model (SCM) to provide a flexible and practical framework for approximating interventions (do-operator) and abduction-action-prediction steps. As a result, and as shown by our empirical results, VACA accurately approximates the interventional and counterfactual distributions on diverse SCMs. Finally, we apply VACA to evaluate counterfactual fairness in fair classification problems, as well as to learn fair classifiers without compromising performance.

Submitted 27 October, 2021; originally announced October 2021.
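VACA approximates interventions (the do-operator) and abduction-action-prediction steps. To make those two operations concrete, the Python sketch below applies them to a hand-written two-variable linear SCM, $X := U_X$, $Y := 2X + U_Y$, with standard normal noise. The SCM, its coefficients, and the function names are assumptions chosen for illustration; the sketch uses the known structural equations directly and does not involve a graph autoencoder.

```python
import random
from typing import List, Optional, Tuple


def sample(n: int, do_x: Optional[float] = None, seed: int = 0) -> List[Tuple[float, float]]:
    """Draw n samples from the toy SCM  X := U_X,  Y := 2*X + U_Y  (U_X, U_Y ~ N(0, 1)).

    If do_x is given, X's structural equation is replaced by the constant do_x,
    which is what the intervention do(X = do_x) means."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u_x, u_y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = u_x if do_x is None else do_x
        y = 2.0 * x + u_y
        samples.append((x, y))
    return samples


def counterfactual_y(x_obs: float, y_obs: float, x_cf: float) -> float:
    """Abduction-action-prediction for one observed unit of the toy SCM."""
    u_y = y_obs - 2.0 * x_obs  # abduction: infer this unit's noise term
    # action: set X := x_cf; prediction: push the inferred noise through Y's equation
    return 2.0 * x_cf + u_y
```

For a unit observed at `(x, y) = (1.0, 3.5)`, the inferred noise is 1.5, so `counterfactual_y(1.0, 3.5, 0.0)` answers "what would Y have been had X been 0" with 1.5, whereas the interventional mean of Y under do(X = 0) is 0.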

arXiv:2109.14306 [pdf, other]

doi 10.1145/3475722.3482793

Three-Stream 3D/1D CNN for Fine-Grained Action Classification and Segmentation in Table Tennis

Authors: Pierre-Etienne Martin, Jenny Benois-Pineau, Renaud Péteri, Julien Morlier

Abstract: This paper proposes a fusion method of modalities extracted from video through a three-stream network with spatio-temporal and temporal convolutions for fine-grained action classification in sport. It is applied to the TTStroke-21 dataset, which consists of untrimmed videos of table tennis games. The goal is to detect and classify table tennis strokes in the videos, the first step of a bigger scheme aiming at giving feedback to the players for improving their performance. The three modalities are raw RGB data, the computed optical flow and the estimated pose of the player. The network consists of three branches with attention blocks. Features are fused at the latest stage of the network using bilinear layers. Compared to previous approaches, the use of three modalities allows faster convergence and better performance on both tasks: classification of strokes with known temporal boundaries and joint segmentation and classification. The pose is also further investigated in order to offer richer feedback to the athletes.

Submitted 29 September, 2021; originally announced September 2021.

Comments: MMSports '21, October 20, 2021, Virtual Event, Oct 2021, Chengdu, China

arXiv:2107.13386 [pdf, other]

SPOTS: An Accelerator for Sparse Convolutional Networks Leveraging Systolic General Matrix-Matrix Multiplication

Authors: Mohammadreza Soltaniyeh, Richard P. Martin, Santosh Nagarakatte

Abstract: This paper proposes a new hardware accelerator for sparse convolutional neural networks (CNNs) by building a hardware unit to perform the Image to Column (IM2COL) transformation of the input feature map coupled with a systolic array-based general matrix-matrix multiplication (GEMM) unit. Our design carefully overlaps the IM2COL transformation with the GEMM computation to maximize parallelism. We propose a novel design for the IM2COL unit that uses a set of distributed local memories connected by a ring network, which improves energy efficiency and latency by streaming the input feature map only once. We propose a tall systolic array for the GEMM unit while also providing the ability to organize it as multiple small GEMM units, which enables our design to handle a wide range of CNNs and their parameters. Further, our design improves performance by effectively mapping the sparse data to the hardware units by utilizing sparsity in both input feature maps and weights. Our prototype, SPOTS, is on average 1.74X faster than Eyeriss. It is also 78X and 12X more energy-efficient when compared to CPU and GPU implementations, respectively.

Submitted 24 November, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

Comments: 24 pages

Report number: Rutgers Department of Computer Science Technical Report DCS-TR-756
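SPOTS performs IM2COL in hardware and feeds the result to a systolic GEMM unit. The software-only Python sketch below shows the underlying transformation for a dense, stride-1, unpadded convolution: each column of the IM2COL matrix holds one receptive field, so the convolution becomes a single matrix product with the flattened filters. Stride 1, no padding, nested-list tensors, and the function names are assumptions; the sketch does not model sparsity, the ring-connected local memories, or the systolic array.

```python
from typing import List

Matrix = List[List[float]]


def im2col(x: List[Matrix], r: int, s: int) -> Matrix:
    """x: input feature map [C][H][W]. Returns a (C*r*s) x (OH*OW) matrix whose
    column p holds the r x s receptive field (all channels) of output position p.
    Assumes stride 1 and no padding, so OH = H - r + 1 and OW = W - s + 1."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    oh, ow = h - r + 1, w - s + 1
    rows = []
    for ch in range(c):
        for i in range(r):
            for j in range(s):
                # One row per (channel, kernel offset); one entry per output position.
                rows.append([x[ch][oy + i][ox + j] for oy in range(oh) for ox in range(ow)])
    return rows


def gemm(a: Matrix, b: Matrix) -> Matrix:
    """Plain (M x K) @ (K x N) matrix multiplication."""
    return [[sum(a[m][k] * b[k][n] for k in range(len(b))) for n in range(len(b[0]))]
            for m in range(len(a))]


def conv2d_im2col(x: List[Matrix], filters: List[List[Matrix]]) -> Matrix:
    """filters: [K][C][r][s]. Returns K rows of OH*OW outputs (row-major over
    output positions), computed as one GEMM."""
    r, s = len(filters[0][0]), len(filters[0][0][0])
    # Flatten each filter in the same (channel, i, j) order used by im2col.
    w_mat = [[f[ch][i][j] for ch in range(len(f)) for i in range(r) for j in range(s)]
             for f in filters]
    return gemm(w_mat, im2col(x, r, s))
```

The duplication that IM2COL introduces is visible here: each input element can appear in up to r*s columns of the matrix.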

arXiv:2105.06166 [pdf, ps, other]

The Dynamic k-Mismatch Problem

Authors: Raphaël Clifford, Paweł Gawrychowski, Tomasz Kociumaka, Daniel P. Martin, Przemysław Uznański

Abstract: The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length $m$ and all length-$m$ substrings of a given text of length $n\ge m$. We focus on the $k$-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold $k$. We assume $n\le 2m$ (in general, one can partition the text into overlapping blocks). In this work, we show data structures for the dynamic version of this problem supporting two operations: An update performs a single-letter substitution in the pattern or the text, and a query, given an index $i$, returns the Hamming distance between the pattern and the text substring starting at position $i$, or reports that it exceeds $k$. First, we show a data structure with $\tilde{O}(1)$ update and $\tilde{O}(k)$ query time. Then we show that $\tilde{O}(k)$ update and $\tilde{O}(1)$ query time is also possible. These two provide an optimal trade-off for the dynamic $k$-mismatch problem with $k \le \sqrt{n}$: we prove that, conditioned on the strong 3SUM conjecture, one cannot simultaneously achieve $k^{1-\Omega(1)}$ time for all operations. For $k\ge \sqrt{n}$, we give another lower bound, conditioned on the Online Matrix-Vector conjecture, that excludes algorithms taking $n^{1/2-\Omega(1)}$ time per operation. This is tight for constant-sized alphabets: Clifford et al. (STACS 2018) achieved $\tilde{O}(\sqrt{n})$ time per operation in that case, but with $\tilde{O}(n^{3/4})$ time per operation for large alphabets.
We improve and extend this result with an algorithm that, given $1\le x\le k$, achieves update time $\tilde{O}(\frac{n}{k} +\sqrt{\frac{nk}{x}})$ and query time $\tilde{O}(x)$. In particular, for $k\ge \sqrt{n}$, an appropriate choice of $x$ yields $\tilde{O}(\sqrt[3]{nk})$ time per operation, which is $\tilde{O}(n^{2/3})$ when no threshold $k$ is provided.

Submitted 28 March, 2022; v1 submitted 13 May, 2021; originally announced May 2021.
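To pin down the interface of the dynamic problem (single-letter substitution updates and threshold queries), the Python sketch below gives a deliberately naive baseline with $O(1)$ updates and $O(m)$ queries. It is not one of the paper's data structures, which achieve the $\tilde{O}$ bounds stated in the abstract; the class name and the `which` argument are hypothetical.

```python
from typing import List, Optional


class NaiveDynamicKMismatch:
    """Naive baseline for the dynamic k-mismatch problem (not the paper's structure).

    update: O(1) single-letter substitution in the pattern or the text.
    query(i): O(m) scan with early exit; returns the Hamming distance between
    the pattern and text[i:i+m], or None if it exceeds k."""

    def __init__(self, pattern: str, text: str, k: int):
        if len(text) < len(pattern):
            raise ValueError("requires n >= m")
        self.p, self.t, self.k = list(pattern), list(text), k

    def update(self, which: str, pos: int, letter: str) -> None:
        # 'which' selects the string to edit: "pattern" or anything else for the text.
        target = self.p if which == "pattern" else self.t
        target[pos] = letter

    def query(self, i: int) -> Optional[int]:
        m = len(self.p)
        if not 0 <= i <= len(self.t) - m:
            raise IndexError("alignment out of range")
        dist = 0
        for j in range(m):
            if self.p[j] != self.t[i + j]:
                dist += 1
                if dist > self.k:
                    return None  # distance exceeds the threshold k
        return dist

    def all_distances(self) -> List[Optional[int]]:
        """Static text-to-pattern k-mismatch answer for every alignment."""
        return [self.query(i) for i in range(len(self.t) - len(self.p) + 1)]
```

For pattern `"abc"`, text `"abcabd"` and `k=1`, the four alignments give `[0, None, None, 1]`; the early exit is what keeps a query proportional to $\min(m, \text{position of the } (k+1)\text{-th mismatch})$ rather than always $m$.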

arXiv:2012.05342 [pdf, other]

3D attention mechanism for fine-grained classification of table tennis strokes using a Twin Spatio-Temporal Convolutional Neural Networks

Authors: Pierre-Etienne Martin, Jenny Benois-Pineau, Renaud Péteri, Julien Morlier

Abstract: The paper addresses the problem of recognizing actions in video with low inter-class variability, such as table tennis strokes. Two-stream, "twin" convolutional neural networks are used with 3D convolutions on both RGB data and optical flow. Actions are recognized by classification of temporal windows. We introduce 3D attention modules and examine their impact on classification efficiency. In the context of the study of athletes' performance, a corpus of the particular actions of table tennis strokes is considered. The use of attention blocks in the network speeds up the training step and improves the classification scores by up to 5% with our twin model. We visualize the impact on the obtained features and notice a correlation between attention and player movements and position. The scores of a state-of-the-art action classification method and of the proposed approach with attention blocks are compared on the corpus. The proposed model with attention blocks outperforms the previous model without them and our baseline.

Submitted 20 November, 2020; originally announced December 2020.

Journal ref: 25th International Conference on Pattern Recognition (ICPR2020), Jan 2021, Milano, Italy
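The abstract states that actions are recognized by classifying temporal windows. The Python sketch below shows only that windowing step plus one simple way to aggregate window decisions; the window length, stride, majority voting, and the `classify_window` callable (standing in for the twin 3D CNN) are all assumptions, not details taken from the paper.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def temporal_windows(n_frames: int, length: int, stride: int) -> List[Tuple[int, int]]:
    """Fixed-length, half-open [start, end) windows over a clip of n_frames frames."""
    return [(s, s + length) for s in range(0, n_frames - length + 1, stride)]


def classify_clip(frames: Sequence, length: int, stride: int,
                  classify_window: Callable[[Sequence], str]) -> str:
    """Label each window with a (hypothetical) window classifier and
    return the most frequent label across windows."""
    windows = temporal_windows(len(frames), length, stride)
    if not windows:
        raise ValueError("clip is shorter than one window")
    votes = Counter(classify_window(frames[s:e]) for s, e in windows)
    return votes.most_common(1)[0][0]
```

Overlapping windows (stride smaller than length) let each frame contribute to several decisions, at the cost of running the classifier more often.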

arXiv:2012.02404 [pdf, other]

doi 10.5281/zenodo.1302543

Composing an Ensemble Standstill Work for Myo and Bela

Authors: Charles Patrick Martin, Alexander Refsum Jensenius, Jim Torresen

Abstract: This paper describes the process of developing a standstill performance work using the Myo gesture control armband and the Bela embedded computing platform. The combination of Myo and Bela allows a portable and extensible version of the standstill performance concept while introducing muscle tension as an additional control parameter. We describe the technical details of our setup and introduce Myo-to-Bela and Myo-to-OSC software bridges that assist with prototyping compositions using the Myo controller.

Submitted 4 December, 2020; originally announced December 2020.

ACM Class: H.5.5

Journal ref: Proceedings of the International Conference on New Interfaces for Musical Expression, 2018, pp. 196-197

arXiv:2012.02322 [pdf, other]

A Laptop Ensemble Performance System using Recurrent Neural Networks

Authors: Rohan Proctor, Charles Patrick Martin

Abstract: The popularity of applying machine learning techniques in musical domains has resulted in a wide availability of freely accessible pre-trained neural network (NN) models ready for use in creative applications. This work outlines the implementation of one such application in the form of an assistance tool designed for live improvisational performances by laptop ensembles. The primary intention was to leverage off-the-shelf pre-trained NN models as a basis for assisting individual performers, either as musical novices looking to engage with more experienced performers or as a tool to expand musical possibilities through new forms of creative expression. The system expands upon a variety of ideas found in different research areas, including new interfaces for musical expression, generative music, and group performance, to produce a networked performance solution served via a web-browser interface. The final implementation of the system offers performers a mixture of high- and low-level controls to influence the shape of sequences of notes output by locally run NN models in real time, also allowing performers to define their level of engagement with the assisting generative models. Two test performances were played, with the system shown to feasibly support four performers over a four-minute piece while producing musically cohesive and engaging music. Iterations on the design of the system exposed technical constraints on the use of a JavaScript environment for generative models in a live music context, largely derived from inescapable processing overheads.

Submitted 3 December, 2020; originally announced December 2020.

ACM Class: H.5.5; H.5.3

Journal ref: Proceedings of the International Conference on New Interfaces for Musical Expression, 2020, pp. 43-48

arXiv:2012.02311 [pdf, other]

Sonic Sculpture: Activating Engagement with Head-Mounted Augmented Reality

Authors: Charles Patrick Martin, Zeruo Liu, Yichen Wang, Wennan He, Henry Gardner

Abstract: This work examines how head-mounted AR can be used to build an interactive sonic landscape to engage with a public sculpture. We describe a sonic artwork, "Listening To Listening", that has been designed to accompany a real-world sculpture with two prototype interaction schemes. Our artwork is created for the HoloLens platform so that users can have an individual experience in a mixed reality context. Personal head-mounted AR systems have recently become available and practical for integration into public art projects; however, research into sonic sculpture works has yet to account for the affordances of current portable and mainstream AR systems. In this work, we take advantage of the HoloLens' spatial awareness to build sonic spaces that have a precise spatial relationship to a given sculpture and where the sculpture itself is modelled in the augmented scene as an "invisible hologram". We describe the artistic rationale for our artwork, the design of the two interaction schemes, and the technical and usability feedback that we have obtained from demonstrations during iterative development.

Submitted 3 December, 2020; originally announced December 2020.

ACM Class: H.5.5; H.5.1

Journal ref: Proceedings of the International Conference on New Interfaces for Musical Expression, 2020, pp. 48-52

arXiv:2011.13453 [pdf, other]

Towards Movement Generation with Audio Features

Authors: Benedikte Wallace, Charles P. Martin, Jim Torresen, Kristian Nymoen

Abstract: Sound and movement are closely coupled, particularly in dance. Certain audio features have been found to affect the way we move to music. Is this relationship between sound and movement something which can be modelled using machine learning? This work presents initial experiments wherein high-level audio features calculated from a set of music pieces are included in a movement generation model trained on motion capture recordings of improvised dance. Our results indicate that the model learns to generate realistic dance movements which vary depending on the audio features.

Submitted 26 November, 2020; originally announced November 2020.

arXiv:2010.02663 [pdf, other]

Heterogeneous Multi-Agent Reinforcement Learning for Unknown Environment Mapping

Authors: Ceyer Wakilpoor, Patrick J. Martin, Carrie Rebhuhn, Amanda Vu

Abstract: Reinforcement learning in heterogeneous multi-agent scenarios is important for real-world applications but presents challenges beyond those seen in homogeneous settings and simple benchmarks. In this work, we present an actor-critic algorithm that allows a team of heterogeneous agents to learn decentralized control policies for covering an unknown environment. This task is of interest to national security and emergency response organizations that would like to enhance situational awareness in hazardous areas by deploying teams of unmanned aerial vehicles. To solve this multi-agent coverage path planning problem in unknown environments, we augment a multi-agent actor-critic architecture with a new state encoding structure and triplet learning loss to support heterogeneous agent learning. We developed a simulation environment that includes real-world environmental factors such as turbulence, delayed communication, and agent loss, to train teams of agents as well as probe their robustness and flexibility to such disturbances.

Submitted 6 October, 2020; originally announced October 2020.

Comments: Presented at AAAI FSS-20: Artificial Intelligence in Government and Public Sector, Washington, DC, USA. 8 pages, 6 figures
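The abstract mentions a "triplet learning loss" but does not give its form. As a point of reference only, the sketch below is the generic triplet margin loss (an assumption, not necessarily the authors' formulation): an anchor embedding is pulled toward a positive example and pushed at least `margin` farther from a negative one. The function names and the default margin are hypothetical.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 1.0) -> float:
    """Generic triplet margin loss (assumed form):
    max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# The negative is already far enough away, so the loss is zero.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
# The negative is too close, so the loss is positive (about 0.5).
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5]))
```

How such a loss would be attached to the heterogeneous state encoding is not described in the abstract and is left open here.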

arXiv:2007.05794 [pdf, other]

Feedback Enhanced Motion Planning for Autonomous Vehicles

Authors: Ke Sun, Brent Schlotfeldt, Stephen Chaves, Paul Martin, Gulshan Mandhyan, Vijay Kumar

Abstract: In this work, we address the motion planning problem for autonomous vehicles through a new lattice planning approach, called Feedback Enhanced Lattice Planner (FELP). Existing lattice planners have two major limitations, namely the high dimensionality of the lattice and the lack of modeling of agent vehicle behaviors. We propose to apply the Intelligent Driver Model (IDM) as a speed feedback policy to address both of these limitations. IDM both enables the responsive behavior of the agents and uniquely determines the acceleration and speed profile of the ego vehicle on a given path. Therefore, only a spatial lattice is needed, and discretization of higher-order dimensions is no longer required. Additionally, we propose a directed-graph map representation to support the implementation and execution of lattice planners. The map can reflect local geometric structure, embed the traffic rules adhering to the road, and is efficient to construct and update. We show through runtime complexity analysis that FELP is more efficient than other existing lattice planners, and we propose two variants of FELP to further reduce the complexity to polynomial time. We demonstrate the improvement by comparing FELP with an existing spatiotemporal lattice planner using simulations of a merging scenario and continuous highway traffic. We also study the performance of FELP under different traffic densities.

Submitted 11 July, 2020; originally announced July 2020.

Comments: To appear in IROS 2020
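The claim that IDM "uniquely determines the acceleration and speed profile of the ego vehicle on a given path" can be illustrated with the standard IDM equations: acceleration depends only on the current speed, the gap to the leader, and the closing speed, so rolling it forward yields a single speed profile. The minimal sketch below assumes textbook IDM parameter values (not taken from the FELP paper) and a simple forward-Euler rollout; `IDMParams` and `speed_profile` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class IDMParams:
    # Illustrative textbook-style values (assumed), not from the FELP paper.
    v0: float = 30.0     # desired speed [m/s]
    T: float = 1.5       # desired time headway [s]
    s0: float = 2.0      # minimum standstill gap [m]
    a_max: float = 1.0   # maximum acceleration [m/s^2]
    b: float = 1.5       # comfortable deceleration [m/s^2]
    delta: float = 4.0   # acceleration exponent


def idm_acceleration(v: float, gap: float, v_lead: float, p: IDMParams) -> float:
    """Standard IDM acceleration for a follower at speed v, with bumper-to-bumper
    gap `gap` behind a leader moving at v_lead (use math.inf for a free road)."""
    dv = v - v_lead  # approach rate, positive when closing in on the leader
    s_star = p.s0 + max(0.0, v * p.T + v * dv / (2.0 * math.sqrt(p.a_max * p.b)))
    return p.a_max * (1.0 - (v / p.v0) ** p.delta - (s_star / gap) ** 2)


def speed_profile(v_init: float, gap_init: float, v_lead: float, p: IDMParams,
                  dt: float = 0.1, steps: int = 50) -> List[float]:
    """Forward-Euler rollout of IDM behind a leader at constant speed; the
    returned list holds the ego speed at each time step (clamped at zero)."""
    v, gap = v_init, gap_init
    profile = [v]
    for _ in range(steps):
        a = idm_acceleration(v, gap, v_lead, p)
        v = max(0.0, v + a * dt)
        gap += (v_lead - v) * dt  # gap shrinks when the ego is faster
        profile.append(v)
    return profile


p = IDMParams()
print(idm_acceleration(0.0, math.inf, 0.0, p))   # free road from standstill: a_max
print(idm_acceleration(20.0, 5.0, 20.0, p) < 0)  # gap far too small: braking
```

Because the feedback law fixes speed as a function of state, a planner only has to enumerate paths, which is the intuition behind using a purely spatial lattice.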

arXiv:2007.05149 [pdf, other]

Localized Motion Artifact Reduction on Brain MRI Using Deep Learning with Effective Data Augmentation Techniques

Authors: Yijun Zhao, Jacek Ossowski, Xuming Wang, Shangjin Li, Orrin Devinsky, Samantha P. Martin, Heath R. Pardoe

Abstract: In-scanner motion degrades the quality of magnetic resonance imaging (MRI), thereby reducing its utility in the detection of clinically relevant abnormalities. We introduce a deep learning-based MRI artifact reduction model (DMAR) to localize and correct head motion artifacts in brain MRI scans. Our approach integrates the latest advances in object detection and noise reduction in computer vision. Specifically, DMAR employs a two-stage approach: in the first, degraded regions are detected using the Single Shot Multibox Detector (SSD), and in the second, the artifacts within the found regions are reduced using a convolutional autoencoder (CAE). We further introduce a set of novel data augmentation techniques to address the high dimensionality of MRI images and the scarcity of available data. As a result, our model was trained on a large synthetic dataset of 225,000 images generated from 375 whole-brain T1-weighted MRI scans. DMAR visibly reduces image artifacts when applied to both synthetic test images and 55 real-world motion-affected slices from 18 subjects from the multi-center Autism Brain Imaging Data Exchange (ABIDE) study. Quantitatively, depending on the level of degradation, our model achieves a 27.8–48.1% reduction in RMSE and a 2.88–5.79 dB gain in PSNR on a 5000-sample set of synthetic images. For real-world artifact-affected scans from ABIDE, our model reduced the variance of image voxel intensity within artifact-affected brain regions (p = 0.014).

Submitted 30 October, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

Comments: 11 pages, 8 figures
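The reported RMSE reduction and PSNR gain follow from the standard definitions, PSNR = 20·log10(MAX / RMSE). The minimal sketch below assumes intensities normalized to a peak value of 1.0 (the abstract does not state the normalization) and flattened voxel lists; all function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Root-mean-square error between two equally sized, flattened images."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return math.sqrt(mse)


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming peak intensity max_value."""
    err = rmse(reference, estimate)
    return math.inf if err == 0.0 else 20.0 * math.log10(max_value / err)


def rmse_reduction_percent(clean, corrupted, corrected) -> float:
    """Relative RMSE reduction achieved by artifact correction, in percent."""
    before, after = rmse(clean, corrupted), rmse(clean, corrected)
    return 100.0 * (before - after) / before


def psnr_gain_db(clean, corrupted, corrected, max_value: float = 1.0) -> float:
    """PSNR improvement (dB) of the corrected image over the corrupted one."""
    return psnr(clean, corrected, max_value) - psnr(clean, corrupted, max_value)


clean = [0.0, 0.0, 0.0, 0.0]
corrupted = [0.1, 0.1, 0.1, 0.1]
corrected = [0.05, 0.05, 0.05, 0.05]
print(rmse_reduction_percent(clean, corrupted, corrected))  # about 50 %
print(psnr_gain_db(clean, corrupted, corrected))            # about 6.02 dB
```

Note that the PSNR gain depends only on the ratio of the two RMSE values, so it does not depend on the assumed peak intensity.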

arXiv:2005.14645 [pdf, other]

DatashareNetwork: A Decentralized Privacy-Preserving Search Engine for Investigative Journalists

Authors: Kasra EdalatNejad, Wouter Lueks, Julien Pierre Martin, Soline Ledésert, Anne L'Hôte, Bruno Thomas, Laurent Girod, Carmela Troncoso

Abstract: Investigative journalists collect large numbers of digital documents during their investigations. These documents can greatly benefit other journalists' work. However, many of these documents contain sensitive information. Hence, possessing such documents can endanger reporters, their stories, and their sources. Consequently, many documents are used only for single, local investigations. We present DatashareNetwork, a decentralized and privacy-preserving search system that enables journalists worldwide to find documents via a dedicated network of peers. DatashareNetwork combines well-known anonymous authentication mechanisms and anonymous communication primitives, a novel asynchronous messaging system, and a novel multi-set private set intersection protocol (MS-PSI) into a decentralized peer-to-peer private document search engine. We prove that DatashareNetwork is secure, and show using a prototype implementation that it scales to thousands of users and millions of documents.

Submitted 30 July, 2020; v1 submitted 29 May, 2020; originally announced May 2020.

Journal ref: USENIX Security Symposium 2020: 1911-1927

arXiv:2004.13907 [pdf, other]

Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra

Authors: Mohammadreza Soltaniyeh, Richard P. Martin, Santosh Nagarakatte

Abstract: This paper describes REAP, a software-hardware approach that enables high-performance sparse linear algebra computations on a cooperative CPU-FPGA platform. REAP carefully separates the task of organizing the matrix elements from the computation phase. It uses the CPU to provide a first-pass re-organization of the matrix elements, allowing the FPGA to focus on the computation. We introduce a new intermediate representation that allows the CPU to communicate the sparse data and the scheduling decisions to the FPGA. The computation is optimized on the FPGA for effective resource utilization with pipelining. Compared to widely used sparse libraries on the CPU, REAP improves the performance of Sparse General Matrix Multiplication (SpGEMM) by 3.2X and Sparse Cholesky Factorization by 1.85X.

Submitted 28 April, 2020; originally announced April 2020.

Comments: 12 pages

Report number: Rutgers Computer Science Technical Report DCS-TR-750
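For readers unfamiliar with SpGEMM, the sketch below is a plain software reference: the classic row-by-row (Gustavson) product of two matrices stored in CSR form, using a dictionary as the per-row sparse accumulator. It is not REAP's intermediate representation or FPGA pipeline; it only shows the irregular, data-dependent access pattern that motivates a first-pass reorganization. The tuple-based CSR layout and the function name are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical CSR layout: (indptr, column indices, values).
CSR = Tuple[List[int], List[int], List[float]]


def spgemm_csr(a: CSR, b: CSR) -> CSR:
    """Row-by-row (Gustavson) sparse product C = A @ B, all in CSR form."""
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    c_ptr: List[int] = [0]
    c_idx: List[int] = []
    c_val: List[float] = []
    for i in range(len(a_ptr) - 1):
        acc: Dict[int, float] = {}  # sparse accumulator for row i of C
        for k_pos in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[k_pos], a_val[k_pos]
            # Row k of B is scaled by A[i, k] and merged into the accumulator;
            # which rows of B are touched depends on A's sparsity pattern.
            for j_pos in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[j_pos]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[j_pos]
        for j in sorted(acc):  # emit row i with sorted column indices
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


# A = [[1, 0, 2], [0, 3, 0]], B = [[0, 4], [5, 0], [6, 0]]
A: CSR = ([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])
B: CSR = ([0, 1, 2, 3], [1, 0, 0], [4.0, 5.0, 6.0])
print(spgemm_csr(A, B))  # C = [[12, 4], [15, 0]] in CSR form
```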

arXiv:2003.13254 [pdf, other]

Environmental Adaptation of Robot Morphology and Control through Real-world Evolution

Authors: Tønnes F. Nygaard, Charles P. Martin, David Howard, Jim Torresen, Kyrre Glette

Abstract: Robots operating in the real world will experience a range of different environments and tasks. It is essential for the robot to have the ability to adapt to its surroundings to work efficiently in changing conditions. Evolutionary robotics aims to solve this by optimizing both the control and body (morphology) of a robot, allowing adaptation to internal, as well as external factors. Most work in this field has been done in physics simulators, which are relatively simple and not able to replicate the richness of interactions found in the real world. Solutions that rely on the complex interplay between control, body, and environment are therefore rarely found. In this paper, we rely solely on real-world evaluations and apply evolutionary search to yield combinations of morphology and control for our mechanically self-reconfiguring quadruped robot. We evolve solutions on two distinct physical surfaces and analyze the results in terms of both control and morphology. We then transition to two previously unseen surfaces to demonstrate the generality of our method. We find that the evolutionary search finds high-performing and diverse morphology-controller configurations by adapting both control and body to the different properties of the physical environments. We additionally find that morphology and control vary with statistical significance between the environments. Moreover, we observe that our method allows for morphology and control parameters to transfer to previously-unseen terrains, demonstrating the generality of our approach.

Submitted 20 October, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

arXiv:2001.08988 [pdf]

doi 10.1016/j.jclinepi.2020.07.014

Towards a Framework for the Design, Implementation and Reporting of Methodology Scoping Reviews

Authors: Glen P. Martin, David Jenkins, Lucy Bull, Rose Sisk, Lijing Lin, William Hulme, Anthony Wilson, Wenjuan Wang, Michael Barrowman, Camilla Sammut-Powell, Alexander Pate, Matthew Sperrin, Niels Peek

Abstract: Background: In view of the growth of published papers, there is an increasing need for studies that summarise scientific research. An increasingly common review is a 'Methodology scoping review', which provides a summary of existing analytical methods, techniques and software, proposed or applied in research articles, which address an analytical problem or further an analytical approach. However, guidelines for their design, implementation and reporting are limited. Methods: Drawing on the experiences of the authors, which were consolidated through a series of face-to-face workshops, we summarise the challenges inherent in conducting a methodology scoping review and offer suggestions of best practice to promote future guideline development. Results: We identified three challenges of conducting a methodology scoping review. First, identification of search terms; one cannot usually define the search terms a priori and the language used for a particular method can vary across the literature. Second, the scope of the review requires careful consideration since new methodology is often not described (in full) within abstracts. Third, many new methods are motivated by a specific clinical question, where the methodology may only be documented in supplementary materials. We formulated several recommendations that build upon existing review guidelines. These recommendations ranged from an iterative approach to defining search terms through to screening and data extraction processes.
Conclusion: Although methodology scoping reviews are an important aspect of research, there is currently a lack of guidelines to standardise their design, implementation and reporting. We recommend a wider discussion on this topic.

Submitted 16 January, 2020; originally announced January 2020.

Comments: 22 pages, 2 tables

Journal ref: Journal of Clinical Epidemiology. (2020)

arXiv:1911.01425 [pdf, other]

Improved BiGAN training with marginal likelihood equalization

Authors: Pablo Sánchez-Martín, Pablo M. Olmos, Fernando Perez-Cruz

Abstract: We propose a novel training procedure for improving the performance of generative adversarial networks (GANs), especially bidirectional GANs. First, we enforce that the empirical distribution of the inverse inference network matches the prior distribution, which favors the generator network's reproducibility of the seen samples. Second, we have found that the marginal log-likelihood of the samples shows a severe overrepresentation of a certain type of samples. To address this issue, we propose to train the bidirectional GAN using non-uniform sampling for mini-batch selection, resulting in improved quality and variety in generated samples, as measured quantitatively and by visual inspection. We illustrate our new procedure with the well-known CIFAR10, Fashion MNIST and CelebA datasets.

Submitted 23 May, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

arXiv:1906.08362 [pdf, other]

Trepan Reloaded: A Knowledge-driven Approach to Explaining Artificial Neural Networks

Authors: Roberto Confalonieri, Tillman Weyde, Tarek R. Besold, Fermín Moscoso del Prado Martín

Abstract: Explainability in Artificial Intelligence has been revived as a topic of active research by the need to convey safety and trust to users in the 'how' and 'why' of automated decision-making. Whilst a plethora of approaches have been developed for post-hoc explainability, only a few focus on how to use domain knowledge, and how this influences the understandability of global explanations from the users' perspective. In this paper, we show how ontologies help the understandability of global post-hoc explanations, presented in the form of symbolic models. In particular, we build on Trepan, an algorithm that explains artificial neural networks by means of decision trees, and we extend it to include ontologies modeling domain knowledge in the process of generating explanations. We present the results of a user study that measures the understandability of decision trees using a syntactic complexity measure, and through time and accuracy of responses as well as reported user confidence and understandability. The user study considers domains where explanations are critical, namely finance and medicine. The results show that decision trees generated with our algorithm, taking into account domain knowledge, are more understandable than those generated by standard Trepan without the use of ontologies.

Submitted 21 November, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

arXiv:1905.05626 [pdf, other]

Lessons Learned from Real-World Experiments with DyRET: the Dynamic Robot for Embodied Testing

Authors: Tønnes F. Nygaard, Jørgen Nordmoen, Charles P. Martin, Kyrre Glette

Abstract: Robots are used in more and more complex environments, and are expected to be able to adapt to changes and unknown situations. The easiest and quickest way to adapt is to change the control system of the robot, but for increasingly complex environments one should also change the body of the robot (its morphology) to better fit the task at hand. The theory of Embodied Cognition states that control is not the only source of cognition, and the body, environment, interaction between these and the mind all contribute as cognitive resources. Taking advantage of these concepts could lead to improved adaptivity, robustness, and versatility; however, executing these concepts on real-world robots puts additional requirements on the hardware and poses several challenges compared to learning control alone. In contrast to the majority of work in Evolutionary Robotics, Eiben argues for real-world experiments in his 'Grand Challenges for Evolutionary Robotics'. This requires robust hardware platforms that are capable of repeated experiments and that are at the same time flexible when unforeseen demands arise. In this paper, we introduce our unique robot platform with self-adaptive morphology. We discuss the challenges we have faced when designing it, and the lessons learned from real-world testing and learning.

Submitted 14 May, 2019; originally announced May 2019.

Comments: Accepted to the Learning Legged Locomotion Workshop @ ICRA 2019

arXiv:1905.01254 [pdf, ps, other]

RLE edit distance in near optimal time

Authors: Raphaël Clifford, Paweł Gawrychowski, Tomasz Kociumaka, Daniel P. Martin, Przemysław Uznański

Abstract: We show that the edit distance between two run-length encoded strings of compressed lengths $m$ and $n$, respectively, can be computed in $\mathcal{O}(mn\log(mn))$ time. This improves the previous record by a factor of $\mathcal{O}(n/\log(mn))$. The running time of our algorithm is within subpolynomial factors of being optimal, subject to the standard SETH-hardness assumption. This effectively closes a line of algorithmic research first started in 1993.

Submitted 3 May, 2019; originally announced May 2019.
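To make the problem statement concrete: a run-length encoded string is a list of (character, run length) pairs, and $m$, $n$ count runs rather than characters. The sketch below is only a naive baseline, not the paper's algorithm: it decodes both strings and runs the classic Levenshtein dynamic program, whose cost grows with the uncompressed lengths and can therefore be far larger than $mn$. All function names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

Runs = List[Tuple[str, int]]  # hypothetical RLE representation: (char, count)


def rle_encode(s: str) -> Runs:
    """Run-length encode a string, e.g. 'aaab' -> [('a', 3), ('b', 1)]."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rle_decode(runs: Runs) -> str:
    """Expand (char, count) runs back into the uncompressed string."""
    return "".join(ch * count for ch, count in runs)


def edit_distance(x: str, y: str) -> int:
    """Classic Levenshtein DP: O(|x|*|y|) time, O(|y|) extra space."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        cur = [i] + [0] * len(y)
        for j, cy in enumerate(y, start=1):
            cur[j] = min(prev[j] + 1,               # delete cx
                         cur[j - 1] + 1,            # insert cy
                         prev[j - 1] + (cx != cy))  # substitute or match
        prev = cur
    return prev[-1]


def rle_edit_distance_naive(a: Runs, b: Runs) -> int:
    """Baseline only: decompress, then run the uncompressed DP."""
    return edit_distance(rle_decode(a), rle_decode(b))


print(rle_encode("aaabcc"))                                         # 3 runs
print(rle_edit_distance_naive(rle_encode("aaaab"), rle_encode("aab")))  # 2
```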

arXiv:1904.11886 [pdf, other]

doi 10.1162/qss_a_00030

Recommending research articles to consumers of online vaccination information

Authors: Eliza Harrison, Paige Martin, Didi Surian, Adam G. Dunn

Abstract: Online health communications often provide biased interpretations of evidence and have unreliable links to the source research. We tested the feasibility of a tool for matching webpages to their source evidence. From 207,538 eligible vaccination-related PubMed articles, we evaluated several approaches using 3,573 unique links to webpages from Altmetric. We evaluated methods for ranking the source articles for vaccine-related research described on webpages, comparing simple baseline feature representation and dimensionality reduction approaches to those augmented with canonical correlation analysis (CCA). Performance measures included the median rank of the correct source article; the percentage of webpages for which the source article was correctly ranked first (recall@1); and the percentage ranked within the top 50 candidate articles (recall@50). While augmenting baseline methods using CCA generally improved results, no CCA-based approach outperformed a baseline method, which ranked the correct source article first for over one quarter of webpages and in the top 50 for more than half. Tools to help people identify evidence-based sources for the content they access on vaccination-related webpages are potentially feasible and may support the prevention of bias and misrepresentation of research in news and social media.

Submitted 19 August, 2020; v1 submitted 26 April, 2019; originally announced April 2019.

Comments: 12 pages, 5 figures, 2 tables

ACM Class: H.3.3

Journal ref: Quantitative Science Studies, 1(2):810-823 (2020)
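The three performance measures named in the abstract (median rank of the correct source article, recall@1 and recall@50 as percentages of webpages) can be computed from per-webpage rankings as in the sketch below. It assumes each ranking is a complete list of candidate article IDs that contains the correct one; the function names and the toy data are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def rank_of_correct(ranked_ids: Sequence[str], correct_id: str) -> int:
    """1-based position of the correct article (raises ValueError if absent)."""
    return list(ranked_ids).index(correct_id) + 1


def evaluate(rankings: Sequence[Sequence[str]],
             correct: Sequence[str],
             ks: Sequence[int] = (1, 50)) -> Dict[str, float]:
    """Median rank plus recall@k, i.e. the percentage of webpages whose
    correct source article appears within the top k candidates."""
    ranks = [rank_of_correct(r, c) for r, c in zip(rankings, correct)]
    result: Dict[str, float] = {"median_rank": statistics.median(ranks)}
    for k in ks:
        result[f"recall@{k}"] = 100.0 * sum(r <= k for r in ranks) / len(ranks)
    return result


# Toy example: three webpages whose correct source is ranked 1st, 2nd and 3rd.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
print(evaluate(rankings, ["a", "a", "a"], ks=(1, 2)))
```

Reporting the median rank alongside recall@k is robust to a few webpages whose source is ranked very low, which a mean rank would not be.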
