subscribe to arXiv mailings

Cyber Physical Games

Authors: Warisa Sritriratanarak, Paulo Garcia

Abstract: We describe a formulation of multi-agents operating within a Cyber-Physical System, resulting in collaborative or adversarial games. We show that the non-determinism inherent in the communication medium between agents and the underlying physical environment gives rise to environment evolution that is a probabilistic function of agents' strategies. We name these emergent properties Cyber Physical G… ▽ More We describe a formulation of multi-agents operating within a Cyber-Physical System, resulting in collaborative or adversarial games. We show that the non-determinism inherent in the communication medium between agents and the underlying physical environment gives rise to environment evolution that is a probabilistic function of agents' strategies. We name these emergent properties Cyber Physical Games and study its properties. We present an algorithmic model that determines the most likely system evolution, approximating Cyber Physical Games through Probabilistic Finite State Automata, and evaluate it on collaborative and adversarial versions of the Iterated Boolean Game, comparing theoretical results with simulated ones. Results support the validity of the proposed model, and suggest several required research directions to continue evolving our understanding of Cyber Physical System, as well as how to best design agents that must operate within such environments. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2406.02210 [pdf, other]

An Open and Reconfigurable User Interface to Manage Complex ROS-based Robotic Systems

Authors: Pablo Malvido Fresnillo, Saigopal Vasudevan, Jose A. Perez Garcia, Jose L. Martinez Lastra

Abstract: The Robot Operating System (ROS) has significantly gained popularity among robotic engineers and researchers over the past five years, primarily due to its powerful infrastructure for node communication, which enables developers to build modular and large robotic applications. However, ROS presents a steep learning curve and lacks the intuitive usability of vendor-specific robotic Graphical User I… ▽ More The Robot Operating System (ROS) has significantly gained popularity among robotic engineers and researchers over the past five years, primarily due to its powerful infrastructure for node communication, which enables developers to build modular and large robotic applications. However, ROS presents a steep learning curve and lacks the intuitive usability of vendor-specific robotic Graphical User Interfaces (GUIs). Moreover, its modular and distributed nature complicates the control and monitoring of extensive systems, even for advanced users. To address these challenges, this paper proposes a highly adaptable and reconfigurable web-based GUI for intuitively controlling, monitoring, and configuring complex ROS-based robotic systems. The GUI leverages ROSBridge and roslibjs to ensure seamless communication with ROS systems via topics and services. Designed as a versatile platform, the GUI allows for the selective incorporation of modular features to accommodate diverse robotic systems and applications. An initial set of commonly used features in robotic applications is presented. To demonstrate its reconfigurability, the GUI was customized and tested for four industrial use cases, receiving positive feedback. The project's repository has been made publicly available to support the robotics community and lower the entry barrier for ROS in industrial applications. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: 14 pages, 12 figures, 3 tables

arXiv:2405.03347 [pdf, ps, other]

doi 10.3390/math12111642

Perfect codes over non-prime power alphabets: an approach based on Diophantine equations

Authors: Pedro-José Cazorla García

Abstract: Perfect error correcting codes allow for an optimal transmission of information while guaranteeing error correction. For this reason, proving their existence has been a classical problem in both pure mathematics and information theory. Indeed, the classification of the parameters of $e-$error correcting perfect codes over $q-$ary alphabets was a very active topic of research in the late 20th centu… ▽ More Perfect error correcting codes allow for an optimal transmission of information while guaranteeing error correction. For this reason, proving their existence has been a classical problem in both pure mathematics and information theory. Indeed, the classification of the parameters of $e-$error correcting perfect codes over $q-$ary alphabets was a very active topic of research in the late 20th century. Consequently, all parameters of perfect $e-$error correcting codes were found if $e \ge 3$, and it was conjectured that no perfect $2-$error correcting codes exist over any $q-$ary alphabet, where $q > 3$. In the 1970s, this was proved for $q$ a prime power, for $q = 2^r3^s$ and for only $7$ other values of $q$. Almost $50$ years later, it is surprising to note that there have been no new results in this regard and the classification of $2-$error correcting codes over non-prime power alphabets remains an open problem. In this paper, we use techniques from the resolution of generalised Ramanujan--Nagell equation and from modern computational number theory to show that perfect $2-$error correcting codes do not exist for $172$ new values of $q$ which are not prime powers, substantially increasing the values of $q$ which are now classified. In addition, we prove that, for any fixed value of $q$, there can be at most finitely many perfect $2-$error correcting codes over an alphabet of size $q$. △ Less

Submitted 24 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: 12 pages, 2 tables. The new version includes the comments by the anonymous referees

MSC Class: 94B65; 11D61 (Primary); 11G05; 11G50; 14G05 (Secondary)

Journal ref: Mathematics 2024, 12(11), 1642

arXiv:2404.11498 [pdf, other]

Runtime Verification and Field Testing for ROS-Based Robotic Systems

Authors: Ricardo Caldas, Juan Antonio Piñera García, Matei Schiopu, Patrizio Pelliccione, Genaína Rodrigues, Thorsten Berger

Abstract: Robotic systems are becoming pervasive and adopted in increasingly many domains, such as manufacturing, healthcare, and space exploration. To this end, engineering software has emerged as a crucial discipline for building maintainable and reusable robotic systems. Robotics software engineering research has received increasing attention, fostering autonomy as a fundamental goal. However, robotics d… ▽ More Robotic systems are becoming pervasive and adopted in increasingly many domains, such as manufacturing, healthcare, and space exploration. To this end, engineering software has emerged as a crucial discipline for building maintainable and reusable robotic systems. Robotics software engineering research has received increasing attention, fostering autonomy as a fundamental goal. However, robotics developers are still challenged trying to achieve this goal given that simulation is not able to deliver solutions to realistically emulate real-world phenomena. Robots also need to operate in unpredictable and uncontrollable environments, which require safe and trustworthy self-adaptation capabilities implemented in software. Typical techniques to address the challenges are runtime verification, field-based testing, and mitigation techniques that enable fail-safe solutions. However, there is no clear guidance to architect ROS-based systems to enable and facilitate runtime verification and field-based testing. This paper aims to fill in this gap by providing guidelines that can help developers and QA teams when developing, verifying or testing their robots in the field. These guidelines are carefully tailored to address the challenges and requirements of testing robotics systems in real-world scenarios. We conducted a literature review on studies addressing runtime verification and field-based testing for robotic systems, mined ROS-based application repositories, and validated the applicability, clarity, and usefulness via two questionnaires with 55 answers. We contribute 20 guidelines formulated for researchers and practitioners in robotic software engineering. Finally, we map our guidelines to open challenges thus far in runtime verification and field-based testing for ROS-based systems and, we outline promising research directions in the field. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2403.15488 [pdf]

doi 10.1007/s10956-013-9447-7

Enhancing Students' Learning Process Through Self-Generated Tests

Authors: Marcos Sánchez-Élez, Inmaculada Pardines, Pablo García, Guadalupe Miñana, Sara Román, Margarita Sánchez, José L. Risco-Martín

Abstract: The use of new technologies in higher education has surprisingly emphasized students' tendency to adopt a passive behavior in class. Participation and interaction of students are essential to improve academic results. This paper describes an educational experiment aimed at the promotion of students' autonomous learning by requiring them to generate test type questions related to the contents of th… ▽ More The use of new technologies in higher education has surprisingly emphasized students' tendency to adopt a passive behavior in class. Participation and interaction of students are essential to improve academic results. This paper describes an educational experiment aimed at the promotion of students' autonomous learning by requiring them to generate test type questions related to the contents of the course. The main idea is to make the student feel part of the evaluation process by including students' questions in the evaluation exams. A set of applications running on our university online learning environment has been developed in order to provide both students and teachers with the necessary tools for a good interaction between them. Questions uploaded by students are visible to every enrolled student as well as to each involved teacher. In this way, we enhance critical analysis skills, by solving and finding possible mistakes in the questions sent by their fellows. The experiment was applied over 769 students from 12 different courses. Results show that the students who have actively participated in the experiment have obtained better academic performance. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Journal ref: Journal of Science Education and Technology, 23(1), pp. 15-25, 2014

arXiv:2402.10642 [pdf, other]

Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model

Authors: Xiangyu Zhang, Daijiao Liu, Hexin Liu, Qiquan Zhang, Hanyu Meng, Leibny Paola Garcia, Eng Siong Chng, Lina Yao

Abstract: Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks. However, in the field of speech synthesis, although DDPMs exhibit impressive performance, their long training duration and substantial inference costs hinder practical deployment. Existing approaches primarily focus on enhancing inference speed, while approaches… ▽ More Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks. However, in the field of speech synthesis, although DDPMs exhibit impressive performance, their long training duration and substantial inference costs hinder practical deployment. Existing approaches primarily focus on enhancing inference speed, while approaches to accelerate training a key factor in the costs associated with adding or customizing voices often necessitate complex modifications to the model, compromising their universal applicability. To address the aforementioned challenges, we propose an inquiry: is it possible to enhance the training/inference speed and performance of DDPMs by modifying the speech signal itself? In this paper, we double the training and inference speed of Speech DDPMs by simply redirecting the generative target to the wavelet domain. This method not only achieves comparable or superior performance to the original model in speech synthesis tasks but also demonstrates its versatility. By investigating and utilizing different wavelet bases, our approach proves effective not just in speech synthesis, but also in speech enhancement. △ Less

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.09734 [pdf, other]

Agents Need Not Know Their Purpose

Authors: Paulo Garcia

Abstract: Ensuring artificial intelligence behaves in such a way that is aligned with human values is commonly referred to as the alignment challenge. Prior work has shown that rational agents, behaving in such a way that maximizes a utility function, will inevitably behave in such a way that is not aligned with human values, especially as their level of intelligence goes up. Prior work has also shown that… ▽ More Ensuring artificial intelligence behaves in such a way that is aligned with human values is commonly referred to as the alignment challenge. Prior work has shown that rational agents, behaving in such a way that maximizes a utility function, will inevitably behave in such a way that is not aligned with human values, especially as their level of intelligence goes up. Prior work has also shown that there is no "one true utility function"; solutions must include a more holistic approach to alignment. This paper describes oblivious agents: agents that are architected in such a way that their effective utility function is an aggregation of a known and hidden sub-functions. The hidden component, to be maximized, is internally implemented as a black box, preventing the agent from examining it. The known component, to be minimized, is knowledge of the hidden sub-function. Architectural constraints further influence how agent actions can evolve its internal environment model. We show that an oblivious agent, behaving rationally, constructs an internal approximation of designers' intentions (i.e., infers alignment), and, as a consequence of its architecture and effective utility function, behaves in such a way that maximizes alignment; i.e., maximizing the approximated intention function. We show that, paradoxically, it does this for whatever utility function is used as the hidden component and, in contrast with extant techniques, chances of alignment actually improve as agent intelligence grows. △ Less

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.06201 [pdf, other]

Maximizing Consistent Force Output for Shape Memory Alloy Artificial Muscles in Soft Robots

Authors: Meredith L. Anderson, Ran Jing, Juan C. Pacheco Garcia, Ilyoung Yang, Sarah Alizadeh-Shabdiz, Charles DeLorey, Andrew P. Sabelhaus

Abstract: Soft robots have immense potential given their inherent safety and adaptability, but challenges in soft actuator forces and design constraints have limited scaling up soft robots to larger sizes. Electrothermal shape memory alloy (SMA) artificial muscles have the potential to create these large forces and high displacements, but consistently using these muscles under a well-defined model, in-situ… ▽ More Soft robots have immense potential given their inherent safety and adaptability, but challenges in soft actuator forces and design constraints have limited scaling up soft robots to larger sizes. Electrothermal shape memory alloy (SMA) artificial muscles have the potential to create these large forces and high displacements, but consistently using these muscles under a well-defined model, in-situ in a soft robot, remains an open challenge. This article provides a system for maintaining the highest-possible consistent SMA forces, over long lifetimes, by combining a fatigue testing protocol with a supervisory control system for the muscles' internal temperature state. We propose a design of a soft limb with swap-able SMA muscles, and deploy the limb in a blocked-force test to quantify the relationship between the measured maximum force at different temperatures over different lifetimes. Then, by applying an invariance-based control system to maintain temperatures under our long-life limit, we demonstrate consistent high forces in a practical task over hundreds of cycles. The method we developed allows for practical implementation of SMAs in soft robots through characterizing and controlling their behavior in-situ, and provides a method to impose limits that maximize their consistent, repeatable behavior. △ Less

Submitted 9 February, 2024; originally announced February 2024.

Comments: 8 pages, 8 figures, accepted by 2024 IEEE International Conference on Soft Robotics (RoboSoft)

arXiv:2401.07726 [pdf, other]

Preserving Power Optimizations Across the High Level Synthesis of Distinct Application-Specific Circuits

Authors: Paulo Garcia

Abstract: We evaluate the use of software interpretation to push High Level Synthesis of application-specific accelerators toward a higher level of abstraction. Our methodology is supported by a formal power consumption model that computes the power consumption of accelerator components, accurately predicting the power consumption on new designs from prior optimization estimations. We demonstrate how our ap… ▽ More We evaluate the use of software interpretation to push High Level Synthesis of application-specific accelerators toward a higher level of abstraction. Our methodology is supported by a formal power consumption model that computes the power consumption of accelerator components, accurately predicting the power consumption on new designs from prior optimization estimations. We demonstrate how our approach simplifies the re-use of power optimizations across distinct designs, by leveraging the higher level of design abstraction, using two accelerators representative of the robotics domain, implemented through the Bambu High Level Synthesis tool. Results support the research hypothesis, achieving predictions accurate within +/- 1%. △ Less

Submitted 9 July, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: Accepted at IEEE 10th International Conference on Communications and Electronics (ICCE) 2024

arXiv:2401.07429 [pdf, other]

Accelerating Boolean Constraint Propagation for Efficient SAT-Solving on FPGAs

Authors: Hariprasadh Govindasamy, Babak Esfandiari, Paulo Garcia

Abstract: We present a hardware-accelerated SAT solver targeting processor/Field Programmable Gate Arrays (FPGA) SoCs. Our solution accelerates the most expensive subroutine of the Davis-Putnam-Logemann-Loveland (DPLL) algorithm, Boolean Constraint Propagation (BCP) through fine-grained FPGA parallelism. Unlike prior state-of-the-art solutions, our solver eliminates costly clause look-up operations by assig… ▽ More We present a hardware-accelerated SAT solver targeting processor/Field Programmable Gate Arrays (FPGA) SoCs. Our solution accelerates the most expensive subroutine of the Davis-Putnam-Logemann-Loveland (DPLL) algorithm, Boolean Constraint Propagation (BCP) through fine-grained FPGA parallelism. Unlike prior state-of-the-art solutions, our solver eliminates costly clause look-up operations by assigning clauses directly to clause processors on the FPGA and dividing large formulas into smaller partitions manageable by FPGA. Partitions are hot-swapped during runtime as required and the supported formula size is limited only by available external memory, not on-chip FPGA memory. We evaluate our solver on a Xilinx Zynq platform with results showing quicker execution time across various formula sizes, subject to formula partitioning strategy. Compared to prior state-of-the-art, we achieve 1.7x and 1.1x speed up on BCP for 2 representative benchmarks and up to 6x total speedup over software-only implementation. △ Less

Submitted 13 April, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

Comments: Accepted at ACM GLSVLSI 2024

arXiv:2312.11279 [pdf, other]

FPGAs (Can Get Some) SATisfaction

Authors: Hariprasadh Godindasamy, Babak Esfandiari, Paulo Garcia

Abstract: We present a hardware-accelerated SAT solver suitable for processor/Field Programmable Gate Arrays (FPGA) hybrid platforms, which have become the norm in the embedded domain. Our solution addresses a known bottleneck in SAT solving acceleration: unlike prior state-of-the-art solutions that have addressed the same bottleneck by limiting the amount of exploited parallelism, our solver takes advantag… ▽ More We present a hardware-accelerated SAT solver suitable for processor/Field Programmable Gate Arrays (FPGA) hybrid platforms, which have become the norm in the embedded domain. Our solution addresses a known bottleneck in SAT solving acceleration: unlike prior state-of-the-art solutions that have addressed the same bottleneck by limiting the amount of exploited parallelism, our solver takes advantage of fine-grained parallelization opportunities by hot-swapping FPGA clause assignments at runtime. It is also the first modern completely open-source SAT accelerator, and formula size is limited only by the amount of available external memory, not by on-chip FPGA memory. Evaluation is performed on a Xilinx Zynq platform: experiments support that hardware acceleration results in shorter execution time across varying formula sizes, subject to formula partitioning strategy. We outperform prior state-of-the-art by 1.7x and 1.1x, respectively, for 2 representative benchmarks, and boast up to 6x performance increase over software-only implementation. △ Less

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.09546 [pdf, other]

On a Functional Definition of Intelligence

Authors: Warisa Sritriratanarak, Paulo Garcia

Abstract: Without an agreed-upon definition of intelligence, asking "is this system intelligent?"" is an untestable question. This lack of consensus hinders research, and public perception, on Artificial Intelligence (AI), particularly since the rise of generative- and large-language models. Most work on precisely capturing what we mean by "intelligence" has come from the fields of philosophy, psychology, a… ▽ More Without an agreed-upon definition of intelligence, asking "is this system intelligent?"" is an untestable question. This lack of consensus hinders research, and public perception, on Artificial Intelligence (AI), particularly since the rise of generative- and large-language models. Most work on precisely capturing what we mean by "intelligence" has come from the fields of philosophy, psychology, and cognitive science. Because these perspectives are intrinsically linked to intelligence as it is demonstrated by natural creatures, we argue such fields cannot, and will not, provide a sufficiently rigorous definition that can be applied to artificial means. Thus, we present an argument for a purely functional, black-box definition of intelligence, distinct from how that intelligence is actually achieved; focusing on the "what", rather than the "how". To achieve this, we first distinguish other related concepts (sentience, sensation, agency, etc.) from the notion of intelligence, particularly identifying how these concepts pertain to artificial intelligent systems. As a result, we achieve a formal definition of intelligence that is conceptually testable from only external observation, that suggests intelligence is a continuous variable. We conclude by identifying challenges that still remain towards quantifiable measurement. This work provides a useful perspective for both the development of AI, and for public perception of the capabilities and risks of AI. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: submitted; under review at "Journal of Intelligent Computing, SPJ"

arXiv:2312.08650 [pdf, other]

PhyOT: Physics-informed object tracking in surveillance cameras

Authors: Kawisorn Kamtue, Jose M. F. Moura, Orathai Sangpetch, Paulo Garcia

Abstract: While deep learning has been very successful in computer vision, real world operating conditions such as lighting variation, background clutter, or occlusion hinder its accuracy across several tasks. Prior work has shown that hybrid models -- combining neural networks and heuristics/algorithms -- can outperform vanilla deep learning for several computer vision tasks, such as classification or trac… ▽ More While deep learning has been very successful in computer vision, real world operating conditions such as lighting variation, background clutter, or occlusion hinder its accuracy across several tasks. Prior work has shown that hybrid models -- combining neural networks and heuristics/algorithms -- can outperform vanilla deep learning for several computer vision tasks, such as classification or tracking. We consider the case of object tracking, and evaluate a hybrid model (PhyOT) that conceptualizes deep neural networks as ``sensors'' in a Kalman filter setup, where prior knowledge, in the form of Newtonian laws of motion, is used to fuse sensor observations and to perform improved estimations. Our experiments combine three neural networks, performing position, indirect velocity and acceleration estimation, respectively, and evaluate such a formulation on two benchmark datasets: a warehouse security camera dataset that we collected and annotated and a traffic camera open dataset. Results suggest that our PhyOT can track objects in extreme conditions that the state-of-the-art deep neural networks fail while its performance in general cases does not degrade significantly from that of existing deep learning approaches. Results also suggest that our PhyOT components are generalizable and transferable. △ Less

Submitted 13 December, 2023; originally announced December 2023.

Comments: Accepted at IEEE ICASSP 2024 on December 13, 2023

arXiv:2311.15954 [pdf, other]

A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

Authors: Shuyue Stella Li, Beining Xu, Xiangyu Zhang, Hexin Liu, Wenhan Chao, Leibny Paola Garcia

Abstract: In this work, we study the features extracted by English self-supervised learning (SSL) models in cross-lingual contexts and propose a new metric to predict the quality of feature representations. Using automatic speech recognition (ASR) as a downstream task, we analyze the effect of model size, training objectives, and model architecture on the models' performance as a feature extractor for a set… ▽ More In this work, we study the features extracted by English self-supervised learning (SSL) models in cross-lingual contexts and propose a new metric to predict the quality of feature representations. Using automatic speech recognition (ASR) as a downstream task, we analyze the effect of model size, training objectives, and model architecture on the models' performance as a feature extractor for a set of topologically diverse corpora. We develop a novel metric, the Phonetic-Syntax Ratio (PSR), to measure the phonetic and synthetic information in the extracted representations using deep generalized canonical correlation analysis. Results show the contrastive loss in the wav2vec2.0 objective facilitates more effective cross-lingual feature extraction. There is a positive correlation between PSR scores and ASR performance, suggesting that phonetic information extracted by monolingual SSL models can be used for downstream tasks in cross-lingual settings. The proposed metric is an effective indicator of the quality of the representations and can be useful for model selection. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: 12 pages, 5 figures, 4 tables

arXiv:2310.01719 [pdf]

Software Testing and Code Refactoring: A Survey with Practitioners

Authors: Danilo Leandro Lima, Ronnie de Souza Santos, Guilherme Pires Garcia, Sildemir S. da Silva, Cesar Franca, Luiz Fernando Capretz

Abstract: Nowadays, software testing professionals are commonly required to develop coding skills to work on test automation. One essential skill required from those who code is the ability to implement code refactoring, a valued quality aspect of software development; however, software developers usually encounter obstacles in successfully applying this practice. In this scenario, the present study aims to… ▽ More Nowadays, software testing professionals are commonly required to develop coding skills to work on test automation. One essential skill required from those who code is the ability to implement code refactoring, a valued quality aspect of software development; however, software developers usually encounter obstacles in successfully applying this practice. In this scenario, the present study aims to explore how software testing professionals (e.g., software testers, test engineers, test analysts, and software QAs) deal with code refactoring to understand the benefits and limitations of this practice in the context of software testing. We followed the guidelines to conduct surveys in software engineering and applied three sampling techniques, namely convenience sampling, purposive sampling, and snowballing sampling, to collect data from testing professionals. We received answers from 80 individuals reporting their experience refactoring the code of automated tests. We concluded that in the context of software testing, refactoring offers several benefits, such as supporting the maintenance of automated tests and improving the performance of the testing team. However, practitioners might encounter barriers in effectively implementing this practice, in particular, the lack of interest from managers and leaders. Our study raises discussions on the importance of having testing professionals implement refactoring in the code of automated tests, allowing them to improve their coding abilities. △ Less

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.16953 [pdf, other]

Enhancing Code-switching Speech Recognition with Interactive Language Biases

Authors: Hexin Liu, Leibny Paola Garcia, Xiangyu Zhang, Andy W. H. Khong, Sanjeev Khudanpur

Abstract: Languages usually switch within a multilingual speech signal, especially in a bilingual society. This phenomenon is referred to as code-switching (CS), making automatic speech recognition (ASR) challenging under a multilingual scenario. We propose to improve CS-ASR by biasing the hybrid CTC/attention ASR model with multi-level language information comprising frame- and token-level language posteri… ▽ More Languages usually switch within a multilingual speech signal, especially in a bilingual society. This phenomenon is referred to as code-switching (CS), making automatic speech recognition (ASR) challenging under a multilingual scenario. We propose to improve CS-ASR by biasing the hybrid CTC/attention ASR model with multi-level language information comprising frame- and token-level language posteriors. The interaction between various resolutions of language biases is subsequently explored in this work. We conducted experiments on datasets from the ASRU 2019 code-switching challenge. Compared to the baseline, the proposed interactive language biases (ILB) method achieves higher performance and ablation studies highlight the effects of different language biases and their interactions. In addition, the results presented indicate that language bias implicitly enhances internal language modeling, leading to performance degradation after employing an external language model. △ Less

Submitted 28 September, 2023; originally announced September 2023.

Comments: Submitted to IEEE ICASSP 2024

arXiv:2309.15018 [pdf, other]

Unidirectional brain-computer interface: Artificial neural network encoding natural images to fMRI response in the visual cortex

Authors: Ruixing Liang, Xiangyu Zhang, Qiong Li, Lai Wei, Hexin Liu, Avisha Kumar, Kelley M. Kempski Leadingham, Joshua Punnoose, Leibny Paola Garcia, Amir Manbachi

Abstract: While significant advancements in artificial intelligence (AI) have catalyzed progress across various domains, its full potential in understanding visual perception remains underexplored. We propose an artificial neural network dubbed VISION, an acronym for "Visual Interface System for Imaging Output of Neural activity," to mimic the human brain and show how it can foster neuroscientific inquiries… ▽ More While significant advancements in artificial intelligence (AI) have catalyzed progress across various domains, its full potential in understanding visual perception remains underexplored. We propose an artificial neural network dubbed VISION, an acronym for "Visual Interface System for Imaging Output of Neural activity," to mimic the human brain and show how it can foster neuroscientific inquiries. Using visual and contextual inputs, this multimodal model predicts the brain's functional magnetic resonance imaging (fMRI) scan response to natural images. VISION successfully predicts human hemodynamic responses as fMRI voxel values to visual inputs with an accuracy exceeding state-of-the-art performance by 45%. We further probe the trained networks to reveal representational biases in different visual areas, generate experimentally testable hypotheses, and formulate an interpretable metric to associate these hypotheses with cortical functions. With both a model and evaluation metric, the cost and time burdens associated with designing and implementing functional analysis on the visual cortex could be reduced. Our work suggests that the evolution of computational models may shed light on our fundamental understanding of the visual cortex and provide a viable approach toward reliable brain-machine interfaces. △ Less

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2306.13734 [pdf, other]

The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios

Authors: Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola Garcia, Matthew Maciejewski, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur

Abstract: The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR (DASR) task, within the 7th CHiME challenge. This task comprises joint ASR and diarization in far-field settings with multiple, and possibly heterogeneous, recording devices. Different from previous challenges, we evaluate… ▽ More The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR (DASR) task, within the 7th CHiME challenge. This task comprises joint ASR and diarization in far-field settings with multiple, and possibly heterogeneous, recording devices. Different from previous challenges, we evaluate systems on 3 diverse scenarios: CHiME-6, DiPCo, and Mixer 6. The goal is for participants to devise a single system that can generalize across different array geometries and use cases with no a-priori information. Another departure from earlier CHiME iterations is that participants are allowed to use open-source pre-trained models and datasets. In this paper, we describe the challenge design, motivation, and fundamental research questions in detail. We also present the baseline system, which is fully array-topology agnostic and features multi-channel diarization, channel selection, guided source separation and a robust ASR model that leverages self-supervised speech representations (SSLR). △ Less

Submitted 14 July, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

arXiv:2306.01031 [pdf, other]

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

Authors: Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur

Abstract: This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the performance of ASR models. To address this problem, we propose Bypass Temporal Classification (BTC) as an expansion of the Connectionist Temporal Classification (CTC) cr… ▽ More This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the performance of ASR models. To address this problem, we propose Bypass Temporal Classification (BTC) as an expansion of the Connectionist Temporal Classification (CTC) criterion. BTC explicitly encodes the uncertainties associated with transcripts during training. This is accomplished by enhancing the flexibility of the training graph, which is implemented as a weighted finite-state transducer (WFST) composition. The proposed algorithm improves the robustness and accuracy of ASR systems, particularly when working with imprecisely transcribed speech corpora. Our implementation will be open-sourced. △ Less

Submitted 1 June, 2023; originally announced June 2023.

arXiv:2212.10249 [pdf, other]

Learning efficient backprojections across cortical hierarchies in real time

Authors: Kevin Max, Laura Kriener, Garibaldi Pineda García, Thomas Nowotny, Ismael Jaras, Walter Senn, Mihai A. Petrovici

Abstract: Models of sensory processing and learning in the cortex need to efficiently assign credit to synapses in all areas. In deep learning, a known solution is error backpropagation, which however requires biologically implausible weight transport from feed-forward to feedback paths. We introduce Phaseless Alignment Learning (PAL), a bio-plausible method to learn efficient feedback weights in layered… ▽ More Models of sensory processing and learning in the cortex need to efficiently assign credit to synapses in all areas. In deep learning, a known solution is error backpropagation, which however requires biologically implausible weight transport from feed-forward to feedback paths. We introduce Phaseless Alignment Learning (PAL), a bio-plausible method to learn efficient feedback weights in layered cortical hierarchies. This is achieved by exploiting the noise naturally found in biophysical systems as an additional carrier of information. In our dynamical system, all weights are learned simultaneously with always-on plasticity and using only information locally available to the synapses. Our method is completely phase-free (no forward and backward passes or phased learning) and allows for efficient error propagation across multi-layer cortical hierarchies, while maintaining biologically plausible signal transport and learning. Our method is applicable to a wide class of models and improves on previously known biologically plausible ways of credit assignment: compared to random synaptic feedback, it can solve complex tasks with less neurons and learn more useful latent representations. We demonstrate this on various classification tasks using a cortical microcircuit model with prospective coding. △ Less

Submitted 2 February, 2024; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: Updated with streamlined main part, CIFAR-10 simulations, including DFA and minor fixes

arXiv:2212.06039 [pdf]

Technological taxonomies for hypernym and hyponym retrieval in patent texts

Authors: You Zuo, Yixuan Li, Alma Parias García, Kim Gerdes

Abstract: This paper presents an automatic approach to creating taxonomies of technical terms based on the Cooperative Patent Classification (CPC). The resulting taxonomy contains about 170k nodes in 9 separate technological branches and is freely available. We also show that a Text-to-Text Transfer Transformer (T5) model can be fine-tuned to generate hypernyms and hyponyms with relatively high precision, c… ▽ More This paper presents an automatic approach to creating taxonomies of technical terms based on the Cooperative Patent Classification (CPC). The resulting taxonomy contains about 170k nodes in 9 separate technological branches and is freely available. We also show that a Text-to-Text Transfer Transformer (T5) model can be fine-tuned to generate hypernyms and hyponyms with relatively high precision, confirming the manually assessed quality of the resource. The T5 model opens the taxonomy to any new technological terms for which a hypernym can be generated, thus making the resource updateable with new terms, an essential feature for the constantly evolving field of technological terminology. △ Less

Submitted 13 December, 2022; v1 submitted 14 November, 2022; originally announced December 2022.

Comments: ToTh 2022 - Terminology & Ontology: Theories and applications, Jun 2022, Chamb{é}ry, France

arXiv:2211.17196 [pdf, other]

EURO: ESPnet Unsupervised ASR Open-source Toolkit

Authors: Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur

Abstract: This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by the Wav2vec-U, originally implemented at FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extend… ▽ More This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by the Wav2vec-U, originally implemented at FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extends the functionality and promotes reproducibility for UASR tasks by integrating S3PRL and k2, resulting in flexible frontends from 27 self-supervised models and various graph-based decoding strategies. EURO is implemented in ESPnet and follows its unified pipeline to provide UASR recipes with a complete setup. This improves the pipeline's efficiency and allows EURO to be easily applied to existing datasets in ESPnet. Extensive experiments on three mainstream self-supervised models demonstrate the toolkit's effectiveness and achieve state-of-the-art UASR performance on TIMIT and LibriSpeech datasets. EURO will be publicly available at https://github.com/espnet/espnet, aiming to promote this exciting and emerging research area based on UASR through open-source activity. △ Less

Submitted 20 May, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

arXiv:2211.13101 [pdf, other]

High-Quality Fault Resiliency in Fat Trees

Authors: John Gliksberg, Antoine Capra, Alexandre Louvet, Pedro Javier Garcia, Devan Sohier

Abstract: Coupling regular topologies with optimised routing algorithms is key in pushing the performance of interconnection networks of supercomputers.In this paper we present Dmodc, a fast deterministic routing algorithm for Parallel Generalised Fat-Trees (PGFTs) which minimises congestion risk even under massive network degradation caused by equipment failure.Dmodc computes forwarding tables with a close… ▽ More Coupling regular topologies with optimised routing algorithms is key in pushing the performance of interconnection networks of supercomputers.In this paper we present Dmodc, a fast deterministic routing algorithm for Parallel Generalised Fat-Trees (PGFTs) which minimises congestion risk even under massive network degradation caused by equipment failure.Dmodc computes forwarding tables with a closed-form arithmetic formula by relying on a fast preprocessing phase.This allows complete re-routing of networks with tens of thousands of nodes in less than a second.In turn, this greatly helps centralised fabric management react to faults with high-quality routing tables and no impact to running applications in current and future very large-scale HPC clusters. △ Less

Submitted 23 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2211.11817

Journal ref: IEEE Micro, 2020, 40 (1), pp.44-49. \&\#x27E8;10.1109/MM.2019.2949978\&\#x27E9

arXiv:2211.11818 [pdf, other]

doi 10.1109/HiPINEB.2018.00010

Node-Type-Based Load-Balancing Routing for Parallel Generalized Fat-Trees

Authors: John Gliksberg, Jean-Noel Quintin, Pedro Javier Garcia

Abstract: High-Performance Computing (HPC) clusters are made up of a variety of node types (usually compute, I/O, service, and GPGPU nodes) and applications don't use nodes of a different type the same way. Resulting communication patterns reflect organization of groups of nodes, and current optimal routing algorithms for all-to-all patterns will not always maximize performance for group-specific communicat… ▽ More High-Performance Computing (HPC) clusters are made up of a variety of node types (usually compute, I/O, service, and GPGPU nodes) and applications don't use nodes of a different type the same way. Resulting communication patterns reflect organization of groups of nodes, and current optimal routing algorithms for all-to-all patterns will not always maximize performance for group-specific communications. Since application communication patterns are rarely available beforehand, we choose to rely on node types as a good guess for node usage. We provide a description of node type heterogeneity and analyse performance degradation caused by unlucky repartition of nodes of the same type. We provide an extension to routing algorithms for Parallel Generalized Fat-Tree topologies (PGFTs) which balances load amongst groups of nodes of the same type. We show how it removes these performance issues by comparing results in a variety of situations against corresponding classical algorithms. △ Less

Submitted 21 November, 2022; originally announced November 2022.

Journal ref: 2018 IEEE 4th International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB), Feb 2018, Vienna, France. pp.9-15

arXiv:2211.11817 [pdf, other]

doi 10.1109/HOTI.2019.00015

High-Quality Fault-Resiliency in Fat-Tree Networks (Extended Abstract)

Authors: John Gliksberg, Antoine Capra, Alexandre Louvet, Pedro Javier Garcia, Devan Sohier

Abstract: Coupling regular topologies with optimized routing algorithms is key in pushing the performance of interconnection networks of HPC systems. In this paper we present Dmodc, a fast deterministic routing algorithm for Parallel Generalized Fat-Trees (PGFTs) which minimizes congestion risk even under massive topology degradation caused by equipment failure. It applies a modulo-based computation of forw… ▽ More Coupling regular topologies with optimized routing algorithms is key in pushing the performance of interconnection networks of HPC systems. In this paper we present Dmodc, a fast deterministic routing algorithm for Parallel Generalized Fat-Trees (PGFTs) which minimizes congestion risk even under massive topology degradation caused by equipment failure. It applies a modulo-based computation of forwarding tables among switches closer to the destination, using only knowledge of subtrees for pre-modulo division. Dmodc allows complete rerouting of topologies with tens of thousands of nodes in less than a second, which greatly helps centralized fabric management react to faults with high-quality routing tables and no impact to running applications in current and future very large-scale HPC clusters. We compare Dmodc against routing algorithms available in the InfiniBand control software (OpenSM) first for routing execution time to show feasibility at scale, and then for congestion risk under degradation to demonstrate robustness. The latter comparison is done using static analysis of routing tables under random permutation (RP), shift permutation (SP) and all-to-all (A2A) traffic patterns. Results for Dmodc show A2A and RP congestion risks similar under heavy degradation as the most stable algorithms compared, and near-optimal SP congestion risk up to 1% of random degradation. △ Less

Submitted 21 November, 2022; originally announced November 2022.

Journal ref: 2019 IEEE Symposium on High-Performance Interconnects (HOTI), Aug 2019, Santa Clara, United States. pp.9-12

arXiv:2211.03025 [pdf, other]

Bridging Speech and Textual Pre-trained Models with Unsupervised ASR

Authors: Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, Hung-yi Lee

Abstract: Spoken language understanding (SLU) is a task aiming to extract high-level semantics from spoken utterances. Previous works have investigated the use of speech self-supervised models and textual pre-trained models, which have shown reasonable improvements to various SLU tasks. However, because of the mismatched modalities between speech signals and text tokens, previous methods usually need comple… ▽ More Spoken language understanding (SLU) is a task aiming to extract high-level semantics from spoken utterances. Previous works have investigated the use of speech self-supervised models and textual pre-trained models, which have shown reasonable improvements to various SLU tasks. However, because of the mismatched modalities between speech signals and text tokens, previous methods usually need complex designs of the frameworks. This work proposes a simple yet efficient unsupervised paradigm that connects speech and textual pre-trained models, resulting in an unsupervised speech-to-semantic pre-trained model for various tasks in SLU. To be specific, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges different modalities used in speech and textual pre-trained models. Our experiments show that unsupervised ASR itself can improve the representations from speech self-supervised models. More importantly, it is shown as an efficient connector between speech and textual pre-trained models, improving the performances of five different SLU tasks. Notably, on spoken question answering, we reach the state-of-the-art result over the challenging NMSQA benchmark. △ Less

Submitted 6 November, 2022; originally announced November 2022.

Comments: ICASSP2023 submission

arXiv:2211.00482 [pdf, other]

Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

Authors: Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur

Abstract: Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have degraded performance for multi-talker scenarios -- possibly due to the domain mismatch -- which severely limits their use for such applications. In this paper, we inve… ▽ More Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have degraded performance for multi-talker scenarios -- possibly due to the domain mismatch -- which severely limits their use for such applications. In this paper, we investigate the adaptation of upstream SSL models to the multi-talker automatic speech recognition (ASR) task under two conditions. First, when segmented utterances are given, we show that adding a target speaker extraction (TSE) module based on enrollment embeddings is complementary to mixture-aware pre-training. Second, for unsegmented mixtures, we propose a novel joint speaker modeling (JSM) approach, which aggregates information from all speakers in the mixture through their embeddings. With controlled experiments on Libri2Mix, we show that using speaker embeddings provides relative WER improvements of 9.1% and 42.1% over strong baselines for the segmented and unsegmented cases, respectively. We also demonstrate the effectiveness of our models for real conversational mixtures through experiments on the AMI dataset. △ Less

Submitted 1 November, 2022; originally announced November 2022.

Comments: submitted to ICASSP 2023

arXiv:2210.14567 [pdf, other]

Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization

Authors: Hexin Liu, Haihua Xu, Leibny Paola Garcia, Andy W. H. Khong, Yi He, Sanjeev Khudanpur

Abstract: Code-switching (CS) refers to the phenomenon that languages switch within a speech signal and leads to language confusion for automatic speech recognition (ASR). This paper aims to address language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information in the CS-ASR model by dynamically biasing the model with… ▽ More Code-switching (CS) refers to the phenomenon that languages switch within a speech signal and leads to language confusion for automatic speech recognition (ASR). This paper aims to address language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information in the CS-ASR model by dynamically biasing the model with token-level language posteriors which are outputs of a sequence-to-sequence auxiliary language diarization module. In contrast, the disentangling process reduces the difference between languages via adversarial training so as to normalize two languages. We conduct the experiments on the SEAME dataset. Compared to the baseline model, both the joint optimization with LD and the language posterior bias achieve performance improvement. The comparison of the proposed methods indicates that incorporating language information is more effective than disentangling for reducing language confusion in CS speech. △ Less

Submitted 26 October, 2022; originally announced October 2022.

Comments: Submitted to ICASSP 2023

arXiv:2210.07189 [pdf, other]

On Compressing Sequences for Self-Supervised Speech Models

Authors: Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-yi Lee, Hao Tang

Abstract: Compressing self-supervised models has become increasingly necessary, as self-supervised models become larger. While previous approaches have primarily focused on compressing the model size, shortening sequences is also effective in reducing the computational cost. In this work, we study fixed-length and variable-length subsampling along the time axis in self-supervised learning. We explore how in… ▽ More Compressing self-supervised models has become increasingly necessary, as self-supervised models become larger. While previous approaches have primarily focused on compressing the model size, shortening sequences is also effective in reducing the computational cost. In this work, we study fixed-length and variable-length subsampling along the time axis in self-supervised learning. We explore how individual downstream tasks are sensitive to input frame rates. Subsampling while training self-supervised models not only improves the overall performance on downstream tasks under certain frame rates, but also brings significant speed-up in inference. Variable-length subsampling performs particularly well under low frame rates. In addition, if we have access to phonetic boundaries, we find no degradation in performance for an average frame rate as low as 10 Hz. △ Less

Submitted 25 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: Accepted to IEEE SLT 2022

arXiv:2210.05581 [pdf, other]

Aggregating Crowdsourced and Automatic Judgments to Scale Up a Corpus of Anaphoric Reference for Fiction and Wikipedia Texts

Authors: Juntao Yu, Silviu Paun, Maris Camilleri, Paloma Carretero Garcia, Jon Chamberlain, Udo Kruschwitz, Massimo Poesio

Abstract: Although several datasets annotated for anaphoric reference/coreference exist, even the largest such datasets have limitations in terms of size, range of domains, coverage of anaphoric phenomena, and size of documents included. Yet, the approaches proposed to scale up anaphoric annotation haven't so far resulted in datasets overcoming these limitations. In this paper, we introduce a new release of… ▽ More Although several datasets annotated for anaphoric reference/coreference exist, even the largest such datasets have limitations in terms of size, range of domains, coverage of anaphoric phenomena, and size of documents included. Yet, the approaches proposed to scale up anaphoric annotation haven't so far resulted in datasets overcoming these limitations. In this paper, we introduce a new release of a corpus for anaphoric reference labelled via a game-with-a-purpose. This new release is comparable in size to the largest existing corpora for anaphoric reference due in part to substantial activity by the players, in part thanks to the use of a new resolve-and-aggregate paradigm to 'complete' markable annotations through the combination of an anaphoric resolver and an aggregation method for anaphoric reference. The proposed method could be adopted to greatly speed up annotation time in other projects involving games-with-a-purpose. In addition, the corpus covers genres for which no comparable size datasets exist (Fiction and Wikipedia); it covers singletons and non-referring expressions; and it includes a substantial number of long documents (> 2K in length). △ Less

Submitted 11 October, 2022; originally announced October 2022.

arXiv:2210.03459 [pdf, other]

Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization

Authors: Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola Garcia

Abstract: Due to the high performance of multi-channel speech processing, we can use the outputs from a multi-channel model as teacher labels when training a single-channel model with knowledge distillation. To the contrary, it is also known that single-channel speech data can benefit multi-channel models by mixing it with multi-channel speech data during training or by using it for model pretraining. This… ▽ More Due to the high performance of multi-channel speech processing, we can use the outputs from a multi-channel model as teacher labels when training a single-channel model with knowledge distillation. To the contrary, it is also known that single-channel speech data can benefit multi-channel models by mixing it with multi-channel speech data during training or by using it for model pretraining. This paper focuses on speaker diarization and proposes to conduct the above bi-directional knowledge transfer alternately. We first introduce an end-to-end neural diarization model that can handle both single- and multi-channel inputs. Using this model, we alternately conduct i) knowledge distillation from a multi-channel model to a single-channel model and ii) finetuning from the distilled single-channel model to a multi-channel model. Experimental results on two-speaker data show that the proposed method mutually improved single- and multi-channel speaker diarization performances. △ Less

Submitted 7 October, 2022; originally announced October 2022.

Comments: Accepted to IEEE SLT 2022

arXiv:2210.03221 [pdf, other]

PQLM -- Multilingual Decentralized Portable Quantum Language Model for Privacy Protection

Authors: Shuyue Stella Li, Xiangyu Zhang, Shu Zhou, Hongchao Shu, Ruixing Liang, Hexin Liu, Leibny Paola Garcia

Abstract: With careful manipulation, malicious agents can reverse engineer private information encoded in pre-trained language models. Security concerns motivate the development of quantum pre-training. In this work, we propose a highly Portable Quantum Language Model (PQLM) that can easily transmit information to downstream tasks on classical machines. The framework consists of a cloud PQLM built with rand… ▽ More With careful manipulation, malicious agents can reverse engineer private information encoded in pre-trained language models. Security concerns motivate the development of quantum pre-training. In this work, we propose a highly Portable Quantum Language Model (PQLM) that can easily transmit information to downstream tasks on classical machines. The framework consists of a cloud PQLM built with random Variational Quantum Classifiers (VQC) and local models for downstream applications. We demonstrate the ad hoc portability of the quantum model by extracting only the word embeddings and effectively applying them to downstream tasks on classical machines. Our PQLM exhibits comparable performance to its classical counterpart on both intrinsic evaluation (loss, perplexity) and extrinsic evaluation (multilingual sentiment analysis accuracy) metrics. We also perform ablation studies on the factors affecting PQLM performance to analyze model stability. Our work establishes a theoretical foundation for a portable quantum pre-trained language model that could be trained on private data and made available for public use with privacy protection guarantees. △ Less

Submitted 26 February, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

Comments: 5 pages, 3 figures, 3 tables

arXiv:2209.12702 [pdf, other]

End-to-End Lyrics Recognition with Self-supervised Learning

Authors: Xiangyu Zhang, Shuyue Stella Li, Zhanhong He, Roberto Togneri, Leibny Paola Garcia

Abstract: Lyrics recognition is an important task in music processing. Despite traditional algorithms such as the hybrid HMM- TDNN model achieving good performance, studies on applying end-to-end models and self-supervised learning (SSL) are limited. In this paper, we first establish an end-to-end baseline for lyrics recognition and then explore the performance of SSL models on lyrics recognition task. We e… ▽ More Lyrics recognition is an important task in music processing. Despite traditional algorithms such as the hybrid HMM- TDNN model achieving good performance, studies on applying end-to-end models and self-supervised learning (SSL) are limited. In this paper, we first establish an end-to-end baseline for lyrics recognition and then explore the performance of SSL models on lyrics recognition task. We evaluate a variety of upstream SSL models with different training methods (masked reconstruction, masked prediction, autoregressive reconstruction, and contrastive learning). Our end-to-end self-supervised models, evaluated on the DAMP music dataset, outperform the previous state-of-the-art (SOTA) system by 5.23% for the dev set and 2.4% for the test set even without a language model trained by a large corpus. Moreover, we investigate the effect of background music on the performance of self-supervised learning models and conclude that the SSL models cannot extract features efficiently in the presence of background music. Finally, we study the out-of-domain generalization ability of the SSL features considering that those models were not trained on music datasets. △ Less

Submitted 26 October, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: 4 pages, 2 figures, 3 tables

arXiv:2208.02693 [pdf, other]

doi 10.1080/01431161.2023.2197130

Relict landslide detection using Deep-Learning architectures for image segmentation in rainforest areas: A new framework

Authors: Guilherme P. B. Garcia, Carlos H. Grohmann, Lucas P. Soares, Mateus Espadoto

Abstract: Landslides are destructive and recurrent natural disasters on steep slopes and represent a risk to lives and properties. Knowledge of relict landslides location is vital to understand their mechanisms, update inventory maps and improve risk assessment. However, relict landslide mapping is complex in tropical regions covered with rainforest vegetation. A new CNN framework is proposed for semi-autom… ▽ More Landslides are destructive and recurrent natural disasters on steep slopes and represent a risk to lives and properties. Knowledge of relict landslides location is vital to understand their mechanisms, update inventory maps and improve risk assessment. However, relict landslide mapping is complex in tropical regions covered with rainforest vegetation. A new CNN framework is proposed for semi-automatic detection of relict landslides, which uses a dataset generated by a k-means clustering algorithm and has a pre-training step. The weights computed in the pre-training are used to fine-tune the CNN training process. A comparison between the proposed and the standard framework is performed using CBERS-04A WPM images. Three CNNs for semantic segmentation are used (Unet, FPN, Linknet) with two augmented datasets. A total of 42 combinations of CNNs are tested. Values of precision and recall were very similar between the combinations tested. Recall was higher than 75% for every combination, but precision values were usually smaller than 20%. False positives (FP) samples were addressed as the cause for these low precision values. Predictions of the proposed framework were more accurate and correctly detected more landslides. This work demonstrates that there are limitations for detecting relict landslides in areas covered with rainforest, mainly related to similarities between the spectral response of pastures and deforested areas with Gleichenella sp. ferns, commonly used as an indicator of landslide scars. △ Less

Submitted 29 May, 2023; v1 submitted 4 August, 2022; originally announced August 2022.

arXiv:2206.02432 [pdf, other]

doi 10.1109/TASLP.2022.3233237

Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors

Authors: Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, Yohei Kawaguchi

Abstract: A method to perform offline and online speaker diarization for an unlimited number of speakers is described in this paper. End-to-end neural diarization (EEND) has achieved overlap-aware speaker diarization by formulating it as a multi-label classification problem. It has also been extended for a flexible number of speakers by introducing speaker-wise attractors. However, the output number of spea… ▽ More A method to perform offline and online speaker diarization for an unlimited number of speakers is described in this paper. End-to-end neural diarization (EEND) has achieved overlap-aware speaker diarization by formulating it as a multi-label classification problem. It has also been extended for a flexible number of speakers by introducing speaker-wise attractors. However, the output number of speakers of attractor-based EEND is empirically capped; it cannot deal with cases where the number of speakers appearing during inference is higher than that during training because its speaker counting is trained in a fully supervised manner. Our method, EEND-GLA, solves this problem by introducing unsupervised clustering into attractor-based EEND. In the method, the input audio is first divided into short blocks, then attractor-based diarization is performed for each block, and finally, the results of each block are clustered on the basis of the similarity between locally-calculated attractors. While the number of output speakers is limited within each block, the total number of speakers estimated for the entire input can be higher than the limitation. To use EEND-GLA in an online manner, our method also extends the speaker-tracing buffer, which was originally proposed to enable online inference of conventional EEND. We introduce a block-wise buffer update to make the speaker-tracing buffer compatible with EEND-GLA. Finally, to improve online diarization, our method improves the buffer update method and revisits the variable chunk-size training of EEND. The experimental results demonstrate that EEND-GLA can perform speaker diarization of an unseen number of speakers in both offline and online inferences. △ Less

Submitted 22 December, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: Accepted to IEEE/ACM TASLP

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 706-720, 2023

arXiv:2201.04093 [pdf, other]

Systematic Literature Review: Quantum Machine Learning and its applications

Authors: David Peral García, Juan Cruz-Benito, Francisco José García-Peñalvo

Abstract: Quantum computing is the process of performing calculations using quantum mechanics. This field studies the quantum behavior of certain subatomic particles for subsequent use in performing calculations, as well as for large-scale information processing. These capabilities can give quantum computers an advantage in terms of computational time and cost over classical computers. Nowadays, there are s… ▽ More Quantum computing is the process of performing calculations using quantum mechanics. This field studies the quantum behavior of certain subatomic particles for subsequent use in performing calculations, as well as for large-scale information processing. These capabilities can give quantum computers an advantage in terms of computational time and cost over classical computers. Nowadays, there are scientific challenges that are impossible to perform by classical computation due to computational complexity or the time the calculation would take, and quantum computation is one of the possible answers. However, current quantum devices have not yet the necessary qubits and are not fault-tolerant enough to achieve these goals. Nonetheless, there are other fields like machine learning or chemistry where quantum computation could be useful with current quantum devices. This manuscript aims to present a Systematic Literature Review of the papers published between 2017 and 2023 to identify, analyze and classify the different algorithms used in quantum machine learning and their applications. Consequently, this study identified 94 articles that used quantum machine learning techniques and algorithms. The main types of found algorithms are quantum implementations of classical machine learning algorithms, such as support vector machines or the k-nearest neighbor model, and classical deep learning algorithms, like quantum neural networks. Many articles try to solve problems currently answered by classical machine learning but using quantum devices and algorithms. Even though results are promising, quantum machine learning is far from achieving its full potential. An improvement in the quantum hardware is required since the existing quantum computers lack enough quality, speed, and scale to allow quantum computing to achieve its full potential. △ Less

Submitted 6 December, 2023; v1 submitted 11 January, 2022; originally announced January 2022.

Comments: 28 pages, 25 figures

arXiv:2110.04694 [pdf, other]

Multi-Channel End-to-End Neural Diarization with Distributed Microphones

Authors: Shota Horiguchi, Yuki Takashima, Paola Garcia, Shinji Watanabe, Yohei Kawaguchi

Abstract: Recent progress on end-to-end neural diarization (EEND) has enabled overlap-aware speaker diarization with a single neural network. This paper proposes to enhance EEND by using multi-channel signals from distributed microphones. We replace Transformer encoders in EEND with two types of encoders that process a multi-channel input: spatio-temporal and co-attention encoders. Both are independent of t… ▽ More Recent progress on end-to-end neural diarization (EEND) has enabled overlap-aware speaker diarization with a single neural network. This paper proposes to enhance EEND by using multi-channel signals from distributed microphones. We replace Transformer encoders in EEND with two types of encoders that process a multi-channel input: spatio-temporal and co-attention encoders. Both are independent of the number and geometry of microphones and suitable for distributed microphone settings. We also propose a model adaptation method using only single-channel recordings. With simulated and real-recorded datasets, we demonstrated that the proposed method outperformed conventional EEND when a multi-channel input was given while maintaining comparable performance with a single-channel input. We also showed that the proposed method performed well even when spatial information is inoperative given multi-channel inputs, such as in hybrid meetings in which the utterances of multiple remote participants are played back from the same loudspeaker. △ Less

Submitted 28 March, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

Comments: Accepted to ICASSP 2022

arXiv:2109.11224 [pdf, other]

A Novel Open Set Energy-based Flow Classifier for Network Intrusion Detection

Authors: Manuela M. C. Souza, Camila Pontes, Joao Gondim, Luis P. F. Garcia, Luiz DaSilva, Marcelo A. Marotta

Abstract: Network intrusion detection systems (NIDS) are one of many solutions that make up a computer security system. Several machine learning-based NIDS have been proposed in recent years, but most of them were developed and evaluated under the assumption that the training context is similar to the test context. In real networks, this assumption is false, given the emergence of new attacks and variants o… ▽ More Network intrusion detection systems (NIDS) are one of many solutions that make up a computer security system. Several machine learning-based NIDS have been proposed in recent years, but most of them were developed and evaluated under the assumption that the training context is similar to the test context. In real networks, this assumption is false, given the emergence of new attacks and variants of known attacks. To deal with this reality, the open set recognition field, which is the most general task of recognizing classes not seen during training in any domain, began to gain importance in NIDS research. Yet, existing solutions are often bounded to high temporal complexities and performance bottlenecks. In this work, we propose an algorithm to be used in NIDS that performs open set recognition. Our proposal is an adaptation of the single-class Energy-based Flow Classifier (EFC), which proved to be an algorithm with strong generalization capability and low computational cost. The new version of EFC correctly classifies not only known attacks, but also unknown ones, and differs from other proposals from the literature by presenting a single layer with low temporal complexity. Our proposal was evaluated against well-established multi-class algorithms and as an open set classifier. It proved to be an accurate classifier in both evaluations, similar to the state of the art. As a conclusion of our work, we consider EFC a promising algorithm to be used in NIDS for its high performance and applicability in real networks. △ Less

Submitted 26 April, 2022; v1 submitted 23 September, 2021; originally announced September 2021.

arXiv:2109.05935 [pdf, other]

doi 10.1016/j.chaos.2022.112872

Modeling Systems with Machine Learning based Differential Equations

Authors: Pedro Garcia

Abstract: The prediction of behavior in dynamical systems, is frequently subject to the design of models. When a time series obtained from observing the system is available, the task can be performed by designing the model from these observations without additional assumptions or by assuming a preconceived structure in the model, with the help of additional information about the system. In the second case,… ▽ More The prediction of behavior in dynamical systems, is frequently subject to the design of models. When a time series obtained from observing the system is available, the task can be performed by designing the model from these observations without additional assumptions or by assuming a preconceived structure in the model, with the help of additional information about the system. In the second case, it is a question of adequately combining theory with observations and subsequently optimizing the mixture. In this work, we proposes the design of time-continuous models of dynamical systems as solutions of differential equations, from non-uniform sampled or noisy observations, using machine learning techniques. The performance of strategy is shown with both, several simulated data sets and experimental data from Hare-Lynx population and Coronavirus 2019 outbreack. Our results suggest that this approach to the modeling systems, can be an useful technique in the case of synthetic or experimental data. △ Less

Submitted 9 September, 2021; originally announced September 2021.

arXiv:2107.01545 [pdf, other]

Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors

Authors: Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yawen Xue, Yuki Takashima, Yohei Kawaguchi

Abstract: Attractor-based end-to-end diarization is achieving comparable accuracy to the carefully tuned conventional clustering-based methods on challenging datasets. However, the main drawback is that it cannot deal with the case where the number of speakers is larger than the one observed during training. This is because its speaker counting relies on supervised learning. In this work, we introduce an un… ▽ More Attractor-based end-to-end diarization is achieving comparable accuracy to the carefully tuned conventional clustering-based methods on challenging datasets. However, the main drawback is that it cannot deal with the case where the number of speakers is larger than the one observed during training. This is because its speaker counting relies on supervised learning. In this work, we introduce an unsupervised clustering process embedded in the attractor-based end-to-end diarization. We first split a sequence of frame-wise embeddings into short subsequences and then perform attractor-based diarization for each subsequence. Given subsequence-wise diarization results, inter-subsequence speaker correspondence is obtained by unsupervised clustering of the vectors computed from the attractors from all the subsequences. This makes it possible to produce diarization results of a large number of speakers for the whole recording even if the number of output speakers for each subsequence is limited. Experimental results showed that our method could produce accurate diarization results of an unseen number of speakers. Our method achieved 11.84 %, 28.33 %, and 19.49 % on the CALLHOME, DIHARD II, and DIHARD III datasets, respectively, each of which is better than the conventional end-to-end diarization methods. △ Less

Submitted 23 September, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

Comments: Accepted to ASRU 2021

arXiv:2106.10654 [pdf, other]

doi 10.1109/TASLP.2022.3162080

Encoder-Decoder Based Attractors for End-to-End Neural Diarization

Authors: Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Paola Garcia

Abstract: This paper investigates an end-to-end neural diarization (EEND) method for an unknown number of speakers. In contrast to the conventional cascaded approach to speaker diarization, EEND methods are better in terms of speaker overlap handling. However, EEND still has a disadvantage in that it cannot deal with a flexible number of speakers. To remedy this problem, we introduce encoder-decoder-based a… ▽ More This paper investigates an end-to-end neural diarization (EEND) method for an unknown number of speakers. In contrast to the conventional cascaded approach to speaker diarization, EEND methods are better in terms of speaker overlap handling. However, EEND still has a disadvantage in that it cannot deal with a flexible number of speakers. To remedy this problem, we introduce encoder-decoder-based attractor calculation module (EDA) to EEND. Once frame-wise embeddings are obtained, EDA sequentially generates speaker-wise attractors on the basis of a sequence-to-sequence method using an LSTM encoder-decoder. The attractor generation continues until a stopping condition is satisfied; thus, the number of attractors can be flexible. Diarization results are then estimated as dot products of the attractors and embeddings. The embeddings from speaker overlaps result in larger dot product values with multiple attractors; thus, this method can deal with speaker overlaps. Because the maximum number of output speakers is still limited by the training set, we also propose an iterative inference method to remove this restriction. Further, we propose a method that aligns the estimated diarization results with the results of an external speech activity detector, which enables fair comparison against cascaded approaches. Extensive evaluations on simulated and real datasets show that EEND-EDA outperforms the conventional cascaded approach. △ Less

Submitted 28 March, 2022; v1 submitted 20 June, 2021; originally announced June 2021.

Comments: Accepted to IEEE/ACM TASLP. This article is based on our previous conference paper arxiv:2005.09921

arXiv:2106.04764 [pdf, other]

Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization

Authors: Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Paola García, Kenji Nagamatsu

Abstract: In this paper, we present a semi-supervised training technique using pseudo-labeling for end-to-end neural diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially in the case of overlapping speech. However, to get a well-tuned model, EEND requires labeled data for all the joint speech activities of every speaker at each tim… ▽ More In this paper, we present a semi-supervised training technique using pseudo-labeling for end-to-end neural diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially in the case of overlapping speech. However, to get a well-tuned model, EEND requires labeled data for all the joint speech activities of every speaker at each time frame in a recording. In this paper, we explore a pseudo-labeling approach that employs unlabeled data. First, we propose an iterative pseudo-label method for EEND, which trains the model using unlabeled data of a target condition. Then, we also propose a committee-based training method to improve the performance of EEND. To evaluate our proposed method, we conduct the experiments of model adaptation using labeled and unlabeled data. Experimental results on the CALLHOME dataset show that our proposed pseudo-label achieved a 37.4% relative diarization error rate reduction compared to a seed model. Moreover, we analyzed the results of semi-supervised adaptation with pseudo-labeling. We also show the effectiveness of our approach on the third DIHARD dataset. △ Less

Submitted 8 June, 2021; originally announced June 2021.

Comments: Accepted for Interspeech 2021

arXiv:2106.04078 [pdf, other]

End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection

Authors: Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, Kenji Nagamatsu

Abstract: In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially in the case of overlapping speech. In this paper, to further improve the performance of the EEND system, we propose a novel multitask learning framework that solves speaker… ▽ More In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially in the case of overlapping speech. In this paper, to further improve the performance of the EEND system, we propose a novel multitask learning framework that solves speaker diarization and a desired subtask while explicitly considering the task dependency. We optimize speaker diarization conditioned on speech activity and overlap detection that are subtasks of speaker diarization, based on the probabilistic chain rule. Experimental results show that our proposed method can leverage a subtask to effectively model speaker diarization, and outperforms conventional EEND systems in terms of diarization error rate. △ Less

Submitted 7 June, 2021; originally announced June 2021.

Comments: Accepted for SLT 2021

Journal ref: IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 849-856

arXiv:2103.06732 [pdf]

doi 10.48011/asba.v2i1.1144

Swarm Robots in Agriculture

Authors: Daniel Albiero, Angel Pontin Garcia, Claudio Kiyoshi Umezu, Rodrigo Leme de Paulo

Abstract: Agricultural mechanization is an area of knowledge that has evolved a lot over the past century, its main actors being agricultural tractors that, in 100 years, have increased their powers by 3,300%. This evolution has resulted in an exponential increase in the field capacity of such machines. However, it has also generated negative results such as excessive consumption of fossil fuel, excessive w… ▽ More Agricultural mechanization is an area of knowledge that has evolved a lot over the past century, its main actors being agricultural tractors that, in 100 years, have increased their powers by 3,300%. This evolution has resulted in an exponential increase in the field capacity of such machines. However, it has also generated negative results such as excessive consumption of fossil fuel, excessive weight on the soil, very high operating costs, and millionaire acquisition value. This paper aims to present an antiparadigmatic alternative in this area. It is proposing a swarm of small electric robotic tractors that together have the same field capacity as a large tractor with an internal combustion engine. A comparison of costs and field capacity between a 270 kW tractor and a swarm of ten swarm tractors of 24 kW each was carried out. The result demonstrated a wide advantage for the small robot team. It was also proposed the preliminary design of an electric swarm robot tractor. Finally, research challenges were suggested to operationalize such a proposal, calling on the Brazilian Robotics Research Community to elaborate a roadmap for research in the area of swarm robot for mechanized agricultural operations. △ Less

Submitted 11 March, 2021; originally announced March 2021.

Comments: Paper published in Brazilian Congress of Automatic. Porto Alegre, 2020

arXiv:2102.01363 [pdf, other]

The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap

Authors: Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur

Abstract: This paper provides a detailed description of the Hitachi-JHU system that was submitted to the Third DIHARD Speech Diarization Challenge. The system outputs the ensemble results of the five subsystems: two x-vector-based subsystems, two end-to-end neural diarization-based subsystems, and one hybrid subsystem. We refine each system and all five subsystems become competitive and complementary. After… ▽ More This paper provides a detailed description of the Hitachi-JHU system that was submitted to the Third DIHARD Speech Diarization Challenge. The system outputs the ensemble results of the five subsystems: two x-vector-based subsystems, two end-to-end neural diarization-based subsystems, and one hybrid subsystem. We refine each system and all five subsystems become competitive and complementary. After the DOVER-Lap based system combination, it achieved diarization error rates of 11.58 % and 14.09 % in Track 1 full and core, and 16.94 % and 20.01 % in Track 2 full and core, respectively. With their results, we won second place in all the tasks of the challenge. △ Less

Submitted 2 February, 2021; originally announced February 2021.

arXiv:2101.08473 [pdf, other]

Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers

Authors: Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Paola Garcia, Kenji Nagamatsu

Abstract: We propose a streaming diarization method based on an end-to-end neural diarization (EEND) model, which handles flexible numbers of speakers and overlapping speech. In our previous study, the speaker-tracing buffer (STB) mechanism was proposed to achieve a chunk-wise streaming diarization using a pre-trained EEND model. STB traces the speaker information in previous chunks to map the speakers in a… ▽ More We propose a streaming diarization method based on an end-to-end neural diarization (EEND) model, which handles flexible numbers of speakers and overlapping speech. In our previous study, the speaker-tracing buffer (STB) mechanism was proposed to achieve a chunk-wise streaming diarization using a pre-trained EEND model. STB traces the speaker information in previous chunks to map the speakers in a new chunk. However, it only worked with two-speaker recordings. In this paper, we propose an extended STB for flexible numbers of speakers, FLEX-STB. The proposed method uses a zero-padding followed by speaker-tracing, which alleviates the difference in the number of speakers between a buffer and a current chunk. We also examine buffer update strategies to select important frames for tracing multiple speakers. Experiments on CALLHOME and DIHARD II datasets show that the proposed method achieves comparable performance to the offline EEND method with 1-second latency. The results also show that our proposed method outperforms recently proposed chunk-wise diarization methods based on EEND (BW-EDA-EEND). △ Less

Submitted 6 April, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

arXiv:2012.10055 [pdf, other]

End-to-End Speaker Diarization as Post-Processing

Authors: Shota Horiguchi, Paola Garcia, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu

Abstract: This paper investigates the utilization of an end-to-end diarization model as post-processing of conventional clustering-based diarization. Clustering-based diarization methods partition frames into clusters of the number of speakers; thus, they typically cannot handle overlapping speech because each frame is assigned to one speaker. On the other hand, some end-to-end diarization methods can handl… ▽ More This paper investigates the utilization of an end-to-end diarization model as post-processing of conventional clustering-based diarization. Clustering-based diarization methods partition frames into clusters of the number of speakers; thus, they typically cannot handle overlapping speech because each frame is assigned to one speaker. On the other hand, some end-to-end diarization methods can handle overlapping speech by treating the problem as multi-label classification. Although some methods can treat a flexible number of speakers, they do not perform well when the number of speakers is large. To compensate for each other's weakness, we propose to use a two-speaker end-to-end diarization method as post-processing of the results obtained by a clustering-based method. We iteratively select two speakers from the results and update the results of the two speakers to improve the overlapped region. Experimental results show that the proposed algorithm consistently improved the performance of the state-of-the-art methods across CALLHOME, AMI, and DIHARD II datasets. △ Less

Submitted 23 December, 2020; v1 submitted 18 December, 2020; originally announced December 2020.

arXiv:2008.06004 [pdf, other]

doi 10.1145/3372297.3421761

Déjà Vu: Side-Channel Analysis of Mozilla's NSS

Authors: Sohaib ul Hassan, Iaroslav Gridin, Ignacio M. Delgado-Lozano, Cesar Pereida García, Jesús-Javier Chi-Domínguez, Alejandro Cabrera Aldaya, Billy Bob Brumley

Abstract: Recent work on Side Channel Analysis (SCA) targets old, well-known vulnerabilities, even previously exploited, reported, and patched in high-profile cryptography libraries. Nevertheless, researchers continue to find and exploit the same vulnerabilities in old and new products, highlighting a big issue among vendors: effectively tracking and fixing security vulnerabilities when disclosure is not do… ▽ More Recent work on Side Channel Analysis (SCA) targets old, well-known vulnerabilities, even previously exploited, reported, and patched in high-profile cryptography libraries. Nevertheless, researchers continue to find and exploit the same vulnerabilities in old and new products, highlighting a big issue among vendors: effectively tracking and fixing security vulnerabilities when disclosure is not done directly to them. In this work, we present another instance of this issue by performing the first library-wide SCA security evaluation of Mozilla's NSS security library. We use a combination of two independently-developed SCA security frameworks to identify and test security vulnerabilities. Our evaluation uncovers several new vulnerabilities in NSS affecting DSA, ECDSA, and RSA cryptosystems. We exploit said vulnerabilities and implement key recovery attacks using signals---extracted through different techniques such as timing, microarchitecture, and EM---and improved lattice methods. △ Less

Submitted 13 August, 2020; originally announced August 2020.

Comments: To appear at ACM CCS 2020

arXiv:2008.02931 [pdf, other]

doi 10.4204/EPTCS.320.4

From Big-Step to Small-Step Semantics and Back with Interpreter Specialisation

Authors: John P. Gallagher, Manuel Hermenegildo, Bishoksan Kafle, Maximiliano Klemen, Pedro López García, José Morales

Abstract: We investigate representations of imperative programs as constrained Horn clauses. Starting from operational semantics transition rules, we proceed by writing interpreters as constrained Horn clause programs directly encoding the rules. We then specialise an interpreter with respect to a given source program to achieve a compilation of the source language to Horn clauses (an instance of the first… ▽ More We investigate representations of imperative programs as constrained Horn clauses. Starting from operational semantics transition rules, we proceed by writing interpreters as constrained Horn clause programs directly encoding the rules. We then specialise an interpreter with respect to a given source program to achieve a compilation of the source language to Horn clauses (an instance of the first Futamura projection). The process is described in detail for an interpreter for a subset of C, directly encoding the rules of big-step operational semantics for C. A similar translation based on small-step semantics could be carried out, but we show an approach to obtaining a small-step representation using a linear interpreter for big-step Horn clauses. This interpreter is again specialised to achieve the translation from big-step to small-step style. The linear small-step program can be transformed back to a big-step non-linear program using a third interpreter. A regular path expression is computed for the linear program using Tarjan's algorithm, and this regular expression then guides an interpreter to compute a program path. The transformation is realised by specialisation of the path interpreter. In all of the transformation phases, we use an established partial evaluator and exploit standard logic program transformation to remove redundant data structures and arguments in predicates and rename predicates to make clear their link to statements in the original source program. △ Less

Submitted 6 August, 2020; originally announced August 2020.

Comments: In Proceedings VPT/HCVS 2020, arXiv:2008.02483

Journal ref: EPTCS 320, 2020, pp. 50-64

arXiv:2007.05963 [pdf, other]

doi 10.1016/j.ssci.2021.105215

CellEVAC: An adaptive guidance system for crowd evacuation through behavioral optimization

Authors: Miguel A. Lopez-Carmona, Alvaro Paricio Garcia

Abstract: A critical aspect of crowds' evacuation processes is the dynamism of individual decision making. Here, we investigate how to favor a coordinated group dynamic through optimal exit-choice instructions using behavioral strategy optimization. We propose and evaluate an adaptive guidance system (Cell-based Crowd Evacuation, CellEVAC) that dynamically allocates colors to cells in a cell-based pedestria… ▽ More A critical aspect of crowds' evacuation processes is the dynamism of individual decision making. Here, we investigate how to favor a coordinated group dynamic through optimal exit-choice instructions using behavioral strategy optimization. We propose and evaluate an adaptive guidance system (Cell-based Crowd Evacuation, CellEVAC) that dynamically allocates colors to cells in a cell-based pedestrian positioning infrastructure, to provide efficient exit-choice indications. The operational module of CellEVAC implements an optimized discrete-choice model that integrates the influential factors that would make evacuees adapt their exit choice. To optimize the model, we used a simulation-optimization modeling framework that integrates microscopic pedestrian simulation based on the classical Social Force Model. We paid particular attention to safety by using Pedestrian Fundamental Diagrams that model the dynamics of the exit gates. CellEVAC has been tested in a simulated real scenario (Madrid Arena) under different external pedestrian flow patterns that simulate complex pedestrian interactions. Results showed that CellEVAC outperforms evacuation processes in which the system is not used, with an exponential improvement as interactions become complex. We compared our system with an existing approach based on Cartesian Genetic Programming. Our system exhibited a better overall performance in terms of safety, evacuation time, and the number of revisions of exit-choice decisions. Further analyses also revealed that Cartesian Genetic Programming generates less natural pedestrian reactions and movements than CellEVAC. The fact that the decision logic module is built upon a behavioral model seems to favor a more natural and effective response. We also found that our proposal has a positive influence on evacuations even for a low compliance rate (40%). △ Less

Submitted 18 May, 2021; v1 submitted 12 July, 2020; originally announced July 2020.

Comments: 47 pages, 26 figures

ACM Class: I.6.4

Showing 1–50 of 63 results for author: García, P