
Modality Agnostic Heterogeneous Face Recognition with Switch Style Modulators

Authors: Anjith George, Sebastien Marcel

Abstract: Heterogeneous Face Recognition (HFR) systems aim to enhance the capability of face recognition in challenging cross-modal authentication scenarios. However, the significant domain gap between the source and target modalities poses a considerable challenge for cross-domain matching. Existing literature primarily focuses on developing HFR approaches for specific pairs of face modalities, necessitating the explicit training of models for each source-target combination. In this work, we introduce a novel framework designed to train a modality-agnostic HFR method capable of handling multiple modalities during inference, all without explicit knowledge of the target modality labels. We achieve this by implementing a computationally efficient automatic routing mechanism called Switch Style Modulation Blocks (SSMB) that trains various domain expert modulators, which transform the feature maps adaptively, reducing the domain gap. Our proposed SSMB can be trained end-to-end and seamlessly integrated into pre-trained face recognition models, transforming them into modality-agnostic HFR models. We have performed extensive evaluations on HFR benchmark datasets to demonstrate its effectiveness. The source code and protocols will be made publicly available.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 8 pages
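The abstract describes SSMB only at the level of "automatic routing" to "domain expert modulators". A minimal sketch of that general idea, assuming Switch-Transformer-style top-1 routing and per-channel scale/shift experts, is shown below. The class name, the mean-pooled router input, and the residual modulation form are all hypothetical choices for illustration, not the authors' SSMB.

```python
import math
from typing import List, Tuple

# Illustrative feature map: one list of activations per channel (spatial dims flattened).
FeatureMap = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SwitchStyleModulator:
    """Hypothetical top-1 ("switch") router over per-channel scale/shift experts."""

    def __init__(self, router_w: List[List[float]],
                 gammas: List[List[float]], betas: List[List[float]]):
        self.router_w = router_w  # [num_experts][channels], assumed learned
        self.gammas = gammas      # per-expert channel scales, assumed learned
        self.betas = betas        # per-expert channel shifts, assumed learned

    def route(self, x: FeatureMap) -> Tuple[int, float]:
        # Pool each channel to its mean, score every expert, keep only the best one.
        pooled = [sum(ch) / len(ch) for ch in x]
        logits = [sum(w * p for w, p in zip(row, pooled)) for row in self.router_w]
        probs = softmax(logits)
        k = max(range(len(probs)), key=probs.__getitem__)
        return k, probs[k]

    def __call__(self, x: FeatureMap) -> Tuple[FeatureMap, int]:
        k, gate = self.route(x)
        gamma, beta = self.gammas[k], self.betas[k]
        # Residual modulation x + gate * (gamma * x + beta), applied channel by channel.
        out = [[v + gate * (g * v + b) for v in ch]
               for ch, g, b in zip(x, gamma, beta)]
        return out, k
```

Because no modality label is passed in, the expert is chosen from the features alone, which mirrors the "without explicit knowledge of the target modality labels" property the abstract claims; scaling the expert output by the router probability is the usual way to keep the router trainable.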

arXiv:2406.15639 [pdf, other]

Low Fidelity Visuo-Tactile Pretraining Improves Vision-Only Manipulation Performance

Authors: Selam Gano, Abraham George, Amir Barati Farimani

Abstract: Tactile perception is a critical component of solving real-world manipulation tasks, but tactile sensors for manipulation have barriers to use such as fragility and cost. In this work, we engage a robust, low-cost tactile sensor, BeadSight, as an alternative to precise pre-calibrated sensors for a pretraining approach to manipulation. We show that tactile pretraining, even with a low-fidelity sensor such as BeadSight, can improve an imitation learning agent's performance on complex manipulation tasks. We demonstrate this method against a baseline USB cable plugging task, previously achieved with a much higher precision GelSight sensor as the tactile input to pretraining. Our best BeadSight pretrained visuo-tactile agent completed the task with 70% accuracy compared to 85% for the best GelSight pretrained visuo-tactile agent, with vision-only inference for both.

Submitted 25 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

arXiv:2405.13204 [pdf, other]

BeadSight: An Inexpensive Tactile Sensor Using Hydro-Gel Beads

Authors: Abraham George, Yibo Chen, Atharva Dikshit, Peter Pak, Amir Barati Farimani

Abstract: In robotic manipulation, tactile sensors are indispensable, especially when dealing with soft objects, objects of varying dimensions, or those out of the robot's direct line of sight. Traditional tactile sensors often grapple with challenges related to cost and durability. To address these issues, our study introduces a novel approach to visuo-tactile sensing with an emphasis on economy and replaceability. Our proposed sensor, BeadSight, uses hydro-gel beads encased in a vinyl bag as an economical, easily replaceable sensing medium. When the sensor makes contact with a surface, the deformation of the hydrogel beads is observed using a rear camera. This observation is then passed through a U-Net neural network to predict the forces acting on the surface of the bead bag, in the form of a pressure map. Our results show that the sensor can accurately predict these pressure maps, detecting the location and magnitude of forces applied to the surface. These abilities make BeadSight an effective, inexpensive, and easily replaceable tactile sensor, ideal for many robotics applications.

Submitted 21 May, 2024; originally announced May 2024.

Comments: BeadSight code is available at: https://github.com/Abraham190137/BeadSight 7 pages, 8 figures

arXiv:2404.14343 [pdf, other]

Heterogeneous Face Recognition Using Domain Invariant Units

Authors: Anjith George, Sebastien Marcel

Abstract: Heterogeneous Face Recognition (HFR) aims to expand the applicability of Face Recognition (FR) systems to challenging scenarios, enabling the matching of face images across different domains, such as matching thermal images to visible spectra. However, the development of HFR systems is challenging because of the significant domain gap between modalities and the lack of availability of large-scale paired multi-channel data. In this work, we leverage a pretrained face recognition model as a teacher network to learn domain-invariant network layers called Domain-Invariant Units (DIU) to reduce the domain gap. The proposed DIU can be trained effectively even with a limited amount of paired training data, in a contrastive distillation framework. This proposed approach has the potential to enhance pretrained models, making them more adaptable to a wider range of variations in data. We extensively evaluate our approach on multiple challenging benchmarks, demonstrating superior performance compared to state-of-the-art methods.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 6 pages, Accepted ICASSP 2024

arXiv:2404.14247 [pdf, other]

doi 10.1109/TBIOM.2024.3365350

From Modalities to Styles: Rethinking the Domain Gap in Heterogeneous Face Recognition

Authors: Anjith George, Sebastien Marcel

Abstract: Heterogeneous Face Recognition (HFR) focuses on matching faces from different domains, for instance, thermal to visible images, making Face Recognition (FR) systems more versatile for challenging scenarios. However, the domain gap between these domains and the limited large-scale datasets in the target HFR modalities make it challenging to develop robust HFR models from scratch. In our work, we view different modalities as distinct styles and propose a method to modulate feature maps of the target modality to address the domain gap. We present a new Conditional Adaptive Instance Modulation (CAIM) module that seamlessly fits into existing FR networks, turning them into HFR-ready systems. The CAIM block modulates intermediate feature maps, efficiently adapting to the style of the source modality and bridging the domain gap. Our method enables end-to-end training using a small set of paired samples. We extensively evaluate the proposed approach on various challenging HFR benchmarks, showing that it outperforms state-of-the-art methods. The source code and protocols for reproducing the findings will be made publicly available.

Submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted for publication in IEEE TBIOM
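Treating modalities as "styles" echoes the well-known Adaptive Instance Normalization (AdaIN) idea: remove a feature map's own per-channel statistics, then impose new ones. The sketch below shows that operation for a single sample. It assumes the "conditional" part means only target-modality inputs are modulated; that flag and the directly passed `gamma`/`beta` (which a real block would presumably learn or predict) are hypothetical, and this is not the published CAIM module.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]  # one list of activations per channel (spatial dims flattened)


def channel_stats(ch: List[float], eps: float = 1e-5) -> Tuple[float, float]:
    """Per-instance, per-channel mean and standard deviation."""
    mu = sum(ch) / len(ch)
    var = sum((v - mu) ** 2 for v in ch) / len(ch)
    return mu, math.sqrt(var + eps)


def conditional_instance_modulation(x: FeatureMap, gamma: List[float],
                                    beta: List[float], is_target: bool) -> FeatureMap:
    """AdaIN-style re-styling of one sample's feature map.

    gamma/beta stand in for learned, per-channel style parameters (assumption);
    is_target is a hypothetical flag: only target-modality inputs are modulated.
    """
    if not is_target:
        return [list(ch) for ch in x]  # source-modality features pass through unchanged
    out = []
    for ch, g, b in zip(x, gamma, beta):
        mu, sigma = channel_stats(ch)
        # Strip the channel's own statistics, then impose the assumed source-style ones.
        out.append([g * (v - mu) / sigma + b for v in ch])
    return out
```

After modulation each target channel has mean close to `beta` and standard deviation close to `gamma`, which is one concrete reading of "adapting to the style of the source modality".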

arXiv:2404.10378 [pdf, other]

Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

Authors: Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, Aleksei Grigorev, Denis Timoshenko, Kaleb Mesfin Asfaw, et al. (33 additional authors not shown)

Abstract: Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated by several factors such as the lack of real data and intra-class variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at CVPR 2024. FRCSyn aims to investigate the use of synthetic data in face recognition to address current technological limitations, including data privacy concerns, demographic biases, generalization to novel scenarios, and performance constraints in challenging situations such as aging, pose variations, and occlusions. Unlike the 1st edition, in which only synthetic data from the DCFace and GANDiffFace methods was allowed for training face recognition systems, in this 2nd edition we propose new sub-tasks that allow participants to explore novel face generative methods. The outcomes of the 2nd FRCSyn Challenge, along with the proposed experimental protocol and benchmarking, contribute significantly to the application of synthetic data to face recognition.

Submitted 16 April, 2024; originally announced April 2024.

Comments: arXiv admin note: text overlap with arXiv:2311.10476

Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRw 2024)

arXiv:2404.04580 [pdf, other]

SDFR: Synthetic Data for Face Recognition Competition

Authors: Hatef Otroshi Shahreza, Christophe Ecabert, Anjith George, Alexander Unnervik, Sébastien Marcel, Nicolò Di Domenico, Guido Borghi, Davide Maltoni, Fadi Boutros, Julia Vogel, Naser Damer, Ángela Sánchez-Pérez, Enrique Mas-Candela, Jorge Calvo-Zaragoza, Bernardo Biesseck, Pedro Vidal, Roger Granada, David Menotti, Ivan DeAndres-Tame, Simone Maurizio La Cava, Sara Concas, Pietro Melzi, Ruben Tolosana, Ruben Vera-Rodriguez, Gianpaolo Perelli, et al. (3 additional authors not shown)

Abstract: Large-scale face recognition datasets are collected by crawling the Internet and without individuals' consent, raising legal, ethical, and privacy concerns. With the recent advances in generative models, several works have proposed generating synthetic face recognition datasets to mitigate concerns in web-crawled face recognition datasets. This paper presents the summary of the Synthetic Data for Face Recognition (SDFR) Competition held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024) and established to investigate the use of synthetic data for training face recognition models. The SDFR competition was split into two tasks, allowing participants to train face recognition systems using new synthetic datasets and/or existing ones. In the first task, the face recognition backbone was fixed and the dataset size was limited, while the second task provided almost complete freedom on the model backbone, the dataset, and the training pipeline. The submitted models were trained on existing and also new synthetic datasets and used clever methods to improve training with synthetic data. The submissions were evaluated and ranked on a diverse set of seven benchmarking datasets. The paper gives an overview of the submitted face recognition models and reports achieved performance compared to baseline models trained on real and synthetic datasets. Furthermore, the evaluation of submissions is extended to bias assessment across different demographic groups.
Lastly, an outlook on the current state of the research in training face recognition models using synthetic data is presented, and existing problems as well as potential future directions are also discussed.

Submitted 9 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Comments: The 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024)

arXiv:2403.11898 [pdf, other]

Visuo-Tactile Pretraining for Cable Plugging

Authors: Abraham George, Selam Gano, Pranav Katragadda, Amir Barati Farimani

Abstract: Tactile information is a critical tool for fine-grain manipulation. As humans, we rely heavily on tactile information to understand objects in our environments and how to interact with them. We use touch not only to perform manipulation tasks but also to learn how to perform these tasks. Therefore, to create robotic agents that can learn to complete manipulation tasks at a human or super-human level of performance, we need to properly incorporate tactile information into both skill execution and skill learning. In this paper, we investigate how we can incorporate tactile information into imitation learning platforms to improve performance on complex tasks. To do this, we tackle the challenge of plugging in a USB cable, a dexterous manipulation task that relies on fine-grain visuo-tactile servoing. By incorporating tactile information into imitation learning frameworks, we are able to train a robotic agent to plug in a USB cable, a first for imitation learning. Additionally, we explore how tactile information can be used to train non-tactile agents through a contrastive-loss pretraining process. Our results show that by pretraining with tactile information, the performance of a non-tactile agent can be significantly improved, reaching a level on par with visuo-tactile agents. For demonstration videos and access to our codebase, see the project website: https://sites.google.com/andrew.cmu.edu/visuo-tactile-cable-plugging/home

Submitted 18 March, 2024; originally announced March 2024.

Comments: 8 pages, 6 figures, submitted to IROS 2024
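The abstract mentions a "contrastive-loss pretraining process" without naming the loss. A common choice for pairing two modalities is a symmetric, CLIP-style InfoNCE loss, sketched below under the assumption that row i of the vision batch and row i of the tactile batch come from the same timestep. The function name and the temperature value are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
    return [x / n for x in v]


def _logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def clip_style_infonce(vision: List[Vector], tactile: List[Vector],
                       temperature: float = 0.1) -> float:
    """Symmetric InfoNCE over a batch of (vision, tactile) embedding pairs."""
    v = [_normalize(e) for e in vision]
    t = [_normalize(e) for e in tactile]
    n = len(v)
    # Cosine similarities scaled by the temperature; the diagonal holds the true pairs.
    logits = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
              for i in range(n)]
    # Vision -> tactile: the matching tactile embedding should win each row.
    loss_vt = sum(_logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Tactile -> vision: the matching vision embedding should win each column.
    loss_tv = sum(_logsumexp([logits[j][i] for j in range(n)]) - logits[i][i]
                  for i in range(n)) / n
    return 0.5 * (loss_vt + loss_tv)
```

Minimizing this pulls each visual embedding toward its own tactile reading, which is one way a vision-only encoder could absorb tactile information before tactile input is dropped at inference.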

arXiv:2403.08086 [pdf, other]

Flow-Based Visual Stream Compression for Event Cameras

Authors: Daniel C. Stumpp, Himanshu Akolkar, Alan D. George, Ryad Benosman

Abstract: As the use of neuromorphic, event-based vision sensors expands, the need for compression of their output streams has increased. While their operational principle ensures event streams are spatially sparse, the high temporal resolution of the sensors can result in high data rates from the sensor depending on scene dynamics. For systems operating in communication-bandwidth-constrained and power-constrained environments, it is essential to compress these streams before transmitting them to a remote receiver. Therefore, we introduce a flow-based method for the real-time asynchronous compression of event streams as they are generated. This method leverages real-time optical flow estimates to predict future events without needing to transmit them, thereby drastically reducing the amount of data transmitted. The flow-based compression introduced is evaluated using a variety of methods including spatiotemporal distance between event streams. The introduced method itself is shown to achieve an average compression ratio of 2.81 on a variety of event-camera datasets with the evaluation configuration used. That compression is achieved with a median temporal error of 0.48 ms and an average spatiotemporal event-stream distance of 3.07. When combined with LZMA compression for non-real-time applications, our method can achieve state-of-the-art average compression ratios ranging from 10.45 to 17.24. Additionally, we demonstrate that the proposed prediction algorithm is capable of performing real-time, low-latency event prediction.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 13 pages, 7 figures, 2 tables
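To make "predict future events without needing to transmit them" concrete, here is a deliberately simplified encoder. It assumes sender and receiver share one global optical-flow vector in px/s and a fixed prediction step `dt`; an event is sent only when no pending prediction matches it in position, polarity, and time (within `tol_t`). All names and the matching rule are hypothetical, and the scheme is lossy in a way the paper's method may handle differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    x: int
    y: int
    t: float  # timestamp in seconds
    p: int    # polarity, +1 or -1


def predict_next(e: Event, flow: Tuple[float, float], dt: float) -> Event:
    """Shift an event along the (assumed shared) optical-flow vector."""
    vx, vy = flow
    return Event(round(e.x + vx * dt), round(e.y + vy * dt), e.t + dt, e.p)


def encode(events: List[Event], flow: Tuple[float, float], dt: float,
           tol_t: float) -> List[Event]:
    """Return only the events the receiver could not have predicted itself."""
    sent: List[Event] = []
    pending: List[Event] = []  # predictions the receiver is assumed to make too
    for e in sorted(events, key=lambda ev: ev.t):
        match: Optional[Event] = next(
            (q for q in pending
             if (q.x, q.y, q.p) == (e.x, e.y, e.p) and abs(q.t - e.t) <= tol_t),
            None,
        )
        if match is None:
            sent.append(e)
            pending.append(predict_next(e, flow, dt))
        else:
            # Lossy: the receiver only knows the predicted copy, so chain from it.
            pending.remove(match)
            pending.append(predict_next(match, flow, dt))
    return sent


def compression_ratio(events: List[Event], sent: List[Event]) -> float:
    return len(events) / max(len(sent), 1)
```

For an edge moving one pixel per millisecond, only its first event needs sending; an unrelated event is sent as-is, so five edge events plus one stray event give a ratio of 6 / 2 = 3.0 in this toy setting.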

arXiv:2402.18718 [pdf, other]

Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks

Authors: Alexander Unnervik, Hatef Otroshi Shahreza, Anjith George, Sébastien Marcel

Abstract: Backdoor attacks allow an attacker to embed a specific vulnerability in a machine learning algorithm, activated when an attacker-chosen pattern is presented, causing a specific misprediction. The need to identify backdoors in biometric scenarios has led us to propose a novel technique with different trade-offs. In this paper we propose to use model pairs on open-set classification tasks for detecting backdoors. Using a simple linear operation to project embeddings from a probe model's embedding space to a reference model's embedding space, we can compare both embeddings and compute a similarity score. We show that this score can be an indicator for the presence of a backdoor despite models being of different architectures, having been trained independently and on different datasets. Additionally, we show that backdoors can be detected even when both models are backdoored. The source code is made available for reproducibility purposes.

Submitted 28 February, 2024; originally announced February 2024.

Comments: Under review
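The scoring step the abstract describes (linearly project a probe embedding into the reference space, then compare) can be sketched as below. The matrix `w` is assumed to be already fitted, for example by least squares on embeddings of clean images; that fitting step, the cosine-similarity choice, and the threshold are illustrative assumptions rather than the paper's exact procedure.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def project(w: Matrix, e: Vector) -> List[float]:
    """Linear map from the probe model's embedding space to the reference space."""
    return [sum(wij * ej for wij, ej in zip(row, e)) for row in w]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def suspicious_indices(w: Matrix, pairs: List[Tuple[Vector, Vector]],
                       threshold: float) -> List[int]:
    """Indices of inputs whose translated probe embedding disagrees with the reference one."""
    return [i for i, (probe, ref) in enumerate(pairs)
            if cosine(project(w, probe), ref) < threshold]
```

The intuition is that a trigger only hijacks the backdoored model, so for a triggered input the two models stop agreeing after translation and the score drops.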

arXiv:2401.10681 [pdf, other]

Maximizing Real-Time Video QoE via Bandwidth Sharing under Markovian setting

Authors: Sushi Anna George, Vinay Joseph

Abstract: We consider the problem of optimizing Quality of Experience (QoE) of clients streaming real-time video, served by networks managed by different operators that can share bandwidth with each other. The abundance of real-time video traffic is evident in the popularity of applications like video conferencing and video streaming of live events, which have increased significantly since the recent pandemic. We model the problem as a joint optimization of resource allocation for the clients and bandwidth sharing across the operators, with special attention to how the resource allocation impacts clients' perceived video quality. We propose an online policy as a solution, which involves dynamically sharing a portion of one operator's bandwidth with another operator. We provide strong theoretical optimality guarantees for the policy. We also use extensive simulations to demonstrate the policy's substantial performance improvements (of up to ninety percent), and identify insights into key system parameters (e.g., imbalance in arrival rates or channel conditions of the operators) that dictate the improvements.

Submitted 26 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2211.06666

arXiv:2312.11663 [pdf, other]

Eliciting Kemeny Rankings

Authors: Anne-Marie George, Christos Dimitrakakis

Abstract: We formulate the problem of eliciting agents' preferences with the goal of finding a Kemeny ranking as a Dueling Bandits problem. Here the bandits' arms correspond to alternatives that need to be ranked and the feedback corresponds to a pairwise comparison between alternatives by a randomly sampled agent. We consider both sampling with and without replacement, i.e., the possibility to ask the same agent about some comparison multiple times or not. We find approximation bounds for Kemeny rankings dependent on confidence intervals over estimated winning probabilities of arms. Based on these we state algorithms to find Probably Approximately Correct (PAC) solutions and elaborate on their sample complexity for sampling with or without replacement. Furthermore, if all agents' preferences are strict rankings over the alternatives, we provide means to prune confidence intervals and thereby guide a more efficient elicitation. We formulate several adaptive sampling methods that use look-aheads to estimate how much confidence intervals (and thus approximation guarantees) might be tightened. All described methods are compared on synthetic data.

Submitted 18 December, 2023; originally announced December 2023.

Comments: This is a long version of the AAAI'24 publication under the same title
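For reference, the objective being elicited can be written directly: given p[i][j], the fraction of agents preferring alternative i over j (the arms' winning probabilities), a Kemeny ranking minimizes the total fraction of agents who disagree with each pair the ranking orders. The sketch below assumes p is known exactly and simply searches all orderings; it covers none of the paper's bandit elicitation or confidence-interval machinery. Since computing a Kemeny ranking is NP-hard in general, exhaustive search is only sensible for a handful of alternatives.

```python
from itertools import permutations
from typing import Sequence, Tuple


def kemeny_ranking(p: Sequence[Sequence[float]]) -> Tuple[Tuple[int, ...], float]:
    """p[i][j]: fraction of agents preferring i over j. Returns (ranking, cost)."""
    n = len(p)
    best: Tuple[int, ...] = tuple(range(n))
    best_cost = float("inf")
    for order in permutations(range(n)):
        # order[a] is ranked above order[b] for a < b; agents preferring
        # order[b] over order[a] disagree with that pair.
        cost = sum(p[order[b]][order[a]] for a in range(n) for b in range(a + 1, n))
        if cost < best_cost:
            best, best_cost = order, cost
    return best, best_cost
```

With estimated rather than exact probabilities, each p[i][j] becomes an interval, and the approximation bounds in the abstract describe how far the ranking found from those estimates can be from the true optimum.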

arXiv:2312.10857 [pdf, ps, other]

Minimal Macro-Based Rewritings of Formal Languages: Theory and Applications in Ontology Engineering (and beyond)

Authors: Christian Kindermann, Anne-Marie George, Bijan Parsia, Uli Sattler

Abstract: In this paper, we introduce the problem of rewriting finite formal languages using syntactic macros such that the rewriting is minimal in size. We present polynomial-time algorithms to solve variants of this problem and show their correctness. To demonstrate the practical relevance of the proposed problems and the feasibility and effectiveness of our algorithms in practice, we apply these to biomedical ontologies authored in OWL. We find that such rewritings can significantly reduce the size of ontologies by capturing repeated expressions with macros. In addition to offering valuable assistance in enhancing ontology quality and comprehension, the presented approach introduces a systematic way of analysing and evaluating features of rewriting systems (including syntactic macros, templates, or other forms of rewriting rules) in terms of their influence on computational problems.

Submitted 17 December, 2023; originally announced December 2023.

Comments: Extended paper (including supplementary material) accepted at The 38th Annual AAAI Conference on Artificial Intelligence

arXiv:2311.10476 [pdf, other]

FRCSyn Challenge at WACV 2024: Face Recognition Challenge in the Era of Synthetic Data

Authors: Pietro Melzi, Ruben Tolosana, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Ivan DeAndres-Tame, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Weisong Zhao, Xiangyu Zhu, Zheyu Yan, Xiao-Yu Zhang, Jinlin Wu, Zhen Lei, Suvidha Tripathi, Mahak Kothari, Md Haider Zama, Debayan Deb, Bernardo Biesseck, Pedro Vidal, Roger Granada, Guilherme Fickel, Gustavo Führ, et al. (22 additional authors not shown)

Abstract: Despite the widespread adoption of face recognition technology around the world, and its remarkable performance on current benchmarks, there are still several challenges that must be covered in more detail. This paper offers an overview of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024. This is the first international challenge aiming to explore the use of synthetic data in face recognition to address existing limitations in the technology. Specifically, the FRCSyn Challenge targets concerns related to data privacy issues, demographic biases, generalization to unseen scenarios, and performance limitations in challenging scenarios, including significant age disparities between enrollment and testing, pose variations, and occlusions. The results achieved in the FRCSyn Challenge, together with the proposed benchmark, contribute significantly to the application of synthetic data to improve face recognition technology.

Submitted 17 November, 2023; originally announced November 2023.

Comments: 10 pages, 1 figure, WACV 2024 Workshops

arXiv:2309.10175 [pdf, other]

One ACT Play: Single Demonstration Behavior Cloning with Action Chunking Transformers

Authors: Abraham George, Amir Barati Farimani

Abstract: Learning from human demonstrations (behavior cloning) is a cornerstone of robot learning. However, most behavior cloning algorithms require a large number of demonstrations to learn a task, especially for general tasks that have a large variety of initial conditions. Humans, however, can learn to complete tasks, even complex ones, after only seeing one or two demonstrations. Our work seeks to emulate this ability, using behavior cloning to learn a task given only a single human demonstration. We achieve this goal by using linear transforms to augment the single demonstration, generating a set of trajectories for a wide range of initial conditions. With these demonstrations, we are able to train a behavior cloning agent to successfully complete three block manipulation tasks. Additionally, we developed a novel addition to the temporal ensembling method used by action chunking agents during inference. By incorporating the standard deviation of the action predictions into the ensembling method, our approach is more robust to unforeseen changes in the environment, resulting in significant performance improvements.

Submitted 18 September, 2023; originally announced September 2023.

Comments: 7 pages, 6 figures
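In Action Chunking with Transformers (ACT), each step's policy call predicts a chunk of future actions, so the current step has several overlapping predictions; temporal ensembling averages them with weights w_i = exp(-m * i), where w_0 belongs to the oldest prediction. The abstract says the standard deviation of those predictions is added, but not how. The sketch below pairs standard ACT-style weighting (scalar actions for simplicity) with a purely hypothetical rule: if the spread exceeds `std_threshold`, assume the environment changed and trust only the newest prediction.

```python
import math
from statistics import pstdev
from typing import List, Optional


def temporal_ensemble(preds: List[float], m: float = 0.01,
                      std_threshold: Optional[float] = None) -> float:
    """preds: every chunk's prediction for the current step, ordered oldest first."""
    if std_threshold is not None and len(preds) > 1 and pstdev(preds) > std_threshold:
        # Hypothetical rule: strongly disagreeing predictions suggest the scene changed
        # since the older chunks were predicted, so use the most recent one only.
        return preds[-1]
    # ACT-style exponential weights; index 0 (oldest) gets the largest weight.
    weights = [math.exp(-m * i) for i in range(len(preds))]
    return sum(w * a for w, a in zip(weights, preds)) / sum(weights)
```

Plain exponential averaging is smooth but slow to react when older chunks were predicted before a disturbance; a spread-based check is one simple way to trade that smoothness for responsiveness.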

arXiv:2309.08892 [pdf, other]

Pour me a drink: Robotic Precision Pouring Carbonated Beverages into Transparent Containers

Authors: Feiya Zhu, Shuo Hu, Letian Leng, Alison Bartsch, Abraham George, Amir Barati Farimani

Abstract: With the growing emphasis on the development and integration of service robots within household environments, we will need to endow robots with the ability to reliably pour a variety of liquids. However, liquid handling and pouring is a challenging task due to the complex dynamics and varying properties of different liquids, the exacting precision required to prevent spills and ensure accurate pouring, and the necessity for robots to adapt seamlessly to a multitude of containers in real-world scenarios. In response to these challenges, we propose a novel autonomous robotics pipeline that empowers robots to execute precision pouring tasks, encompassing both carbonated and non-carbonated liquids, as well as opaque and transparent liquids, into a variety of transparent containers. Our proposed approach maximizes the potential of RGB input alone, achieving zero-shot capability by harnessing existing pre-trained vision segmentation models. This eliminates the need for additional data collection, manual image annotations, or extensive training. Furthermore, our work integrates ChatGPT, facilitating seamless interaction between individuals without prior expertise in robotics and our pouring pipeline, and enabling users to effortlessly request and execute pouring actions. Our experiments demonstrate the pipeline's capability to successfully pour a diverse range of carbonated and non-carbonated beverages into containers of varying sizes, relying solely on visual input.

Submitted 19 September, 2023; v1 submitted 16 September, 2023; originally announced September 2023.

Comments: Supplementary materials will be available soon

arXiv:2308.14852 [pdf, other]

SynthDistill: Face Recognition with Knowledge Distillation from Synthetic Data

Authors: Hatef Otroshi Shahreza, Anjith George, Sébastien Marcel

Abstract: State-of-the-art face recognition networks are often computationally expensive and cannot be used for mobile applications. Training lightweight face recognition models also requires large identity-labeled datasets. Meanwhile, there are privacy and ethical concerns with collecting and using large face recognition datasets. While generating synthetic datasets for training face recognition models is an alternative option, it is challenging to generate synthetic data with sufficient intra-class variations. In addition, there is still a considerable gap between the performance of models trained on real and synthetic data. In this paper, we propose a new framework (named SynthDistill) to train lightweight face recognition models by distilling the knowledge of a pretrained teacher face recognition model using synthetic data. We use a pretrained face generator network to generate synthetic face images and use the synthesized images to train a lightweight student network. We use synthetic face images without identity labels, which sidesteps the difficulty of generating intra-class variations in synthetic datasets. Instead, we propose a novel dynamic sampling strategy from the intermediate latent space of the face generator network to include new variations of the challenging images while further exploring new face images in the training batch.
The results on five different face recognition datasets demonstrate the superiority of our lightweight model compared to models trained on previous synthetic datasets, achieving a verification accuracy of 99.52% on the LFW dataset with a lightweight network. The results also show that our proposed framework significantly reduces the gap between training with real and synthetic data. The source code for replicating the experiments is publicly released.

Submitted 28 August, 2023; originally announced August 2023.

Comments: Accepted in the IEEE International Joint Conference on Biometrics (IJCB 2023)
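The SynthDistill abstract describes training a lightweight student by distilling a pretrained teacher on unlabeled synthetic faces. As a minimal sketch, assuming the student is trained to match the teacher's L2-normalized embeddings under a mean squared error (the exact loss, the embedding size, and the helper names below are illustrative assumptions, not the published implementation), a per-batch objective could look like this:

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def distillation_loss(teacher: Sequence[Sequence[float]],
                      student: Sequence[Sequence[float]]) -> float:
    """Mean squared error between normalized teacher and student embeddings.

    Each row is the embedding of one synthetic face in the batch. No identity
    labels are involved: the teacher's output is the only training target.
    (Assumed loss, for illustration only.)
    """
    if len(teacher) != len(student) or not teacher:
        raise ValueError("teacher and student batches must be non-empty and equal in size")
    total = 0.0
    for t_vec, s_vec in zip(teacher, student):
        t, s = l2_normalize(t_vec), l2_normalize(s_vec)
        total += sum((a - b) ** 2 for a, b in zip(t, s)) / len(t)
    return total / len(teacher)


# Hypothetical 3-D embeddings for a batch of two generated faces.
teacher_batch = [[3.0, 4.0, 0.0], [0.0, 0.0, 2.0]]
student_batch = [[6.0, 8.0, 0.0], [0.0, 2.0, 0.0]]
print(distillation_loss(teacher_batch, student_batch))
```

The first pair contributes zero loss because the two embeddings point in the same direction despite different magnitudes, while the orthogonal second pair dominates. Normalizing first makes the objective depend only on embedding direction, which is what cosine-similarity verification compares.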

arXiv:2308.04168

EFaR 2023: Efficient Face Recognition Competition

Authors: Jan Niklas Kolf, Fadi Boutros, Jurek Elliesen, Markus Theuerkauf, Naser Damer, Mohamad Alansari, Oussama Abdul Hay, Sara Alansari, Sajid Javed, Naoufel Werghi, Klemen Grm, Vitomir Štruc, Fernando Alonso-Fernandez, Kevin Hernandez Diaz, Josef Bigun, Anjith George, Christophe Ecabert, Hatef Otroshi Shahreza, Ketan Kotwal, Sébastien Marcel, Iurii Medvedev, Bo Jin, Diogo Nunes, Ahmad Hassanpour, Pankaj Khatiwada, et al. (2 additional authors not shown)

Abstract: This paper presents the summary of the Efficient Face Recognition Competition (EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB 2023). The competition received 17 submissions from 6 different teams. To drive further development of efficient face recognition models, the submitted solutions are ranked based on a weighted score of the achieved verification accuracies on a diverse set of benchmarks, as well as the deployability given by the number of floating-point operations and model size. The evaluation of submissions is extended to bias, cross-quality, and large-scale recognition benchmarks. Overall, the paper gives an overview of the achieved performance values of the submitted solutions as well as a diverse set of baselines. The submitted solutions use small, efficient network architectures to reduce the computational cost; some solutions also apply model quantization. An outlook on possible techniques that are underrepresented in current solutions is given as well.

Submitted 8 August, 2023; originally announced August 2023.

Comments: Accepted at IJCB 2023

arXiv:2308.02715

Fluid Viscosity Prediction Leveraging Computer Vision and Robot Interaction

Authors: Jong Hoon Park, Gauri Pramod Dalwankar, Alison Bartsch, Abraham George, Amir Barati Farimani

Abstract: Accurately determining fluid viscosity is crucial for various industrial and scientific applications. Traditional methods of viscosity measurement, though reliable, often require manual intervention and cannot easily adapt to real-time monitoring. With advancements in machine learning and computer vision, this work explores the feasibility of predicting fluid viscosity by analyzing fluid oscillations captured in video data. The pipeline employs a 3D convolutional autoencoder pretrained in a self-supervised manner to extract and learn features from semantic segmentation masks of oscillating fluids. Then, the latent representations of the input data, produced by the pretrained autoencoder, are processed with a distinct inference head to infer either the fluid category (classification) or the fluid viscosity (regression) in a time-resolved manner. When the latent representations generated by the pretrained autoencoder are used for classification, the system achieves 97.1% accuracy across a total of 4,140 test datapoints. Similarly, for regression tasks, employing an additional fully-connected network as a regression head allows the pipeline to achieve a mean absolute error of 0.258 over 4,416 test datapoints.
This study represents an innovative contribution to both fluid characterization and the evolving landscape of Artificial Intelligence, demonstrating the potential of deep learning in achieving near real-time viscosity estimation and addressing practical challenges in fluid dynamics through the analysis of video of oscillating fluids.

Submitted 2 December, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

Comments: 12 pages, 7 figures

arXiv:2307.13159

RoboChop: Autonomous Framework for Fruit and Vegetable Chopping Leveraging Foundational Models

Authors: Atharva Dikshit, Alison Bartsch, Abraham George, Amir Barati Farimani

Abstract: Toward the goal of fully autonomous cooking robots, it is important to develop robust systems that can chop a wide variety of objects. Existing approaches focus primarily on the low-level dynamics of the cutting action, which overlooks some of the practical real-world challenges of implementing autonomous cutting systems. In this work, we propose an autonomous framework to sequence together action primitives for the purpose of chopping fruits and vegetables on a cluttered cutting board. We present a novel technique to leverage the vision foundational models SAM and YOLO to accurately detect, segment, and track fruits and vegetables as they visually change through the sequence of chops, finetuning YOLO on a novel dataset of whole and chopped fruits and vegetables. In our experiments, we demonstrate that our simple pipeline is able to reliably chop a variety of fruits and vegetables ranging in size, appearance, and texture, meeting a variety of chopping specifications, including fruit type, number of slices, and types of slices.

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.07032

Bridging the Gap: Heterogeneous Face Recognition with Conditional Adaptive Instance Modulation

Authors: Anjith George, Sebastien Marcel

Abstract: Heterogeneous Face Recognition (HFR) aims to match face images across different domains, such as thermal and visible spectra, expanding the applicability of Face Recognition (FR) systems to challenging scenarios. However, the domain gap and limited availability of large-scale datasets in the target domain make training robust and invariant HFR models from scratch difficult. In this work, we treat different modalities as distinct styles and propose a framework to adapt feature maps, bridging the domain gap. We introduce a novel Conditional Adaptive Instance Modulation (CAIM) module that can be integrated into pre-trained FR networks, transforming them into HFR networks. The CAIM block modulates intermediate feature maps to adapt them to the style of the target modality, effectively bridging the domain gap. Our proposed method allows for end-to-end training with a minimal number of paired samples. We extensively evaluate our approach on multiple challenging benchmarks, demonstrating superior performance compared to state-of-the-art methods. The source code and protocols for reproducing the findings will be made publicly available.

Submitted 13 July, 2023; originally announced July 2023.

Comments: Accepted for publication in IJCB 2023
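The CAIM abstract treats modalities as styles and modulates intermediate feature maps. The published module is not reproduced here; as a hedged illustration of the general idea, the sketch below shows adaptive-instance-normalization-style modulation, assuming each channel is normalized to zero mean and unit variance over its spatial positions and then re-scaled and shifted by per-channel parameters `gamma` and `beta`. In a conditional module these parameters would be predicted from the input; here they are passed in directly, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def instance_modulate(feature_map: Sequence[Sequence[float]],
                      gamma: Sequence[float],
                      beta: Sequence[float],
                      eps: float = 1e-5) -> List[List[float]]:
    """Normalize each channel over its spatial positions, then apply a per-channel scale and shift.

    feature_map: C channels, each a flattened list of H*W activations.
    gamma, beta: one modulation scale and shift per channel (hypothetical inputs).
    """
    if not (len(feature_map) == len(gamma) == len(beta)):
        raise ValueError("need one gamma and one beta per channel")
    out = []
    for channel, g, b in zip(feature_map, gamma, beta):
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        std = math.sqrt(var + eps)
        # Remove the input's channel statistics ("style"), then impose new ones.
        out.append([g * (x - mean) / std + b for x in channel])
    return out


features = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 12.0, 12.0]]
styled = instance_modulate(features, gamma=[2.0, 0.5], beta=[1.0, -3.0])
for channel in styled:
    print([round(v, 3) for v in channel])
```

After modulation each channel has (up to the small `eps`) mean `beta` and standard deviation `gamma`, which is the sense in which channel statistics carry the "style" being adapted.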

arXiv:2307.01838

EdgeFace: Efficient Face Recognition Model for Edge Devices

Authors: Anjith George, Christophe Ecabert, Hatef Otroshi Shahreza, Ketan Kotwal, Sebastien Marcel

Abstract: In this paper, we present EdgeFace, a lightweight and efficient face recognition network inspired by the hybrid architecture of EdgeNeXt. By combining the strengths of CNN and Transformer models with a low-rank linear layer, EdgeFace achieves excellent face recognition performance optimized for edge devices. The proposed EdgeFace network not only maintains low computational costs and compact storage, but also achieves high face recognition accuracy, making it suitable for deployment on edge devices. Extensive experiments on challenging benchmark face datasets demonstrate the effectiveness and efficiency of EdgeFace in comparison to state-of-the-art lightweight models and deep face recognition models. Our EdgeFace model with 1.77M parameters achieves state-of-the-art results on LFW (99.73%), IJB-B (92.67%), and IJB-C (94.85%), outperforming other efficient models with larger computational complexities. The code to replicate the experiments will be made available publicly.

Submitted 12 January, 2024; v1 submitted 4 July, 2023; originally announced July 2023.

Comments: 11 pages, Accepted for publication in IEEE Transactions on Biometrics, Behavior, and Identity Science
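The EdgeFace abstract credits part of its efficiency to a low-rank linear layer. A common way to build such a layer, and the assumption made in this sketch (the exact factorization, rank, and dimensions used in EdgeFace may differ), is to replace a dense `d_out x d_in` weight with the product of a `d_out x r` matrix `U` and an `r x d_in` matrix `V`, so the weight parameter count falls from `d_in * d_out` to `r * (d_in + d_out)`:

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(m: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LowRankLinear:
    """Computes y = U (V x) + b with U of shape d_out x r and V of shape r x d_in."""

    def __init__(self, u: Matrix, v: Matrix, bias: Sequence[float]):
        if len(u) != len(bias) or any(len(row) != len(v) for row in u):
            raise ValueError("U must be d_out x r, V must be r x d_in, bias length d_out")
        self.u, self.v, self.bias = u, v, list(bias)

    def __call__(self, x: Sequence[float]) -> List[float]:
        hidden = matvec(self.v, x)          # project d_in -> r
        projected = matvec(self.u, hidden)  # expand r -> d_out
        return [p + b for p, b in zip(projected, self.bias)]


def param_counts(d_in: int, d_out: int, rank: int) -> Tuple[int, int]:
    """(dense, low-rank) parameter counts, each including a d_out bias."""
    return d_in * d_out + d_out, rank * (d_in + d_out) + d_out


layer = LowRankLinear(u=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                      v=[[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]],
                      bias=[0.0, 0.0, 1.0])
print(layer([1.0, 1.0, 1.0]))      # [3.0, 2.0, 6.0]
print(param_counts(512, 512, 64))  # (262656, 66048)
```

With `d_in = d_out = 512` and `r = 64` (illustrative numbers), the factorized layer needs roughly a quarter of the dense layer's parameters, at the cost of restricting the effective weight to rank at most `r`.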

arXiv:2305.15003

doi 10.3233/FAIA230349

Feasible Action-Space Reduction as a Metric of Causal Responsibility in Multi-Agent Spatial Interactions

Authors: Ashwin George, Luciano Cavalcante Siebert, David Abbink, Arkady Zgonnikov

Abstract: Modelling causal responsibility in multi-agent spatial interactions is crucial for the safety and efficiency of interactions between humans and autonomous agents. However, current formal metrics and models of responsibility either lack grounding in ethical and philosophical concepts of responsibility, or cannot be applied to spatial interactions. In this work, we propose a metric of causal responsibility tailored to multi-agent spatial interactions, for instance, interactions in traffic. In such interactions, a given agent can, by reducing another agent's feasible action space, influence the latter. Therefore, we propose feasible action space reduction (FeAR) as a metric of causal responsibility among agents. Specifically, we look at ex-post causal responsibility for simultaneous actions. We propose the use of Moves de Rigueur (MdR) - a consistent set of prescribed actions for agents - to model the effect of norms on responsibility allocation. We apply the metric in a grid world simulation for spatial interactions and show how the actions, contexts, and norms affect the causal responsibility ascribed to agents. Finally, we demonstrate the application of this metric in complex multi-agent interactions. We argue that the FeAR metric is a step towards an interdisciplinary framework for quantifying responsibility that is needed to ensure safety and meaningful human control in human-AI systems.

Submitted 13 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.
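To make the idea of one agent reducing another's feasible action space concrete, here is a deliberately simplified grid-world illustration. It is not the FeAR definition from the paper: it assumes five moves per agent, treats the cell agent i moves into as blocked for agent j, ignores swaps and other collision types, and uses a caller-supplied counterfactual action for i (standing in for a Move de Rigueur) as the baseline. All names are hypothetical.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]
MOVES: Dict[str, Cell] = {"stay": (0, 0), "up": (0, 1), "down": (0, -1),
                          "left": (-1, 0), "right": (1, 0)}


def step(pos: Cell, move: str) -> Cell:
    """Cell reached by applying `move` at `pos`."""
    dx, dy = MOVES[move]
    return (pos[0] + dx, pos[1] + dy)


def feasible_moves(pos: Cell, blocked: Set[Cell], width: int, height: int) -> Set[str]:
    """Moves that keep the agent on the grid and out of every blocked cell."""
    feasible = set()
    for move in MOVES:
        x, y = step(pos, move)
        if 0 <= x < width and 0 <= y < height and (x, y) not in blocked:
            feasible.add(move)
    return feasible


def action_space_reduction(pos_i: Cell, action_i: str, counterfactual_i: str,
                           pos_j: Cell, width: int, height: int) -> float:
    """Signed fraction of j's feasible moves removed by i's actual action,
    relative to i taking the counterfactual action instead.

    Positive: i's action constrains j. Negative: i's action frees options for j.
    """
    baseline = feasible_moves(pos_j, {step(pos_i, counterfactual_i)}, width, height)
    actual = feasible_moves(pos_j, {step(pos_i, action_i)}, width, height)
    if not baseline:
        return 0.0
    return (len(baseline) - len(actual)) / len(baseline)


# Agent i moves right into the cell directly below agent j on a 3x3 grid.
print(action_space_reduction(pos_i=(0, 0), action_i="right", counterfactual_i="stay",
                             pos_j=(1, 1), width=3, height=3))  # 0.2
```

Agent j has five feasible moves when i stays put, and loses "down" when i moves right, so one fifth of its options is attributed to i's action under this toy definition.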

arXiv:2305.09765

OpenVR: Teleoperation for Manipulation

Authors: Abraham George, Alison Bartsch, Amir Barati Farimani

Abstract: Across the robotics field, quality demonstrations are an integral part of many control pipelines. However, collecting high-quality demonstration trajectories remains time-consuming and difficult, often resulting in the number of demonstrations being the performance bottleneck. To address this issue, we present a method of Virtual Reality (VR) Teleoperation that uses an Oculus VR headset to teleoperate a Franka Emika Panda robot. Although other VR teleoperation methods exist, our code is open source, designed for readily available consumer hardware, easy to modify, agnostic to experimental setup, and simple to use.

Submitted 16 May, 2023; originally announced May 2023.

Comments: 8 pages, 8 figures, GitHub: https://github.com/Abraham190137/TeleoperationUnity

arXiv:2212.09980

Continual Mean Estimation Under User-Level Privacy

Authors: Anand Jerry George, Lekshmi Ramesh, Aditya Vikram Singh, Himanshu Tyagi

Abstract: We consider the problem of continually releasing an estimate of the population mean of a stream of samples that is user-level differentially private (DP). At each time instant, a user contributes a sample, and the users can arrive in arbitrary order. Until now, these requirements of continual release and user-level privacy were considered in isolation. But, in practice, both these requirements come together as the users often contribute data repeatedly and multiple queries are made. We provide an algorithm that outputs a mean estimate at every time instant $t$ such that the overall release is user-level $\varepsilon$-DP and has the following error guarantee: Denoting by $M_t$ the maximum number of samples contributed by a user, as long as $\tilde{\Omega}(1/\varepsilon)$ users have $M_t/2$ samples each, the error at time $t$ is $\tilde{O}\left(1/\sqrt{t}+\sqrt{M_t}/(t\varepsilon)\right)$. This is a universal error guarantee which is valid for all arrival patterns of the users. Furthermore, it (almost) matches the existing lower bounds for the single-release setting at all time instants when users have contributed an equal number of samples.

Submitted 19 December, 2022; originally announced December 2022.

arXiv:2211.07383

doi 10.1109/ACCESS.2023.3282780

Attacking Face Recognition with T-shirts: Database, Vulnerability Assessment and Detection

Authors: M. Ibsen, C. Rathgeb, F. Brechtel, R. Klepp, K. Pöppelmann, A. George, S. Marcel, C. Busch

Abstract: Face recognition systems are widely deployed for biometric authentication. Despite this, it is well known that, without any safeguards, face recognition systems are highly vulnerable to presentation attacks. In response to this security issue, several promising methods for detecting presentation attacks have been proposed which show high performance on existing benchmarks. However, an ongoing challenge is the generalization of presentation attack detection methods to unseen and new attack types. To this end, we propose a new T-shirt Face Presentation Attack (TFPA) database of 1,608 T-shirt attacks using 100 unique presentation attack instruments. In an extensive evaluation, we show that this type of attack can compromise the security of face recognition systems and that some state-of-the-art attack detection mechanisms trained on popular benchmarks fail to robustly generalize to the new attacks. Further, we propose three new methods for detecting T-shirt attack images: one that relies on the statistical differences between depth maps of bona fide images and T-shirt attacks, an anomaly detection approach trained on features extracted only from bona fide RGB images, and a fusion approach which achieves competitive detection performance.

Submitted 14 November, 2022; originally announced November 2022.

arXiv:2211.06666

Optimizing Bandwidth Sharing for Real-time Traffic in Wireless Networks

Authors: Sushi Anna George, Vinay Joseph

Abstract: We consider the problem of enhancing the delivery of real-time traffic in wireless networks using bandwidth sharing between operators. A key characteristic of real-time traffic is that a packet has to be delivered within a delay deadline for it to be useful. The abundance of real-time traffic is evident in the popularity of applications like video and audio conferencing, which increased significantly during the COVID-19 period. We propose a sharing and scheduling policy which involves dynamically sharing a portion of one operator's bandwidth with another operator. We provide strong theoretical guarantees for the policy. We also evaluate its performance via extensive simulations, which show significant improvements of up to 90% in the ability to carry real-time traffic when using the policy. We also explore how the improvements from bandwidth sharing depend on the amount of sharing, and on additional traffic characteristics.

Submitted 24 November, 2022; v1 submitted 12 November, 2022; originally announced November 2022.

arXiv:2210.06529

Prepended Domain Transformer: Heterogeneous Face Recognition without Bells and Whistles

Authors: Anjith George, Amir Mohammadi, Sebastien Marcel

Abstract: Heterogeneous Face Recognition (HFR) refers to matching face images captured in different domains, such as thermal to visible (VIS) images, sketches to visible images, near-infrared to visible, and so on. This is particularly useful in matching visible spectrum images to images captured from other modalities. Though highly useful, HFR is challenging because of the domain gap between the source and target domains. Often, large-scale paired heterogeneous face image datasets are absent, preventing models from being trained specifically for the heterogeneous task. In this work, we propose a surprisingly simple yet very effective method for matching face images across different sensing modalities. The core idea of the proposed approach is to add a novel neural network block called Prepended Domain Transformer (PDT) in front of a pre-trained face recognition (FR) model to address the domain gap. Retraining this new block with few paired samples in a contrastive learning setup was enough to achieve state-of-the-art performance in many HFR benchmarks. The PDT blocks can be retrained for several source-target combinations using the proposed general framework. The proposed approach is architecture agnostic, meaning the blocks can be added to any pre-trained FR model. Further, the approach is modular and the new block can be trained with a minimal set of paired samples, making it much easier for practical deployment. The source code and protocols will be made available publicly.

Submitted 12 October, 2022; originally announced October 2022.

Comments: 16 pages. Accepted for publication in IEEE TIFS
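The PDT abstract describes prepending a trainable block to a pre-trained FR model and retraining only that block on paired samples with contrastive learning. A minimal sketch of that wiring, assuming the classic margin-based contrastive loss on embedding pairs (the paper's exact loss and block architecture may differ; `hfr_embed`, `identity_fr`, and the toy block are hypothetical names):

```python
import math
from typing import Callable, List, Sequence

Embedding = List[float]


def hfr_embed(image: Sequence[float],
              prepended_block: Callable[[Sequence[float]], Sequence[float]],
              frozen_fr: Callable[[Sequence[float]], Embedding]) -> Embedding:
    """Embed a non-visible (e.g. thermal) image: trainable block first, then the unchanged FR model."""
    return frozen_fr(prepended_block(image))


def contrastive_loss(emb_a: Sequence[float], emb_b: Sequence[float],
                     same_identity: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss on one pair of embeddings.

    Genuine pairs are pulled together (squared distance); impostor pairs are
    pushed apart until they are at least `margin` away.
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    if same_identity:
        return d ** 2
    return max(0.0, margin - d) ** 2


def identity_fr(x: Sequence[float]) -> Embedding:
    """Stand-in for a pre-trained FR network (identity mapping, for illustration)."""
    return list(x)


visible_emb = identity_fr([0.2, 0.4])  # visible images use the FR model directly
thermal_emb = hfr_embed([0.4, 0.8], lambda x: [0.5 * v for v in x], identity_fr)
print(contrastive_loss(visible_emb, thermal_emb, same_identity=True))  # 0.0
```

In this arrangement only the prepended block would be updated during training, while the FR model stays fixed; the toy block happens to map the "thermal" input onto the visible embedding exactly, so the genuine-pair loss is zero.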

arXiv:2209.11275

Minimizing Human Assistance: Augmenting a Single Demonstration for Deep Reinforcement Learning

Authors: Abraham George, Alison Bartsch, Amir Barati Farimani

Abstract: The use of human demonstrations in reinforcement learning has proven to significantly improve agent performance. However, any requirement for a human to manually 'teach' the model is somewhat antithetical to the goals of reinforcement learning. This paper attempts to minimize human involvement in the learning process while retaining the performance advantages by using a single human example collected through a simple-to-use virtual reality simulation to assist with RL training. Our method augments a single demonstration to generate numerous human-like demonstrations that, when combined with Deep Deterministic Policy Gradients and Hindsight Experience Replay (DDPG + HER), significantly improve training time on simple tasks and allow the agent to solve a complex task (block stacking) that DDPG + HER alone cannot solve. The model achieves this significant training advantage using a single human example, requiring less than a minute of human input. Moreover, despite learning from a human example, the agent is not constrained to human-level performance, often learning a policy that is significantly different from the human demonstration.

Submitted 18 March, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: 7 pages, 10 figures, ICRA 2023 (accepted)
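The abstract combines DDPG with Hindsight Experience Replay (HER). The core HER trick, relabeling stored transitions with goals that were actually achieved later in the same episode, can be sketched as follows. This assumes the common "future" relabeling strategy, a sparse 0/-1 reward with a distance tolerance, and hypothetical names; it is not taken from the paper's code, and the DDPG learner itself is omitted.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    state: Vec
    action: Vec
    achieved_next: Vec  # goal actually achieved after taking the action
    goal: Vec           # goal the agent was pursuing
    reward: float


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.05) -> float:
    """0 if the achieved goal is within `tol` of the desired goal, else -1 (assumed reward)."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_future_relabel(episode: Sequence[Transition], k: int,
                       rng: random.Random) -> List[Transition]:
    """Keep each transition and add k copies whose goal is an achieved goal
    from the same or a later step of the episode ("future" strategy)."""
    relabeled: List[Transition] = []
    for t, tr in enumerate(episode):
        relabeled.append(tr)
        for _ in range(k):
            future = rng.randint(t, len(episode) - 1)  # inclusive bounds
            new_goal = episode[future].achieved_next
            relabeled.append(Transition(tr.state, tr.action, tr.achieved_next,
                                        new_goal, sparse_reward(tr.achieved_next, new_goal)))
    return relabeled


# A failed 1-D episode: the agent creeps toward x = 1.0 but never gets there.
episode = [
    Transition(state=(0.0,), action=(0.1,), achieved_next=(0.1,), goal=(1.0,), reward=-1.0),
    Transition(state=(0.1,), action=(0.1,), achieved_next=(0.2,), goal=(1.0,), reward=-1.0),
    Transition(state=(0.2,), action=(0.1,), achieved_next=(0.3,), goal=(1.0,), reward=-1.0),
]
replay = her_future_relabel(episode, k=4, rng=random.Random(0))
print(len(replay), sum(1 for tr in replay if tr.reward == 0.0))
```

Relabeled copies whose new goal equals the transition's own achieved goal receive reward 0, so even an episode that never reaches the original goal yields successful examples for the off-policy learner.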

arXiv:2205.07488

Robust Testing in High-Dimensional Sparse Models

Authors: Anand Jerry George, Clément L. Canonne

Abstract: We consider the problem of robustly testing the norm of a high-dimensional sparse signal vector under two different observation models. In the first model, we are given $n$ i.i.d. samples from the distribution $\mathcal{N}(\theta, I_d)$ (with unknown $\theta$), of which a small fraction has been arbitrarily corrupted. Under the promise that $\|\theta\|_0\le s$, we want to correctly distinguish whether $\|\theta\|_2=0$ or $\|\theta\|_2>\gamma$, for some input parameter $\gamma>0$. We show that any algorithm for this task requires $n=\Omega\left(s\log\frac{ed}{s}\right)$ samples, which is tight up to logarithmic factors. We also extend our results to other common notions of sparsity, namely, $\|\theta\|_q\le s$ for any $0 < q < 2$. In the second observation model that we consider, the data is generated according to a sparse linear regression model, where the covariates are i.i.d. Gaussian and the regression coefficient (signal) is known to be $s$-sparse. Here too we assume that an $\varepsilon$-fraction of the data is arbitrarily corrupted. We show that any algorithm that reliably tests the norm of the regression coefficient requires at least $n=\Omega\left(\min(s\log d, 1/\gamma^4)\right)$ samples. Our results show that the complexity of testing in these two settings significantly increases under robustness constraints. This is in line with the recent observations made in robust mean testing and robust covariance testing.

Submitted 4 November, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

Comments: Fixed typos, added a figure and discussion section

arXiv:2204.14094

Single-Peaked Opinion Updates

Authors: Robert Bredereck, Anne-Marie George, Jonas Israel, Leon Kellerhals

Abstract: We consider opinion diffusion for undirected networks with sequential updates when the opinions of the agents are single-peaked preference rankings. Our starting point is the study of preserving single-peakedness. We identify voting rules that, when given a single-peaked profile, output at least one ranking that is single-peaked w.r.t. a single-peaked axis of the input. For such voting rules, we show convergence to a stable state of the diffusion process that uses the voting rule as the agents' update rule. Further, we establish an efficient algorithm that maximises the spread of extreme opinions.

Submitted 29 April, 2022; originally announced April 2022.

Comments: Accepted at IJCAI 2022
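The abstract builds on single-peaked preference rankings. A ranking is single-peaked with respect to an axis (a left-to-right ordering of the candidates) exactly when, for every k, its top k candidates occupy a contiguous interval of the axis. That standard characterization gives a simple linear-time check; the function names below are illustrative, not from the paper:

```python
from typing import Sequence


def is_single_peaked(ranking: Sequence[str], axis: Sequence[str]) -> bool:
    """True if every top-k prefix of `ranking` is a contiguous interval of `axis`."""
    if len(ranking) != len(axis) or set(ranking) != set(axis):
        raise ValueError("ranking and axis must contain the same candidates")
    if not ranking:
        return True
    position = {cand: i for i, cand in enumerate(axis)}
    lo = hi = position[ranking[0]]  # the peak
    for k, cand in enumerate(ranking[1:], start=2):
        p = position[cand]
        lo, hi = min(lo, p), max(hi, p)
        # k distinct candidates fill an interval exactly when it has width k.
        if hi - lo + 1 != k:
            return False
    return True


def is_single_peaked_profile(profile: Sequence[Sequence[str]], axis: Sequence[str]) -> bool:
    """A profile is single-peaked w.r.t. `axis` if every ranking in it is."""
    return all(is_single_peaked(r, axis) for r in profile)


axis = ["a", "b", "c", "d"]
print(is_single_peaked(["b", "c", "a", "d"], axis))  # True
print(is_single_peaked(["a", "c", "b", "d"], axis))  # False
```

In the second ranking, the top two candidates `a` and `c` skip over `b` on the axis, so the preference would have two separate "peaks".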

arXiv:2202.10286

A Comprehensive Evaluation on Multi-channel Biometric Face Presentation Attack Detection

Authors: Anjith George, David Geissbuhler, Sebastien Marcel

Abstract: Vulnerability to presentation attacks is a crucial problem undermining the wide deployment of face recognition systems. Though presentation attack detection (PAD) systems try to address this problem, the lack of generalization and robustness continues to be a major concern. Several works have shown that using multi-channel PAD systems could alleviate this vulnerability and result in more robust systems. However, there is a wide selection of channels available for a PAD system, such as RGB, Near Infrared, Shortwave Infrared, Depth, and Thermal sensors. Using many sensors increases the cost of the system, and therefore an understanding of the performance of different sensors against a wide variety of attacks is necessary when selecting the modalities. In this work, we perform a comprehensive study to understand the effectiveness of various imaging modalities for PAD. The studies are performed on a multi-channel PAD dataset, collected with 14 different sensing modalities considering a wide range of 2D, 3D, and partial attacks. We used the multi-channel convolutional network-based architecture, which uses pixel-wise binary supervision. The model has been evaluated with different combinations of channels and different image qualities on a variety of challenging known and unknown attack protocols. The results reveal interesting trends and can act as pointers for sensor selection for safety-critical presentation attack detection systems.
The source code and protocols to reproduce the results are made publicly available, making it possible to extend this work to other architectures.

Submitted 21 February, 2022; originally announced February 2022.

Comments: 16 pages, 11 images
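The study uses a multi-channel convolutional network with pixel-wise binary supervision. As a hedged sketch of what such supervision typically means (the label convention, the clamping constant, and the use of the mean map value as the score are assumptions here, not details given in the abstract): the network outputs a map of per-pixel probabilities, every pixel is supervised with the sample's binary label, and the map's mean serves as the PAD score at test time.

```python
import math
from typing import Sequence


def pixelwise_bce(prob_map: Sequence[Sequence[float]], is_bona_fide: bool,
                  eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over every pixel of the predicted map.

    Every pixel shares the sample-level target: 1 for bona fide, 0 for attack
    (assumed convention).
    """
    target = 1.0 if is_bona_fide else 0.0
    total, count = 0.0, 0
    for row in prob_map:
        for p in row:
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
            count += 1
    return total / count


def pad_score(prob_map: Sequence[Sequence[float]]) -> float:
    """Mean of the map; higher means more likely bona fide under this convention."""
    values = [p for row in prob_map for p in row]
    return sum(values) / len(values)


demo_map = [[0.9, 0.8], [0.7, 0.6]]
print(round(pad_score(demo_map), 3), round(pixelwise_bce(demo_map, is_bona_fide=True), 3))
```

Supervising every pixel rather than a single output gives the network many local training signals per image, while the averaged score keeps the test-time decision a single threshold.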

arXiv:2112.07509

Liquid Democracy with Ranked Delegations

Authors: Markus Brill, Théo Delemazure, Anne-Marie George, Martin Lackner, Ulrike Schmidt-Kraepelin

Abstract: Liquid democracy is a novel paradigm for collective decision-making that gives agents the choice between casting a direct vote or delegating their vote to another agent. We consider a generalization of the standard liquid democracy setting by allowing agents to specify multiple potential delegates, together with a preference ranking among them. This generalization increases the number of possible delegation paths and enables higher participation rates because fewer votes are lost due to delegation cycles or abstaining agents. In order to implement this generalization of liquid democracy, we need to find a principled way of choosing between multiple delegation paths. In this paper, we provide a thorough axiomatic analysis of the space of delegation rules, i.e., functions assigning a feasible delegation path to each delegating agent. In particular, we prove axiomatic characterizations as well as an impossibility result for delegation rules. We also analyze requirements on delegation rules that have been suggested by practitioners, and introduce novel rules with attractive properties. By performing an extensive experimental analysis on synthetic as well as real-world data, we compare delegation rules with respect to several quantitative criteria relating to the chosen paths and the resulting distribution of voting power. Our experiments reveal that delegation rules can be aligned on a spectrum reflecting an inherent trade-off between competing objectives.

Submitted 14 December, 2021; originally announced December 2021.

Comments: Accepted at AAAI 2022

arXiv:2112.07170 [pdf, other]

Performance evaluation of the QOS provisioning ability of IEEE 802.11e WLAN standard for multimedia traffic

Authors: Venkata Sitaram. A, Venkatesh. T. G, Arun George, Manivasakan. R, Bhasker Dappuri

Abstract: This paper presents an analytical model for the average frame transmission delay and the jitter for the different Access Categories (ACs) of the IEEE 802.11e Enhanced Distributed Channel Access (EDCA) mechanism. The salient features of our model are as follows. As defined by the standard, we consider (1) the virtual collisions among different ACs inside each EDCA station, in addition to external collisions; (2) the effect of priority parameters, such as the minimum and maximum Contention Window (CW) sizes and the Arbitration Inter Frame Space (AIFS); (3) the role of the Transmission Opportunity (TXOP) of different ACs; and (4) the finite number of retries a packet experiences before being dropped. Our model and analytical results provide an in-depth understanding of the EDCA mechanism and of the effect of Quality of Service (QoS) parameters on the performance of the IEEE 802.11e protocol.
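The EDCA behaviours listed above (internal virtual collisions, CW limits, AIFS, and a finite retry count) can be illustrated with a small state update. This is a minimal sketch and not the paper's analytical model. When several ACs of one station finish their backoff in the same slot, the highest-priority AC transmits and the others react as to an external collision. The parameter values are the common EDCA defaults for an OFDM PHY. The SIFS/slot timings (16 µs / 9 µs) and the retry limit of 7 are assumptions, and all names are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AccessCategory:
    name: str
    priority: int  # higher value wins an internal (virtual) collision
    cw_min: int
    cw_max: int
    aifsn: int
    cw: int = field(init=False)
    retries: int = 0

    def __post_init__(self) -> None:
        self.cw = self.cw_min


def aifs_us(ac: AccessCategory, sifs_us: int = 16, slot_us: int = 9) -> int:
    """AIFS[AC] = SIFS + AIFSN[AC] x slot time (OFDM timings assumed)."""
    return sifs_us + ac.aifsn * slot_us


def resolve_virtual_collision(ready: List[AccessCategory], rng: random.Random,
                              retry_limit: int = 7  # assumed retry limit
                              ) -> Tuple[AccessCategory, Dict[str, int], List[str]]:
    """The highest-priority AC gets the TXOP; every other AC behaves as after an
    external collision (CW -> 2*(CW+1)-1, capped at CWmax), or drops its frame
    and resets CW once the retry limit is exceeded."""
    winner = max(ready, key=lambda ac: ac.priority)
    new_backoffs: Dict[str, int] = {}
    dropped: List[str] = []
    for ac in ready:
        if ac is winner:
            continue
        ac.retries += 1
        if ac.retries > retry_limit:
            dropped.append(ac.name)
            ac.retries = 0
            ac.cw = ac.cw_min
        else:
            ac.cw = min(2 * (ac.cw + 1) - 1, ac.cw_max)
        new_backoffs[ac.name] = rng.randint(0, ac.cw)  # fresh backoff in [0, CW]
    return winner, new_backoffs, dropped


# Common default EDCA parameters for an OFDM PHY (aCWmin = 15, aCWmax = 1023).
vo = AccessCategory("AC_VO", priority=3, cw_min=3, cw_max=7, aifsn=2)
be = AccessCategory("AC_BE", priority=1, cw_min=15, cw_max=1023, aifsn=3)

winner, backoffs, dropped = resolve_virtual_collision([vo, be], random.Random(0))
print(winner.name, be.cw, backoffs, dropped, aifs_us(vo), aifs_us(be))
```

The voice AC wins the internal contention. The best-effort AC doubles its window from 15 to 31 and draws a new backoff. AIFS works out to 34 µs for AC_VO and 43 µs for AC_BE under these timings.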

Submitted 14 December, 2021; originally announced December 2021.

arXiv:2112.06772 [pdf, other]

doi 10.1109/ACCESS.2022.3172396

hARMS: A Hardware Acceleration Architecture for Real-Time Event-Based Optical Flow

Authors: Daniel C. Stumpp, Himanshu Akolkar, Alan D. George, Ryad B. Benosman

Abstract: Event-based vision sensors produce asynchronous event streams with high temporal resolution based on changes in the visual scene. The properties of these sensors allow for accurate and fast calculation of optical flow as events are generated. Existing solutions for calculating optical flow from event data either fail to capture the true direction of motion due to the aperture problem, do not use the high temporal resolution of the sensor, or are too computationally expensive to be run in real time on embedded platforms. In this research, we first present a faster version of our previous algorithm, ARMS (Aperture Robust Multi-Scale flow). The new optimized software version (fARMS) significantly improves throughput on a traditional CPU. Further, we present hARMS, a hardware realization of the fARMS algorithm allowing for real-time computation of true flow on low-power, embedded platforms. The proposed hARMS architecture targets hybrid system-on-chip devices and was designed to maximize configurability and throughput. The hardware architecture and fARMS algorithm were developed with asynchronous neuromorphic processing in mind, abandoning the common use of an event frame and instead operating using only a small history of relevant events, allowing latency to scale independently of the sensor resolution. This change in processing paradigm improved the estimation of flow directions by up to 73% compared to the existing method and yielded a demonstrated hARMS throughput of up to 1.21 Mevent/s on the benchmark configuration selected.
This throughput enables real-time performance and makes it the fastest known realization of aperture-robust, event-based optical flow to date.

Submitted 13 December, 2021; originally announced December 2021.

Comments: 18 pages, 16 figures, 4 tables

arXiv:2111.04698 [pdf, other]

Interactive Inverse Reinforcement Learning for Cooperative Games

Authors: Thomas Kleine Buening, Anne-Marie George, Christos Dimitrakakis

Abstract: We study the problem of designing autonomous agents that can learn to cooperate effectively with a potentially suboptimal partner while having no access to the joint reward function. This problem is modeled as a cooperative episodic two-agent Markov decision process. We assume control over only the first of the two agents in a Stackelberg formulation of the game, where the second agent is acting so as to maximise expected utility given the first agent's policy. How should the first agent act in order to learn the joint reward function as quickly as possible and so that the joint policy is as close to optimal as possible? We analyse how knowledge about the reward function can be gained in this interactive two-agent scenario. We show that when the learning agent's policies have a significant effect on the transition function, the reward function can be learned efficiently.

Submitted 13 June, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

Comments: ICML 2022

arXiv:2103.00948 [pdf, other]

Cross Modal Focal Loss for RGBD Face Anti-Spoofing

Authors: Anjith George, Sebastien Marcel

Abstract: Automatic methods for detecting presentation attacks are essential to ensure the reliable use of facial recognition technology. Most of the presentation attack detection (PAD) methods available in the literature fail to generalize to unseen attacks. In recent years, multi-channel methods have been proposed to improve the robustness of PAD systems. Often, only a limited amount of data is available for additional channels, which limits the effectiveness of these methods. In this work, we present a new framework for PAD that uses RGB and depth channels together with a novel loss function. The new architecture uses complementary information from the two modalities while reducing the impact of overfitting. Essentially, a cross-modal focal loss function is proposed to modulate the loss contribution of each channel as a function of the confidence of individual channels. Extensive evaluations on two publicly available datasets demonstrate the effectiveness of the proposed approach.
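The abstract builds on the focal loss, whose standard per-sample form is $-\alpha_t (1 - p_t)^\gamma \log p_t$ (Lin et al.; $p_t$ is the predicted probability of the true class). The sketch below shows that standard form. It then adds a hypothetical cross-modal variant in which one branch's modulating factor is driven by the other channel's confidence $q_t$. This variant only illustrates the idea of "modulating each channel's loss by channel confidences"; it is not the paper's formula, and the function names and default $\alpha$, $\gamma$ are assumptions.

```python
import math


def focal_loss(p_t: float, alpha_t: float = 0.25, gamma: float = 2.0) -> float:
    """Standard focal loss for one sample; p_t = predicted prob. of the true class."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_modal_focal_loss(p_t: float, q_t: float,
                           alpha_t: float = 0.25, gamma: float = 2.0) -> float:
    """HYPOTHETICAL variant for the branch with confidence p_t: the modulating
    factor uses q_t, the other channel's confidence, so samples that the other
    modality already classifies confidently contribute less to this branch."""
    return -alpha_t * (1.0 - q_t) ** gamma * math.log(p_t)


# An RGB branch at p_t = 0.3 while a depth branch is confident (q_t = 0.9).
print(focal_loss(0.3), cross_modal_focal_loss(0.3, 0.9))
```

With $\gamma = 0$ and $\alpha_t = 1$ the standard form reduces to ordinary cross-entropy, which is a quick sanity check for any implementation.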

Submitted 1 March, 2021; originally announced March 2021.

Comments: 10 pages, Accepted for publication in CVPR2021

arXiv:2102.11932 [pdf, other]

On Meritocracy in Optimal Set Selection

Authors: Thomas Kleine Buening, Meirav Segal, Debabrota Basu, Christos Dimitrakakis, Anne-Marie George

Abstract: Typically, merit is defined with respect to some intrinsic measure of worth. We instead consider a setting where an individual's worth is \emph{relative}: when a Decision Maker (DM) selects a set of individuals from a population to maximise expected utility, it is natural to consider the \emph{Expected Marginal Contribution} (EMC) of each person to the utility. We show that this notion satisfies an axiomatic definition of fairness for this setting. We also show that for certain policy structures, this notion of fairness is aligned with maximising expected utility, while for linear utility functions it is identical to the Shapley value. However, for certain natural policies, such as those that select individuals with a specific set of attributes (e.g. high enough test scores for college admissions), there is a trade-off between meritocracy and utility maximisation. We analyse the effect of constraints on the policy on both utility and fairness in extensive experiments based on college admissions and outcomes in Norwegian universities.
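EMC itself is defined in the paper relative to the decision maker's policy. The sketch below only illustrates the Shapley value that the abstract relates it to: each person's marginal contribution averaged over all orderings. For an additive (linear) utility, the Shapley value equals each person's own contribution. For a non-linear utility, such as "only the best member counts", credit is redistributed. The candidates, scores and utility functions are hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley values: average marginal contribution of each player
    over all orderings (feasible only for small sets)."""
    values = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            values[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: v / n_orders for p, v in values.items()}


# Hypothetical candidates with individual scores.
scores = {"ana": 3.0, "bo": 1.0, "cy": 2.0}


def linear_utility(group):
    return sum(scores[p] for p in group)


def best_only_utility(group):
    # Non-linear: only the best selected member counts.
    return max((scores[p] for p in group), default=0.0)


print(shapley_values(list(scores), linear_utility))
print(shapley_values(list(scores), best_only_utility))
```

Under the best-only utility, the values come out to 11/6, 1/3 and 5/6. They still sum to the utility of the full set (3.0), but they no longer match the individual scores.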

Submitted 9 September, 2022; v1 submitted 23 February, 2021; originally announced February 2021.

Comments: EAAMO 2022

arXiv:2101.06453 [pdf, other]

An MCMC Method to Sample from Lattice Distributions

Authors: Anand Jerry George, Navin Kashyap

Abstract: We introduce a Markov Chain Monte Carlo (MCMC) algorithm to generate samples from probability distributions supported on a $d$-dimensional lattice $\Lambda = \mathbf{B}\mathbb{Z}^d$, where $\mathbf{B}$ is a full-rank matrix. Specifically, we consider lattice distributions $P_\Lambda$ in which the probability at a lattice point is proportional to a given probability density function, $f$, evaluated at that point. To generate samples from $P_\Lambda$, it suffices to draw samples from a pull-back measure $P_{\mathbb{Z}^d}$ defined on the integer lattice. The probability of an integer lattice point under $P_{\mathbb{Z}^d}$ is proportional to the density function $\pi = |\det(\mathbf{B})|\, f \circ \mathbf{B}$. The algorithm we present in this paper for sampling from $P_{\mathbb{Z}^d}$ is based on the Metropolis-Hastings framework. In particular, we use $\pi$ as the proposal distribution and calculate the Metropolis-Hastings acceptance ratio for a well-chosen target distribution. We can use any method, denoted by ALG, that ideally draws samples from the probability density $\pi$, to generate a proposed state. The target distribution is a piecewise sigmoidal distribution, chosen such that the coordinate-wise rounding of a sample drawn from the target distribution gives a sample from $P_{\mathbb{Z}^d}$. When ALG is ideal, we show that our algorithm is uniformly ergodic if $-\log(\pi)$ satisfies a gradient Lipschitz condition.
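The pull-back idea in the abstract (sample integer points $z$ with probability proportional to $\pi(z)$, then map them to $\mathbf{B}z$) can be illustrated with a plain random-walk Metropolis-Hastings sampler on $\mathbb{Z}^d$. This is a deliberately simpler stand-in. It is not the paper's algorithm, which proposes from $\pi$ via ALG and targets a piecewise sigmoidal distribution. The names, the Gaussian $f$ and the shear matrix are assumptions. The constant $|\det(\mathbf{B})|$ cancels in the acceptance ratio.

```python
import numpy as np


def log_pull_back(z, B, log_f):
    """log pi(z) with pi = |det B| * (f o B); log_f may be unnormalized."""
    return np.log(abs(np.linalg.det(B))) + log_f(B @ z)


def random_walk_lattice_mh(B, log_f, n_steps, rng, z0=None):
    """Random-walk Metropolis-Hastings on Z^d targeting P_{Z^d}; returns the
    mapped lattice points x = B z (a stand-in, not the paper's sampler)."""
    d = B.shape[0]
    z = np.zeros(d, dtype=int) if z0 is None else np.array(z0, dtype=int)
    log_p = log_pull_back(z, B, log_f)
    samples = np.empty((n_steps, d))
    for t in range(n_steps):
        proposal = z.copy()
        proposal[rng.integers(d)] += rng.choice([-1, 1])  # symmetric +/-1 move
        log_p_new = log_pull_back(proposal, B, log_f)
        # Symmetric proposal: accept with probability min(1, pi(z') / pi(z)).
        if np.log(rng.random()) < log_p_new - log_p:
            z, log_p = proposal, log_p_new
        samples[t] = B @ z
    return samples


def log_f(x):
    # Unnormalized standard Gaussian density (its constant cancels as well).
    return -0.5 * float(x @ x)


B = np.array([[1.0, 0.5], [0.0, 1.0]])  # hypothetical full-rank basis
xs = random_walk_lattice_mh(B, log_f, n_steps=5000, rng=np.random.default_rng(0))
print(xs.mean(axis=0))
```

Every returned point lies on $\Lambda$ by construction. The lattice is symmetric under negation and $f$ is centred, so the sample mean should sit near the origin.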

Submitted 26 January, 2021; v1 submitted 16 January, 2021; originally announced January 2021.

Comments: 11 pages, 7 figures

arXiv:2011.08019 [pdf, other]

On the Effectiveness of Vision Transformers for Zero-shot Face Anti-Spoofing

Authors: Anjith George, Sebastien Marcel

Abstract: The vulnerability of face recognition systems to presentation attacks has limited their application in security-critical scenarios. Automatic methods of detecting such malicious attempts are essential for the safe use of facial recognition technology. Although various methods have been suggested for detecting such attacks, most of them overfit the training set and fail to generalize to unseen attacks and environments. In this work, we use transfer learning from the vision transformer model for the zero-shot anti-spoofing task. The effectiveness of the proposed approach is demonstrated through experiments on publicly available datasets. The proposed approach outperforms the state-of-the-art methods in the zero-shot protocols of the HQ-WMCA and SiW-M datasets by a large margin. In addition, the model achieves a significant boost in cross-database performance.

Submitted 2 June, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

Comments: 8 pages, 3 figures, Accepted for Publication in IJCB2021

arXiv:2009.13549 [pdf, other]

Mez: A Messaging System for Latency-Sensitive Multi-Camera Machine Vision at the IoT Edge

Authors: Anjus George, Arun Ravindran, Mattias Mendieta, Hamed Tabkhi

Abstract: Mez is a publish-subscribe messaging system for latency-sensitive multi-camera machine vision at the IoT Edge. Unlike existing messaging systems, Mez allows applications to specify latency and application accuracy bounds. Mez implements a network latency controller that dynamically adjusts the video frame quality to satisfy latency and application accuracy requirements. Additionally, the design of Mez utilizes application-domain-specific features to provide low-latency operations. Experimental evaluation on an IoT Edge testbed with a pedestrian detection machine vision application indicates that Mez is able to tolerate latency variations of up to 10x with a worst-case reduction of 4.2% in the application accuracy F1 score metric.
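The abstract does not describe how Mez's latency controller decides on a frame quality. As a minimal sketch of the general idea only, the hypothetical threshold controller below works as follows. It lowers an encoding quality level when measured latency exceeds the bound. It raises the level again when there is headroom. It never goes below a floor that stands in for the accuracy bound. The class name, step size, headroom and quality range (a JPEG-style 0 to 100 scale) are all assumptions, not Mez's design.

```python
from dataclasses import dataclass


@dataclass
class QualityController:
    """HYPOTHETICAL feedback controller for frame quality under a latency bound."""
    latency_bound_ms: float
    min_quality: int = 30   # assumed floor standing in for the accuracy bound
    max_quality: int = 95
    step: int = 5
    headroom: float = 0.8   # raise quality only below 80% of the bound
    quality: int = 95

    def update(self, measured_latency_ms: float) -> int:
        if measured_latency_ms > self.latency_bound_ms:
            # Too slow: shrink frames, but never below the accuracy floor.
            self.quality = max(self.min_quality, self.quality - self.step)
        elif measured_latency_ms < self.headroom * self.latency_bound_ms:
            # Comfortable margin: restore quality gradually.
            self.quality = min(self.max_quality, self.quality + self.step)
        return self.quality


ctrl = QualityController(latency_bound_ms=100.0)
trace = [ctrl.update(ms) for ms in (150, 150, 90, 50)]
print(trace)
```

The dead band between 80% and 100% of the bound keeps the quality from oscillating on every frame. A production controller would also account for how quality changes affect detection accuracy.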

Submitted 28 September, 2020; originally announced September 2020.

Comments: Under review ACM Transactions on Internet of Things

arXiv:2009.09703 [pdf, other]

The High-Quality Wide Multi-Channel Attack (HQ-WMCA) database

Authors: Zohreh Mostaani, Anjith George, Guillaume Heusch, David Geissbuhler, Sebastien Marcel

Abstract: The High-Quality Wide Multi-Channel Attack (HQ-WMCA) database extends the previous Wide Multi-Channel Attack (WMCA) database with more channels, including color, depth, thermal, infrared (spectra), and short-wave infrared (spectra), as well as a wide variety of attacks.

Submitted 21 September, 2020; originally announced September 2020.

arXiv:2007.11469 [pdf, other]

Deep Models and Shortwave Infrared Information to Detect Face Presentation Attacks

Authors: Guillaume Heusch, Anjith George, David Geissbuhler, Zohreh Mostaani, Sebastien Marcel

Abstract: This paper addresses the problem of face presentation attack detection using different image modalities. In particular, the use of short-wave infrared (SWIR) imaging is considered. Face presentation attack detection is performed using recent models based on Convolutional Neural Networks, using only carefully selected SWIR image differences as input. Experiments show superior performance over similar models acting either on color images or on a combination of different modalities (visible, NIR, thermal, and depth), as well as over an SVM-based classifier acting on SWIR image differences. Experiments have been carried out on a new, public, and freely available database containing a wide variety of attacks. Video sequences have been recorded with several sensors, resulting in 14 different streams in the visible, NIR, SWIR, and thermal spectra, as well as depth data. The best proposed approach is able to almost perfectly detect all impersonation attacks while ensuring low bonafide classification errors. On the other hand, the results show that obfuscation attacks are more difficult to detect. We hope that the proposed database will foster research on this challenging problem. Finally, all the code and instructions to reproduce the presented experiments are made available to the research community.

Submitted 22 July, 2020; originally announced July 2020.

arXiv:2007.11457 [pdf, other]

Learning One Class Representations for Face Presentation Attack Detection using Multi-channel Convolutional Neural Networks

Authors: Anjith George, Sebastien Marcel

Abstract: Face recognition has evolved into a widely used biometric modality. However, its vulnerability to presentation attacks poses a significant security threat. Though presentation attack detection (PAD) methods try to address this issue, they often fail to generalize to unseen attacks. In this work, we propose a new framework for PAD using a one-class classifier, where the representation used is learned with a Multi-Channel Convolutional Neural Network (MCCNN). A novel loss function is introduced, which forces the network to learn a compact embedding for the bonafide class while keeping it far from the representation of attacks. A one-class Gaussian Mixture Model is used on top of these embeddings for the PAD task. The proposed framework introduces a novel approach to learn a robust PAD system from bonafide and available (known) attack classes. This is particularly important because collecting bonafide data and simpler attacks is much easier than collecting a wide variety of expensive attacks. The proposed system is evaluated on the publicly available WMCA multi-channel face PAD database, which contains a wide variety of 2D and 3D attacks. Further, we have performed experiments with the MLFP and SiW-M datasets using RGB channels only. Superior performance in unseen attack protocols shows the effectiveness of the proposed approach. Software, data, and protocols to reproduce the results are made publicly available.
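The last stage described above fits a one-class density model on bonafide embeddings only and flags low-likelihood samples as attacks. As a minimal sketch, assuming a single Gaussian component (the simplest one-class GMM) and synthetic 8-dimensional embeddings in place of MCCNN features, the scoring step could look like this. The class name, threshold rule and data are hypothetical. For several components, `sklearn.mixture.GaussianMixture` with `score_samples` is a common alternative.

```python
import numpy as np


class OneClassGaussian:
    """Single multivariate Gaussian fitted on bonafide embeddings only
    (one-component special case of a GMM); higher score = more bonafide-like."""

    def __init__(self, reg: float = 1e-6):
        self.reg = reg  # small ridge keeps the covariance invertible

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        d = X.shape[1]
        self.mean_ = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + self.reg * np.eye(d)
        self.precision_ = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        self.log_norm_ = -0.5 * (d * np.log(2 * np.pi) + logdet)
        return self

    def score_samples(self, X):
        diff = np.asarray(X, dtype=float) - self.mean_
        # Squared Mahalanobis distance of each row to the bonafide mean.
        mahal = np.einsum("ij,jk,ik->i", diff, self.precision_, diff)
        return self.log_norm_ - 0.5 * mahal


# Hypothetical embeddings: bonafide near the origin, attacks shifted away.
rng = np.random.default_rng(0)
bonafide = rng.normal(0.0, 1.0, size=(500, 8))
attacks = rng.normal(4.0, 1.0, size=(100, 8))

model = OneClassGaussian().fit(bonafide)
threshold = np.percentile(model.score_samples(bonafide), 5)  # ~5% bonafide rejected
is_attack = model.score_samples(attacks) < threshold
print(is_attack.mean())
```

The model sees no attack data during fitting. This matches the one-class motivation in the abstract: the decision boundary comes from the bonafide distribution alone.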

Submitted 22 July, 2020; originally announced July 2020.

Comments: 15 pages

arXiv:2006.16836 [pdf, other]

Can Your Face Detector Do Anti-spoofing? Face Presentation Attack Detection with a Multi-Channel Face Detector

Authors: Anjith George, Sebastien Marcel

Abstract: In a typical face recognition pipeline, the task of the face detector is to localize the face region. However, the face detector localizes regions that look like a face, irrespective of the liveness of the face, which makes the entire system susceptible to presentation attacks. In this work, we try to reformulate the task of the face detector to detect real faces, thus eliminating the threat of presentation attacks. While this task could be challenging with visible spectrum images alone, we leverage the multi-channel information available from off-the-shelf devices (such as color, depth, and infrared channels) to design a multi-channel face detector. The proposed system can be used as a live-face detector, obviating the need for a separate presentation attack detection module and making the system reliable in practice without any additional computational overhead. The main idea is to leverage a single-stage object detection framework, with a joint representation obtained from different channels for the PAD task. We have evaluated our approach on the multi-channel WMCA dataset, which contains a wide variety of attacks, to show the effectiveness of the proposed framework.

Submitted 29 July, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

Comments: 9 pages

Report number: Idiap-RR-12-2020

arXiv:2006.14749 [pdf, other]

Deepfake Detection using Spatiotemporal Convolutional Networks

Authors: Oscar de Lima, Sean Franklin, Shreshtha Basu, Blake Karwoski, Annet George

Abstract: Better generative models and larger datasets have led to more realistic fake videos that can fool the human eye but produce temporal and spatial artifacts that deep learning approaches can detect. Most current Deepfake detection methods only use individual video frames and therefore fail to learn from temporal information. We created a benchmark of the performance of spatiotemporal convolutional methods using the Celeb-DF dataset. Our methods outperformed state-of-the-art frame-based detection methods. Code for our paper is publicly available at https://github.com/oidelima/Deepfake-Detection.

Submitted 25 June, 2020; originally announced June 2020.

arXiv:2006.07909 [pdf, ps, other]

Leveraging Multimodal Behavioral Analytics for Automated Job Interview Performance Assessment and Feedback

Authors: Anumeha Agrawal, Rosa Anil George, Selvan Sunitha Ravi, Sowmya Kamath S, Anand Kumar M

Abstract: Behavioral cues play a significant part in human communication and cognitive perception. In most professional domains, employee recruitment policies are framed such that both professional skills and personality traits are adequately assessed. Hiring interviews are structured to comprehensively evaluate a potential employee's suitability for the position: their professional qualifications, interpersonal skills, ability to perform in critical and stressful situations, in the presence of time and resource constraints, etc. Therefore, candidates need to be aware of their positive and negative attributes and be mindful of behavioral cues that might have adverse effects on their success. We propose a multimodal analytical framework that analyzes the candidate in an interview scenario and provides feedback for predefined labels such as engagement, speaking rate, eye contact, etc. We perform a comprehensive analysis that includes the interviewee's facial expressions, speech, and prosodic information, using the video, audio, and text transcripts obtained from the recorded interview. We use these multimodal data sources to construct a composite representation, which is used for training machine learning classifiers to predict the class labels. Such analysis is then used to provide constructive feedback to the interviewee on their behavioral cues and body language. Experimental validation showed that the proposed methodology achieved promising results.

Submitted 16 June, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: 9 pages, ACL 2020

arXiv:2006.00058 [pdf, other]

Applying the Decisiveness and Robustness Metrics to Convolutional Neural Networks

Authors: Christopher A. George, Eduardo A. Barrera, Kenric P. Nelson

Abstract: We review three recently proposed classifier quality metrics and consider their suitability for large-scale classification challenges such as applying convolutional neural networks to the 1000-class ImageNet dataset. These metrics, referred to as the "geometric accuracy," "decisiveness," and "robustness," are based on the generalized mean ($\rho = 0$, $1$, and $-2/3$, respectively) of the classifier's self-reported and measured probabilities of correct classification. We also propose some minor clarifications to standardize the metric definitions. With these updates, we show some examples of calculating the metrics using deep convolutional neural networks (AlexNet and DenseNet) acting on large datasets (the German Traffic Sign Recognition Benchmark and ImageNet).
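The three metrics are generalized (power) means $M_\rho = \left(\frac{1}{N}\sum_i p_i^\rho\right)^{1/\rho}$ at $\rho = 1$, $0$ (taken as the geometric-mean limit $\exp(\frac{1}{N}\sum_i \log p_i)$) and $-2/3$. The minimal sketch below assumes the input is the self-reported probability each prediction assigned to the true class. It does not cover the measured-probability side or the clarified definitions proposed in the paper. The function names and sample probabilities are hypothetical.

```python
import math
from typing import Dict, Sequence


def generalized_mean(probs: Sequence[float], rho: float) -> float:
    """Power mean of probabilities; rho = 0 is the geometric-mean limit."""
    n = len(probs)
    if rho == 0:
        return math.exp(sum(math.log(p) for p in probs) / n)
    return (sum(p ** rho for p in probs) / n) ** (1.0 / rho)


def classifier_metrics(p_true_class: Sequence[float]) -> Dict[str, float]:
    """p_true_class: probability the classifier assigned to the correct label,
    one entry per sample (assumed input convention)."""
    return {
        "decisiveness": generalized_mean(p_true_class, 1.0),
        "geometric_accuracy": generalized_mean(p_true_class, 0.0),
        "robustness": generalized_mean(p_true_class, -2.0 / 3.0),
    }


print(classifier_metrics([0.9, 0.6, 0.3]))
```

By the power-mean inequality, robustness ≤ geometric accuracy ≤ decisiveness, with equality only when all probabilities are equal. A probability of exactly 0 drives the two lower metrics to 0. In this sketch it raises an error instead, so real data may need clipping.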

Submitted 29 May, 2020; originally announced June 2020.

arXiv:2003.03626 [pdf]

Discrimination Among Multiple Cutaneous and Proprioceptive Hand Percepts Evoked by Nerve Stimulation with Utah Slanted Electrode Arrays in Human Amputees

Authors: David M. Page, Suzanne M. Wendelken, Tyler S. Davis, David T. Kluger, Douglas T. Hutchinson, Jacob A. George, Gregory A. Clark

Abstract: Objective: This paper aims to demonstrate functional discriminability among restored hand sensations with different locations, qualities, and intensities that are evoked by microelectrode stimulation of residual afferent fibers in human amputees. Methods: We implanted a Utah Slanted Electrode Array (USEA) in the median and ulnar residual arm nerves of three transradial amputees and delivered stimulation via different electrodes and at different frequencies to produce various locations, qualities, and intensities of sensation on the missing hand. Blind discrimination trials were performed to determine how well subjects could discriminate among these restored sensations. Results: Subjects discriminated among restored sensory percepts with varying cutaneous and proprioceptive locations, qualities, and intensities in blind trials, including discrimination among up to 10 different location-intensity combinations (15/30 successes, p < 0.0005). Variations in the site of stimulation within the nerve, via electrode selection, enabled discrimination among up to 5 locations and qualities (35/35 successes, p < 0.0001). Variations in the stimulation frequency enabled discrimination among 4 different intensities at the same location (13/20 successes, p < 0.005). One subject discriminated among simultaneous, alternating, and isolated stimulation of two different USEA electrodes, as may be desired during multi-sensor closed-loop prosthesis use (20/25 successes, p < 0.001).
Conclusion: USEA stimulation enables encoding of a diversity of functionally discriminable sensations with different locations, qualities, and intensities. Significance: These percepts provide a potentially rich source of sensory feedback that may enhance performance and embodiment during multi-sensor, closed-loop prosthesis use.
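The abstract reports success counts and p-value bounds but not the statistical test used. The sketch below makes two assumptions: a one-sided exact binomial test against chance guessing, and chance = 1 / number of alternatives (for example, three alternatives for simultaneous, alternating and isolated stimulation). Under these assumptions, the reported counts can be checked against their bounds. This does not establish the authors' exact procedure.

```python
from math import comb


def binomial_tail(successes: int, trials: int, chance: float) -> float:
    """One-sided exact binomial p-value: P(X >= successes) when each trial
    is answered correctly by chance with probability `chance`."""
    return sum(
        comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Reported outcomes; chance levels are assumed to be 1 / number of alternatives.
checks = {
    "10 location-intensity combinations": (15, 30, 1 / 10),
    "5 locations and qualities": (35, 35, 1 / 5),
    "4 intensities": (13, 20, 1 / 4),
    "3 stimulation patterns": (20, 25, 1 / 3),
}
for label, (k, n, p0) in checks.items():
    print(f"{label}: p = {binomial_tail(k, n, p0):.2e}")
```

Under these assumptions, each computed p-value falls below the corresponding bound in the abstract: roughly 3e-8, 3e-25, 2e-4 and 2e-6.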

Submitted 7 March, 2020; originally announced March 2020.

Comments: 19 pages

arXiv:2003.00070 [pdf]

Inexpensive surface electromyography sleeve with consistent electrode placement enables dexterous and stable prosthetic control through deep learning

Authors: Jacob A. George, Anna Neibling, Michael D. Paskett, Gregory A. Clark

Abstract: The dexterity of conventional myoelectric prostheses is limited in part by the small datasets used to train the control algorithms. Variations in surface electrode positioning make it difficult to collect consistent data and to estimate motor intent reliably over time. To address these challenges, we developed an inexpensive, easy-to-don sleeve that can record robust and repeatable surface electromyography from 32 embedded monopolar electrodes. Embedded grommets are used to consistently align the sleeve with natural skin markings (e.g., moles, freckles, scars). The sleeve can be manufactured in a few hours for less than $60. Data from seven intact participants show the sleeve provides a signal-to-noise ratio of 14, a don-time under 11 seconds, and sub-centimeter precision for electrode placement. Furthermore, in a case study with one intact participant, we use the sleeve to demonstrate that neural networks can provide simultaneous and proportional control of six degrees of freedom, even 263 days after initial algorithm training. We also highlight that consistent recordings, accumulated over time to establish a large dataset, significantly improve dexterity. These results suggest that deep learning with a 74-layer neural network can substantially improve the dexterity and stability of myoelectric prosthetic control, and that deep-learning techniques can be readily instantiated and further validated through inexpensive sleeves/sockets with consistent recording locations.

Submitted 28 February, 2020; originally announced March 2020.

Comments: MEC2020
