-
Would Deep Generative Models Amplify Bias in Future Models?
Authors:
Tianwei Chen,
Yusuke Hirota,
Mayu Otani,
Noa Garcia,
Yuta Nakashima
Abstract:
We investigate the impact of deep generative models on potential social biases in upcoming computer vision models. As the internet witnesses an increasing influx of AI-generated images, concerns arise regarding inherent biases that may accompany them, potentially leading to the dissemination of harmful content. This paper explores whether a detrimental feedback loop, resulting in bias amplification, would occur if generated images were used as the training data for future models. We conduct simulations by progressively substituting original images in COCO and CC3M datasets with images generated through Stable Diffusion. The modified datasets are used to train OpenCLIP and image captioning models, which we evaluate in terms of quality and bias. Contrary to expectations, our findings indicate that introducing generated images during training does not uniformly amplify bias. Instead, instances of bias mitigation across specific tasks are observed. We further explore the factors that may influence these phenomena, such as artifacts in image generation (e.g., blurry faces) or pre-existing biases in the original datasets.
Submitted 4 April, 2024;
originally announced April 2024.
-
Can multiple-choice questions really be useful in detecting the abilities of LLMs?
Authors:
Wangyue Li,
Liangzhi Li,
Tong Xiang,
Xiao Liu,
Wei Deng,
Noa Garcia
Abstract:
Multiple-choice questions (MCQs) are widely used in the evaluation of large language models (LLMs) due to their simplicity and efficiency. However, there are concerns about whether MCQs can truly measure LLMs' capabilities, particularly in knowledge-intensive scenarios where long-form generation (LFG) answers are required. This misalignment between the task and the evaluation method demands a careful analysis of MCQs' efficacy, which we undertake in this paper by evaluating nine LLMs on four question-answering (QA) datasets in two languages: Chinese and English. We identify a significant issue: LLMs exhibit order sensitivity in bilingual MCQs, favoring answers located at specific positions, in particular the first position. We further quantify the gap between MCQs and long-form generation questions (LFGQs) by comparing their direct outputs, token logits, and embeddings. Our results reveal a relatively low correlation between answers to MCQs and LFGQs for identical questions. Additionally, we propose two methods to quantify the consistency and confidence of LLMs' outputs, which can be generalized to other QA evaluation benchmarks. Notably, our analysis challenges the idea that higher consistency implies greater accuracy. We also find MCQs to be less reliable than LFGQs in terms of expected calibration error. Finally, the misalignment between MCQs and LFGQs is reflected not only in evaluation performance but also in the embedding space. Our code and models can be accessed at https://github.com/Meetyou-AI-Lab/Can-MC-Evaluate-LLMs.
Submitted 23 May, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
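The MCQ study above compares MCQs and LFGQs using expected calibration error (ECE). For readers unfamiliar with the metric, here is a minimal sketch of the common equal-width-bin formulation, assuming each answer comes with a confidence score in [0, 1] and a binary correctness label. The function name and the default of 10 bins are illustrative assumptions, not taken from the authors' repository.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sum over bins B of (|B| / N) * |accuracy(B) - confidence(B)|.

    confidences: model confidence in the chosen answer, each in [0, 1].
    correct: 1 if the chosen answer was right, else 0.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need equally sized, non-empty inputs")
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    correct_sums = [0.0] * n_bins
    for conf, is_right in zip(confidences, correct):
        # Bins are [0, 1/n_bins), [1/n_bins, 2/n_bins), ...; a confidence of 1.0 goes to the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sums[b] += conf
        correct_sums[b] += is_right
    # Weighted gap between mean accuracy and mean confidence, skipping empty bins.
    return sum(
        (counts[b] / n) * abs(correct_sums[b] / counts[b] - conf_sums[b] / counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    )


# Usage: an over-confident model (90% confidence, 50% accuracy); the ECE should be close to 0.4.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0]))
```

A lower ECE means confidence tracks accuracy more closely; equal-width binning is only one choice, and adaptive (equal-mass) bins are a frequent alternative.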
-
Combined Task and Motion Planning Via Sketch Decompositions (Extended Version with Supplementary Material)
Authors:
Magí Dalmau-Moreno,
Néstor García,
Vicenç Gómez,
Héctor Geffner
Abstract:
The challenge in combined task and motion planning (TAMP) is the effective integration of a search over a combinatorial space, usually carried out by a task planner, and a search over a continuous configuration space, carried out by a motion planner. Using motion planners to test the feasibility of task plans and fill in the details is not effective, because it makes the geometric constraints play a passive role. This work introduces a new interleaved approach for integrating the two dimensions of TAMP that makes use of sketches, a recent, simple but powerful language for expressing the decomposition of problems into subproblems. A sketch has width 1 if it decomposes the problem into subproblems that can be solved greedily in linear time. In this paper, we introduce a general sketch for several classes of TAMP problems that has width 1 under suitable assumptions. While sketch decompositions have been developed for classical planning, they offer two important benefits in the context of TAMP. First, when a task plan is found to be infeasible due to geometric constraints, the combinatorial search resumes in a specific subproblem. Second, the sampling of object configurations is not done once, globally, at the start of the search, but locally, at the start of each subproblem. Optimizations of this basic setting are also considered, and experimental results on existing and new pick-and-place benchmarks are reported.
Submitted 24 March, 2024;
originally announced March 2024.
-
Sim-to-Real gap in RL: Use Case with TIAGo and Isaac Sim/Gym
Authors:
Jaume Albardaner,
Alberto San Miguel,
Néstor García,
Magí Dalmau-Moreno
Abstract:
This paper explores policy-learning approaches in the context of sim-to-real transfer for robotic manipulation using a TIAGo mobile manipulator, focusing on two state-of-the-art simulators, Isaac Gym and Isaac Sim, both developed by NVIDIA. Control architectures are discussed, with particular emphasis on achieving collision-free movement in both simulation and the real environment. The presented results demonstrate successful sim-to-real transfer, with an RL-trained model executing similar movements in both the simulated and the real setups.
Submitted 27 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Advancing dermatological diagnosis: Development of a hyperspectral dermatoscope for enhanced skin imaging
Authors:
Martin J. Hetz,
Carina Nogueira Garcia,
Sarah Haggenmüller,
Titus J. Brinker
Abstract:
Clinical dermatology requires precision and innovation for the efficient diagnosis and treatment of various skin conditions. This paper introduces the development of a cutting-edge hyperspectral dermatoscope (the Hyperscope) tailored for human skin analysis. We detail the requirements for such a device and the design considerations, from optical configuration to sensor selection, necessary to capture a wide spectral range with high fidelity. Preliminary results from 15 individuals and 160 recorded skin images demonstrate the potential of the Hyperscope for identifying and characterizing various skin conditions, offering a promising avenue for non-invasive skin evaluation and a platform for future research in dermatology-related hyperspectral imaging.
Submitted 25 June, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Multi-organ Self-supervised Contrastive Learning for Breast Lesion Segmentation
Authors:
Hugo Figueiras,
Helena Aidos,
Nuno Cruz Garcia
Abstract:
Self-supervised learning has proven to be an effective way to learn representations in domains where annotated labels are scarce, such as medical imaging. A widely adopted framework for this purpose is contrastive learning, which has been applied to different scenarios. This paper seeks to advance our understanding of the contrastive learning framework by exploring a novel perspective: employing multi-organ datasets for pre-training models tailored to specific organ-related target tasks. More specifically, our target task is breast tumour segmentation in ultrasound images. The pre-training datasets include ultrasound images from other organs, such as the lungs and heart, and large datasets of natural images. Our results show that conventional contrastive learning pre-training improves performance compared to supervised baseline approaches. Furthermore, our pre-trained models achieve comparable performance when fine-tuned with only half of the available labelled data. Our findings also show the advantages of pre-training on diverse organ data for improving performance in the downstream task.
Submitted 21 February, 2024;
originally announced February 2024.
-
Stable Diffusion Exposed: Gender Bias from Prompt to Image
Authors:
Yankun Wu,
Yuta Nakashima,
Noa Garcia
Abstract:
Recent studies have highlighted biases in generative models, shedding light on their predisposition towards gender-based stereotypes and imbalances. This paper contributes to this growing body of research by introducing an evaluation protocol designed to automatically analyze the impact of gender indicators on Stable Diffusion images. Leveraging insights from prior work, we explore how gender indicators affect not only gender presentation but also the representation of objects and layouts within the generated images. Our findings reveal differences in the depiction of objects, such as instruments tailored to specific genders, as well as shifts in overall layouts. We also reveal that neutral prompts tend to produce images more aligned with masculine prompts than with their feminine counterparts, providing valuable insights into the nuanced gender biases inherent in Stable Diffusion.
Submitted 5 December, 2023;
originally announced December 2023.
-
Solving the Team Orienteering Problem with Transformers
Authors:
Daniel Fuertes,
Carlos R. del-Blanco,
Fernando Jaureguizar,
Narciso García
Abstract:
Route planning for a fleet of vehicles is an important task in applications such as package delivery, surveillance, and transportation. This problem is usually modeled as a combinatorial optimization problem known as the Team Orienteering Problem. The most popular Team Orienteering Problem solvers are mainly based on either linear programming, which provides accurate solutions at the cost of a computation time that grows with the size of the problem, or heuristic methods, which usually find suboptimal solutions in a shorter amount of time. In this paper, we present a multi-agent route planning system capable of solving the Team Orienteering Problem quickly and accurately. The proposed system is based on a centralized Transformer neural network that learns to encode the scenario (modeled as a graph) and the context of the agents to provide fast and accurate solutions. Several experiments demonstrate that the presented system can outperform most state-of-the-art works in terms of computation speed. In addition, the code is publicly available at http://gti.ssr.upm.es/data.
Submitted 1 December, 2023; v1 submitted 30 November, 2023;
originally announced November 2023.
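The Team Orienteering Problem abstract does not spell out the problem itself. In its usual formulation, each of several vehicles leaves a start depot and reaches an end depot within a travel budget, each node's reward can be collected at most once, and the goal is to maximize the total collected reward. The sketch below only scores a candidate solution under those assumptions (Euclidean distances, the same budget for every route); it is not the paper's Transformer solver, and `route_length` and `score_top_solution` are hypothetical helper names.

```python
import math


def route_length(coords, route, start, end):
    """Euclidean length of start -> route[0] -> ... -> route[-1] -> end."""
    path = [start, *route, end]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


def score_top_solution(coords, rewards, routes, start, end, max_length):
    """Total reward of a candidate Team Orienteering solution, or None if infeasible.

    routes: one list of node indices per vehicle, excluding the depots.
    Feasible means no node is visited twice (across all vehicles) and every
    route, depots included, fits within max_length.
    """
    visited = set()
    for route in routes:
        for node in route:
            if node in (start, end) or node in visited:
                return None
            visited.add(node)
        if route_length(coords, route, start, end) > max_length:
            return None
    # Each visited node's reward counts once.
    return sum(rewards[node] for node in visited)


# Usage: depot 0 (start and end), three reward nodes, two vehicles with a budget of 3 each.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
rewards = [0, 3, 2, 10]
print(score_top_solution(coords, rewards, [[1], [2]], start=0, end=0, max_length=3.0))  # prints 5
```

Node 3 carries the largest reward but lies outside every vehicle's budget, which illustrates why the problem is a selection-plus-routing trade-off rather than plain shortest-path routing.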
-
Situating the social issues of image generation models in the model life cycle: a sociotechnical approach
Authors:
Amelia Katirai,
Noa Garcia,
Kazuki Ide,
Yuta Nakashima,
Atsuo Kishimoto
Abstract:
The race to develop image generation models is intensifying, with a rapid increase in the number of text-to-image models available. This is coupled with growing public awareness of these technologies. Though other generative AI models, notably large language models, have received recent critical attention for the social and other non-technical issues they raise, there has been relatively little comparable examination of image generation models. This paper reports on a novel, comprehensive categorization of the social issues associated with image generation models. At the intersection of machine learning and the social sciences, we report the results of a survey of the literature, identifying seven issue clusters arising from image generation models: data issues, intellectual property, bias, privacy, and the impacts on the informational, cultural, and natural environments. We situate these social issues in the model life cycle to help identify where potential issues arise and where mitigation may be needed. We then compare these issue clusters with what has been reported for large language models. Ultimately, we argue that the risks posed by image generation models are comparable in severity to those posed by large language models, and that the social impact of image generation models must be urgently considered.
Submitted 30 November, 2023;
originally announced November 2023.
-
CARE-MI: Chinese Benchmark for Misinformation Evaluation in Maternity and Infant Care
Authors:
Tong Xiang,
Liangzhi Li,
Wangyue Li,
Mingbai Bai,
Lu Wei,
Bowen Wang,
Noa Garcia
Abstract:
Recent advances in natural language processing (NLP) have led to a new trend of applying large language models (LLMs) to real-world scenarios. While the latest LLMs are astonishingly fluent when interacting with humans, they suffer from the misinformation problem, unintentionally generating factually false statements. This can lead to harmful consequences, especially in sensitive contexts such as healthcare. Yet few previous works have focused on evaluating misinformation in the long-form (LF) generation of LLMs, especially for knowledge-intensive topics. Moreover, although LLMs have been shown to perform well in different languages, misinformation evaluation has mostly been conducted in English. To this end, we present a benchmark, CARE-MI, for evaluating LLM misinformation in: 1) a sensitive topic, specifically the maternity and infant care domain; and 2) a language other than English, namely Chinese. Most importantly, we provide an innovative paradigm for building LF generation evaluation benchmarks that can be transferred to other knowledge-intensive domains and low-resourced languages. Our proposed benchmark fills the gap between the extensive usage of LLMs and the lack of datasets for assessing the misinformation generated by these models. It contains 1,612 expert-checked questions, accompanied by human-selected references. Using our benchmark, we conduct extensive experiments and find that current Chinese LLMs are far from perfect on the topic of maternity and infant care. To minimize the reliance on human resources for performance evaluation, we offer off-the-shelf judgment models for automatically assessing the LF output of LLMs given benchmark questions. Moreover, we compare potential solutions for LF generation evaluation and provide insights for building better automated metrics.
Submitted 26 October, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
A Cluster-Based Opposition Differential Evolution Algorithm Boosted by a Local Search for ECG Signal Classification
Authors:
Mehran Pourvahab,
Seyed Jalaleddin Mousavirad,
Virginie Felizardo,
Nuno Pombo,
Henriques Zacarias,
Hamzeh Mohammadigheymasi,
Sebastião Pais,
Seyed Nooreddin Jafari,
Nuno M. Garcia
Abstract:
Electrocardiogram (ECG) signals, which capture the heart's electrical activity, are used to diagnose and monitor cardiac problems. The accurate classification of ECG signals, particularly for distinguishing among various types of arrhythmias and myocardial infarctions, is crucial for the early detection and treatment of heart-related diseases. This paper proposes a novel approach based on an improved differential evolution (DE) algorithm to enhance ECG signal classification performance. In the initial stages of our approach, preprocessing is followed by the extraction of several significant features from the ECG signals. These extracted features are then provided as inputs to an enhanced multi-layer perceptron (MLP). While MLPs are still widely used for ECG signal classification, gradient-based training, the most widely used approach for training them, has significant disadvantages, such as the possibility of getting stuck in local optima. This paper instead employs an enhanced DE algorithm, one of the most effective population-based algorithms, for the training process. To this end, we improve DE with a clustering-based strategy, opposition-based learning, and a local search. Clustering-based strategies can act as crossover operators, while the goal of the opposition operator is to improve the exploration of the DE algorithm. The weights and biases found by the improved DE algorithm are then fed into six gradient-based local search algorithms; in other words, the weights found by DE are employed as an initialization point. We therefore introduce six different training algorithms, differing in the local search algorithm used. In an extensive set of experiments, we show that our proposed training algorithm can provide better results than conventional training algorithms.
Submitted 6 October, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
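To make the opposition-based learning ingredient of the ECG paper concrete, here is a minimal sketch of DE/rand/1/bin with opposition-based population initialization, minimizing a toy function. It assumes a box-constrained search space and commonly used DE settings (F=0.5, CR=0.9); it deliberately omits the paper's clustering-based operator, its local search, and the MLP weight encoding, so it illustrates the general technique rather than the authors' algorithm. `opposite` and `obl_de` are hypothetical names.

```python
import random


def opposite(x, lower, upper):
    """Opposition-based learning: the opposite of x_k in [a_k, b_k] is a_k + b_k - x_k."""
    return [lo + hi - xk for xk, lo, hi in zip(x, lower, upper)]


def obl_de(f, lower, upper, pop_size=20, generations=200, F=0.5, CR=0.9, seed=0):
    """Minimize f over the box [lower, upper] with DE/rand/1/bin and opposition-based init."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 individuals")
    rng = random.Random(seed)
    dim = len(lower)
    # Opposition-based initialization: random points plus their opposites; keep the fittest half.
    pop = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(pop_size)]
    pop += [opposite(x, lower, upper) for x in pop]
    pop = sorted(pop, key=f)[:pop_size]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals, none equal to the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][k] + F * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Binomial crossover; index j_rand always takes the mutant's gene.
            j_rand = rng.randrange(dim)
            trial = [
                mutant[k] if (k == j_rand or rng.random() < CR) else pop[i][k]
                for k in range(dim)
            ]
            # Keep the trial vector inside the search box.
            trial = [min(max(t, lo), hi) for t, lo, hi in zip(trial, lower, upper)]
            # Greedy selection, applied in place (an asynchronous DE variant).
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


# Usage: minimize the sphere function in 3 dimensions.
best_x, best_f = obl_de(sphere, lower=[-5.0] * 3, upper=[5.0] * 3)
print(best_x, best_f)
```

To train an MLP this way, `f` would map a flattened weight-and-bias vector to the training loss, and, as the abstract describes, the best vector found by DE would then serve as the starting point for a gradient-based local search.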
-
Not Only Generative Art: Stable Diffusion for Content-Style Disentanglement in Art Analysis
Authors:
Yankun Wu,
Yuta Nakashima,
Noa Garcia
Abstract:
The duality of content and style is inherent to the nature of art. For humans, these two elements are clearly different: content refers to the objects and concepts in the piece of art, and style to the way it is expressed. This duality poses an important challenge for computer vision. The visual appearance of objects and concepts is modulated by the style, which may reflect the author's emotions, social trends, artistic movement, etc., and their deep comprehension undoubtedly requires handling both. A promising step towards a general paradigm for art analysis is to disentangle content and style, but relying on human annotations to isolate a single aspect of artworks has limitations in learning semantic concepts and the visual appearance of paintings. We thus present GOYA, a method that distills the artistic knowledge captured in a recent generative model to disentangle content and style. Experiments show that synthetically generated images serve as a sufficient proxy for the real distribution of artworks, allowing GOYA to represent the two elements of art separately while retaining more information than existing methods.
Submitted 20 April, 2023;
originally announced April 2023.
-
Model-Agnostic Gender Debiased Image Captioning
Authors:
Yusuke Hirota,
Yuta Nakashima,
Noa Garcia
Abstract:
Image captioning models are known to perpetuate and amplify harmful societal bias in the training set. In this work, we aim to mitigate such gender bias in image captioning models. While prior work has addressed this problem by forcing models to focus on people to reduce gender misclassification, it conversely generates gender-stereotypical words at the expense of predicting the correct gender. From this observation, we hypothesize that there are two types of gender bias affecting image captioning models: 1) bias that exploits context to predict gender, and 2) bias in the probability of generating certain (often stereotypical) words because of gender. To mitigate both types of gender bias, we propose a framework, called LIBRA, that learns from synthetically biased samples to decrease both types of bias, correcting gender misclassification and replacing gender-stereotypical words with more neutral ones. Code is available at https://github.com/rebnej/LIBRA.
Submitted 21 December, 2023; v1 submitted 7 April, 2023;
originally announced April 2023.
-
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
Authors:
Noa Garcia,
Yusuke Hirota,
Yankun Wu,
Yuta Nakashima
Abstract:
The increasing tendency to collect large and uncurated datasets to train vision-and-language models has raised concerns about fair representations. It is known that even small but manually annotated datasets, such as MSCOCO, are affected by societal bias. This problem, far from being solved, may be getting worse with data crawled from the Internet without much control. In addition, the lack of tools to analyze societal bias in big collections of images makes addressing the problem extremely challenging. Our first contribution is to annotate part of the Google Conceptual Captions dataset, widely used for training vision-and-language models, with four demographic and two contextual attributes. Our second contribution is to conduct a comprehensive analysis of the annotations, focusing on how different demographic groups are represented. Our last contribution lies in evaluating three prevailing vision-and-language tasks: image captioning, text-image CLIP embeddings, and text-to-image generation, showing that societal bias is a persistent problem in all of them.
Submitted 5 April, 2023;
originally announced April 2023.
-
Dermatologist-like explainable AI enhances trust and confidence in diagnosing melanoma
Authors:
Tirtha Chanda,
Katja Hauser,
Sarah Hobelsberger,
Tabea-Clara Bucher,
Carina Nogueira Garcia,
Christoph Wies,
Harald Kittler,
Philipp Tschandl,
Cristian Navarrete-Dechent,
Sebastian Podlipnik,
Emmanouil Chousakos,
Iva Crnaric,
Jovana Majstorovic,
Linda Alhajwan,
Tanya Foreman,
Sandra Peternel,
Sergei Sarap,
İrem Özdemir,
Raymond L. Barnhill,
Mar Llamas Velasco,
Gabriela Poch,
Sören Korsing,
Wiebke Sondermann,
Frank Friedrich Gellrich,
Markus V. Heppt
, et al. (10 additional authors not shown)
Abstract:
Although artificial intelligence (AI) systems have been shown to improve the accuracy of initial melanoma diagnosis, the lack of transparency in how these systems identify melanoma poses severe obstacles to user acceptance. Explainable artificial intelligence (XAI) methods can help to increase transparency, but most XAI methods are unable to produce precisely located domain-specific explanations, making the explanations difficult to interpret. Moreover, the impact of XAI methods on dermatologists has not yet been evaluated. Building on two existing classifiers, we developed an XAI system that produces text- and region-based explanations, easily interpretable by dermatologists, alongside its differential diagnoses of melanomas and nevi. To evaluate this system, we conducted a three-part reader study to assess its impact on clinicians' diagnostic accuracy, confidence, and trust in the XAI support. We showed that our XAI's explanations were highly aligned with clinicians' explanations and that both the clinicians' trust in the support system and their confidence in their diagnoses increased significantly when using our XAI compared to using a conventional AI system. The clinicians' diagnostic accuracy increased numerically, albeit not significantly. This work demonstrates that clinicians are willing to adopt such an XAI system, motivating its future use in the clinic.
Submitted 17 March, 2023;
originally announced March 2023.
-
MROS: A framework for robot self-adaptation
Authors:
Gustavo Rezende Silva,
Darko Bozhinoski,
Mario Garzon Oviedo,
Mariano Ramírez Montero,
Nadia Hammoudeh Garcia,
Harshavardhan Deshpande,
Andrzej Wasowski,
Carlos Hernandez Corbato
Abstract:
Self-adaptation can be used in robotics to increase system robustness and reliability. This work describes the Metacontrol method for self-adaptation in robotics. In particular, it details how the MROS (Metacontrol for ROS Systems) framework implements and packages Metacontrol, and it demonstrates how MROS can be applied in a navigation scenario where a mobile robot navigates on a factory floor. Video: https://www.youtube.com/watch?v=ISe9aMskJuE
Submitted 16 March, 2023;
originally announced March 2023.
-
A Comparative Analysis of Bias Amplification in Graph Neural Network Approaches for Recommender Systems
Authors:
Nikzad Chizari,
Niloufar Shoeibi,
María N. Moreno-García
Abstract:
Recommender Systems (RSs) are used to provide users with personalized item recommendations and help them overcome the problem of information overload. Currently, recommendation methods based on deep learning are gaining ground over traditional methods such as matrix factorization due to their ability to represent the complex relationships between users and items and to incorporate additional information. The fact that these data have a graph structure, together with the greater capability of Graph Neural Networks (GNNs) to learn from such structures, has led to their successful incorporation into recommender systems. However, the bias amplification issue needs to be investigated when using these algorithms. Bias results in unfair decisions, which can negatively affect a company's reputation and financial status due to societal disappointment and environmental harm. In this paper, we aim to comprehensively study this problem through a literature review and an analysis of how different GNN-based algorithms behave with respect to bias compared to state-of-the-art methods. We also intend to explore appropriate solutions to tackle this issue with the least possible impact on model performance.
Submitted 18 January, 2023;
originally announced January 2023.
-
Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks
Authors:
Tianwei Chen,
Noa Garcia,
Mayu Otani,
Chenhui Chu,
Yuta Nakashima,
Hajime Nagahara
Abstract:
Is more data always better to train vision-and-language models? We study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks their overall performance will improve. However, we show that not all the knowledge transfers well or has a positive impact on related tasks, even when they share a common goal. We conduct an exhaustive analysis based on hundreds of cross-experiments on 12 vision-and-language tasks categorized into 4 groups. Whereas tasks in the same group are prone to improve each other, results show that this is not always the case. Other factors, such as dataset size or the pre-training stage, also have a great impact on how well the knowledge is transferred.
Submitted 23 August, 2022;
originally announced August 2022.
-
RigoBERTa: A State-of-the-Art Language Model For Spanish
Authors:
Alejandro Vaca Serrano,
Guillem Garcia Subies,
Helena Montoro Zamorano,
Nuria Aldama Garcia,
Doaa Samy,
David Betancur Sanchez,
Antonio Moreno Sandoval,
Marta Guerrero Nieto,
Alvaro Barbero Jimenez
Abstract:
This paper presents RigoBERTa, a State-of-the-Art Language Model for Spanish. RigoBERTa is trained on a well-curated corpus composed of different subcorpora with key features. It follows the DeBERTa architecture, which has several advantages over other architectures of similar size, such as BERT or RoBERTa. RigoBERTa's performance is assessed on 13 NLU tasks in comparison with other available Spanish language models, namely MarIA, BERTIN and BETO. RigoBERTa outperformed the three models in 10 out of the 13 tasks, achieving new "State-of-the-Art" results.
Submitted 3 June, 2022; v1 submitted 27 April, 2022;
originally announced May 2022.
-
Gender and Racial Bias in Visual Question Answering Datasets
Authors:
Yusuke Hirota,
Yuta Nakashima,
Noa Garcia
Abstract:
Vision-and-language tasks have increasingly drawn attention as a means to evaluate human-like reasoning in machine learning models. A popular task in the field is visual question answering (VQA), which aims to answer questions about images. However, VQA models have been shown to exploit language bias by learning the statistical correlations between questions and answers without looking into the image content: e.g., questions about the color of a banana are answered with yellow, even if the banana in the image is green. If societal bias (e.g., sexism, racism, ableism) is present in the training data, this problem may be causing VQA models to learn harmful stereotypes. For this reason, we investigate gender and racial bias in five VQA datasets. In our analysis, we find that the distribution of answers differs greatly between questions about women and questions about men, and that detrimental gender-stereotypical samples exist. Likewise, we identify that specific race-related attributes are underrepresented, whereas potentially discriminatory samples appear in the analyzed datasets. Our findings suggest that there are dangers associated with using VQA datasets without considering and dealing with the potentially harmful stereotypes. We conclude the paper by proposing solutions to alleviate the problem before, during, and after the dataset collection process.
Submitted 3 June, 2022; v1 submitted 17 May, 2022;
originally announced May 2022.
-
Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practises for QoE Assessment
Authors:
Pablo Pérez,
Ester Gonzalez-Sosa,
Jesús Gutiérrez,
Narciso García
Abstract:
Several technological and scientific advances have been achieved recently in the field of immersive systems, which are offering new possibilities to applications and services in different communication domains, such as entertainment, virtual conferencing, working meetings, social relations, healthcare, and industry. Users of these immersive technologies can explore and experience the stimuli in a more interactive and personalized way than with previous technologies. Thus, considering the new technological challenges related to these systems and the new perceptual dimensions and interaction behaviors involved, a deep understanding of the users' Quality of Experience (QoE) is required to satisfy their demands and expectations. In this sense, it is essential to foster research on evaluating the QoE of immersive communication systems, since this will provide useful outcomes to optimize them and to identify the factors that can deteriorate the user experience. With this aim, subjective tests are usually performed following standard methodologies, which are designed for specific technologies and services. Although numerous user studies have already been published, there are no recommendations or standards that define common testing methodologies for evaluating immersive communication systems, such as those developed for images and video. Therefore, the QoE evaluation methods designed for previous technologies need to be revised in order to develop robust and reliable methodologies for immersive communication systems. Thus, the objective of this paper is to provide an overview of existing immersive communication systems and related user studies, which can help in defining basic guidelines and testing methodologies for user tests of immersive communication systems, such as 360-degree video-based telepresence, avatar-based social VR, cooperative AR, etc.
Submitted 1 September, 2022; v1 submitted 12 May, 2022;
originally announced May 2022.
-
Quantifying Societal Bias Amplification in Image Captioning
Authors:
Yusuke Hirota,
Yuta Nakashima,
Noa Garcia
Abstract:
We study societal bias amplification in image captioning. Image captioning models have been shown to perpetuate gender and racial biases; however, metrics to measure, quantify, and evaluate the societal bias in captions are not yet standardized. We provide a comprehensive study on the strengths and limitations of each metric, and propose LIC, a metric to study captioning bias amplification. We argue that, for image captioning, it is not enough to focus on the correct prediction of the protected attribute, and the whole context should be taken into account. We conduct an extensive evaluation on traditional and state-of-the-art image captioning models, and surprisingly find that, by focusing only on the protected attribute prediction, bias mitigation models are unexpectedly amplifying bias.
Submitted 29 March, 2022;
originally announced March 2022.
-
The Met Dataset: Instance-level Recognition for Artworks
Authors:
Nikolaos-Antonios Ypsilantis,
Noa Garcia,
Guangxing Han,
Sarah Ibrahimi,
Nanne Van Noord,
Giorgos Tolias
Abstract:
This work introduces a dataset for large-scale instance-level recognition in the domain of artworks. The proposed benchmark exhibits a number of different challenges such as large inter-class similarity, a long-tail distribution, and many classes. We rely on the open access collection of The Met museum to form a large training set of about 224k classes, where each class corresponds to a museum exhibit with photos taken under studio conditions. Testing is primarily performed on photos taken by museum guests depicting exhibits, which introduces a distribution shift between training and testing. Testing is additionally performed on a set of images not related to Met exhibits, making the task resemble an out-of-distribution detection problem. The proposed benchmark follows the paradigm of other recent datasets for instance-level recognition on different domains to encourage research on domain-independent approaches. A number of suitable approaches are evaluated to offer a testbed for future comparisons. Self-supervised and supervised contrastive learning are effectively combined to train the backbone, which is used for non-parametric classification and is shown to be a promising direction. Dataset webpage: http://cmp.felk.cvut.cz/met/
Submitted 3 February, 2022;
originally announced February 2022.
-
Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties
Authors:
Mohamed S. Kraiem,
Fernando Sánchez-Hernández,
María N. Moreno-García
Abstract:
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples. Thus, the prediction model is unreliable although the overall model accuracy can be acceptable. Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class. However, their effectiveness depends on several factors mainly related to data intrinsic characteristics, such as imbalance ratio, dataset size and dimensionality, overlapping between classes or borderline examples. In this work, the impact of these factors is analyzed through a comprehensive comparative study involving 40 datasets from different application areas. The objective is to obtain models for automatic selection of the best resampling strategy for any dataset based on its characteristics. These models allow us to check several factors simultaneously considering a wide range of values since they are induced from very varied datasets that cover a broad spectrum of conditions. This differs from most studies that focus on the individual analysis of the characteristics or cover a small range of values. In addition, the study encompasses both basic and advanced resampling strategies that are evaluated by means of eight different performance metrics, including new measures specifically designed for imbalanced data classification. The general nature of the proposal allows the choice of the most appropriate method regardless of the domain, avoiding the search for special purpose techniques that could be valid for the target data.
Submitted 15 December, 2021;
originally announced January 2022.
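As a minimal sketch of the two basic strategies named in this abstract, random oversampling and random undersampling can be written in plain Python. This is not the authors' selection model, and it omits the advanced strategies (e.g. synthetic-sample methods) the study also covers; the function names and toy data are hypothetical. The imbalance ratio is taken here as largest class size over smallest class size, which is one common definition.

```python
import random
from collections import Counter


def imbalance_ratio(y):
    """Size of the largest class divided by the size of the smallest one."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples of each smaller class until
    every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(members)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each larger class so that every class is
    as small as the smallest one; the original order is preserved."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        members = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(members, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): five majority examples, one minority example.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [0, 0, 0, 0, 0, 1]
print(imbalance_ratio(y))  # 5.0
X_over, y_over = random_oversample(X, y)
print(sorted(Counter(y_over).items()))  # [(0, 5), (1, 5)]
X_under, y_under = random_undersample(X, y)
print(sorted(Counter(y_under).items()))  # [(0, 1), (1, 1)]
```

Oversampling keeps all information but repeats minority examples verbatim, while undersampling discards majority examples; which trade-off wins depends on the dataset properties the study analyzes.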
-
Transferring Domain-Agnostic Knowledge in Video Question Answering
Authors:
Tianran Wu,
Noa Garcia,
Mayu Otani,
Chenhui Chu,
Yuta Nakashima,
Haruo Takemura
Abstract:
Video question answering (VideoQA) is designed to answer a given question based on a relevant video clip. The currently available large-scale datasets have made it possible to formulate VideoQA as the joint understanding of visual and language information. However, this training procedure is costly and still falls short of human performance. In this paper, we investigate a transfer learning method that introduces domain-agnostic knowledge and domain-specific knowledge. First, we develop a novel transfer learning framework, which fine-tunes the pre-trained model by applying domain-agnostic knowledge as the medium. Second, we construct a new VideoQA dataset with 21,412 human-generated question-answer samples for comparable transfer of knowledge. Our experiments show that: (i) domain-agnostic knowledge is transferable and (ii) our proposed transfer learning framework can boost VideoQA performance effectively.
Submitted 25 October, 2021;
originally announced October 2021.
-
Detecting Renewal States in Chains of Variable Length via Intrinsic Bayes Factors
Authors:
Victor Freguglia,
Nancy Garcia
Abstract:
Markov chains with variable length are useful parsimonious stochastic models able to generate most stationary sequences of discrete symbols. The idea is to identify the suffixes of the past, called contexts, that are relevant to predict the future symbol. Sometimes a single state is a context, and looking at the past and finding this specific state makes the further past irrelevant. States with this property are called renewal states, and they can be used to split the chain into independent and identically distributed blocks. In order to identify renewal states for chains with variable length, we propose the use of the Intrinsic Bayes Factor to evaluate the hypothesis that some particular state is a renewal state. In this case, the difficulty lies in integrating the marginal posterior distribution for the random context trees under a general prior distribution on the space of context trees, with a Dirichlet prior for the transition probabilities; Monte Carlo methods are applied for this purpose. To show the strength of our method, we analyzed artificial datasets generated from different binary models and one example coming from the field of Linguistics.
Submitted 6 January, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
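A sketch of the block-splitting idea in this abstract, assuming the renewal state is a single symbol: once a state is accepted as a renewal state, the sequence can be cut at each of its occurrences. This illustrates only the splitting step, not the Intrinsic Bayes Factor test itself; `split_at_renewal` and the toy binary sequence are hypothetical.

```python
from collections import Counter


def split_at_renewal(seq, renewal):
    """Cut a symbol sequence into blocks that each start at an occurrence
    of `renewal`.

    Returns (prefix, blocks): the prefix is whatever precedes the first
    occurrence, and the last block may be cut off by the end of the data.
    """
    positions = [i for i, s in enumerate(seq) if s == renewal]
    if not positions:
        return list(seq), []
    prefix = list(seq[: positions[0]])
    ends = positions[1:] + [len(seq)]
    blocks = [list(seq[start:end]) for start, end in zip(positions, ends)]
    return prefix, blocks


seq = [1, 0, 1, 1, 0, 0, 1, 0]
prefix, blocks = split_at_renewal(seq, renewal=0)
print(prefix)  # [1]
print(blocks)  # [[0, 1, 1], [0], [0, 1], [0]]
# If 0 really is a renewal state, the complete blocks are i.i.d.,
# so statistics such as their lengths can be pooled.
print(sorted(Counter(len(b) for b in blocks).items()))  # [(1, 2), (2, 1), (3, 1)]
```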
-
Dynamic inference of user context through social tag embedding for music recommendation
Authors:
Diego Sánchez-Moreno,
Álvaro Lozano Murciego,
Vivian F. López Batista,
María Dolores Muñoz Vicente,
María N. Moreno-García
Abstract:
Music listening preferences at a given time depend on a wide range of contextual factors, such as the user's emotional state, location and activity at listening time, the day of the week, the time of the day, etc. It is therefore of great importance to take them into account when recommending music. However, it is very difficult to develop context-aware recommender systems that consider these factors, both because of the difficulty of detecting some of them, such as emotional state, and because of the drawbacks derived from the inclusion of many factors, such as sparsity problems in contextual pre-filtering. This work proposes a method for detecting the user's contextual state when listening to music based on the social tags of music items. The intrinsic characteristics of social tagging that allow for the description of items in multiple dimensions can be exploited to capture many contextual dimensions in the user listening sessions. The embeddings of the tags of the first items played in each session are used to represent the context of that session. Recommendations are then generated based on both user preferences and the similarity of the items computed from tag embeddings. Social tags have been used extensively in many recommender systems; however, to our knowledge, they have hardly been used to dynamically infer contextual states.
Submitted 23 September, 2021;
originally announced September 2021.
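A minimal sketch of the session-context idea in this abstract, under assumptions that are not taken from the paper: tag embeddings are given as a plain dict of vectors, an item is represented by the mean of its tag vectors, the session context is the mean over the tags of the first items played, and the final score is a linear blend of a per-user preference value and cosine similarity with a hypothetical weight `alpha`. All names and data are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise average of equally sized vectors ([] for no vectors)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tag_vectors(items, item_tags, tag_vecs):
    """All known tag embeddings attached to the given items."""
    return [tag_vecs[t] for item in items for t in item_tags[item] if t in tag_vecs]


def recommend(first_items, candidates, item_tags, tag_vecs, preference, alpha=0.5, k=3):
    """Rank candidates by a blend of a per-user preference score and the
    similarity between their tags and the session context."""
    context = mean_vector(tag_vectors(first_items, item_tags, tag_vecs))
    scored = []
    for item in candidates:
        vecs = tag_vectors([item], item_tags, tag_vecs)
        if not vecs:
            continue  # no embedded tags: the item cannot be placed in context
        similarity = cosine(context, mean_vector(vecs))
        score = alpha * preference.get(item, 0.0) + (1 - alpha) * similarity
        scored.append((score, item))
    scored.sort(reverse=True)
    return [item for _, item in scored[:k]]


# Hypothetical 2-d tag embeddings and a session that starts with calm tracks.
tag_vecs = {"calm": [1.0, 0.0], "acoustic": [0.8, 0.2], "party": [0.0, 1.0], "dance": [0.1, 0.9]}
item_tags = {"s1": ["calm", "acoustic"], "s2": ["calm"], "c1": ["acoustic"],
             "c2": ["party", "dance"], "c3": ["calm", "dance"]}
print(recommend(["s1", "s2"], ["c1", "c2", "c3"], item_tags, tag_vecs, preference={}))
# ['c1', 'c3', 'c2']
print(recommend(["s1", "s2"], ["c1", "c2", "c3"], item_tags, tag_vecs, preference={"c2": 1.0}))
# ['c2', 'c1', 'c3']
```

The second call shows the trade-off the blend encodes: a strong long-term preference can outweigh a poor fit with the inferred session context.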
-
Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
Authors:
Zechen Bai,
Yuta Nakashima,
Noa Garcia
Abstract:
Have you ever looked at a painting and wondered what the story behind it is? This work presents a framework to bring art closer to people by generating comprehensive descriptions of fine-art paintings. Generating informative descriptions for artworks, however, is extremely challenging, as it requires 1) describing multiple aspects of the image such as its style, content, or composition, and 2) providing background and contextual knowledge about the artist, their influences, or the historical period. To address these challenges, we introduce a multi-topic and knowledgeable art description framework, which organizes the generated sentences according to three artistic topics and, additionally, enhances each description with external knowledge. The framework is validated through an exhaustive analysis, both quantitative and qualitative, as well as a comparative human evaluation, demonstrating outstanding results in terms of both topic diversity and information veracity.
Submitted 13 September, 2021;
originally announced September 2021.
-
Soccer line mark segmentation and classification with stochastic watershed transform
Authors:
Daniel Berjón,
Carlos Cuevas,
Narciso García
Abstract:
Augmented reality applications are beginning to change the way sports are broadcast, providing richer experiences and valuable insights to fans. The first step of augmented reality systems is camera calibration, possibly based on detecting the line markings of the playing field. Most existing proposals for line detection rely on edge detection and the Hough transform, but radial distortion and extraneous edges cause inaccurate or spurious detections of line markings. We propose a novel strategy to automatically and accurately segment and classify line markings. First, line points are segmented by a stochastic watershed transform that is robust to radial distortions, since it makes no assumptions about line straightness, and is unaffected by the presence of players or the ball. The line points are then linked to primitive structures (straight lines and ellipses) by a very efficient procedure that makes no assumptions about the number of primitives that appear in each image. The strategy has been tested on a new, public database composed of 60 annotated images from matches in five stadiums. The results show that the proposed strategy is more robust and accurate than existing approaches, achieving successful line mark detection even in challenging conditions.
Submitted 3 August, 2022; v1 submitted 13 August, 2021;
originally announced August 2021.
-
A Picture May Be Worth a Hundred Words for Visual Question Answering
Authors:
Yusuke Hirota,
Noa Garcia,
Mayu Otani,
Chenhui Chu,
Yuta Nakashima,
Ittetsu Taniguchi,
Takao Onoye
Abstract:
How far can we go with textual representations for understanding pictures? In image understanding, it is essential to use concise but detailed image representations. Deep visual features extracted by vision models, such as Faster R-CNN, are prevalently used in multiple tasks, especially in visual question answering (VQA). However, conventional deep visual features may struggle to convey all the details in an image as we humans do. Meanwhile, with recent progress in language models, descriptive text may be an alternative to this problem. This paper delves into the effectiveness of textual representations for image understanding in the specific context of VQA. We propose to take description-question pairs as input, instead of deep visual features, and feed them into a language-only Transformer model, simplifying the process and reducing the computational cost. We also experiment with data augmentation techniques to increase the diversity in the training set and avoid learning statistical bias. Extensive evaluations have shown that textual representations require only about a hundred words to compete with deep visual features on both VQA 2.0 and VQA-CP v2.
Submitted 25 June, 2021;
originally announced June 2021.
-
GCNBoost: Artwork Classification by Label Propagation through a Knowledge Graph
Authors:
Cheikh Brahim El Vaigh,
Noa Garcia,
Benjamin Renoust,
Chenhui Chu,
Yuta Nakashima,
Hajime Nagahara
Abstract:
The rise of digitization of cultural documents offers large-scale content, opening the road for the development of AI systems to preserve, search, and deliver cultural heritage. Organizing such cultural content also means classifying it, a task that is very familiar to modern computer science. Contextual information is often the key to structuring such real-world data, and we propose to use it in the form of a knowledge graph. Such a knowledge graph, combined with content analysis, enhances the notion of proximity between artworks and thus improves performance in classification tasks. In this paper, we propose a novel use of a knowledge graph that is constructed on annotated data and pseudo-labeled data. With label propagation, we boost artwork classification by training a model using a graph convolutional network, relying on the relationships between entities of the knowledge graph. Following a transductive learning framework, our experiments show that relying on a knowledge graph modeling the relations between labeled data and unlabeled data allows us to achieve state-of-the-art results on multiple classification tasks on a dataset of paintings and on a dataset of Buddha statues. Additionally, we show state-of-the-art results for the difficult case of dealing with unbalanced data, with the limitation of disregarding classes with extremely low degrees in the knowledge graph.
Submitted 25 May, 2021;
originally announced May 2021.
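The paper trains a graph convolutional network over its knowledge graph; as a simpler illustration of the label-propagation idea that approach builds on, the sketch below uses classic iterative label propagation instead (labelled nodes are clamped, unlabelled nodes repeatedly take the average class distribution of their neighbours). It is a swapped-in technique, not the paper's model, and the toy graph, node names and `propagate_labels` are hypothetical.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, num_classes, iterations=100):
    """Iterative label propagation on an undirected graph.

    edges: iterable of (u, v) pairs; seeds: dict node -> class index.
    Labelled nodes are clamped; every other node repeatedly takes the
    average class distribution of its neighbours. Returns node -> class,
    or None for nodes that no labelled node can reach.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    nodes = set(neighbours) | set(seeds)

    def one_hot(c):
        return [1.0 if k == c else 0.0 for k in range(num_classes)]

    dist = {n: one_hot(seeds[n]) if n in seeds else [0.0] * num_classes for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            if n in seeds or not neighbours[n]:
                updated[n] = dist[n]  # clamp seeds; isolated nodes stay as they are
            else:
                nbrs = neighbours[n]
                updated[n] = [sum(dist[m][k] for m in nbrs) / len(nbrs)
                              for k in range(num_classes)]
        dist = updated

    return {
        n: (max(range(num_classes), key=lambda k: d[k]) if sum(d) > 0 else None)
        for n, d in dist.items()
    }


# A path a-b-c-d-e-f with opposite labels at its ends, plus an unlabelled pair x-y.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("x", "y")]
labels = propagate_labels(edges, {"a": 0, "f": 1}, num_classes=2)
print(labels["b"], labels["c"], labels["d"], labels["e"], labels["x"])  # 0 0 1 1 None
```

This is transductive in the same sense as the abstract: predictions exist only for nodes already in the graph, and components without any labelled node receive no label.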
-
Subjective Assessment Experiments That Recruit Few Observers With Repetitions (FOWR)
Authors:
Pablo Perez,
Lucjan Janowski,
Narciso Garcia,
Margaret Pinson
Abstract:
Recent studies have shown that it is possible to characterize subject bias and variance in subjective assessment tests. Apparent differences among subjects can, for the most part, be explained by random factors. Building on that theory, we propose a subjective test design where three to four team members each rate the stimuli multiple times. The results are comparable to a high performing objective metric. This provides a quick and simple way to analyze new technologies and perform pre-tests for subjective assessment.
Submitted 20 July, 2022; v1 submitted 6 April, 2021;
originally announced April 2021.
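A sketch of how ratings from a few observers with repetitions might be summarized, assuming a simple additive model in which each observer's bias is the mean offset of their scores from the per-stimulus mean opinion score (MOS), and their consistency is the spread of their own repeated scores. This is a common simplification, not necessarily the estimator used in the paper; the data layout, function name and ratings are hypothetical.

```python
from statistics import mean, stdev


def analyze_fowr(ratings):
    """ratings[observer][stimulus] -> list of repeated scores.

    Returns (mos, bias, consistency):
      mos[stimulus]          mean of every score given to the stimulus,
      bias[observer]         mean difference between the observer's scores and the MOS,
      consistency[observer]  mean standard deviation of the observer's repeats.
    """
    stimuli = sorted({s for per_obs in ratings.values() for s in per_obs})
    mos = {
        s: mean(x for per_obs in ratings.values() for x in per_obs.get(s, []))
        for s in stimuli
    }
    bias = {
        o: mean(x - mos[s] for s, xs in per_obs.items() for x in xs)
        for o, per_obs in ratings.items()
    }
    # Requires at least one stimulus with two or more repeats per observer.
    consistency = {
        o: mean(stdev(xs) for xs in per_obs.values() if len(xs) > 1)
        for o, per_obs in ratings.items()
    }
    return mos, bias, consistency


# Three observers, two stimuli, three repetitions each (hypothetical 5-point scores).
ratings = {
    "A": {"s1": [4, 4, 5], "s2": [2, 3, 2]},
    "B": {"s1": [3, 4, 3], "s2": [2, 2, 1]},
    "C": {"s1": [5, 5, 4], "s2": [3, 3, 3]},
}
mos, bias, consistency = analyze_fowr(ratings)
print({s: round(v, 2) for s, v in mos.items()})   # {'s1': 4.11, 's2': 2.33}
print({o: round(v, 3) for o, v in bias.items()})  # {'A': 0.111, 'B': -0.722, 'C': 0.611}
```

Because every observer rates every stimulus equally often here, the estimated biases sum to zero, which is what lets a small team separate systematic offsets from random rating noise.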
-
The Internet Protocol -- Past, some current limitations and a glimpse of a possible future
Authors:
Nuno M. Garcia
Abstract:
The network layer is central to the networking scientific area. It is around the network layer that all data communications develop, and one of its main tasks is to allow the identification of each single interface/machine among the potentially many interfaces in a network. This seminar addresses some of the issues that are usually presented to young Computer Science Engineering students in the course of several classes, but also presents some topics that are not addressed in networking courses. It is mostly focused on using Internet Protocol addresses in Local Area Networks, also considering issues that belong to Wide Area Networks, such as data aggregation. This document summarizes the content of a seminar; therefore, it comprises both teaching and research subjects. The seminar starts with a history of the evolution of communication protocols from the early days of networks up until IPv6. It describes a new approach to defining the addresses of network interfaces using Variable Length Subnet Masks, as this is usually not an easy task for Computer Science Engineering undergraduate students. This summary also describes some of the limitations of data communication in today's networks, proposing some solutions where possible, including a novel means of connectionless data transmission using IPv6 addresses, as an extension of previously published research. The way the seminar is organized provides a history of the Internet Protocol, a view of some of its well-known current limitations, and a glimpse into a possible future regarding an improved connectionless layer 3 data transfer protocol.
Submitted 3 April, 2021;
originally announced April 2021.
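The abstract does not detail the seminar's own approach to Variable Length Subnet Masks, so the following is only a sketch of the conventional largest-first VLSM allocation, written with Python's standard `ipaddress` module; the subnet names and host counts are hypothetical.

```python
import ipaddress


def vlsm_allocate(base, host_counts):
    """Carve an IPv4 block into variable-length subnets.

    host_counts maps a subnet name to the number of usable hosts it needs.
    Requests are served largest first, so each new subnet starts on a
    boundary that is a multiple of its own size.
    """
    network = ipaddress.IPv4Network(base)
    next_addr = int(network.network_address)
    last_addr = int(network.broadcast_address)
    plan = {}
    for name, hosts in sorted(host_counts.items(), key=lambda kv: kv[1], reverse=True):
        # Smallest b with 2**b >= hosts + 2 (network and broadcast addresses).
        host_bits = (hosts + 1).bit_length()
        size = 1 << host_bits
        if next_addr + size - 1 > last_addr:
            raise ValueError(f"{base} has no room left for {name!r}")
        address = ipaddress.IPv4Address(next_addr)
        plan[name] = ipaddress.IPv4Network(f"{address}/{32 - host_bits}")
        next_addr += size
    return plan


plan = vlsm_allocate("192.168.1.0/24", {"lab": 100, "office": 50, "servers": 10, "link": 2})
for name, subnet in plan.items():
    print(name, subnet, subnet.num_addresses - 2, "usable hosts")
```

With these requests the plan is 192.168.1.0/25, 192.168.1.128/26, 192.168.1.192/28 and 192.168.1.208/30. Serving the largest request first is what keeps every subnet aligned; allocating in arbitrary order can leave gaps or produce misaligned blocks, which `IPv4Network` would reject.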
-
Methodology to Assess Quality, Presence, Empathy, Attitude, and Attention in 360-degree Videos for Immersive Communications
Authors:
Marta Orduna,
Pablo Pérez,
Jesús Gutiérrez,
Narciso García
Abstract:
This paper analyzes the joint assessment of quality, spatial and social presence, empathy, attitude, and attention in three conditions: (A) visualizing and rating the quality of contents in a Head-Mounted Display (HMD), (B) visualizing the contents in an HMD, and (C) visualizing the contents in an HMD where participants can see their hands and take notes. The experiment simulates an immersive communication where participants attend conversations of different genres and from different acquisition perspectives in the context of international experiences. Video quality is evaluated with the Single-Stimulus Discrete Quality Evaluation (SSDQE) methodology. Spatial and social presence are evaluated with questionnaires adapted from the literature. Initial empathy is assessed with the Interpersonal Reactivity Index (IRI), and a questionnaire is designed to evaluate attitude. Attention is evaluated with three questions with pass/fail answers. Fifty-four participants were evenly distributed among conditions A, B, and C, taking into account their international experience backgrounds, yielding a diverse sample of participants. The results from the subjective test validate the proposed methodology in VR communications, showing that video quality experiments can be adapted to the conditions imposed by experiments focused on the evaluation of socioemotional features in terms of long-duration contents, actor and observer acquisition perspectives, and genre. In addition, the positive results related to the sense of presence imply that the technology can be relevant in the analyzed use case. The acquisition perspective greatly influences social presence, and all the contents have a positive impact on all participants' attitude towards international experiences. The annotated dataset obtained from the experiment, the Student Experiences Around the World dataset (SEAW-dataset), is made publicly available.
Submitted 9 February, 2022; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Understanding the Role of Scene Graphs in Visual Question Answering
Authors:
Vinay Damodaran,
Sharanya Chakravarthy,
Akshay Kumar,
Anjana Umapathy,
Teruko Mitamura,
Yuta Nakashima,
Noa Garcia,
Chenhui Chu
Abstract:
Visual Question Answering (VQA) is of tremendous interest to the research community with important applications such as aiding visually impaired users and image-based search. In this work, we explore the use of scene graphs for solving the VQA task. We conduct experiments on the GQA dataset which presents a challenging set of questions requiring counting, compositionality and advanced reasoning capability, and provides scene graphs for a large number of images. We adopt image + question architectures for use with scene graphs, evaluate various scene graph generation techniques for unseen images, propose a training curriculum to leverage human-annotated and auto-generated scene graphs, and build late fusion architectures to learn from multiple image representations. We present a multi-faceted study into the use of scene graphs for VQA, making this work the first of its kind.
Submitted 16 January, 2021; v1 submitted 14 January, 2021;
originally announced January 2021.
-
MROS: Runtime Adaptation For Robot Control Architectures
Authors:
Darko Bozhinoski,
Carlos Hernandez Corbato,
Mario Garzon Oviedo,
Gijs van der Hoorn,
Nadia Hammoudeh Garcia,
Harshavardhan Deshpande,
Jon Tjerngren,
Andrzej Wasowski
Abstract:
Known attempts to build autonomous robots rely on complex control architectures, often implemented with the Robot Operating System platform (ROS). Runtime adaptation is needed in these systems to cope with component failures and with contingencies arising from dynamic environments; otherwise, these affect the reliability and quality of the mission execution. Existing proposals on how to build self-adaptive systems in robotics usually require a major re-design of the control architecture and rely on complex tools unfamiliar to the robotics community. Moreover, they are hard to reuse across applications.
This paper presents MROS: a model-based framework for run-time adaptation of robot control architectures based on ROS. MROS uses a combination of domain-specific languages to model architectural variants and capture mission quality concerns, and an ontology-based implementation of the MAPE-K and meta-control visions for run-time adaptation. The experimental results obtained by applying MROS to two realistic ROS-based robotic demonstrators show the benefits of our approach in terms of the quality of the mission execution, as well as MROS' extensibility and reusability across robotic applications.
Submitted 23 November, 2021; v1 submitted 18 October, 2020;
originally announced October 2020.
-
Demographic Influences on Contemporary Art with Unsupervised Style Embeddings
Authors:
Nikolai Huckle,
Noa Garcia,
Yuta Nakashima
Abstract:
Computational art analysis has, through its reliance on classification tasks, prioritised historical datasets in which the artworks are already well sorted with the necessary annotations. Art produced today, on the other hand, is abundant and easily accessible through the internet and social networks, which professional and amateur artists alike use to display their work. Although this art, as yet unsorted in terms of style and genre, is less suited to supervised analysis, the data sources come with novel information that may help frame the visual content in equally novel ways. As a first step in this direction, we present contempArt, a multi-modal dataset of exclusively contemporary artworks. contempArt is a collection of paintings and drawings, a detailed graph network based on social connections on Instagram, and additional socio-demographic information, all attached to 442 artists at the beginning of their careers. We evaluate three methods suited for generating unsupervised style embeddings of images and correlate them with the remaining data. We find no connections between visual style on the one hand and social proximity, gender, and nationality on the other.
Submitted 1 December, 2020; v1 submitted 30 September, 2020;
originally announced September 2020.
-
A Dataset and Baselines for Visual Question Answering on Art
Authors:
Noa Garcia,
Chentao Ye,
Zihua Liu,
Qingtao Hu,
Mayu Otani,
Chenhui Chu,
Yuta Nakashima,
Teruko Mitamura
Abstract:
Answering questions related to art pieces (paintings) is a difficult task, as it implies the understanding of not only the visual information that is shown in the picture, but also the contextual knowledge that is acquired through the study of the history of art. In this work, we introduce our first attempt towards building a new dataset, coined AQUA (Art QUestion Answering). The question-answer (QA) pairs are automatically generated using state-of-the-art question generation methods based on paintings and comments provided in an existing art understanding dataset. The QA pairs are cleansed by crowdsourcing workers with respect to their grammatical correctness, answerability, and answers' correctness. Our dataset inherently consists of visual (painting-based) and knowledge (comment-based) questions. We also present a two-branch model as a baseline, where the visual and knowledge questions are handled independently. We extensively compare our baseline model against the state-of-the-art models for question answering, and we provide a comprehensive study about the challenges and potential future directions for visual question answering on art.
Submitted 28 August, 2020;
originally announced August 2020.
-
Time-Aware Music Recommender Systems: Modeling the Evolution of Implicit User Preferences and User Listening Habits in A Collaborative Filtering Approach
Authors:
Diego Sánchez-Moreno,
Yong Zheng,
María N. Moreno-García
Abstract:
Online streaming services have become the most popular way of listening to music. Most of these services are endowed with recommendation mechanisms that help users discover songs and artists that may interest them from the vast amount of music available. However, many are not reliable, as they may not take into account contextual aspects or ever-evolving user behavior. Therefore, it is necessary to develop systems that consider these aspects. In the field of music, time is one of the most important factors influencing user preferences, and managing its effects is the motivation behind the work presented in this paper. Here, the temporal information regarding when songs are played is examined. The purpose is to model both the evolution of user preferences, in the form of evolving implicit ratings, and user listening behavior. In the collaborative filtering method proposed in this work, daily listening habits are captured in order to characterize users and provide them with more reliable recommendations. The results of the validation show that this approach outperforms other methods in generating both context-aware and context-free recommendations.
Submitted 26 August, 2020;
originally announced August 2020.
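A minimal sketch of one way to capture daily listening habits, assuming each user is summarized by the share of their plays that fall into four day periods. The period boundaries, the names `habit_profile` and `cosine`, and the use of cosine similarity for neighbour selection are illustrative assumptions, not the method described in the paper:

```python
import math
from collections import Counter
from datetime import datetime
from typing import List

# Hypothetical day periods (start hour inclusive, end hour exclusive);
# the paper's actual time granularity is not stated in the abstract.
PERIODS = [("night", 0, 6), ("morning", 6, 12), ("afternoon", 12, 18), ("evening", 18, 24)]


def period_of(ts: datetime) -> str:
    """Map a play timestamp to the name of its day period."""
    for name, start, end in PERIODS:
        if start <= ts.hour < end:
            return name
    raise ValueError(f"hour out of range: {ts.hour}")


def habit_profile(play_times: List[datetime]) -> List[float]:
    """Fraction of a user's plays in each day period (all zeros if no plays)."""
    counts = Counter(period_of(t) for t in play_times)
    total = sum(counts.values()) or 1  # avoid division by zero for empty histories
    return [counts[name] / total for name, _, _ in PERIODS]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two habit profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Users with similar profiles would then be candidate neighbours in a collaborative filtering step; modelling the evolution of implicit ratings over time, as the abstract describes, is not reproduced here.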
-
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions
Authors:
Noa Garcia,
Yuta Nakashima
Abstract:
To understand movies, humans constantly reason over the dialogues and actions shown in specific scenes and relate them to the overall storyline already seen. Inspired by this behaviour, we design ROLL, a model for knowledge-based video story question answering that leverages three crucial aspects of movie understanding: dialog comprehension, scene reasoning, and storyline recalling. In ROLL, each of these tasks is in charge of extracting rich and diverse information by 1) processing scene dialogues, 2) generating unsupervised video scene descriptions, and 3) obtaining external knowledge in a weakly supervised fashion. To answer a given question correctly, the information generated by each cognitively inspired task is encoded via Transformers and fused through a modality weighting mechanism, which balances the information from the different sources. Exhaustive evaluation demonstrates the effectiveness of our approach, which yields a new state-of-the-art on two challenging video question answering datasets: KnowIT VQA and TVQA+.
Submitted 17 July, 2020;
originally announced July 2020.
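The abstract does not define ROLL's modality weighting mechanism. As a rough illustration only, a minimal sketch of a generic weighted fusion, assuming each branch (dialog, scene, knowledge) produces one score per candidate answer and each branch has a single scalar logit that is turned into a weight with a softmax. The names `fuse_scores` and `branch_logits` are hypothetical, and in a trained model the logits would be learned rather than passed in:

```python
import math
from typing import Dict, List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(branch_scores: Dict[str, List[float]],
                branch_logits: Dict[str, float]) -> List[float]:
    """Combine per-branch answer scores with softmax-normalized branch weights."""
    names = sorted(branch_scores)  # fixed order so weights line up with branches
    weights = softmax([branch_logits[n] for n in names])
    n_answers = len(branch_scores[names[0]])
    return [
        sum(w * branch_scores[n][a] for w, n in zip(weights, names))
        for a in range(n_answers)
    ]
```

With equal logits every branch contributes equally; raising one branch's logit shifts the fused answer towards that branch's preference.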
-
An Efficient Data Imputation Technique for Human Activity Recognition
Authors:
Ivan Miguel Pires,
Faisal Hussain,
Nuno M. Garcia,
Eftim Zdravevski
Abstract:
Applications of human activity recognition now span from health monitoring systems to virtual reality. Thus, the automatic recognition of daily life activities has become significant for numerous applications. In recent years, many datasets have been proposed to train machine learning models for efficient monitoring and recognition of human daily living activities. However, the performance of machine learning models in activity recognition is crucially affected when a dataset contains incomplete activities, i.e., dataset captures with missing samples. Therefore, in this work, we propose a methodology for extrapolating the missing samples of a dataset to better recognize human daily living activities. The proposed method efficiently pre-processes the data captures and uses the k-Nearest Neighbors (KNN) imputation technique to extrapolate the missing samples in dataset captures. The proposed methodology extrapolated samples that reproduce activity patterns similar to those in the real dataset.
Submitted 8 July, 2020;
originally announced July 2020.
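A minimal sketch of generic KNN imputation, assuming each capture is a row of numeric sensor features with `None` marking a missing sample. The function name `knn_impute`, the Euclidean distance over commonly observed columns, and the mean-of-neighbours fill rule are illustrative choices, not the authors' pre-processing pipeline:

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing sample


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing value with the mean of that column over the k
    nearest other rows in which the column is observed."""

    def distance(a: Row, b: Row) -> float:
        # Euclidean distance over the columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared))

    filled: List[List[float]] = []
    for i, row in enumerate(rows):
        new_row: List[float] = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            # Donor rows: every other row that has column j observed.
            donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not None]
            if not donors:
                raise ValueError(f"column {j} has no observed values")
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            new_row.append(sum(r[j] for r in nearest) / len(nearest))
        filled.append(new_row)
    return filled
```

For example, a row `[1.1, 2.1, None]` surrounded by `[1.0, 2.0, 3.0]` and `[1.2, 2.2, 3.4]` (and a distant `[5.0, 6.0, 7.0]`) gets its gap filled with the mean of its two nearest neighbours, about 3.2.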
-
FVV Live: A real-time free-viewpoint video system with consumer electronics hardware
Authors:
Pablo Carballeira,
Carlos Carmona,
César Díaz,
Daniel Berjón,
Daniel Corregidor,
Julián Cabrera,
Francisco Morán,
Carmen Doblado,
Sergio Arnaldo,
María del Mar Martín,
Narciso García
Abstract:
FVV Live is a novel end-to-end free-viewpoint video system, designed for low cost and real-time operation, based on off-the-shelf components. The system has been designed to yield high-quality free-viewpoint video using consumer-grade cameras and hardware, which enables low deployment costs and easy installation for immersive event-broadcasting or videoconferencing.
The paper describes the architecture of the system, including acquisition and encoding of multiview plus depth data in several capture servers and virtual view synthesis on an edge server. All the blocks of the system have been designed to overcome the limitations imposed by hardware and network, which directly impact the accuracy of depth data and thus the quality of virtual view synthesis. The design of FVV Live allows for an arbitrary number of cameras and capture servers, and the results presented in this paper correspond to an implementation with nine stereo-based depth cameras.
FVV Live presents low motion-to-photon and end-to-end delays, which enable seamless free-viewpoint navigation and bilateral immersive communications. Moreover, the visual quality of FVV Live has been assessed through subjective tests with satisfactory results, and additional comparative tests show that it is preferred over state-of-the-art DIBR alternatives.
Submitted 1 July, 2020;
originally announced July 2020.
-
Identifying Packet Loss and Reordering Packets in Keyed UDP Transmissions
Authors:
Fábio Machado Gil,
Nuno M. Garcia,
Bárbara Matos,
Nuno Pombo,
Rossitza Goleva,
Ciprian Dobre
Abstract:
The User Datagram Protocol (UDP) and other similar protocols send application data from the source machine to the destination machine inside segments, without providing for or allowing any type of control over the transmission or any success metrics. These protocols are very convenient for, e.g., real-time data transmission. When the reliability of the transmitted data is critical, however, other protocols, termed connection-oriented, allow for full control of the data transmission process, ensuring that the received data are an exact copy of the transmitted data, as in the case of the Transmission Control Protocol (TCP). To sustain the increased functionality and features of a connection-oriented protocol, a set of mechanisms is implemented based on specific fields of the segment header. These mechanisms result in significant overhead in terms of an increased number of transmitted packets. This may further translate into significant delays, because of the additional switching and routing tasks and, eventually, because of more complex communication procedures such as transmission window resizing and, of course, acknowledgement and sequence number updates. The two extremes of these communication modalities, one with no control at all and the other with full control, have resulted in the creation of an intermediate protocol that allows for a limited degree of knowledge on how successful a transmission was, and even for a possible reordering of the segments that arrive out of sequence. This paper presents simulation results that confirm the efficiency of the new almost-reliable UDP protocol, termed Keyed User Datagram Protocol (KUDP), for transmitting data, including the ability to identify which packets were lost and to reorder packets that were received out of sequence, and points to future tasks to be pursued in this research.
Submitted 30 June, 2020;
originally announced July 2020.
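The abstract does not give KUDP's header layout or algorithm. A minimal receiver-side sketch of the general idea, assuming (hypothetically) that the sender stamps every datagram with a consecutive integer key and that the receiver knows the first and last key of a transfer. `Datagram` and `reassemble` are illustrative names, not part of the protocol:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Datagram:
    key: int        # hypothetical consecutive per-packet key set by the sender
    payload: bytes


def reassemble(received: List[Datagram], first_key: int,
               last_key: int) -> Tuple[List[bytes], List[int]]:
    """Return the payloads in key order and the keys that never arrived."""
    by_key: Dict[int, bytes] = {}
    for dgram in received:
        # Duplicate datagrams are ignored; the first copy received wins.
        by_key.setdefault(dgram.key, dgram.payload)
    expected = range(first_key, last_key + 1)
    ordered = [by_key[k] for k in expected if k in by_key]
    lost = [k for k in expected if k not in by_key]
    return ordered, lost
```

A real receiver would also need a timeout policy to decide when a missing key counts as lost rather than merely late; that policy is outside this sketch.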
-
FVV Live: Real-Time, Low-Cost, Free Viewpoint Video
Authors:
Daniel Berjón,
Pablo Carballeira,
Julián Cabrera,
Carlos Carmona,
Daniel Corregidor,
César Díaz,
Francisco Morán,
Narciso García
Abstract:
FVV Live is a novel real-time, low-latency, end-to-end free viewpoint system including capture, transmission, synthesis on an edge server, and visualization and control on a mobile terminal. The system has been specially designed for low-cost and real-time operation, using only off-the-shelf components.
Submitted 30 June, 2020;
originally announced June 2020.
-
Sentiment Analysis Based on Deep Learning: A Comparative Study
Authors:
Nhan Cach Dang,
María N. Moreno-García,
Fernando De la Prieta
Abstract:
The study of public opinion can provide us with valuable information. The analysis of sentiment on social networks, such as Twitter or Facebook, has become a powerful means of learning about the users' opinions and has a wide range of applications. However, the efficiency and accuracy of sentiment analysis are being hindered by the challenges encountered in natural language processing (NLP). In recent years, it has been demonstrated that deep learning models are a promising solution to the challenges of NLP. This paper reviews the latest studies that have employed deep learning to solve sentiment analysis problems, such as sentiment polarity. Models using term frequency-inverse document frequency (TF-IDF) and word embedding have been applied to a series of datasets. Finally, a comparative study has been conducted on the experimental results obtained for the different models and input features.
Submitted 5 June, 2020;
originally announced June 2020.
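TF-IDF is one of the input representations compared in this study. A minimal sketch of the unsmoothed textbook form, with tf = term count / document length and idf = ln(N / df), assuming documents are already tokenized. Libraries differ: scikit-learn's `TfidfVectorizer`, for example, smooths the idf and L2-normalizes each row by default, so its numbers will not match this sketch:

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Unsmoothed TF-IDF weights for a list of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that occurs in every document, such as "movie" in `[["good", "movie"], ["bad", "movie"]]`, gets weight 0, while "good" gets 0.5 · ln 2 ≈ 0.347.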
-
Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach
Authors:
Fernando Sánchez-Hernández,
Juan Carlos Ballesteros-Herráez,
Mohamed S. Kraiem,
Mercedes Sánchez-Barba,
María N. Moreno-García
Abstract:
Early detection of patients vulnerable to infections acquired in the hospital environment is a challenge in current health systems, given the impact that such infections have on patient mortality and healthcare costs. This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units by means of machine-learning methods. The aim is to support decisions intended to reduce the incidence rate of infections. In this field, it is necessary to deal with the problem of building reliable classifiers from imbalanced datasets. We propose a clustering-based undersampling strategy to be used in combination with ensemble classifiers. A comparative study with data from 4616 patients was conducted in order to validate our proposal. We applied several single and ensemble classifiers both to the original dataset and to data preprocessed by means of different resampling methods. The results were analyzed by means of classic and recent metrics specifically designed for imbalanced data classification, and they revealed that the proposal is more efficient than the other approaches.
Submitted 7 May, 2020;
originally announced May 2020.
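The abstract does not say which clustering algorithm the authors use. A minimal sketch of one common variant of clustering-based undersampling, assuming numeric feature vectors: the majority class is clustered with plain k-means (k equal to the minority-class size, deterministic initialization from the first k points), and the real majority sample closest to each centroid is kept. The function names are hypothetical:

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[List[float]]:
    """Plain Lloyd's k-means with deterministic init (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def cluster_undersample(majority: List[Point], n_minority: int) -> List[Point]:
    """Reduce the majority class to n_minority real samples, one per cluster."""
    if n_minority >= len(majority):
        return list(majority)
    centroids = kmeans(majority, n_minority)
    used = set()
    kept: List[Point] = []
    for centroid in centroids:
        # Pick the closest not-yet-selected majority sample to this centroid.
        idx = min((i for i in range(len(majority)) if i not in used),
                  key=lambda i: _dist(majority[i], centroid))
        used.add(idx)
        kept.append(majority[idx])
    return kept
```

The reduced majority set would then be combined with all minority samples to train the (ensemble) classifiers. Keeping real samples rather than the centroids themselves is a design choice of this sketch.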
-
A session-based song recommendation approach involving user characterization along the play power-law distribution
Authors:
Diego Sánchez-Moreno,
Vivian F. López Batista,
M. Dolores Muñoz Vicente,
Ana B. Gil González,
María N. Moreno-García
Abstract:
In recent years, streaming music platforms have become very popular, mainly due to the huge number of songs these systems make available to users. This enormous availability means that recommendation mechanisms that help users select the music they like need to be incorporated. However, developing reliable recommender systems in the music field involves dealing with many problems, some of which are generic and widely studied in the literature, while others are specific to this application domain and therefore less well known. This work is focused on two important issues that have not received much attention: managing gray-sheep users and obtaining implicit ratings. The first is usually addressed by resorting to content information that is often difficult to obtain. The second is related to the sparsity problem that arises when there are obstacles to gathering explicit ratings. In this work, these shortcomings are addressed by means of a recommendation approach based on the users' streaming sessions. The method is aimed at managing the well-known power-law probability distribution representing the listening behavior of users. This proposal improves the recommendation reliability of collaborative filtering methods while reducing the complexity of the procedures used so far to deal with the gray-sheep problem.
Submitted 25 April, 2020;
originally announced April 2020.
-
Knowledge-Based Visual Question Answering in Videos
Authors:
Noa Garcia,
Mayu Otani,
Chenhui Chu,
Yuta Nakashima
Abstract:
We propose a novel video understanding task by fusing knowledge-based and video question answering. First, we introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom. The dataset combines visual, textual and temporal coherence reasoning together with knowledge-based questions, which require the experience gained from watching the series to be answered. Second, we propose a video understanding model by combining the visual and textual video content with specific knowledge about the show. Our main findings are: (i) the incorporation of knowledge produces outstanding improvements for VQA in video, and (ii) the performance on KnowIT VQA still lags well behind human accuracy, indicating its usefulness for studying current video modelling limitations.
Submitted 16 April, 2020;
originally announced April 2020.
-
Reduction of Surgical Risk Through the Evaluation of Medical Imaging Diagnostics
Authors:
Marco A. V. M. Grinet,
Nuno M. Garcia,
Ana I. R. Gouveia,
Jose A. F. Moutinho,
Abel J. P. Gomes
Abstract:
Computer-aided diagnosis (CAD) of breast cancer (BRCA) images has been an active area of research in recent years. The main goal of this research is to develop reliable automatic methods for detecting and diagnosing different types of BRCA from diagnostic images. In this paper, we present a review of the state-of-the-art CAD methods applied to magnetic resonance imaging (MRI) and mammography images of BRCA patients. The review aims to provide an extensive introduction to different features extracted from BRCA images through texture and statistical analysis and to categorize deep learning frameworks and data structures capable of using metadata to aggregate relevant information to assist oncologists and radiologists. We divide the existing literature according to the imaging modality and into radiomics, machine learning, or a combination of both. We also emphasize the differences between modalities and the strengths and weaknesses of each method, and analyze their performance in detecting BRCA through a quantitative comparison. We compare the results of various approaches for implementing CAD systems for the detection of BRCA. Each approach's standard workflow components are reviewed, and summary tables are provided. We present an extensive literature review of radiomics feature extraction techniques and machine learning methods applied in BRCA diagnosis and detection, focusing on data preparation, data structures, and pre-processing and post-processing strategies available in the literature. There is growing interest in radiomic feature extraction and machine learning methods for BRCA detection through histopathological images, MRI and mammography images. However, there is no CAD method able to combine distinct data types to provide the best diagnostic results. Employing data fusion techniques on medical images and patient data could lead to improved detection and classification results.
Submitted 8 March, 2020;
originally announced March 2020.
-
DMCL: Distillation Multiple Choice Learning for Multimodal Action Recognition
Authors:
Nuno C. Garcia,
Sarah Adel Bargal,
Vitaly Ablavsky,
Pietro Morerio,
Vittorio Murino,
Stan Sclaroff
Abstract:
In this work, we address the problem of learning an ensemble of specialist networks using multimodal data, while considering the realistic and challenging scenario of possible missing modalities at test time. Our goal is to leverage the complementary information of multiple modalities to the benefit of the ensemble and each individual network. We introduce a novel Distillation Multiple Choice Learning framework for multimodal data, where different modality networks learn in a cooperative setting from scratch, strengthening one another. The modality networks learned using our method achieve significantly higher accuracy than if trained separately, due to the guidance of other modalities. We evaluate this approach on three video action recognition benchmark datasets. We obtain state-of-the-art results in comparison to other approaches that work with missing modalities at test time.
Submitted 23 December, 2019;
originally announced December 2019.
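The abstract does not give DMCL's exact loss or how specialists are assigned. As background only, a minimal sketch of the standard knowledge-distillation term one modality network could use to learn from another network's softened predictions: the KL divergence between temperature-softened softmax outputs, scaled by T². The function names and the default temperature of 2.0 are assumptions:

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-softened, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 so the
    gradient magnitude stays comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

The loss is zero when the student already matches the teacher and grows as their softened predictions diverge; in a cooperative setting each modality network could play either role.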