-
CP HDR: A feature point detection and description library for LDR and HDR images
Authors:
Artur Santos Nascimento,
Valter Guilherme Silva de Souza,
Daniel Oliveira Dantas,
Beatriz Trinchão Andrade
Abstract:
In computer vision, characteristics refer to image regions with unique properties, such as corners, edges, textures, or areas with high contrast. These regions can be represented through feature points (FPs). FP detection and description are fundamental steps to many computer vision tasks. Most FP detection and description methods use low dynamic range (LDR) images, sufficient for most application…
▽ More
In computer vision, characteristics refer to image regions with unique properties, such as corners, edges, textures, or areas with high contrast. These regions can be represented through feature points (FPs). FP detection and description are fundamental steps to many computer vision tasks. Most FP detection and description methods use low dynamic range (LDR) images, sufficient for most applications involving digital images. However, LDR images may have saturated pixels in scenes with extreme light conditions, which degrade FP detection. On the other hand, high dynamic range (HDR) images usually present a greater dynamic range but FP detection algorithms do not take advantage of all the information in such images. In this study, we present a systematic review of image detection and description algorithms that use HDR images as input. We developed a library called CP_HDR that implements the Harris corner detector, SIFT detector and descriptor, and two modifications of those algorithms specialized in HDR images, called SIFT for HDR (SfHDR) and Harris for HDR (HfHDR). Previous studies investigated the use of HDR images in FP detection, but we did not find studies investigating the use of HDR images in FP description. Using uniformity, repeatability rate, mean average precision, and matching rate metrics, we compared the performance of the CP_HDR algorithms using LDR and HDR images. We observed an increase in the uniformity of the distribution of FPs among the high-light, mid-light, and low-light areas of the images. The results show that using HDR images as input to detection algorithms improves performance and that SfHDR and HfHDR enhance FP description.
△ Less
Submitted 28 March, 2024;
originally announced March 2024.
-
Evaluating Splitting Approaches in the Context of Student Dropout Prediction
Authors:
Bruno de M. Barros,
Hugo A. D. do Nascimento,
Raphael Guedes,
Sandro E. Monsueto
Abstract:
The prediction of academic dropout, with the aim of preventing it, is one of the current challenges of higher education institutions. Machine learning techniques are a great ally in this task. However, attention is needed in the way that academic data are used by such methods, so that it reflects the reality of the prediction problem under study and allows achieving good results. In this paper, we…
▽ More
The prediction of academic dropout, with the aim of preventing it, is one of the current challenges of higher education institutions. Machine learning techniques are a great ally in this task. However, attention is needed in the way that academic data are used by such methods, so that it reflects the reality of the prediction problem under study and allows achieving good results. In this paper, we study strategies for splitting and using academic data in order to create training and testing sets. Through a conceptual analysis and experiments with data from a public higher education institution, we show that a random proportional data splitting, and even a simple temporal splitting are not suitable for dropout prediction. The study indicates that a temporal splitting combined with a time-based selection of the students' incremental academic histories leads to the best strategy for the problem in question.
△ Less
Submitted 15 May, 2023;
originally announced May 2023.
-
Feature point detection in HDR images based on coefficient of variation
Authors:
Artur Santos Nascimento,
Welerson Augusto Lino de Jesus Melo,
Daniel Oliveira Dantas,
Beatriz Trinchão Andrade
Abstract:
Feature point (FP) detection is a fundamental step of many computer vision tasks. However, FP detectors are usually designed for low dynamic range (LDR) images. In scenes with extreme light conditions, LDR images present saturated pixels, which degrade FP detection. On the other hand, high dynamic range (HDR) images usually present no saturated pixels but FP detection algorithms do not take advant…
▽ More
Feature point (FP) detection is a fundamental step of many computer vision tasks. However, FP detectors are usually designed for low dynamic range (LDR) images. In scenes with extreme light conditions, LDR images present saturated pixels, which degrade FP detection. On the other hand, high dynamic range (HDR) images usually present no saturated pixels but FP detection algorithms do not take advantage of all the information present in such images. FP detection frequently relies on differential methods, which work well in LDR images. However, in HDR images, the differential operation response in bright areas overshadows the response in dark areas. As an alternative to standard FP detection methods, this study proposes an FP detector based on a coefficient of variation (CV) designed for HDR images. The CV operation adapts its response based on the standard deviation of pixels inside a window, working well in both dark and bright areas of HDR images. The proposed and standard detectors are evaluated by measuring their repeatability rate (RR) and uniformity. Our proposed detector shows better performance when compared to other standard state-of-the-art detectors. In uniformity metric, our proposed detector surpasses all the other algorithms. In other hand, when using the repeatability rate metric, the proposed detector is worse than Harris for HDR and SURF detectors.
△ Less
Submitted 20 April, 2023;
originally announced April 2023.
-
Enhancing Peak Network Traffic Prediction via Time-Series Decomposition
Authors:
Tucker Stewart,
Bin Yu,
Anderson Nascimento,
Juhua Hu
Abstract:
For network administration and maintenance, it is critical to anticipate when networks will receive peak volumes of traffic so that adequate resources can be allocated to service requests made to servers. In the event that sufficient resources are not allocated to servers, they can become prone to failure and security breaches. On the contrary, we would waste a lot of resources if we always alloca…
▽ More
For network administration and maintenance, it is critical to anticipate when networks will receive peak volumes of traffic so that adequate resources can be allocated to service requests made to servers. In the event that sufficient resources are not allocated to servers, they can become prone to failure and security breaches. On the contrary, we would waste a lot of resources if we always allocate the maximum amount of resources. Therefore, anticipating peak volumes in network traffic becomes an important problem. However, popular forecasting models such as Autoregressive Integrated Moving Average (ARIMA) forecast time-series data generally, thus lack in predicting peak volumes in these time-series. More than often, a time-series is a combination of different features, which may include but are not limited to 1) Trend, the general movement of the traffic volume, 2) Seasonality, the patterns repeated over some time periods (e.g. daily and monthly), and 3) Noise, the random changes in the data. Considering that the fluctuation of seasonality can be harmful for trend and peak prediction, we propose to extract seasonalities to facilitate the peak volume predictions in the time domain. The experiments on both synthetic and real network traffic data demonstrate the effectiveness of the proposed method.
△ Less
Submitted 9 March, 2023;
originally announced March 2023.
-
Secure Multiparty Computation for Synthetic Data Generation from Distributed Data
Authors:
Mayana Pereira,
Sikha Pentyala,
Anderson Nascimento,
Rafael T. de Sousa Jr.,
Martine De Cock
Abstract:
Legal and ethical restrictions on accessing relevant data inhibit data science research in critical domains such as health, finance, and education. Synthetic data generation algorithms with privacy guarantees are emerging as a paradigm to break this data logjam. Existing approaches, however, assume that the data holders supply their raw data to a trusted curator, who uses it as fuel for synthetic…
▽ More
Legal and ethical restrictions on accessing relevant data inhibit data science research in critical domains such as health, finance, and education. Synthetic data generation algorithms with privacy guarantees are emerging as a paradigm to break this data logjam. Existing approaches, however, assume that the data holders supply their raw data to a trusted curator, who uses it as fuel for synthetic data generation. This severely limits the applicability, as much of the valuable data in the world is locked up in silos, controlled by entities who cannot show their data to each other or a central aggregator without raising privacy concerns.
To overcome this roadblock, we propose the first solution in which data holders only share encrypted data for differentially private synthetic data generation. Data holders send shares to servers who perform Secure Multiparty Computation (MPC) computations while the original data stays encrypted.
We instantiate this idea in an MPC protocol for the Multiplicative Weights with Exponential Mechanism (MWEM) algorithm to generate synthetic data based on real data originating from many data holders without reliance on a single point of failure.
△ Less
Submitted 28 October, 2022; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Data Centred Intelligent Geosciences: Research Agenda and Opportunities, Position Paper
Authors:
Aderson Farias do Nascimento,
Martin A. Musicante,
Umberto Souza da Costa,
Bruno M. Carvalho,
Marcus Alexandre Nunes,
Genoveva Vargas-Solar
Abstract:
This paper describes and discusses our vision to develop and reason about best practices and novel ways of curating data-centric geosciences knowledge (data, experiments, models, methods, conclusions, and interpretations). This knowledge is produced from applying statistical modelling, Machine Learning, and modern data analytics methods on geo-data collections. The problems address open methodolog…
▽ More
This paper describes and discusses our vision to develop and reason about best practices and novel ways of curating data-centric geosciences knowledge (data, experiments, models, methods, conclusions, and interpretations). This knowledge is produced from applying statistical modelling, Machine Learning, and modern data analytics methods on geo-data collections. The problems address open methodological questions in model building, models' assessment, prediction, and forecasting workflows.
△ Less
Submitted 20 August, 2022;
originally announced September 2022.
-
The Gamma Generalized Normal Distribution: A Descriptor of SAR Imagery
Authors:
G. M. Cordeiro,
R. J. Cintra,
L. C. Rêgo,
A. D. C. Nascimento
Abstract:
We propose a new four-parameter distribution for modeling synthetic aperture radar (SAR) imagery named the gamma generalized normal (GGN) by combining the gamma and generalized normal distributions. A mathematical characterization of the new distribution is provided by identifying the limit behavior and by calculating the density and moment expansions. The GGN model performance is evaluated on bot…
▽ More
We propose a new four-parameter distribution for modeling synthetic aperture radar (SAR) imagery named the gamma generalized normal (GGN) by combining the gamma and generalized normal distributions. A mathematical characterization of the new distribution is provided by identifying the limit behavior and by calculating the density and moment expansions. The GGN model performance is evaluated on both synthetic and actual data and, for that, maximum likelihood estimation and random number generation are discussed. The proposed distribution is compared with the beta generalized normal distribution (BGN), which has already shown to appropriately represent SAR imagery. The performance of these two distributions are measured by means of statistics which provide evidence that the GGN can outperform the BGN distribution in some contexts.
△ Less
Submitted 3 June, 2022;
originally announced June 2022.
-
PrivFairFL: Privacy-Preserving Group Fairness in Federated Learning
Authors:
Sikha Pentyala,
Nicola Neophytou,
Anderson Nascimento,
Martine De Cock,
Golnoosh Farnadi
Abstract:
Group fairness ensures that the outcome of machine learning (ML) based decision making systems are not biased towards a certain group of people defined by a sensitive attribute such as gender or ethnicity. Achieving group fairness in Federated Learning (FL) is challenging because mitigating bias inherently requires using the sensitive attribute values of all clients, while FL is aimed precisely at…
▽ More
Group fairness ensures that the outcome of machine learning (ML) based decision making systems are not biased towards a certain group of people defined by a sensitive attribute such as gender or ethnicity. Achieving group fairness in Federated Learning (FL) is challenging because mitigating bias inherently requires using the sensitive attribute values of all clients, while FL is aimed precisely at protecting privacy by not giving access to the clients' data. As we show in this paper, this conflict between fairness and privacy in FL can be resolved by combining FL with Secure Multiparty Computation (MPC) and Differential Privacy (DP). In doing so, we propose a method for training group-fair ML models in cross-device FL under complete and formal privacy guarantees, without requiring the clients to disclose their sensitive attribute values.
△ Less
Submitted 26 August, 2022; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Simulation of machine learning-based 6G systems in virtual worlds
Authors:
Ailton Oliveira,
Felipe Bastos,
Isabela Trindade,
Walter Frazao,
Arthur Nascimento,
Diego Gomes,
Francisco Muller,
Aldebaro Klautau
Abstract:
Digital representations of the real world are being used in many applications, such as augmented reality. 6G systems will not only support use cases that rely on virtual worlds but also benefit from their rich contextual information to improve performance and reduce communication overhead. This paper focuses on the simulation of 6G systems that rely on a 3D representation of the environment, as ca…
▽ More
Digital representations of the real world are being used in many applications, such as augmented reality. 6G systems will not only support use cases that rely on virtual worlds but also benefit from their rich contextual information to improve performance and reduce communication overhead. This paper focuses on the simulation of 6G systems that rely on a 3D representation of the environment, as captured by cameras and other sensors. We present new strategies for obtaining paired MIMO channels and multimodal data. We also discuss trade-offs between speed and accuracy when generating channels via ray tracing. We finally provide beam selection simulation results to assess the proposed methodology.
△ Less
Submitted 15 April, 2022;
originally announced April 2022.
-
Training Differentially Private Models with Secure Multiparty Computation
Authors:
Sikha Pentyala,
Davis Railsback,
Ricardo Maia,
Rafael Dowsley,
David Melanson,
Anderson Nascimento,
Martine De Cock
Abstract:
We address the problem of learning a machine learning model from training data that originates at multiple data owners while providing formal privacy guarantees regarding the protection of each owner's data. Existing solutions based on Differential Privacy (DP) achieve this at the cost of a drop in accuracy. Solutions based on Secure Multiparty Computation (MPC) do not incur such accuracy loss but…
▽ More
We address the problem of learning a machine learning model from training data that originates at multiple data owners while providing formal privacy guarantees regarding the protection of each owner's data. Existing solutions based on Differential Privacy (DP) achieve this at the cost of a drop in accuracy. Solutions based on Secure Multiparty Computation (MPC) do not incur such accuracy loss but leak information when the trained model is made publicly available. We propose an MPC solution for training DP models. Our solution relies on an MPC protocol for model training, and an MPC protocol for perturbing the trained model coefficients with Laplace noise in a privacy-preserving manner. The resulting MPC+DP approach achieves higher accuracy than a pure DP approach while providing the same formal privacy guarantees. Our work obtained first place in the iDASH2021 Track III competition on confidential computing for secure genome analysis.
△ Less
Submitted 1 September, 2022; v1 submitted 5 February, 2022;
originally announced February 2022.
-
Zero-shot hashtag segmentation for multilingual sentiment analysis
Authors:
Ruan Chaves Rodrigues,
Marcelo Akira Inuzuka,
Juliana Resplande Sant'Anna Gomes,
Acquila Santos Rocha,
Iacer Calixto,
Hugo Alexandre Dantas do Nascimento
Abstract:
Hashtag segmentation, also known as hashtag decomposition, is a common step in preprocessing pipelines for social media datasets. It usually precedes tasks such as sentiment analysis and hate speech detection. For sentiment analysis in medium to low-resourced languages, previous research has demonstrated that a multilingual approach that resorts to machine translation can be competitive or superio…
▽ More
Hashtag segmentation, also known as hashtag decomposition, is a common step in preprocessing pipelines for social media datasets. It usually precedes tasks such as sentiment analysis and hate speech detection. For sentiment analysis in medium to low-resourced languages, previous research has demonstrated that a multilingual approach that resorts to machine translation can be competitive or superior to previous approaches to the task. We develop a zero-shot hashtag segmentation framework and demonstrate how it can be used to improve the accuracy of multilingual sentiment analysis pipelines. Our zero-shot framework establishes a new state-of-the-art for hashtag segmentation datasets, surpassing even previous approaches that relied on feature engineering and language models trained on in-domain data.
△ Less
Submitted 6 December, 2021;
originally announced December 2021.
-
Privacy-Preserving Training of Tree Ensembles over Continuous Data
Authors:
Samuel Adams,
Chaitali Choudhary,
Martine De Cock,
Rafael Dowsley,
David Melanson,
Anderson C. A. Nascimento,
Davis Railsback,
Jianwei Shen
Abstract:
Most existing Secure Multi-Party Computation (MPC) protocols for privacy-preserving training of decision trees over distributed data assume that the features are categorical. In real-life applications, features are often numerical. The standard ``in the clear'' algorithm to grow decision trees on data with continuous values requires sorting of training examples for each feature in the quest for an…
▽ More
Most existing Secure Multi-Party Computation (MPC) protocols for privacy-preserving training of decision trees over distributed data assume that the features are categorical. In real-life applications, features are often numerical. The standard ``in the clear'' algorithm to grow decision trees on data with continuous values requires sorting of training examples for each feature in the quest for an optimal cut-point in the range of feature values in each node. Sorting is an expensive operation in MPC, hence finding secure protocols that avoid such an expensive step is a relevant problem in privacy-preserving machine learning. In this paper we propose three more efficient alternatives for secure training of decision tree based models on data with continuous features, namely: (1) secure discretization of the data, followed by secure training of a decision tree over the discretized data; (2) secure discretization of the data, followed by secure training of a random forest over the discretized data; and (3) secure training of extremely randomized trees (``extra-trees'') on the original data. Approaches (2) and (3) both involve randomizing feature choices. In addition, in approach (3) cut-points are chosen randomly as well, thereby alleviating the need to sort or to discretize the data up front. We implemented all proposed solutions in the semi-honest setting with additive secret sharing based MPC. In addition to mathematically proving that all proposed approaches are correct and secure, we experimentally evaluated and compared them in terms of classification accuracy and runtime. We privately train tree ensembles over data sets with 1000s of instances or features in a few minutes, with accuracies that are at par with those obtained in the clear. This makes our solution orders of magnitude more efficient than the existing approaches, which are based on oblivious sorting.
△ Less
Submitted 4 June, 2021;
originally announced June 2021.
-
Round and Communication Balanced Protocols for Oblivious Evaluation of Finite State Machines
Authors:
Rafael Dowsley,
Caleb Horst,
Anderson C. A. Nascimento
Abstract:
We propose protocols for obliviously evaluating finite-state machines, i.e., the evaluation is shared between the provider of the finite-state machine and the provider of the input string in such a manner that neither party learns the other's input, and the states being visited are hidden from both. For alphabet size $|Σ|$, number of states $|Q|$, and input length $n$, previous solutions have eith…
▽ More
We propose protocols for obliviously evaluating finite-state machines, i.e., the evaluation is shared between the provider of the finite-state machine and the provider of the input string in such a manner that neither party learns the other's input, and the states being visited are hidden from both. For alphabet size $|Σ|$, number of states $|Q|$, and input length $n$, previous solutions have either required a number of rounds linear in $n$ or communication $Ω(n|Σ||Q|\log|Q|)$. Our solutions require 2 rounds with communication $O(n(|Σ|+|Q|\log|Q|))$. We present two different solutions to this problem, a two-party one and a setting with an untrusted but non-colluding helper.
△ Less
Submitted 6 January, 2022; v1 submitted 20 March, 2021;
originally announced March 2021.
-
A new interpretable unsupervised anomaly detection method based on residual explanation
Authors:
David F. N. Oliveira,
Lucio F. Vismari,
Alexandre M. Nascimento,
Jorge R. de Almeida Jr,
Paulo S. Cugnasca,
Joao B. Camargo Jr,
Leandro Almeida,
Rafael Gripp,
Marcelo Neves
Abstract:
Despite the superior performance in modeling complex patterns to address challenging problems, the black-box nature of Deep Learning (DL) methods impose limitations to their application in real-world critical domains. The lack of a smooth manner for enabling human reasoning about the black-box decisions hinder any preventive action to unexpected events, in which may lead to catastrophic consequenc…
▽ More
Despite the superior performance in modeling complex patterns to address challenging problems, the black-box nature of Deep Learning (DL) methods impose limitations to their application in real-world critical domains. The lack of a smooth manner for enabling human reasoning about the black-box decisions hinder any preventive action to unexpected events, in which may lead to catastrophic consequences. To tackle the unclearness from black-box models, interpretability became a fundamental requirement in DL-based systems, leveraging trust and knowledge by providing ways to understand the model's behavior. Although a current hot topic, further advances are still needed to overcome the existing limitations of the current interpretability methods in unsupervised DL-based models for Anomaly Detection (AD). Autoencoders (AE) are the core of unsupervised DL-based for AD applications, achieving best-in-class performance. However, due to their hybrid aspect to obtain the results (by requiring additional calculations out of network), only agnostic interpretable methods can be applied to AE-based AD. These agnostic methods are computationally expensive to process a large number of parameters. In this paper we present the RXP (Residual eXPlainer), a new interpretability method to deal with the limitations for AE-based AD in large-scale systems. It stands out for its implementation simplicity, low computational cost and deterministic behavior, in which explanations are obtained through the deviation analysis of reconstructed input features. In an experiment using data from a real heavy-haul railway line, the proposed method achieved superior performance compared to SHAP, demonstrating its potential to support decision making in large scale critical systems.
△ Less
Submitted 14 March, 2021;
originally announced March 2021.
-
Fast Privacy-Preserving Text Classification based on Secure Multiparty Computation
Authors:
Amanda Resende,
Davis Railsback,
Rafael Dowsley,
Anderson C. A. Nascimento,
Diego F. Aranha
Abstract:
We propose a privacy-preserving Naive Bayes classifier and apply it to the problem of private text classification. In this setting, a party (Alice) holds a text message, while another party (Bob) holds a classifier. At the end of the protocol, Alice will only learn the result of the classifier applied to her text input and Bob learns nothing. Our solution is based on Secure Multiparty Computation…
▽ More
We propose a privacy-preserving Naive Bayes classifier and apply it to the problem of private text classification. In this setting, a party (Alice) holds a text message, while another party (Bob) holds a classifier. At the end of the protocol, Alice will only learn the result of the classifier applied to her text input and Bob learns nothing. Our solution is based on Secure Multiparty Computation (SMC). Our Rust implementation provides a fast and secure solution for the classification of unstructured text. Applying our solution to the case of spam detection (the solution is generic, and can be used in any other scenario in which the Naive Bayes classifier can be employed), we can classify an SMS as spam or ham in less than 340ms in the case where the dictionary size of Bob's model includes all words (n = 5200) and Alice's SMS has at most m = 160 unigrams. In the case with n = 369 and m = 8 (the average of a spam SMS in the database), our solution takes only 21ms.
△ Less
Submitted 8 June, 2021; v1 submitted 18 January, 2021;
originally announced January 2021.
-
Towards A Personal Shopper's Dilemma: Time vs Cost
Authors:
Samiul Anwar,
Francesco Lettich,
Mario A. Nascimento
Abstract:
Consider a customer who needs to fulfill a shopping list, and also a personal shopper who is willing to buy and resell to customers the goods in their shopping lists. It is in the personal shopper's best interest to find (shopping) routes that (i) minimize the time serving a customer, in order to be able to serve more customers, and (ii) minimize the price paid for the goods, in order to maximize…
▽ More
Consider a customer who needs to fulfill a shopping list, and also a personal shopper who is willing to buy and resell to customers the goods in their shopping lists. It is in the personal shopper's best interest to find (shopping) routes that (i) minimize the time serving a customer, in order to be able to serve more customers, and (ii) minimize the price paid for the goods, in order to maximize his/her potential profit when reselling them. Those are typically competing criteria leading to what we refer to as the Personal Shopper's Dilemma query, i.e., to determine where to buy each of the required goods while attempting to optimize both criteria at the same time. Given the query's NP-hardness we propose a heuristic approach to determine a subset of the sub-optimal routes under any linear combination of the aforementioned criteria, i.e., the query's approximate linear skyline set. In order to measure the effectiveness of our approach we also introduce two new metrics, optimality and coverage gaps w.r.t. an optimal, but computationally expensive, baseline solution. Our experiments, using realistic city-scale datasets, show that our proposed approach is two orders of magnitude faster than the baseline and yields low values for the optimality and coverage gaps.
△ Less
Submitted 25 September, 2020; v1 submitted 26 August, 2020;
originally announced August 2020.
-
UniformAugment: A Search-free Probabilistic Data Augmentation Approach
Authors:
Tom Ching LingChen,
Ava Khonsari,
Amirreza Lashkari,
Mina Rafi Nazari,
Jaspreet Singh Sambee,
Mario A. Nascimento
Abstract:
Augmenting training datasets has been shown to improve the learning effectiveness for several computer vision tasks. A good augmentation produces an augmented dataset that adds variability while retaining the statistical properties of the original dataset. Some techniques, such as AutoAugment and Fast AutoAugment, have introduced a search phase to find a set of suitable augmentation policies for a…
▽ More
Augmenting training datasets has been shown to improve the learning effectiveness for several computer vision tasks. A good augmentation produces an augmented dataset that adds variability while retaining the statistical properties of the original dataset. Some techniques, such as AutoAugment and Fast AutoAugment, have introduced a search phase to find a set of suitable augmentation policies for a given model and dataset. This comes at the cost of great computational overhead, adding up to several thousand GPU hours. More recently RandAugment was proposed to substantially speedup the search phase by approximating the search space by a couple of hyperparameters, but still incurring non-negligible cost for tuning those. In this paper we show that, under the assumption that the augmentation space is approximately distribution invariant, a uniform sampling over the continuous space of augmentation transformations is sufficient to train highly effective models. Based on that result we propose UniformAugment, an automated data augmentation approach that completely avoids a search phase. In addition to discussing the theoretical underpinning supporting our approach, we also use the standard datasets, as well as established models for image classification, to show that UniformAugment's effectiveness is comparable to the aforementioned methods, while still being highly efficient by virtue of not requiring any search.
△ Less
Submitted 31 March, 2020;
originally announced March 2020.
-
Inline Detection of DGA Domains Using Side Information
Authors:
Raaghavi Sivaguru,
Jonathan Peck,
Femi Olumofin,
Anderson Nascimento,
Martine De Cock
Abstract:
Malware applications typically use a command and control (C&C) server to manage bots to perform malicious activities. Domain Generation Algorithms (DGAs) are popular methods for generating pseudo-random domain names that can be used to establish a communication between an infected bot and the C&C server. In recent years, machine learning based systems have been widely used to detect DGAs. There ar…
▽ More
Malware applications typically use a command and control (C&C) server to manage bots to perform malicious activities. Domain Generation Algorithms (DGAs) are popular methods for generating pseudo-random domain names that can be used to establish a communication between an infected bot and the C&C server. In recent years, machine learning based systems have been widely used to detect DGAs. There are several well known state-of-the-art classifiers in the literature that can detect DGA domain names in real-time applications with high predictive performance. However, these DGA classifiers are highly vulnerable to adversarial attacks in which adversaries purposely craft domain names to evade DGA detection classifiers. In our work, we focus on hardening DGA classifiers against adversarial attacks. To this end, we train and evaluate state-of-the-art deep learning and random forest (RF) classifiers for DGA detection using side information that is harder for adversaries to manipulate than the domain name itself. Additionally, the side information features are selected such that they are easily obtainable in practice to perform inline DGA detection. The performance and robustness of these models is assessed by exposing them to one day of real-traffic data as well as domains generated by adversarial attack algorithms. We found that the DGA classifiers that rely on both the domain name and side information have high performance and are more robust against adversaries.
△ Less
Submitted 12 March, 2020;
originally announced March 2020.
-
High Performance Logistic Regression for Privacy-Preserving Genome Analysis
Authors:
Martine De Cock,
Rafael Dowsley,
Anderson C. A. Nascimento,
Davis Railsback,
Jianwei Shen,
Ariel Todoki
Abstract:
In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure Multi-Party Computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.
In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure Multi-Party Computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.
△ Less
Submitted 3 March, 2020; v1 submitted 13 February, 2020;
originally announced February 2020.
-
Protecting Privacy of Users in Brain-Computer Interface Applications
Authors:
Anisha Agarwal,
Rafael Dowsley,
Nicholas D. McKinney,
Dongrui Wu,
Chin-Teng Lin,
Martine De Cock,
Anderson C. A. Nascimento
Abstract:
Machine learning (ML) is revolutionizing research and industry. Many ML applications rely on the use of large amounts of personal data for training and inference. Among the most intimate exploited data sources is electroencephalogram (EEG) data, a kind of data that is so rich with information that application developers can easily gain knowledge beyond the professed scope from unprotected EEG sign…
▽ More
Machine learning (ML) is revolutionizing research and industry. Many ML applications rely on the use of large amounts of personal data for training and inference. Among the most intimate exploited data sources is electroencephalogram (EEG) data, a kind of data that is so rich with information that application developers can easily gain knowledge beyond the professed scope from unprotected EEG signals, including passwords, ATM PINs, and other intimate data. The challenge we address is how to engage in meaningful ML with EEG data while protecting the privacy of users.
Hence, we propose cryptographic protocols based on Secure Multiparty Computation (SMC) to perform linear regression over EEG signals from many users in a fully privacy-preserving (PP) fashion, i.e.~such that each individual's EEG signals are not revealed to anyone else. To illustrate the potential of our secure framework, we show how it allows estimating the drowsiness of drivers from their EEG signals as would be possible in the unencrypted case, and at a very reasonable computational cost. Our solution is the first application of commodity-based SMC to EEG data, as well as the largest documented experiment of secret sharing based SMC in general, namely with 15 players involved in all the computations.
△ Less
Submitted 2 July, 2019;
originally announced July 2019.
-
Privacy-Preserving Classification of Personal Text Messages with Secure Multi-Party Computation: An Application to Hate-Speech Detection
Authors:
Devin Reich,
Ariel Todoki,
Rafael Dowsley,
Martine De Cock,
Anderson C. A. Nascimento
Abstract:
Classification of personal text messages has many useful applications in surveillance, e-commerce, and mental health care, to name a few. Giving applications access to personal texts can easily lead to (un)intentional privacy violations. We propose the first privacy-preserving solution for text classification that is provably secure. Our method, which is based on Secure Multiparty Computation (SMC…
▽ More
Classification of personal text messages has many useful applications in surveillance, e-commerce, and mental health care, to name a few. Giving applications access to personal texts can easily lead to (un)intentional privacy violations. We propose the first privacy-preserving solution for text classification that is provably secure. Our method, which is based on Secure Multiparty Computation (SMC), encompasses both feature extraction from texts, and subsequent classification with logistic regression and tree ensembles. We prove that when using our secure text classification method, the application does not learn anything about the text, and the author of the text does not learn anything about the text classification model used by the application beyond what is given by the classification result itself. We perform end-to-end experiments with an application for detecting hate speech against women and immigrants, demonstrating excellent runtime results without loss of accuracy.
△ Less
Submitted 12 March, 2021; v1 submitted 5 June, 2019;
originally announced June 2019.
-
On the Commitment Capacity of Unfair Noisy Channels
Authors:
Claude Crépeau,
Rafael Dowsley,
Anderson C. A. Nascimento
Abstract:
Noisy channels are a valuable resource from a cryptographic point of view. They can be used for exchanging secret-keys as well as realizing other cryptographic primitives such as commitment and oblivious transfer. To be really useful, noisy channels have to be consider in the scenario where a cheating party has some degree of control over the channel characteristics. Damgård et al. (EUROCRYPT 1999…
▽ More
Noisy channels are a valuable resource from a cryptographic point of view. They can be used for exchanging secret-keys as well as realizing other cryptographic primitives such as commitment and oblivious transfer. To be really useful, noisy channels have to be consider in the scenario where a cheating party has some degree of control over the channel characteristics. Damgård et al. (EUROCRYPT 1999) proposed a more realistic model where such level of control is permitted to an adversary, the so called unfair noisy channels, and proved that they can be used to obtain commitment and oblivious transfer protocols. Given that noisy channels are a precious resource for cryptographic purposes, one important question is determining the optimal rate in which they can be used. The commitment capacity has already been determined for the cases of discrete memoryless channels and Gaussian channels. In this work we address the problem of determining the commitment capacity of unfair noisy channels. We compute a single-letter characterization of the commitment capacity of unfair noisy channels. In the case where an adversary has no control over the channel (the fair case) our capacity reduces to the well-known capacity of a discrete memoryless binary symmetric channel.
△ Less
Submitted 26 May, 2019;
originally announced May 2019.
-
CharBot: A Simple and Effective Method for Evading DGA Classifiers
Authors:
Jonathan Peck,
Claire Nie,
Raaghavi Sivaguru,
Charles Grumer,
Femi Olumofin,
Bin Yu,
Anderson Nascimento,
Martine De Cock
Abstract:
Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names which can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real-time. In this work, we present a novel DGA called CharBot which is capable of producing large numbers of unregistered d…
▽ More
Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names which can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real-time. In this work, we present a novel DGA called CharBot which is capable of producing large numbers of unregistered domain names that are not detected by state-of-the-art classifiers for real-time detection of DGAs, including the recently published methods FANCI (a random forest based on human-engineered features) and LSTM.MI (a deep learning approach). CharBot is very simple, effective and requires no knowledge of the targeted DGA classifiers. We show that retraining the classifiers on CharBot samples is not a viable defense strategy. We believe these findings show that DGA classifiers are inherently vulnerable to adversarial attacks if they rely only on the domain name string to make a decision. Designing a robust DGA classifier may, therefore, necessitate the use of additional information besides the domain name alone. To the best of our knowledge, CharBot is the simplest and most efficient black-box adversarial attack against DGA classifiers proposed to date.
△ Less
Submitted 30 May, 2019; v1 submitted 3 May, 2019;
originally announced May 2019.
-
A Systematic Literature Review about the impact of Artificial Intelligence on Autonomous Vehicle Safety
Authors:
A. M. Nascimento,
L. F. Vismari,
C. B. S. T. Molina,
P. S. Cugnasca,
J. B. Camargo Jr.,
J. R. de Almeida Jr.,
R. Inam,
E. Fersman,
M. V. Marquezini,
A. Y. Hata
Abstract:
Autonomous Vehicles (AV) are expected to bring considerable benefits to society, such as traffic optimization and accidents reduction. They rely heavily on advances in many Artificial Intelligence (AI) approaches and techniques. However, while some researchers in this field believe AI is the core element to enhance safety, others believe AI imposes new challenges to assure the safety of these new…
▽ More
Autonomous Vehicles (AV) are expected to bring considerable benefits to society, such as traffic optimization and accidents reduction. They rely heavily on advances in many Artificial Intelligence (AI) approaches and techniques. However, while some researchers in this field believe AI is the core element to enhance safety, others believe AI imposes new challenges to assure the safety of these new AI-based systems and applications. In this non-convergent context, this paper presents a systematic literature review to paint a clear picture of the state of the art of the literature in AI on AV safety. Based on an initial sample of 4870 retrieved papers, 59 studies were selected as the result of the selection criteria detailed in the paper. The shortlisted studies were then mapped into six categories to answer the proposed research questions. An AV system model was proposed and applied to orient the discussions about the SLR findings. As a main result, we have reinforced our preliminary observation about the necessity of considering a serious safety agenda for the future studies on AI-based AV systems.
△ Less
Submitted 4 April, 2019;
originally announced April 2019.
-
In-Route Task Selection in Crowdsourcing
Authors:
Camila F. Costa,
Mario A. Nascimento
Abstract:
One important problem in crowdsourcing is that of assigning tasks to workers. We consider a scenario where a worker is traveling on a preferred/typical path (e.g., from school to home) and there is a set of tasks available to be performed. Furthermore, we assume that: each task yields a positive reward, the worker has the skills necessary to perform all available tasks and he/she is willing to pos…
▽ More
One important problem in crowdsourcing is that of assigning tasks to workers. We consider a scenario where a worker is traveling on a preferred/typical path (e.g., from school to home) and there is a set of tasks available to be performed. Furthermore, we assume that: each task yields a positive reward, the worker has the skills necessary to perform all available tasks and he/she is willing to possibly deviate from his/her preferred path as long as he/she travels at most a total given distance/time. We call this problem the In-Route Task Selection (IRTS) problem and investigate it using the skyline paradigm in order to obtain the exact set of non-dominated solutions, i.e., good and diverse solutions yielding different combinations of smaller or larger rewards while traveling more or less. This is a practically relevant problem as it empowers the worker as he/she can decide, in real time, which tasks suit his/her needs and/or availability better. After showing that the IRTS problem is NP-hard, we propose an exact (but expensive) solution and a few others practical heuristic solutions. While the exact solution is suitable only for reasonably small IRTS instances, the heuristic solutions can produce solutions with good values of precision and recall for problems of realistic sizes within practical, in fact most often sub-second, query processing time.
△ Less
Submitted 13 September, 2018;
originally announced September 2018.
-
VirtualIdentity: Privacy-Preserving User Profiling
Authors:
Sisi Wang,
Wing-Sea Poon,
Golnoosh Farnadi,
Caleb Horst,
Kebra Thompson,
Michael Nickels,
Rafael Dowsley,
Anderson C. A. Nascimento,
Martine De Cock
Abstract:
User profiling from user generated content (UGC) is a common practice that supports the business models of many social media companies. Existing systems require that the UGC is fully exposed to the module that constructs the user profiles. In this paper we show that it is possible to build user profiles without ever accessing the user's original data, and without exposing the trained machine learn…
▽ More
User profiling from user generated content (UGC) is a common practice that supports the business models of many social media companies. Existing systems require that the UGC is fully exposed to the module that constructs the user profiles. In this paper we show that it is possible to build user profiles without ever accessing the user's original data, and without exposing the trained machine learning models for user profiling -- which are the intellectual property of the company -- to the users of the social media site. We present VirtualIdentity, an application that uses secure multi-party cryptographic protocols to detect the age, gender and personality traits of users by classifying their user-generated text and personal pictures with trained support vector machine models in a privacy-preserving manner.
△ Less
Submitted 30 August, 2018;
originally announced August 2018.
-
On the Composability of Statistically Secure Random Oblivious Transfer
Authors:
Rafael Dowsley,
Jörn Müller-Quade,
Anderson C. A. Nascimento
Abstract:
We show that stand-alone statistically secure random oblivious transfer protocols based on two-party stateless primitives are statistically universally composable. I.e. they are simulatable secure with an unlimited adversary, an unlimited simulator and an unlimited environment machine. Our result implies that several previous oblivious transfer protocols in the literature which were proven secure…
▽ More
We show that stand-alone statistically secure random oblivious transfer protocols based on two-party stateless primitives are statistically universally composable. I.e. they are simulatable secure with an unlimited adversary, an unlimited simulator and an unlimited environment machine. Our result implies that several previous oblivious transfer protocols in the literature which were proven secure under weaker, non-composable definitions of security can actually be used in arbitrary statistically secure applications without lowering the security.
△ Less
Submitted 30 August, 2018;
originally announced August 2018.
-
GOTO Rankings Considered Helpful
Authors:
Emery Berger,
Stephen M. Blackburn,
Carla Brodley,
H. V. Jagadish,
Kathryn S. McKinley,
Mario A. Nascimento,
Minjeong Shin,
Lexing Xie
Abstract:
Rankings are a fact of life. Whether or not one likes them, they exist and are influential. Within academia, and in computer science in particular, rankings not only capture our attention but also widely influence people who have a limited understanding of computing science research, including prospective students, university administrators, and policy-makers. In short, rankings matter. This posit…
▽ More
Rankings are a fact of life. Whether or not one likes them, they exist and are influential. Within academia, and in computer science in particular, rankings not only capture our attention but also widely influence people who have a limited understanding of computing science research, including prospective students, university administrators, and policy-makers. In short, rankings matter. This position paper advocates for the adoption of "GOTO rankings": rankings that use Good data, are Open, Transparent, and Objective, and the rejection of rankings that do not meet these criteria.
△ Less
Submitted 24 April, 2019; v1 submitted 29 June, 2018;
originally announced July 2018.
-
Supersingular Isogeny Oblivious Transfer (SIOT)
Authors:
Paulo Barreto,
Anderson Nascimento,
Glaucio Oliveira,
Waldyr Benits
Abstract:
We present an oblivious transfer (OT) protocol that combines the OT scheme of Chou and Orlandi together with thesupersingular isogeny Diffie-Hellman (SIDH) primitive of De Feo, Jao, and Plût. Our construction is a candidate for post-quantum secure OT and demonstrates that SIDH naturally supports OT functionality. We consider the protocol in the simplest configuration of $\binom{2}{1}$-OT and analy…
▽ More
We present an oblivious transfer (OT) protocol that combines the OT scheme of Chou and Orlandi together with thesupersingular isogeny Diffie-Hellman (SIDH) primitive of De Feo, Jao, and Plût. Our construction is a candidate for post-quantum secure OT and demonstrates that SIDH naturally supports OT functionality. We consider the protocol in the simplest configuration of $\binom{2}{1}$-OT and analyze the protocol to verify its security.
△ Less
Submitted 12 January, 2021; v1 submitted 16 May, 2018;
originally announced May 2018.
-
Detecting Changes in Fully Polarimetric SAR Imagery with Statistical Information Theory
Authors:
Abraão D. C. Nascimento,
Alejandro C. Frery,
Renato J. Cintra
Abstract:
Images obtained from coherent illumination processes are contaminated with speckle. A prominent example of such imagery systems is the polarimetric synthetic aperture radar (PolSAR). For such remote sensing tool the speckle interference pattern appears in the form of a positive definite Hermitian matrix, which requires specialized models and makes change detection a hard task. The scaled complex W…
▽ More
Images obtained from coherent illumination processes are contaminated with speckle. A prominent example of such imagery systems is the polarimetric synthetic aperture radar (PolSAR). For such remote sensing tool the speckle interference pattern appears in the form of a positive definite Hermitian matrix, which requires specialized models and makes change detection a hard task. The scaled complex Wishart distribution is a widely used model for PolSAR images. Such distribution is defined by two parameters: the number of looks and the complex covariance matrix. The last parameter contains all the necessary information to characterize the backscattered data and, thus, identifying changes in a sequence of images can be formulated as a problem of verifying whether the complex covariance matrices differ at two or more takes. This paper proposes a comparison between a classical change detection method based on the likelihood ratio and three statistical methods that depend on information-theoretic measures: the Kullback-Leibler distance and two entropies. The performance of these four tests was quantified in terms of their sample test powers and sizes using simulated data. The tests are then applied to actual PolSAR data. The results provide evidence that tests based on entropies may outperform those based on the Kullback-Leibler distance and likelihood ratio statistics.
△ Less
Submitted 16 September, 2018; v1 submitted 26 January, 2018;
originally announced January 2018.
-
A Characterization of Mobility Management in User-centric Networks
Authors:
Andrea Nascimento,
Rute Sofia,
Tiago Condeixa,
Susana Sargento
Abstract:
Mobility management is a key aspect to consider in future Internet architectures, as these architectures include a highly nomadic end-user which often relies on services provided by multi-access networks. In contrast, today's mobility management solutions were designed having in mind simpler scenarios and requirements from the network and where roaming could often be taken care of with previously…
▽ More
Mobility management is a key aspect to consider in future Internet architectures, as these architectures include a highly nomadic end-user which often relies on services provided by multi-access networks. In contrast, today's mobility management solutions were designed having in mind simpler scenarios and requirements from the network and where roaming could often be taken care of with previously established agreements. With a more dynamic behavior in the network, and also with a more prominent role from the end-user, mobility management has to deal with additional requirements derived from new Internet paradigms. To assist in understanding such requirements and also how to deal with them, this paper proposes a starting point to dismantle current mobility management notions. Our contribution is an initial proposal on defining mobility management in concrete functional blocks, their interaction, as well as a potential grouping which later can assist in deriving novel and more flexible mobility management architectures.
△ Less
Submitted 10 November, 2017;
originally announced November 2017.
-
A Framework for Efficient Adaptively Secure Composable Oblivious Transfer in the ROM
Authors:
Paulo S. L. M. Barreto,
Bernardo David,
Rafael Dowsley,
Kirill Morozov,
Anderson C. A. Nascimento
Abstract:
Oblivious Transfer (OT) is a fundamental cryptographic protocol that finds a number of applications, in particular, as an essential building block for two-party and multi-party computation. We construct a round-optimal (2 rounds) universally composable (UC) protocol for oblivious transfer secure against active adaptive adversaries from any OW-CPA secure public-key encryption scheme with certain pr…
▽ More
Oblivious Transfer (OT) is a fundamental cryptographic protocol that finds a number of applications, in particular, as an essential building block for two-party and multi-party computation. We construct a round-optimal (2 rounds) universally composable (UC) protocol for oblivious transfer secure against active adaptive adversaries from any OW-CPA secure public-key encryption scheme with certain properties in the random oracle model (ROM). In terms of computation, our protocol only requires the generation of a public/secret-key pair, two encryption operations and one decryption operation, apart from a few calls to the random oracle. In~terms of communication, our protocol only requires the transfer of one public-key, two ciphertexts, and three binary strings of roughly the same size as the message. Next, we show how to instantiate our construction under the low noise LPN, McEliece, QC-MDPC, LWE, and CDH assumptions. Our instantiations based on the low noise LPN, McEliece, and QC-MDPC assumptions are the first UC-secure OT protocols based on coding assumptions to achieve: 1) adaptive security, 2) optimal round complexity, 3) low communication and computational complexities. Previous results in this setting only achieved static security and used costly cut-and-choose techniques.Our instantiation based on CDH achieves adaptive security at the small cost of communicating only two more group elements as compared to the gap-DH based Simplest OT protocol of Chou and Orlandi (Latincrypt 15), which only achieves static security in the ROM.
△ Less
Submitted 23 October, 2017;
originally announced October 2017.
-
Efficient Computation of Multiple Density-Based Clustering Hierarchies
Authors:
Antonio Cavalcante Araujo Neto,
Joerg Sander,
Ricardo J. G. B. Campello,
Mario A. Nascimento
Abstract:
HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts in the sense that a small change in mpts typically leads to only a small or no change in the clustering structure, choosing a "good" mpts value can be challenging: depending on the d…
▽ More
HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts in the sense that a small change in mpts typically leads to only a small or no change in the clustering structure, choosing a "good" mpts value can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters may reveal themselves at different values of mpts. To explore results for a range of mpts values, however, one has to run HDBSCAN* for each value in the range independently, which is computationally inefficient. In this paper, we propose an efficient approach to compute all HDBSCAN* hierarchies for a range of mpts values by replacing the graph used by HDBSCAN* with a much smaller graph that is guaranteed to contain the required information. An extensive experimental evaluation shows that with our approach one can obtain over one hundred hierarchies for the computational cost equivalent to running HDBSCAN* about 2 times.
△ Less
Submitted 7 June, 2018; v1 submitted 13 September, 2017;
originally announced September 2017.
-
Commitment and Oblivious Transfer in the Bounded Storage Model with Errors
Authors:
Rafael Dowsley,
Felipe Lacerda,
Anderson C. A. Nascimento
Abstract:
The bounded storage model restricts the memory of an adversary in a cryptographic protocol, rather than restricting its computational power, making information theoretically secure protocols feasible. We present the first protocols for commitment and oblivious transfer in the bounded storage model with errors, i.e., the model where the public random sources available to the two parties are not exa…
▽ More
The bounded storage model restricts the memory of an adversary in a cryptographic protocol, rather than restricting its computational power, making information theoretically secure protocols feasible. We present the first protocols for commitment and oblivious transfer in the bounded storage model with errors, i.e., the model where the public random sources available to the two parties are not exactly the same, but instead are only required to have a small Hamming distance between themselves. Commitment and oblivious transfer protocols were known previously only for the error-free variant of the bounded storage model, which is harder to realize.
△ Less
Submitted 24 October, 2017; v1 submitted 22 October, 2015;
originally announced October 2015.
-
Optimal Time-dependent Sequenced Route Queries in Road Networks
Authors:
Camila F. Costa,
Mario A. Nascimento,
Jose A. F. Macedo,
Yannis Theodoridis,
Nikos Pelekis,
Javam Machado
Abstract:
In this paper we present an algorithm for optimal processing of time-dependent sequenced route queries in road networks, i.e., given a road network where the travel time over an edge is time-dependent and a given ordered list of categories of interest, we find the fastest route between an origin and destination that passes through a sequence of points of interest belonging to each of the specified…
▽ More
In this paper we present an algorithm for optimal processing of time-dependent sequenced route queries in road networks, i.e., given a road network where the travel time over an edge is time-dependent and a given ordered list of categories of interest, we find the fastest route between an origin and destination that passes through a sequence of points of interest belonging to each of the specified categories of interest. For instance, considering a city road network at a given departure time, one can find the fastest route between one's work and his/her home, passing through a bank, a supermarket and a restaurant, in this order. The main contribution of our work is the consideration of the time dependency of the network, a realistic characteristic of urban road networks, which has not been considered previously when addressing the optimal sequenced route query. Our approach uses the A* search paradigm that is equipped with an admissible heuristic function, thus guaranteed to yield the optimal solution, along with a pruning scheme for further reducing the search space. In order to compare our proposal we extended a previously proposed solution aimed at non-time dependent sequenced route queries, enabling it to deal with the time-dependency. Our experiments using real and synthetic data sets have shown our proposed solution to be up to two orders of magnitude faster than the temporally extended previous solution.
△ Less
Submitted 6 September, 2015;
originally announced September 2015.
-
Agile governance in Information and Communication Technologies: shifting paradigms
Authors:
Alexandre J. H. de O. Luna,
Cleyverson P. Costa,
Hermano P. de Moura,
Magdala A. Novaes,
Cesar A. D. C. do Nascimento
Abstract:
This paper presents the basis of the Agile Governance in Information and Communication Technology (ICT), which is based on Agile Software Engineering Methodologies principles and values. Its development was done through a systematic review process, supported by Bibliometrics and Scientometrics methods and techniques, where the Critical Success Factors (CSF) of ICT Governance projects and the princ…
▽ More
This paper presents the basis of the Agile Governance in Information and Communication Technology (ICT), which is based on Agile Software Engineering Methodologies principles and values. Its development was done through a systematic review process, supported by Bibliometrics and Scientometrics methods and techniques, where the Critical Success Factors (CSF) of ICT Governance projects and the principles of the Agile Manifesto were analyzed. Next, through an inductive approach, focused on the convergence between the concepts involved, it was analyzed how agile principles could help to minimize the gap between ICT and business. Evidences of their occurrence were taken through a Conceptual Survey Research. As a result, the foundations and concepts of Agile Governance in ICT were defined and, finally, the development of a reference model was proposed as a future work.
△ Less
Submitted 10 November, 2014;
originally announced November 2014.
-
A GPU-based parallel algorithm for enumerating all chordless cycles in graphs
Authors:
Elisângela Silva Dias,
Diane Castonguay,
Humberto Longo,
Walid Abdala Rfaei Jradi,
Hugo A. D. do Nascimento
Abstract:
In a finite undirected simple graph, a chordless cycle is an induced subgraph which is a cycle. We propose a GPU parallel algorithm for enumerating all chordless cycles of such a graph. The algorithm, implemented in OpenCL, is based on a previous sequential algorithm developed by the current authors for the same problem. It uses a more compact data structure for solution representation which is su…
▽ More
In a finite undirected simple graph, a chordless cycle is an induced subgraph which is a cycle. We propose a GPU parallel algorithm for enumerating all chordless cycles of such a graph. The algorithm, implemented in OpenCL, is based on a previous sequential algorithm developed by the current authors for the same problem. It uses a more compact data structure for solution representation which is suitable for the memory-size limitation of a GPU. Moreover, for graphs with a sufficiently large amount of chordless cycles, the algorithm presents a significant improvement in execution time that outperforms the sequential method.
△ Less
Submitted 25 June, 2015; v1 submitted 17 October, 2014;
originally announced October 2014.
-
On the Oblivious Transfer Capacity of Generalized Erasure Channels against Malicious Adversaries
Authors:
Rafael Dowsley,
Anderson C. A. Nascimento
Abstract:
Noisy channels are a powerful resource for cryptography as they can be used to obtain information-theoretically secure key agreement, commitment and oblivious transfer protocols, among others. Oblivious transfer (OT) is a fundamental primitive since it is complete for secure multi-party computation, and the OT capacity characterizes how efficiently a channel can be used for obtaining string oblivi…
▽ More
Noisy channels are a powerful resource for cryptography as they can be used to obtain information-theoretically secure key agreement, commitment and oblivious transfer protocols, among others. Oblivious transfer (OT) is a fundamental primitive since it is complete for secure multi-party computation, and the OT capacity characterizes how efficiently a channel can be used for obtaining string oblivious transfer. Ahlswede and Csiszár (\emph{ISIT'07}) presented upper and lower bounds on the OT capacity of generalized erasure channels (GEC) against passive adversaries. In the case of GEC with erasure probability at least 1/2, the upper and lower bounds match and therefore the OT capacity was determined. It was later proved by Pinto et al. (\emph{IEEE Trans. Inf. Theory 57(8)}) that in this case there is also a protocol against malicious adversaries achieving the same lower bound, and hence the OT capacity is identical for passive and malicious adversaries. In the case of GEC with erasure probability smaller than 1/2, the known lower bound against passive adversaries that was established by Ahlswede and Csiszár does not match their upper bound and it was unknown whether this OT rate could be achieved against malicious adversaries as well. In this work we show that there is a protocol against malicious adversaries achieving the same OT rate that was obtained against passive adversaries.
In order to obtain our results we introduce a novel use of interactive hashing that is suitable for dealing with the case of low erasure probability ($p^* <1/2$).
△ Less
Submitted 10 October, 2014;
originally announced October 2014.
-
Towards Knowledge-Enriched Path Computation
Authors:
Georgios Skoumas,
Klaus Arthur Schmid,
Gregor Jossé,
Andreas Züfle,
Mario A. Nascimento,
Matthias Renz,
Dieter Pfoser
Abstract:
Directions and paths, as commonly provided by navigation systems, are usually derived considering absolute metrics, e.g., finding the shortest path within an underlying road network. With the aid of crowdsourced geospatial data we aim at obtaining paths that do not only minimize distance but also lead through more popular areas using knowledge generated by users. We extract spatial relations such…
▽ More
Directions and paths, as commonly provided by navigation systems, are usually derived considering absolute metrics, e.g., finding the shortest path within an underlying road network. With the aid of crowdsourced geospatial data we aim at obtaining paths that do not only minimize distance but also lead through more popular areas using knowledge generated by users. We extract spatial relations such as "nearby" or "next to" from travel blogs, that define closeness between pairs of points of interest (PoIs) and quantify each of these relations using a probabilistic model. Subsequently, we create a relationship graph where each node corresponds to a PoI and each edge describes the spatial connection between the respective PoIs. Using Bayesian inference we obtain a probabilistic measure of spatial closeness according to the crowd. Applying this measure to the corresponding road network, we obtain an altered cost function which does not exclusively rely on distance, and enriches an actual road networks taking crowdsourced spatial relations into account. Finally, we propose two routing algorithms on the enriched road networks. To evaluate our approach, we use Flickr photo data as a ground truth for popularity. Our experimental results -- based on real world datasets -- show that the paths computed w.r.t.\ our alternative cost function yield competitive solutions in terms of path length while also providing more "popular" paths, making routing easier and more informative for the user.
△ Less
Submitted 9 September, 2014;
originally announced September 2014.
-
Bias Correction and Modified Profile Likelihood under the Wishart Complex Distribution
Authors:
Abraão D. C. Nascimento,
Alejandro C. Frery,
Renato J. Cintra
Abstract:
This paper proposes improved methods for the maximum likelihood (ML) estimation of the equivalent number of looks $L$. This parameter has a meaningful interpretation in the context of polarimetric synthetic aperture radar (PolSAR) images. Due to the presence of coherent illumination in their processing, PolSAR systems generate images which present a granular noise called speckle. As a potential so…
▽ More
This paper proposes improved methods for the maximum likelihood (ML) estimation of the equivalent number of looks $L$. This parameter has a meaningful interpretation in the context of polarimetric synthetic aperture radar (PolSAR) images. Due to the presence of coherent illumination in their processing, PolSAR systems generate images which present a granular noise called speckle. As a potential solution for reducing such interference, the parameter $L$ controls the signal-noise ratio. Thus, the proposal of efficient estimation methodologies for $L$ has been sought. To that end, we consider firstly that a PolSAR image is well described by the scaled complex Wishart distribution. In recent years, Anfinsen et al. derived and analyzed estimation methods based on the ML and on trace statistical moments for obtaining the parameter $L$ of the unscaled version of such probability law. This paper generalizes that approach. We present the second-order bias expression proposed by Cox and Snell for the ML estimator of this parameter. Moreover, the formula of the profile likelihood modified by Barndorff-Nielsen in terms of $L$ is discussed. Such derivations yield two new ML estimators for the parameter $L$, which are compared to the estimators proposed by Anfinsen et al. The performance of these estimators is assessed by means of Monte Carlo experiments, adopting three statistical measures as comparison criterion: the mean square error, the bias, and the coefficient of variation. Equivalently to the simulation study, an application to actual PolSAR data concludes that the proposed estimators outperform all the others in homogeneous scenarios.
△ Less
Submitted 18 April, 2014;
originally announced April 2014.
-
Improving Memory Hierarchy Utilisation for Stencil Computations on Multicore Machines
Authors:
Alexandre Sena,
Aline Nascimento,
Cristina Boeres,
Vinod E. F. Rebello,
André Bulcão
Abstract:
Although modern supercomputers are composed of multicore machines, one can find scientists that still execute their legacy applications which were developed to monocore cluster where memory hierarchy is dedicated to a sole core. The main objective of this paper is to propose and evaluate an algorithm that identify an efficient blocksize to be applied on MPI stencil computations on multicore machin…
▽ More
Although modern supercomputers are composed of multicore machines, one can find scientists that still execute their legacy applications which were developed to monocore cluster where memory hierarchy is dedicated to a sole core. The main objective of this paper is to propose and evaluate an algorithm that identify an efficient blocksize to be applied on MPI stencil computations on multicore machines. Under the light of an extensive experimental analysis, this work shows the benefits of identifying blocksizes that will dividing data on the various cores and suggest a methodology that explore the memory hierarchy available in modern machines.
△ Less
Submitted 30 October, 2013;
originally announced October 2013.
-
Mining the Temporal Evolution of the Android Bug Reporting Community via Sliding Windows
Authors:
Feng Jiang,
Jiemin Wang,
Abram Hindle,
Mario A. Nascimento
Abstract:
The open source development community consists of both paid and volunteer developers as well as new and experienced users. Previous work has applied social network analysis (SNA) to open source communities and has demonstrated value in expertise discovery and triaging. One problem with applying SNA directly to the data of the entire project lifetime is that the impact of local activities will be d…
▽ More
The open source development community consists of both paid and volunteer developers as well as new and experienced users. Previous work has applied social network analysis (SNA) to open source communities and has demonstrated value in expertise discovery and triaging. One problem with applying SNA directly to the data of the entire project lifetime is that the impact of local activities will be drowned out. In this paper we provide a method for aggregating, analyzing, and visualizing local (small time periods) interactions of bug reporting participants by using the SNA to measure the betweeness centrality of these participants. In particular we mined the Android bug repository by producing social networks from overlapping 30-day windows of bug reports, each sliding over by day. In this paper we define three patterns of participant behaviour based on their local centrality. We propose a method of analyzing the centrality of bug report participants both locally and globally, then we conduct a thorough case study of the bug reporter's activity within the Android bug repository. Furthermore, we validate the conclusions of our method by mining the Android version control system and inspecting the Android release history. We found that windowed SNA analysis elicited local behaviour that were invisible during global analysis.
△ Less
Submitted 28 October, 2013;
originally announced October 2013.
-
Comparing Edge Detection Methods based on Stochastic Entropies and Distances for PolSAR Imagery
Authors:
Abraão D. C. Nascimento,
Michelle M. Horta,
Alejandro C. Frery,
Renato J. Cintra
Abstract:
Polarimetric synthetic aperture radar (PolSAR) has achieved a prominent position as a remote imaging method. However, PolSAR images are contaminated by speckle noise due to the coherent illumination employed during the data acquisition. This noise provides a granular aspect to the image, making its processing and analysis (such as in edge detection) hard tasks. This paper discusses seven methods f…
▽ More
Polarimetric synthetic aperture radar (PolSAR) has achieved a prominent position as a remote imaging method. However, PolSAR images are contaminated by speckle noise due to the coherent illumination employed during the data acquisition. This noise provides a granular aspect to the image, making its processing and analysis (such as in edge detection) hard tasks. This paper discusses seven methods for edge detection in multilook PolSAR images. In all methods, the basic idea consists in detecting transition points in the finest possible strip of data which spans two regions. The edge is contoured using the transitions points and a B-spline curve. Four stochastic distances, two differences of entropies, and the maximum likelihood criterion were used under the scaled complex Wishart distribution; the first six stem from the h-phi class of measures. The performance of the discussed detection methods was quantified and analyzed by the computational time and probability of correct edge detection, with respect to the number of looks, the backscatter matrix as a whole, the SPAN, the covariance an the spatial resolution. The detection procedures were applied to three real PolSAR images. Results provide evidence that the methods based on the Bhattacharyya distance and the difference of Shannon entropies outperform the other techniques.
△ Less
Submitted 9 June, 2013;
originally announced June 2013.
-
Hypothesis Testing in Speckled Data with Stochastic Distances
Authors:
Abraão D. C. Nascimento,
Renato J. Cintra,
Alejandro C. Frery
Abstract:
Images obtained with coherent illumination, as is the case of sonar, ultrasound-B, laser and Synthetic Aperture Radar -- SAR, are affected by speckle noise which reduces the ability to extract information from the data. Specialized techniques are required to deal with such imagery, which has been modeled by the G0 distribution and under which regions with different degrees of roughness and mean br…
▽ More
Images obtained with coherent illumination, as is the case of sonar, ultrasound-B, laser and Synthetic Aperture Radar -- SAR, are affected by speckle noise which reduces the ability to extract information from the data. Specialized techniques are required to deal with such imagery, which has been modeled by the G0 distribution and under which regions with different degrees of roughness and mean brightness can be characterized by two parameters; a third parameter, the number of looks, is related to the overall signal-to-noise ratio. Assessing distances between samples is an important step in image analysis; they provide grounds of the separability and, therefore, of the performance of classification procedures. This work derives and compares eight stochastic distances and assesses the performance of hypothesis tests that employ them and maximum likelihood estimation. We conclude that tests based on the triangular distance have the closest empirical size to the theoretical one, while those based on the arithmetic-geometric distances have the best power. Since the power of tests based on the triangular distance is close to optimum, we conclude that the safest choice is using this distance for hypothesis testing, even when compared with classical distances as Kullback-Leibler and Bhattacharyya.
△ Less
Submitted 12 July, 2012;
originally announced July 2012.
-
Parametric and Nonparametric Tests for Speckled Imagery
Authors:
Renato J. Cintra,
Abraão D. C. Nascimento,
Alejandro C. Frery
Abstract:
Synthetic aperture radar (SAR) has a pivotal role as a remote imaging method. Obtained by means of coherent illumination, SAR images are contaminated with speckle noise. The statistical modeling of such contamination is well described according with the multiplicative model and its implied G0 distribution. The understanding of SAR imagery and scene element identification is an important objective…
▽ More
Synthetic aperture radar (SAR) has a pivotal role as a remote imaging method. Obtained by means of coherent illumination, SAR images are contaminated with speckle noise. The statistical modeling of such contamination is well described according with the multiplicative model and its implied G0 distribution. The understanding of SAR imagery and scene element identification is an important objective in the field. In particular, reliable image contrast tools are sought. Aiming the proposition of new tools for evaluating SAR image contrast, we investigated new methods based on stochastic divergence. We propose several divergence measures specifically tailored for G0 distributed data. We also introduce a nonparametric approach based on the Kolmogorov-Smirnov distance for G0 data. We devised and assessed tests based on such measures, and their performances were quantified according to their test sizes and powers. Using Monte Carlo simulation, we present a robustness analysis of test statistics and of maximum likelihood estimators for several degrees of innovative contamination. It was identified that the proposed tests based on triangular and arithmetic-geometric measures outperformed the Kolmogorov-Smirnov methodology.
△ Less
Submitted 10 July, 2012;
originally announced July 2012.
-
A CCA2 Secure Variant of the McEliece Cryptosystem
Authors:
Nico Döttling,
Rafael Dowsley,
Jörn Müller-Quade,
Anderson C. A. Nascimento
Abstract:
The McEliece public-key encryption scheme has become an interesting alternative to cryptosystems based on number-theoretical problems. Differently from RSA and ElGa- mal, McEliece PKC is not known to be broken by a quantum computer. Moreover, even tough McEliece PKC has a relatively big key size, encryption and decryption operations are rather efficient. In spite of all the recent results in codin…
▽ More
The McEliece public-key encryption scheme has become an interesting alternative to cryptosystems based on number-theoretical problems. Differently from RSA and ElGa- mal, McEliece PKC is not known to be broken by a quantum computer. Moreover, even tough McEliece PKC has a relatively big key size, encryption and decryption operations are rather efficient. In spite of all the recent results in coding theory based cryptosystems, to the date, there are no constructions secure against chosen ciphertext attacks in the standard model - the de facto security notion for public-key cryptosystems. In this work, we show the first construction of a McEliece based public-key cryptosystem secure against chosen ciphertext attacks in the standard model. Our construction is inspired by a recently proposed technique by Rosen and Segev.
△ Less
Submitted 31 May, 2012; v1 submitted 23 May, 2012;
originally announced May 2012.
-
A Review of the Enviro-Net Project
Authors:
Gilberto Z. Pastorello,
G. Arturo Sanchez-Azofeifa,
Mario A. Nascimento
Abstract:
Ecosystems monitoring is essential to properly understand their development and the effects of events, both climatological and anthropological in nature. The amount of data used in these assessments is increasing at very high rates. This is due to increasing availability of sensing systems and the development of new techniques to analyze sensor data. The Enviro-Net Project encompasses several of s…
▽ More
Ecosystems monitoring is essential to properly understand their development and the effects of events, both climatological and anthropological in nature. The amount of data used in these assessments is increasing at very high rates. This is due to increasing availability of sensing systems and the development of new techniques to analyze sensor data. The Enviro-Net Project encompasses several of such sensor system deployments across five countries in the Americas. These deployments use a few different ground-based sensor systems, installed at different heights monitoring the conditions in tropical dry forests over long periods of time. This paper presents our experience in deploying and maintaining these systems, retrieving and pre-processing the data, and describes the Web portal developed to help with data management, visualization and analysis.
△ Less
Submitted 30 June, 2011; v1 submitted 27 June, 2011;
originally announced June 2011.
-
A Fault Analytic Method against HB+
Authors:
Jose Carrijo,
Rafael Tonicelli,
Anderson C. A. Nascimento
Abstract:
The search for lightweight authentication protocols suitable for low-cost RFID tags constitutes an active and challenging research area. In this context, a family of protocols based on the LPN problem has been proposed: the so-called HB-family. Despite the rich literature regarding the cryptanalysis of these protocols, there are no published results about the impact of fault analysis over them. Th…
▽ More
The search for lightweight authentication protocols suitable for low-cost RFID tags constitutes an active and challenging research area. In this context, a family of protocols based on the LPN problem has been proposed: the so-called HB-family. Despite the rich literature regarding the cryptanalysis of these protocols, there are no published results about the impact of fault analysis over them. The purpose of this paper is to fill this gap by presenting a fault analytic method against a prominent member of the HB-family: HB+ protocol. We demonstrate that the fault analysis model can lead to a flexible and effective attack against HB-like protocols, posing a serious threat over them.
△ Less
Submitted 4 October, 2010;
originally announced October 2010.
-
Content-Based Sub-Image Retrieval with Relevance Feedback
Authors:
Jie Luo,
Mario A. Nascimento
Abstract:
The typical content-based image retrieval problem is to find images within a database that are similar to a given query image. This paper presents a solution to a different problem, namely that of content based sub-image retrieval, i.e., finding images from a database that contains another image. Note that this is different from finding a region in a (segmented) image that is similar to another…
▽ More
The typical content-based image retrieval problem is to find images within a database that are similar to a given query image. This paper presents a solution to a different problem, namely that of content based sub-image retrieval, i.e., finding images from a database that contains another image. Note that this is different from finding a region in a (segmented) image that is similar to another image region given as a query. We present a technique for CBsIR that explores relevance feedback, i.e., the user's input on intermediary results, in order to improve retrieval efficiency. Upon modeling images as a set of overlapping and recursive tiles, we use a tile re-weighting scheme that assigns penalties to each tile of the database images and updates the tile penalties for all relevant images retrieved at each iteration using both the relevant and irrelevant images identified by the user. Each tile is modeled by means of its color content using a compact but very efficient method which can, indirectly, capture some notion of texture as well, despite the fact that only color information is maintained. Performance evaluation on a largely heterogeneous dataset of over 10,000 images shows that the system can achieve a stable average recall value of 70% within the top 20 retrieved (and presented) images after only 5 iterations, with each such iteration taking about 2 seconds on an off-the-shelf desktop computer.
△ Less
Submitted 26 April, 2009;
originally announced April 2009.
-
Commitment Capacity of Discrete Memoryless Channels
Authors:
Andreas Winter,
Anderson C. A. Nascimento,
Hideki Imai
Abstract:
In extension of the bit commitment task and following work initiated by Crepeau and Kilian, we introduce and solve the problem of characterising the optimal rate at which a discrete memoryless channel can be used for bit commitment. It turns out that the answer is very intuitive: it is the maximum equivocation of the channel (after removing trivial redundancy), even when unlimited noiseless bidi…
▽ More
In extension of the bit commitment task and following work initiated by Crepeau and Kilian, we introduce and solve the problem of characterising the optimal rate at which a discrete memoryless channel can be used for bit commitment. It turns out that the answer is very intuitive: it is the maximum equivocation of the channel (after removing trivial redundancy), even when unlimited noiseless bidirectional side communication is allowed.
By a well-known reduction, this result provides a lower bound on the channel's capacity for implementing coin tossing, which we conjecture to be an equality.
The method of proving this relates the problem to Wyner's wire--tap channel in an amusing way. We also discuss extensions to quantum channels.
△ Less
Submitted 10 April, 2003;
originally announced April 2003.