Sharif-MGTD at SemEval-2024 Task 8: A Transformer-Based Approach to Detect Machine Generated Text

Authors: Seyedeh Fatemeh Ebrahimi, Karim Akhavan Azari, Amirmasoud Iravani, Arian Qazvini, Pouya Sadeghi, Zeinab Sadat Taghavi, Hossein Sameti

Abstract: Detecting Machine-Generated Text (MGT) has emerged as a significant area of study within Natural Language Processing. While language models generate text, they often leave discernible traces, which can be scrutinized using either traditional feature-based methods or more advanced neural language models. In this research, we explore the effectiveness of fine-tuning a RoBERTa-base transformer, a powerful neural architecture, to address MGT detection as a binary classification task. Focusing specifically on Subtask A (Monolingual-English) within the SemEval-2024 competition framework, our proposed system achieves an accuracy of 78.9% on the test dataset, positioning us at 57th among participants. Our study addresses this challenge while considering the limited hardware resources, resulting in a system that excels at identifying human-written texts but encounters challenges in accurately discerning MGTs.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 8 pages, 3 figures, 2 tables. Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

arXiv:2405.18654 [pdf, other]

Mitigating Object Hallucination via Data Augmented Contrastive Tuning

Authors: Pritam Sarkar, Sayna Ebrahimi, Ali Etemad, Ahmad Beirami, Sercan Ö. Arık, Tomas Pfister

Abstract: Despite their remarkable progress, Multimodal Large Language Models (MLLMs) tend to hallucinate factually inaccurate information. In this work, we address object hallucinations in MLLMs, where information is offered about an object that is not present in the model input. We introduce a contrastive tuning method that can be applied to a pretrained off-the-shelf MLLM for mitigating hallucinations while preserving its general vision-language capabilities. For a given factual token, we create a hallucinated token through generative data augmentation by selectively altering the ground-truth information. The proposed contrastive tuning is applied at the token level to improve the relative likelihood of the factual token compared to the hallucinated one. Our thorough evaluation confirms the effectiveness of contrastive tuning in mitigating hallucination. Moreover, the proposed contrastive tuning is simple, fast, and requires minimal training with no additional overhead at inference.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2404.16789 [pdf, other]

Continual Learning of Large Language Models: A Comprehensive Survey

Authors: Haizhou Shi, Zihao Xu, Hengyi Wang, Weiyi Qin, Wenyuan Wang, Yibin Wang, Zifeng Wang, Sayna Ebrahimi, Hao Wang

Abstract: The recent success of large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. Pre-trained LLMs, when tailored for specific needs, often experience significant performance degradation in previous knowledge domains -- a phenomenon known as "catastrophic forgetting". While extensively studied in the continual learning (CL) community, it presents new manifestations in the realm of LLMs. In this survey, we provide a comprehensive overview of the current research progress on LLMs within the context of CL. This survey is structured into four main sections: we first describe an overview of continually learning LLMs, consisting of two directions of continuity: vertical continuity (or vertical continual learning), i.e., continual adaptation from general to specific capabilities, and horizontal continuity (or horizontal continual learning), i.e., continual adaptation across time and domains (Section 3). We then summarize three stages of learning LLMs in the context of modern CL: Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT) (Section 4). Then we provide an overview of evaluation protocols for continual learning with LLMs, along with the current available data sources (Section 5). Finally, we discuss intriguing questions pertaining to continual learning for LLMs (Section 6). The full list of papers examined in this survey is available at https://github.com/Wang-ML-Lab/llm-continual-learning-survey.

Submitted 29 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: 47 pages, 2 figures, 4 tables. Work in progress

arXiv:2404.11782 [pdf, other]

REQUAL-LM: Reliability and Equity through Aggregation in Large Language Models

Authors: Sana Ebrahimi, Nima Shahbazi, Abolfazl Asudeh

Abstract: The extensive scope of large language models (LLMs) across various domains underscores the critical importance of responsibility in their application, beyond natural language processing. In particular, the randomized nature of LLMs, coupled with inherent biases and historical stereotypes in data, raises critical concerns regarding reliability and equity. Addressing these challenges is necessary before using LLMs for applications with societal impact. Towards addressing this gap, we introduce REQUAL-LM, a novel method for finding reliable and equitable LLM outputs through aggregation. Specifically, we develop a Monte Carlo method based on repeated sampling to find a reliable output close to the mean of the underlying distribution of possible outputs. We formally define terms such as reliability and bias, and design an equity-aware aggregation to minimize harmful bias while finding a highly reliable output. REQUAL-LM does not require specialized hardware, does not impose a significant computing load, and uses LLMs as a blackbox. This design choice enables seamless scalability alongside the rapid advancement of LLM technologies. Our system does not require retraining the LLMs, which makes it deployment ready and easy to adapt. Our comprehensive experiments using various tasks and datasets demonstrate that REQUAL-LM effectively mitigates bias and selects a more equitable response, specifically outputs that properly represent minority groups.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2403.00198 [pdf, other]

AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs

Authors: Sana Ebrahimi, Kaiwen Chen, Abolfazl Asudeh, Gautam Das, Nick Koudas

Abstract: Pre-trained Large Language Models (LLMs) have significantly advanced natural language processing capabilities but are susceptible to biases present in their training data, leading to unfair outcomes in various applications. While numerous strategies have been proposed to mitigate bias, they often require extensive computational resources and may compromise model performance. In this work, we introduce AXOLOTL, a novel post-processing framework, which operates agnostically across tasks and models, leveraging public APIs to interact with LLMs without direct access to internal parameters. Through a three-step process resembling zero-shot learning, AXOLOTL identifies biases, proposes resolutions, and guides the model to self-debias its outputs. This approach minimizes computational costs and preserves model performance, making AXOLOTL a promising tool for debiasing LLM outputs with broad applicability and ease of use.

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2402.11363 [pdf, other]

Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry

Authors: Shiva Ebrahimi, Xuan Guo

Abstract: Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherent high multiplexing nature. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce DiaTrans, a deep-learning model based on transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. Casanovo-DIA enhances precision by 15.14% to 34.8%, recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data and our DiaTrans model holds considerable promise to uncover novel peptides and more comprehensive profiling of biological samples. Casanovo-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/DiaTrans.

Submitted 26 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

Comments: Ebrahimi S., Guo X. Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry. In 2023 IEEE 23rd International Conference on Bioinformatics and Bioengineering (BIBE) 2022 Dec 6 (pp. 17-22). IEEE

arXiv:2402.04879 [pdf, other]

Comparing Methods for Creating a National Random Sample of Twitter Users

Authors: Meysam Alizadeh, Darya Zare, Zeynab Samei, Mohammadamin Alizadeh, Mael Kubli, Mohammadhadi Aliahmadi, Sarvenaz Ebrahimi, Fabrizio Gilardi

Abstract: Twitter data has been widely used by researchers across various social and computer science disciplines. A common aim when working with Twitter data is the construction of a random sample of users from a given country. However, while several methods have been proposed in the literature, their comparative performance is mostly unexplored. In this paper, we implement four common methods to collect a random sample of Twitter users in the US: 1% Stream, Bounding Box, Location Query, and Language Query. Then, we compare the methods according to their tweet- and user-level metrics as well as their accuracy in estimating US population with and without using inclusion probabilities of various demographics. Our results show that the 1% Stream method performs differently than others in tweet- and user-level metrics, and best for the construction of a population representative sample. We discuss the conditions under which the 1% Stream method may not be suitable and suggest the Bounding Box method as the second-best method to use.

Submitted 11 March, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

arXiv:2312.01279 [pdf, other]

TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents

Authors: James Enouen, Hootan Nakhost, Sayna Ebrahimi, Sercan O Arik, Yan Liu, Tomas Pfister

Abstract: Large language models (LLMs) have attracted huge interest in practical applications given their increasingly accurate responses and coherent reasoning abilities. Given their nature as black-boxes using complex reasoning processes on their inputs, it is inevitable that the demand for scalable and faithful explanations for LLMs' generated content will continue to grow. There have been major developments in the explainability of neural network models over the past decade. Among them, post-hoc explainability methods, especially Shapley values, have proven effective for interpreting deep learning models. However, there are major challenges in scaling up Shapley values for LLMs, particularly when dealing with long input contexts containing thousands of tokens and autoregressively generated output sequences. Furthermore, it is often unclear how to effectively utilize generated explanations to improve the performance of LLMs. In this paper, we introduce TextGenSHAP, an efficient post-hoc explanation method incorporating LM-specific techniques. We demonstrate that this leads to significant increases in speed compared to conventional Shapley value computations, reducing processing times from hours to minutes for token-level explanations, and to just seconds for document-level explanations. In addition, we demonstrate how real-time Shapley values can be utilized in two important scenarios: providing better understanding of long-document question answering by localizing important words and sentences, and improving existing document retrieval systems through enhancing the accuracy of selected passages and ultimately the final responses.

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2310.11689 [pdf, other]

Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs

Authors: Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan O Arik, Tomas Pfister, Somesh Jha

Abstract: Large language models (LLMs) have recently shown great advances in a variety of tasks, including natural language understanding and generation. However, their use in high-stakes decision-making scenarios is still limited due to the potential for errors. Selective prediction is a technique that can be used to improve the reliability of the LLMs by allowing them to abstain from making predictions when they are unsure of the answer. In this work, we propose a novel framework for adaptation with self-evaluation to improve the selective prediction performance of LLMs. Our framework is based on the idea of using parameter-efficient tuning to adapt the LLM to the specific task at hand while improving its ability to perform self-evaluation. We evaluate our method on a variety of question-answering (QA) datasets and show that it outperforms state-of-the-art selective prediction methods. For example, on the CoQA benchmark, our method improves the AUACC from 91.23% to 92.63% and improves the AUROC from 74.61% to 80.25%.

Submitted 11 November, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: Paper published at Findings of the Association for Computational Linguistics: EMNLP, 2023

arXiv:2310.05269 [pdf, other]

Federated Learning: A Cutting-Edge Survey of the Latest Advancements and Applications

Authors: Azim Akhtarshenas, Mohammad Ali Vahedifar, Navid Ayoobi, Behrouz Maham, Tohid Alizadeh, Sina Ebrahimi, David López-Pérez

Abstract: Robust machine learning (ML) models can be developed by leveraging large volumes of data and distributing the computational tasks across numerous devices or servers. Federated learning (FL) is a technique in the realm of ML that facilitates this goal by utilizing cloud infrastructure to enable collaborative model training among a network of decentralized devices. Beyond distributing the computational load, FL targets the resolution of privacy issues and the reduction of communication costs simultaneously. To protect user privacy, FL requires users to send model updates rather than transmitting large quantities of raw and potentially confidential data. Specifically, individuals train ML models locally using their own data and then upload the results in the form of weights and gradients to the cloud for aggregation into the global model. This strategy is also advantageous in environments with limited bandwidth or high communication costs, as it prevents the transmission of large data volumes. With the increasing volume of data and rising privacy concerns, alongside the emergence of large-scale ML models like Large Language Models (LLMs), FL presents itself as a timely and relevant solution. It is therefore essential to review current FL algorithms to guide future research that meets the rapidly evolving ML demands. This survey provides a comprehensive analysis and comparison of the most recent FL algorithms, evaluating them on various fronts including mathematical frameworks, privacy protection, resource allocation, and applications. Beyond summarizing existing FL methods, this survey identifies potential gaps, open areas, and future challenges based on the performance reports and algorithms used in recent studies. This survey enables researchers to readily identify existing limitations in the FL field for further exploration.

Submitted 25 May, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

arXiv:2308.13703 [pdf, other]

PAITS: Pretraining and Augmentation for Irregularly-Sampled Time Series

Authors: Nicasia Beebe-Wang, Sayna Ebrahimi, Jinsung Yoon, Sercan O. Arik, Tomas Pfister

Abstract: Real-world time series data that commonly reflect sequential human behavior are often uniquely irregularly sampled and sparse, with highly nonuniform sampling over time and entities. Yet, commonly-used pretraining and augmentation methods for time series are not specifically designed for such scenarios. In this paper, we present PAITS (Pretraining and Augmentation for Irregularly-sampled Time Series), a framework for identifying suitable pretraining strategies for sparse and irregularly sampled time series datasets. PAITS leverages a novel combination of NLP-inspired pretraining tasks and augmentations, and a random search to identify an effective strategy for a given dataset. We demonstrate that different datasets benefit from different pretraining choices. Compared with prior methods, our approach is better able to consistently improve pretraining across multiple datasets and domains. Our code is available at https://github.com/google-research/google-research/tree/master/irregular_timeseries_pretraining.

Submitted 25 August, 2023; originally announced August 2023.

Comments: Code: https://github.com/google-research/google-research/tree/master/irregular_timeseries_pretraining

arXiv:2306.09293 [pdf, other]

[Experiments & Analysis] Evaluating the Feasibility of Sampling-Based Techniques for Training Multilayer Perceptrons

Authors: Sana Ebrahimi, Rishi Advani, Abolfazl Asudeh

Abstract: The training process of neural networks is known to be time-consuming, and having a deep architecture only aggravates the issue. This process consists mostly of matrix operations, among which matrix multiplication is the bottleneck. Several sampling-based techniques have been proposed for speeding up the training time of deep neural networks by approximating the matrix products. These techniques fall under two categories: (i) sampling a subset of nodes in every hidden layer as active at every iteration and (ii) sampling a subset of nodes from the previous layer to approximate the current layer's activations using the edges from the sampled nodes. In both cases, the matrix products are computed using only the selected samples. In this paper, we evaluate the feasibility of these approaches on CPU machines with limited computational resources. Making a connection between the two research directions as special cases of approximating matrix multiplications in the context of neural networks, we provide a negative theoretical analysis that shows feedforward approximation is an obstacle against scalability. We conduct comprehensive experimental evaluations that demonstrate the most pressing challenges and limitations associated with the studied approaches. We observe that the hashing-based node selection method is not scalable to a large number of layers, confirming our theoretical analysis. Finally, we identify directions for future research.

Submitted 20 June, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2305.19157 [pdf, other]

Sensor Fault Detection and Compensation with Performance Prescription for Robotic Manipulators

Authors: S. Mohammadreza Ebrahimi, Farid Norouzi, Hossein Dastres, Reza Faieghi, Mehdi Naderi, Milad Malekzadeh

Abstract: This paper focuses on sensor fault detection and compensation for robotic manipulators. The proposed method features a new adaptive observer and a new terminal sliding mode control law established on a second-order integral sliding surface. The method enables sensor fault detection without the need to know the bounds on fault value and/or its derivative. It also enables fast and fixed-time fault-tolerant control whose performance can be prescribed beforehand by defining funnel bounds on the tracking error. The ultimate boundedness of the estimation errors for the proposed observer and the fixed-time stability of the control system are shown using Lyapunov stability analysis. The effectiveness of the proposed method is verified using numerical simulations on two different robotic manipulators, and the results are compared with existing methods. Our results demonstrate performance gains obtained by the proposed method compared to the existing results.

Submitted 18 March, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.16556 [pdf, other]

LANISTR: Multimodal Learning from Structured and Unstructured Data

Authors: Sayna Ebrahimi, Sercan O. Arik, Yihe Dong, Tomas Pfister

Abstract: Multimodal large-scale pretraining has shown impressive performance for unstructured data such as language and image. However, a prevalent real-world scenario involves structured data types, tabular and time-series, along with unstructured data. Such scenarios have been understudied. To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured data. The core of LANISTR's methodology is rooted in masking-based training applied across both unimodal and multimodal levels. In particular, we introduce a new similarity-based multimodal masking loss that enables it to learn cross-modal relations from large-scale multimodal data with missing modalities. On two real-world datasets, MIMIC-IV (from healthcare) and Amazon Product Review (from retail), LANISTR demonstrates remarkable improvements, 6.6% (in AUROC) and 14% (in accuracy) when fine-tuned with 0.1% and 0.01% of labeled data, respectively, compared to the state-of-the-art alternatives. Notably, these improvements are observed even with a very high ratio of samples (35.7% and 99.8% respectively) not containing all modalities, underlining the robustness of LANISTR to the practical missing-modality challenge. Our code and models will be available at https://github.com/google-research/lanistr

Submitted 24 April, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

arXiv:2304.03870 [pdf, other]

ASPEST: Bridging the Gap Between Active Learning and Selective Prediction

Authors: Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan Arik, Somesh Jha, Tomas Pfister

Abstract: Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. These predictions can then be deferred to humans for further evaluation. As an everlasting challenge for machine learning, in many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, and often increased dependence on humans, which can be difficult and expensive. Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples. Selective prediction and active learning have been approached from different angles, with the connection between them missing. In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new paradigm, we propose a simple yet effective approach, ASPEST, that utilizes ensembles of model snapshots with self-training with their aggregated outputs as pseudo labels. Extensive experiments on numerous image, text and structured datasets, which suffer from domain shifts, demonstrate that ASPEST can significantly outperform prior work on selective prediction and active learning (e.g., on the MNIST→SVHN benchmark with the labeling budget of 100, ASPEST improves the AUACC metric from 79.36% to 88.84%) and achieves more optimal utilization of humans in the loop.

Submitted 29 February, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

arXiv:2211.15646 [pdf, other]

Beyond Invariance: Test-Time Label-Shift Adaptation for Distributions with "Spurious" Correlations

Authors: Qingyao Sun, Kevin Murphy, Sayna Ebrahimi, Alexander D'Amour

Abstract: Changes in the data distribution at test time can have deleterious effects on the performance of predictive models $p(y|x)$. We consider situations where there are additional meta-data labels (such as group labels), denoted by $z$, that can account for such changes in the distribution. In particular, we assume that the prior distribution $p(y, z)$, which models the dependence between the class label $y$ and the "nuisance" factors $z$, may change across domains, either due to a change in the correlation between these terms, or a change in one of their marginals. However, we assume that the generative model for features $p(x|y,z)$ is invariant across domains. We note that this corresponds to an expanded version of the widely used "label shift" assumption, where the labels now also include the nuisance factors $z$. Based on this observation, we propose a test-time label shift correction that adapts to changes in the joint distribution $p(y, z)$ using EM applied to unlabeled samples from the target domain distribution, $p_t(x)$. Importantly, we are able to avoid fitting a generative model $p(x|y, z)$, and merely need to reweight the outputs of a discriminative model $p_s(y, z|x)$ trained on the source distribution. We evaluate our method, which we call "Test-Time Label-Shift Adaptation" (TTLSA), on several standard image and text datasets, as well as the CheXpert chest X-ray dataset, and show that it improves performance over methods that target invariance to changes in the distribution, as well as baseline empirical risk minimization methods. Code for reproducing experiments is available at https://github.com/nalzok/test-time-label-shift.

Submitted 28 November, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: 24 pages, 7 figures
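
Illustrative sketch (not taken from the paper or its repository): the TTLSA abstract above describes re-estimating the joint prior $p(y, z)$ with EM on unlabeled target samples and reweighting the outputs of a source-trained model $p_s(y, z|x)$. The code below shows the classic EM prior re-estimation for label shift under that reading. The function name, the flattening of each $(y, z)$ pair into a single class index, and the stopping rule are assumptions made for illustration only.

```python
def em_label_shift(source_probs, source_prior, n_iter=100, tol=1e-8):
    """Re-estimate a target prior over joint classes (y, z) with EM.

    source_probs: list of rows, each row is p_s(c | x_i) over flattened
                  (y, z) classes c (hypothetical layout).
    source_prior: list with p_s(c); assumed strictly positive.
    Returns (target_prior, adapted_probs).
    """
    n_classes = len(source_prior)
    target_prior = list(source_prior)

    def adapt(prior):
        # Reweight each source posterior by the prior ratio, then renormalize.
        adapted = []
        for row in source_probs:
            weighted = [p * prior[c] / source_prior[c] for c, p in enumerate(row)]
            total = sum(weighted)
            adapted.append([w / total for w in weighted])
        return adapted

    for _ in range(n_iter):
        adapted = adapt(target_prior)  # E-step: adapted posteriors
        new_prior = [
            sum(row[c] for row in adapted) / len(adapted)  # M-step: average them
            for c in range(n_classes)
        ]
        shift = max(abs(a - b) for a, b in zip(new_prior, target_prior))
        target_prior = new_prior
        if shift < tol:
            break
    return target_prior, adapt(target_prior)
```

Only the discriminative outputs and the source prior are needed here, which matches the abstract's point that no generative model $p(x|y, z)$ has to be fitted.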

arXiv:2207.07704 [pdf, other]

Maximizing Fair Content Spread via Edge Suggestion in Social Networks

Authors: Ian P. Swift, Sana Ebrahimi, Azade Nova, Abolfazl Asudeh

Abstract: Content spread inequity is a potential unfairness issue in online social networks, disparately impacting minority groups. In this paper, we view friendship suggestion, a common feature in social network platforms, as an opportunity to achieve an equitable spread of content. In particular, we propose to suggest a subset of potential edges (currently not existing in the network but likely to be accepted) that maximizes content spread while achieving fairness. Instead of re-engineering the existing systems, our proposal builds a fairness wrapper on top of the existing friendship suggestion components. We prove the problem is NP-hard and inapproximable in polynomial time unless P = NP. Therefore, allowing relaxation of the fairness constraint, we propose an algorithm based on LP-relaxation and randomized rounding with fixed approximation ratios on fairness and content spread. We provide multiple optimizations, further improving the performance of our algorithm in practice. In addition, we propose a scalable algorithm that dynamically adds subsets of nodes, chosen via iterative sampling, and solves smaller problems corresponding to these nodes. Besides theoretical analysis, we conduct comprehensive experiments on real and synthetic data sets. Across different settings, our algorithms found solutions with near-zero unfairness while significantly increasing the content spread. Our scalable algorithm could process a graph with half a million nodes on a single machine, reducing the unfairness to around 0.0004 while lifting content spread by 43%.

Submitted 20 December, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

Comments: 16 pages, 17 figures, 8 tables. VLDB '22. Technical Report

arXiv:2206.07240 [pdf, other]

Test-Time Adaptation for Visual Document Understanding

Authors: Sayna Ebrahimi, Sercan O. Arik, Tomas Pfister

Abstract: For visual document understanding (VDU), self-supervised pretraining has been shown to successfully generate transferable representations, yet effective adaptation of such representations to distribution shifts at test time remains an unexplored area. We propose DocTTA, a novel test-time adaptation method for documents that performs source-free domain adaptation using unlabeled target document data. DocTTA leverages cross-modality self-supervised learning via masked visual language modeling, as well as pseudo labeling, to adapt models learned on a source domain to an unlabeled target domain at test time. We introduce new benchmarks using existing public datasets for various VDU tasks, including entity recognition, key-value extraction, and document visual question answering. DocTTA shows significant improvements on these compared to the source model performance, up to 1.89% (F1 score), 3.43% (F1 score), and 17.68% (ANLS score), respectively. Our benchmark datasets are available at https://saynaebrahimi.github.io/DocTTA.html.

Submitted 23 August, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: Accepted at TMLR 2023

arXiv:2204.10377 [pdf, other]

Contrastive Test-Time Adaptation

Authors: Dian Chen, Dequan Wang, Trevor Darrell, Sayna Ebrahimi

Abstract: Test-time adaptation is a special setting of unsupervised domain adaptation where a model trained on the source domain has to adapt to the target domain without accessing source data. We propose a novel way to leverage self-supervised contrastive learning to facilitate target feature learning, along with an online pseudo labeling scheme with refinement that significantly denoises pseudo labels. The contrastive learning task is applied jointly with pseudo labeling, contrasting positive and negative pairs constructed similarly to MoCo but with a source-initialized encoder, and excluding same-class negative pairs indicated by pseudo labels. Meanwhile, we produce pseudo labels online and refine them via soft voting among their nearest neighbors in the target feature space, enabled by maintaining a memory queue. Our method, AdaContrast, achieves state-of-the-art performance on major benchmarks while having several desirable properties compared to existing works, including memory efficiency, insensitivity to hyper-parameters, and better model calibration. Project page: sites.google.com/view/adacontrast.

Submitted 21 April, 2022; originally announced April 2022.

Comments: CVPR 2022 camera-ready version
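
Illustrative sketch (not the AdaContrast implementation): the abstract above describes refining pseudo labels by soft voting among nearest neighbors in the target feature space, using a memory queue. The code below shows one plausible form of that step. The function names, the use of cosine similarity, the value of `k`, and the representation of the queue as two parallel lists are all assumptions made for illustration.

```python
import math


def _cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def refine_pseudo_label(query_feat, bank_feats, bank_probs, k=3):
    """Soft-vote a pseudo label from the k nearest entries of a memory queue.

    bank_feats: stored target features (hypothetical queue contents).
    bank_probs: class probabilities stored alongside each feature.
    Returns (label, averaged_probs).
    """
    sims = [_cosine(query_feat, feat) for feat in bank_feats]
    # Indices of the k most similar stored features.
    nearest = sorted(range(len(bank_feats)), key=lambda i: -sims[i])[:k]
    n_classes = len(bank_probs[0])
    # Soft voting: average the neighbors' probability vectors.
    soft = [
        sum(bank_probs[i][c] for i in nearest) / len(nearest)
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: soft[c])
    return label, soft
```

Averaging probabilities rather than counting hard votes lets confident neighbors outweigh uncertain ones, which is one way a refinement step could denoise pseudo labels.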

arXiv:2204.04799 [pdf, other]

DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning

Authors: Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister

Abstract: Continual learning aims to enable a single model to learn a sequence of tasks without catastrophic forgetting. Top-performing methods usually require a rehearsal buffer to store past pristine examples for experience replay, which, however, limits their practical value due to privacy and memory constraints. In this work, we present a simple yet effective framework, DualPrompt, which learns a tiny set of parameters, called prompts, to properly instruct a pre-trained model to learn tasks arriving sequentially without buffering past examples. DualPrompt presents a novel approach to attach complementary prompts to the pre-trained backbone, and then formulates the objective as learning task-invariant and task-specific "instructions". With extensive experimental validation, DualPrompt consistently sets state-of-the-art performance under the challenging class-incremental setting. In particular, DualPrompt outperforms recent advanced continual learning methods with relatively large buffer sizes. We also introduce a more challenging benchmark, Split ImageNet-R, to help generalize rehearsal-free continual learning research. Source code is available at https://github.com/google-research/l2p.

Submitted 5 August, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

Comments: Published at ECCV 2022 as a conference paper

arXiv:2111.03297 [pdf, other]

doi 10.1109/TETC.2021.3102041

RC-RNN: Reconfigurable Cache Architecture for Storage Systems Using Recurrent Neural Networks

Authors: Shahriar Ebrahimi, Reza Salkhordeh, Seyed Ali Osia, Ali Taheri, Hamid Reza Rabiee, Hossein Asadi

Abstract: Solid-State Drives (SSDs) have significant performance advantages over traditional Hard Disk Drives (HDDs), such as lower latency and higher throughput. Significantly higher price per capacity and limited lifetime, however, prevent designers from completely replacing HDDs with SSDs in enterprise storage systems. SSD-based caching has recently been suggested for storage systems to benefit from the higher performance of SSDs while minimizing the overall cost. While conventional caching algorithms such as Least Recently Used (LRU) provide a high hit ratio in processors, due to the highly random behavior of Input/Output (I/O) workloads, they hardly provide the required performance level for storage systems. In addition to poor performance, inefficient algorithms also shorten SSD lifetime with unnecessary cache replacements. Such shortcomings motivate us to benefit from more complex non-linear algorithms to achieve higher cache performance and extend SSD lifetime. In this paper, we propose RC-RNN, the first reconfigurable SSD-based cache architecture for storage systems that utilizes machine learning to identify performance-critical data pages for I/O caching. The proposed architecture uses Recurrent Neural Networks (RNN) to characterize ongoing workloads and optimize itself towards higher cache performance while improving SSD lifetime. RC-RNN attempts to learn characteristics of the running workload to predict its behavior and then uses the collected information to identify performance-critical data pages to fetch into the cache. Experimental results show that RC-RNN characterizes workloads with an accuracy of up to 94.6% for SNIA I/O workloads. RC-RNN can perform similarly to the optimal cache algorithm with an accuracy of 95% on average, and outperforms previous SSD caching architectures by providing up to 7x higher hit ratio and decreasing cache replacements by up to 2x.

Submitted 5 November, 2021; originally announced November 2021.

Comments: Date of Publication: 09 August 2021

Journal ref: IEEE Transactions on Emerging Topics in Computing (2021)

arXiv:2110.00274 [pdf, other]

Enhancing Cold Wallet Security with Native Multi-Signature schemes in Centralized Exchanges

Authors: Shahriar Ebrahimi, Parisa Hasanizadeh, Seyed Mohammad Aghamirmohammadali, Amirali Akbari

Abstract: Currently, one of the most widely used protocols to secure cryptocurrency assets in centralized exchanges is categorizing wallets into cold and hot. While cold wallets hold user deposits, hot wallets are responsible for addressing withdrawal requests. However, this method has some shortcomings, such as: 1) availability of private keys in at least one cold device, and 2) exposure of all private keys to one trusted cold wallet admin. To overcome such issues, we design a new protocol for managing cold wallet assets by employing native multi-signature schemes. The proposed cold wallet system involves at least two distinct devices and their corresponding admins for both wallet creation and signature generation. The method ensures that no final private key is stored on any device. As a result, no individual authority can spend from exchange assets. Moreover, we provide details regarding practical implementation of the proposed method and compare it against the state of the art. Furthermore, we extend the application of the proposed method to a scalable scenario where users are directly involved in the wallet generation and signing process of cold wallets in an MPC manner.

Submitted 1 October, 2021; originally announced October 2021.

Comments: Nobitex Crypto-Exchange: https://www.nobitex.net, Available Online at: https://cdn.nobitex.net/security/nobitex-security-whitepaper.pdf

arXiv:2109.01087 [pdf, other]

On-target Adaptation

Authors: Dequan Wang, Shaoteng Liu, Sayna Ebrahimi, Evan Shelhamer, Trevor Darrell

Abstract: Domain adaptation seeks to mitigate the shift between training on the source domain and testing on the target domain. Most adaptation methods rely on the source data by joint optimization over source data and target data. Source-free methods replace the source data with a source model by fine-tuning it on target. Either way, the majority of the parameter updates for the model representation and the classifier are derived from the source, and not the target. However, target accuracy is the goal, and so we argue for optimizing as much as possible on the target data. We show significant improvement by on-target adaptation, which learns the representation purely from target data while taking only the source predictions for supervision. In the long-tailed classification setting, we show further improvement by on-target class distribution learning, which learns the (im)balance of classes from target data.

Submitted 2 September, 2021; originally announced September 2021.

arXiv:2108.09186 [pdf, other]

Region-level Active Detector Learning

Authors: Michael Laielli, Giscard Biamby, Dian Chen, Ritwik Gupta, Adam Loeffler, Phat Dat Nguyen, Ross Luo, Trevor Darrell, Sayna Ebrahimi

Abstract: Active learning for object detection is conventionally achieved by applying techniques developed for classification in a way that aggregates individual detections into image-level selection criteria. This is typically coupled with the costly assumption that every image selected for labelling must be exhaustively annotated. This yields incremental improvements on well-curated vision datasets and struggles in the presence of the data imbalance and visual clutter that occur in real-world imagery. Alternatives to the image-level approach are surprisingly under-explored in the literature. In this work, we introduce a new strategy that subsumes previous Image-level and Object-level approaches into a generalized, Region-level approach that promotes spatial diversity by avoiding nearby redundant queries from the same image and minimizes context-switching for the labeler. We show that this approach significantly decreases labeling effort and improves rare object search on realistic data with inherent class imbalance and cluttered scenes.

Submitted 17 January, 2022; v1 submitted 20 August, 2021; originally announced August 2021.

arXiv:2107.03315 [pdf, other]

Predicting with Confidence on Unseen Distributions

Authors: Devin Guillory, Vaishaal Shankar, Sayna Ebrahimi, Trevor Darrell, Ludwig Schmidt

Abstract: Recent work has shown that the performance of machine learning models can vary substantially when models are evaluated on data drawn from a distribution that is close to but different from the training distribution. As a result, predicting model performance on unseen distributions is an important challenge. Our work connects techniques from domain adaptation and predictive uncertainty literature, and allows us to predict model accuracy on challenging unseen distributions without access to labeled data. In the context of distribution shift, distributional distances are often used to adapt models and improve their performance on new domains; however, accuracy estimation and other forms of predictive uncertainty are often neglected in these investigations. Through investigating a wide range of established distributional distances, such as Frechet distance or Maximum Mean Discrepancy, we determine that they fail to induce reliable estimates of performance under distribution shift. On the other hand, we find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts. We specifically investigate the distinction between synthetic and natural distribution shifts and observe that, despite its simplicity, DoC consistently outperforms other quantifications of distributional difference. DoC reduces predictive error by almost half (46%) on several realistic and challenging distribution shifts, e.g., on the ImageNet-Vid-Robust and ImageNet-Rendition datasets.

Submitted 19 August, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

Comments: ICCV Camera ready; new scatter plots in supplementary material

ACM Class: I.2.10
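
Illustrative sketch (not the paper's code): the abstract above reports that the difference of confidences (DoC) of a classifier's predictions estimates its performance change under distribution shift. The code below shows one simple reading of that idea, where the drop in average top-class confidence between a labeled source set and an unlabeled target set is subtracted directly from the known source accuracy. The function names and this direct-subtraction estimator are assumptions for illustration; the paper may use DoC differently, for example as an input to a regression model.

```python
def average_confidence(probs):
    """Mean of the top-class probability over a set of predictions."""
    return sum(max(row) for row in probs) / len(probs)


def estimate_target_accuracy(source_accuracy, source_probs, target_probs):
    """Estimate accuracy on unlabeled target data from a confidence drop.

    source_probs / target_probs: per-example class probabilities from the
    same classifier (hypothetical inputs).
    Returns (estimated_accuracy, doc).
    """
    # DoC: how much less confident the model is on the target data.
    doc = average_confidence(source_probs) - average_confidence(target_probs)
    # Assumption: the accuracy change is approximated by the confidence drop.
    estimate = min(1.0, max(0.0, source_accuracy - doc))
    return estimate, doc
```

No target labels are used anywhere, which is the setting the abstract describes.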

arXiv:2103.12718 [pdf, other]

Self-Supervised Pretraining Improves Self-Supervised Pretraining

Authors: Colorado J. Reed, Xiangyu Yue, Ani Nrusimha, Sayna Ebrahimi, Vivek Vijaykumar, Richard Mao, Bo Li, Shanghang Zhang, Devin Guillory, Sean Metzger, Kurt Keutzer, Trevor Darrell

Abstract: While self-supervised pretraining has proven beneficial for many computer vision tasks, it requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation. Prior work demonstrates that models pretrained on datasets dissimilar to their target data, such as chest X-ray models trained on ImageNet, underperform models trained from scratch. Users that lack the resources to pretrain must use existing models with lower performance. This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model. Through experimentation on 16 diverse vision datasets, we show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data. Taken together, HPT provides a simple framework for obtaining better pretrained representations with less computational resources.

Submitted 24 March, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

arXiv:2012.10467 [pdf, other]

Minimax Active Learning

Authors: Sayna Ebrahimi, William Gan, Dian Chen, Giscard Biamby, Kamyar Salahi, Michael Laielli, Shizhan Zhu, Trevor Darrell

Abstract: Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator. Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples. While uncertainty-based strategies are susceptible to outliers, solely relying on sample diversity does not capture the information available on the main task. In this work, we develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner. Our model consists of an entropy minimizing feature encoding network followed by an entropy maximizing classification layer. This minimax formulation reduces the distribution gap between the labeled/unlabeled data, while a discriminator is simultaneously trained to distinguish the labeled/unlabeled data. The highest entropy samples from the classifier that the discriminator predicts as unlabeled are selected for labeling. We evaluate our method on various image classification and semantic segmentation benchmark datasets and show superior performance over the state-of-the-art methods.

Submitted 30 March, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

Comments: Project page is available at https://people.eecs.berkeley.edu/~sayna/mal.html
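
Illustrative sketch (not the authors' code): the Minimax Active Learning abstract above states that the highest-entropy samples that the discriminator predicts as unlabeled are selected for labeling. The code below shows that selection rule in isolation. The function names, the 0.5 threshold on the discriminator output, and the input layout are assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(class_probs, disc_labeled_prob, budget, threshold=0.5):
    """Pick up to `budget` pool indices to send to the annotator.

    class_probs: classifier probabilities for each pool sample.
    disc_labeled_prob: discriminator's probability that each sample looks
                       like labeled data (hypothetical convention).
    """
    # Keep only samples the discriminator considers unlabeled-looking.
    candidates = [i for i, d in enumerate(disc_labeled_prob) if d < threshold]
    # Among those, prefer the samples the classifier is least certain about.
    candidates.sort(key=lambda i: entropy(class_probs[i]), reverse=True)
    return candidates[:budget]
```

The discriminator filter supplies the diversity signal and the entropy ranking supplies the uncertainty signal, so both criteria named in the abstract take part in the choice.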

arXiv:2010.01528 [pdf, other]

Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting

Authors: Sayna Ebrahimi, Suzanne Petryk, Akash Gokul, William Gan, Joseph E. Gonzalez, Marcus Rohrbach, Trevor Darrell

Abstract: The goal of continual learning (CL) is to learn a sequence of tasks without suffering from the phenomenon of catastrophic forgetting. Previous work has shown that leveraging memory in the form of a replay buffer can reduce performance degradation on prior tasks. We hypothesize that forgetting can be further reduced when the model is encouraged to remember the evidence for previously made decisions. As a first step towards exploring this hypothesis, we propose a simple novel training paradigm, called Remembering for the Right Reasons (RRR), that additionally stores visual model explanations for each example in the buffer and ensures the model has "the right reasons" for its predictions by encouraging its explanations to remain consistent with those used to make decisions at training time. Without this constraint, there is a drift in explanations and an increase in forgetting as conventional continual learning algorithms learn new tasks. We demonstrate how RRR can be easily added to any memory or regularization-based approach and results in reduced forgetting, and more importantly, improved model explanations. We evaluated our approach in the standard and few-shot settings and observed a consistent improvement across various CL approaches using different architectures and techniques to generate model explanations, demonstrating a promising connection between explainability and continual learning. Our code is available at https://github.com/SaynaEbrahimi/Remembering-for-the-Right-Reasons.

Submitted 2 May, 2021; v1 submitted 4 October, 2020; originally announced October 2020.

Comments: Accepted at ICLR 2021

arXiv:2003.09553 [pdf, other]

Adversarial Continual Learning

Authors: Sayna Ebrahimi, Franziska Meier, Roberto Calandra, Trevor Darrell, Marcus Rohrbach

Abstract: Continual learning aims to learn new tasks without forgetting previously learned ones. We hypothesize that representations learned to solve each task in a sequence have a shared structure while containing some task-specific properties. We show that shared features are significantly less prone to forgetting and propose a novel hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features required to solve a sequence of tasks. Our model combines architecture growth to prevent forgetting of task-specific skills and an experience replay approach to preserve shared skills. We demonstrate our hybrid approach is effective in avoiding forgetting and show it is superior to both architecture-based and memory-based approaches on class-incremental learning of a single dataset as well as a sequence of multiple datasets in image classification. Our code is available at https://github.com/facebookresearch/Adversarial-Continual-Learning.

Submitted 21 July, 2020; v1 submitted 20 March, 2020; originally announced March 2020.

Comments: Accepted at ECCV 2020

arXiv:2002.08984 [pdf, other]

E2E Migration Strategies Towards 5G: Long-term Migration Plan and Evolution Roadmap

Authors: Abulfazl Zakeri, Narges Gholipoor, Mohsen Tajallifar, Sina Ebrahimi, Mohammad Reza Javan, Nader Mokari, Ahmad Reza Sharafat

Abstract: With the first phase of standardization of the fifth generation of wireless networks (5G) frozen, 5G has finally gone live, and the commercial rollout (mostly in fixed 5G broadband services) and migration have started. However, some challenges are arising in the deployment, the integration of each technology, and the interoperability in the networks of communication service providers (CSPs). At the same time, the evolution of 5G is not clear, and many questions arise, such as whether 5G has a long-term evolution or when 5G will give way to a next-generation network. This paper provides long-term migration options and paths towards 5G considering many key factors such as the cost, local/national data traffic, marketing, and the standardization trends in the radio access network (RAN), the transport network (TN), the core network (CN), and the E2E network. Moreover, we outline some 5G evolution road maps emphasizing the technologies, standards, and service timelines. The proposed migration paths can answer some CSPs' concerns about how to carry out a long-term migration to 5G and beyond.

Submitted 20 February, 2020; originally announced February 2020.

Comments: Migration, 5G, evolution, roadmap, option, path, 3 figures, 4 tables

arXiv:1912.00192 [pdf, other]

Joint Resource and Admission Management for Slice-enabled Networks

Authors: Sina Ebrahimi, Abulfazl Zakeri, Behzad Akbari, Nader Mokari

Abstract: Network slicing is a crucial part of the 5G networks that communication service providers (CSPs) seek to deploy. By exploiting three main enabling technologies, namely, software-defined networking (SDN), network function virtualization (NFV), and network slicing, communication services can be served to the end-users in an efficient, scalable, and flexible manner. To adopt these technologies, what is highly important is how to allocate the resources and admit the customers of the CSPs based on the predefined criteria and available resources. In this regard, we propose a novel joint resource and admission management algorithm for slice-enabled networks. In the proposed algorithm, our target is to minimize the network cost of the CSP subject to the slice requests received from the tenants corresponding to the virtual machines and virtual links constraints. Our performance evaluation of the proposed method shows its efficiency in managing CSP's resources.

Submitted 7 December, 2019; v1 submitted 30 November, 2019; originally announced December 2019.

Comments: 8 pages, double column, 8 figures, accepted to be presented in IEEE/IFIP Network Operations and Management Symposium (NOMS) 2020

arXiv:1912.00187 [pdf, other]

Energy-Efficient Task Offloading Under E2E Latency Constraints

Authors: Mohsen Tajallifar, Sina Ebrahimi, Mohammad Reza Javan, Nader Mokari, Luca Chiaraviglio

Abstract: In this paper, we propose a novel resource management scheme that jointly allocates the transmit power and computational resources in a centralized radio access network architecture. The network comprises a set of computing nodes to which the requested tasks of different users are offloaded. The optimization problem minimizes the energy consumption of task offloading while taking the end-to-end latency, i.e., the transmission, execution, and propagation latencies of each task, into account. We aim to allocate the transmit power and computational resources such that the maximum acceptable latency of each task is satisfied. Since the optimization problem is non-convex, we divide it into two sub-problems, one for transmit power allocation and another for task placement and computational resource allocation. Transmit power is allocated via the convex-concave procedure. In addition, a heuristic algorithm is proposed to jointly manage computational resources and task placement. We also propose a feasibility analysis that finds a feasible subset of tasks. Furthermore, a disjoint method that separately allocates the transmit power and the computational resources is proposed as the baseline of comparison. A lower bound on the optimal solution of the optimization problem is also derived based on exhaustive search over task placement decisions and utilizing Karush-Kuhn-Tucker conditions. Simulation results show that the joint method outperforms the disjoint method in terms of acceptance ratio. Simulations also show that the optimality gap of the joint method is less than 5%.

Submitted 23 June, 2021; v1 submitted 30 November, 2019; originally announced December 2019.

Comments: 32 pages, 10 figures

arXiv:1909.10225 [pdf, other]

WiCV 2019: The Sixth Women In Computer Vision Workshop

Authors: Irene Amerini, Elena Balashova, Sayna Ebrahimi, Kathryn Leonard, Arsha Nagrani, Amaia Salvador

Abstract: In this paper we present the Women in Computer Vision Workshop - WiCV 2019, organized in conjunction with CVPR 2019. This event is meant for increasing the visibility and inclusion of women researchers in the computer vision field. Computer vision and machine learning have made incredible progress over the past years, but the number of female researchers is still low both in academia and in industry. WiCV is organized especially for the following reasons: to raise visibility of female researchers, to increase collaborations between them, and to provide mentorship to female junior researchers in the field. In this paper, we present a report of trends over the past years, along with a summary of statistics regarding presenters, attendees, and sponsorship for the current workshop.

Submitted 23 September, 2019; originally announced September 2019.

Comments: Report of the Sixth Women In Computer Vision Workshop

Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019, pp. 0-0

arXiv:1906.02425 [pdf, other]

Uncertainty-guided Continual Learning with Bayesian Neural Networks

Authors: Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach

Abstract: Continual learning aims to learn new tasks without forgetting previously learned ones. This is especially challenging when one cannot access data from previous tasks and when the model has a fixed capacity. Current regularization-based continual learning algorithms need an external representation and extra computation to measure the parameters' importance. In contrast, we propose Uncertainty-guided Continual Bayesian Neural Networks (UCB), where the learning rate adapts according to the uncertainty defined in the probability distribution of the weights in networks. Uncertainty is a natural way to identify what to remember and what to change as we continually learn, and thus mitigate catastrophic forgetting. We also show a variant of our model, which uses uncertainty for weight pruning and retains task performance after pruning by saving binary masks per task. We evaluate our UCB approach extensively on diverse object classification datasets with short and long sequences of tasks and report superior or on-par performance compared to existing approaches. Additionally, we show that our model does not necessarily need task information at test time, i.e., it does not presume knowledge of which task a sample belongs to.

Submitted 19 February, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

Comments: Accepted at ICLR 2020

arXiv:1904.00370 [pdf, other]

Variational Adversarial Active Learning

Authors: Samarth Sinha, Sayna Ebrahimi, Trevor Darrell

Abstract: Active learning aims to develop label-efficient algorithms by sampling the most representative queries to be labeled by an oracle. We describe a pool-based semi-supervised active learning algorithm that implicitly learns this sampling mechanism in an adversarial manner. Unlike conventional active learning algorithms, our approach is task agnostic, i.e., it does not depend on the performance of the task for which we are trying to acquire labeled data. Our method learns a latent space using a variational autoencoder (VAE) and an adversarial network trained to discriminate between unlabeled and labeled data. The mini-max game between the VAE and the adversarial network is played such that while the VAE tries to trick the adversarial network into predicting that all data points are from the labeled pool, the adversarial network learns how to discriminate between dissimilarities in the latent space. We extensively evaluate our method on various image classification and semantic segmentation benchmark datasets and establish a new state of the art on CIFAR10/100, Caltech-256, ImageNet, Cityscapes, and BDD100K. Our results demonstrate that our adversarial approach learns an effective low dimensional latent space in large-scale settings and provides for a computationally efficient sampling method. Our code is available at https://github.com/sinhasam/vaal.

Submitted 28 October, 2019; v1 submitted 31 March, 2019; originally announced April 2019.

Comments: First two authors contributed equally, listed alphabetically. Accepted as Oral at ICCV 2019

arXiv:1812.10430 [pdf, other]

Large Multistream Data Analytics for Monitoring and Diagnostics in Manufacturing Systems

Authors: Samaneh Ebrahimi, Chitta Ranjan, Kamran Paynabar

Abstract: The high dimensionality and volume of large-scale multistream data have inhibited significant research progress in developing an integrated monitoring and diagnostics (M&D) approach. This data, also categorized as big data, is becoming common in manufacturing plants. In this paper, we propose an integrated M&D approach for large-scale streaming data. We developed a novel monitoring method named Adaptive Principal Component monitoring (APC) which adaptively chooses PCs that are most likely to vary due to the change for early detection. Importantly, we integrate a novel diagnostic approach, Principal Component Signal Recovery (PCSR), to enable a streamlined SPC. This diagnostics approach draws inspiration from Compressed Sensing and uses Adaptive Lasso for identifying the sparse change in the process. We theoretically motivate our approaches and do a performance evaluation of our integrated M&D method through simulations and case studies.

Submitted 26 December, 2018; originally announced December 2018.

arXiv:1812.01784 [pdf, other]

Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders

Authors: Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata

Abstract: Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space. As labeled images are expensive, one direction is to augment the dataset by generating either images or image features. However, the former misses fine-grained details and the latter requires learning a mapping associated with class embeddings. In this work, we take feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by modality-specific aligned variational autoencoders. This leaves us with the required discriminative information about the image and classes in the latent features, on which we train a softmax classifier. The key to our approach is that we align the distributions learned from images and from side-information to construct latent features that contain the essential multi-modal information associated with unseen classes. We evaluate our learned latent features on several benchmark datasets, i.e. CUB, SUN, AWA1 and AWA2, and establish a new state of the art on generalized zero-shot as well as on few-shot learning. Moreover, our results on ImageNet with various zero-shot splits show that our latent features generalize well in large-scale settings.

Submitted 5 April, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

Comments: Accepted at CVPR 2019

arXiv:1807.07560 [pdf, other]

Compositional GAN: Learning Image-Conditional Binary Composition

Authors: Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell

Abstract: Generative Adversarial Networks (GANs) can produce images of remarkable complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene. Capturing such complex interactions between different objects in the world, including their relative scaling, spatial layout, occlusion, or viewpoint transformation is a challenging problem. In this work, we propose a novel self-consistent Composition-by-Decomposition (CoDe) network to compose a pair of objects. Given object images from two distinct distributions, our model can generate a realistic composite image from their joint distribution following the texture and shape of the input objects. We evaluate our approach through qualitative experiments and user evaluations. Our results indicate that the learned model captures potential interactions between the two object domains, and generates realistic composed scenes at test time.

Submitted 28 March, 2019; v1 submitted 19 July, 2018; originally announced July 2018.

arXiv:1806.07912 [pdf, other]

Resource-Efficient Neural Architect

Authors: Yanqi Zhou, Siavash Ebrahimi, Sercan Ö. Arık, Haonan Yu, Hairong Liu, Greg Diamos

Abstract: Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS mainly targets improving accuracy, but lacks consideration of computational resource use. We propose the Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA uses a policy network to process the network embeddings to generate new configurations. We demonstrate RENA on image recognition and keyword spotting (KWS) problems. RENA can find novel architectures that achieve high performance even with tight resource constraints. For CIFAR10, it achieves 2.95% test error when compute intensity is greater than 100 FLOPs/byte, and 3.87% test error when model size is less than 3M parameters. For Google Speech Commands Dataset, RENA achieves the state-of-the-art accuracy without resource constraints, and it outperforms the optimized architectures with tight resource constraints.

Submitted 12 June, 2018; originally announced June 2018.

arXiv:1805.06747 [pdf, other]

doi 10.1109/TPDS.2018.2796100

ReCA: an Efficient Reconfigurable Cache Architecture for Storage Systems with Online Workload Characterization

Authors: Reza Salkhordeh, Shahriar Ebrahimi, Hossein Asadi

Abstract: In recent years, SSDs have gained tremendous attention in computing and storage systems due to significant performance improvement over HDDs. The cost per capacity of SSDs, however, prevents them from entirely replacing HDDs in such systems. One approach to effectively take advantage of SSDs is to use them as a caching layer to store performance critical data blocks to reduce the number of accesses to disk subsystem. Due to characteristics of Flash-based SSDs such as limited write endurance and long latency on write operations, employing caching algorithms at the Operating System (OS) level necessitates taking such characteristics into consideration. Previous caching techniques are optimized towards only one type of application, which affects both generality and applicability. In addition, they are not adaptive when the workload pattern changes over time. This paper presents an efficient Reconfigurable Cache Architecture (ReCA) for storage systems using a comprehensive workload characterization to find an optimal cache configuration for I/O intensive applications. For this purpose, we first investigate various types of I/O workloads and classify them into five major classes. Based on this characterization, an optimal cache configuration is presented for each class of workloads. Then, using the main features of each class, we continuously monitor the characteristics of an application during system runtime and the cache organization is reconfigured if the application changes from one class to another class of workloads. The cache reconfiguration is done online and workload classes can be extended to emerging I/O workloads in order to maintain its efficiency with the characteristics of I/O requests. Experimental results obtained by implementing ReCA in a server running Linux show that the proposed architecture improves performance and lifetime by up to 24% and 33%, respectively.

Submitted 3 May, 2018; originally announced May 2018.

Journal ref: IEEE TPDS 2018

arXiv:1805.00325 [pdf, other]

Study of Residual Networks for Image Recognition

Authors: Mohammad Sadegh Ebrahimi, Hossein Karkeh Abadi

Abstract: Deep neural networks achieve high performance on image classification tasks while being more difficult to train. Due to the complexity and vanishing gradient problem, it normally takes a lot of time and more computational power to train deeper neural networks. Deep residual networks (ResNets) can make the training process faster and attain more accuracy compared to their equivalent neural networks. ResNets achieve this improvement by adding a simple skip connection parallel to the layers of convolutional neural networks. In this project we first design a ResNet model that can perform the image classification task on the Tiny ImageNet dataset with a high accuracy, then we compare the performance of this ResNet model with its equivalent Convolutional Network (ConvNet). Our findings illustrate that ResNets are more prone to overfitting despite their higher accuracy. Several methods to prevent overfitting, such as adding dropout layers and stochastic augmentation of the training dataset, have been studied in this work.

Submitted 21 April, 2018; originally announced May 2018.

Comments: 6 pages, 9 figures

arXiv:1804.08044 [pdf]

Predicting User Performance and Bitcoin Price Using Block Chain Transaction Network

Authors: Mohammad Sadegh Ebrahimi, Afshin Babveyh

Abstract: This work is organized as follows. In the first section, we review the prior work and describe how we obtained our data. Next, we look at address reuse in the Bitcoin network. We show that a great portion of users reuse their addresses, which could enable us to cluster the addresses and attribute them to single users. Next, we categorize the nodes based on their role in the network as a customer or seller. Finally, we study node and network performance.

Submitted 21 April, 2018; originally announced April 2018.

Comments: 8 pages, 7 figures

arXiv:1802.03319 [pdf, other]

doi 10.1145/3159652.3159701

Predicting Audio Advertisement Quality

Authors: Samaneh Ebrahimi, Hossein Vahabi, Matthew Prockup, Oriol Nieto

Abstract: Online audio advertising is a particular form of advertising used abundantly in online music streaming services. In these platforms, which tend to host tens of thousands of unique audio advertisements (ads), providing high quality ads ensures a better user experience and results in longer user engagement. Therefore, the automatic assessment of these ads is an important step toward audio ads ranking and better audio ads creation. In this paper we propose one way to measure the quality of the audio ads using a proxy metric called Long Click Rate (LCR), which is defined by the amount of time a user engages with the follow-up display ad (that is shown while the audio ad is playing) divided by the impressions. We later focus on predicting the audio ad quality using only acoustic features such as harmony, rhythm, and timbre of the audio, extracted from the raw waveform. We discuss how the characteristics of the sound can be connected to concepts such as the clarity of the audio ad message, its trustworthiness, etc. Finally, we propose a new deep learning model for audio ad quality prediction, which outperforms the other discussed models trained on hand-crafted features. To the best of our knowledge, this is the first large-scale audio ad quality prediction study.

Submitted 9 February, 2018; originally announced February 2018.

Comments: WSDM '18 Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 9 pages

Journal ref: 2018. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM '18)

arXiv:1710.05958 [pdf, other]

Gradient-free Policy Architecture Search and Adaptation

Authors: Sayna Ebrahimi, Anna Rohrbach, Trevor Darrell

Abstract: We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain-shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent's lifetime as it learns to drive in a realistic simulated environment.

Submitted 16 October, 2017; originally announced October 2017.

Comments: Accepted in Conference on Robot Learning, 2017

arXiv:1704.03396 [pdf, ps, other]

Source-Sensitive Belief Change

Authors: Shahab Ebrahimi

Abstract: The AGM model is the most remarkable framework for modeling belief revision. However, it is not perfect in all aspects. Paraconsistent belief revision, multi-agent belief revision and non-prioritized belief revision are three different extensions to AGM to address three important criticisms applied to it. In this article, we propose a framework based on AGM that takes a position in each of these categories. Also, we discuss some features of our framework and study the satisfiability of AGM postulates in this new context.

Submitted 5 May, 2017; v1 submitted 11 April, 2017; originally announced April 2017.

Comments: 13 pages

Journal ref: International Journal of Artificial Intelligence and Applications (IJAIA), Vol.8, No.2, March 2017

arXiv:1608.03533 [pdf, other]

Sequence Graph Transform (SGT): A Feature Embedding Function for Sequence Data Mining

Authors: Chitta Ranjan, Samaneh Ebrahimi, Kamran Paynabar

Abstract: Sequence feature embedding is a challenging task due to the unstructuredness of sequence, i.e., arbitrary strings of arbitrary length. Existing methods are efficient in extracting short-term dependencies but typically suffer from computation issues for the long-term. Sequence Graph Transform (SGT), a feature embedding function that can extract a varying amount of short- to long-term dependencies without increasing the computation, is proposed. SGT's properties are analytically proved for interpretation under normal and uniform distribution assumptions. SGT features yield significantly superior results in sequence clustering and classification with higher accuracy and lower computation as compared to the existing methods, including the state-of-the-art sequence/string Kernels and LSTM.

Submitted 4 October, 2021; v1 submitted 11 August, 2016; originally announced August 2016.
