
Toward Regulatory Compliance: A few-shot Learning Approach to Extract Processing Activities

Authors: Pragyan K C, Rambod Ghandiparsi, Rocky Slavin, Sepideh Ghanavati, Travis Breaux, Mitra Bokaei Hosseini

Abstract: The widespread use of mobile applications has driven the growth of the app industry, with companies relying heavily on user data for services such as targeted advertising and personalized offerings. In this context, privacy regulations such as the General Data Protection Regulation (GDPR) play a crucial role. One GDPR requirement is that companies maintain a Record of Processing Activities (RoPA). A RoPA covers various details, including descriptions of data processing activities, their purposes, the types of data involved, and other relevant external entities. Small app-developing companies struggle to meet such compliance requirements because of limited resources and tight timelines. To help these developers and prevent fines, we propose a method that uses large language models (LLMs) to generate RoPA segments from user-authored usage scenarios. Our method employs few-shot learning with GPT-3.5 Turbo to summarize usage scenarios and generate RoPA segments. We evaluate several factors that can affect the consistency of few-shot performance on this summarization task, including the number of examples in the prompt, repetition, and permutations of the order of examples in the prompt. Our findings show that the number of examples in the prompt significantly influences summarization F1 scores, while F1 scores vary negligibly across repeated runs of the same prompt. Our prompts successfully summarize processing activities with an average ROUGE-L F1 score of 70%.
Finally, we discuss avenues for improving the results through manual evaluation of the generated summaries.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Accepted at the 11th International Workshop on Evolving Security & Privacy Requirements Engineering (ESPRE)
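The summarization results above are reported as ROUGE-L F1, which scores a generated summary against a reference by the longest common subsequence (LCS) of their tokens. The sketch below is a minimal, illustrative computation of that metric, assuming lowercase whitespace tokenization and no stemming; the function names and example sentences are hypothetical, and this is not the authors' evaluation code (standard ROUGE toolkits apply their own tokenizers).

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (O(len(a) * len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally, otherwise carry the best of left/up.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 (beta = 1) under a simple lowercase whitespace tokenization (an assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of the candidate covered by the LCS
    recall = lcs / len(ref)      # share of the reference covered by the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical RoPA-style summaries.
    reference = "the app collects location data for ads"
    candidate = "app collects user location for ads"
    print(round(rouge_l_f1(candidate, reference), 4))  # → 0.7692
```

Here the LCS is "app collects location for ads" (5 tokens), so precision is 5/6, recall is 5/7, and F1 is 10/13.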

arXiv:2407.06464

SideSeeing: A multimodal dataset and collection of tools for sidewalk assessment

Authors: R. J. P. Damaceno, L. Ferreira, F. Miranda, M. Hosseini, R. M. Cesar Jr

Abstract: This paper introduces SideSeeing, a novel initiative that provides tools and datasets for assessing the built environment. We present a framework for street-level data acquisition, loading, and analysis. Using the framework, we collected a novel dataset that integrates synchronized video footage captured from chest-mounted mobile devices with sensor data (accelerometer, gyroscope, magnetometer, and GPS). Each data sample represents a path traversed by a user filming sidewalks near hospitals in Brazil and the USA. The dataset encompasses three hours of content covering 12 kilometers around nine hospitals, and includes 325,000 video frames with corresponding sensor data. Additionally, we present a novel 68-element taxonomy specifically created for sidewalk scene identification. SideSeeing is a step towards a suite of tools that urban experts can use to perform in-depth sidewalk accessibility evaluations. SideSeeing data and tools are publicly available at https://sites.usp.br/sideseeing/.

Submitted 12 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: 11 pages, 7 figures

arXiv:2407.03552

Vision Mamba for Classification of Breast Ultrasound Images

Authors: Ali Nasiri-Sarvi, Mahdi S. Hosseini, Hassan Rivaz

Abstract: Mamba-based models, VMamba and Vim, are a recent family of vision encoders that offer promising performance improvements in many computer vision tasks. This paper compares Mamba-based models with traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) on the breast ultrasound datasets BUSI and B. Our evaluation, which includes multiple experimental runs and statistical significance analysis, demonstrates that Mamba-based architectures frequently outperform CNN and ViT models, with statistically significant differences. These Mamba-based models effectively capture long-range dependencies while maintaining inductive biases, making them suitable for applications with limited data.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2406.19803

Scalable and Domain-General Abstractive Proposition Segmentation

Authors: Mohammad Javad Hosseini, Yang Gao, Tim Baumgärtner, Alex Fabrikant, Reinald Kim Amplayo

Abstract: Segmenting text into fine-grained units of meaning is important to a wide range of NLP applications. The default approach of segmenting text into sentences is often insufficient, especially since sentences are usually complex enough to include multiple units of meaning that merit separate treatment in the downstream task. We focus on the task of abstractive proposition segmentation: transforming text into simple, self-contained, well-formed sentences. Several recent works have demonstrated the utility of proposition segmentation with few-shot prompted LLMs for downstream tasks such as retrieval-augmented grounding and fact verification. However, this approach does not scale to large amounts of text and may not always extract all the facts from the input text. In this paper, we first introduce evaluation metrics for the task to measure several dimensions of quality. We then propose a scalable, yet accurate, proposition segmentation model. We model proposition segmentation as a supervised task by training LLMs on existing annotated datasets and show that training yields significantly improved results. We further show that by using the fine-tuned LLMs as teachers for annotating large amounts of multi-domain synthetic distillation data, we can train smaller student models with results similar to the teacher LLMs. We then demonstrate that our technique leads to effective domain generalization, by annotating data in two domains outside the original training data and evaluating on them.
Finally, as a key contribution of the paper, we release an easy-to-use API for NLP practitioners.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.01551

ELSA: Evaluating Localization of Social Activities in Urban Streets

Authors: Maryam Hosseini, Marco Cipriano, Sedigheh Eslami, Daniel Hodczak, Liu Liu, Andres Sevtsuk, Gerard de Melo

Abstract: Why do some streets attract more social activities than others? Is it due to street design, or do land use patterns in neighborhoods create opportunities for businesses where people gather? These questions have intrigued urban sociologists, designers, and planners for decades. Yet most research in this area has remained limited in scale, lacking a comprehensive perspective on the various factors influencing social interactions in urban settings. Exploring these issues requires fine-grained data on the frequency and variety of social interactions on urban streets. Recent advances in computer vision and the emergence of open-vocabulary detection models offer a unique opportunity to address this long-standing issue at a scale that was previously impossible with traditional observational methods. In this paper, we propose a new benchmark dataset for Evaluating Localization of Social Activities (ELSA) in urban street images. ELSA draws on theoretical frameworks in urban sociology and design. While the majority of action recognition datasets are collected in controlled settings, we use in-the-wild street-level imagery, where the size of social groups and the types of activities can vary significantly. ELSA includes 937 manually annotated images with more than 4,300 multi-labeled bounding boxes for individual and group activities, categorized into three primary groups: Condition, State, and Action.
Each category contains various sub-categories: for example, alone or group under the Condition category, standing or walking under the State category, and talking or dining under the Action category. ELSA is publicly available to the research community.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.18942

Verifiably Robust Conformal Prediction

Authors: Linus Jeary, Tom Kuipers, Mehran Hosseini, Nicola Paoletti

Abstract: Conformal Prediction (CP) is a popular uncertainty quantification method that provides distribution-free, statistically valid prediction sets, assuming that training and test data are exchangeable. In such a case, CP's prediction sets are guaranteed to cover the (unknown) true test output with a user-specified probability. Nevertheless, this guarantee is violated when the data is subjected to adversarial attacks, which often result in a significant loss of coverage. Recently, several approaches have been put forward to recover CP guarantees in this setting. These approaches leverage variations of randomised smoothing to produce conservative sets which account for the effect of the adversarial perturbations. They are, however, limited in that they only support $\ell^2$-bounded perturbations and classification tasks. This paper introduces VRCP (Verifiably Robust Conformal Prediction), a new framework that leverages recent neural network verification methods to recover coverage guarantees under adversarial attacks. Our VRCP method is the first to support perturbations bounded by arbitrary norms including $\ell^1$, $\ell^2$, and $\ell^\infty$, as well as regression tasks. We evaluate and compare our approach on image classification tasks (CIFAR10, CIFAR100, and TinyImageNet) and regression tasks for deep reinforcement learning environments. In every case, VRCP achieves above nominal coverage and yields significantly more efficient and informative prediction regions than the SotA.

Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

MSC Class: 68T37 (Primary) 68T27 (Secondary) ACM Class: G.3; I.2.4; F.4.1
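VRCP builds on standard split conformal prediction, whose guarantee (the prediction set contains the true label with probability at least 1 - alpha) assumes exchangeable calibration and test data. As a point of reference, here is a minimal sketch of vanilla split conformal classification, assuming a classifier that outputs class probabilities and using the common nonconformity score 1 - p(true label); the function names and toy numbers are illustrative assumptions. It contains none of the verification-based robustness that VRCP adds, so an adversarial perturbation at test time can break its coverage, which is the failure mode described above.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label must be kept.
        return math.inf
    return sorted(scores)[k - 1]


def calibrate(cal_probs, cal_labels, alpha):
    # Nonconformity score: 1 minus the probability assigned to the true label.
    scores = [1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels)]
    return conformal_quantile(scores, alpha)


def prediction_set(probs, qhat):
    # Keep every label whose nonconformity score does not exceed the threshold.
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]


if __name__ == "__main__":
    # Hypothetical calibration data: two-class probability vectors, true label 0.
    ps = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
    qhat = calibrate([[p, 1 - p] for p in ps], [0] * len(ps), alpha=0.2)
    print(qhat, prediction_set([0.95, 0.05], qhat))
```

With n = 9 calibration points and alpha = 0.2, the threshold is the 8th smallest score; lowering alpha far enough makes the threshold infinite, so every label is returned.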

arXiv:2405.16397

AdaFisher: Adaptive Second Order Optimization via Fisher Information

Authors: Damien Martins Gomes, Yanlei Zhang, Eugene Belilovsky, Guy Wolf, Mahdi S. Hosseini

Abstract: First-order optimization methods are currently the mainstream approach to training deep neural networks (DNNs). Optimizers like Adam incorporate limited curvature information by applying diagonal matrix preconditioning to the stochastic gradient during training. Despite the widespread use of first-order methods, second-order optimization algorithms exhibit superior convergence properties compared to their first-order counterparts, e.g., Adam and SGD. However, their practicality for training DNNs is still limited by increased per-iteration computation and suboptimal accuracy compared to first-order methods. We present AdaFisher, an adaptive second-order optimizer that leverages a block-diagonal approximation to the Fisher information matrix for adaptive gradient preconditioning. AdaFisher aims to bridge the gap between enhanced convergence capabilities and computational efficiency within the second-order optimization framework for training DNNs. Despite the slow pace of second-order optimizers, we show that AdaFisher can be reliably adopted for image classification and language modelling, and that it stands out for its stability and robustness in hyperparameter tuning. We demonstrate that AdaFisher outperforms state-of-the-art (SOTA) optimizers in terms of both accuracy and convergence speed. Code is available at https://github.com/AtlasAnalyticsLab/AdaFisher.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.10648

Optimal Service Placement, Request Routing and CPU Sizing in Cooperative Mobile Edge Computing Networks for Delay-Sensitive Applications

Authors: Naeimeh Omidvar, Mahdieh Ahmadi, Seyed Mohammad Hosseini

Abstract: We study the joint optimization of service placement, request routing, and CPU sizing in a cooperative mobile edge computing (MEC) system. The problem is considered from the perspective of the service provider (SP), which delivers heterogeneous MEC-enabled delay-sensitive services and must pay the mobile network operators and the cloud provider for the resources it uses, while earning revenue from the requests it serves. We formulate the problem of maximizing the SP's total profit, subject to the computation, storage, and communication constraints of each edge node and the end-to-end delay requirements of the services, as a mixed-integer non-convex optimization problem, and prove it to be NP-hard. To tackle the challenges in solving the problem, we first introduce a design trade-off parameter for the different delay requirements of each service, which maintains flexibility in prioritizing them, and transform the original optimization problem using the new delay constraints. Then, by exploiting a hidden convexity, we reformulate the delay constraints into an equivalent form. Next, to handle the complicating (integer) variables, we use primal decomposition to decompose the problem into an equivalent form consisting of a master problem and an inner sub-problem over the mixed and real variables, respectively. We then employ a cutting-plane approach to build adequate representations of the extremal value of the inner problem as a function of the complicating variables, and of the set of complicating-variable values for which the inner problem is feasible.
Finally, we propose a solution strategy based on generalized Benders decomposition and prove that it converges to the optimal solution within a finite number of iterations. Extensive simulation results demonstrate that the proposed scheme significantly outperforms existing mechanisms in terms of the SP's profit, cache hit ratio, running time, and end-to-end delay.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.10622

Differentially Private Machine Learning-powered Combinatorial Auction Design

Authors: Arash Jamshidi, Seyed Mohammad Hosseini, Seyed Mahdi Noormousavi, Mahdi Jafari Siavoshani

Abstract: We present a new approach to machine learning-powered combinatorial auctions, which is based on the principles of Differential Privacy. Our methodology guarantees that the auction mechanism is truthful, meaning that rational bidders have the incentive to reveal their true valuation functions. We achieve this by inducing truthfulness in the auction dynamics, ensuring that bidders consistently provide accurate information about their valuation functions. Our method not only ensures truthfulness but also preserves the efficiency of the original auction. This means that if the initial auction outputs an allocation with high social welfare, our modified truthful version of the auction will also achieve high social welfare. We use techniques from Differential Privacy, such as the Exponential Mechanism, to achieve these results. Additionally, we examine the application of differential privacy in auctions across both asymptotic and non-asymptotic regimes.

Submitted 17 May, 2024; originally announced May 2024.
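The abstract names the Exponential Mechanism as one of the Differential Privacy tools used. A generic sketch of that mechanism follows: it samples an outcome r with probability proportional to exp(epsilon * u(r) / (2 * Delta_u)), where u is a utility function and Delta_u is its sensitivity. How the paper defines the utility and sensitivity inside the auction is not stated here, so the allocation names and welfare values below are hypothetical, and the sketch shows only the textbook mechanism, not the authors' auction.

```python
import math
import random


def exponential_mechanism_probs(utilities, epsilon, sensitivity):
    """Selection probabilities proportional to exp(epsilon * u / (2 * sensitivity))."""
    scaled = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng=random):
    """Sample one candidate; higher-utility candidates are exponentially more likely."""
    utilities = [utility(c) for c in candidates]
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical allocations scored by a hypothetical social-welfare utility.
    allocations = ["A", "B", "C"]
    welfare = {"A": 10.0, "B": 12.0, "C": 3.0}
    print(exponential_mechanism_probs([welfare[a] for a in allocations], 1.0, 1.0))
    print(exponential_mechanism(allocations, welfare.get, 1.0, 1.0, rng=random.Random(42)))
```

Smaller epsilon flattens the distribution toward uniform (stronger privacy, less welfare-optimal choices); epsilon = 0 gives an exactly uniform choice.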

arXiv:2404.15976

doi 10.1111/cgf.15112

The State of the Art in Visual Analytics for 3D Urban Data

Authors: Fabio Miranda, Thomas Ortner, Gustavo Moreira, Maryam Hosseini, Milena Vuckovic, Filip Biljecki, Claudio Silva, Marcos Lage, Nivan Ferreira

Abstract: Urbanization has amplified the importance of three-dimensional structures in urban environments for a wide range of phenomena that are of significant interest to diverse stakeholders. With the growing availability of 3D urban data, numerous studies have focused on developing visual analysis techniques tailored to the unique characteristics of urban environments. However, incorporating the third dimension into visual analytics introduces additional challenges in designing effective visual tools to tackle urban data's diverse complexities. In this paper, we present a survey on visual analytics of 3D urban data. Our work characterizes published works along three main dimensions (why, what, and how), considering use cases, analysis tasks, data, visualizations, and interactions. We provide a fine-grained categorization of published works from visualization journals and conferences, as well as from a myriad of urban domains, including urban planning, architecture, and engineering. By incorporating perspectives from both urban and visualization experts, we identify literature gaps, motivate visualization researchers to understand challenges and opportunities, and indicate future research directions.

Submitted 24 April, 2024; originally announced April 2024.

Comments: Accepted at EuroVis 2024 (STAR track). Surveyed works available at https://urbantk.org/survey-3d

arXiv:2404.13222

Vim4Path: Self-Supervised Vision Mamba for Histopathology Images

Authors: Ali Nasiri-Sarvi, Vincent Quoc-Huy Trinh, Hassan Rivaz, Mahdi S. Hosseini

Abstract: Representation learning from gigapixel Whole Slide Images (WSI) poses a significant challenge in computational pathology due to the complicated nature of tissue structures and the scarcity of labeled data. Multi-instance learning (MIL) methods have addressed this challenge by using image patches to classify slides, relying on models pretrained with Self-Supervised Learning (SSL) approaches. The performance of both SSL and MIL methods depends on the architecture of the feature encoder. This paper proposes leveraging the Vision Mamba (Vim) architecture, inspired by state space models, within the DINO framework for representation learning in computational pathology. We evaluate the performance of Vim against Vision Transformers (ViT) on the Camelyon16 dataset for both patch-level and slide-level classification. Our findings highlight Vim's enhanced performance compared to ViT, particularly at smaller scales, where Vim achieves an 8.21 increase in ROC AUC for models of similar size. An explainability analysis further highlights Vim's capabilities, revealing that Vim, unlike ViT, uniquely emulates the pathologist's workflow. This alignment with human expert analysis highlights Vim's potential in practical diagnostic settings and contributes significantly to developing effective representation-learning algorithms in computational pathology. We release the code and pretrained weights at https://github.com/AtlasAnalyticsLab/Vim4Path.

Submitted 25 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

Comments: Accepted at CVPR 2024 (9th Workshop on Computer Vision for Microscopy Image Analysis)
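The Vim4Path gains above are reported in ROC AUC. For readers unfamiliar with the metric, the sketch below computes it from its pairwise definition: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half (the normalized Mann-Whitney U statistic). The abstract does not say how its AUC values were computed, so this is the textbook definition with hypothetical scores, written as an O(n * m) loop for clarity rather than speed.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as P(score_pos > score_neg), counting ties as 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical slide-level tumor probabilities.
    tumor = [0.9, 0.8]
    normal = [0.1, 0.85]
    print(roc_auc(tumor, normal))
```

Three of the four positive/negative pairs are ordered correctly here, giving an AUC of 0.75; a perfect ranking gives 1.0 and a random one about 0.5.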

arXiv:2404.00178

Beyond Suspension: A Two-phase Methodology for Concluding Sports Leagues

Authors: Ali Hassanzadeh, Mojtaba Hosseini, John G. Turner

Abstract: Problem definition: Professional sports leagues may be suspended for various reasons, such as the recent COVID-19 pandemic. A critical question the league must address when reopening is how to select a subset of the remaining games to conclude the season in a shortened time frame. Academic/practical relevance: Despite the rich literature on scheduling an entire season from a blank slate, concluding an existing season is quite different. Our approach aims to achieve team rankings similar to those that would have resulted had the season been played out in full. Methodology: We propose a data-driven model that exploits predictive and prescriptive analytics to produce a schedule for the remainder of the season, composed of a subset of the originally scheduled games. Our model introduces novel rankings-based objectives within a stochastic optimization model whose parameters are first estimated using a predictive model. We introduce a deterministic equivalent reformulation along with a tailored Frank-Wolfe algorithm to solve our problem efficiently, as well as a robust counterpart based on min-max regret. Results: We present simulation-based numerical experiments on previous National Basketball Association (NBA) seasons from 2004 to 2019, and show that our models are computationally efficient, outperform a greedy benchmark that approximates a non-rankings-based scheduling policy, and produce interpretable results.
Managerial implications: Our data-driven decision-making framework may be used to produce a shortened season with 25-50% fewer games while still producing an end-of-season ranking similar to that of the full season, had it been played.

Submitted 29 March, 2024; originally announced April 2024.

Comments: 32 pages, 9 figures

MSC Class: 90B50 (Primary) 90C06; 90C11; 90C90 (Secondary)
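The methodology above relies on a Frank-Wolfe (conditional gradient) algorithm, which solves a constrained problem by repeatedly stepping toward the feasible vertex that minimizes a linearization of the objective, so it never needs a projection. The sketch below is a generic textbook version over the probability simplex with the classic step size 2/(t + 2); the quadratic objective and target vector are hypothetical, and this is not the paper's tailored variant or its rankings-based objective.

```python
def frank_wolfe_simplex(grad, x0, iterations=1000):
    """Minimize a smooth convex function over the probability simplex.

    grad: function returning the gradient (list of floats) at a point.
    x0: feasible starting point (non-negative entries summing to 1).
    """
    x = list(x0)
    for t in range(iterations):
        g = grad(x)
        # Linear minimization oracle: over the simplex, the best vertex e_i
        # is the coordinate with the smallest gradient entry.
        i = min(range(len(g)), key=lambda j: g[j])
        gamma = 2.0 / (t + 2.0)
        # Convex combination of x and e_i keeps the iterate feasible.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x


if __name__ == "__main__":
    # Hypothetical objective: squared distance to a target point inside the simplex.
    target = [0.2, 0.5, 0.3]
    objective_grad = lambda x: [2.0 * (xi - ci) for xi, ci in zip(x, target)]
    x = frank_wolfe_simplex(objective_grad, [1 / 3, 1 / 3, 1 / 3], iterations=2000)
    print([round(v, 3) for v in x])
```

For an L-smooth objective the classic guarantee is an O(1/t) optimality gap, which is why the iterate approaches the target slowly but without ever leaving the feasible set.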

arXiv:2403.08077

A Multimodal Intermediate Fusion Network with Manifold Learning for Stress Detection

Authors: Morteza Bodaghi, Majid Hosseini, Raju Gottumukkala

Abstract: Multimodal deep learning methods capture synergistic features from multiple modalities and have the potential to improve stress-detection accuracy compared to unimodal methods. However, this accuracy gain typically comes at a high computational cost due to high-dimensional feature spaces, especially for intermediate fusion. Dimensionality reduction is one way to optimize multimodal learning by simplifying the data and making the features more amenable to processing and analysis, thereby reducing computational complexity. This paper introduces an intermediate multimodal fusion network with manifold learning-based dimensionality reduction. The multimodal network generates independent representations from biometric signals and facial landmarks through a 1D-CNN and a 2D-CNN. These features are then fused and fed to another 1D-CNN layer, followed by a fully connected dense layer. We compared various dimensionality reduction techniques for different variations of unimodal and multimodal networks. We observe that intermediate-level fusion with the Multi-Dimensional Scaling (MDS) manifold method showed promising results, with an accuracy of 96.00% under a Leave-One-Subject-Out Cross-Validation (LOSO-CV) paradigm, outperforming other dimensionality reduction methods. MDS had the highest computational cost among the manifold learning methods. However, while outperforming other networks, it reduced the computational cost of the proposed networks by 25% compared to six well-known conventional feature selection methods used in the preprocessing step.

Submitted 12 March, 2024; originally announced March 2024.

Comments: This work was accepted to The 3rd International Conference on Computing and Machine Intelligence (ICMI 2024)
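Leave-One-Subject-Out cross-validation, the protocol behind the 96.00% figure above, holds out all samples from one subject in each fold so that no subject contributes to both training and test data. The sketch below generates such folds from a list of per-sample subject IDs (the IDs are hypothetical); scikit-learn's `LeaveOneGroupOut` implements the same splitting strategy. The abstract does not say how per-fold results were aggregated, so model training and scoring are left out.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    subject_ids: one subject identifier per sample, in sample order.
    """
    for held_out in sorted(set(subject_ids)):
        # Every sample from the held-out subject goes to the test fold.
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Hypothetical recording sessions: five samples from three subjects.
    subjects = ["s1", "s1", "s2", "s3", "s2"]
    for held_out, train_idx, test_idx in leave_one_subject_out(subjects):
        print(held_out, train_idx, test_idx)
```

Splitting by subject rather than by sample prevents a model from scoring well simply by recognizing individuals it has already seen during training.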

arXiv:2403.01643

You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism

Authors: Mehran Hosseini, Peyman Hosseini

Abstract: Scaled Dot Product Attention (SDPA) is the backbone of many modern deep-learning models. It is so versatile that it has been used in natural language, vision, and multi-modal domains with very little change compared to its original formulation. This paper discusses why the current formulation is inefficient by delving into the mathematical details of the attention mechanism. We propose three improvements to mitigate these inefficiencies, thereby introducing three enhanced attention mechanisms: Optimised, Efficient, and Super Attention. Optimised and Efficient Attention have one and two matrix multiplications fewer per head, respectively, and 25% and 50% fewer parameters, respectively, than standard SDPA, but perform similarly to standard SDPA in both vision and natural language tasks. They can be used in all applications where SDPA is used while offering smaller model sizes and faster training and inference without noticeable loss in performance. Super Attention introduces a new linear transformation on the values, transforming them from the left. It outperforms standard SDPA on vision and natural language tasks by up to 17% while having one fewer matrix multiplication per head and 25% fewer parameters than standard SDPA. Consequently, it is also faster than standard SDPA. Super Attention is ideal in applications where the attention layer's context length is fixed, such as Vision Transformers.
In addition to providing mathematical reasoning, we evaluate the presented attention mechanisms on several datasets, including MNIST, CIFAR100, ImageNet, IMDB Movie Reviews, and Amazon Reviews, as well as the combined Europarl and Anki English-Spanish datasets for neural machine translation.

Submitted 30 May, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

MSC Class: 68T07 (Primary) 68T45; 68T50; 68T10; 15A03; 15A04 (Secondary) ACM Class: I.2.6; I.2.7; I.2.10; I.4.0; I.5.0; I.7.0
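For reference, the standard single-head SDPA baseline that this paper revisits computes softmax(Q K^T / sqrt(d_k)) V, where each query row attends over all key rows and mixes the corresponding value rows. The dependency-free sketch below implements only that baseline (no masking, no learned projections, no multiple heads); the Optimised, Efficient, and Super Attention variants change the projections and matrix multiplications around this core, and their exact formulations are not reproduced here. The matrices are tiny hypothetical inputs given as lists of rows.

```python
import math


def softmax(row):
    m = max(row)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Single-head SDPA: softmax(Q K^T / sqrt(d_k)) V, with matrices as lists of rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row: attention-weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


if __name__ == "__main__":
    Q = [[1.0, 0.0]]
    K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0]]
    print(scaled_dot_product_attention(Q, K, V))
```

When all keys score equally against a query, the weights are uniform and the output is simply the mean of the value rows.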

arXiv:2402.17169

doi 10.1109/TBDATA.2024.3382964

Deep Umbra: A Generative Approach for Sunlight Access Computation in Urban Spaces

Authors: Kazi Shahrukh Omar, Gustavo Moreira, Daniel Hodczak, Maryam Hosseini, Nicola Colaninno, Marcos Lage, Fabio Miranda

Abstract: Sunlight and shadow play critical roles in how urban spaces are utilized, thrive, and grow. While access to sunlight is essential to the success of urban environments, shadows can provide shaded places to stay during the hot seasons, mitigate the heat island effect, and increase pedestrian comfort levels. Properly quantifying sunlight access and shadows in large urban environments is key to tackling some of the important challenges facing cities today. In this paper, we propose Deep Umbra, a novel computational framework that enables the quantification of sunlight access and shadows at a global scale. Our framework is based on a conditional generative adversarial network that considers the physical form of cities to compute high-resolution spatial information on accumulated sunlight access for the different seasons of the year. We use data from seven different cities to train our model, and show, through an extensive set of experiments, its low overall RMSE (below 0.1) as well as its extensibility to cities that were not part of the training set. Additionally, we contribute a set of case studies and a comprehensive dataset with sunlight access information for more than 100 cities across six continents. Deep Umbra is available at https://urbantk.org/shadows.

Submitted 26 February, 2024; originally announced February 2024.

Comments: Accepted at IEEE Transactions on Big Data. Deep Umbra is available at https://urbantk.org/shadows

arXiv:2402.12368

A synthetic data approach for domain generalization of NLI models

Authors: Mohammad Javad Hosseini, Andrey Petrov, Alex Fabrikant, Annie Louis

Abstract: Natural Language Inference (NLI) remains an important benchmark task for LLMs. NLI datasets are a springboard for transfer learning to other semantic tasks, and NLI models are standard tools for identifying the faithfulness of model-generated text. There are several large-scale NLI datasets today, and models have improved greatly by hill-climbing on these collections. Yet their realistic performance on out-of-distribution/domain data is less well understood. We explore the opportunity for synthetic high-quality datasets to adapt NLI models for zero-shot use in downstream applications across new and unseen text domains. We demonstrate a new approach for generating NLI data in diverse domains and lengths, so far not covered by existing training sets. The resulting examples have meaningful premises, the hypotheses are formed in creative ways rather than simple edits to a few premise tokens, and the labels have high accuracy. We show that models trained on this data (685K synthetic examples) have the best generalization to completely new downstream test settings. On the TRUE benchmark, a T5-small model trained with our data improves around 7% on average compared to training on the best alternative dataset. The improvements are more pronounced for smaller models, while still meaningful on a T5 XXL model. We also demonstrate gains on test sets when in-domain training data is augmented with our domain-general synthetic data.

Submitted 28 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2401.11627

Tight Verification of Probabilistic Robustness in Bayesian Neural Networks

Authors: Ben Batten, Mehran Hosseini, Alessio Lomuscio

Abstract: We introduce two algorithms for computing tight guarantees on the probabilistic robustness of Bayesian Neural Networks (BNNs). Computing robustness guarantees for BNNs is a significantly more challenging task than verifying the robustness of standard Neural Networks (NNs) because it requires searching the parameter space for safe weights. Moreover, tight and complete approaches for the verification of standard NNs, such as those based on Mixed-Integer Linear Programming (MILP), cannot be directly used for the verification of BNNs because of the polynomial terms resulting from the consecutive multiplication of variables encoding the weights. Our algorithms efficiently and effectively search the parameter space for safe weights by using iterative expansion and the network's gradient and can be used with any verification algorithm of choice for BNNs. In addition to proving that our algorithms compute tighter bounds than the SoA, we also evaluate our algorithms against the SoA on standard benchmarks, such as MNIST and CIFAR10, showing that our algorithms compute bounds up to 40% tighter than the SoA.
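
The quantity being bounded is the posterior probability mass of "safe" weights, that is, weights for which the prediction cannot be flipped inside an input ball. The sketch below only *estimates* that probability by Monte Carlo sampling; it is not the paper's tight-bound algorithm and gives no formal guarantee. It assumes a single-layer Bayesian linear classifier with a diagonal Gaussian posterior, a deliberately simple case where the per-sample worst-case check over an L-infinity ball is exact.

```python
import numpy as np

def mc_probabilistic_robustness(mu_w, sigma_w, mu_b, sigma_b, x0, eps,
                                n_samples=10_000, seed=0):
    """Monte Carlo estimate of P_w[prediction is constant on ||x - x0||_inf <= eps]
    for f_w(x) = sign(w . x + b) with a diagonal Gaussian posterior over (w, b)."""
    rng = np.random.default_rng(seed)
    mu_w, sigma_w, x0 = (np.asarray(a, dtype=float) for a in (mu_w, sigma_w, x0))
    w = rng.normal(mu_w, sigma_w, size=(n_samples, mu_w.size))
    b = rng.normal(mu_b, sigma_b, size=n_samples)
    label = np.sign(mu_w @ x0 + mu_b)  # reference decision of the posterior mean
    # For a linear model the worst case over the L-inf ball is exact:
    # min_delta label * (w . (x0 + delta) + b) = label * (w . x0 + b) - eps * ||w||_1
    worst_margin = label * (w @ x0 + b) - eps * np.abs(w).sum(axis=1)
    return float(np.mean(worst_margin > 0))
```

Sampling gives a statistical estimate whose error shrinks with `n_samples`. A verification method like the one in the abstract instead searches weight space for provably safe regions and integrates the posterior over them, yielding a certified lower bound.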

Submitted 28 February, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

Comments: Accepted at AISTATS 2024

MSC Class: 68T27 (Primary) 68T45; 68T07; 68T01 (Secondary) ACM Class: I.2.0; I.2.4; F.3.1; D.2.4

arXiv:2401.01951

Can We Generate Realistic Hands Only Using Convolution?

Authors: Mehran Hosseini, Peyman Hosseini

Abstract: The enduring inability of image generative models to recreate intricate geometric features, such as those present in human hands and fingers, has been an ongoing problem in image generation for nearly a decade. While strides have been made by increasing model sizes and diversifying training datasets, this issue remains prevalent across all models, from denoising diffusion models to Generative Adversarial Networks (GAN), pointing to a fundamental shortcoming in the underlying architectures. In this paper, we demonstrate how this problem can be mitigated by augmenting the geometric capabilities of convolution layers, providing them with a single input channel incorporating the relative n-dimensional Cartesian coordinate system. We show that this drastically improves the quality of hand and face images generated by GANs and Variational AutoEncoders (VAE).
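
The core idea is to give a translation-invariant convolution access to absolute position by appending a coordinate channel to its input. The abstract does not specify how the n-dimensional coordinates are packed into a single channel. The encoding below, a row-major pixel index rescaled to [-1, 1], is a hypothetical choice made only for illustration. The better-known CoordConv approach instead appends one channel per axis.

```python
import numpy as np

def add_coordinate_channel(images):
    """Append one coordinate channel to a batch of images shaped (N, C, H, W).

    Hypothetical single-channel encoding: each pixel gets its row-major index
    rescaled to [-1, 1], so every position has a distinct value.
    """
    n, _, h, w = images.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    linear_index = rows * w + cols
    coord = 2.0 * linear_index / max(h * w - 1, 1) - 1.0
    coord = np.broadcast_to(coord, (n, 1, h, w)).astype(images.dtype)
    return np.concatenate([images, coord], axis=1)
```

The augmented tensor is fed to the next convolution in place of the original input, so that layer's kernels need one extra input channel.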

Submitted 3 January, 2024; originally announced January 2024.

Comments: Contains 17 pages, 14 figures, and 6 tables

MSC Class: 51 ACM Class: I.2.10; I.4.0; I.4.10

arXiv:2311.03606

Multimodal Stress Detection Using Facial Landmarks and Biometric Signals

Authors: Majid Hosseini, Morteza Bodaghi, Ravi Teja Bhupatiraju, Anthony Maida, Raju Gottumukkala

Abstract: The development of various sensing technologies is improving measurements of stress and the well-being of individuals. Although progress has been made with single signal modalities like wearables and facial emotion recognition, integrating multiple modalities provides a more comprehensive understanding of stress, given that stress manifests differently across different people. Multi-modal learning aims to capitalize on the strength of each modality rather than relying on a single signal. Given the complexity of processing and integrating high-dimensional data from limited subjects, more research is needed. Numerous research efforts have focused on fusing stress and emotion signals at an early stage, e.g., feature-level fusion using basic machine learning methods and 1D-CNN methods. This paper proposes a multi-modal learning approach for stress detection that integrates facial landmarks and biometric signals. We test this multi-modal integration with various early-fusion and late-fusion techniques to integrate a 1D-CNN model on biometric signals and a 2D-CNN model on facial landmarks. We evaluate these architectures with a rigorous test of generalizability using the leave-one-subject-out mechanism, i.e., all samples related to a single subject are held out of training and used to test the model. Our findings show that late fusion achieved 94.39% accuracy, and early fusion surpassed it with a 98.38% accuracy rate. This research contributes valuable insights into enhancing stress detection through a multi-modal approach.
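
The evaluation protocol and the two fusion families can be sketched in a few lines. The helpers below are hypothetical and stand in for the paper's CNNs. Early fusion concatenates per-sample features from both modalities before a single classifier. Late fusion combines the class probabilities of per-modality classifiers, here with an assumed weighted average. The leave-one-subject-out generator holds out every sample of one subject per fold.

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (subject, train_indices, test_indices), one fold per subject."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_idx = np.flatnonzero(subject_ids == subject)   # this subject only
        train_idx = np.flatnonzero(subject_ids != subject)  # everyone else
        yield subject, train_idx, test_idx

def early_fusion(bio_features, face_features):
    """Feature-level fusion: one joint feature vector per sample."""
    return np.concatenate([bio_features, face_features], axis=1)

def late_fusion(bio_probs, face_probs, weight=0.5):
    """Decision-level fusion: weighted average of per-modality probabilities."""
    return weight * bio_probs + (1.0 - weight) * face_probs
```

Because no sample of the test subject is ever seen during training, accuracy under this split reflects generalization to new people rather than to new recordings of known people.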

Submitted 6 November, 2023; originally announced November 2023.

Comments: 16 pages, 8 figures

arXiv:2309.05150

Faster, Lighter, More Accurate: A Deep Learning Ensemble for Content Moderation

Authors: Mohammad Hosseini, Mahmudul Hasan

Abstract: To address the increasing need for efficient and accurate content moderation, we propose an efficient and lightweight deep classification ensemble structure. Our approach is based on a combination of simple visual features, designed for high-accuracy classification of violent content with low false positives. Our ensemble architecture utilizes a set of lightweight models with narrowed-down color features, and we apply it to both images and videos. We evaluated our approach using a large dataset of explosion and blast contents and compared its performance to popular deep learning models such as ResNet-50. Our evaluation results demonstrate significant improvements in prediction accuracy, while benefiting from 7.64x faster inference and lower computation cost. While our approach is tailored to explosion detection, it can be applied to other similar content moderation and violence detection use cases as well. Based on our experiments, we propose a "think small, think many" philosophy in classification scenarios. We argue that transforming a single, large, monolithic deep model into a verification-based step model ensemble of multiple small, simple, and lightweight models with narrowed-down visual features can possibly lead to predictions with higher accuracy.
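
One plausible reading of a "verification-based step model ensemble" is a cascade. Each small model must confirm a positive before the next one runs, and content is flagged only if every step agrees. The sketch below assumes that reading. The model callables and thresholds are hypothetical placeholders, not the authors' architecture.

```python
from typing import Any, Callable, Sequence

def cascade_predict(x: Any,
                    models: Sequence[Callable[[Any], float]],
                    thresholds: Sequence[float]) -> bool:
    """Flag x only if every small model, run in order, scores it at or above its threshold."""
    if len(models) != len(thresholds):
        raise ValueError("need one threshold per model")
    for model, threshold in zip(models, thresholds):
        if model(x) < threshold:
            return False  # early exit: later verifying models never run
    return True
```

Requiring unanimous confirmation trades some recall for fewer false positives. The early exit means most benign inputs cost only the first, cheapest model, which is one way such an ensemble could end up faster than a single large network.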

Submitted 10 September, 2023; originally announced September 2023.

Comments: 6 pages, 22nd IEEE International Conference on Machine Learning and Applications (IEEE ICMLA'23), December 15-17, 2023, Jacksonville Riverfront, Florida, USA. arXiv admin note: substantial text overlap with arXiv:2103.10350

arXiv:2308.07769

doi 10.1109/TVCG.2023.3326598

The Urban Toolkit: A Grammar-based Framework for Urban Visual Analytics

Authors: Gustavo Moreira, Maryam Hosseini, Md Nafiul Alam Nipu, Marcos Lage, Nivan Ferreira, Fabio Miranda

Abstract: While cities around the world are looking for smart ways to use new advances in data collection, management, and analysis to address their problems, the complex nature of urban issues and the overwhelming amount of available data have posed significant challenges in translating these efforts into actionable insights. In the past few years, urban visual analytics tools have significantly helped tackle these challenges. When analyzing a feature of interest, an urban expert must transform, integrate, and visualize different thematic (e.g., sunlight access, demographic) and physical (e.g., buildings, street networks) data layers, oftentimes across multiple spatial and temporal scales. However, integrating and analyzing these layers requires expertise in different fields, increasing development time and effort. This makes the entire visual data exploration and system implementation difficult for programmers and also sets a high entry barrier for urban experts outside of computer science. With this in mind, in this paper, we present the Urban Toolkit (UTK), a flexible and extensible visualization framework that enables the easy authoring of web-based visualizations through a new high-level grammar specifically built with common urban use cases in mind. In order to facilitate the integration and visualization of different urban data, we also propose the concept of knots to merge thematic and physical urban layers. We evaluate our approach through use cases and a series of interviews with experts and practitioners from different domains, including urban accessibility, urban planning, architecture, and climate science. UTK is available at urbantk.org.

Submitted 15 August, 2023; originally announced August 2023.

Comments: Accepted at IEEE VIS 2023. UTK is available at http://urbantk.org

Journal ref: Published in: IEEE Transactions on Visualization and Computer Graphics ( Volume: 30, Issue: 1, January 2024)

arXiv:2308.03936

ALFA -- Leveraging All Levels of Feature Abstraction for Enhancing the Generalization of Histopathology Image Classification Across Unseen Hospitals

Authors: Milad Sikaroudi, Maryam Hosseini, Shahryar Rahnamayan, H. R. Tizhoosh

Abstract: We propose an exhaustive methodology that leverages all levels of feature abstraction, targeting an enhancement in the generalizability of image classification to unobserved hospitals. Our approach incorporates augmentation-based self-supervision with common distribution shifts in histopathology scenarios serving as the pretext task. This enables us to derive invariant features from training images without relying on training labels, thereby covering different abstraction levels. Moving on to the subsequent abstraction level, we employ a domain alignment module to facilitate further extraction of invariant features across varying training hospitals. To represent the highly specific features of participating hospitals, an encoder is trained to classify hospital labels, independent of their diagnostic labels. The features from each of these encoders are subsequently disentangled to minimize redundancy and segregate the features. This representation, which spans a broad spectrum of semantic information, enables the development of a model demonstrating increased robustness to unseen images from disparate distributions. Experimental results from the PACS dataset (a domain generalization benchmark), a synthetic dataset created by applying histopathology-specific jitters to the MHIST dataset (defining different domains with varied distribution shifts), and a Renal Cell Carcinoma dataset derived from four image repositories from TCGA collectively indicate that our proposed model is adept at managing varying levels of image granularity. Thus, it shows improved generalizability when faced with new, out-of-distribution hospital images.

Submitted 9 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: Accepted for publication at ICCV 2023, Computer Vision for Automated Medical Diagnosis Workshop

arXiv:2307.03967

End-to-End Supervised Multilabel Contrastive Learning

Authors: Ahmad Sajedi, Samir Khaki, Konstantinos N. Plataniotis, Mahdi S. Hosseini

Abstract: Multilabel representation learning is recognized as a challenging problem that can be associated with either label dependencies between object categories or data-related issues such as the inherent imbalance of positive/negative samples. Recent advances address these challenges from model- and data-centric viewpoints. In model-centric approaches, the label correlation is obtained by external model designs (e.g., graph CNN) to incorporate an inductive bias for training. However, they fail to design an end-to-end training framework, leading to high computational complexity. On the contrary, in data-centric approaches, the realistic nature of the dataset is considered for improving the classification while ignoring the label dependencies. In this paper, we propose a new end-to-end training framework -- dubbed KMCL (Kernel-based Multilabel Contrastive Learning) -- to address the shortcomings of both model- and data-centric designs. The KMCL first transforms the embedded features into a mixture of exponential kernels in Gaussian RKHS. It is then followed by encoding an objective loss that is composed of (a) reconstruction loss to reconstruct kernel representation, (b) asymmetric classification loss to address the inherent imbalance problem, and (c) contrastive loss to capture label correlation. The KMCL models the uncertainty of the feature encoder while maintaining a low computational footprint. Extensive experiments are conducted on image classification tasks to showcase the consistent improvements of KMCL over the SOTA methods. A PyTorch implementation is provided at https://github.com/mahdihosseini/KMCL.
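
Term (b) targets the positive/negative imbalance typical of multilabel data. As an illustration, the sketch below implements the widely used Asymmetric Loss (ASL) formulation. It is an assumption that KMCL's asymmetric term takes this form, and the default hyperparameters are illustrative. Positives and negatives get different focusing exponents, and easy negatives with probability below a margin are discarded entirely.

```python
import numpy as np

def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """ASL-style multilabel loss summed over labels.

    logits, targets: arrays of the same shape; targets are 0/1 per label.
    """
    p = 1.0 / (1.0 + np.exp(-logits))
    p_m = np.clip(p - margin, 0.0, None)  # probability shifting for negatives
    loss_pos = targets * (1.0 - p) ** gamma_pos * np.log(np.clip(p, eps, None))
    loss_neg = (1.0 - targets) * p_m ** gamma_neg * np.log(np.clip(1.0 - p_m, eps, None))
    return float(-(loss_pos + loss_neg).sum())
```

With `gamma_pos = gamma_neg = margin = 0` this reduces to ordinary binary cross-entropy. Raising `gamma_neg` down-weights the many easy negatives so that the rare positives dominate the gradient.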

Submitted 8 July, 2023; originally announced July 2023.

arXiv:2305.19585

LAIT: Efficient Multi-Segment Encoding in Transformers with Layer-Adjustable Interaction

Authors: Jeremiah Milbauer, Annie Louis, Mohammad Javad Hosseini, Alex Fabrikant, Donald Metzler, Tal Schuster

Abstract: Transformer encoders contextualize token representations by attending to all other tokens at each layer, leading to a quadratic increase in compute effort with the input length. In practice, however, the input text of many NLP tasks can be seen as a sequence of related segments (e.g., the sequence of sentences within a passage, or the hypothesis and premise in NLI). While attending across these segments is highly beneficial for many tasks, we hypothesize that this interaction can be delayed until later encoding stages. To this end, we introduce Layer-Adjustable Interactions in Transformers (LAIT). Within LAIT, segmented inputs are first encoded independently, and then jointly. This partial two-tower architecture bridges the gap between a Dual Encoder's ability to pre-compute representations for segments and a fully self-attentive Transformer's capacity to model cross-segment attention. The LAIT framework effectively leverages existing pretrained Transformers and converts them into the hybrid of the two aforementioned architectures, allowing for easy and intuitive control over the performance-efficiency tradeoff. Experimenting on a wide range of NLP tasks, we find LAIT able to reduce 30-50% of the attention FLOPs on many tasks, while preserving high accuracy; in some practical settings, LAIT could reduce actual latency by orders of magnitude.
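
The "first independently, then jointly" scheme amounts to a layer-dependent attention mask. Below a chosen layer P, each token attends only within its own segment (block-diagonal mask). From layer P on, attention is full. The sketch builds that mask and counts attended token pairs as a rough proxy for attention FLOPs. The parameter name `num_independent_layers` is a hypothetical stand-in for P, and the pair count is a simplification of real FLOP accounting.

```python
import numpy as np

def lait_attention_mask(segment_lengths, layer, num_independent_layers):
    """Boolean mask where mask[i, j] means token i may attend to token j at this layer."""
    total = sum(segment_lengths)
    if layer >= num_independent_layers:
        return np.ones((total, total), dtype=bool)  # joint encoding: full attention
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for length in segment_lengths:
        mask[start:start + length, start:start + length] = True  # within-segment only
        start += length
    return mask

def attention_cost_ratio(segment_lengths, num_layers, num_independent_layers):
    """Attended pairs under LAIT divided by attended pairs under full attention."""
    total = sum(segment_lengths)
    full = num_layers * total ** 2
    lait = sum(int(lait_attention_mask(segment_lengths, layer, num_independent_layers).sum())
               for layer in range(num_layers))
    return lait / full
```

For two equal segments, each independent layer costs half of a full layer. Making half of the layers independent therefore saves a quarter of the attention pairs. Independent segment representations can also be cached and reused across inputs, which is where larger latency gains can come from.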

Submitted 31 May, 2023; originally announced May 2023.

Comments: ACL 2023

arXiv:2305.14552

Sources of Hallucination by Large Language Models on Inference Tasks

Authors: Nick McKenna, Tianyi Li, Liang Cheng, Mohammad Javad Hosseini, Mark Johnson, Mark Steedman

Abstract: Large Language Models (LLMs) are claimed to be capable of Natural Language Inference (NLI), necessary for applied tasks like question answering and summarization. We present a series of behavioral studies on several LLM families (LLaMA, GPT-3.5, and PaLM) which probe their behavior using controlled experiments. We establish two biases originating from pretraining which predict much of their behavior, and show that these are major sources of hallucination in generative LLMs. First, memorization at the level of sentences: we show that, regardless of the premise, models falsely label NLI test samples as entailing when the hypothesis is attested in training data, and that entities are used as "indices" to access the memorized data. Second, statistical patterns of usage learned at the level of corpora: we further show a similar effect when the premise predicate is less frequent than that of the hypothesis in the training data, a bias following from previous studies. We demonstrate that LLMs perform significantly worse on NLI test samples which do not conform to these biases than those which do, and we offer these as valuable controls for future LLM evaluation.

Submitted 22 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Findings of EMNLP 2023

arXiv:2304.06907

doi 10.1007/s11554-022-01210-6

Toward Real-Time Image Annotation Using Marginalized Coupled Dictionary Learning

Authors: Seyed Mahdi Roostaiyan, Mohammad Mehdi Hosseini, Mahya Mohammadi Kashani, S. Hamid Amiri

Abstract: In most image retrieval systems, images include various high-level semantics, called tags or annotations. Virtually all the state-of-the-art image annotation methods that handle imbalanced labeling are search-based techniques, which are time-consuming. In this paper, a novel coupled dictionary learning approach is proposed to learn a limited number of visual prototypes and their corresponding semantics simultaneously. This approach leads to a real-time image annotation procedure. Another contribution of this paper is that it utilizes a marginalized loss function instead of the squared loss function, which is inappropriate for image annotation with imbalanced labels. We have employed a marginalized loss function in our method to leverage a simple and effective method of prototype updating. Meanwhile, we have introduced ℓ1 regularization on semantic prototypes to preserve the sparse and imbalanced nature of labels in learned semantic prototypes. Finally, comprehensive experimental results on various datasets demonstrate the efficiency of the proposed method for image annotation tasks in terms of accuracy and time. The reference implementation is publicly available at https://github.com/hamid-amiri/MCDL-Image-Annotation.
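
An ℓ1 penalty on the semantic prototypes is typically handled with its proximal operator, element-wise soft thresholding, which drives small label weights exactly to zero and so keeps the prototypes sparse. The sketch shows that operator and one proximal-gradient (ISTA) step. It is a generic illustration: the gradient `grad` of the paper's marginalized loss is treated as a given input, and the step size is assumed.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_1: shrink every entry toward zero by lam."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def l1_prox_gradient_step(S, grad, step, lam):
    """One ISTA update for prototypes S: gradient step on the smooth loss, then shrink."""
    return soft_threshold(S - step * grad, step * lam)
```

Entries whose magnitude stays below `step * lam` after the gradient step become exactly zero. This is what preserves the sparse, imbalanced label structure that a plain squared penalty would only shrink.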

Submitted 17 April, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: @article{roostaiyan2022toward, title={Toward real-time image annotation using marginalized coupled dictionary learning}, author={Roostaiyan, Seyed Mahdi and Hosseini, Mohammad Mehdi and Kashani, Mahya Mohammadi and Amiri, S Hamid}, journal={Journal of Real-Time Image Processing}, volume={19}, number={3}, pages={623--638}, year={2022}, publisher={Springer} }

Journal ref: Journal of Real-Time Image Processing. 2022 Jun;19(3):623-38

arXiv:2304.05482

Computational Pathology: A Survey Review and The Way Forward

Authors: Mahdi S. Hosseini, Babak Ehteshami Bejnordi, Vincent Quoc-Huy Trinh, Danial Hasan, Xingwen Li, Taehyo Kim, Haochen Zhang, Theodore Wu, Kajanan Chinniah, Sina Maghsoudlou, Ryan Zhang, Stephen Yang, Jiadai Zhu, Lyndon Chan, Samir Khaki, Andrei Buin, Fatemeh Chaji, Ala Salehi, Bich Ngoc Nguyen, Dimitris Samaras, Konstantinos N. Plataniotis

Abstract: Computational Pathology (CPath) is an interdisciplinary science that augments developments of computational approaches to analyze and model medical histopathology images. The main objective for CPath is to develop infrastructure and workflows of digital diagnostics as an assistive CAD system for clinical pathology, facilitating transformational changes in the diagnosis and treatment of cancer that are mainly addressed by CPath tools. With ever-growing developments in deep learning and computer vision algorithms, and the ease of the data flow from digital pathology, CPath is currently witnessing a paradigm shift. Despite the sheer volume of engineering and scientific works being introduced for cancer image analysis, there is still a considerable gap in adopting and integrating these algorithms in clinical practice. This raises a significant question regarding the direction and trends that are undertaken in CPath. In this article we provide a comprehensive review of more than 800 papers to address the challenges faced, from problem design all the way to the application and implementation viewpoints. We have catalogued each paper into a model-card by examining the key works and challenges faced to lay out the current landscape in CPath. We hope this helps the community to locate relevant works and facilitate understanding of the field's future directions. In a nutshell, we view CPath developments as a cycle of stages which are required to be cohesively linked together to address the challenges associated with such multidisciplinary science. We overview this cycle from different perspectives of data-centric, model-centric, and application-centric problems. We finally sketch remaining challenges and provide directions for future technical developments and clinical integration of CPath (https://github.com/AtlasAnalyticsLab/CPath_Survey).

Submitted 27 January, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: Accepted in Elsevier Journal of Pathology Informatics (JPI) 2024

arXiv:2303.02468

doi 10.18653/v1/2023.semeval-1.185

Lon-ea at SemEval-2023 Task 11: A Comparison of Activation Functions for Soft and Hard Label Prediction

Authors: Peyman Hosseini, Mehran Hosseini, Sana Sabah Al-Azzawi, Marcus Liwicki, Ignacio Castro, Matthew Purver

Abstract: We study the influence of different activation functions in the output layer of deep neural network models for soft and hard label prediction in the learning with disagreement task. In this task, the goal is to quantify the amount of disagreement via predicting soft labels. To predict the soft labels, we use BERT-based preprocessors and encoders and vary the activation function used in the output layer, while keeping other parameters constant. The soft labels are then used for the hard label prediction. The activation functions considered are sigmoid as well as a step-function that is added to the model post-training and a sinusoidal activation function, which is introduced for the first time in this paper.
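
The pipeline from output activation to soft label to hard label can be sketched as below. The abstract does not give the exact form of the sinusoidal activation. `0.5 * (1 + sin(z))` is a hypothetical choice that, like the sigmoid, maps logits into [0, 1]. The post-training step function is modelled as a simple threshold at an assumed 0.5.

```python
import numpy as np

def sigmoid(z):
    """Standard logistic output activation."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def sinusoidal(z):
    """Hypothetical sinusoidal output activation mapped into [0, 1]."""
    return 0.5 * (1.0 + np.sin(np.asarray(z, dtype=float)))

def hard_labels(soft, threshold=0.5):
    """Post-training step function: turn soft labels into 0/1 hard labels."""
    return (np.asarray(soft) >= threshold).astype(int)
```

The soft label (for instance, the predicted fraction of annotators choosing the positive class) is the primary output. The step function only derives a hard label from it afterwards, so swapping the activation changes both predictions through the soft label alone.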

Submitted 3 January, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

Comments: Accepted in ACL 2023 SemEval Workshop as selected task paper

ACM Class: I.2.7

arXiv:2302.02956

RoboCup 2022 AdultSize Winner NimbRo: Upgraded Perception, Capture Steps Gait and Phase-based In-walk Kicks

Authors: Dmytro Pavlichenko, Grzegorz Ficht, Arash Amini, Mojtaba Hosseini, Raphael Memmesheimer, Angel Villar-Corrales, Stefan M. Schulz, Marcell Missura, Maren Bennewitz, Sven Behnke

Abstract: Beating the human world champions by 2050 is an ambitious goal of the Humanoid League that provides a strong incentive for RoboCup teams to further improve and develop their systems. In this paper, we present upgrades of our system which enabled our team NimbRo to win the Soccer Tournament, the Drop-in Games, and the Technical Challenges in the Humanoid AdultSize League of RoboCup 2022. Strong performance in these competitions resulted in the Best Humanoid award in the Humanoid League. The mentioned upgrades include: hardware upgrade of the vision module, balanced walking with Capture Steps, and the introduction of phase-based in-walk kicks.
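
Capture-step walking builds on the notion of a capture point: the ground location where stepping would bring the robot's center of mass (CoM) to rest. The sketch uses the textbook linear inverted pendulum model (LIPM) in one horizontal axis, where the capture point is the CoM position plus its velocity divided by the pendulum frequency ω = sqrt(g / z0). This is a generic illustration, not NimbRo's Capture Steps controller.

```python
import math

def capture_point(com_pos, com_vel, com_height, g=9.81):
    """LIPM capture point along one horizontal axis.

    com_pos, com_vel: CoM position (m) and velocity (m/s) relative to the support foot.
    com_height: assumed constant CoM height z0 (m).
    """
    omega = math.sqrt(g / com_height)  # natural frequency of the inverted pendulum
    return com_pos + com_vel / omega
```

A balance controller can use this point to adjust step timing and placement when the robot is pushed. A larger forward velocity moves the capture point further ahead, so the next step must land further out.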

Submitted 7 February, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Journal ref: In: RoboCup 2022: Robot World Cup XXV. LNCS 13561, Springer, May 2023

arXiv:2301.01286

Pseudo-Inverted Bottleneck Convolution for DARTS Search Space

Authors: Arash Ahmadian, Louis S. P. Liu, Yue Fei, Konstantinos N. Plataniotis, Mahdi S. Hosseini

Abstract: Differentiable Architecture Search (DARTS) has attracted considerable attention as a gradient-based neural architecture search method. Since the introduction of DARTS, there has been little work done on adapting the action space based on state-of-the-art architecture design principles for CNNs. In this work, we aim to address this gap by incrementally augmenting the DARTS search space with micro-design changes inspired by ConvNeXt and studying the trade-off between accuracy, evaluation layer count, and computational cost. We introduce the Pseudo-Inverted Bottleneck Conv (PIBConv) block intending to reduce the computational footprint of the inverted bottleneck block proposed in ConvNeXt. Our proposed architecture is much less sensitive to evaluation layer count and significantly outperforms a DARTS network of similar size, at layer counts as small as 2. Furthermore, with fewer layers, not only does it achieve higher accuracy with a lower computational footprint (measured in GMACs) and parameter count, but GradCAM comparisons also show that our network can better detect distinctive features of target objects compared to DARTS. Code is available from https://github.com/mahdihosseini/PIBConv.

Submitted 18 March, 2023; v1 submitted 31 December, 2022; originally announced January 2023.

Comments: 5 pages
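For context, the continuous relaxation that makes DARTS gradient-based (from the original DARTS formulation, not from this paper) replaces the discrete choice of operation on each edge $(i,j)$ with a softmax-weighted mixture over a candidate set $\mathcal{O}$, parameterized by architecture weights $\alpha^{(i,j)}$; after search, each edge roughly keeps its highest-weighted operation. Changes to the search space, such as the micro-design changes described above, alter the contents of $\mathcal{O}$.

```latex
\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}}
  \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o(x),
\qquad
o^{(i,j)} = \arg\max_{o \in \mathcal{O}} \alpha_o^{(i,j)}
```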

arXiv:2301.01240 [pdf, other]

Modeling Effective Lifespan of Payment Channels

Authors: Soheil Zibakhsh Shabgahi, Seyed Mahdi Hosseini, Seyed Pooya Shariatpanahi, Behnam Bahrak

Abstract: While being decentralized, secure, and reliable, Bitcoin and many other blockchain-based cryptocurrencies suffer from scalability issues. One promising proposal to address this problem is off-chain payment channels. Since not all nodes are connected directly to each other, they can use a payment network to route their payments. Each node allocates a balance that is frozen during the channel's lifespan. Spending and receiving transactions shift the balance to one side of the channel. A channel becomes unbalanced when there is not sufficient balance in one direction; in this case, we say the effective lifespan of the channel has ended. In this paper, we develop a mathematical model to predict the expected effective lifespan of a channel based on the network's topology. We investigate the impact of channel unbalancing on the payment network and individual channels. We also discuss the effect of certain characteristics of payment channels on their lifespan. Our case study on a snapshot of the Lightning Network shows how the effective lifespan is distributed, and how it is correlated with other network characteristics. Our results show that central unbalanced channels have a drastic effect on the network performance.

Submitted 11 September, 2022; originally announced January 2023.
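As a toy intuition for "effective lifespan" (not the paper's topology-based model), consider one channel in isolation: side A's balance moves up or down by one unit per payment, and the effective lifespan ends when either side is depleted. Under this assumption the lifespan is the classic gambler's-ruin duration, which can be simulated or computed in closed form. The unit payment size, the fixed direction probability `p_to_a`, and the function names are illustrative assumptions.

```python
import random


def simulate_lifespan(capacity: int, start: int, p_to_a: float,
                      rng: random.Random, max_steps: int = 10**7) -> int:
    """Count unit payments until one side of the channel has zero balance.

    `start` is side A's initial balance; side B holds `capacity - start`.
    Each payment moves one unit toward A with probability `p_to_a`,
    otherwise one unit toward B (a hypothetical, memoryless traffic model).
    """
    balance = start
    steps = 0
    while 0 < balance < capacity and steps < max_steps:
        balance += 1 if rng.random() < p_to_a else -1
        steps += 1
    return steps


def expected_lifespan(capacity: int, start: int, p_to_a: float) -> float:
    """Closed-form expected duration of the gambler's-ruin random walk."""
    if not 0 <= start <= capacity:
        raise ValueError("start must lie in [0, capacity]")
    if not 0.0 < p_to_a < 1.0:
        raise ValueError("p_to_a must lie strictly between 0 and 1")
    q = 1.0 - p_to_a
    if abs(p_to_a - q) < 1e-12:  # symmetric traffic: E = k (N - k)
        return float(start * (capacity - start))
    ratio = q / p_to_a
    return (start / (q - p_to_a)
            - capacity / (q - p_to_a)
            * (1 - ratio ** start) / (1 - ratio ** capacity))
```

For a channel of capacity 10 split evenly with symmetric traffic, `expected_lifespan(10, 5, 0.5)` gives 25 payments, and averaging many `simulate_lifespan` runs should land close to that value; skewed traffic (`p_to_a` away from 0.5) shortens the expected lifespan.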

arXiv:2212.10933 [pdf, other]

Resolving Indirect Referring Expressions for Entity Selection

Authors: Mohammad Javad Hosseini, Filip Radlinski, Silvia Pareti, Annie Louis

Abstract: Recent advances in language modeling have enabled new conversational systems. In particular, it is often desirable for people to make choices among specified options when using such systems. We address the problem of reference resolution when people use natural expressions to choose between entities. For example, given the choice "Should we make a Simnel cake or a Pandan cake?", a natural response from a dialog participant may be indirect: "let's make the green one". Such natural expressions have been little studied for reference resolution. We argue that robustly understanding such language has large potential for improving naturalness in dialog, recommendation, and search systems. We create AltEntities (Alternative Entities), a new public dataset of 42K entity pairs and expressions (referring to one entity in the pair), and develop models for the disambiguation problem. Consisting of indirect referring expressions across three domains, our corpus enables for the first time the study of how language models can be adapted to this task. We find they achieve 82%-87% accuracy in realistic settings, which, while reasonable, also invites further advances.

Submitted 26 May, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

arXiv:2211.11390 [pdf, other]

State Estimation for Hybrid Locomotion of Driving-Stepping Quadrupeds

Authors: Mojtaba Hosseini, Diego Rodriguez, Sven Behnke

Abstract: Fast and versatile locomotion can be achieved with wheeled quadruped robots that drive quickly on flat terrain, but are also able to overcome challenging terrain by adapting their body pose and by making steps. In this paper, we present a state estimation approach for four-legged robots with non-steerable wheels that enables hybrid driving-stepping locomotion capabilities. We formulate a Kalman Filter (KF) for state estimation that integrates driven wheels into the filter equations and estimates the robot state (position and velocity) as well as the contribution of driving with wheels to the above state. Our estimation approach allows us to use the control framework of the Mini Cheetah quadruped robot with minor modifications. We tested our approach on this robot that we augmented with actively driven wheels in simulation and in the real world. The experimental results are available at https://www.ais.uni-bonn.de/%7Ehosseini/se-dsq .

Submitted 21 November, 2022; originally announced November 2022.

Comments: Accepted final version. IEEE International Robotic Computing (IRC), Naples, Italy, December 2022
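To make the Kalman-filter idea concrete, here is a toy one-dimensional linear Kalman filter, not the authors' filter: the state is [position, velocity], the prediction step uses a measured forward acceleration as a control input, and each update fuses a wheel-odometry velocity reading. The class name `KF1D`, the noise parameters `q` and `r`, and the choice of wheel velocity as the only measurement are assumptions for illustration; the paper's filter estimates the full robot state and additionally models the wheels' contribution.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KF1D:
    """Toy linear Kalman filter with state [position, velocity] (hypothetical)."""
    dt: float   # time step in seconds
    q: float    # process-noise variance added to both states (simplification)
    r: float    # variance of the wheel-odometry velocity measurement
    x: List[float] = field(default_factory=lambda: [0.0, 0.0])
    P: List[List[float]] = field(default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])

    def predict(self, accel: float) -> None:
        """Propagate with F = [[1, dt], [0, 1]] and acceleration as control input."""
        dt = self.dt
        p, v = self.x
        self.x = [p + v * dt + 0.5 * accel * dt * dt, v + accel * dt]
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q, with Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]

    def update_wheel_velocity(self, z: float) -> None:
        """Fuse a wheel-odometry velocity reading; measurement matrix H = [0, 1]."""
        (a, b), (c, d) = self.P
        s = d + self.r              # innovation variance S = H P H^T + R
        k0, k1 = b / s, d / s       # Kalman gain K = P H^T / S
        y = z - self.x[1]           # innovation (measured minus predicted velocity)
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[a - k0 * c, b - k0 * d],
                  [(1 - k1) * c, (1 - k1) * d]]
```

Alternating `predict(0.0)` and `update_wheel_velocity(1.0)` at `dt=0.01` pulls the velocity estimate toward 1.0 while position integrates it. A larger `r` makes the filter trust the wheels less, which is one simple knob for situations such as wheel slip.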

arXiv:2211.07105 [pdf, other]

Bayesian Reconstruction and Differential Testing of Excised mRNA

Authors: Marjan Hosseini, Devin McConnell, Derek Aguiar

Abstract: Characterizing the differential excision of mRNA is critical for understanding the functional complexity of a cell or tissue, from normal developmental processes to disease pathogenesis. Most transcript reconstruction methods infer full-length transcripts from high-throughput sequencing data. However, this is a challenging task due to incomplete annotations and the differential expression of transcripts across cell-types, tissues, and experimental conditions. Several recent methods circumvent these difficulties by considering local splicing events, but these methods lose transcript-level splicing information and may conflate transcripts. We develop the first probabilistic model that reconciles the transcript and local splicing perspectives. First, we formalize the sequence of mRNA excisions (SME) reconstruction problem, which aims to assemble variable-length sequences of mRNA excisions from RNA-sequencing data. We then present a novel hierarchical Bayesian admixture model for the Reconstruction of Excised mRNA (BREM). BREM interpolates between local splicing events and full-length transcripts and thus focuses only on SMEs that have high posterior probability. We develop posterior inference algorithms based on Gibbs sampling and local search of independent sets and characterize differential SME usage using generalized linear models based on converged BREM model parameters.
We show that BREM achieves higher F1 score for reconstruction tasks and improved accuracy and sensitivity in differential splicing when compared with four state-of-the-art transcript and local splicing methods on simulated data. Lastly, we evaluate BREM on both bulk and scRNA sequencing data based on transcript reconstruction, novelty of transcripts produced, model sensitivity to hyperparameters, and a functional analysis of differentially expressed SMEs, demonstrating that BREM captures relevant biological signal.

Submitted 13 November, 2022; originally announced November 2022.

arXiv:2210.04695 [pdf, other]

Language Models Are Poor Learners of Directional Inference

Authors: Tianyi Li, Mohammad Javad Hosseini, Sabine Weber, Mark Steedman

Abstract: We examine LMs' competence in directional predicate entailment through supervised fine-tuning with prompts. Our analysis shows that, contrary to their apparent success on standard NLI, LMs show limited ability to learn such directional inference; moreover, existing datasets fail to test directionality and/or are infested by artefacts that can be learnt as proxies for entailment, yielding over-optimistic results. In response, we present BoOQA (Boolean Open QA), a robust multi-lingual evaluation benchmark for directional predicate entailments, extrinsic to existing training sets. On BoOQA, we establish baselines and show evidence that existing LM-prompting models are incompetent directional entailment learners, in contrast to entailment graphs, which are nevertheless limited by sparsity.

Submitted 14 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: Findings of EMNLP 2022

arXiv:2210.02350 [pdf, other]

Crowdsourcing and Sidewalk Data: A Preliminary Study on the Trustworthiness of OpenStreetMap Data in the US

Authors: Kazi Shahrukh Omar, Gustavo Moreira, Daniel Hodczak, Maryam Hosseini, Fabio Miranda

Abstract: Sidewalks play a pivotal role in everyday urban mobility. Ideally, sidewalks provide a safe walkway for pedestrians, link public transportation facilities, and equip people with routing and navigation services. However, there is a scarcity of open sidewalk data, which not only impacts the accessibility and walkability of cities but also limits policymakers in generating insightful measures to improve the current state of pedestrian facilities. As one of the most famous crowdsourced data repositories, OpenStreetMap (OSM) could help fill this gap in open sidewalk data to a large extent. However, the completeness and quality of OSM data have long been a major issue. In this paper, we offer a preliminary study on the availability and trustworthiness of OSM sidewalk data. First, we compare OSM sidewalk data coverage in over 50 major cities in the United States. Then, we select three major cities (Seattle, Chicago, and New York City) to further analyze the completeness of sidewalk data and its features, and to compute a trustworthiness index leveraging historical OSM sidewalk data.

Submitted 5 October, 2022; originally announced October 2022.

Comments: ASSETS 2022 UrbanAccess Workshop

arXiv:2209.13542 [pdf, other]

EmpathicSchool: A multimodal dataset for real-time facial expressions and physiological data analysis under different stress conditions

Authors: Majid Hosseini, Fahad Sohrab, Raju Gottumukkala, Ravi Teja Bhupatiraju, Satya Katragadda, Jenni Raitoharju, Alexandros Iosifidis, Moncef Gabbouj

Abstract: Affective computing has garnered researchers' attention and interest in recent years as there is a need for AI systems to better understand and react to human emotions. However, analyzing human emotions, such as mood or stress, is quite complex. While various stress studies use facial expressions and wearables, most existing datasets rely on processing data from a single modality. This paper presents EmpathicSchool, a novel dataset that captures facial expressions and the associated physiological signals, such as heart rate, electrodermal activity, and skin temperature, under different stress levels. The data were collected from 20 participants over different sessions, totaling 26 hours. The data include nine different signal types, including both computer vision and physiological features that can be used to detect stress. In addition, various experiments were conducted to validate the signal quality.

Submitted 29 August, 2022; originally announced September 2022.

arXiv:2209.10914 [pdf, other]

Morpheus: Extending the Last Level Cache Capacity in GPU Systems Using Idle GPU Core Resources

Authors: Sina Darabi, Mohammad Sadrosadati, Joël Lindegger, Negar Akbarzadeh, Mohammad Hosseini, Jisung Park, Juan Gómez-Luna, Hamid Sarbazi-Azad, Onur Mutlu

Abstract: Graphics Processing Units (GPUs) are widely used accelerators for data-parallel applications. In many GPU applications, GPU memory bandwidth bottlenecks performance, causing underutilization of GPU cores. Hence, disabling many cores does not affect the performance of memory-bound workloads. While simply power-gating unused GPU cores would save energy, prior works attempt to better utilize GPU cores for other applications (ideally compute-bound), which increases the GPU's total throughput. In this paper, we introduce Morpheus, a new hardware/software co-designed technique to boost the performance of memory-bound applications. The key idea of Morpheus is to exploit unused core resources to extend the GPU last level cache (LLC) capacity. In Morpheus, each GPU core has two execution modes: compute mode and cache mode. Cores in compute mode operate conventionally and run application threads. However, for the cores in cache mode, Morpheus invokes a software helper kernel that uses the cores' on-chip memories (i.e., register file, shared memory, and L1) in a way that extends the LLC capacity for a running memory-bound workload. Morpheus adds a controller to the GPU hardware to forward LLC requests to either the conventional LLC (managed by hardware) or the extended LLC (managed by the helper kernel). Our experimental results show that Morpheus improves the performance and energy efficiency of a baseline GPU architecture by an average of 39% and 58%, respectively, across several memory-bound workloads.
Morpheus' performance is within 3% of a GPU design that has a quadruple-sized conventional LLC. Morpheus can thus contribute to reducing the hardware dedicated to a conventional LLC by exploiting idle cores' on-chip memory resources as additional cache capacity.

Submitted 6 April, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: To appear in 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

arXiv:2209.01536 [pdf, other]

Interpretable Fake News Detection with Topic and Deep Variational Models

Authors: Marjan Hosseini, Alireza Javadian Sabet, Suining He, Derek Aguiar

Abstract: The growing societal dependence on social media and user-generated content for news and information has increased the influence of unreliable sources and fake content, which muddles public discourse and lessens trust in the media. Validating the credibility of such information is a difficult task that is susceptible to confirmation bias, leading to the development of algorithmic techniques to distinguish between fake and real news. However, most existing methods are challenging to interpret, making it difficult to establish trust in predictions, and make assumptions that are unrealistic in many real-world scenarios, e.g., the availability of audiovisual features or provenance. In this work, we focus on fake news detection of textual content using interpretable features and methods. In particular, we have developed a deep probabilistic model that integrates a dense representation of textual news using a variational autoencoder and bi-directional Long Short-Term Memory (LSTM) networks with semantic topic-related features inferred from a Bayesian admixture model. Extensive experimental studies with 3 real-world datasets demonstrate that our model achieves comparable performance to state-of-the-art competing models while facilitating model interpretability from the learned topics. Finally, we have conducted model ablation studies to justify the effectiveness and accuracy of integrating neural embeddings and topic features both quantitatively by evaluating performance and qualitatively through separability in lower-dimensional embeddings.

Submitted 4 September, 2022; originally announced September 2022.

arXiv:2209.01421 [pdf, other]

Deep Live Video Ad Placement on the 5G Edge

Authors: Mohammad Hosseini

Abstract: The video broadcasting industry has grown significantly in recent years, especially in delivering personalized content to end users. While video broadcasting has continued to grow beyond TV, video advertising has become a key marketing tool to deliver targeted messages directly to the audience. However, a key problem for broadband TV is that TV commercials target a broad audience and therefore lack user-specific, personalized ad content. In this paper, we propose a deep edge-cloud ad-placement system, and briefly describe our methodologies and the architecture of our designed ad placement system for delivering both Video on Demand (VoD) and live broadcast TV content over the MMT streaming protocol. The aim of our paper is to showcase how to enable targeted, personalized, and user-specific advertising services deployed on future 5G MEC platforms, which in turn can have high potential to increase ad revenues for the mobile operator industry.

Submitted 3 September, 2022; originally announced September 2022.

Comments: ACM Multimedia Systems 2018, Demo track, June 2018, Amsterdam, Netherlands, 5 pages

arXiv:2208.10919 [pdf, other]

Cluster Based Secure Multi-Party Computation in Federated Learning for Histopathology Images

Authors: S. Maryam Hosseini, Milad Sikaroudi, Morteza Babaei, H. R. Tizhoosh

Abstract: Federated learning (FL) is a decentralized method enabling hospitals to collaboratively learn a model without sharing private patient data for training. In FL, participant hospitals periodically exchange training results rather than training samples with a central server. However, having access to model parameters or gradients can expose private training data samples. To address this challenge, we adopt secure multiparty computation (SMC) to establish a privacy-preserving federated learning framework. In our proposed method, the hospitals are divided into clusters. After local training, each hospital splits its model weights among other hospitals in the same cluster such that no single hospital can retrieve other hospitals' weights on its own. Then, all hospitals sum up the received weights and send the results to the central server. Finally, the central server aggregates the results, retrieving the average of the models' weights and updating the model without having access to individual hospitals' weights. We conduct experiments on a publicly available repository, The Cancer Genome Atlas (TCGA). We compare the performance of the proposed framework with differential privacy and federated averaging as the baseline. The results reveal that, compared to differential privacy, our framework can achieve higher accuracy with no privacy leakage risk at the cost of higher communication overhead.

Submitted 21 August, 2022; originally announced August 2022.

Comments: Accepted at MICCAI 2022 Workshop on Distributed, Collaborative and Federated Learning
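The cluster-based aggregation described in this abstract can be sketched with additive secret sharing. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, weights are flat lists of floats, and the random masks are real-valued for readability, whereas practical SMC protocols typically share integers modulo a large prime so that each individual share is uniformly random.

```python
import random
from typing import List

Vector = List[float]


def split_into_shares(weights: Vector, n_shares: int,
                      rng: random.Random, scale: float = 100.0) -> List[Vector]:
    """Split `weights` into `n_shares` additive shares that sum back to it.

    The first n_shares - 1 shares are random masks and the last is the
    remainder, so all shares are needed to recover the weights.
    """
    masks = [[rng.uniform(-scale, scale) for _ in weights]
             for _ in range(n_shares - 1)]
    remainder = [w - sum(m[i] for m in masks) for i, w in enumerate(weights)]
    return masks + [remainder]


def cluster_partial_sums(cluster: List[Vector], rng: random.Random) -> List[Vector]:
    """Hospital i sends its j-th share to hospital j; each hospital sums what it receives."""
    n, dim = len(cluster), len(cluster[0])
    received = [[0.0] * dim for _ in range(n)]
    for weights in cluster:
        for j, share in enumerate(split_into_shares(weights, n, rng)):
            for k in range(dim):
                received[j][k] += share[k]
    return received  # one masked partial sum per hospital, sent to the server


def secure_federated_average(clusters: List[List[Vector]], seed: int = 0) -> Vector:
    """Server side: add all masked partial sums and divide by the hospital count."""
    rng = random.Random(seed)
    partial_sums = [p for cluster in clusters
                    for p in cluster_partial_sums(cluster, rng)]
    n_hospitals = sum(len(cluster) for cluster in clusters)
    dim = len(partial_sums[0])
    return [sum(p[k] for p in partial_sums) / n_hospitals for k in range(dim)]
```

With clusters `[[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]`, `secure_federated_average` returns the plain federated average `[3.0, 4.0]` up to floating-point rounding, while the server only ever sees masked partial sums. A single-hospital cluster sends its weights unmasked, so clusters need at least two members for the masking to hide anything.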

arXiv:2206.13677 [pdf, other]

Towards Global-Scale Crowd+AI Techniques to Map and Assess Sidewalks for People with Disabilities

Authors: Maryam Hosseini, Mikey Saugstad, Fabio Miranda, Andres Sevtsuk, Claudio T. Silva, Jon E. Froehlich

Abstract: There is a lack of data on the location, condition, and accessibility of sidewalks across the world, which not only impacts where and how people travel but also fundamentally limits interactive mapping tools and urban analytics. In this paper, we describe initial work in semi-automatically building a sidewalk network topology from satellite imagery using hierarchical multi-scale attention models, inferring surface materials from street-level images using active learning-based semantic segmentation, and assessing sidewalk condition and accessibility features using Crowd+AI. We close with a call to create a database of labeled satellite and streetscape scenes for sidewalks and sidewalk accessibility issues along with standardized benchmarks.

Submitted 18 August, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: CVPR 2022 AVA (Accessibility, Vision, and Autonomy Meet) Workshop

arXiv:2205.13064 [pdf, other]

doi 10.1111/cgf.14534

Urban Rhapsody: Large-scale exploration of urban soundscapes

Authors: Joao Rulff, Fabio Miranda, Maryam Hosseini, Marcos Lage, Mark Cartwright, Graham Dove, Juan Bello, Claudio T. Silva

Abstract: Noise is one of the primary quality-of-life issues in urban environments. In addition to annoyance, noise negatively impacts public health and educational performance. While low-cost sensors can be deployed to monitor ambient noise levels at high temporal resolutions, the amount of data they produce and the complexity of these data pose significant analytical challenges. One way to address these challenges is through machine listening techniques, which are used to extract features in attempts to classify the source of noise and understand temporal patterns of a city's noise situation. However, the overwhelming number of noise sources in the urban environment and the scarcity of labeled data make it nearly impossible to create classification models with vocabularies large enough to capture the true dynamism of urban soundscapes. In this paper, we first identify a set of requirements in the as-yet unexplored domain of urban soundscape exploration. To satisfy the requirements and tackle the identified challenges, we propose Urban Rhapsody, a framework that combines state-of-the-art audio representation, machine learning, and visual analytics to allow users to interactively create classification models, understand noise patterns of a city, and quickly retrieve and label audio excerpts in order to create a large high-precision annotated database of urban sound recordings. We demonstrate the tool's utility through case studies performed by domain experts using data generated over the five-year deployment of a one-of-a-kind sensor network in New York City.

Submitted 25 May, 2022; originally announced May 2022.

Comments: Accepted at EuroVis 2022. Source code available at: https://github.com/VIDA-NYU/Urban-Rhapsody

arXiv:2203.16723 [pdf, other]

Exploiting Explainable Metrics for Augmented SGD

Authors: Mahdi S. Hosseini, Mathieu Tuli, Konstantinos N. Plataniotis

Abstract: Explaining the generalization characteristics of deep learning is an emerging topic in advanced machine learning. There are several unanswered questions about how learning under stochastic optimization really works and why certain strategies are better than others. In this paper, we address the following question: can we probe intermediate layers of a deep neural network to identify and quantify the learning quality of each layer? With this question in mind, we propose new explainability metrics that measure the redundant information in a network's layers using a low-rank factorization framework and quantify a complexity measure that is highly correlated with the generalization performance of a given optimizer, network, and dataset. We subsequently exploit these metrics to augment the Stochastic Gradient Descent (SGD) optimizer by adaptively adjusting the learning rate in each layer to improve generalization performance. Our augmented SGD, dubbed RMSGD, introduces minimal computational overhead compared to SOTA methods and outperforms them by exhibiting strong generalization characteristics across applications, architectures, and datasets.

Submitted 30 March, 2022; originally announced March 2022.

Comments: Accepted in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR2022)

arXiv:2203.12552 [pdf, other]

doi 10.1002/aelm.202100724

Organic log-domain integrator synapse

Authors: Mohammad Javad Mirshojaeian Hosseini, Elisa Donati, Giacomo Indiveri, Robert A. Nawrocki

Abstract: Synapses play a critical role in memory, learning, and cognition. Their main functions include converting pre-synaptic voltage spikes to post-synaptic currents, as well as scaling the input signal. Several brain-inspired architectures have been proposed to emulate the behavior of biological synapses. While these are useful to explore the properties of nervous systems, the challenge of making biocompatible and flexible circuits with biologically plausible time constants and tunable gain remains. Here, a physically flexible organic log-domain integrator synaptic circuit is shown to address this challenge. In particular, the circuit is fabricated using organic-based materials that are electrically active, offer flexibility and biocompatibility, as well as time constants (critical in learning neural codes and encoding spatiotemporal patterns) that are biologically plausible. Using a 10 nF synaptic capacitor, the time constant reached 126 ms and 221 ms before and during bending, respectively. The flexible synaptic circuit is characterized before and during bending, followed by studies on the effects of weighting voltage, synaptic capacitance, and disparity in pre-synaptic signals on the time constant.

Submitted 23 March, 2022; originally announced March 2022.

Comments: Accepted by Advanced Electronic Materials (18 pages, 17 figures)

arXiv:2203.06264 [pdf, other]

Cross-lingual Inference with A Chinese Entailment Graph

Authors: Tianyi Li, Sabine Weber, Mohammad Javad Hosseini, Liane Guillou, Mark Steedman

Abstract: Predicate entailment detection is a crucial task for question-answering from text, where previous work has explored unsupervised learning of entailment graphs from typed open relation triples. In this paper, we present the first pipeline for building Chinese entailment graphs, which involves a novel high-recall open relation extraction (ORE) method and the first Chinese fine-grained entity typing dataset under the FIGER type ontology. Through experiments on the Levy-Holt dataset, we verify the strength of our Chinese entailment graph, and reveal the cross-lingual complementarity: on the parallel Levy-Holt dataset, an ensemble of Chinese and English entailment graphs outperforms both monolingual graphs, and raises unsupervised SOTA by 4.7 AUC points.

Submitted 11 March, 2022; originally announced March 2022.

Comments: Accepted to Findings of ACL 2022

arXiv:2201.11246 [pdf, other]

HistoKT: Cross Knowledge Transfer in Computational Pathology

Authors: Ryan Zhang, Jiadai Zhu, Stephen Yang, Mahdi S. Hosseini, Angelo Genovese, Lina Chen, Corwyn Rowsell, Savvas Damaskinos, Sonal Varma, Konstantinos N. Plataniotis

Abstract: The lack of well-annotated datasets in computational pathology (CPath) obstructs the application of deep learning techniques for classifying medical images. Since pathologist time is expensive, dataset curation is intrinsically difficult. Many CPath workflows involve transferring learned knowledge between various image domains through transfer learning. Currently, most transfer learning research follows a model-centric approach, tuning network parameters to improve transfer results over few datasets. In this paper, we take a data-centric approach to the transfer learning problem and examine the existence of generalizable knowledge between histopathological datasets. First, we create a standardization workflow for aggregating existing histopathological data. We then measure inter-domain knowledge by training ResNet18 models across multiple histopathological datasets, and cross-transferring between them to determine the quantity and quality of innate shared knowledge. Additionally, we use weight distillation to share knowledge between models without additional training. We find that hard-to-learn, multi-class datasets benefit most from pretraining, and a two-stage learning framework incorporating a large source domain such as ImageNet allows for better utilization of smaller datasets. Furthermore, we find that weight distillation enables models trained on purely histopathological features to outperform models using external natural image data.

Submitted 26 January, 2022; originally announced January 2022.

Comments: Accepted at ICASSP 2022

arXiv:2201.02260

doi 10.1016/j.scs.2021.103630

CitySurfaces: City-Scale Semantic Segmentation of Sidewalk Materials

Authors: Maryam Hosseini, Fabio Miranda, Jianzhe Lin, Claudio Silva

Abstract: While designing a sustainable and resilient urban built environment is increasingly promoted around the world, significant data gaps have made research on pressing sustainability issues challenging to carry out. Pavements are known to have strong economic and environmental impacts; however, most cities lack a spatial catalog of their surfaces due to the cost-prohibitive and time-consuming nature of data collection. Recent advancements in computer vision, together with the availability of street-level images, provide new opportunities for cities to extract large-scale built environment data with lower implementation costs and higher accuracy. In this paper, we propose CitySurfaces, an active learning-based framework that leverages computer vision techniques for classifying sidewalk materials using widely available street-level images. We trained the framework on images from New York City and Boston, and the evaluation results show a 90.5% mIoU score. Furthermore, we evaluated the framework using images from six different cities, demonstrating that it can be applied to regions with distinct urban fabrics, even outside the domain of the training data. CitySurfaces can provide researchers and city agencies with a low-cost, accurate, and extensible method to collect sidewalk material data, which plays a critical role in addressing major sustainability issues, including climate change and surface water management.

Submitted 6 January, 2022; originally announced January 2022.

Comments: Sustainable Cities and Society journal (accepted); Model: https://github.com/VIDA-NYU/city-surfaces

arXiv:2112.06122

A Visual Analytics System for Profiling Urban Land Use Evolution

Authors: Claudio Santos, Maryam Hosseini, João Rulff, Nivan Ferreira, Luc Wilson, Fabio Miranda, Claudio Silva, Marcos Lage

Abstract: The growth of cities calls for regulations on how urban space is used, and zoning resolutions define how and for what purpose each piece of land is going to be used. Tracking land use and zoning evolution can reveal a wealth of information about urban development. To this end, cities have been releasing data sets describing the historical evolution of both the shape and the attributes of land units. The complex nature of zoning code and land-use data, however, makes the analysis of such data quite challenging and often time-consuming. We address these challenges by introducing Urban Chronicles, an open-source web-based visual analytics system that enables interactive exploration of changes in land use patterns. Using New York City's Primary Land Use Tax Lot Output (PLUTO) as an example, we show the capabilities of the system by exploring the data over several years at different scales. Urban Chronicles supports on-the-fly aggregation and filtering operations by using a tree-based data structure that leverages the hierarchical nature of the data set to index the shape and attributes of geographical regions that change over time. We demonstrate the utility of our system through a set of case studies that analyze the impact of Hurricane Sandy on land use attributes, as well as the effects of proposed rezoning plans in Downtown Brooklyn.

Submitted 11 December, 2021; originally announced December 2021.

Comments: The open-source system can be found at https://github.com/Prograf-UFF/urban-chronicles

arXiv:2112.06120

Sidewalk Measurements from Satellite Images: Preliminary Findings

Authors: Maryam Hosseini, Iago B. Araujo, Hamed Yazdanpanah, Eric K. Tokuda, Fabio Miranda, Claudio T. Silva, Roberto M. Cesar Jr

Abstract: Large-scale analysis of pedestrian infrastructure, particularly sidewalks, is critical to human-centric urban planning and design. Benefiting from the rich data set of planimetric features and high-resolution orthoimages provided through the New York City Open Data portal, we train a computer vision model to detect sidewalks, roads, and buildings from remote-sensing imagery and achieve 83% mIoU on a held-out test set. We apply shape analysis techniques to study different attributes of the extracted sidewalks. More specifically, we perform a tile-wise analysis of the width, angle, and curvature of sidewalks, which, aside from their general impacts on the walkability and accessibility of urban areas, are known to play significant roles in the mobility of wheelchair users. The preliminary results are promising, offering a glimpse of the potential for the proposed approach to be adopted in different cities and enabling researchers and practitioners to form a more vivid picture of the pedestrian realm.

Submitted 11 December, 2021; originally announced December 2021.

Journal ref: Spatial Data Science Symposium 2021
