-
Characterizing Encrypted Application Traffic through Cellular Radio Interface Protocol
Authors:
Md Ruman Islam,
Raja Hasnain Anwar,
Spyridon Mastorakis,
Muhammad Taqi Raza
Abstract:
Modern applications are end-to-end encrypted to prevent data from being read or secretly modified. 5G technology provides ubiquitous access to these applications without compromising the application-specific performance and latency goals. In this paper, we empirically demonstrate that 5G radio communication becomes the side channel to precisely infer the user's applications in real-time. The key idea lies in observing the 5G physical and MAC layer interactions over time that reveal the application's behavior. The MAC layer receives the data from the application and requests the network to assign the radio resource blocks. The network assigns the radio resources as per application requirements, such as priority, Quality of Service (QoS) needs, amount of data to be transmitted, and buffer size. The adversary can passively observe the radio resources to fingerprint the applications. We empirically demonstrate this attack by considering four different categories of applications: online shopping, voice/video conferencing, video streaming, and Over-The-Top (OTT) media platforms. Finally, we have also demonstrated that an attacker can differentiate various types of applications in real-time within each category.
Submitted 10 July, 2024;
originally announced July 2024.
-
Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting
Authors:
Jiyong Oh,
Syed M. Raza,
Lusungu J. Mwasinga,
Moonseong Kim,
Hyunseung Choo
Abstract:
Unmanned Aerial Vehicle (UAV) services with 5G connectivity are an emerging field with numerous applications. Operator-controlled UAV flights and manual static flight configurations are major limitations for the wide adoption and scalability of UAV services. Several services depend on excellent UAV connectivity with a cellular network, and maintaining it is challenging in predetermined flight paths. This paper addresses these limitations by proposing a Deep Reinforcement Learning (DRL) framework for UAV path planning with assured connectivity (DUPAC). During UAV flight, DUPAC determines the best route from a defined source to the destination in terms of distance and signal quality. The viability and performance of DUPAC are evaluated under simulated real-world urban scenarios using the Unity framework. The results confirm that DUPAC achieves an autonomous UAV flight path similar to the base method with only a 2% increment while maintaining an average 9% better connection quality throughout the flight.
Submitted 21 June, 2024;
originally announced June 2024.
-
Unobtrusive Monitoring of Physical Weakness: A Simulated Approach
Authors:
Chen Long-fei,
Muhammad Ahmed Raza,
Craig Innes,
Subramanian Ramamoorthy,
Robert B. Fisher
Abstract:
Aging and chronic conditions affect older adults' daily lives, making early detection of developing health issues crucial. Weakness, common in many conditions, alters physical movements and daily activities subtly. However, detecting such changes can be challenging due to their subtle and gradual nature. To address this, we employ a non-intrusive camera sensor to monitor individuals' daily sitting and relaxing activities for signs of weakness. We simulate weakness in healthy subjects by having them perform physical exercise and observing the behavioral changes in their daily activities before and after workouts. The proposed system captures fine-grained features related to body motion, inactivity, and environmental context in real-time while prioritizing privacy. A Bayesian Network is used to model the relationships between features, activities, and health conditions. We aim to identify specific features and activities that indicate such changes and determine the most suitable time scale for observing the change. Results show 0.97 accuracy in distinguishing simulated weakness at the daily level. Fine-grained behavioral features, including non-dominant upper body motion speed and scale, and inactivity distribution, along with a 300-second window, are found most effective. However, individual-specific models are recommended as no universal set of optimal features and activities was identified across all participants.
Submitted 14 June, 2024;
originally announced June 2024.
-
Regional Correlation Aided Mobile Traffic Prediction with Spatiotemporal Deep Learning
Authors:
JeongJun Park,
Lusungu J. Mwasinga,
Huigyu Yang,
Syed M. Raza,
Duc-Tai Le,
Moonseong Kim,
Min Young Chung,
Hyunseung Choo
Abstract:
Mobile traffic data in urban regions shows differentiated patterns during different hours of the day. The exploitation of these patterns enables highly accurate mobile traffic prediction for proactive network management. However, recent Deep Learning (DL) driven studies have only exploited spatiotemporal features and have ignored the geographical correlations, causing high complexity and erroneous mobile traffic predictions. This paper addresses these limitations by proposing an enhanced mobile traffic prediction scheme that combines a clustering strategy based on daily mobile traffic peak time with a novel multi Temporal Convolutional Network with Long Short Term Memory (multi TCN-LSTM) model. The mobile network cells that exhibit peak traffic during the same hour of the day are clustered together. Our experiments on large-scale real-world mobile traffic data show up to 28% performance improvement compared to state-of-the-art studies, which confirms the efficacy and viability of the proposed approach.
Submitted 11 December, 2023;
originally announced December 2023.
-
Emotion-Oriented Behavior Model Using Deep Learning
Authors:
Muhammad Arslan Raza,
Muhammad Shoaib Farooq,
Adel Khelifi,
Atif Alvi
Abstract:
Emotions, as a fundamental ingredient of any social interaction, lead to behaviors that represent the effectiveness of the interaction through facial expressions and gestures in humans. Hence an agent must possess the social and cognitive abilities to understand human social parameters and behave accordingly. However, no such emotion-oriented behavior model has yet been presented in the existing research. Emotion prediction may generate appropriate agent behaviors for effective interaction using the conversation modality. Considering the importance of emotions and behaviors for an agent's social interaction, an Emotion-based Behavior model is presented in this paper for socio-cognitive artificial agents. The proposed model is implemented using tweets data trained on multiple models like Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN) and Bidirectional Encoder Representations from Transformers (BERT) for emotion prediction with an average accuracy of 92%, and 55% respectively. Further, using emotion predictions from CNN-LSTM, the behavior module responds with facial expressions and gestures using Behavioral Markup Language (BML). The accuracy of emotion-based behavior predictions is statistically validated using the 2-tailed Pearson correlation on the data collected from human users through questionnaires. Analysis shows that all emotion-based behaviors accurately depict human-like gestures and facial expressions based on the significant correlation at the 0.01 and 0.05 levels. This study is a stepping stone to a multi-faceted artificial agent interaction based on emotion-oriented behaviors. Cognition has significance regarding social interaction among humans.
Submitted 28 October, 2023;
originally announced November 2023.
-
Domain Generalization in Computational Pathology: Survey and Guidelines
Authors:
Mostafa Jahanifar,
Manahil Raza,
Kesi Xu,
Trinh Vuong,
Rob Jewsbury,
Adam Shephard,
Neda Zamanitajeddin,
Jin Tae Kwak,
Shan E Ahmed Raza,
Fayyaz Minhas,
Nasir Rajpoot
Abstract:
Deep learning models have exhibited exceptional effectiveness in Computational Pathology (CPath) by tackling intricate tasks across an array of histology image analysis applications. Nevertheless, the presence of out-of-distribution data (stemming from a multitude of sources such as disparate imaging devices and diverse tissue preparation methods) can cause domain shift (DS). DS decreases the generalization of trained models to unseen datasets with slightly different data distributions, prompting the need for innovative domain generalization (DG) solutions. Recognizing the potential of DG methods to significantly influence diagnostic and prognostic models in cancer studies and clinical practice, we present this survey along with guidelines on achieving DG in CPath. We rigorously define various DS types, systematically review and categorize existing DG approaches and resources in CPath, and provide insights into their advantages, limitations, and applicability. We also conduct thorough benchmarking experiments with 28 cutting-edge DG algorithms to address a complex DG problem. Our findings suggest that careful experiment design and CPath-specific Stain Augmentation technique can be very effective. However, there is no one-size-fits-all solution for DG in CPath. Therefore, we establish clear guidelines for detecting and managing DS depending on different scenarios. While most of the concepts, guidelines, and recommendations are given for applications in CPath, we believe that they are applicable to most medical image analysis tasks as well.
Submitted 30 October, 2023;
originally announced October 2023.
-
FormaT5: Abstention and Examples for Conditional Table Formatting with Natural Language
Authors:
Mukul Singh,
José Cambronero,
Sumit Gulwani,
Vu Le,
Carina Negreanu,
Elnaz Nouri,
Mohammad Raza,
Gust Verbruggen
Abstract:
Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often challenging for users as it requires them to understand and implement the underlying logic. We present FormaT5, a transformer-based model that can generate a CF rule given the target table and a natural language description of the desired formatting logic. We find that user descriptions for these tasks are often under-specified or ambiguous, making it harder for code generation systems to accurately learn the desired rule in a single step. To tackle this problem of under-specification and minimise argument errors, FormaT5 learns to predict placeholders through an abstention objective. These placeholders can then be filled by a second model or, when examples of rows that should be formatted are available, by a programming-by-example system. To evaluate FormaT5 on diverse and real scenarios, we create an extensive benchmark of 1053 CF tasks, containing real-world descriptions collected from four different sources. We release our benchmarks to encourage research in this area. Abstention and filling allow FormaT5 to outperform 8 different neural approaches on our benchmarks, both with and without examples. Our results illustrate the value of building domain-specific learning systems.
Submitted 1 November, 2023; v1 submitted 26 October, 2023;
originally announced October 2023.
-
An Empirical Study on Bugs Inside PyTorch: A Replication Study
Authors:
Sharon Chee Yin Ho,
Vahid Majdinasab,
Mohayeminul Islam,
Diego Elias Costa,
Emad Shihab,
Foutse Khomh,
Sarah Nadi,
Muhammad Raza
Abstract:
Software systems are increasingly relying on deep learning components, due to their remarkable capability of identifying complex data patterns and powering intelligent behaviour. A core enabler of this change in software development is the availability of easy-to-use deep learning libraries. Libraries like PyTorch and TensorFlow empower a large variety of intelligent systems, offering a multitude of algorithms and configuration options, applicable to numerous domains of systems. However, bugs in those popular deep learning libraries also may have dire consequences for the quality of systems they enable; thus, it is important to understand how bugs are identified and fixed in those libraries.
Inspired by a study by Jia et al., which investigates the bug identification and fixing process in TensorFlow, we characterize bugs in the PyTorch library, a very popular deep learning framework. We investigate the causes and symptoms of bugs identified during PyTorch's development, assess their locality within the project, and extract patterns of bug fixes. Our results highlight that PyTorch bugs resemble bugs in traditional software projects more than they relate to deep learning characteristics. Finally, we also compare our results with the study on TensorFlow, highlighting similarities and differences across the bug identification and fixing process.
Submitted 1 August, 2023; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Cooperation Is All You Need
Authors:
Ahsan Adeel,
Junaid Muzaffar,
Khubaib Ahmed,
Mohsin Raza
Abstract:
Going beyond 'dendritic democracy', we introduce a 'democracy of local processors', termed Cooperator. Here we compare their capabilities when used in permutation-invariant neural networks for reinforcement learning (RL), with machine learning algorithms based on Transformers, such as ChatGPT. Transformers are based on the long-standing conception of integrate-and-fire 'point' neurons, whereas Cooperator is inspired by recent neurobiological breakthroughs suggesting that the cellular foundations of mental life depend on context-sensitive pyramidal neurons in the neocortex which have two functionally distinct points. We show that when used for RL, an algorithm based on Cooperator learns far quicker than that based on Transformer, even while having the same number of parameters.
Submitted 16 May, 2023;
originally announced May 2023.
-
From Words to Code: Harnessing Data for Program Synthesis from Natural Language
Authors:
Anirudh Khatry,
Joyce Cahoon,
Jordan Henkel,
Shaleen Deep,
Venkatesh Emani,
Avrilia Floratou,
Sumit Gulwani,
Vu Le,
Mohammad Raza,
Sherry Shi,
Mukul Singh,
Ashish Tiwari
Abstract:
Creating programs to correctly manipulate data is a difficult task, as the underlying programming languages and APIs can be challenging to learn for many users who are not skilled programmers. Large language models (LLMs) demonstrate remarkable potential for generating code from natural language, but in the data manipulation domain, apart from the natural language (NL) description of the intended task, we also have the dataset on which the task is to be performed, or the "data context". Existing approaches have utilized data context in a limited way by simply adding relevant information from the input data into the prompts sent to the LLM.
In this work, we utilize the available input data to execute the candidate programs generated by the LLMs and gather their outputs. We introduce semantic reranking, a technique to rerank the programs generated by LLMs based on three signals coming from the program outputs: (a) semantic filtering and well-formedness based score tuning: do programs even generate well-formed outputs, (b) semantic interleaving: how do the outputs from different candidates compare to each other, and (c) output-based score tuning: how do the outputs compare to outputs predicted for the same task. We provide theoretical justification for semantic interleaving. We also introduce temperature mixing, where we combine samples generated by LLMs using both high and low temperatures. We extensively evaluate our approach in three domains, namely databases (SQL), data science (Pandas) and business intelligence (Excel's Power Query M) on a variety of new and existing benchmarks. We observe substantial gains across domains, with improvements of up to 45% in top-1 accuracy and 34% in top-3 accuracy.
Submitted 3 May, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
SACPlanner: Real-World Collision Avoidance with a Soft Actor Critic Local Planner and Polar State Representations
Authors:
Khaled Nakhleh,
Minahil Raza,
Mack Tang,
Matthew Andrews,
Rinu Boney,
Ilija Hadzic,
Jeongran Lee,
Atefeh Mohajeri,
Karina Palyutina
Abstract:
We study the training performance of ROS local planners based on Reinforcement Learning (RL), and the trajectories they produce on real-world robots. We show that recent enhancements to the Soft Actor Critic (SAC) algorithm such as RAD and DrQ achieve almost perfect training after only 10000 episodes. We also observe that on real-world robots the resulting SACPlanner is more reactive to obstacles than traditional ROS local planners such as DWA.
Submitted 21 March, 2023;
originally announced March 2023.
-
Mimicking a Pathologist: Dual Attention Model for Scoring of Gigapixel Histology Images
Authors:
Manahil Raza,
Ruqayya Awan,
Raja Muhammad Saad Bashir,
Talha Qaiser,
Nasir M. Rajpoot
Abstract:
Some major challenges associated with the automated processing of whole slide images (WSIs) include their sheer size, different magnification levels and high resolution. Utilizing these images directly in AI frameworks is computationally expensive due to memory constraints, while downsampling WSIs incurs information loss and splitting WSIs into tiles and patches results in loss of important contextual information. We propose a novel dual attention approach, consisting of two main components, to mimic visual examination by a pathologist. The first component is a soft attention model which takes as input a high-level view of the WSI to determine various regions of interest. We employ a custom sampling method to extract diverse and spatially distinct image tiles from selected high attention areas. The second component is a hard attention classification model, which further extracts a sequence of multi-resolution glimpses from each tile for classification. Since hard attention is non-differentiable, we train this component using reinforcement learning and predict the location of glimpses without processing all patches of a given tile, thereby aligning with a pathologist's way of diagnosis. We train our components both separately and in an end-to-end fashion using a joint loss function to demonstrate the efficacy of our proposed model. We employ our proposed model on two different IHC use cases: HER2 prediction on breast cancer and prediction of Intact/Loss status of two MMR biomarkers for colorectal cancer. We show that the proposed model achieves accuracy comparable to state-of-the-art methods while only processing a small fraction of the WSI at highest magnification.
Submitted 19 February, 2023;
originally announced February 2023.
-
SimuShips -- A High Resolution Simulation Dataset for Ship Detection with Precise Annotations
Authors:
Minahil Raza,
Hanna Prokopova,
Samir Huseynzade,
Sepinoud Azimi,
Sebastien Lafond
Abstract:
Obstacle detection is a fundamental capability of an autonomous maritime surface vessel (AMSV). State-of-the-art obstacle detection algorithms are based on convolutional neural networks (CNNs). While CNNs provide higher detection accuracy and fast detection speed, they require enormous amounts of data for their training. In particular, the availability of domain-specific datasets is a challenge for obstacle detection. The difficulty in conducting on-site experiments limits the collection of maritime datasets. Owing to the logistic cost of conducting on-site operations, simulation tools provide a safe and cost-efficient alternative for data collection. In this work, we introduce SimuShips, a publicly available simulation-based dataset for maritime environments. Our dataset consists of 9471 high-resolution (1920x1080) images which include a wide range of obstacle types, atmospheric and illumination conditions along with occlusion, scale and visible proportion variations. We provide annotations in the form of bounding boxes. In addition, we conduct experiments with YOLOv5 to test the viability of simulation data. Our experiments indicate that the combination of real and simulated images improves the recall for all classes by 2.9%.
Submitted 22 September, 2022;
originally announced November 2022.
-
Multimodal Speech Enhancement Using Burst Propagation
Authors:
Mohsin Raza,
Leandro A. Passos,
Ahmed Khubaib,
Ahsan Adeel
Abstract:
This paper proposes MBURST, a novel multimodal solution for audio-visual speech enhancement that considers the most recent neurological discoveries regarding pyramidal cells of the prefrontal cortex and other brain regions. The so-called burst propagation implements several criteria to address the credit assignment problem in a more biologically plausible manner: steering the sign and magnitude of plasticity through feedback, multiplexing the feedback and feedforward information across layers through different weight connections, approximating feedback and feedforward connections, and linearizing the feedback signals. MBURST benefits from such capabilities to learn correlations between the noisy signal and the visual stimuli, thus attributing meaning to the speech by amplifying relevant information and suppressing noise. Experiments conducted over a Grid Corpus and CHiME3-based dataset show that MBURST can reproduce similar mask reconstructions to the multimodal backpropagation-based baseline while demonstrating outstanding energy efficiency management, reducing the neuron firing rates to values up to 70% lower. Such a feature implies more sustainable implementations, suitable and desirable for hearing aids or any other similar embedded systems.
Submitted 5 February, 2024; v1 submitted 7 September, 2022;
originally announced September 2022.
-
CORNET: Learning Table Formatting Rules By Example
Authors:
Mukul Singh,
José Cambronero,
Sumit Gulwani,
Vu Le,
Carina Negreanu,
Mohammad Raza,
Gust Verbruggen
Abstract:
Spreadsheets are widely used for table manipulation and presentation. Stylistic formatting of these tables is an important property for both presentation and analysis. As a result, popular spreadsheet software, such as Excel, supports automatically formatting tables based on rules. Unfortunately, writing such formatting rules can be challenging for users as it requires knowledge of the underlying rule language and data logic. We present CORNET, a system that tackles the novel problem of automatically learning such formatting rules from user examples in the form of formatted cells. CORNET takes inspiration from advances in inductive programming and combines symbolic rule enumeration with a neural ranker to learn conditional formatting rules. To motivate and evaluate our approach, we extracted tables with over 450K unique formatting rules from a corpus of over 1.8M real worksheets. Since we are the first to introduce conditional formatting, we compare CORNET to a wide range of symbolic and neural baselines adapted from related domains. Our results show that CORNET accurately learns rules across varying evaluation setups. Additionally, we show that CORNET finds shorter rules than those that a user has written and discovers rules in spreadsheets that users have manually formatted.
Submitted 5 December, 2022; v1 submitted 11 August, 2022;
originally announced August 2022.
-
Overwatch: Learning Patterns in Code Edit Sequences
Authors:
Yuhao Zhang,
Yasharth Bajpai,
Priyanshu Gupta,
Ameya Ketkar,
Miltiadis Allamanis,
Titus Barik,
Sumit Gulwani,
Arjun Radhakrishna,
Mohammad Raza,
Gustavo Soares,
Ashish Tiwari
Abstract:
Integrated Development Environments (IDEs) provide tool support to automate many source code editing tasks. Traditionally, IDEs use only the spatial context, i.e., the location where the developer is editing, to generate candidate edit recommendations. However, spatial context alone is often not sufficient to confidently predict the developer's next edit, and thus IDEs generate many suggestions at a location. Therefore, IDEs generally do not actively offer suggestions and instead, the developer is usually required to click on a specific icon or menu and then select from a large list of potential suggestions. As a consequence, developers often miss the opportunity to use the tool support because they are not aware it exists or forget to use it.
To better understand common patterns in developer behavior and produce better edit recommendations, we can additionally use the temporal context, i.e., the edits that a developer was recently performing. To enable edit recommendations based on temporal context, we present Overwatch, a novel technique for learning edit sequence patterns from traces of developers' edits performed in an IDE. Our experiments show that Overwatch has 78% precision and that Overwatch not only completed edits when developers missed the opportunity to use the IDE tool support but also predicted new edits that have no tool support in the IDE.
Submitted 25 July, 2022;
originally announced July 2022.
-
Context-sensitive neocortical neurons transform the effectiveness and efficiency of neural information processing
Authors:
Ahsan Adeel,
Mario Franco,
Mohsin Raza,
Khubaib Ahmed
Abstract:
Deep learning (DL) has big-data processing capabilities that are as good as, or even better than, those of humans in many real-world domains, but at the cost of high energy requirements that may be unsustainable in some applications and of errors that, though infrequent, can be large. We hypothesise that a fundamental weakness of DL lies in its intrinsic dependence on integrate-and-fire point neurons that maximise information transmission irrespective of whether it is relevant in the current context or not. This leads to unnecessary neural firing and to the feedforward transmission of conflicting messages, which makes learning difficult and processing energy inefficient. Here we show how to circumvent these limitations by mimicking the capabilities of context-sensitive neocortical neurons that receive input from diverse sources as a context to amplify and attenuate the transmission of relevant and irrelevant information, respectively. We demonstrate that a deep network composed of such local processors seeks to maximise agreement between the active neurons, thus restricting the transmission of conflicting information to higher levels and reducing the neural activity required to process large amounts of heterogeneous real-world data. The two-point neuron approach studied here is shown to be far more effective and efficient than current forms of DL, offering a possible step-change in transforming the cellular foundations of deep network architectures.
Submitted 4 April, 2023; v1 submitted 15 July, 2022;
originally announced July 2022.
-
Landmarks and Regions: A Robust Approach to Data Extraction
Authors:
Suresh Parthasarathy,
Lincy Pattanaik,
Anirudh Khatry,
Arun Iyer,
Arjun Radhakrishna,
Sriram Rajamani,
Mohammad Raza
Abstract:
We propose a new approach to extracting data items or field values from semi-structured documents. Examples of such problems include extracting passenger name, departure time and departure airport from a travel itinerary, or extracting price of an item from a purchase receipt. Traditional approaches to data extraction use machine learning or program synthesis to process the whole document to extract the desired fields. Such approaches are not robust to format changes in the document, and the extraction process typically fails even if changes are made to parts of the document that are unrelated to the desired fields of interest. We propose a new approach to data extraction based on the concepts of landmarks and regions. Humans routinely use landmarks in manual processing of documents to zoom in and focus their attention on small regions of interest in the document. Inspired by this human intuition, we use the notion of landmarks in program synthesis to automatically synthesize extraction programs that first extract a small region of interest, and then automatically extract the desired value from the region in a subsequent step. We have implemented our landmark-based extraction approach in a tool LRSyn, and show extensive evaluation on documents in HTML as well as scanned images of invoices and receipts. Our results show that our approach is robust to various types of format changes that routinely happen in real-world settings.
Submitted 11 April, 2022;
originally announced April 2022.
-
Cellular Segmentation and Composition in Routine Histology Images using Deep Learning
Authors:
Muhammad Dawood,
Raja Muhammad Saad Bashir,
Srijay Deshpande,
Manahil Raza,
Adam Shephard
Abstract:
Identification and quantification of nuclei in colorectal cancer haematoxylin & eosin (H&E) stained histology images is crucial to prognosis and patient management. In computational pathology these tasks are referred to as nuclear segmentation, classification and composition and are used to extract meaningful interpretable cytological and architectural features for downstream analysis. The CoNIC challenge poses the task of automated nuclei segmentation, classification and composition into six different types of nuclei from the largest publicly known nuclei dataset, Lizard. In this regard, we have developed pipelines for the prediction of nuclei segmentation using HoVer-Net and of cellular composition using ALBRT. On the preliminary test set, HoVer-Net achieved a PQ of 0.58, a PQ+ of 0.58 and finally an mPQ+ of 0.35. For the prediction of cellular composition with ALBRT on the preliminary test set, we achieved an overall $R^2$ score of 0.53, consisting of 0.84 for lymphocytes, 0.70 for epithelial cells, 0.70 for plasma and 0.060 for eosinophils.
Submitted 4 March, 2022;
originally announced March 2022.
-
Object-Centric Representation Learning with Generative Spatial-Temporal Factorization
Authors:
Li Nanbo,
Muhammad Ahmed Raza,
Hu Wenbin,
Zhaole Sun,
Robert B. Fisher
Abstract:
Learning object-centric scene representations is essential for attaining structural understanding and abstraction of complex scenes. Yet, as current approaches for unsupervised object-centric representation learning are built upon either a stationary observer assumption or a static scene assumption, they often: i) suffer from single-view spatial ambiguities, or ii) infer object representations from dynamic scenes incorrectly or inaccurately. To address this, we propose Dynamics-aware Multi-Object Network (DyMON), a method that broadens the scope of multi-view object-centric representation learning to dynamic scenes. We train DyMON on multi-view-dynamic-scene data and show that DyMON learns -- without supervision -- to factorize the entangled effects of observer motions and scene object dynamics from a sequence of observations, and constructs scene object spatial representations suitable for rendering at arbitrary times (querying across time) and from arbitrary viewpoints (querying across space). We also show that the factorized scene representations (w.r.t. objects) support querying about a single object by space and time independently.
Submitted 9 November, 2021;
originally announced November 2021.
-
Real-Time Trash Detection for Modern Societies using CCTV to Identifying Trash by utilizing Deep Convolutional Neural Network
Authors:
Syed Muhammad Raza,
Syed Muhammad Ghazi Hassan,
Syed Ali Hassan,
Soo Young Shin
Abstract:
This work aims to protect the environment from trash pollution, especially in societies, and to support strict action against people caught red-handed throwing trash. As modern societies develop, they need a modern solution to keep the environment clean. The evolution of artificial intelligence (AI), especially deep learning, gives an excellent opportunity to develop real-time trash detection using CCTV cameras. The focus of this project is real-time trash detection using a deep Convolutional Neural Network (CNN) model. It is used to detect eight classes: masks, tissue papers, shoppers, boxes, automobile parts, pampers, bottles, and juice boxes. After detecting the trash, the camera records a ten-second video of the person who threw it. The challenging part of this work was preparing a complex custom dataset, which was very time-consuming. The dataset consists of more than 2100 images. The dataset was labeled, and the CNN model was created and trained. Detection time, accuracy, and mean average precision (mAP) are used to benchmark both models' performance. In the experimental phase, the mAP and accuracy of the improved CNN model were superior in all aspects. The model is used on a CCTV camera to detect trash in real time.
Submitted 21 September, 2021; v1 submitted 20 September, 2021;
originally announced September 2021.
-
Multi-modal Program Inference: a Marriage of Pre-trained Language Models and Component-based Synthesis
Authors:
Kia Rahmani,
Mohammad Raza,
Sumit Gulwani,
Vu Le,
Daniel Morris,
Arjun Radhakrishna,
Gustavo Soares,
Ashish Tiwari
Abstract:
Multi-modal program synthesis refers to the task of synthesizing programs (code) from their specification given in different forms, such as a combination of natural language and examples. Examples provide a precise but incomplete specification, and natural language provides an ambiguous but more "complete" task description. Machine-learned pre-trained models (PTMs) are adept at handling ambiguous natural language, but struggle with generating syntactically and semantically precise code. Program synthesis techniques can generate correct code, often even from incomplete but precise specifications, such as examples, but they are unable to work with the ambiguity of natural languages. We present an approach that combines PTMs with component-based synthesis (CBS): PTMs are used to generate candidate programs from the natural language description of the task, which are then used to guide the CBS procedure to find the program that matches the precise examples-based specification. We use our combination approach to instantiate multi-modal synthesis systems for two programming domains: the domain of regular expressions and the domain of CSS selectors. Our evaluation demonstrates the effectiveness of our domain-agnostic approach in comparison to a state-of-the-art specialized system, and the generality of our approach in providing multi-modal program synthesis from natural language and examples in different programming domains.
Submitted 3 September, 2021;
originally announced September 2021.
-
Reliability Aware Multiple Path Installation in Software Defined Networking
Authors:
Syed Mohsan Raza
Abstract:
As a state-of-the-art networking paradigm, Software Defined Networking (SDN) decouples the control and management planes from the data plane of the forwarding devices by implementing both the control and management planes at a logically centralized entity, called the controller. This simplifies both network control and management. Link failures occur frequently in computer networks. To deal with link failures, existing approaches compute and install multiple paths for a flow at the switches in SDN without considering the reliability value of the primary path. This incurs extra computation to compute multiple paths, as well as increased computation time and traffic to install extra flow rules in the network. In this research work, we propose a new approach that calculates the link reliability and then installs a number of alternative paths based on the reliability value of the primary path. More specifically, if a primary path has higher reliability, then a smaller number of alternative paths is installed. This decreases the path computation time and the flow rule installation load at the controller. As a result, there are fewer flow rule entries in the switch flow table, which in turn helps avoid overflow of the flow table. Simulation results show that our proposed approach performs better than the existing approach in terms of computational overhead at the controller, end-to-end delay for packet delivery, and traffic overhead for flow rule installation.
Submitted 17 December, 2020;
originally announced January 2021.
-
Binary Black-box Evasion Attacks Against Deep Learning-based Static Malware Detectors with Adversarial Byte-Level Language Model
Authors:
Mohammadreza Ebrahimi,
Ning Zhang,
James Hu,
Muhammad Taqi Raza,
Hsinchun Chen
Abstract:
Anti-malware engines are the first line of defense against malicious software. While widely used, feature engineering-based anti-malware engines are vulnerable to unseen (zero-day) attacks. Recently, deep learning-based static anti-malware detectors have achieved success in identifying unseen attacks without requiring feature engineering and dynamic analysis. However, these detectors are susceptible to malware variants with slight perturbations, known as adversarial examples. Generating effective adversarial examples is useful to reveal the vulnerabilities of such systems. Current methods for launching such attacks require accessing either the specifications of the targeted anti-malware model, the confidence score of the anti-malware response, or dynamic malware analysis, which are either unrealistic or expensive. We propose MalRNN, a novel deep learning-based approach to automatically generate evasive malware variants without any of these restrictions. Our approach features an adversarial example generation process, which learns a language model via a generative sequence-to-sequence recurrent neural network to augment malware binaries. MalRNN effectively evades three recent deep learning-based malware detectors and outperforms current benchmark methods. Findings from applying our MalRNN on a real dataset with eight malware categories are discussed.
Submitted 14 December, 2020;
originally announced December 2020.
-
Malware Traffic Classification: Evaluation of Algorithms and an Automated Ground-truth Generation Pipeline
Authors:
Syed Muhammad Kumail Raza,
Juan Caballero
Abstract:
Identifying threats in an encrypted network traffic flow is uniquely challenging. On one hand, it is extremely difficult to simply decrypt the traffic due to modern encryption algorithms. On the other hand, passing such an encrypted stream through pattern matching algorithms is useless because encryption ensures there aren't any patterns to match. Moreover, evaluating such models is also difficult due to the lack of labeled benign and malware datasets. Other approaches have tried to tackle this problem by employing observable meta-data gathered from the flow. We try to augment this approach by extending it to a semi-supervised malware classification pipeline using these observable meta-data. To this end, we explore and test different kinds of clustering approaches which make use of a unique and diverse set of features extracted from this observable meta-data. We also propose an automated packet data-labeling pipeline to generate ground-truth data which can serve as a baseline to evaluate the classifiers mentioned above in particular, or any other detection model in general.
Submitted 7 November, 2020; v1 submitted 22 October, 2020;
originally announced October 2020.
-
Signature-based Non-orthogonal Multiple Access (S-NOMA) for Massive Machine-Type Communications in 5G
Authors:
Mostafa Mohammadkarimi,
Muhammad Ahmad Raza,
Octavia A. Dobre
Abstract:
The problem of providing massive connectivity in Internet-of-Things (IoT) with a limited number of available resources motivates the non-orthogonal multiple access (NOMA) solutions. In this article, we provide a comprehensive review of the signature-based NOMA (S-NOMA) schemes as potential candidates for IoT. The signature in S-NOMA represents the way the data stream of an active device is spread over available resources in a non-orthogonal manner. It can be designed based on device-specific codebook structures, delay patterns, spreading sequences, interleaving patterns, and scrambling sequences. Additionally, we present the detection algorithms employed to decode each device's data from non-orthogonally superimposed signals at the receiver. The bit error rate of different S-NOMA schemes is simulated in impulsive noise environments, which can be important in machine-type communications. Simulation results show that the performance of the S-NOMA schemes degrades under such conditions. Finally, research challenges in S-NOMA oriented IoT are presented.
Submitted 21 August, 2018;
originally announced August 2018.
-
Interactive Program Synthesis
Authors:
Vu Le,
Daniel Perelman,
Oleksandr Polozov,
Mohammad Raza,
Abhishek Udupa,
Sumit Gulwani
Abstract:
Program synthesis from incomplete specifications (e.g. input-output examples) has gained popularity and found real-world applications, primarily due to its ease-of-use. Since this technology is often used in an interactive setting, efficiency and correctness are often the key user expectations from a system based on such technologies. Ensuring efficiency is challenging since the highly combinatorial nature of program synthesis algorithms does not fit in a 1-2 second response expectation of a user-facing system. Meeting correctness expectations is also difficult, given that the specifications provided are incomplete, and that the users of such systems are typically non-programmers.
In this paper, we describe how interactivity can be leveraged to develop efficient synthesis algorithms, as well as to decrease the cognitive burden that a user endures trying to ensure that the system produces the desired program. We build a formal model of user interaction along three dimensions: incremental algorithm, step-based problem formulation, and feedback-based intent refinement. We then illustrate the effectiveness of each of these forms of interactivity with respect to synthesis performance and correctness on a set of real-world case studies.
Submitted 9 March, 2017;
originally announced March 2017.
-
New Threats to SMS-Assisted Mobile Internet Services from 4G LTE: Lessons Learnt from Distributed Mobile-Initiated Attacks towards Facebook and Other Services
Authors:
Guan-Hua Tu,
Yuanjie Li,
Chunyi Peng,
Chi-Yu Li,
Muhammad Taqi Raza,
Hsiao-Yun Tseng,
Songwu Lu
Abstract:
Mobile Internet is becoming the norm. With more personalized mobile devices in hand, many services choose to offer alternative, usually more convenient, approaches to authenticating and delivering the content between mobile users and service providers. One main option is to use SMS (i.e., short messaging service). Such carrier-grade text service has been widely used to assist versatile mobile services, including social networking and banking, to name a few. Though the text service can be spoofed via certain Internet text service providers which cooperate with carriers, such attacks have been well studied and defended against by industry due to the efforts of the research community. However, as cellular network technology advances to the latest IP-based 4G LTE, we find that these mobile services are exposed to new threats raised by this change, particularly on the 4G LTE Text service (via a brand-new distributed Mobile-Initiated Spoofed SMS attack which is not available in legacy 2G/3G systems). The reason is that messaging service over LTE shifts from the circuit-switched (CS) design to the packet-switched (PS) paradigm, as 4G LTE supports PS only. Due to this change, the 4G LTE Text Service becomes open to access. However, its shields for messaging integrity and user authentication are not in place. As a consequence, such weaknesses can be exploited to launch attacks (e.g., hijacking Facebook accounts) against a targeted individual, a large number of mobile users, and even service providers, from mobile devices. Current defenses for Internet-Initiated Spoofed SMS attacks cannot defend against this unprecedented attack. Our study shows that 53 of 64 mobile services over 27 industries are vulnerable to at least one threat. We validate these proof-of-concept attacks in one major US carrier which supports more than 100 million users. We finally propose quick fixes and discuss security insights and lessons we have learnt.
Submitted 31 October, 2015; v1 submitted 28 October, 2015;
originally announced October 2015.
-
On the Deployment of Cognitive Relay as Underlay Systems
Authors:
Ankit Kaushik,
M. Rehan Raza,
Friedrich K. Jondral
Abstract:
The objective of this paper is to extend the idea of Cognitive Relay (CR). CR, as a secondary user, follows an underlay paradigm to endorse secondary usage of the spectrum to the indoor devices. To seek a spatial opportunity, i.e., deciding its transmission over the primary user channels, CR models its deployment scenario and the movements of the primary receivers and indoor devices. Modeling is beneficial for theoretical analysis; however, it is also important to ensure the performance of CR in a real scenario. We briefly consider the challenges involved in deploying a hardware prototype of such a system.
Submitted 18 April, 2014;
originally announced April 2014.
-
Minimizing Electricity Theft using Smart Meters in AMI
Authors:
M. Anas,
N. Javaid,
A. Mahmood,
S. M. Raza,
U. Qasim,
Z. A. Khan
Abstract:
Global energy crises are increasing every moment. Everyone is focused on producing more and more energy and also on trying to save it. Electricity can be produced in many ways and is then synchronized on a main grid for usage. The main issue addressed in this survey paper is losses in the electrical system, whether these losses are technical or non-technical. Technical losses can be calculated easily, as we discuss in the section on mathematical modeling. Non-technical losses, in turn, can be evaluated if technical losses are known. Electricity theft produces non-technical losses. By reducing or controlling theft, one can save economic resources. A smart meter can be the best option to minimize electricity theft because of its high security, efficiency, and excellent resistance to many of the theft techniques used against electromechanical meters. In this paper, we therefore concentrate mostly on theft issues.
Submitted 11 August, 2012;
originally announced August 2012.
-
Footprints in Local Reasoning
Authors:
Mohammad Raza,
Philippa Gardner
Abstract:
Local reasoning about programs exploits the natural local behaviour common in programs by focussing on the footprint - that part of the resource accessed by the program. We address the problem of formally characterising and analysing the footprint notion for abstract local functions introduced by Calcagno, O'Hearn and Yang. With our definition, we prove that the footprints are the only essential elements required for a complete specification of a local function. We formalise the notion of small specifications in local reasoning and show that for well-founded resource models, a smallest specification always exists that only includes the footprints, and also present results for the non-well-founded case. Finally, we use this theory of footprints to investigate the conditions under which the footprints correspond to the smallest safe states. We present a new model of RAM in which, unlike the standard model, the footprints of every program correspond to the smallest safe states, and we also identify a general condition on the primitive commands of a programming language which guarantees this property for arbitrary models.
Submitted 24 April, 2009; v1 submitted 5 March, 2009;
originally announced March 2009.