ALPINE: An adaptive language-agnostic pruning method for language models for code

Authors: Mootez Saad, José Antonio Hernández López, Boqi Chen, Dániel Varró, Tushar Sharma

Abstract: Language models of code have demonstrated state-of-the-art performance across various software engineering and source code analysis tasks. However, their demanding computational resource requirements and consequential environmental footprint remain significant challenges. This work introduces ALPINE, an adaptive programming language-agnostic pruning technique designed to substantially reduce these models' computational overhead. The proposed method offers a pluggable layer that can be integrated with all Transformer-based models. With ALPINE, input sequences undergo adaptive compression throughout the pipeline, reaching a size up to $3\times$ smaller than their initial size, resulting in significantly reduced computational load. Our experiments on two software engineering tasks, defect prediction and code clone detection, across three language models (CodeBERT, GraphCodeBERT, and UniXCoder) show that ALPINE achieves up to a 50% reduction in FLOPs, a 58.1% decrease in memory footprint, and a 28.1% improvement in throughput on average. This led to a reduction in CO2 emissions of up to 44.85%. Importantly, it achieves the reduction in computational resources while maintaining up to 98.1% of the original predictive performance. These findings highlight the potential of ALPINE in making language models of code more resource-efficient and accessible while preserving their performance, contributing to the overall sustainability of adopting language models in software development. It also sheds light on redundant and noisy information in source code analysis corpora, as shown by the substantial sequence compression achieved by ALPINE.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.10461 [pdf, ps, other]

Exploring Parent-Child Perceptions on Safety in Generative AI: Concerns, Mitigation Strategies, and Design Implications

Authors: Yaman Yu, Tanusree Sharma, Melinda Hu, Justin Wang, Yang Wang

Abstract: The widespread use of Generative Artificial Intelligence (GAI) among teenagers has led to significant misuse and safety concerns. To identify risks and understand parental control challenges, we conducted a content analysis on Reddit and interviewed 20 participants (seven teenagers and 13 parents). Our study reveals a significant gap in parental awareness of the extensive ways children use GAI, such as interacting with character-based chatbots for emotional support or engaging in virtual relationships. Parents and children report differing perceptions of risks associated with GAI. Parents primarily express concerns about data collection, misinformation, and exposure to inappropriate content. In contrast, teenagers are more concerned about becoming addicted to virtual relationships with GAI, the potential misuse of GAI to spread harmful content in social groups, and the invasion of privacy due to unauthorized use of their personal data in GAI applications. The absence of parental control features on GAI platforms forces parents to rely on system-built controls, manually check histories, share accounts, and engage in active mediation. Despite these efforts, parents struggle to grasp the full spectrum of GAI-related risks and to perform effective real-time monitoring, mediation, and education. We provide design recommendations to improve parent-child communication and enhance the safety of GAI use.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 13 pages

arXiv:2405.11138 [pdf, other]

Spatial Models for Crowdsourced Internet Access Network Performance Measurements

Authors: Taveesh Sharma, Paul Schmitt, Francesco Bronzino, Nick Feamster, Nicole Marwell

Abstract: Despite significant investments in access network infrastructure, universal access to high-quality Internet connectivity remains a challenge. Policymakers often rely on large-scale, crowdsourced measurement datasets to assess the distribution of access network performance across geographic areas. These decisions typically rest on the assumption that Internet performance is uniformly distributed within predefined social boundaries, such as zip codes, census tracts, or community areas. However, this assumption may not be valid for two reasons: (1) crowdsourced measurements often exhibit non-uniform sampling densities within geographic areas; and (2) predefined social boundaries may not align with the actual boundaries of Internet infrastructure. In this paper, we model Internet performance as a spatial process. We apply and evaluate a series of statistical techniques to: (1) aggregate Internet performance over a geographic region; (2) overlay interpolated maps with various sampling boundary choices; and (3) spatially cluster boundary units to identify areas with similar performance characteristics. We evaluate the effectiveness of these techniques using a 17-month-long crowdsourced dataset from Ookla Speedtest. We evaluate several leading interpolation methods at varying spatial scales. Further, we examine the similarity between the resulting boundaries for smaller realizations of the dataset. Our findings suggest that our combination of techniques achieves a 56% gain in similarity score over traditional methods that rely on aggregates over raw measurement values for performance summarization. Our work highlights an urgent need for more sophisticated strategies in understanding and addressing Internet access disparities.

Submitted 21 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: 13 pages
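
The entry above mentions interpolating Internet performance from unevenly sampled measurements but does not name the interpolation methods used. As a rough illustration of the general idea only, the following sketch uses inverse distance weighting (IDW), a well-known interpolation technique chosen here as an assumption and not taken from the paper. The function name `idw`, the `power` parameter, and the sample values are all hypothetical.

```python
import math


def idw(samples, query, power=2.0):
    """Estimate a value at `query` from (x, y, value) samples.

    Each sample is weighted by 1 / distance**power, so nearby
    measurements influence the estimate more than distant ones.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        d = math.dist((x, y), query)
        if d == 0.0:
            # The query coincides with a sample: return it directly.
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical download speeds (Mbps) measured at two locations.
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
midpoint_estimate = idw(samples, (1.0, 0.0))
exact_estimate = idw(samples, (0.0, 0.0))
```

A query point equidistant from both samples receives their plain average, while a query on top of a sample returns that sample's value unchanged.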

arXiv:2405.02790 [pdf, other]

Confidential and Protected Disease Classifier using Fully Homomorphic Encryption

Authors: Aditya Malik, Nalini Ratha, Bharat Yalavarthi, Tilak Sharma, Arjun Kaushik, Charanjit Jutla

Abstract: With the rapid surge in the prevalence of Large Language Models (LLMs), individuals are increasingly turning to conversational AI for initial insights across various domains, including health-related inquiries such as disease diagnosis. Many users seek potential causes on platforms like ChatGPT or Bard before consulting a medical professional for their ailment. These platforms offer valuable benefits by streamlining the diagnosis process, alleviating the significant workload of healthcare practitioners, and saving users both time and money by avoiding unnecessary doctor visits. However, despite the convenience of such platforms, sharing personal medical data online poses risks, including the presence of malicious platforms or potential eavesdropping by attackers. To address privacy concerns, we propose a novel framework combining FHE and Deep Learning for a secure and private diagnosis system. Operating on a question-and-answer-based model akin to an interaction with a medical practitioner, this end-to-end secure system employs Fully Homomorphic Encryption (FHE) to handle encrypted input data. Given FHE's computational constraints, we adapt deep neural networks and activation functions to the encrypted domain. We also propose a faster algorithm to compute the summation of ciphertext elements. Through rigorous experiments, we demonstrate the efficacy of our approach. The proposed framework achieves strict security and privacy with minimal loss in performance.

Submitted 4 May, 2024; originally announced May 2024.

arXiv:2402.01841 [pdf, other]

COMET: Generating Commit Messages using Delta Graph Context Representation

Authors: Abhinav Reddy Mandli, Saurabhsingh Rajput, Tushar Sharma

Abstract: Commit messages explain code changes in a commit and facilitate collaboration among developers. Several commit message generation approaches have been proposed; however, they exhibit limited success in capturing the context of code changes. We propose Comet (Context-Aware Commit Message Generation), a novel approach that captures the context of code changes using a graph-based representation and leverages a transformer-based model to generate high-quality commit messages. Our proposed method utilizes a delta graph that we developed to effectively represent code differences. We also introduce a customizable quality assurance module to identify optimal messages, mitigating subjectivity in commit messages. Experiments show that Comet outperforms state-of-the-art techniques in terms of bleu-norm and meteor metrics while being comparable in terms of rouge-l. Additionally, we compare the proposed approach with the popular gpt-3.5-turbo model, along with gpt-4-turbo, the most capable GPT model, over zero-shot, one-shot, and multi-shot settings. We found that Comet outperforms the GPT models on five and four metrics, respectively, and provides competitive results on the two other metrics. The study has implications for researchers, tool developers, and software developers. Software developers may utilize Comet to generate context-aware commit messages. Researchers and tool developers can apply the proposed delta graph technique in similar contexts, like code review summarization.

Submitted 2 February, 2024; originally announced February 2024.

Comments: 22 Pages, 7 Figures

arXiv:2401.17967 [pdf, other]

CONCORD: Towards a DSL for Configurable Graph Code Representation

Authors: Mootez Saad, Tushar Sharma

Abstract: Deep learning is widely used to uncover hidden patterns in large code corpora. To achieve this, constructing a format that captures the relevant characteristics and features of source code is essential. Graph-based representations have gained attention for their ability to model structural and semantic information. However, existing tools lack flexibility in constructing graphs across different programming languages, limiting their use. Additionally, the output of these tools often lacks interoperability and results in excessively large graphs, making the training of graph-based neural networks slower and less scalable. We introduce CONCORD, a domain-specific language to build customizable graph representations. It implements reduction heuristics to reduce graphs' size complexity. We demonstrate its effectiveness in code smell detection as an illustrative use case and show that: first, CONCORD can produce code representations automatically per the specified configuration, and second, our heuristics can achieve comparable performance with significantly reduced size. CONCORD will help researchers a) create and experiment with customizable graph-based code representations for different software engineering tasks involving DL, b) reduce the engineering work to generate graph representations, c) address the issue of scalability in GNN models, and d) enhance the reproducibility of experiments in research through a standardized approach to code representation and analysis.

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.07930 [pdf, other]

On Inter-dataset Code Duplication and Data Leakage in Large Language Models

Authors: José Antonio Hernández López, Boqi Chen, Tushar Sharma, Dániel Varró

Abstract: Motivation. Large language models (LLMs) have exhibited remarkable proficiency in diverse software engineering (SE) tasks. Handling such tasks typically involves acquiring foundational coding knowledge on large, general-purpose datasets during a pre-training phase, and subsequently refining on smaller, task-specific datasets as part of a fine-tuning phase. Problem statement. Data leakage is a well-known issue in training of machine learning models. A manifestation of this issue is the intersection of the training and testing splits. While intra-dataset code duplication examines this intersection within a given dataset and has been addressed in prior research, inter-dataset code duplication, which gauges the overlap between different datasets, remains largely unexplored. If this phenomenon exists, it could compromise the integrity of LLM evaluations because of the inclusion of fine-tuning test samples that were already encountered during pre-training, resulting in inflated performance metrics. Contribution. This paper explores the phenomenon of inter-dataset code duplication and its impact on evaluating LLMs across diverse SE tasks. Study design. We conduct an empirical study using the CSN dataset, a widely adopted pre-training dataset, and five fine-tuning datasets used for various SE tasks. We first identify the intersection between the pre-training and fine-tuning datasets using a deduplication process. Then, we fine-tune four models pre-trained on CSN to evaluate their performance on samples encountered during pre-training and those unseen during that phase. Results. Our findings reveal a potential threat to the evaluation of various LLMs across multiple SE tasks, stemming from the inter-dataset code duplication phenomenon. Moreover, we demonstrate that this threat is accentuated by factors like the LLM's size and the chosen fine-tuning technique.

Submitted 15 January, 2024; originally announced January 2024.
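
The study design in the entry above splits fine-tuning test samples into those already present in the pre-training data and those that are not. The abstract does not describe the deduplication process itself, so the following is only a minimal sketch under a strong simplifying assumption: two snippets count as duplicates when they are identical after whitespace normalization. The helper names (`normalize`, `fingerprint`, `split_seen_unseen`) and the toy snippets are hypothetical.

```python
import hashlib
import re


def normalize(code: str) -> str:
    """Collapse whitespace runs so formatting differences are ignored."""
    return re.sub(r"\s+", " ", code).strip()


def fingerprint(code: str) -> str:
    """Hash the normalized snippet to get a compact comparison key."""
    return hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()


def split_seen_unseen(pretrain, finetune_test):
    """Partition test samples by whether a normalized copy exists in pretrain."""
    seen_keys = {fingerprint(s) for s in pretrain}
    seen = [s for s in finetune_test if fingerprint(s) in seen_keys]
    unseen = [s for s in finetune_test if fingerprint(s) not in seen_keys]
    return seen, unseen


# Toy data standing in for a pre-training corpus and a fine-tuning test split.
pretrain = [
    "def add(a, b):\n    return a + b",
    "def sub(a, b):\n    return a - b",
]
finetune_test = [
    "def add(a, b):  return a + b",   # same code, different formatting
    "def mul(a, b):\n    return a * b",
]
seen, unseen = split_seen_unseen(pretrain, finetune_test)
```

Evaluating a model separately on `seen` and `unseen` is what would expose any inflation caused by leakage. Near-duplicate detection (for example, token-level similarity) would catch more overlap than this exact-match sketch.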

arXiv:2312.15896 [pdf, other]

WWW: What, When, Where to Compute-in-Memory

Authors: Tanvi Sharma, Mustafa Ali, Indranil Chakraborty, Kaushik Roy

Abstract: Compute-in-memory (CiM) has emerged as a highly energy efficient solution for performing matrix multiplication during Machine Learning (ML) inference. However, integrating compute in memory poses key questions, such as 1) What type of CiM to use: Given a multitude of CiM design characteristics, determining their suitability from an architecture perspective is needed. 2) When to use CiM: ML inference includes workloads with a variety of memory and compute requirements, making it difficult to identify when CiM is more beneficial. 3) Where to integrate CiM: Each memory level has different bandwidth and capacity, creating different data reuse opportunities for CiM integration. To answer such questions regarding on-chip CiM integration for accelerating ML workloads, we use an analytical architecture evaluation methodology where we tailor the dataflow mapping. The mapping algorithm aims to achieve the highest weight reuse and reduced data movements for a given CiM prototype and workload. Our experiments show that CiM integrated memory improves energy efficiency by up to 3.4x and throughput by up to 15.6x compared to a tensor-core-like baseline architecture, with INT-8 precision under iso-area constraints. We believe the proposed work provides insights into what type of CiM to use, and when and where to optimally integrate it in the cache hierarchy for efficient matrix multiplication.

Submitted 20 June, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

Comments: updated methodology

arXiv:2311.13508 [pdf, other]

Naturalness of Attention: Revisiting Attention in Code Language Models

Authors: Mootez Saad, Tushar Sharma

Abstract: Language models for code such as CodeBERT offer the capability to learn advanced source code representation, but their opacity poses barriers to understanding of captured properties. Recent attention analysis studies provide initial interpretability insights by focusing solely on attention weights rather than considering the wider context modeling of Transformers. This study aims to shed some light on the previously ignored factors of the attention mechanism beyond the attention weights. We conduct an initial empirical study analyzing both attention distributions and transformed representations in CodeBERT. Across two programming languages, Java and Python, we find that the scaled transformation norms of the input better capture syntactic structure compared to attention weights alone. Our analysis characterizes how CodeBERT embeds syntactic code properties. The findings demonstrate the importance of incorporating factors beyond just attention weights for rigorously understanding neural code models. This lays the groundwork for developing more interpretable models and effective uses of attention mechanisms in program analysis.

Submitted 22 November, 2023; originally announced November 2023.

Comments: Accepted at ICSE-NIER (2024) track

arXiv:2310.02859 [pdf, other]

Tight Sampling in Unbounded Networks

Authors: Kshitijaa Jaglan, Meher Chaitanya, Triansh Sharma, Abhijeeth Singam, Nidhi Goyal, Ponnurangam Kumaraguru, Ulrik Brandes

Abstract: The default approach to deal with the enormous size and limited accessibility of many Web and social media networks is to sample one or more subnetworks from a conceptually unbounded unknown network. Clearly, the extracted subnetworks will crucially depend on the sampling scheme. Motivated by studies of homophily and opinion formation, we propose a variant of snowball sampling designed to prioritize inclusion of entire cohesive communities rather than any kind of representativeness, breadth, or depth of coverage. The method is illustrated on a concrete example, and experiments on synthetic networks suggest that it behaves as desired.

Submitted 5 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: The first two authors contributed equally

arXiv:2309.12583 [pdf]

doi 10.1145/3571884.3603755

Using ChatGPT in HCI Research -- A Trioethnography

Authors: Smit Desai, Tanusree Sharma, Pratyasha Saha

Abstract: This paper explores the lived experience of using ChatGPT in HCI research through a month-long trioethnography. Our approach combines the expertise of three HCI researchers with diverse research interests to reflect on our daily experience of living and working with ChatGPT. Our findings are presented as three provocations grounded in our collective experiences and HCI theories. Specifically, we examine (1) the emotional impact of using ChatGPT, with a focus on frustration and embarrassment, (2) the absence of accountability and consideration of future implications in design, and raise (3) questions around bias from a Global South perspective. Our work aims to inspire critical discussions about utilizing ChatGPT in HCI research and advance equitable and inclusive technological development.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2308.12264 [pdf, other]

Enhancing Energy-Awareness in Deep Learning through Fine-Grained Energy Measurement

Authors: Saurabhsingh Rajput, Tim Widmayer, Ziyuan Shang, Maria Kechagia, Federica Sarro, Tushar Sharma

Abstract: With the increasing usage, scale, and complexity of Deep Learning (DL) models, their rapidly growing energy consumption has become a critical concern. Promoting green development and energy awareness at different granularities is the need of the hour to limit carbon emissions of DL systems. However, the lack of standard and repeatable tools to accurately measure and optimize energy consumption at a fine granularity (e.g., at method level) hinders progress in this area. This paper introduces FECoM (Fine-grained Energy Consumption Meter), a framework for fine-grained DL energy consumption measurement. FECoM enables researchers and developers to profile DL APIs from an energy perspective. FECoM addresses the challenges of measuring energy consumption at a fine-grained level by using static instrumentation and considering various factors, including computational load and temperature stability. We assess FECoM's capability to measure fine-grained energy consumption for one of the most popular open-source DL frameworks, namely TensorFlow. Using FECoM, we also investigate the impact of parameter size and execution time on energy consumption, enriching our understanding of TensorFlow APIs' energy profiles. Furthermore, we elaborate on the considerations, issues, and challenges that one needs to consider while designing and implementing a fine-grained energy consumption measurement tool. This work will facilitate further advances in DL energy measurement and the development of energy-aware practices for DL systems.

Submitted 1 February, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.12199 [pdf, other]

Towards Real-Time Analysis of Broadcast Badminton Videos

Authors: Nitin Nilesh, Tushar Sharma, Anurag Ghosh, C. V. Jawahar

Abstract: Analysis of player movements is a crucial subset of sports analysis. Existing player movement analysis methods use recorded videos after the match is over. In this work, we propose an end-to-end framework for player movement analysis for badminton matches on live broadcast match videos. Unlike other approaches, which use multi-modal sensor data, our approach uses only the visual inputs from the match. We propose a method to calculate the on-court distance covered by both players from the video feed of a live broadcast badminton match. To perform this analysis, we focus on the gameplay by removing replays and other redundant parts of the broadcast match. We then perform player tracking to identify and track the movements of both players in each frame. Finally, we calculate the distance covered by each player and the average speed with which they move on the court. We further show a heatmap of the areas covered by the player on the court, which is useful for analyzing the gameplay of the player. Our proposed framework was successfully used to analyze live broadcast matches in real-time during the Premier Badminton League 2019 (PBL 2019), with commentators and broadcasters appreciating the utility.

Submitted 23 August, 2023; originally announced August 2023.
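
The final step described in the entry above, computing distance covered and average speed from per-frame player positions, can be sketched as follows. This is an illustrative sketch only: it assumes the tracker has already produced one court-plane coordinate in metres per frame, and the function name `distance_and_speed`, the `fps` parameter, and the sample track are hypothetical.

```python
import math


def distance_and_speed(positions, fps):
    """Return (total distance in metres, average speed in m/s).

    positions: list of (x, y) court coordinates, one per video frame.
    fps: frame rate of the footage, used to convert frames to seconds.
    """
    # Sum the straight-line distance between consecutive frames.
    total = sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))
    duration = (len(positions) - 1) / fps
    speed = total / duration if duration > 0 else 0.0
    return total, speed


# Hypothetical track: the player moves, pauses for one frame, moves again.
track = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0)]
total, speed = distance_and_speed(track, fps=1.0)
```

In practice the pixel coordinates from tracking would first need to be mapped onto the court plane, and tracker jitter would inflate the summed distance unless the positions are smoothed.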

arXiv:2308.06882 [pdf, other]

Quantifying Outlierness of Funds from their Categories using Supervised Similarity

Authors: Dhruv Desai, Ashmita Dhiman, Tushar Sharma, Deepika Sharma, Dhagash Mehta, Stefano Pasquali

Abstract: Mutual fund categorization has become a standard tool for the investment management industry and is extensively used by allocators for portfolio construction and manager selection, as well as by fund managers for peer analysis and competitive positioning. As a result, an (unintended) miscategorization or lack of precision can significantly impact allocation decisions and investment fund managers. Here, we aim to quantify the effect of miscategorization of funds utilizing a machine learning based approach. We formulate the problem of miscategorization of funds as a distance-based outlier detection problem, where the outliers are the data-points that are far from the rest of the data-points in the given feature space. We implement and employ a Random Forest (RF) based method of distance metric learning, and compute the so-called class-wise outlier measures for each data-point to identify outliers in the data. We test our implementation on various publicly available data sets, and then apply it to mutual fund data. We show that there is a strong relationship between the outlier measures of the funds and their future returns and discuss the implications of our findings.

Submitted 13 August, 2023; originally announced August 2023.

Comments: 8 pages, 5 tables, 8 figures

arXiv:2307.08652 [pdf, other]

Search Me Knot, Render Me Knot: Embedding Search and Differentiable Rendering of Knots in 3D

Authors: Aalok Gangopadhyay, Paras Gupta, Tarun Sharma, Prajwal Singh, Shanmuganathan Raman

Abstract: We introduce the problem of knot-based inverse perceptual art. Given multiple target images and their corresponding viewing configurations, the objective is to find a 3D knot-based tubular structure whose appearance resembles the target images when viewed from the specified viewing configurations. To solve this problem, we first design a differentiable rendering algorithm for rendering tubular knots embedded in 3D for arbitrary perspective camera configurations. Utilizing this differentiable rendering algorithm, we search over the space of knot configurations to find the ideal knot embedding. We represent the knot embeddings via homeomorphisms of the desired template knot, where the homeomorphisms are parametrized by the weights of an invertible neural network. Our approach is fully differentiable, making it possible to find the ideal 3D tubular structure for the desired perceptual art using gradient-based optimization. We propose several loss functions that impose additional physical constraints, enforcing that the tube is free of self-intersection, lies within a predefined region in space, satisfies the physical bending limits of the tube material, and keeps the material cost within a specified budget. We demonstrate through results that our knot representation is highly expressive and gives impressive results even for challenging target images under both single-view and multiple-view constraints. Through an extensive ablation study, we show that each of the proposed loss functions is effective in ensuring physical realizability. We construct a real world 3D-printed object to demonstrate the practical utility of our approach. To the best of our knowledge, we are the first to propose a fully differentiable optimization framework for knot-based inverse perceptual art.

Submitted 19 August, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

arXiv:2306.06261 [pdf, other]

Iterative Design of An Accessible Crypto Wallet for Blind Users

Authors: Zhixuan Zhou, Tanusree Sharma, Luke Emano, Sauvik Das, Yang Wang

Abstract: Crypto wallets are a key touch-point for cryptocurrency use. People use crypto wallets to make transactions, manage crypto assets, and interact with decentralized apps (dApps). However, as is often the case with emergent technologies, little attention has been paid to understanding and improving accessibility barriers in crypto wallet software. We present a series of user studies that explored how both blind and sighted individuals use MetaMask, one of the most popular non-custodial crypto wallets. We uncovered inter-related accessibility, learnability, and security issues with MetaMask. We also report on an iterative redesign of MetaMask to make it more accessible for blind users. This process involved multiple evaluations with 44 novice crypto wallet users, including 20 sighted users, 23 blind users, and one user with low vision. Our study results show notable improvements for accessibility after two rounds of design iterations. Based on the results, we discuss design implications for creating more accessible and secure crypto wallets for blind users.

Submitted 9 June, 2023; originally announced June 2023.

Comments: 19th Symposium on Usable Privacy and Security

arXiv:2306.01194 [pdf, other]

doi 10.1145/3618257.3624828

Estimating WebRTC Video QoE Metrics Without Using Application Headers

Authors: Taveesh Sharma, Tarun Mangla, Arpit Gupta, Junchen Jiang, Nick Feamster

Abstract: The increased use of video conferencing applications (VCAs) has made it critical to understand and support end-user quality of experience (QoE) by all stakeholders in the VCA ecosystem, especially network operators, who typically do not have direct access to client software. Existing VCA QoE estimation methods use passive measurements of application-level Real-time Transport Protocol (RTP) headers. However, a network operator does not always have access to RTP headers, particularly when VCAs use custom RTP protocols (e.g., Zoom) or due to system constraints (e.g., legacy measurement systems). Given this challenge, this paper considers the use of more standard features in the network traffic, namely, IP and UDP headers, to provide per-second estimates of key VCA QoE metrics such as frame rate and video resolution. We develop a method that uses machine learning with a combination of flow statistics (e.g., throughput) and features derived based on the mechanisms used by the VCAs to fragment video frames into packets. We evaluate our method for three prevalent VCAs running over WebRTC: Google Meet, Microsoft Teams, and Cisco Webex. Our evaluation consists of 54,696 seconds of VCA data collected from both (1) controlled in-lab network conditions and (2) real-world networks from 15 households. We show that the ML-based approach yields similar accuracy compared to the RTP-based methods, despite using only IP/UDP data. For instance, we can estimate FPS within 2 FPS for up to 83.05% of one-second intervals in the real-world data, which is only 1.76% lower than using the application-level RTP headers.

Submitted 9 November, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 16 pages

arXiv:2304.09822 [pdf, other]

Unpacking How Decentralized Autonomous Organizations (DAOs) Work in Practice

Authors: Tanusree Sharma, Yujin Kwon, Kornrapat Pongmala, Henry Wang, Andrew Miller, Dawn Song, Yang Wang

Abstract: Decentralized Autonomous Organizations (DAOs) have emerged as a novel way to coordinate a group of (pseudonymous) entities towards a shared vision (e.g., promoting sustainability), utilizing self-executing smart contracts on blockchains to support decentralized governance and decision-making. In just a few years, over 4,000 DAOs have been launched in various domains, such as investment, education, health, and research. Despite such rapid growth and diversity, it is unclear how these DAOs actually work in practice and to what extent they are effective in achieving their goals. Given this, we aim to unpack how (well) DAOs work in practice. We conducted an in-depth analysis of a diverse set of 10 DAOs of various categories and smart contracts, leveraging on-chain (e.g., voting results) and off-chain data (e.g., community discussions) as well as our interviews with DAO organizers/members. Specifically, we defined metrics to characterize key aspects of DAOs, such as the degrees of decentralization and autonomy. We observed CompoundDAO, AssangeDAO, Bankless, and Krausehouse having poor decentralization in voting, while decentralization has improved over time for one-person-one-vote DAOs (e.g., Proof of Humanity). Moreover, the degree of autonomy varies among DAOs, with some (e.g., Compound and Krausehouse) relying more on third parties than others. Lastly, we offer a set of design implications for future DAO systems based on our findings.

Submitted 16 April, 2023; originally announced April 2023.

arXiv:2304.07598 [pdf, other]

Understanding Rug Pulls: An In-Depth Behavioral Analysis of Fraudulent NFT Creators

Authors: Trishie Sharma, Rachit Agarwal, Sandeep Kumar Shukla

Abstract: The explosive growth of non-fungible tokens (NFTs) on Web3 has created a new frontier for digital art and collectibles, but also an emerging space for fraudulent activities. This study provides an in-depth analysis of NFT rug pulls, which are fraudulent schemes aimed at stealing investors' funds. Using data from 758 rug pulls across 10 NFT marketplaces, we examine the structural and behavioral properties of these schemes, identify the characteristics and motivations of rug-pullers, and classify NFT projects into groups based on creators' association with their accounts. Our findings reveal that repeated rug pulls account for a significant proportion of the rise in NFT-related cryptocurrency crimes, with one NFT collection attempting 37 rug pulls within three months. Additionally, we identify the largest group of creators influencing the majority of rug pulls, and demonstrate the connection between rug-pullers of different NFT projects through the use of the same wallets to store and move money. Our study contributes to the understanding of NFT market risks and provides insights for designing preventative strategies to mitigate future losses.

Submitted 15 April, 2023; originally announced April 2023.

arXiv:2303.08729 [pdf, other]

DACOS-A Manually Annotated Dataset of Code Smells

Authors: Himesh Nandani, Mootez Saad, Tushar Sharma

Abstract: Researchers apply machine-learning techniques for code smell detection to counter the subjectivity of many code smells. Such approaches need a large, manually annotated dataset for training and benchmarking. Existing literature offers a few datasets; however, they are small in size and, more importantly, do not focus on the subjective code snippets. In this paper, we present DACOS, a manually annotated dataset containing 10,267 annotations for 5,192 code snippets. The dataset targets three kinds of code smells at different granularity: multifaceted abstraction, complex method, and long parameter list. The dataset is created in two phases. The first phase helps us identify the code snippets that are potentially subjective by determining the thresholds of metrics used to detect a smell. The second phase collects annotations for potentially subjective snippets. We also offer an extended dataset DACOSX that includes definitely benign and definitely smelly snippets by using the thresholds identified in the first phase. We have developed TagMan, a web application to help annotators view and mark the snippets one-by-one and record the provided annotations. We make the datasets and the web application accessible publicly. This dataset will help researchers working on smell detection techniques to build relevant and context-aware machine-learning models.

Submitted 15 March, 2023; originally announced March 2023.

Comments: 4 pages

arXiv:2301.02211 [pdf, other]

Teaching Computer Vision for Ecology

Authors: Elijah Cole, Suzanne Stathatos, Björn Lütjens, Tarun Sharma, Justin Kay, Jason Parham, Benjamin Kellenberger, Sara Beery

Abstract: Computer vision can accelerate ecology research by automating the analysis of raw imagery from sensors like camera traps, drones, and satellites. However, computer vision is an emerging discipline that is rarely taught to ecologists. This work discusses our experience teaching a diverse group of ecologists to prototype and evaluate computer vision systems in the context of an intensive hands-on summer workshop. We explain the workshop structure, discuss common challenges, and propose best practices. This document is intended for computer scientists who teach computer vision across disciplines, but it may also be useful to ecologists or other domain experts who are learning to use computer vision themselves.

Submitted 5 January, 2023; originally announced January 2023.

arXiv:2211.06254 [pdf, other]

Re-visiting Reservoir Computing architectures optimized by Evolutionary Algorithms

Authors: Sebastián Basterrech, Tarun Kumar Sharma

Abstract: For many years, Evolutionary Algorithms (EAs) have been applied to improve Neural Network (NN) architectures. They have been used for solving different problems, such as training the networks (adjusting the weights), designing network topology, optimizing global parameters, and selecting features. Here, we provide a brief systematic survey of applications of EAs in the specific domain of recurrent NNs named Reservoir Computing (RC). At the beginning of the 2000s, the RC paradigm appeared as a good option for employing recurrent NNs without dealing with the inconveniences of the training algorithms. RC models use a nonlinear dynamic system, with a fixed recurrent neural network named the reservoir, and the learning process is restricted to adjusting a linear parametric function. However, an RC model has several hyper-parameters; therefore, EAs are helpful tools to figure out optimal RC architectures. We provide an overview of the results in the area, discuss novel advances, and present our vision regarding new trends and still-open questions.

Submitted 11 November, 2022; originally announced November 2022.

Comments: Accepted manuscript to the 14th World Congress on Nature and Biologically Inspired Computing (NaBIC), Seattle, WA, United States, December 14-16, 2022. A revised manuscript will be published in the conference proceedings by Springer in the Lecture Notes in Networks and Systems

arXiv:2209.02438 [pdf]

Threat Detection In Self-Driving Vehicles Using Computer Vision

Authors: Umang Goenka, Aaryan Jagetia, Param Patil, Akshay Singh, Taresh Sharma, Poonam Saini

Abstract: On-road obstacle detection is an important field of research that falls in the scope of intelligent transportation infrastructure systems. The use of vision-based approaches results in an accurate and cost-effective solution to such systems. In this research paper, we propose a threat detection mechanism for autonomous self-driving cars that uses dashcam videos to detect the presence of any unwanted obstacle on the road that falls within its visual range. This information can assist the vehicle's program in routing safely. There are four major components, namely, YOLO to identify the objects, an advanced lane detection algorithm, a multi-regression model to measure the distance of the object from the camera, and the two-second rule for measuring safety and limiting speed. In addition, we have used the Car Crash Dataset (CCD) for calculating the accuracy of the model. The YOLO algorithm gives an accuracy of around 93%. The final accuracy of our proposed Threat Detection Model (TDM) is 82.65%.

Submitted 6 September, 2022; originally announced September 2022.

Comments: Presented in 3rd International Conference on Machine Learning, Image Processing, Network Security and Data Sciences MIND-2021

arXiv:2206.13910 [pdf, other]

Epidemic Control Modeling using Parsimonious Models and Markov Decision Processes

Authors: Edilson F. Arruda, Tarun Sharma, Rodrigo e A. Alexandre, Sinnu Susan Thomas

Abstract: Many countries have experienced at least two waves of the COVID-19 pandemic. The second wave is far more dangerous as distinct strains appear more harmful to human health, but it stems from the complacency about the first wave. This paper introduces a parsimonious yet representative stochastic epidemic model that simulates the uncertain spread of the disease regardless of the latency and recovery time distributions. We also propose a Markov decision process to seek an optimal trade-off between the usage of the healthcare system and the economic costs of an epidemic. We apply the model to COVID-19 data from New Delhi, India and simulate the epidemic spread with different policy review times. The results show that the optimal policy acts swiftly to curb the epidemic in the first wave, thus avoiding the collapse of the healthcare system and the future costs of posterior outbreaks. An analysis of the recent collapse of the healthcare system of India during the second COVID-19 wave suggests that many lives could have been preserved if swift mitigation was promoted after the first wave.

Submitted 23 June, 2022; originally announced June 2022.

arXiv:2204.11193 [pdf, other]

Exploring Security Practices of Smart Contract Developers

Authors: Tanusree Sharma, Zhixuan Zhou, Andrew Miller, Yang Wang

Abstract: Smart contracts are self-executing programs that run on blockchains (e.g., Ethereum). 680 million US dollars worth of digital assets controlled by smart contracts have been hacked or stolen due to various security vulnerabilities in 2021. Although security is a fundamental concern for smart contracts, it is unclear how smart contract developers approach security. To help fill this research gap, we conducted an exploratory qualitative study consisting of a semi-structured interview and a code review task with 29 smart contract developers with diverse backgrounds, including 10 early-stage (less than one year of experience) and 19 experienced (2-5 years of experience) smart contract developers. Our findings show a wide range of smart contract security perceptions and practices, including the various tools and resources they used. Our early-stage developer participants had a much lower success rate (15%) of identifying security vulnerabilities in the code review task than their experienced counterparts (55%). Our hierarchical task analysis of their code reviews implies that merely accessing standard documentation, reference implementations, and security tools is not sufficient. Many developers checked those materials or used a security tool but still failed to identify the security issues. In addition, several participants pointed out shortcomings of current smart contract security tooling, such as its usability. We discuss how future education and tools could better support developers in ensuring smart contract security.

Submitted 24 April, 2022; originally announced April 2022.

arXiv:2203.15950 [pdf, other]

doi 10.1145/3524842.3528032

Empirical Standards for Repository Mining

Authors: Preetha Chatterjee, Tushar Sharma, Paul Ralph

Abstract: The purpose of scholarly peer review is to evaluate the quality of scientific manuscripts. However, study after study demonstrates that peer review neither effectively nor reliably assesses research quality. Empirical standards attempt to address this problem by modelling a scientific community's expectations for each kind of empirical study conducted in that community. This should enhance not only the quality of research but also the reliability and predictability of peer review, as scientists adopt the standards in both their researcher and reviewer roles. However, these improvements depend on the quality and adoption of the standards. This tutorial will therefore present the empirical standard for mining software repositories, both to communicate its contents and to get feedback from the attendees. The tutorial will be organized into three parts: (1) brief overview of the empirical standards project; (2) detailed presentation of the repository mining standard; (3) discussion and suggestions for improvement.

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2201.13233 [pdf, other]

"It's A Blessing and A Curse": Unpacking Creators' Practices with Non-Fungible Tokens (NFTs) and Their Communities

Authors: Tanusree Sharma, Zhixuan Zhou, Yun Huang, Yang Wang

Abstract: NFTs (Non-Fungible Tokens) are blockchain-based cryptographic tokens to represent ownership of unique content such as images, videos, or 3D objects. Despite NFTs' increasing popularity and skyrocketing trading prices, little is known about people's perceptions of and experiences with NFTs. In this work, we focus on NFT creators and present results of an exploratory qualitative study in which we interviewed 15 NFT creators from nine different countries. Our participants had nuanced feelings about NFTs and their communities. We found that most of our participants were enthusiastic about the underlying technologies and how they empower individuals to express their creativity and pursue new business models of content creation. Our participants also gave kudos to the NFT communities that have supported them to learn, collaborate, and grow in their NFT endeavors. However, these positivities were juxtaposed by their accounts of the many challenges that they encountered and thorny issues that the NFT ecosystem is grappling with around ownership of digital content, low-quality NFTs, scams, possible money laundering, and regulations. We discuss how the built-in properties (e.g., decentralization) of blockchains and NFTs might have contributed to some of these issues. We present design implications on how to improve the NFT ecosystem (e.g., making NFTs even more accessible to newcomers and the broader population).

Submitted 15 January, 2022; originally announced January 2022.

arXiv:2112.11961 [pdf, other]

doi 10.1088/2040-8986/ac6f0b

BBM92 quantum key distribution over a free space dusty channel of 200 meters

Authors: Sarika Mishra, Ayan Biswas, Satyajeet Patil, Pooja Chandravanshi, Vardaan Mongia, Tanya Sharma, Anju Rani, Shashi Prabhakar, S. Ramachandran, Ravindra P. Singh

Abstract: Free space quantum communication assumes importance as it is a precursor for satellite-based quantum communication needed for secure key distribution over longer distances. Prepare-and-measure protocols like BB84 consider the satellite as a trusted device, which is fraught with security threats given the current trend for satellite-based optical communication. Therefore, entanglement-based protocols must be preferred, so that one can consider the satellite as an untrusted device too. The current work reports the implementation of the BBM92 protocol, an entanglement-based QKD protocol, over a 200 m distance using an indigenous facility developed at Physical Research Laboratory (PRL), Ahmedabad, India. Our results show the effect of atmospheric aerosols on sift key rate, and eventually, secure key rate. Such experiments are important to validate the models to account for the atmospheric effects on the key rates achieved through satellite-based QKD.

Submitted 9 January, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

Comments: 7 pages, 6 figures, 2 tables

Journal ref: Journal of Optics 24, 074002 (2022)

arXiv:2110.09610 [pdf, other]

A Survey on Machine Learning Techniques for Source Code Analysis

Authors: Tushar Sharma, Maria Kechagia, Stefanos Georgiou, Rohit Tiwari, Indira Vats, Hadi Moazen, Federica Sarro

Abstract: The advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis, such as testing and vulnerability detection. Such a large number of studies hinders the community from understanding the current research landscape. This paper aims to summarize the current knowledge in applied machine learning for source code analysis. We review studies belonging to twelve categories of software engineering tasks and corresponding machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we conducted an extensive literature search and identified 479 primary studies published between 2011 and 2021. We summarize our observations and findings with the help of the identified studies. Our findings suggest that the use of machine learning techniques for source code analysis tasks is consistently increasing. We synthesize commonly used steps and the overall workflow for each task and summarize machine learning techniques employed. We identify a comprehensive list of available datasets and tools useable in this context. Finally, the paper discusses perceived challenges in this area, including the availability of standard datasets, reproducibility and replicability, and hardware resources.

Submitted 13 September, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

arXiv:2104.10466 [pdf, other]

HDR-Fuzz: Detecting Buffer Overruns using AddressSanitizer Instrumentation and Fuzzing

Authors: Raveendra Kumar Medicherla, Malathy Nagalakshmi, Tanya Sharma, Raghavan Komondoor

Abstract: Buffer-overruns are a prevalent vulnerability in software libraries and applications. Fuzz testing is one of the effective techniques to detect vulnerabilities in general. Greybox fuzzers such as AFL automatically generate a sequence of test inputs for a given program using a fitness-guided search process. A recently proposed approach in the literature introduced a buffer-overrun specific fitness metric called "headroom", which tracks how close each generated test input comes to exposing the vulnerabilities. That approach showed good initial promise, but is somewhat imprecise and expensive due to its reliance on conservative points-to analysis. Inspired by the approach above, in this paper we propose a new ground-up approach for detecting buffer-overrun vulnerabilities. This approach uses an extended version of ASAN (Address Sanitizer) that runs in parallel with the fuzzer, and reports back to the fuzzer test inputs that happen to come closer to exposing buffer-overrun vulnerabilities. The ASAN-style instrumentation is precise as it has no dependence on points-to analysis. We describe in this paper our approach, as well as an implementation and evaluation of the approach.

Submitted 21 April, 2021; originally announced April 2021.

ACM Class: D.2.5; K.6.5

arXiv:2012.12324 [pdf, other]

Do We Need Improved Code Quality Metrics?

Authors: Tushar Sharma, Diomidis Spinellis

Abstract: The software development community has been using code quality metrics for the last five decades. Despite their wide adoption, code quality metrics have attracted a fair share of criticism. In this paper, first, we carry out a qualitative exploration by surveying software developers to gauge their opinions about current practices and potential gaps with the present set of metrics. We identify deficiencies including lack of soundness, i.e., the ability of a metric to capture a notion accurately as promised by the metric, lack of support for assessing software architecture quality, and insufficient support for assessing software testing and infrastructure. In the second part of the paper, we focus on one specific code quality metric, LCOM, as a case study to explore opportunities towards improved metrics. We evaluate existing LCOM algorithms qualitatively and quantitatively to observe how closely they represent the concept of cohesion. In this pursuit, we first create eight diverse cases that any LCOM algorithm must cover, have a set of experienced developers assess their cohesion levels, and treat those assessments as the ground truth. We show that the present set of LCOM algorithms does poorly w.r.t. these cases. To bridge the identified gap, we propose a new approach to compute LCOM and evaluate the new approach against the ground truth. We also show, through a quantitative analysis of more than 90 thousand types belonging to 261 high-quality Java repositories, that the present set of methods paints a very inaccurate and misleading picture of class cohesion. We conclude that the current code quality metrics in use suffer from various deficiencies, presenting ample opportunities for the research community to address the gaps.

Submitted 22 December, 2020; originally announced December 2020.

arXiv:2008.04114 [pdf, other]

Improved Adaptive Type-2 Fuzzy Filter with Exclusively Two Fuzzy Membership Function for Filtering Salt and Pepper Noise

Authors: Vikas Singh, Pooja Agrawal, Teena Sharma, Nishchal K. Verma

Abstract: Image denoising is one of the preliminary steps in image processing methods in which the presence of noise can deteriorate the image quality. To overcome this limitation, in this paper an improved two-stage fuzzy filter is proposed for filtering salt and pepper noise from images. In the first stage, the pixels in the image are categorized as good or noisy based on adaptive thresholding using type-2 fuzzy logic with exclusively two different membership functions in the filter window. In the second stage, the noisy pixels are denoised using modified ordinary fuzzy logic in the respective filter window. The proposed filter is validated on standard images with various noise levels. The proposed filter removes the noise and preserves useful image characteristics, i.e., edges and corners, at higher noise levels. The performance of the proposed filter is compared with various state-of-the-art methods in terms of peak signal-to-noise ratio and computation time. To show the effectiveness of the filter, statistical tests, i.e., the Friedman test and the Bonferroni-Dunn (BD) test, are also carried out, which clearly ascertain that the proposed filter outperforms various filtering approaches.

Submitted 10 August, 2020; originally announced August 2020.

arXiv:2007.16147 [pdf]

doi 10.5772/intechopen.78497

Computer and Network Security

Authors: Jaydip Sen, Sidra Mehtab, Michael Ekonde Sone, Veeramreddy Jyothsna, Koneti Munivara Prasad, Rajeev Singh, Teek Parval Sharma, Anton Noskov, Ignacio Velasquez, Angelica Caro, Alfonco Rodriguez, Tamer S. A. Fatayer, Altaf O. Mulani, Pradeep B. Mane, Roshan Chitrakar, Roshan Bhusal, Prajwol Maharjan

Abstract: In the era of the Internet of Things, and with the explosive worldwide growth of electronic data volume and the associated need for processing, analysis and storage of such a humongous volume of data, several new challenges are faced in protecting the privacy of sensitive data and securing systems by designing novel schemes for secure authentication, integrity protection, encryption and non-repudiation. Lightweight symmetric key cryptography and adaptive network security algorithms are in demand for mitigating these challenges. This book presents some of the state-of-the-art research work in the field of cryptography and security in computing and communications. It is a valuable source of knowledge for researchers, engineers, practitioners, and graduate and doctoral students who are working in the field of cryptography, network security, security and privacy issues in the Internet of Things (IoT), and machine learning applications in security. It will also be useful for faculty members of graduate schools and universities.

Submitted 31 July, 2020; originally announced July 2020.

Comments: 175 pages, 87 figures and 44 Tables

arXiv:2007.13737 [pdf, other]

BIDEAL: A Toolbox for Bicluster Analysis -- Generation, Visualization and Validation

Authors: Nishchal K. Verma, T. Sharma, S. Dixit, P. Agrawal, S. Sengupta, V. Singh

Abstract: This paper introduces a novel toolbox named BIDEAL for the generation of biclusters, their analysis, visualization, and validation. The objective is to enable researchers to use forefront biclustering algorithms embedded on a single platform. A single toolbox comprising various biclustering algorithms plays a vital role in extracting meaningful patterns from the data for detecting diseases, biomarkers, gene-drug association, etc. BIDEAL consists of seventeen biclustering algorithms, three bicluster visualization techniques, and six validation indices. The toolbox can analyze several types of data, including biological data, through a graphical user interface. It also provides data preprocessing techniques, i.e., binarization, discretization, normalization, and elimination of null and missing values. The effectiveness of the developed toolbox has been demonstrated through testing and validation on the Saccharomyces cerevisiae cell cycle, Leukemia cancer, Mammary tissue profile, and Ligand screen in B-cells datasets. The biclusters of these datasets have been generated using BIDEAL and evaluated in terms of coherency, differential co-expression ranking, and similarity measure. The visualization of generated biclusters has also been provided through a heat map and gene plot.

Submitted 26 July, 2020; originally announced July 2020.

arXiv:2007.04444 [pdf, other]

Are PETs (Privacy Enhancing Technologies) Giving Protection for Smartphones? -- A Case Study

Authors: Tanusree Sharma, Masooda Bashir

Abstract: While smartphone technologies have enhanced the way we interact with the world around us, they have also been paving the way for easier access to our private and personal information. This has been amplified by the existence of numerous embedded sensors utilized by millions of apps. While mobile apps have positively transformed many aspects of our lives with new functionalities, many of these applications are taking advantage of vast amounts of data. Privacy apps, a form of Privacy Enhancing Technology, can be an effective privacy management tool for smartphones. To protect against vulnerabilities related to the collection, storage, and sharing of sensitive data, developers are building numerous privacy apps. However, there has been a lack of discretion in this particular area, which calls for a proper assessment to understand the far-reaching utilization of these apps among users. During this process we have conducted an evaluation of the most popular privacy apps from our total collection of five hundred and twelve to demonstrate whether the functionality-specific data protections they claim to offer, both technologically and conventionally, measure up to standards. Taking their offered security functionalities as a scale, we conducted forensic experiments to indicate where they are failing to be consistent in maintaining protection. For legitimate validation of security gaps in the assessed privacy apps, we have also utilized NIST and OWASP guidelines. We believe this study will be efficacious for continuous improvement and can be considered as a foundation towards a common standard for privacy and security measures for an app's development stage.

Submitted 8 July, 2020; originally announced July 2020.

arXiv:2004.14165 [pdf, ps, other]

Classification of Cuisines from Sequentially Structured Recipes

Authors: Tript Sharma, Utkarsh Upadhyay, Ganesh Bagler

Abstract: Cultures across the world are distinguished by the idiosyncratic patterns in their cuisines. These cuisines are characterized in terms of their substructures such as ingredients, cooking processes and utensils. A complex fusion of these substructures intrinsic to a region defines the identity of a cuisine. Accurate classification of cuisines based on their culinary features is an outstanding problem and has hitherto been approached by treating the ingredients of a recipe as features. Previous studies have attempted cuisine classification by using unstructured recipes without accounting for details of cooking techniques. In reality, the cooking processes/techniques and their order are highly significant for the recipe's structure and hence for its classification. In this article, we have implemented a range of classification techniques by accounting for this information on the RecipeDB dataset containing sequential data on recipes. The state-of-the-art RoBERTa model presented the highest accuracy of 73.30% among a range of classification models from Logistic Regression and Naive Bayes to LSTMs and Transformers.

Submitted 26 April, 2020; originally announced April 2020.

Comments: 36th IEEE International Conference on Data Engineering (ICDE 2020), DECOR Workshop; 4 pages, 4 tables

arXiv:2004.12283 [pdf, other]

Hierarchical Clustering of World Cuisines

Authors: Tript Sharma, Utkarsh Upadhyay, Jushaan Kalra, Sakshi Arora, Saad Ahmad, Bhavay Aggarwal, Ganesh Bagler

Abstract: Cultures across the world have evolved to have unique patterns despite shared ingredients and cooking techniques. Using data obtained from RecipeDB, an online resource for recipes, we extract patterns in 26 world cuisines and further probe for their inter-relatedness. By application of frequent itemset mining and ingredient authenticity we characterize the quintessential patterns in the cuisines and build a hierarchical tree of the world cuisines. This tree provides interesting insights into the evolution of cuisines and their geographical as well as historical relatedness.

Submitted 25 April, 2020; originally announced April 2020.

Comments: 36th IEEE International Conference on Data Engineering (ICDE 2020), DECOR Workshop; 6 pages, 6 figures, 1 table

arXiv:2003.07666 [pdf, other]

Inverse Design of Potential Singlet Fission Molecules using a Transfer Learning Based Approach

Authors: Akshay Subramanian, Utkarsh Saha, Tejasvini Sharma, Naveen K. Tailor, Soumitra Satapathi

Abstract: Singlet fission has emerged as one of the most exciting phenomena known to improve the efficiencies of different types of solar cells and has found uses in diverse optoelectronic applications. The range of available singlet fission molecules is, however, limited because, to undergo singlet fission, molecules have to satisfy certain energy conditions. Recent advances in material search using inverse design have enabled the prediction of materials for a wide range of applications, and inverse design has emerged as one of the most efficient methods in the discovery of suitable materials. It is particularly helpful in manipulating large datasets, uncovering hidden information from the molecular dataset and generating new structures. However, we seldom encounter large datasets in structure prediction problems in material science. In our work, we put forward inverse design of possible singlet fission molecules using a transfer learning based approach where we make use of a much larger ChEMBL dataset of structurally similar molecules to transfer the learned characteristics to the singlet fission dataset.

Submitted 17 March, 2020; originally announced March 2020.

Comments: 15 pages, 4 figures. The first two authors contributed equally

arXiv:2001.10832 [pdf, other]

Audio-Visual Decision Fusion for WFST-based and seq2seq Models

Authors: Rohith Aralikatti, Sharad Roy, Abhinav Thanda, Dilip Kumar Margam, Pujitha Appan Kandala, Tanay Sharma, Shankar M Venkatesan

Abstract: Under noisy conditions, speech recognition systems suffer from high Word Error Rates (WER). In such cases, information from the visual modality comprising the speaker's lip movements can help improve the performance. In this work, we propose novel methods to fuse information from audio and visual modalities at inference time. This enables us to train the acoustic and visual models independently. First, we train separate RNN-HMM based acoustic and visual models. A common WFST generated by taking a special union of the HMM components is used for decoding using a modified Viterbi algorithm. Second, we train separate seq2seq acoustic and visual models. The decoding step is performed simultaneously for both modalities using shallow fusion while maintaining a common hypothesis beam. We also present results for a novel seq2seq fusion without the weighting parameter. We present results at varying SNR and show that our methods give significant improvements over acoustic-only WER.

Submitted 29 January, 2020; originally announced January 2020.

Comments: Submitted for review to ICASSP 2020 on October 21st, 2019
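
The seq2seq decoding described in the abstract above (shallow fusion over a common hypothesis beam) can be illustrated with a minimal sketch. The abstract does not state the fusion formula, so the linear interpolation of log-probabilities below, the `weight` parameter, and the toy `noisy_audio` and `clear_visual` models are all assumptions made for illustration, not the authors' implementation.

```python
import math


def fuse(audio_logp, visual_logp, weight):
    """Interpolate per-token log-probabilities (assumed fusion rule)."""
    return {
        tok: weight * audio_logp[tok] + (1.0 - weight) * visual_logp[tok]
        for tok in audio_logp
    }


def beam_step(beam, audio_model, visual_model, weight, beam_size):
    """Extend every hypothesis by one token and keep the best beam_size."""
    candidates = []
    for prefix, score in beam:
        fused = fuse(audio_model(prefix), visual_model(prefix), weight)
        for tok, tok_score in fused.items():
            candidates.append((prefix + (tok,), score + tok_score))
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:beam_size]


# Hypothetical toy models: the audio stream is ambiguous under noise,
# while the visual stream clearly favours token "b".
def noisy_audio(prefix):
    return {"a": math.log(0.5), "b": math.log(0.5)}


def clear_visual(prefix):
    return {"a": math.log(0.1), "b": math.log(0.9)}


beam = beam_step([((), 0.0)], noisy_audio, clear_visual, weight=0.5, beam_size=2)
```

In this toy case the single shared beam ranks the hypothesis ending in "b" first, because the visual model breaks the tie left by the noisy audio model.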

arXiv:1906.12170 [pdf, other]

LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models

Authors: Dilip Kumar Margam, Rohith Aralikatti, Tanay Sharma, Abhinav Thanda, Pujitha A K, Sharad Roy, Shankar M Venkatesan

Abstract: In recent years, deep learning based machine lipreading has gained prominence. To this end, several architectures such as LipNet, LCANet and others have been proposed which perform extremely well compared to traditional lipreading DNN-HMM hybrid systems trained on DCT features. In this work, we propose a simpler architecture of a 3D-2D-CNN-BLSTM network with a bottleneck layer. We also present an analysis of two different approaches for lipreading on this architecture. In the first approach, the 3D-2D-CNN-BLSTM network is trained with CTC loss on characters (ch-CTC). Then a BLSTM-HMM model is trained on bottleneck lip features (extracted from the 3D-2D-CNN-BLSTM ch-CTC network) in a traditional ASR training pipeline. In the second approach, the same 3D-2D-CNN-BLSTM network is trained with CTC loss on word labels (w-CTC). The first approach shows that bottleneck features perform better compared to DCT features. Using the second approach on the Grid corpus' seen speaker test set, we report $1.3\%$ WER - a $55\%$ improvement relative to LCANet. On the unseen speaker test set we report $8.6\%$ WER, which is a $24.5\%$ improvement relative to LipNet. We also verify the method on a second dataset of $81$ speakers which we collected. Finally, we also discuss the effect of feature duplication on BLSTM-HMM model performance.

Submitted 25 June, 2019; originally announced June 2019.

Comments: Submitted to Interspeech 2019

arXiv:1906.11803 [pdf, ps, other]

Data Consortia

Authors: Eric Bax, John Donald, Melissa Gerber, Lisa Giaffo, Tanisha Sharma, Nikki Thompson, Kimberly Williams

Abstract: Today, web-based companies use user data to provide and enhance services to users, both individually and collectively. Some also analyze user data for other purposes, for example to select advertisements or price offers for users. Some even use or allow the data to be used to evaluate investments in financial markets. Users' concerns about how their data is or may be used have prompted legislative action in the European Union and congressional questioning in the United States. But data can also benefit society, for example giving early warnings for disease outbreaks, allowing in-depth study of relationships between genetics and disease, and elucidating local and macroeconomic trends in a timely manner. So, instead of just a focus on privacy, in the future, users may insist that their data be used on their behalf. We explore potential frameworks for groups of consenting, informed users to pool their data for their own benefit and that of society, discussing directions, challenges, and evolution for such efforts.

Submitted 27 June, 2019; originally announced June 2019.

arXiv:1904.03031 [pdf, other]

doi 10.1016/j.jss.2021.110936

On the Feasibility of Transfer-learning Code Smells using Deep Learning

Authors: Tushar Sharma, Vasiliki Efstathiou, Panos Louridas, Diomidis Spinellis

Abstract: Context: A substantial amount of work has been done to detect smells in source code using metrics-based and heuristics-based methods. Machine learning methods have been recently applied to detect source code smells; however, the current practices are considered far from mature. Objective: First, explore the feasibility of applying deep learning models to detect smells without extensive feature engineering, just by feeding the source code in tokenized form. Second, investigate the possibility of applying transfer-learning in the context of deep learning models for smell detection. Method: We use existing metric-based state-of-the-art methods for detecting three implementation smells and one design smell in C# code. Using these results as the annotated gold standard, we train smell detection models on three different deep learning architectures. These architectures use Convolutional Neural Networks (CNNs) of one or two dimensions, or Recurrent Neural Networks (RNNs) as their principal hidden layers. For the first objective of our study, we perform training and evaluation on C# samples, whereas for the second objective, we train the models on C# code and evaluate the models over Java code samples. We perform the experiments with various combinations of hyper-parameters for each model. Results: We find it feasible to detect smells using deep learning methods. Our comparative experiments find that there is no clearly superior method between CNN-1D and CNN-2D. We also observe that the performance of the deep learning models is smell-specific. Our transfer-learning experiments show that transfer-learning is definitely feasible for implementation smells, with performance comparable to that of direct-learning. This work opens up a new paradigm to detect code smells by transfer-learning, especially for the programming languages where comprehensive code smell detection tools are not available.

Submitted 16 September, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

arXiv:1808.02861 [pdf, other]

Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance

Authors: Ramprasaath R. Selvaraju, Prithvijit Chattopadhyay, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee

Abstract: Individual neurons in convolutional neural networks supervised for image-level classification tasks have been shown to implicitly learn semantically meaningful concepts ranging from simple textures and shapes to whole or partial objects - forming a "dictionary" of concepts acquired through the learning process. In this work we introduce a simple, efficient zero-shot learning approach based on this observation. Our approach, which we call Neuron Importance-Aware Weight Transfer (NIWT), learns to map domain knowledge about novel "unseen" classes onto this dictionary of learned concepts and then optimizes for network parameters that can effectively combine these concepts - essentially learning classifiers by discovering and composing learned semantic concepts in deep networks. Our approach shows improvements over previous approaches on the CUBirds and AWA2 generalized zero-shot learning benchmarks. We demonstrate our approach on a diverse set of semantic inputs as external domain knowledge including attributes and natural language captions. Moreover, by learning inverse mappings, NIWT can provide visual and textual explanations for the predictions made by the newly learned classifiers and provide neuron names. Our code is available at https://github.com/ramprs/neuron-importance-zsl.

Submitted 8 August, 2018; originally announced August 2018.

Comments: In Proceedings of ECCV 2018

arXiv:1804.04353 [pdf, other]

Global SNR Estimation of Speech Signals using Entropy and Uncertainty Estimates from Dropout Networks

Authors: Rohith Aralikatti, Dilip Margam, Tanay Sharma, Thanda Abhinav, Shankar M Venkatesan

Abstract: This paper demonstrates two novel methods to estimate the global SNR of speech signals. In both methods, a Deep Neural Network-Hidden Markov Model (DNN-HMM) acoustic model used in speech recognition systems is leveraged for the additional task of SNR estimation. In the first method, the entropy of the DNN-HMM output is computed. Recent work on Bayesian deep learning has shown that a DNN-HMM trained with dropout can be used to estimate model uncertainty by approximating it as a deep Gaussian process. In the second method, this approximation is used to obtain model uncertainty estimates. Noise-specific regressors are used to predict the SNR from the entropy and model uncertainty. The DNN-HMM is trained on the GRID corpus and tested on different noise profiles from the DEMAND noise database at SNR levels ranging from -10 dB to 30 dB.

Submitted 12 April, 2018; originally announced April 2018.
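
The first method in the abstract above computes the entropy of the acoustic model's output. A minimal sketch of that quantity follows, assuming each frame's output is a posterior distribution over states and that per-frame entropies are averaged over the utterance. The averaging step, the function names, and the toy posteriors are illustrative assumptions; the abstract does not specify how the entropy is aggregated or how the regressors consume it.

```python
import math


def frame_entropy(posterior):
    """Shannon entropy (in nats) of one posterior distribution."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def mean_entropy(posteriors):
    """Average per-frame entropy over an utterance (assumed aggregation)."""
    return sum(frame_entropy(p) for p in posteriors) / len(posteriors)


# Hypothetical posteriors: peaked outputs (as one might expect on clean
# speech) versus flat outputs (as one might expect on very noisy speech).
confident = [[0.97, 0.01, 0.01, 0.01]] * 3
uncertain = [[0.25, 0.25, 0.25, 0.25]] * 3

low = mean_entropy(confident)
high = mean_entropy(uncertain)
```

A flat posterior over four states reaches the maximum entropy of ln 4, while a peaked posterior gives a much smaller value. A noise-specific regressor, as mentioned in the abstract, would then map such a feature to an SNR estimate.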

arXiv:1701.02704 [pdf, other]

What are the visual features underlying human versus machine vision?

Authors: Drew Linsley, Sven Eberhardt, Tarun Sharma, Pankaj Gupta, Thomas Serre

Abstract: Although Deep Convolutional Networks (DCNs) are approaching the accuracy of human observers at object recognition, it is unknown whether they leverage similar visual representations to achieve this performance. To address this, we introduce Clicktionary, a web-based game for identifying visual features used by human observers during object recognition. Importance maps derived from the game are consistent across participants and uncorrelated with image saliency measures. These results suggest that Clicktionary identifies image regions that are meaningful and diagnostic for object recognition but different than those driving eye movements. Surprisingly, Clicktionary importance maps are only weakly correlated with relevance maps derived from DCNs trained for object recognition. Our study demonstrates that the narrowing gap between the object recognition accuracy of human observers and DCNs obscures distinct visual strategies used by each to achieve this performance.

Submitted 7 November, 2017; v1 submitted 10 January, 2017; originally announced January 2017.

Comments: 9 pages, 7 figures

arXiv:1605.01802 [pdf]

Multiple K Means++ Clustering of Satellite Image Using Hadoop MapReduce and Spark

Authors: Tapan Sharma, Dr. Vinod Shokeen, Dr. Sunil Mathur

Abstract: Clustering of images is one of the important steps in mining satellite images. In our experiment we have simultaneously run multiple K-means algorithms with different initial centroids and values of k in the same iteration of MapReduce jobs. For initialization of the centroids we have implemented the Scalable K-Means++ MapReduce (MR) job [1]. We have also run a validation algorithm, the Simplified Silhouette Index [2], for multiple clustering outputs, again in the same iteration of MR jobs. This paper explores the behavior of the above-mentioned clustering algorithms when run on big data platforms like MapReduce and Spark. Spark has been chosen as it is popular for fast processing, particularly where iterations are involved.

Submitted 5 May, 2016; originally announced May 2016.

Comments: 9 Pages, Distributed Computing, Satellite Images, Clustering, Published in International Journal of Advanced Studies in Computer Science and Engineering, IJASCSE volume 5 issue 4, 2016
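
The validation step mentioned in the abstract above uses a Simplified Silhouette Index. The sketch below is a single-machine illustration using the common centroid-based definition, in which a point's cohesion is its distance to its own centroid and its separation is the distance to the nearest other centroid. That definition, the function name, and the toy data are assumptions; the paper's MapReduce and Spark implementation is not reproduced here.

```python
import math


def simplified_silhouette(points, labels, centroids):
    """Mean simplified silhouette; assumes at least two centroids."""
    scores = []
    for point, label in zip(points, labels):
        # Cohesion: distance to the centroid of the assigned cluster.
        a = math.dist(point, centroids[label])
        # Separation: distance to the nearest centroid of another cluster.
        b = min(
            math.dist(point, c)
            for j, c in enumerate(centroids)
            if j != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0.0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

good = simplified_silhouette(points, [0, 0, 1, 1], [(0.0, 0.5), (10.0, 0.5)])
bad = simplified_silhouette(points, [0, 1, 0, 1], [(5.0, 0.0), (5.0, 1.0)])
```

Scores close to 1 indicate compact, well-separated clusters, so comparing the index across several runs with different k and initial centroids gives a way to pick among the clustering outputs.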

arXiv:1604.08379 [pdf, other]

Balanced Ranking Mechanisms

Authors: Debasis Mishra, Tridib Sharma

Abstract: In the private values single object auction model, we construct a satisfactory mechanism - a symmetric, dominant strategy incentive compatible, and budget-balanced mechanism. Our mechanism allocates the object to the highest valued agent with more than 99% probability provided there are at least 14 agents. It is also ex-post individually rational. We show that our mechanism is optimal in a restricted class of satisfactory ranking mechanisms. Since achieving efficiency through a dominant strategy incentive compatible and budget-balanced mechanism is impossible in this model, our results illustrate the limits of this impossibility.

Submitted 28 April, 2016; originally announced April 2016.

arXiv:1504.01101 [pdf, ps, other]

doi 10.1109/ISIT.2015.7282676

Private Data Transfer over a Broadcast Channel

Authors: Manoj Mishra, Tanmay Sharma, Bikash K. Dey, Vinod M. Prabhakaran

Abstract: We study the following private data transfer problem: Alice has a database of files. Bob and Cathy want to access a file each from this database (which may or may not be the same file), but each of them wants to ensure that their choices of file do not get revealed even if Alice colludes with the other user. Alice, on the other hand, wants to make sure that each of Bob and Cathy does not learn any more information from the database than the files they demand (the identities of which will be unknown to her). Moreover, they should not learn any information about the other files even if they collude. It turns out that it is impossible to accomplish this if Alice, Bob, and Cathy have access only to private randomness and noiseless communication links. We consider this problem when a binary erasure broadcast channel with independent erasures is available from Alice to Bob and Cathy in addition to a noiseless public discussion channel. We study the file-length-per-broadcast-channel-use rate in the honest-but-curious model. We focus on the case when the database consists of two files, and obtain the optimal rate. We then extend to the case of larger databases, and give upper and lower bounds on the optimal rate.

Submitted 16 April, 2015; v1 submitted 5 April, 2015; originally announced April 2015.

Comments: To be presented at IEEE International Symposium on Information Theory (ISIT 2015), Hong Kong

arXiv:1405.0787 [pdf]

Analysis of Email Fraud detection using WEKA Tool

Authors: Tarushi Sharma, AmanPreet Kaur

Abstract: Data mining is also useful in providing solutions for invasion finding and auditing. While data mining has several applications in protection, there are also serious privacy fears. Because of email mining, even inexperienced users can connect data and make responsive associations. Therefore, we must protect the privacy of persons while working on practical data mining.

Submitted 5 May, 2014; originally announced May 2014.

arXiv:1302.0965 [pdf]

Adaptive Energy Aware Data Aggregation Tree for Wireless Sensor Networks

Authors: Deepali Virmani, Tanu Sharma, Ritu Sharma

Abstract: To meet the demands of wireless sensor networks (WSNs), where data are usually aggregated at a single source prior to transmitting to any distant user, there is a need to establish a tree structure inside the network to aggregate data. In this paper, an adaptive energy aware data aggregation tree (AEDT) is proposed. The proposed tree uses the node with the maximum available energy as the data aggregator node. The tree incorporates sleep and awake technology, where only the communicating node and the parent node are in the awake state while all the remaining nodes go to the sleep state, saving network energy and enhancing the network lifetime. When the traffic load crosses the threshold value, the packets are accepted adaptively according to the communication capacity of the parent node. The proposed tree maintains a memory table which stores the value of each selected path. Path selection is based on a shortest path algorithm where the node with the highest available energy is always selected as the forwarding node. By simulation results, we show that our proposed tree enhances network lifetime, minimizes energy consumption, and achieves a good delivery ratio with reduced delay.

Submitted 5 February, 2013; originally announced February 2013.

Comments: 12 pages, 8 figures, International Journal of Hybrid Information Technology Vol. 6, No. 1, January, 2013

Showing 1–50 of 52 results for author: Sharma, T