
Keystroke Dynamics Against Academic Dishonesty in the Age of LLMs

Authors: Debnath Kundu, Atharva Mehta, Rajesh Kumar, Naman Lal, Avinash Anand, Apoorv Singh, Rajiv Ratn Shah

Abstract: The transition to online examinations and assignments raises significant concerns about academic integrity. Traditional plagiarism detection systems often struggle to identify instances of intelligent cheating, particularly when students utilize advanced generative AI tools to craft their responses. This study proposes a keystroke dynamics-based method to differentiate between bona fide and assisted writing within academic contexts. To facilitate this, a dataset was developed to capture the keystroke patterns of individuals engaged in writing tasks, both with and without the assistance of generative AI. The detector, trained using a modified TypeNet architecture, achieved accuracies ranging from 74.98% to 85.72% in condition-specific scenarios and from 52.24% to 80.54% in condition-agnostic scenarios. The findings highlight significant differences in keystroke dynamics between genuine and assisted writing. The outcomes of this study enhance our understanding of how users interact with generative AI and have implications for improving the reliability of digital educational platforms.

Submitted 21 June, 2024; originally announced June 2024.

Comments: Accepted for publication at the IEEE International Joint Conference on Biometrics (IJCB 2024); 9 pages, 3 figures, 3 tables

ACM Class: I.5.4
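
The abstract does not specify which keystroke features feed the modified TypeNet detector. As a hedged sketch, the Python below extracts per-keystroke timing features of the kind commonly used in keystroke dynamics: hold time and three inter-key latencies. The event format (`KeyEvent`), the function name `timing_features`, and the choice to pad the last keystroke's inter-key features with 0.0 are illustrative assumptions, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class KeyEvent:
    """One keystroke; timestamps in milliseconds (hypothetical logging format)."""
    key: str
    press_ms: float
    release_ms: float


def timing_features(events: List[KeyEvent]) -> List[Tuple[float, float, float, float]]:
    """Return (hold, press_press, release_press, release_release) per keystroke.

    hold:            press -> release of the same key
    press_press:     press -> press of the next key
    release_press:   release -> press of the next key (negative if keys overlap)
    release_release: release -> release of the next key
    The final keystroke has no successor, so its inter-key latencies are 0.0.
    """
    ordered = sorted(events, key=lambda e: e.press_ms)
    features = []
    for i, cur in enumerate(ordered):
        hold = cur.release_ms - cur.press_ms
        if i + 1 < len(ordered):
            nxt = ordered[i + 1]
            features.append((hold,
                             nxt.press_ms - cur.press_ms,
                             nxt.press_ms - cur.release_ms,
                             nxt.release_ms - cur.release_ms))
        else:
            features.append((hold, 0.0, 0.0, 0.0))
    return features
```

A sequence model such as TypeNet would consume tuples like these (often alongside the key code) as a time series; how the authors normalize or window typing sessions is not stated in the abstract.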

arXiv:2404.13865 [pdf, other]

doi 10.1007/978-3-031-49601-1_6

Context-Enhanced Language Models for Generating Multi-Paper Citations

Authors: Avinash Anand, Kritarth Prasad, Ujjwal Goel, Mohit Gupta, Naman Lal, Astha Verma, Rajiv Ratn Shah

Abstract: Citation text plays a pivotal role in elucidating the connection between scientific documents and demands an in-depth comprehension of the cited paper. Constructing citations is often time-consuming, requiring researchers to delve into extensive literature and grapple with articulating relevant content. To address this challenge, the field of citation text generation (CTG) has emerged. However, while earlier methods have primarily centered on creating single-sentence citations, practical scenarios frequently necessitate citing multiple papers within a single paragraph. To bridge this gap, we propose a method that leverages Large Language Models (LLMs) to generate multi-citation sentences. Our approach takes a single source paper and a collection of target papers and produces a coherent paragraph containing multi-sentence citation text. Furthermore, we introduce a curated dataset named MCG-S2ORC, composed of English-language academic research papers in Computer Science, showcasing multiple citation instances. In our experiments, we evaluate three LLMs (LLaMA, Alpaca, and Vicuna) to ascertain the most effective model for this endeavor. Additionally, we show improved performance when knowledge graphs from the target papers are integrated into the prompts for generating citation text. This research underscores the potential of harnessing LLMs for citation generation, opening a compelling avenue for exploring the intricate connections between scientific documents.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 14 pages, 7 figures, 11th International Conference, BDA 2023, Delhi, India

Journal ref: Big Data and Artificial Intelligence 2023, Delhi, India, December 7, pp. 80-94
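
The abstract describes prompting an LLM with one source paper, several target papers, and optionally knowledge-graph information from the targets. A minimal sketch of such prompt assembly follows; the template wording, the `Paper` type, and the (subject, relation, object) triple format are hypothetical, since the paper's actual prompt is not given here.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass
class Paper:
    title: str
    abstract: str


def build_prompt(source: Paper,
                 targets: Sequence[Paper],
                 kg_triples: Iterable[Tuple[str, str, str]] = ()) -> str:
    """Assemble a multi-citation prompt (hypothetical template)."""
    sections = [
        "Write one coherent paragraph for the SOURCE paper that cites every TARGET paper.",
        f"SOURCE: {source.title}\n{source.abstract}",
    ]
    for i, paper in enumerate(targets, start=1):
        sections.append(f"TARGET [{i}]: {paper.title}\n{paper.abstract}")
    triples = list(kg_triples)
    if triples:
        # Optional knowledge-graph context, one relation per line.
        sections.append("KNOWLEDGE GRAPH (subject | relation | object):\n"
                        + "\n".join(f"- {s} | {r} | {o}" for s, r, o in triples))
    sections.append("Cite target papers by their bracketed numbers.")
    return "\n\n".join(sections)
```

Keeping the knowledge-graph section optional makes it easy to compare prompts with and without graph context, which mirrors the with/without comparison the abstract reports.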

arXiv:2404.12926 [pdf, other]

MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering

Authors: Avinash Anand, Janak Kapuriya, Chhavi Kirtani, Apoorv Singh, Jay Saraf, Naman Lal, Jatin Kumar, Adarsh Raj Shivam, Astha Verma, Rajiv Ratn Shah, Roger Zimmermann

Abstract: Recent advancements in LLMs have shown their significant potential in tasks like text summarization and generation. Yet, they often struggle to solve complex physics problems that require arithmetic calculation and a good understanding of concepts. Moreover, many physics problems include images that contain details required to understand the problem's context. We propose an LMM-based chatbot to answer multimodal physics MCQs. For domain adaptation, we utilize the MM-PhyQA dataset, comprising Indian high school-level multimodal physics problems. To improve the LMM's performance, we experiment with two techniques: RLHF (Reinforcement Learning from Human Feedback) and Image Captioning. In image captioning, we add a detailed explanation of the diagram in each image, minimizing hallucinations and image processing errors. We further explore integrating an RLHF methodology inspired by the ranking approach in RLHF to enhance the human-like problem-solving abilities of the models. The RLHF approach incorporates human feedback into the learning process of LLMs, improving the model's problem-solving skills, truthfulness, and reasoning capabilities, minimizing hallucinations in the answers, and improving answer quality compared with vanilla supervised fine-tuned models. We employ the open-source LLaVA model to answer multimodal physics MCQs and compare its performance with and without RLHF.

Submitted 19 April, 2024; originally announced April 2024.
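
The abstract mentions an RLHF variant "inspired by the ranking approach" without giving its loss. For orientation, here is a hedged sketch of the standard pairwise ranking loss commonly used to train RLHF reward models: for every pair in a human-ranked list, it penalizes -log σ(r_better - r_worse). The function names and the best-first list convention are assumptions; this is not the authors' training code.

```python
import math
from itertools import combinations
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def ranking_loss(rewards_best_first: Sequence[float]) -> float:
    """Mean of -log sigmoid(r_i - r_j) over all pairs where i is ranked above j.

    rewards_best_first[k] is the reward-model score of the k-th answer in the
    human ranking, with index 0 ranked best.
    """
    pairs = list(combinations(range(len(rewards_best_first)), 2))
    if not pairs:
        raise ValueError("need at least two ranked responses")
    total = sum(-log_sigmoid(rewards_best_first[i] - rewards_best_first[j])
                for i, j in pairs)
    return total / len(pairs)
```

Equal rewards give a loss of log 2 per pair; the loss shrinks as the reward model scores higher-ranked answers above lower-ranked ones.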

arXiv:2404.09763 [pdf, other]

KG-CTG: Citation Generation through Knowledge Graph-guided Large Language Models

Authors: Avinash Anand, Mohit Gupta, Kritarth Prasad, Ujjwal Goel, Naman Lal, Astha Verma, Rajiv Ratn Shah

Abstract: Citation Text Generation (CTG) is a natural language processing (NLP) task that aims to produce text that accurately cites or references a cited document within a source document. In CTG, the generated text draws upon contextual cues from both the source document and the cited paper, ensuring that accurate and relevant citation information is provided. Previous work on citation generation is mainly based on text summarization of documents. Building on this, this paper presents a framework and a comparative study demonstrating the use of Large Language Models (LLMs) for citation generation. We also show that citation generation improves when the knowledge graph relations of the papers are incorporated into the LLM prompt, helping the model better learn the relationship between the papers. To evaluate our model, we use a subset of the standard S2ORC dataset consisting only of English-language computer science research papers. Vicuna performs best on this task, with 14.15 METEOR, 12.88 ROUGE-1, 1.52 ROUGE-2, and 10.94 ROUGE-L. Alpaca also performs best when knowledge graphs are included, improving ROUGE-1 by 36.98% and METEOR by 33.14%.

Submitted 15 April, 2024; originally announced April 2024.
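
The results above are reported in METEOR and ROUGE. To make the ROUGE-1 numbers concrete, the sketch below computes a simplified ROUGE-1 F1 from unigram overlap. It lowercases and splits on whitespace only; official ROUGE implementations can differ in tokenization and may apply stemming, so this illustrates the formula rather than serving as a drop-in scorer.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared unigram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For "the cat sat on the mat" against "the cat is on the mat", five of six unigrams match on each side, so precision, recall, and F1 are all 5/6.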

arXiv:2404.09530 [pdf, other]

doi 10.1145/3595916.3626448

RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization

Authors: Avinash Anand, Raj Jaiswal, Mohit Gupta, Siddhesh S Bangar, Pijush Bhuyan, Naman Lal, Rajeev Singh, Ritika Jha, Rajiv Ratn Shah, Shin'ichi Satoh

Abstract: Large ground-truth datasets and recent advances in deep learning techniques have been useful for layout detection. However, because these datasets have restricted layout diversity, training on them requires a sizable number of annotated instances, which is both expensive and time-consuming. As a result, differences between the source and target domains may significantly impact how well these models perform. To solve this problem, domain adaptation approaches have been developed that use a small quantity of labeled data to adjust the model to the target domain. In this research, we introduce a synthetic document dataset called RanLayNet, enriched with automatically assigned labels denoting the spatial positions, ranges, and types of layout elements. The primary aim of this endeavor is to develop a versatile dataset capable of training models that are robust and adaptable to diverse document formats. Through empirical experimentation, we demonstrate that a deep layout identification model trained on our dataset outperforms a model trained solely on real documents. Moreover, we conduct a comparative analysis by fine-tuning inference models using both the PubLayNet and IIIT-AR-13K datasets on the DocLayNet dataset. Our findings show that models enriched with our dataset achieve mAP95 scores of 0.398 and 0.588 for the TABLE class in the scientific document domain.

Submitted 19 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: 8 pages, 6 figures, MMAsia 2023 Proceedings of the 5th ACM International Conference on Multimedia in Asia

Journal ref: In Proceedings of the 5th ACM International Conference on Multimedia in Asia 2023. Association for Computing Machinery, NY, USA, Article 74, pp. 1-6
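
RanLayNet's key property is that labels are assigned automatically while documents are synthesized. A minimal sketch of that idea: generate random blocks on a page and record each block's class and bounding box as it is placed. The label set, page size (US Letter in points), and size ranges are hypothetical; the abstract does not describe RanLayNet's actual generator.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical label set; the abstract does not list RanLayNet's classes.
LABELS = ("TITLE", "TEXT", "TABLE", "FIGURE")


@dataclass
class Element:
    label: str
    x0: int  # left edge, in points
    y0: int  # top edge, in points (y grows downward)
    x1: int
    y1: int


def random_layout(page_w: int = 612, page_h: int = 792, margin: int = 36,
                  gap: int = 12, seed: Optional[int] = None) -> List[Element]:
    """Stack randomly typed, randomly sized blocks top to bottom on one page.

    Each box's label and coordinates are recorded as it is generated, so the
    annotation comes for free; no manual labeling step is needed.
    """
    rng = random.Random(seed)
    usable_w = page_w - 2 * margin
    elements: List[Element] = []
    y = margin
    while True:
        label = rng.choice(LABELS)
        height = rng.randint(20, 200)
        width = rng.randint(usable_w // 2, usable_w)
        if y + height > page_h - margin:
            break  # the next block would overflow the bottom margin
        x0 = rng.randint(margin, margin + usable_w - width)
        elements.append(Element(label, x0, y, x0 + width, y + height))
        y += height + gap
    return elements
```

A real generator would also render content into each box and vary column structure; this sketch only shows why synthetic pages need no manual annotation.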

arXiv:2404.08704 [pdf, other]

MM-PhyQA: Multimodal Physics Question-Answering With Multi-Image CoT Prompting

Authors: Avinash Anand, Janak Kapuriya, Apoorv Singh, Jay Saraf, Naman Lal, Astha Verma, Rushali Gupta, Rajiv Shah

Abstract: While Large Language Models (LLMs) can achieve human-level performance in various tasks, they still struggle to tackle multi-step physics reasoning tasks effectively. To identify the shortcomings of existing models and facilitate further research in this area, we curated a novel dataset, MM-PhyQA, which comprises well-constructed, high school-level multimodal physics problems. By evaluating publicly available contemporary LLMs both with and without the multimodal elements of these problems, we aim to shed light on their capabilities. To answer questions with multimodal input (in this case, images and text), we used zero-shot prediction with GPT-4 as well as LLaVA (LLaVA and LLaVA-1.5), the latter of which were fine-tuned on our dataset. To evaluate LLMs on text-only input, we tested the base and fine-tuned versions of the Mistral-7B and LLaMA2-7B models. We also showcase the novel Multi-Image Chain-of-Thought (MI-CoT) Prompting technique, which, when used to train LLaVA-1.5 13B, yielded the best results on our dataset, with superior scores in most metrics and the highest test-set accuracy of 71.65%.

Submitted 11 April, 2024; originally announced April 2024.
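
Accuracy on MCQs such as those in MM-PhyQA depends on pulling a single option out of free-form chain-of-thought output. A hedged sketch, assuming the model is prompted to end with a line like "Answer: (B)" and that options are labeled A-D; both conventions are hypothetical, not taken from the paper.

```python
import re
from typing import Optional, Sequence

# Hypothetical output convention: reasoning ends with "Answer: X" (options A-D).
# The negative lookahead stops "Answer: Due to..." from being read as option D.
ANSWER_RE = re.compile(r"Answer:\s*\(?([A-D])\)?(?![A-Za-z])", re.IGNORECASE)


def extract_choice(output: str) -> Optional[str]:
    """Return the last stated option letter, or None if none is found."""
    matches = ANSWER_RE.findall(output)
    return matches[-1].upper() if matches else None


def mcq_accuracy(outputs: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of outputs whose extracted option equals the gold letter."""
    if len(outputs) != len(gold) or not gold:
        raise ValueError("outputs and gold must be non-empty and equally long")
    correct = sum(extract_choice(o) == g.upper() for o, g in zip(outputs, gold))
    return correct / len(gold)
```

Taking the last match lets the model mention options during its reasoning without affecting the scored answer; unparseable outputs count as wrong.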

arXiv:2304.06430 [pdf, other]

Certified Zeroth-order Black-Box Defense with Robust UNet Denoiser

Authors: Astha Verma, A V Subramanyam, Siddhesh Bangar, Naman Lal, Rajiv Ratn Shah, Shin'ichi Satoh

Abstract: Certified defense methods against adversarial perturbations have recently been investigated in the black-box setting from a zeroth-order (ZO) perspective. However, these methods suffer from high model variance and low performance on high-dimensional datasets due to the ineffective design of the denoiser, and they make limited use of ZO techniques. To this end, we propose a certified ZO preprocessing technique that removes adversarial perturbations from the attacked image in the black-box setting using only model queries. We propose a robust UNet denoiser (RDUNet) that ensures the robustness of black-box models trained on high-dimensional datasets. We propose a novel black-box denoised smoothing (DS) defense mechanism, ZO-RUDS, which prepends our RDUNet to the black-box model, ensuring black-box defense. We further propose ZO-AE-RUDS, in which RDUNet followed by an autoencoder (AE) is prepended to the black-box model. We perform extensive experiments on four classification datasets, CIFAR-10, CIFAR-10, Tiny Imagenet, STL-10, and the MNIST dataset for image reconstruction tasks. Our proposed defense methods ZO-RUDS and ZO-AE-RUDS outperform the state of the art by large margins of 35% and 9%, respectively, on the low-dimensional dataset (CIFAR-10), and by margins of 20.61% and 23.51%, respectively, on the high-dimensional dataset (STL-10).

Submitted 6 July, 2024; v1 submitted 13 April, 2023; originally announced April 2023.
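
ZO methods replace true gradients with estimates built purely from function queries, which is what allows a defense to be trained with only black-box model access. The sketch below shows a generic two-point random gradient estimator with Gaussian directions; the query budget `q` and smoothing radius `mu` are illustrative values, and this is not the specific estimator or training loop used for ZO-RUDS or ZO-AE-RUDS.

```python
import random
from typing import Callable, List, Optional


def zo_gradient(f: Callable[[List[float]], float],
                x: List[float],
                q: int = 100,
                mu: float = 1e-3,
                seed: Optional[int] = None) -> List[float]:
    """Estimate grad f(x) from 1 + q queries of f (no backpropagation).

    For each random direction u ~ N(0, I), the finite difference
    (f(x + mu*u) - f(x)) / mu approximates the directional derivative
    u . grad f(x); averaging that value times u estimates grad f(x),
    because E[u u^T] = I.
    """
    rng = random.Random(seed)
    fx = f(x)
    grad = [0.0] * len(x)
    for _ in range(q):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        scale = (f(shifted) - fx) / (mu * q)
        for i, ui in enumerate(u):
            grad[i] += scale * ui
    return grad
```

The estimate's variance grows with the input dimension, so the query budget matters more on high-dimensional inputs.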

arXiv:1501.04553 [pdf]

A Heuristic EDF Uplink Scheduler for Real Time Application in WiMAX Communication

Authors: Nidhi Lal, Anurag Prakash Singh, Shishupal Kumar, Shikha Mittal, Meenakshi Singh

Abstract: WiMAX (Worldwide Interoperability for Microwave Access) is a developing wireless communication technology that can provide broadband access over large coverage areas. WiMAX belongs to the IEEE 802.16 family of standards. To satisfy user demands and support a new set of real-time services and applications, a realistic and dynamic resource allocation algorithm is mandatory. One of the most efficient such algorithms is EDF (earliest deadline first). The problem is that when the difference between deadlines is large enough, lower-priority queues starve. In this paper, we therefore present a heuristic earliest deadline first (H-EDF) approach for the uplink scheduler of the WiMAX real-time system. H-EDF allocates uplink bandwidth efficiently, so that bandwidth is well utilized and appropriate fairness is provided to the system. We use the OPNET simulator to implement a WiMAX network that uses the H-EDF scheduling algorithm, and we analyze the performance of H-EDF in terms of throughput and delay.

Submitted 23 January, 2015; v1 submitted 19 January, 2015; originally announced January 2015.
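
For reference, the Python below implements plain EDF (not the paper's H-EDF heuristic, whose rules are not given in the abstract) and can reproduce the starvation problem the abstract describes: a request with a distant deadline waits as long as more urgent requests keep arriving. Flow names and deadline values used with it are illustrative.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class Request:
    deadline: float
    seq: int                          # arrival order breaks deadline ties (FIFO)
    flow: str = field(compare=False)  # connection/flow identifier, not compared


class EDFScheduler:
    """Plain earliest-deadline-first queue; NOT the paper's H-EDF heuristic."""

    def __init__(self) -> None:
        self._heap: List[Request] = []
        self._seq = 0

    def submit(self, flow: str, deadline: float) -> None:
        heapq.heappush(self._heap, Request(deadline, self._seq, flow))
        self._seq += 1

    def next_grant(self) -> Optional[Request]:
        """Grant the pending request with the earliest deadline, if any."""
        return heapq.heappop(self._heap) if self._heap else None
```

If a voice flow submits a request with deadline t + 5 in every slot t and one grant is issued per slot, a bulk request with deadline 100 is not granted before slot 95. That is the starvation that motivates a heuristic variant such as H-EDF.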

arXiv:1501.02365 [pdf]

Modified Trial Division Algorithm Using KNJ-Factorization Method To Factorize RSA Public Key Encryption

Authors: Nidhi Lal, Anurag Prakash Singh, Shishupal Kumar

Abstract: The security of the RSA algorithm depends on the positive integer N, which is the product of two large prime numbers. Factoring such large numbers is difficult, and many algorithms have been implemented over the years. The proposed KNJ-Factorization algorithm provides a deterministic way to factor RSA moduli. The algorithm limits the search by considering only prime values. Since every prime other than 2 is odd, it also requires fewer steps to factor N. The proposed algorithm is very simple and easy to understand and implement. The main idea of the KNJ factorization algorithm is to check only candidate factors that are odd and prime. The proposed KNJ-Factorization algorithm works very efficiently when the factors are adjacent and close to N. The method can be sped up further by reducing the time spent on primality testing, which fundamentally decreases its time complexity.

Submitted 10 January, 2015; originally announced January 2015.
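
The abstract's core idea, testing only odd prime candidates, can be illustrated with ordinary trial division. This is a hedged sketch, not the authors' KNJ-Factorization algorithm: the downward search starting at ⌊√N⌋ is an assumption, chosen so that balanced factors are found first, and the toy primality test stands in for whatever test the paper uses. It is practical only for small N; real RSA moduli are far beyond trial division.

```python
from math import isqrt
from typing import Optional, Tuple


def is_prime(n: int) -> bool:
    """Deterministic trial-division primality test (adequate for small demos)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def odd_prime_trial_division(n: int) -> Optional[Tuple[int, int]]:
    """Split an odd composite n into (p, n // p), testing only odd prime p.

    Assumption: candidates are scanned downward from floor(sqrt(n)), so the
    first hit is the largest prime factor not exceeding sqrt(n).
    """
    if n < 9 or n % 2 == 0:
        return None  # RSA moduli are odd; even or tiny inputs are out of scope
    start = isqrt(n)
    if start % 2 == 0:
        start -= 1  # keep candidates odd
    for candidate in range(start, 2, -2):
        if is_prime(candidate) and n % candidate == 0:
            return candidate, n // candidate
    return None  # no odd prime factor <= sqrt(n): n is prime
```

For N = 3233 = 53 × 61 the search starts at 55, skips 55 because it is not prime, and returns (53, 61). Checking `n % candidate` before `is_prime` would be cheaper in practice; the order here mirrors the abstract's description of restricting candidates to primes.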

arXiv:1412.8561 [pdf]

doi 10.9781/ijimai.2014.315

Modified Design of Microstrip Patch Antenna for WiMAX Communication System

Authors: Nidhi Kumari Lal, Ashutosh Kumar Singh

Abstract: In this paper, a new design for a U-shaped microstrip patch antenna is proposed for use in WiMAX communication systems. The aim of this paper is to optimize the performance of the microstrip patch antenna. U-shaped microstrip patch antennas are now widely used in WiMAX communication applications and have become very popular. Our proposed antenna design uses the 4-4.5 GHz frequency band and operates as a narrowband antenna within this band. RT/DUROID 5880 is used as the substrate material of the microstrip antenna. This modified design of the microstrip patch antenna gives high performance in terms of gain and return loss.

Submitted 29 December, 2014; originally announced December 2014.
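
The abstract gives the band (4-4.5 GHz) and substrate (RT/DUROID 5880) but no dimensions. As a worked starting point, the sketch below applies the standard transmission-line-model formulas for a rectangular patch, the usual baseline before a U-slot is cut. The 4.25 GHz design frequency, the 1.575 mm substrate height, and the relative permittivity of 2.2 (the nominal value for RT/duroid 5880) are assumptions, and the result is not the authors' modified design.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def rectangular_patch(f0_hz: float, eps_r: float, h_m: float):
    """Return (width, length, eps_eff), lengths in metres, via the transmission-line model."""
    # Patch width for efficient radiation.
    w = C / (2 * f0_hz) * math.sqrt(2 / (eps_r + 1))
    # Effective permittivity accounts for fringing fields partly in air.
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h_m / w) ** -0.5
    # Length extension due to fringing at each radiating edge.
    d_l = 0.412 * h_m * ((eps_eff + 0.3) * (w / h_m + 0.264)) / (
        (eps_eff - 0.258) * (w / h_m + 0.8))
    length = C / (2 * f0_hz * math.sqrt(eps_eff)) - 2 * d_l
    return w, length, eps_eff


# Assumed design point: band centre 4.25 GHz, eps_r = 2.2, h = 1.575 mm.
W, L, EPS_EFF = rectangular_patch(4.25e9, 2.2, 1.575e-3)
print(f"W = {W * 1e3:.2f} mm, L = {L * 1e3:.2f} mm, eps_eff = {EPS_EFF:.3f}")
```

A U-slot design would start from dimensions like these and then tune slot size and feed position in an electromagnetic simulator to reach the reported gain and return loss.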

arXiv:1412.8013 [pdf]

An Effective Approach for Mobile ad hoc Network via I-Watchdog Protocol

Authors: Nidhi Lal

Abstract: Mobile ad hoc networks (MANETs) have become very popular because they require no fixed infrastructure and are dynamic in nature. They contain many nodes that are connected to and communicate with each other wirelessly. A MANET is a wireless technology with highly mobile nodes that does not depend on a background administrator as a central authority, because it has no infrastructure. MANET nodes communicate by radio waves and have limited resources and computational power. The network topology changes very frequently because MANETs are distributed and self-configuring. Because of their wireless nature and the lack of any central authority in the background, MANETs are always vulnerable to security and performance issues, and security has a large impact on the performance of any network. Security threats include black hole attacks, flooding, and wormhole attacks. In this paper, we discuss the low performance of the Watchdog protocol used in MANETs and propose an improved Watchdog mechanism, the I-Watchdog protocol, which overcomes the limitations of the Watchdog protocol and performs well in terms of throughput and delay.

Submitted 20 January, 2015; v1 submitted 26 December, 2014; originally announced December 2014.
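
The abstract does not describe how Watchdog or I-Watchdog works. For context, the classic Watchdog idea is that a node overhears whether its next hop actually forwards each packet and counts failures per neighbor. The sketch below illustrates only that baseline idea; the timeout and threshold values are hypothetical, and the I-Watchdog improvements are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Watchdog:
    """Baseline Watchdog: overhear the next hop and count forwarding failures."""
    timeout: float = 0.5   # hypothetical seconds to wait for the retransmission
    threshold: int = 3     # hypothetical failures before a neighbor is flagged
    pending: Dict[int, Tuple[str, float]] = field(default_factory=dict)  # packet -> (hop, sent_at)
    failures: Dict[str, int] = field(default_factory=dict)               # hop -> failure count

    def on_send(self, packet_id: int, next_hop: str, now: float) -> None:
        self.pending[packet_id] = (next_hop, now)

    def on_overhear(self, packet_id: int) -> None:
        # The next hop was heard forwarding our packet, so it behaved correctly.
        self.pending.pop(packet_id, None)

    def check_timeouts(self, now: float) -> None:
        for packet_id, (hop, sent_at) in list(self.pending.items()):
            if now - sent_at > self.timeout:
                del self.pending[packet_id]
                self.failures[hop] = self.failures.get(hop, 0) + 1

    def misbehaving(self) -> Set[str]:
        return {hop for hop, n in self.failures.items() if n >= self.threshold}
```

A neighbor that silently drops everything, as a black hole node would, quickly reaches the threshold; the classic mechanism can also misjudge nodes when collisions prevent overhearing.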

arXiv:1004.1746 [pdf]

Internet ware cloud computing: Challenges

Authors: S Qamar, Niranjan Lal, Mrityunjay Singh

Abstract: After decades of engineering development and infrastructural investment, Internet connections have become a commodity product in many countries, and Internet-scale "cloud computing" has started to compete with traditional software business through its technological advantages and economies of scale. Cloud computing is a promising enabling technology for Internet ware and is termed the next big thing in the modern corporate world. Alongside present-day software and technologies, cloud computing will have a growing impact on enterprise IT and business activities in many large organizations. This paper provides insight into cloud computing and its impacts, and discusses various issues that business organizations face while implementing cloud computing. Further, it recommends various strategies that organizations need to adopt while migrating to cloud computing. The purpose of this paper is to develop an understanding of cloud computing in the modern world and its impact on organizations and businesses. The paper first briefly introduces the cloud computing model and its purposes. It then discusses various technical and non-technical issues that need to be overcome for the benefits of cloud computing to be realized in corporate businesses and organizations. Finally, it provides recommendations and strategies that businesses need to work on before adopting new technologies.

Submitted 10 April, 2010; originally announced April 2010.

Comments: IEEE Publication format, ISSN 1947-5500, http://sites.google.com/site/ijcsis/

Journal ref: IJCSIS, Vol. 7 No. 3, March 2010, 206-210

Showing 1–12 of 12 results for author: Lal, N