Showing 1–24 of 24 results for author: Mahata, D

  1. arXiv:2305.09316  [pdf, other]

    cs.CL cs.AI

    Enhancing Keyphrase Extraction from Long Scientific Documents using Graph Embeddings

    Authors: Roberto Martínez-Cruz, Debanjan Mahata, Alvaro J. López-López, José Portela

    Abstract: In this study, we investigate using graph neural network (GNN) representations to enhance contextualized representations of pre-trained language models (PLMs) for keyphrase extraction from lengthy documents. We show that augmenting a PLM with graph embeddings provides a more comprehensive semantic understanding of words in a document, particularly for long documents. We construct a co-occurrence g…

    Submitted 16 May, 2023; originally announced May 2023.

  2. arXiv:2203.15349  [pdf, other]

    cs.CL

    LDKP: A Dataset for Identifying Keyphrases from Long Scientific Documents

    Authors: Debanjan Mahata, Navneet Agarwal, Dibya Gautam, Amardeep Kumar, Swapnil Parekh, Yaman Kumar Singla, Anish Acharya, Rajiv Ratn Shah

    Abstract: Identifying keyphrases (KPs) from text documents is a fundamental task in natural language processing and information retrieval. Vast majority of the benchmark datasets for this task are from the scientific domain containing only the document title and abstract information. This limits keyphrase extraction (KPE) and keyphrase generation (KPG) algorithms to identify keyphrases from human-written su…

    Submitted 1 April, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

  3. arXiv:2203.04464  [pdf, other]

    cs.CL

    On the Evaluation of Answer-Agnostic Paragraph-level Multi-Question Generation

    Authors: Jishnu Ray Chowdhury, Debanjan Mahata, Cornelia Caragea

    Abstract: We study the task of predicting a set of salient questions from a given paragraph without any prior knowledge of the precise answer. We make two main contributions. First, we propose a new method to evaluate a set of predicted questions against the set of references by using the Hungarian algorithm to assign predicted questions to references before scoring the assigned pairs. We show that our prop…

    Submitted 11 March, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

  4. arXiv:2112.08547  [pdf, other]

    cs.CL cs.IR cs.LG

    Learning Rich Representation of Keyphrases from Text

    Authors: Mayank Kulkarni, Debanjan Mahata, Ravneet Arora, Rajarshi Bhowmik

    Abstract: In this work, we explore how to train task-specific language models aimed towards learning rich representation of keyphrases from text documents. We experiment with different masking strategies for pre-training transformer language models (LMs) in discriminative as well as generative settings. In the discriminative setting, we introduce a new pre-training objective - Keyphrase Boundary Infilling w…

    Submitted 10 July, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

  5. arXiv:2104.08962  [pdf, other]

    cs.CL cs.AI cs.LG

    On the Use of Context for Predicting Citation Worthiness of Sentences in Scholarly Articles

    Authors: Rakesh Gosangi, Ravneet Arora, Mohsen Gheisarieha, Debanjan Mahata, Haimin Zhang

    Abstract: In this paper, we study the importance of context in predicting the citation worthiness of sentences in scholarly articles. We formulate this problem as a sequence labeling task solved using a hierarchical BiLSTM model. We contribute a new benchmark dataset containing over two million sentences and their corresponding labels. We preserve the sentence order in this dataset and perform document-leve…

    Submitted 18 April, 2021; originally announced April 2021.

    Comments: To be published in the proceedings of NAACL 2021

    MSC Class: 68T50 ACM Class: I.2.7

  6. arXiv:2104.08578  [pdf, other]

    cs.CL cs.AI cs.CY

    GupShup: An Annotated Corpus for Abstractive Summarization of Open-Domain Code-Switched Conversations

    Authors: Laiba Mehnaz, Debanjan Mahata, Rakesh Gosangi, Uma Sushmitha Gunturi, Riya Jain, Gauri Gupta, Amardeep Kumar, Isabelle Lee, Anish Acharya, Rajiv Ratn Shah

    Abstract: Code-switching is the communication phenomenon where speakers switch between different languages during a conversation. With the widespread adoption of conversational agents and chat platforms, code-switching has become an integral part of written conversations in many multi-lingual communities worldwide. This makes it essential to develop techniques for summarizing and understanding these convers…

    Submitted 17 April, 2021; originally announced April 2021.

  7. arXiv:2012.11243  [pdf, other]

    cs.AI

    Get It Scored Using AutoSAS -- An Automated System for Scoring Short Answers

    Authors: Yaman Kumar, Swati Aggarwal, Debanjan Mahata, Rajiv Ratn Shah, Ponnurangam Kumaraguru, Roger Zimmermann

    Abstract: In the era of MOOCs, online exams are taken by millions of candidates, where scoring short answers is an integral part. It becomes intractable to evaluate them by human graders. Thus, a generic automated system capable of grading these responses should be designed and deployed. In this paper, we present a fast, scalable, and accurate approach towards automated Short Answer Scoring (SAS). We propos…

    Submitted 21 December, 2020; originally announced December 2020.

  8. arXiv:2009.02619  [pdf, other]

    cs.CL

    MIDAS at SemEval-2020 Task 10: Emphasis Selection using Label Distribution Learning and Contextual Embeddings

    Authors: Sarthak Anand, Pradyumna Gupta, Hemant Yadav, Debanjan Mahata, Rakesh Gosangi, Haimin Zhang, Rajiv Ratn Shah

    Abstract: This paper presents our submission to the SemEval 2020 - Task 10 on emphasis selection in written text. We approach this emphasis selection problem as a sequence labeling task where we represent the underlying text with various contextual embedding models. We also employ label distribution learning to account for annotator disagreements. We experiment with the choice of model architectures, traina…

    Submitted 5 September, 2020; originally announced September 2020.

  9. arXiv:2008.00525  [pdf, other]

    cs.CY cs.SI

    Trawling for Trolling: A Dataset

    Authors: Hitkul, Karmanya Aggarwal, Pakhi Bamdev, Debanjan Mahata, Rajiv Ratn Shah, Ponnurangam Kumaraguru

    Abstract: The ability to accurately detect and filter offensive content automatically is important to ensure a rich and diverse digital discourse. Trolling is a type of hurtful or offensive content that is prevalent in social media, but is underrepresented in datasets for offensive content detection. In this work, we present a dataset that models trolling as a subcategory of offensive content. The dataset w…

    Submitted 2 August, 2020; originally announced August 2020.

  10. arXiv:2001.09215  [pdf, other]

    cs.CL cs.SI

    An Iterative Approach for Identifying Complaint Based Tweets in Social Media Platforms

    Authors: Gyanesh Anand, Akash Gautam, Puneet Mathur, Debanjan Mahata, Rajiv Ratn Shah, Ramit Sawhney

    Abstract: Twitter is a social media platform where users express opinions over a variety of issues. Posts offering grievances or complaints can be utilized by private/ public organizations to improve their service and promptly gauge a low-cost assessment. In this paper, we propose an iterative methodology which aims to identify complaint based posts pertaining to the transport domain. We perform comprehensi…

    Submitted 17 June, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

    Comments: Preprint of paper accepted at AAAI, student abstract 2020

  11. arXiv:1912.06927  [pdf, other]

    cs.CL cs.SI

    #MeTooMA: Multi-Aspect Annotations of Tweets Related to the MeToo Movement

    Authors: Akash Gautam, Puneet Mathur, Rakesh Gosangi, Debanjan Mahata, Ramit Sawhney, Rajiv Ratn Shah

    Abstract: In this paper, we present a dataset containing 9,973 tweets related to the MeToo movement that were manually annotated for five different linguistic aspects: relevance, stance, hate speech, sarcasm, and dialogue acts. We present a detailed account of the data collection and annotation processes. The annotations have a very high inter-annotator agreement (0.79 to 0.93 k-alpha) due to the domain exp…

    Submitted 20 April, 2020; v1 submitted 14 December, 2019; originally announced December 2019.

    Comments: Preprint of paper accepted at ICWSM 2020

  12. arXiv:1910.08840  [pdf, other]

    cs.CL

    Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings

    Authors: Dhruva Sahrawat, Debanjan Mahata, Mayank Kulkarni, Haimin Zhang, Rakesh Gosangi, Amanda Stent, Agniv Sharma, Yaman Kumar, Rajiv Ratn Shah, Roger Zimmermann

    Abstract: In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented using deep contextualized embeddings. We evaluate the proposed architecture using both contextualized and fixed word embedding models on three different benchmark datasets (Inspec, SemEval 2010, SemEval 2017) and compare w…

    Submitted 19 October, 2019; originally announced October 2019.

  13. BHAAV- A Text Corpus for Emotion Analysis from Hindi Stories

    Authors: Yaman Kumar, Debanjan Mahata, Sagar Aggarwal, Anmol Chugh, Rajat Maheshwari, Rajiv Ratn Shah

    Abstract: In this paper, we introduce the first and largest Hindi text corpus, named BHAAV, which means emotions in Hindi, for analyzing emotions that a writer expresses through his characters in a story, as perceived by a narrator/reader. The corpus consists of 20,304 sentences collected from 230 different short stories spanning across 18 genres such as Inspirational and Mystery. Each sentence has been ann…

    Submitted 9 October, 2019; originally announced October 2019.

  14. arXiv:1909.12229  [pdf, other]

    cs.CL cs.IR cs.LG

    Keyphrase Generation for Scientific Articles using GANs

    Authors: Avinash Swaminathan, Raj Kuwar Gupta, Haimin Zhang, Debanjan Mahata, Rakesh Gosangi, Rajiv Ratn Shah

    Abstract: In this paper, we present a keyphrase generation approach using conditional Generative Adversarial Networks (GAN). In our GAN model, the generator outputs a sequence of keyphrases based on the title and abstract of a scientific article. The discriminator learns to distinguish between machine-generated and human-curated keyphrases. We evaluate this approach on standard benchmark datasets. Our model…

    Submitted 23 September, 2019; originally announced September 2019.

    Comments: 2 pages, 1 fig, 8 references, 2 tables

  15. arXiv:1905.03968  [pdf, other]

    cs.CL cs.CV

    MobiVSR: A Visual Speech Recognition Solution for Mobile Devices

    Authors: Nilay Shrivastava, Astitwa Saxena, Yaman Kumar, Rajiv Ratn Shah, Debanjan Mahata, Amanda Stent

    Abstract: Visual speech recognition (VSR) is the task of recognizing spoken language from video input only, without any audio. VSR has many applications as an assistive technology, especially if it could be deployed in mobile devices and embedded systems. The need of intensive computational resources and large memory footprint are two of the major obstacles in developing neural network models for VSR in a r…

    Submitted 4 June, 2019; v1 submitted 10 May, 2019; originally announced May 2019.

  16. arXiv:1904.09076  [pdf, other]

    cs.CL

    Suggestion Mining from Online Reviews using ULMFiT

    Authors: Sarthak Anand, Debanjan Mahata, Kartik Aggarwal, Laiba Mehnaz, Simra Shahid, Haimin Zhang, Yaman Kumar, Rajiv Ratn Shah, Karan Uppal

    Abstract: In this paper we present our approach and the system description for Sub Task A of SemEval 2019 Task 9: Suggestion Mining from Online Reviews and Forums. Given a sentence, the task asks to predict whether the sentence consists of a suggestion or not. Our model is based on Universal Language Model Fine-tuning for Text Classification. We apply various pre-processing techniques before training the la…

    Submitted 19 April, 2019; originally announced April 2019.

  17. arXiv:1904.09072  [pdf, other]

    cs.CL

    Identifying Offensive Posts and Targeted Offense from Twitter

    Authors: Haimin Zhang, Debanjan Mahata, Simra Shahid, Laiba Mehnaz, Sarthak Anand, Yaman Singla, Rajiv Ratn Shah, Karan Uppal

    Abstract: In this paper we present our approach and the system description for Sub-task A and Sub Task B of SemEval 2019 Task 6: Identifying and Categorizing Offensive Language in Social Media. Sub-task A involves identifying if a given tweet is offensive or not, and Sub Task B involves detecting if an offensive tweet is targeted towards someone (group or an individual). Our models for Sub-task A is based o…

    Submitted 19 April, 2019; originally announced April 2019.

  18. arXiv:1901.10139  [pdf, other]

    cs.LG stat.ML

    Harnessing GANs for Zero-shot Learning of New Classes in Visual Speech Recognition

    Authors: Yaman Kumar, Dhruva Sahrawat, Shubham Maheshwari, Debanjan Mahata, Amanda Stent, Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann

    Abstract: Visual Speech Recognition (VSR) is the process of recognizing or interpreting speech by watching the lip movements of the speaker. Recent machine learning based approaches model VSR as a classification problem; however, the scarcity of training data leads to error-prone systems with very low accuracies in predicting unseen classes. To solve this problem, we present a novel approach to zero-shot le…

    Submitted 2 January, 2020; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: Accepted for poster presentation at AAAI 2020. Dhruva Sahrawat and Yaman Kumar contributed equally to this work

  19. arXiv:1812.00399  [pdf, other]

    cs.CV

    Kiki Kills: Identifying Dangerous Challenge Videos from Social Media

    Authors: Nupur Baghel, Yaman Kumar, Paavini Nanda, Rajiv Ratn Shah, Debanjan Mahata, Roger Zimmermann

    Abstract: There has been upsurge in the number of people participating in challenges made popular through social media channels. One of the examples of such a challenge is the Kiki Challenge, in which people step out of their moving cars and dance to the tunes of the song, 'Kiki, Do you love me?'. Such an action makes the people taking the challenge prone to accidents and can also create nuisance for the ot…

    Submitted 16 December, 2018; v1 submitted 2 December, 2018; originally announced December 2018.

  20. arXiv:1808.02082  [pdf]

    cs.CL

    Did you take the pill? - Detecting Personal Intake of Medicine from Twitter

    Authors: Debanjan Mahata, Jasper Friedrichs, Rajiv Ratn Shah, Jing Jiang

    Abstract: Mining social media messages such as tweets, articles, and Facebook posts for health and drug related information has received significant interest in pharmacovigilance research. Social media sites (e.g., Twitter), have been used for monitoring drug abuse, adverse reactions of drug usage and analyzing expression of sentiments related to drugs. Most of these studies are based on aggregated results…

    Submitted 2 August, 2018; originally announced August 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1805.06375

  21. arXiv:1807.05962  [pdf, other]

    cs.CL

    Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

    Authors: Debanjan Mahata, John Kuriakose, Rajiv Ratn Shah, Roger Zimmermann, John R. Talburt

    Abstract: Keyword extraction is a fundamental task in natural language processing that facilitates mapping of documents to a concise set of representative single and multi-word phrases. Keywords from text documents are primarily extracted using supervised and unsupervised approaches. In this paper, we present an unsupervised technique that uses a combination of theme-weighted personalized PageRank algorithm…

    Submitted 16 July, 2018; originally announced July 2018.

    Comments: preprint for paper accepted in Proceedings of 1st IEEE International Conference on Multimedia Information Processing and Retrieval

  22. arXiv:1807.05959  [pdf, other]

    cs.CV

    A Multimodal Approach to Predict Social Media Popularity

    Authors: Mayank Meghawat, Satyendra Yadav, Debanjan Mahata, Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann

    Abstract: Multiple modalities represent different aspects by which information is conveyed by a data source. Modern day social media platforms are one of the primary sources of multimodal data, where users use different modes of expression by posting textual as well as multimedia content such as images and videos for sharing information. Multimodal information embedded in such posts could be useful in predi…

    Submitted 16 July, 2018; originally announced July 2018.

    Comments: Preprint version for paper accepted in Proceedings of 1st IEEE International Conference on Multimedia Information Processing and Retrieval

  23. arXiv:1805.06375  [pdf, ps]

    cs.CL

    #phramacovigilance - Exploring Deep Learning Techniques for Identifying Mentions of Medication Intake from Twitter

    Authors: Debanjan Mahata, Jasper Friedrichs, Hitkul, Rajiv Ratn Shah

    Abstract: Mining social media messages for health and drug related information has received significant interest in pharmacovigilance research. Social media sites (e.g., Twitter), have been used for monitoring drug abuse, adverse reactions of drug usage and analyzing expression of sentiments related to drugs. Most of these studies are based on aggregated results from a large population rather than specific…

    Submitted 16 May, 2018; originally announced May 2018.

  24. arXiv:1803.07718  [pdf]

    cs.CL

    InfyNLP at SMM4H Task 2: Stacked Ensemble of Shallow Convolutional Neural Networks for Identifying Personal Medication Intake from Twitter

    Authors: Jasper Friedrichs, Debanjan Mahata, Shubham Gupta

    Abstract: This paper describes Infosys's participation in the "2nd Social Media Mining for Health Applications Shared Task at AMIA, 2017, Task 2". Mining social media messages for health and drug related information has received significant interest in pharmacovigilance research. This task targets at developing automated classification models for identifying tweets containing descriptions of personal intake…

    Submitted 20 March, 2018; originally announced March 2018.

    Comments: 2nd Workshop on Social Media Mining for Health