Showing 1–17 of 17 results for author: Masud, S

  1. arXiv:2406.03953  [pdf, other]

    cs.CL

    Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech

    Authors: Neemesh Yadav, Sarah Masud, Vikram Goyal, Md Shad Akhtar, Tanmoy Chakraborty

    Abstract: Employing language models to generate explanations for an incoming implicit hate post is an active area of research. The explanation is intended to make explicit the underlying stereotype and aid content moderators. The training often combines top-k relevant knowledge graph (KG) tuples to provide world knowledge and improve performance on standard metrics. Interestingly, our study presents conflic…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 17 Pages, 5 Figures, 13 Tables, ACL Findings 2024

  2. arXiv:2402.02144  [pdf, other]

    cs.CL

    Probing Critical Learning Dynamics of PLMs for Hate Speech Detection

    Authors: Sarah Masud, Mohammad Aflah Khan, Vikram Goyal, Md Shad Akhtar, Tanmoy Chakraborty

    Abstract: Despite the widespread adoption, there is a lack of research into how various critical aspects of pretrained language models (PLMs) affect their performance in hate speech detection. Through five research questions, our findings and recommendations lay the groundwork for empirically investigating different aspects of PLMs' use in hate speech detection. We deep dive into comparing different pretrai…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: 20 pages, 9 figures, 14 tables. Accepted at EACL'24

  3. arXiv:2311.09834  [pdf, other]

    cs.CL

    Overview of the HASOC Subtrack at FIRE 2023: Identification of Tokens Contributing to Explicit Hate in English by Span Detection

    Authors: Sarah Masud, Mohammad Aflah Khan, Md. Shad Akhtar, Tanmoy Chakraborty

    Abstract: As hate speech continues to proliferate on the web, it is becoming increasingly important to develop computational methods to mitigate it. Reactively, using black-box models to identify hateful content can perplex users as to why their posts were automatically flagged as hateful. On the other hand, proactive mitigation can be achieved by suggesting rephrasing before a post is made public. However,…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 8 pages, 1 figure, 4 Tables

  4. arXiv:2309.11896  [pdf, other]

    cs.CL cs.CY

    Focal Inferential Infusion Coupled with Tractable Density Discrimination for Implicit Hate Speech Detection

    Authors: Sarah Masud, Ashutosh Bajpai, Tanmoy Chakraborty

    Abstract: Although pre-trained large language models (PLMs) have achieved state-of-the-art on many NLP tasks, they lack understanding of subtle expressions of implicit hate speech. Such nuanced and implicit hate is often misclassified as non-hate. Various attempts have been made to enhance the detection of (implicit) hate content by augmenting external context or enforcing label separation via distance-base…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: 21 pages, 6 Figures and 9 Tables

  5. arXiv:2306.01105  [pdf, other]

    cs.CL

    Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment

    Authors: Atharva Kulkarni, Sarah Masud, Vikram Goyal, Tanmoy Chakraborty

    Abstract: Social media is awash with hateful content, much of which is often veiled with linguistic and topical diversity. The benchmark datasets used for hate speech detection do not account for such divagation as they are predominantly compiled using hate lexicons. However, capturing hate signals becomes challenging in neutrally-seeded malicious content. Thus, designing models and datasets that mimic the…

    Submitted 15 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: 15 pages, 4 figures, 11 tables. Accepted at SIGKDD'23

  6. arXiv:2302.07964  [pdf, other]

    stat.ML cs.LG

    On Rank Energy Statistics via Optimal Transport: Continuity, Convergence, and Change Point Detection

    Authors: Matthew Werenski, Shoaib Bin Masud, James M. Murphy, Shuchin Aeron

    Abstract: This paper considers the use of recently proposed optimal transport-based multivariate test statistics, namely rank energy and its variant the soft rank energy derived from entropically regularized optimal transport, for the unsupervised nonparametric change point detection (CPD) problem. We show that the soft rank energy enjoys both fast rates of statistical convergence and robust continuity prop…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: 36 pages, 5 figures

  7. arXiv:2206.04007  [pdf, other]

    cs.CL

    Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization

    Authors: Sarah Masud, Manjot Bedi, Mohammad Aflah Khan, Md Shad Akhtar, Tanmoy Chakraborty

    Abstract: Curbing online hate speech has become the need of the hour; however, a blanket ban on such activities is infeasible for several geopolitical and cultural reasons. To reduce the severity of the problem, in this paper, we introduce a novel task, hate speech normalization, that aims to weaken the intensity of hatred exhibited by an online post. The intention of hate speech normalization is not to sup…

    Submitted 8 June, 2022; originally announced June 2022.

    Comments: 11 pages, 4 figures, 12 tables. Accepted at KDD 2022 (ADS Track)

  8. arXiv:2202.00126  [pdf, other]

    cs.SI cs.CY cs.LG

    Handling Bias in Toxic Speech Detection: A Survey

    Authors: Tanmay Garg, Sarah Masud, Tharun Suresh, Tanmoy Chakraborty

    Abstract: Detecting online toxicity has always been a challenge due to its inherent subjectivity. Factors such as the context, geography, socio-political climate, and background of the producers and consumers of the posts play a crucial role in determining if the content can be flagged as toxic. Adoption of automated toxicity detection models in production can thus lead to a sidelining of the various groups…

    Submitted 15 January, 2023; v1 submitted 26 January, 2022; originally announced February 2022.

    Comments: Accepted in ACM Computing Surveys, 30 pages, 5 figures, 7 tables

  9. arXiv:2201.00961  [pdf, other]

    cs.SI cs.LG

    Nipping in the Bud: Detection, Diffusion and Mitigation of Hate Speech on Social Media

    Authors: Tanmoy Chakraborty, Sarah Masud

    Abstract: Since the proliferation of social media usage, hate speech has become a major crisis. Hateful content can spread quickly and create an environment of distress and hostility. Further, what can be considered hateful is contextual and varies with time. While online hate speech reduces the ability of already marginalised groups to participate in discussion freely, offline hate speech leads to hate cri…

    Submitted 3 January, 2022; originally announced January 2022.

    Comments: Submitted for publication in ACM SIGWEB Newsletter

  10. arXiv:2112.06267  [pdf, other]

    cs.SI

    DiVA: A Scalable, Interactive and Customizable Visual Analytics Platform for Information Diffusion on Large Networks

    Authors: Dhruv Sahnan, Vasu Goel, Sarah Masud, Chhavi Jain, Vikram Goyal, Tanmoy Chakraborty

    Abstract: With an increasing outreach of digital platforms in our lives, researchers have taken a keen interest to study different facets of social interactions that seem to be evolving rapidly. Analysing the spread of information (aka diffusion) has brought forth multiple research areas such as modelling user engagement, determining emerging topics, forecasting virality of online posts and predicting infor…

    Submitted 21 August, 2022; v1 submitted 12 December, 2021; originally announced December 2021.

    Comments: 33 pages, 12 figures, 11 tables

  11. arXiv:2111.00047  [pdf, other]

    stat.ML cs.LG eess.SP

    Robust and efficient change point detection using novel multivariate rank-energy GoF test

    Authors: Shoaib Bin Masud, Shuchin Aeron

    Abstract: In this paper, we use and further develop upon a recently proposed multivariate, distribution-free Goodness-of-Fit (GoF) test based on the theory of Optimal Transport (OT) called the Rank Energy (RE) [1], for non-parametric and unsupervised Change Point Detection (CPD) in multivariate time series data. We show that directly using RE leads to high sensitivity to very small changes in distributions…

    Submitted 29 October, 2021; originally announced November 2021.

    Comments: 6 pages, 1 figure

  12. arXiv:2111.00043  [pdf, other]

    stat.ML cs.LG

    Multivariate rank via entropic optimal transport: sample efficiency and generative modeling

    Authors: Shoaib Bin Masud, Matthew Werenski, James M. Murphy, Shuchin Aeron

    Abstract: The framework of optimal transport has been leveraged to extend the notion of rank to the multivariate setting while preserving desirable properties of the resulting goodness-of-fit (GoF) statistics. In particular, the rank energy (RE) and rank maximum mean discrepancy (RMMD) are distribution-free under the null, exhibit high power in statistical testing, and are robust to outliers. In this paper,…

    Submitted 25 November, 2022; v1 submitted 29 October, 2021; originally announced November 2021.

    Comments: 46 pages, 10 figures. Replacement note: Substantial revision over V2: Title change, first authors contribution change, new improved theoretical results relaxing compactness assumptions

  13. arXiv:2103.08811  [pdf, other]

    stat.ML cs.IT cs.LG

    Soft and subspace robust multivariate rank tests based on entropy regularized optimal transport

    Authors: Shoaib Bin Masud, Boyang Lyu, Shuchin Aeron

    Abstract: In this paper, we extend the recently proposed multivariate rank energy distance, based on the theory of optimal transport, for statistical testing of distributional similarity, to soft rank energy distance. Being differentiable, this in turn allows us to extend the rank energy to a subspace robust rank energy distance, dubbed Projected soft-Rank Energy distance, which can be computed via optimiza…

    Submitted 17 April, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

    Comments: 14 pages, 5 figures

  14. arXiv:2101.11425  [pdf, other]

    cs.CL

    Fake News Detection System using XLNet model with Topic Distributions: CONSTRAINT@AAAI2021 Shared Task

    Authors: Akansha Gautam, Venktesh V, Sarah Masud

    Abstract: With the ease of access to information, and its rapid dissemination over the internet (both velocity and volume), it has become challenging to filter out truthful information from fake ones. The research community is now faced with the task of automatic detection of fake news, which carries real-world socio-political impact. One such research contribution came in the form of the Constraint@AAA1202…

    Submitted 12 January, 2021; originally announced January 2021.

    Comments: Accepted at CONSTRAINT@AAAI2021 Shared Task for the CONSTRAINT workshop, collocated with AAAI 2021

  15. arXiv:2010.04377  [pdf, other]

    cs.SI cs.CL cs.LG

    Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter

    Authors: Sarah Masud, Subhabrata Dutta, Sakshi Makkar, Chhavi Jain, Vikram Goyal, Amitava Das, Tanmoy Chakraborty

    Abstract: Online hate speech, particularly over microblogging platforms like Twitter, has emerged as arguably the most severe issue of the past decade. Several countries have reported a steep rise in hate crimes infuriated by malicious hate campaigns. While the detection of hate speech is one of the emerging research areas, the generation and spread of topic-dependent hate in the information network remain…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: 6 tables, 9 figures, Full paper in 37th International Conference on Data Engineering (ICDE)

  16. arXiv:2006.07812  [pdf, other]

    cs.SI

    Deep Exogenous and Endogenous Influence Combination for Social Chatter Intensity Prediction

    Authors: Subhabrata Dutta, Sarah Masud, Soumen Chakrabarti, Tanmoy Chakraborty

    Abstract: Modeling user engagement dynamics on social media has compelling applications in user-persona detection and political discourse mining. Most existing approaches depend heavily on knowledge of the underlying user network. However, a large number of discussions happen on platforms that either lack any reliable social network or reveal only partially the inter-user ties (Reddit, Stackoverflow). Many…

    Submitted 14 June, 2020; originally announced June 2020.

    Comments: 6 figures, 7 tables, Accepted in SIGKDD 2020

  17. arXiv:1310.7297  [pdf, other]

    cs.DB

    Scalable Visibility Color Map Construction in Spatial Databases

    Authors: Farhana Murtaza Choudhury, Mohammed Eunus Ali, Sarah Masud, Suman Nath, Ishat E Rabban

    Abstract: Recent advances in 3D modeling provide us with real 3D datasets to answer queries, such as "What is the best position for a new billboard?" and "Which hotel room has the best view?" in the presence of obstacles. These applications require measuring and differentiating the visibility of an object (target) from different viewpoints in a dataspace, e.g., a billboard may be seen from two viewpoints bu…

    Submitted 27 October, 2013; originally announced October 2013.

    Comments: 12 pages, 14 figures