Showing 1–9 of 9 results for author: Keh, S S

  1. arXiv:2402.11818  [pdf, other]

    cs.CL cs.AI cs.CY

    Where It Really Matters: Few-Shot Environmental Conservation Media Monitoring for Low-Resource Languages

    Authors: Sameer Jain, Sedrick Scott Keh, Shova Chettri, Karun Dewan, Pablo Izquierdo, Johanna Prussman, Pooja Shreshtha, Cesar Suarez, Zheyuan Ryan Shi, Lei Li, Fei Fang

    Abstract: Environmental conservation organizations routinely monitor news content on conservation in protected areas to maintain situational awareness of developments that can have an environmental impact. Existing automated media monitoring systems require large amounts of data labeled by domain experts, which is only feasible at scale for high-resource languages like English. However, such tools are most…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: AAAI 2024: AI for Social Impact Track

  2. arXiv:2305.01503  [pdf, other]

    cs.IR cs.CL cs.CY

    NewsPanda: Media Monitoring for Timely Conservation Action

    Authors: Sedrick Scott Keh, Zheyuan Ryan Shi, David J. Patterson, Nirmal Bhagabati, Karun Dewan, Areendran Gopala, Pablo Izquierdo, Debojyoti Mallick, Ambika Sharma, Pooja Shrestha, Fei Fang

    Abstract: Non-governmental organizations for environmental conservation have a significant interest in monitoring conservation-related media and getting timely updates about infrastructure construction projects as they may cause massive impact to key conservation areas. Such monitoring, however, is difficult and time-consuming. We introduce NewsPanda, a toolkit which automatically detects and analyzes onlin…

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: Accepted to IAAI-23: 35th Annual Conference on Innovative Applications of Artificial Intelligence. Winner of IAAI Deployed Application Award. Code at https://github.com/NewsPanda-WWF-CMU/weekly-pipeline

  3. arXiv:2302.10143  [pdf, other]

    cs.CL cs.CY

    Hashtag-Guided Low-Resource Tweet Classification

    Authors: Shizhe Diao, Sedrick Scott Keh, Liangming Pan, Zhiliang Tian, Yan Song, Tong Zhang

    Abstract: Social media classification tasks (e.g., tweet sentiment analysis, tweet stance detection) are challenging because social media posts are typically short, informal, and ambiguous. Thus, training on tweets is challenging and demands large-scale human-annotated labels, which are time-consuming and costly to obtain. In this paper, we find that providing hashtags to social media tweets can help allevi…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: WWW 2023

  4. arXiv:2210.12926  [pdf, other]

    cs.CL

    Exploring Euphemism Detection in Few-Shot and Zero-Shot Settings

    Authors: Sedrick Scott Keh

    Abstract: This work builds upon the Euphemism Detection Shared Task proposed in the EMNLP 2022 FigLang Workshop, and extends it to few-shot and zero-shot settings. We demonstrate a few-shot and zero-shot formulation using the dataset from the shared task, and we conduct experiments in these settings using RoBERTa and GPT-3. Our results show that language models are able to classify euphemistic terms relativ…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Accepted to EMNLP 2022 Figurative Language Workshop (Euphemism Detection Shared Task). Official code at https://github.com/sedrickkeh/zero-shot-euphemism-detection

  5. arXiv:2210.12846  [pdf, other]

    cs.CL

    EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation

    Authors: Sedrick Scott Keh, Rohit K. Bharadwaj, Emmy Liu, Simone Tedeschi, Varun Gangal, Roberto Navigli

    Abstract: We introduce EUREKA, an ensemble-based approach for performing automatic euphemism detection. We (1) identify and correct potentially mislabelled rows in the dataset, (2) curate an expanded corpus called EuphAug, (3) leverage model representations of Potentially Euphemistic Terms (PETs), and (4) explore using representations of semantically close sentences to aid in classification. Using our augme…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Accepted to EMNLP 2022 Figurative Language Workshop; first place for Euphemism Detection Shared Task. Code at https://github.com/sedrickkeh/EUREKA

  6. arXiv:2209.07752  [pdf, other]

    cs.CL cs.AI cs.LG

    PINEAPPLE: Personifying INanimate Entities by Acquiring Parallel Personification data for Learning Enhanced generation

    Authors: Sedrick Scott Keh, Kevin Lu, Varun Gangal, Steven Y. Feng, Harsh Jhamtani, Malihe Alikhani, Eduard Hovy

    Abstract: A personification is a figure of speech that endows inanimate entities with properties and actions typically seen as requiring animacy. In this paper, we explore the task of personification generation. To this end, we propose PINEAPPLE: Personifying INanimate Entities by Acquiring Parallel Personification data for Learning Enhanced generation. We curate a corpus of personifications called Personif…

    Submitted 16 September, 2022; originally announced September 2022.

    Comments: Accepted to COLING 2022; official Github repo at https://github.com/sedrickkeh/PINEAPPLE

  7. arXiv:2209.06275  [pdf, other]

    cs.CL cs.AI cs.LG

    PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically

    Authors: Sedrick Scott Keh, Steven Y. Feng, Varun Gangal, Malihe Alikhani, Eduard Hovy

    Abstract: Tongue twisters are meaningful sentences that are difficult to pronounce. The process of automatically generating tongue twisters is challenging since the generated utterance must satisfy two conditions at once: phonetic difficulty and semantic meaning. Furthermore, phonetic difficulty is itself hard to characterize and is expressed in natural tongue twisters through a heterogeneous mix of phenome…

    Submitted 14 February, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

    Comments: EACL 2023. Code at https://github.com/sedrickkeh/PANCETTA

  8. arXiv:2012.00332  [pdf, other]

    cs.CV cs.LG

    Semi-Supervised Noisy Student Pre-training on EfficientNet Architectures for Plant Pathology Classification

    Authors: Sedrick Scott Keh

    Abstract: In recent years, deep learning has vastly improved the identification and diagnosis of various diseases in plants. In this report, we investigate the problem of pathology classification using images of a single leaf. We explore the use of standard benchmark models such as VGG16, ResNet101, and DenseNet 161 to achieve a 0.945 score on the task. Furthermore, we explore the use of the newer Efficient…

    Submitted 1 December, 2020; originally announced December 2020.

  9. arXiv:1907.06333  [pdf, ps, other]

    cs.LG stat.ML

    Myers-Briggs Personality Classification and Personality-Specific Language Generation Using Pre-trained Language Models

    Authors: Sedrick Scott Keh, I-Tsun Cheng

    Abstract: The Myers-Briggs Type Indicator (MBTI) is a popular personality metric that uses four dichotomies as indicators of personality traits. This paper examines the use of pre-trained language models to predict MBTI personality types based on scraped labeled texts. The proposed model reaches an accuracy of $0.47$ for correctly predicting all 4 types and $0.86$ for correctly predicting at least 2 types.…

    Submitted 15 July, 2019; originally announced July 2019.