-
Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model
Authors:
Elaheh Baharlouei,
Mahsa Shafaei,
Yigeng Zhang,
Hugo Jair Escalante,
Thamar Solorio
Abstract:
We address the challenge of detecting questionable content in online media, specifically the subcategory of comic mischief. This type of content combines elements such as violence, adult content, or sarcasm with humor, making it difficult to detect. Employing a multimodal approach is vital to capture the subtle details inherent in comic mischief content. To tackle this problem, we propose a novel end-to-end multimodal system for the task of comic mischief detection. As part of this contribution, we release a novel dataset for the targeted task consisting of three modalities: video, text (video captions and subtitles), and audio. We also design a HIerarchical Cross-attention model with CAPtions (HICCAP) to capture the intricate relationships among these modalities. The results show that the proposed approach significantly improves over robust baselines and state-of-the-art models for comic mischief detection and its type classification. This emphasizes the potential of our system to empower users to make informed decisions about the online content they choose to see. In addition, we conduct experiments on the UCF101, HMDB51, and XD-Violence datasets, comparing our model against other state-of-the-art approaches and showcasing the outstanding performance of our proposed model in various scenarios.
Submitted 11 June, 2024;
originally announced June 2024.
-
Positive and Risky Message Assessment for Music Products
Authors:
Yigeng Zhang,
Mahsa Shafaei,
Fabio A. González,
Thamar Solorio
Abstract:
In this work, we introduce a pioneering research challenge: evaluating positive and potentially harmful messages within music products. We begin by establishing a multi-faceted, multi-task benchmark for music content assessment. Subsequently, we introduce an efficient multi-task predictive model fortified with ordinality-enforcement to address this challenge. Our findings reveal that the proposed method not only significantly outperforms robust task-specific alternatives but can also assess multiple aspects simultaneously. Furthermore, through detailed case studies in which we employed Large Language Models (LLMs) as surrogates for content assessment, we provide valuable insights to inform and guide future research on this topic. The code for dataset creation and model implementation is publicly available at https://github.com/RiTUAL-UH/music-message-assessment.
Submitted 8 April, 2024; v1 submitted 18 September, 2023;
originally announced September 2023.
-
From None to Severe: Predicting Severity in Movie Scripts
Authors:
Yigeng Zhang,
Mahsa Shafaei,
Fabio Gonzalez,
Thamar Solorio
Abstract:
In this paper, we introduce the task of predicting severity of age-restricted aspects of movie content based solely on the dialogue script. We first investigate categorizing the ordinal severity of movies on 5 aspects: Sex, Violence, Profanity, Substance consumption, and Frightening scenes. The problem is handled using a siamese network-based multitask framework which concurrently improves the interpretability of the predictions. The experimental results show that our method outperforms the previous state-of-the-art model and provides useful information to interpret model predictions. The proposed dataset and source code are publicly available at our GitHub repository.
Submitted 3 October, 2021; v1 submitted 19 September, 2021;
originally announced September 2021.
-
White Paper -- Objectionable Online Content: What is harmful, to whom, and why
Authors:
Thamar Solorio,
Mahsa Shafaei,
Christos Smailis,
Brad J. Bushman,
Douglas A. Gentile,
Erica Scharrer,
Laura Stockdale,
Ioannis Kakadiaris
Abstract:
This White Paper summarizes the authors' discussion regarding objectionable content for the University of Houston (UH) Research Team to outline a strategy for building an extensive repository of online videos to support research into automated multimodal approaches to detect objectionable content. The workshop focused on defining what harmful content is, to whom it is harmful, and why it is harmful.
Submitted 26 January, 2021;
originally announced April 2021.
-
A Case Study of Deep Learning Based Multi-Modal Methods for Predicting the Age-Suitability Rating of Movie Trailers
Authors:
Mahsa Shafaei,
Christos Smailis,
Ioannis A. Kakadiaris,
Thamar Solorio
Abstract:
In this work, we explore different approaches to combine modalities for the problem of automated age-suitability rating of movie trailers. First, we introduce a new dataset containing videos of movie trailers in English downloaded from IMDB and YouTube, along with their corresponding age-suitability rating labels. Second, we propose a multi-modal deep learning pipeline addressing the movie trailer age-suitability rating problem. This is the first attempt to combine video, audio, and speech information for this problem, and our experimental results show that multi-modal approaches significantly outperform the best mono- and bimodal models in this task.
Submitted 26 January, 2021;
originally announced January 2021.
-
White Paper: Challenges and Considerations for the Creation of a Large Labelled Repository of Online Videos with Questionable Content
Authors:
Thamar Solorio,
Mahsa Shafaei,
Christos Smailis,
Mona Diab,
Theodore Giannakopoulos,
Heng Ji,
Yang Liu,
Rada Mihalcea,
Smaranda Muresan,
Ioannis Kakadiaris
Abstract:
This white paper presents a summary of the discussions regarding critical considerations to develop an extensive repository of online videos annotated with labels indicating questionable content. The main discussion points include: 1) the type of appropriate labels that will result in a valuable repository for the larger AI community; 2) how to design the collection and annotation process, as well as the distribution of the corpus, to maximize its potential impact; and 3) what actions we can take to reduce the risk of trauma to annotators.
Submitted 25 January, 2021;
originally announced January 2021.
-
ParsiNLU: A Suite of Language Understanding Challenges for Persian
Authors:
Daniel Khashabi,
Arman Cohan,
Siamak Shakeri,
Pedram Hosseini,
Pouya Pezeshkpour,
Malihe Alikhani,
Moin Aminnaseri,
Marzieh Bitaab,
Faeze Brahman,
Sarik Ghazarian,
Mozhdeh Gheini,
Arman Kabiri,
Rabeeh Karimi Mahabadi,
Omid Memarrast,
Ahmadreza Mosallanezhad,
Erfan Noury,
Shahab Raji,
Mohammad Sadegh Rasooli,
Sepideh Sadeghi,
Erfan Sadeqi Azer,
Niloofar Safi Samghabadi,
Mahsa Shafaei,
Saber Sheybani,
Ali Tazarv,
Yadollah Yaghoobzadeh
Abstract:
Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains concentrated on resource-rich languages like English. This work focuses on the Persian language, one of the widely spoken languages in the world, yet one for which few NLU datasets are available. The availability of high-quality evaluation datasets is a necessity for reliable assessment of the progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in the Persian language that includes a range of high-level tasks (Reading Comprehension, Textual Entailment, etc.). These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. In addition, we present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.
Submitted 13 July, 2021; v1 submitted 11 December, 2020;
originally announced December 2020.
-
Attending the Emotions to Detect Online Abusive Language
Authors:
Niloofar Safi Samghabadi,
Afsheen Hatami,
Mahsa Shafaei,
Sudipta Kar,
Thamar Solorio
Abstract:
In recent years, abusive behavior has become a serious issue in online social networks. In this paper, we present a new corpus from a semi-anonymous social media platform, which contains instances of the offensive and neutral classes. We introduce a single deep neural architecture that considers both local and sequential information from the text in order to detect abusive language. Along with this model, we introduce a new attention mechanism called emotion-aware attention. This mechanism utilizes the emotions behind the text to find the most important words within that text. We experiment with this model on our dataset and later present the analysis. Additionally, we evaluate our proposed method on different corpora and show new state-of-the-art results with respect to offensive language detection.
Submitted 6 September, 2019;
originally announced September 2019.
-
Rating for Parents: Predicting Children Suitability Rating for Movies Based on Language of the Movies
Authors:
Mahsa Shafaei,
Niloofar Safi Samghabadi,
Sudipta Kar,
Thamar Solorio
Abstract:
Film culture has grown tremendously in recent years. The large number of streaming services has made films one of the most convenient forms of entertainment in today's world. Films can help us learn and inspire societal change, but they can also negatively affect viewers. In this paper, our goal is to predict the suitability of movie content for children and young adults based on scripts. The criterion that we use to measure suitability is the MPAA rating, which is specifically designed for this purpose. We propose an RNN-based architecture with attention that jointly models the genre and the emotions in the script to predict the MPAA rating. We achieve a 78% weighted F1-score with the classification model, which outperforms the traditional machine learning method by 6%.
Submitted 21 August, 2019; v1 submitted 21 August, 2019;
originally announced August 2019.