Finding Fake News Websites in the Wild

Authors: Leandro Araujo, Joao M. M. Couto, Luiz Felipe Nery, Isadora C. Rodrigues, Jussara M. Almeida, Julio C. S. Reis, Fabricio Benevenuto

Abstract: The battle against the spread of misinformation on the Internet is a daunting task faced by modern society. Fake news content is primarily distributed through digital platforms, with websites dedicated to producing and disseminating such content playing a pivotal role in this complex ecosystem. Therefore, these websites are of great interest to misinformation researchers. However, obtaining a comprehensive list of websites labeled as producers and/or spreaders of misinformation can be challenging, particularly in developing countries. In this study, we propose a novel methodology for identifying websites responsible for creating and disseminating misinformation content, which are closely linked to users who share confirmed instances of fake news on social media. We validate our approach on Twitter by examining various execution modes and contexts. Our findings demonstrate the effectiveness of the proposed methodology in identifying misinformation websites, which can aid in gaining a better understanding of this phenomenon and enabling competent entities to tackle the problem in various areas of society.

Submitted 15 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: This is a preprint version of a submitted manuscript on the Brazilian Symposium on Multimedia and the Web (WebMedia)

arXiv:2401.13161 [pdf, other]

doi 10.1109/LGRS.2024.3358694

A Generalized Multiscale Bundle-Based Hyperspectral Sparse Unmixing Algorithm

Authors: Luciano Carvalho Ayres, Ricardo Augusto Borsoi, José Carlos Moreira Bermudez, Sérgio José Melo de Almeida

Abstract: In hyperspectral sparse unmixing, a successful approach employs spectral bundles to address the variability of the endmembers in the spatial domain. However, the regularization penalties usually employed aggregate substantial computational complexity, and the solutions are very noise-sensitive. We generalize a multiscale spatial regularization approach to solve the unmixing problem by incorporating group sparsity-inducing mixed norms. Then, we propose a noise-robust method that can take advantage of the bundle structure to deal with endmember variability while ensuring inter- and intra-class sparsity in abundance estimation with reasonable computational cost. We also present a general heuristic to select the \emph{most representative} abundance estimation over multiple runs of the unmixing process, yielding a solution that is robust and highly reproducible. Experiments illustrate the robustness and consistency of the results when compared to related methods.

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2312.13784 [pdf, other]

Benchmarking Evolutionary Community Detection Algorithms in Dynamic Networks

Authors: Giordano Paoletti, Luca Gioacchini, Marco Mellia, Luca Vassio, Jussara M. Almeida

Abstract: In dynamic complex networks, entities interact and form network communities that evolve over time. Among the many static Community Detection (CD) solutions, the modularity-based Louvain, or Greedy Modularity Algorithm (GMA), is widely employed in real-world applications due to its intuitiveness and scalability. Nevertheless, addressing CD in dynamic graphs remains an open problem, since the evolution of the network connections may poison the identification of communities, which may be evolving at a slower pace. Hence, naively applying GMA to successive network snapshots may lead to temporal inconsistencies in the communities. Two evolutionary adaptations of GMA, sGMA and $α$GMA, have been proposed to tackle this problem. Yet, evaluating the performance of these methods and understanding to which scenarios each one is better suited is challenging because of the lack of a comprehensive set of metrics and a consistent ground truth. To address these challenges, we propose (i) a benchmarking framework for evolutionary CD algorithms in dynamic networks and (ii) a generalised modularity-based approach (NeGMA). Our framework allows us to generate synthetic community-structured graphs and design evolving scenarios with nine basic graph transformations occurring at different rates. We evaluate performance through three metrics we define, i.e. Correctness, Delay, and Stability. Our findings reveal that $α$GMA is well-suited for detecting intermittent transformations, but struggles with abrupt changes; sGMA achieves superior stability, but fails to detect emerging communities; and NeGMA appears a well-balanced solution, excelling in responsiveness and instantaneous transformations detection.

Submitted 11 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted at the 4th Workshop on Graphs and more Complex structures for Learning and Reasoning (GCLR) at AAAI 2024

Journal ref: 4th Workshop on Graphs and more Complex structures for Learning and Reasoning (GCLR) at AAAI 2024

arXiv:2308.14782 [pdf, other]

Helping Fact-Checkers Identify Fake News Stories Shared through Images on WhatsApp

Authors: Julio C. S. Reis, Philipe Melo, Fabiano Belém, Fabricio Murai, Jussara M. Almeida, Fabricio Benevenuto

Abstract: WhatsApp has introduced a novel avenue for smartphone users to engage with and disseminate news stories. The convenience of forming interest-based groups and seamlessly sharing content has rendered WhatsApp susceptible to the exploitation of misinformation campaigns. While the process of fact-checking remains a potent tool in identifying fabricated news, its efficacy falters in the face of the unprecedented deluge of information generated on the Internet today. In this work, we explore automatic ranking-based strategies to propose a "fakeness score" model as a means to help fact-checking agencies identify fake news stories shared through images on WhatsApp. Based on the results, we design a tool and integrate it into a real system that has been used extensively for monitoring content during the 2018 Brazilian general election. Our experimental evaluation shows that this tool can reduce by up to 40% the amount of effort required to identify 80% of the fake news in the data when compared to current mechanisms practiced by the fact-checking agencies for the selection of news stories to be checked.

Submitted 28 August, 2023; originally announced August 2023.

Comments: This is a preprint version of an accepted manuscript on the Brazilian Symposium on Multimedia and the Web (WebMedia). Please, consider to cite it instead of this one

arXiv:2307.02631 [pdf, other]

doi 10.3389/frai.2024.1343447

An explainable model to support the decision about the therapy protocol for AML

Authors: Jade M. Almeida, Giovanna A. Castro, João A. Machado-Neto, Tiago A. Almeida

Abstract: Acute Myeloid Leukemia (AML) is one of the most aggressive types of hematological neoplasm. To support the specialists' decision about the appropriate therapy, patients with AML receive a prognostic of outcomes according to their cytogenetic and molecular characteristics, often divided into three risk categories: favorable, intermediate, and adverse. However, the current risk classification has known problems, such as the heterogeneity between patients of the same risk group and no clear definition of the intermediate risk category. Moreover, as most patients with AML receive an intermediate-risk classification, specialists often demand other tests and analyses, leading to delayed treatment and worsening of the patient's clinical condition. This paper presents the data analysis and an explainable machine-learning model to support the decision about the most appropriate therapy protocol according to the patient's survival prediction. In addition to the prediction model being explainable, the results obtained are promising and indicate that it is possible to use it to support the specialists' decisions safely. Most importantly, the findings offered in this study have the potential to open new avenues of research toward better treatments and prognostic markers.

Submitted 15 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: Preprint of the paper accepted to be published in the Proc. of the 12th Brazilian Conference on Intelligent Systems (BRACIS'2023)

arXiv:2306.15740 [pdf, other]

Impact of User Privacy and Mobility on Edge Offloading

Authors: João Paulo Esper, Nadjib Achir, Kleber Vieira Cardoso, Jussara M. Almeida

Abstract: Offloading high-demanding applications to the edge provides better quality of experience (QoE) for users with limited hardware devices. However, to maintain a competitive QoE, infrastructure and service providers must adapt to users' different mobility patterns, which can be challenging, especially for location-based services (LBS). Another issue that needs to be tackled is the increasing demand for user privacy protection. With less (accurate) information regarding user location, preferences, and usage patterns, forecasting the performance of offloading mechanisms becomes even more challenging. This work discusses the impacts of users' privacy and mobility when offloading to the edge. Different privacy and mobility scenarios are simulated and discussed to shed light on the trade-offs (e.g., privacy protection at the cost of increased latency) among privacy protection, mobility, and offloading performance.

Submitted 27 June, 2023; originally announced June 2023.

Comments: 2023 Annual IEEE International Symposium on Personal, Indoor, and Mobile Radio Communications (IEEE PIMRC 2023)

arXiv:2111.06161 [pdf, other]

Understanding mobility in networks: A node embedding approach

Authors: Matheus F. C. Barros, Carlos H. G. Ferreira, Bruno Pereira dos Santos, Lourenço A. P. Júnior, Marco Mellia, Jussara M. Almeida

Abstract: Motivated by the growing number of mobile devices capable of connecting and exchanging messages, we propose a methodology aiming to model and analyze node mobility in networks. We note that many existing solutions in the literature rely on topological measurements calculated directly on the graph of node contacts, aiming to capture the notion of the node's importance in terms of connectivity and mobility patterns beneficial for prototyping, design, and deployment of mobile networks. However, each measure has its specificity and fails to generalize the node importance notions that ultimately change over time. Unlike previous approaches, our methodology is based on a node embedding method that models and unveils the nodes' importance in mobility and connectivity patterns while preserving their spatial and temporal characteristics. We focus on a case study based on a trace of group meetings. The results show that our methodology provides a rich representation for extracting different mobility and connectivity patterns, which can be helpful for various applications and services in mobile networks.

Submitted 11 November, 2021; originally announced November 2021.

arXiv:2109.10462 [pdf, other]

A Hierarchical Network-Oriented Analysis of User Participation in Misinformation Spread on WhatsApp

Authors: Gabriel Peres Nobre, Carlos H. G. Ferreira, Jussara M. Almeida

Abstract: WhatsApp emerged as a major communication platform in many countries in recent years. Despite offering only one-to-one and small group conversations, WhatsApp has been shown to enable the formation of a rich underlying network, crossing the boundaries of existing groups, and with structural properties that favor information dissemination at large. Indeed, WhatsApp has reportedly been used as a forum of misinformation campaigns with significant social, political and economic consequences in several countries. In this article, we aim at complementing recent studies on misinformation spread on WhatsApp, mostly focused on content properties and propagation dynamics, by looking into the network that connects users sharing the same piece of content. Specifically, we present a hierarchical network-oriented characterization of the users engaged in misinformation spread by focusing on three perspectives: individuals, WhatsApp groups and user communities, i.e., groupings of users who, intentionally or not, share the same content disproportionately often. By analyzing sharing and network topological properties, our study offers valuable insights into how WhatsApp users leverage the underlying network connecting different groups to gain large reach in the spread of misinformation on the platform.

Submitted 21 September, 2021; originally announced September 2021.

Comments: Paper Accepted in Information Processing & Management, Elsevier

arXiv:2109.09152 [pdf, other]

doi 10.1016/j.osnem.2021.100155

On the Dynamics of Political Discussions on Instagram: A Network Perspective

Authors: Carlos H. G. Ferreira, Fabricio Murai, Ana P. C. Silva, Jussara M. Almeida, Martino Trevisan, Luca Vassio, Marco Mellia, Idilio Drago

Abstract: Instagram has been increasingly used as a source of information especially among the youth. As a result, political figures now leverage the platform to spread opinions and political agenda. We here analyze online discussions on Instagram, notably in political topics, from a network perspective. Specifically, we investigate the emergence of communities of co-commenters, that is, groups of users who often interact by commenting on the same posts and may be driving the ongoing online discussions. In particular, we are interested in salient co-interactions, i.e., interactions of co-commenters that occur more often than expected by chance and under independent behavior. Unlike casual and accidental co-interactions which normally happen in large volumes, salient co-interactions are key elements driving the online discussions and, ultimately, the information dissemination. We base our study on the analysis of 10 weeks of data centered around major elections in Brazil and Italy, following both politicians and other celebrities. We extract and characterize the communities of co-commenters in terms of topological structure, properties of the discussions carried out by community members, and how some community properties, notably community membership and topics, evolve over time. We show that communities discussing political topics tend to be more engaged in the debate by writing longer comments, using more emojis, hashtags and negative words than in other subjects. Also, communities built around political discussions tend to be more dynamic, although top commenters remain active and preserve community membership over time. Moreover, we observe a great diversity in discussed topics over time: whereas some topics attract attention only momentarily, others, centered around more fundamental political discussions, remain consistently active over time.

Submitted 13 September, 2022; v1 submitted 19 September, 2021; originally announced September 2021.

Journal ref: Online Social Networks and Media, Volume 25, 2021, ISSN 2468-6964

arXiv:2108.12214 [pdf, other]

Machine Learning for Performance Prediction of Spark Cloud Applications

Authors: Alexandre Maros, Fabricio Murai, Ana Paula Couto da Silva, Jussara M. Almeida, Marco Lattuada, Eugenio Gianniti, Marjan Hosseini, Danilo Ardagna

Abstract: Big data applications and analytics are employed in many sectors for a variety of goals: improving customer satisfaction, predicting market behavior or improving processes in public health. These applications consist of complex software stacks that are often run on cloud systems. Predicting execution times is important for estimating the cost of cloud services and for effectively managing the underlying resources at runtime. Machine Learning (ML), providing black box solutions to model the relationship between application performance and system configuration without requiring in-detail knowledge of the system, has become a popular way of predicting the performance of big data applications. We investigate the cost-benefits of using supervised ML models for predicting the performance of applications on Spark, one of today's most widely used frameworks for big data analysis. We compare our approach with \textit{Ernest} (an ML-based technique proposed in the literature by the Spark inventors) on a range of scenarios, application workloads, and cloud system configurations. Our experiments show that Ernest can accurately estimate the performance of very regular applications, but it fails when applications exhibit more irregular patterns and/or when extrapolating on bigger data set sizes. Results show that our models match or exceed Ernest's performance, sometimes enabling us to reduce the prediction error from 126-187% to only 5-19%.

Submitted 27 August, 2021; originally announced August 2021.

Comments: Published in 2019 IEEE 12th International Conference on Cloud Computing (CLOUD)

ACM Class: B.8.2; I.2

arXiv:2005.02443 [pdf, other]

A Dataset of Fact-Checked Images Shared on WhatsApp During the Brazilian and Indian Elections

Authors: Julio C. S. Reis, Philipe de Freitas Melo, Kiran Garimella, Jussara M. Almeida, Dean Eckles, Fabrício Benevenuto

Abstract: Recently, messaging applications, such as WhatsApp, have been reportedly abused by misinformation campaigns, especially in Brazil and India. A notable form of abuse in WhatsApp relies on several manipulated images and memes containing all kinds of fake stories. In this work, we performed an extensive data collection from a large set of WhatsApp publicly accessible groups and fact-checking agency websites. This paper opens a novel dataset to the research community containing fact-checked fake images shared through WhatsApp for two distinct scenarios known for the spread of fake news on the platform: the 2018 Brazilian elections and the 2019 Indian elections.

Submitted 5 May, 2020; originally announced May 2020.

Comments: 7 pages. This is a preprint version of an accepted paper on ICWSM'20. Please, consider to cite the conference version instead of this one

arXiv:1904.11719 [pdf, other]

doi 10.1145/3342220.3343657

Towards Understanding Political Interactions on Instagram

Authors: Martino Trevisan, Luca Vassio, Idilio Drago, Marco Mellia, Fabricio Murai, Flavio Figueiredo, Ana Paula Couto da Silva, Jussara M. Almeida

Abstract: Online Social Networks (OSNs) allow personalities and companies to communicate directly with the public, bypassing filters of traditional media. As people rely on OSNs to stay up-to-date, the political debate has moved online too. We witness the sudden explosion of harsh political debates and the dissemination of rumours in OSNs. Identifying such behaviour requires a deep understanding of how people interact via OSNs during political debates. We present a preliminary study of interactions in a popular OSN, namely Instagram. We take Italy as a case study in the period before the 2019 European Elections. We observe the activity of top Italian Instagram profiles in different categories: politics, music, sport and show. We record their posts for more than two months, tracking "likes" and comments from users. Results suggest that profiles of politicians attract markedly different interactions than other categories. People tend to comment more, with longer comments, debating for longer time, with a large number of replies, most of which are not explicitly solicited. Moreover, comments tend to come from a small group of very active users. Finally, we witness substantial differences when comparing profiles of different parties.

Submitted 4 May, 2021; v1 submitted 26 April, 2019; originally announced April 2019.

Comments: 5 pages, 8 figures, Proceedings of the 30th ACM Conference on Hypertext and Social Media, https://dl.acm.org/doi/10.1145/3342220.3343657

Journal ref: HT19: Proceedings of the 30th ACM Conference on Hypertext and Social Media. September 2019. Pages 247-251. Association for Computing Machinery

arXiv:1810.12345 [pdf, other]

doi 10.1007/978-3-030-01129-1_16

Analyzing Ideological Communities in Congressional Voting Networks

Authors: Carlos H. G. Ferreira, Breno de Souza Matos, Jussara M. Almeida

Abstract: We here study the behavior of political party members aiming at identifying how ideological communities are created and evolve over time in diverse (fragmented and non-fragmented) party systems. Using public voting data of both Brazil and the US, we propose a methodology to identify and characterize ideological communities, their member polarization, and how such communities evolve over time, covering a 15-year period. Our results reveal very distinct patterns across the two case studies, in terms of both structural and dynamic properties.

Submitted 29 October, 2018; originally announced October 2018.

arXiv:1703.06288 [pdf, other]

Gender Matters! Analyzing Global Cultural Gender Preferences for Venues Using Social Sensing

Authors: Willi Mueller, Thiago H Silva, Jussara M Almeida, Antonio A F Loureiro

Abstract: Gender differences are a phenomenon around the world actively researched by social scientists. Traditionally, the data used to support such studies is manually obtained, often through surveys with volunteers. However, due to their inherent high costs because of manual steps, such traditional methods do not quickly scale to large-size studies. We here investigate a particular aspect of gender differences: preferences for venues. To that end we explore the use of check-in data collected from Foursquare to estimate cultural gender preferences for venues in the physical world. For that, we first demonstrate that by analyzing the check-in data in various regions of the world we can find significant differences in preferences for specific venues between gender groups. Some of these significant differences reflect well-known cultural patterns. Moreover, we also gathered evidence that our methodology offers useful information about gender preference for venues in a given region in the real world. This suggests that gender and venue preferences observed may not be independent. Our results suggest that our proposed methodology could be a promising tool to support studies on gender preferences for venues at different spatial granularities around the world, being faster and cheaper than traditional methods, besides quickly capturing changes in the real world.

Submitted 18 March, 2017; originally announced March 2017.

arXiv:1604.07890 [pdf, other]

Understanding Video-Ad Consumption on YouTube: A Measurement Study on User Behavior, Popularity, and Content Properties

Authors: Mariana Arantes, Flavio Figueiredo, Jussara M. Almeida

Abstract: Faced with the challenge of attracting user attention and revenue, social media websites have turned to video advertisements (video-ads). While in traditional media the video-ad market is mostly based on an interaction between content providers and marketers, the use of video-ads in social media has enabled a more complex interaction, that also includes content creator and viewer preferences. To better understand this novel setting, we present the first data-driven analysis of video-ad exhibitions on YouTube.

Submitted 26 April, 2016; originally announced April 2016.

Comments: To Appear at WebSci 16

arXiv:1408.7094 [pdf, other]

Improving the Effectiveness of Content Popularity Prediction Methods using Time Series Trends

Authors: Flavio Figueiredo, Marcos André Gonçalves, Jussara M. Almeida

Abstract: We here present a simple and effective model to predict the popularity of web content. Our solution, which is the winner of two of the three tasks of the ECML/PKDD 2014 Predictive Analytics Challenge, aims at predicting user engagement metrics, such as number of visits and social network engagement, that a web page will achieve 48 hours after its upload, using only information available in the first hour after upload. Our model is based on two steps. We first use time series clustering techniques to extract common temporal trends of content popularity. Next, we use linear regression models, exploiting as predictors both content features (e.g., numbers of visits and mentions on online social networks) and metrics that capture the distance between the popularity time series to the trends extracted in the first step. We discuss why this model is effective and show its gains over state of the art alternatives.

Submitted 29 August, 2014; originally announced August 2014.

Comments: Presented on the ECML/PKDD Discovery Challenge on Predictive Analytics. Winner of two out of three tasks of the Predictive Analytics Discovery Challenge

ACM Class: H.3.5

arXiv:1405.1459 [pdf, other]

Revisit Behavior in Social Media: The Phoenix-R Model and Discoveries

Authors: Flavio Figueiredo, Jussara M. Almeida, Yasuko Matsubara, Bruno Ribeiro, Christos Faloutsos

Abstract: How many listens will an artist receive on an online radio? How about plays on a YouTube video? How many of these visits are new or returning users? Modeling and mining popularity dynamics of social activity has important implications for researchers, content creators and providers. We here investigate the effect of revisits (successive visits from a single user) on content popularity. Using four datasets of social activity, with up to tens of millions of media objects (e.g., YouTube videos, Twitter hashtags or LastFM artists), we show the effect of revisits in the popularity evolution of such objects. Secondly, we propose the Phoenix-R model which captures the popularity dynamics of individual objects. Phoenix-R has the desired properties of being: (1) parsimonious, being based on the minimum description length principle, and achieving lower root mean squared error than state-of-the-art baselines; (2) applicable, the model is effective for predicting future popularity values of objects.

Submitted 22 June, 2014; v1 submitted 6 May, 2014; originally announced May 2014.

Comments: To appear at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2014

arXiv:1402.2351 [pdf, other]

TrendLearner: Early Prediction of Popularity Trends of User Generated Content

Authors: Flavio Figueiredo, Jussara M. Almeida, Marcos André Gonçalves, Fabrício Benevenuto

Abstract: We here focus on the problem of predicting the popularity trend of user generated content (UGC) as early as possible. Taking YouTube videos as a case study, we propose a novel two-step learning approach that: (1) extracts popularity trends from previously uploaded objects, and (2) predicts trends for new content. Unlike previous work, our solution explicitly addresses the inherent tradeoff between prediction accuracy and remaining interest in the content after prediction, solving it on a per-object basis. Our experimental results show great improvements of our solution over alternatives, and its applicability to improve the accuracy of state-of-the-art popularity prediction methods.

Submitted 14 February, 2016; v1 submitted 10 February, 2014; originally announced February 2014.

Comments: To appear at Elsevier Information Sciences Journal

arXiv:1402.1777 [pdf, other]

On the Dynamics of Social Media Popularity: A YouTube Case Study

Authors: Flavio Figueiredo, Jussara M. Almeida, Marcos André Gonçalves, Fabrício Benevenuto

Abstract: Understanding the factors that impact the popularity dynamics of social media can drive the design of effective information services, besides providing valuable insights to content generators and online advertisers. Taking YouTube as a case study, we analyze how video popularity evolves since upload, extracting popularity trends that characterize groups of videos. We also analyze the referrers that lead users to videos, correlating them, along with features of the video and early popularity measures, with the popularity trend and total observed popularity the video will experience. Our findings provide fundamental knowledge about popularity dynamics and its implications for services such as advertising and search.

Submitted 17 October, 2014; v1 submitted 7 February, 2014; originally announced February 2014.

Comments: Extended version of a paper published in ACM WSDM 2011. Pre-print of the paper accepted for publication in the ACM Transactions on Internet Technology

arXiv:1303.2277 [pdf, ps, other]

Is Learning to Rank Worth It? A Statistical Analysis of Learning to Rank Methods

Authors: Guilherme de Castro Mendes Gomes, Vitor Campos de Oliveira, Jussara Marques de Almeida, Marcos André Gonçalves

Abstract: The Learning to Rank (L2R) research field has experienced fast-paced growth over the last few years, with a wide variety of benchmark datasets and baselines available for experimentation. We here investigate the main assumption behind this field, which is that the use of sophisticated L2R algorithms and models produces significant gains over more traditional and simple information retrieval approaches. Our experimental results surprisingly indicate that many L2R algorithms, when put up against the best individual features of each dataset, may not produce statistically significant differences, even if the absolute gains may seem large. We also find that most of the reported baselines are statistically tied, with no clear winner.

Submitted 9 March, 2013; originally announced March 2013.

Comments: 7 pages, 10 tables, 14 references. Original (short) paper published in the Brazilian Symposium on Databases, 2012 (SBBD2012). Current revision submitted to the Journal of Information and Data Management (JIDM)

ACM Class: H.3

arXiv:1006.3506 [pdf, ps, other]

Action Recognition in Videos: from Motion Capture Labs to the Web

Authors: Ana Paula Brandão Lopes, Eduardo Alves do Valle Jr., Jussara Marques de Almeida, Arnaldo Albuquerque de Araújo

Abstract: This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework which highlights the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and thus the constraints imposed on the type of video that each technique is able to address. Making the hypotheses and constraints explicit makes the framework particularly useful for selecting a method, given an application. Another advantage of the proposed organization is that it allows categorizing the newest approaches seamlessly with traditional ones, while providing an insightful perspective of the evolution of the action recognition task up to now. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area.

Submitted 17 June, 2010; originally announced June 2010.

Comments: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables

ACM Class: I.4.8; I.4.10

arXiv:cs/0504012 [pdf, ps, other]

Improving Spam Detection Based on Structural Similarity

Authors: Luiz H. Gomes, Fernando D. O. Castro, Rodrigo B. Almeida, Luis M. A. Bettencourt, Virgilio A. F. Almeida, Jussara M. Almeida

Abstract: We propose a new detection algorithm that uses structural relationships between senders and recipients of email as the basis for the identification of spam messages. Users and receivers are represented as vectors in their reciprocal spaces. A measure of similarity between vectors is constructed and used to group users into clusters. Knowledge of their classification as past senders/receivers of spam or legitimate mail, coming from an auxiliary detection algorithm, is then used to label these clusters probabilistically. The measure of similarity between the sender and receiver sets of a new message to the center vector of clusters is then used to assess the possibility of that message being legitimate or spam. We show that the proposed algorithm is able to correct part of the false positives (legitimate messages classified as spam) using a testbed of a one-week SMTP log.

Submitted 5 April, 2005; originally announced April 2005.
