research-article

Joint Inference of Diffusion and Structure in Partially Observed Social Networks Using Coupled Matrix Factorization

Published: 18 July 2023

    Abstract

    Access to complete data in large-scale networks is often infeasible, so missing data is a crucial and unavoidable issue in the analysis and modeling of real-world social networks. However, most research on social networks does not account for this limitation. One effective remedy is to recover the missing data as a pre-processing step. In this paper, a model is learned from partially observed data to infer the unobserved diffusion and structure networks. To jointly discover omitted diffusion activities and hidden network structures, we develop a probabilistic generative model called “DiffStru.” The proposed method exploits the interrelations between node links and cascade processes by learning coupled low-dimensional latent factors. Besides inferring unseen data, the latent factors may also aid network classification problems such as community detection. We tested different missing-data scenarios on simulated independent cascades over LFR networks and on real datasets, including Twitter and Memetracker. Experiments on these synthetic and real-world datasets show that the proposed method successfully detects invisible social behaviors, predicts links, and identifies latent features.
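
The idea of coupling two factorizations through shared latent factors can be illustrated with a small sketch. This is not the DiffStru model itself, which the abstract describes as a probabilistic generative model; it is a simplified, deterministic stand-in that assumes a masked squared-error loss minimized by plain gradient descent. The function name `coupled_mf`, the factor names `U`, `V`, `W`, the toy two-community data, and all hyperparameters are assumptions made for illustration only.

```python
import numpy as np


def coupled_mf(G, C, mask_G, mask_C, rank=4, lr=0.01, reg=0.05, steps=500, seed=0):
    """Masked coupled factorization: G ~ U V^T and C ~ U W^T share node factors U.

    G: (n, n) link matrix, C: (n, m) node-by-cascade participation matrix.
    mask_G / mask_C hold 1 for observed entries and 0 for missing ones.
    """
    rng = np.random.default_rng(seed)
    n, m = C.shape
    U = 0.1 * rng.standard_normal((n, rank))  # shared node factors (the coupling)
    V = 0.1 * rng.standard_normal((n, rank))  # link-target factors
    W = 0.1 * rng.standard_normal((m, rank))  # cascade factors
    losses = []
    for _ in range(steps):
        # Residuals are computed on observed entries only.
        R_G = mask_G * (U @ V.T - G)
        R_C = mask_C * (U @ W.T - C)
        penalty = reg * (np.sum(U**2) + np.sum(V**2) + np.sum(W**2))
        losses.append(0.5 * (np.sum(R_G**2) + np.sum(R_C**2) + penalty))
        # U receives gradient from both matrices, so each informs the other.
        grad_U = R_G @ V + R_C @ W + reg * U
        grad_V = R_G.T @ U + reg * V
        grad_W = R_C.T @ U + reg * W
        U -= lr * grad_U
        V -= lr * grad_V
        W -= lr * grad_W
    return U, V, W, losses


# Toy data (hypothetical): two communities of nodes, cascades biased to one side.
rng = np.random.default_rng(1)
n, m = 30, 20
community = np.repeat([0, 1], n // 2)
same = community[:, None] == community[None, :]
G = (rng.random((n, n)) < np.where(same, 0.6, 0.05)).astype(float)
cascade_side = rng.integers(0, 2, size=m)
match = community[:, None] == cascade_side[None, :]
C = (rng.random((n, m)) < np.where(match, 0.7, 0.1)).astype(float)

# Hide roughly 30% of the entries in each matrix to mimic partial observation.
mask_G = (rng.random((n, n)) < 0.7).astype(float)
mask_C = (rng.random((n, m)) < 0.7).astype(float)

U, V, W, losses = coupled_mf(G, C, mask_G, mask_C)
link_scores = U @ V.T       # scores for all links, including unobserved ones
cascade_scores = U @ W.T    # scores for all node-cascade pairs
```

In this sketch, the entries of `link_scores` and `cascade_scores` at masked positions play the role of inferred links and inferred cascade participation, and the rows of `U` are the kind of low-dimensional node features that could feed a downstream task such as community detection.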

    Cited By

    • (2024) Exploring the Molecular Terrain: A Survey of Analytical Methods for Biological Network Analysis. Symmetry 16:4 (462). DOI: 10.3390/sym16040462. Online publication date: 10-Apr-2024
    • (2024) A continuous-time diffusion model for inferring multi-layer diffusion networks. Applied Intelligence. DOI: 10.1007/s10489-024-05620-w. Online publication date: 24-Jun-2024
    • (2023) A Survey of Information Dissemination Model, Datasets, and Insight. Mathematics 11:17 (3707). DOI: 10.3390/math11173707. Online publication date: 28-Aug-2023

    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 9
    November 2023
    373 pages
    ISSN: 1556-4681
    EISSN: 1556-472X
    DOI: 10.1145/3604532

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 July 2023
    Online AM: 24 May 2023
    Accepted: 09 May 2023
    Revised: 11 March 2023
    Received: 08 September 2022
    Published in TKDD Volume 17, Issue 9

    Author Tags

    1. Information diffusion
    2. partially observed data
    3. social network
    4. network structure
    5. matrix factorization
    6. link prediction
    7. cascade completion
