short-paper
Open access

Patient Clustering via Integrated Profiling of Clinical and Digital Data

Published: 21 October 2023
    Abstract

    We introduce a novel profile-based patient clustering model for clinical healthcare data. Using a method grounded in constrained low-rank approximation, the model combines patients' clinical data with their digital interaction data, such as browsing and search activity, to construct patient profiles. The method produces nonnegative embedding vectors that serve as low-dimensional representations of the patients. We evaluated the model on real-world patient data from a healthcare web portal, assessing both its clustering and its recommendation capabilities. Compared with the baselines, our approach performed better in both clustering coherence and recommendation accuracy.
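    The abstract does not spell out the optimization problem, so the following is only a rough illustration under stated assumptions. It assumes the constrained low-rank approximation is a joint nonnegative matrix factorization in which a clinical matrix X1 and a digital-interaction matrix X2 share one nonnegative patient factor H, fitted with standard Lee-Seung multiplicative updates. The weight alpha, the function names joint_nmf and recommend, the argmax cluster assignment, the reconstruction-based recommender, and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def joint_nmf(X1, X2, k, alpha=1.0, n_iter=200, seed=0, eps=1e-10):
    """Sketch of a joint NMF: X1 ~ W1 @ H and X2 ~ W2 @ H, all factors >= 0.

    X1 is (clinical features x patients), X2 is (digital items x patients).
    H (k x patients) is shared, so each column is one patient's nonnegative
    embedding. Minimizes ||X1 - W1 H||_F^2 + alpha * ||X2 - W2 H||_F^2 using
    Lee-Seung multiplicative updates (an assumed solver, not the paper's).
    """
    if X1.shape[1] != X2.shape[1]:
        raise ValueError("both views must describe the same patients")
    rng = np.random.default_rng(seed)
    W1 = rng.random((X1.shape[0], k))
    W2 = rng.random((X2.shape[0], k))
    H = rng.random((k, X1.shape[1]))
    losses = []
    for _ in range(n_iter):
        # Update each view's basis with the shared embeddings H held fixed.
        HHt = H @ H.T
        W1 *= (X1 @ H.T) / (W1 @ HHt + eps)
        W2 *= (X2 @ H.T) / (W2 @ HHt + eps)
        # Update H from both views at once; alpha weights the digital view.
        num = W1.T @ X1 + alpha * (W2.T @ X2)
        den = (W1.T @ W1 + alpha * (W2.T @ W2)) @ H + eps
        H *= num / den
        losses.append(np.linalg.norm(X1 - W1 @ H) ** 2
                      + alpha * np.linalg.norm(X2 - W2 @ H) ** 2)
    return W1, W2, H, losses


def recommend(W2, H, X2, patient, top_n=3):
    """Hypothetical recommender: rank unseen items by reconstructed W2 @ H."""
    scores = W2 @ H[:, patient]
    # Exclude items the patient has already interacted with.
    scores = np.where(X2[:, patient] > 0, -np.inf, scores)
    return np.argsort(-scores)[:top_n].tolist()


# Toy data (invented for illustration): 6 patients, two loose groups.
X1 = np.array([[5, 5, 5, 0, 0, 0],
               [4, 5, 4, 0, 0, 1],
               [0, 0, 1, 5, 4, 5],
               [0, 1, 0, 4, 5, 5]], dtype=float)  # clinical features
X2 = np.array([[1, 1, 0, 0, 0, 0],
               [0, 1, 1, 0, 0, 0],
               [0, 0, 0, 1, 1, 0],
               [0, 0, 0, 0, 1, 1],
               [1, 0, 1, 0, 0, 0]], dtype=float)  # browsing/search items

W1, W2, H, losses = joint_nmf(X1, X2, k=2)
labels = H.argmax(axis=0)  # cluster = dominant embedding component
print("clusters:", labels.tolist())
print("recommended items for patient 0:", recommend(W2, H, X2, patient=0, top_n=2))
```

    Sharing H across both views is what lets clinical and digital signals jointly shape each patient's embedding, and the hypothetical alpha trades off the two reconstruction errors. Alternating nonnegative least squares is a common alternative to multiplicative updates when faster convergence is needed.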



    Information & Contributors

    Information

    Published In

    ACM Conferences
    CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
    October 2023
    5508 pages
    ISBN:9798400701245
    DOI:10.1145/3583780
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clustering
    2. healthcare
    3. nonnegative matrix factorization
    4. patient profiling
    5. recommendation systems

    Qualifiers

    • Short-paper

    Conference

    CIKM '23

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Total citations: 0
    • Total downloads: 206
    • Downloads (last 12 months): 206
    • Downloads (last 6 weeks): 39
