
How Data Scientists Review the Scholarly Literature

Published: 20 March 2023

Abstract

    Keeping up with the research literature plays an important role in the workflow of scientists, allowing them to understand a field, formulate the problems they focus on, and develop the solutions they contribute, which in turn shape the nature of the discipline. In this paper, we examine the literature review practices of data scientists. Data science is a field seeing an exponential rise in papers, and it increasingly draws on and is applied in numerous diverse disciplines. Recent efforts have produced several tools intended to help data scientists cope with a deluge of research, along with coordinated efforts to develop AI tools intended to uncover the research frontier. Although these trends indicate the information overload faced by data scientists, no prior work has examined the specific practices and challenges of these scientists in an interdisciplinary field with evolving scholarly norms. We close this gap through a set of semi-structured interviews and think-aloud protocols with industry and academic data scientists (N = 20). Our results, while corroborating the practices of other knowledge workers, uncover several novel findings: individuals (1) are challenged in seeking and making sense of papers beyond their disciplinary bubbles, (2) struggle to understand papers in the face of missing details and mathematical content, (3) grapple with the deluge by leveraging the knowledge context in code, blogs, and talks, and (4) lean on their peers online and in person. Furthermore, we outline future directions likely to help data scientists cope with the burgeoning research literature.


    ICLR 2022. 2022. ICLR 2022 Blog Track. Accessed: 15 October 2022.
    Jafar Afzali, Aleksander Mark Drzewiecki, and Krisztian Balog. 2021. POINTREC: A Test Collection for Narrative-Driven Point of Interest Recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, Canada) (SIGIR ’21). Association for Computing Machinery, New York, NY, USA, 2478–2484.
    Netta Aizenbud-Reshef, Ido Guy, and Michal Jacovi. 2009. Collaborative Feed Reading in a Community. In Proceedings of the ACM 2009 International Conference on Supporting Group Work (Sanibel Island, Florida, USA) (GROUP ’09). Association for Computing Machinery, New York, NY, USA, 277–280.
    Hamed Alhoori, Mohammed Samaka, Richard Furuta, and Edward A Fox. 2019. Anatomy of scholarly information behavior patterns in the wake of academic social media platforms. International Journal on Digital Libraries 20, 4 (2019), 369–389.
    Mohammad Aliannejadi, Leif Azzopardi, Hamed Zamani, Evangelos Kanoulas, Paul Thomas, and Nick Craswell. 2021. Analysing Mixed Initiatives and Search Strategies during Conversational Search. In Proceedings of the 30th ACM International Conference on Information Knowledge Management(Virtual Event, Queensland, Australia) (CIKM ’21). Association for Computing Machinery, New York, NY, USA, 16–26.
    Lorin W Anderson and David R Krathwohl. 2001. A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. Longman.
    Kumaripaba Athukorala, Eve Hoggan, Anu Lehtiö, Tuukka Ruotsalo, and Giulio Jacucci. 2013. Information-seeking behaviors of computer scientists: Challenges for electronic literature search tools. Proceedings of the American Society for Information Science and Technology 50, 1 (2013), 1–11. arXiv:
    Tal August, Lucy Lu Wang, Jonathan Bragg, Marti A Hearst, Andrew Head, and Kyle Lo. 2022. Paper Plain: Making Medical Research Papers Approachable to Healthcare Consumers with Natural Language Processing. arXiv preprint arXiv:2203.00130(2022).
    Sandeep Avula, Gordon Chadwick, Jaime Arguello, and Robert Capra. 2018. SearchBots: User Engagement with ChatBots during Collaborative Search. In Proceedings of the 2018 Conference on Human Information Interaction Retrieval (New Brunswick, NJ, USA) (CHIIR ’18). Association for Computing Machinery, New York, NY, USA, 52–61.
    Leif Azzopardi. 2021. Cognitive Biases in Search: A Review and Reflection of Cognitive Biases in Information Retrieval. In Proceedings of the 2021 Conference on Human Information Interaction and Retrieval (Canberra ACT, Australia) (CHIIR ’21). Association for Computing Machinery, New York, NY, USA, 27–37.
    Alina Beygelzimer, Emily Fox, Florence d’Alché Buc, and Hugo Larochelle. 2019. What we learned from NeurIPS 2019 data.
    Abeba Birhane, Pratyusha Kalluri, Dallas Card, William Agnew, Ravit Dotan, and Michelle Bao. 2022. The Values Encoded in Machine Learning Research. In 2022 ACM Conference on Fairness, Accountability, and Transparency (Seoul, Republic of Korea) (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 173–184.
    David M. Blei and Padhraic Smyth. 2017. Science and data science. Proceedings of the National Academy of Sciences 114, 33(2017), 8689–8692. arXiv:
    Marcel Bollmann and Desmond Elliott. 2020. On Forgetting to Cite Older Papers: An Analysis of the ACL Anthology. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 7819–7827.
    Corinna Breitinger, Patrick Wortner, Bela Gipp, and Harald Reiterer. 2019. ’Too Late to Collaborate’: Challenges to the Discovery of in-Progress Research. In 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL). 134–137.
    Melanie S Brucks and Jonathan Levav. 2022. Virtual communication curbs creative idea generation. Nature 605, 7908 (2022), 108–112.
    Hilary Bussell, Jennifer Schnabel, and Amanda K. Rinehart. 2020. Meeting graduate student needs: an exploration of disciplinary differences. Public Services Quarterly 16, 4 (2020), 213–233. arXiv:
    Arthur Câmara, Nirmal Roy, David Maxwell, and Claudia Hauff. 2021. Searching to Learn with Instructional Scaffolding. In Proceedings of the 2021 Conference on Human Information Interaction and Retrieval (Canberra ACT, Australia) (CHIIR ’21). Association for Computing Machinery, New York, NY, USA, 209–218.
    Joel Chan, Joseph Chee Chang, Tom Hope, Dafna Shahaf, and Aniket Kittur. 2018. SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers. Proc. ACM Hum.-Comput. Interact. 2, CSCW, Article 31 (Nov. 2018), 21 pages.
    Jiyoo Chang and Christine Custis. 2022. Understanding Implementation Challenges in Machine Learning Documentation. In Equity and Access in Algorithms, Mechanisms, and Optimization (Arlington, VA, USA) (EAAMO ’22). Association for Computing Machinery, New York, NY, USA, Article 16, 8 pages.
    Duen Horng Chau, Aniket Kittur, Jason I. Hong, and Christos Faloutsos. 2011. Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Vancouver, BC, Canada) (CHI ’11). Association for Computing Machinery, New York, NY, USA, 167–176.
    Kiroong Choe, Seokweon Jung, Seokhyeon Park, Hwajung Hong, and Jinwook Seo. 2021. Papers101: Supporting the Discovery Process in the Literature Review Workflow for Novice Researchers. In 2021 IEEE 14th Pacific Visualization Symposium (PacificVis). 176–180.
    Johan S. G. Chu and James A. Evans. 2021. Slowed canonical progress in large fields of science. Proceedings of the National Academy of Sciences 118, 41(2021).
    Anamaria Crisan, Brittany Fiore-Gartland, and Melanie Tory. 2020. Passing the data baton: A retrospective analysis on data science work and workers. IEEE Transactions on Visualization and Computer Graphics 27, 2(2020), 1860–1870.
    Cecilia di Sciascio, Eduardo Veas, Jordan Barria-Pineda, and Colleen Culley. 2020. Understanding the Effects of Control and Transparency in Searching as Learning. In Proceedings of the 25th International Conference on Intelligent User Interfaces (Cagliari, Italy) (IUI ’20). Association for Computing Machinery, New York, NY, USA, 498–509.
    Cheyenne Dosso, Lynda Tamine, Pierre-Vincent Paubel, and Aline Chevalier. 2022. The Impact of Expertise on Query Formulation Strategies During Complex Learning Task Solving: A Study with Students in Medicine and Computer Science. In Proceedings of the 21st Congress of the International Ergonomics Association (IEA 2021), Nancy L. Black, W. Patrick Neumann, and Ian Noy (Eds.). Springer International Publishing, Cham, 621–627.
    Marcel Dunaiski, Gillian J Greene, and Bernd Fischer. 2017. Exploratory search of academic publication and citation data using interactive tag cloud visualizations. Scientometrics 110, 3 (2017), 1539–1571.
    Marian Dörk, Nathalie Henry Riche, Gonzalo Ramos, and Susan Dumais. 2012. PivotPaths: Strolling through Faceted Information Spaces. IEEE Transactions on Visualization and Computer Graphics 18, 12(2012), 2709–2718.
    Debra Engel, Sarah Robbins, and Christina Kulp. 2011. The Information-Seeking Habits of Engineering Faculty. College & Research Libraries 72, 6 (2011), 548–567.
    Michael Färber and Ann-Kathrin Leisinger. 2021. DataHunter: A System for Finding Datasets Based on Scientific Problem Descriptions. Association for Computing Machinery, New York, NY, USA, 749–752.
    Michael Färber and Ann-Kathrin Leisinger. 2021. DataHunter: A System for Finding Datasets Based on Scientific Problem Descriptions. In Proceedings of the 15th ACM Conference on Recommender Systems (Amsterdam, Netherlands) (RecSys ’21). Association for Computing Machinery, New York, NY, USA, 749–752.
    Raymond Fok, Andrew Head, Jonathan Bragg, Kyle Lo, Marti A Hearst, and Daniel S Weld. 2022. Scim: Intelligent Faceted Highlights for Interactive, Multi-Pass Skimming of Scientific Papers. (2022).
    Luanne Freund, Rick Kopak, and Heather O’Brien. 2016. The effects of textual environment on reading comprehension: Implications for searching as learning. Journal of Information Science 42, 1 (2016), 79–93.
    Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé Iii, and Kate Crawford. 2021. Datasheets for datasets. Commun. ACM 64, 12 (2021), 86–92.
    Katy Ilonka Gero, Vivian Liu, Sarah Huang, Jennifer Lee, and Lydia B. Chilton. 2021. What Makes Tweetorials Tick: How Experts Communicate Complex Topics on Twitter. Proc. ACM Hum.-Comput. Interact. 5, CSCW2, Article 422 (oct 2021), 26 pages.
    Charles J Gomez, Andrew C Herman, and Paolo Parigi. 2022. Leading countries in global science increasingly receive more citations than other countries doing similar research. Nature Human Behaviour(2022), 1–11.
    Ido Guy and Luiz Pizzato. 2016. People Recommendation Tutorial. In Proceedings of the 10th ACM Conference on Recommender Systems (Boston, Massachusetts, USA) (RecSys ’16). Association for Computing Machinery, New York, NY, USA, 431–432.
    Han L. Han, Junhang Yu, Raphael Bournet, Alexandre Ciorascu, Wendy E. Mackay, and Michel Beaudouin-Lafon. 2022. Passages: Interacting with Text Across Documents. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 338, 17 pages.
    Abram Handler, Narges Mahyar, and Brendan O’Connor. 2022. ClioQuery: Interactive Query-Oriented Text Analytics for Comprehensive Investigation of Historical News Archives. ACM Trans. Interact. Intell. Syst. 12, 3, Article 22 (jul 2022), 49 pages.
    Jiangen He, Qing Ping, Wen Lou, and Chaomei Chen. 2019. PaperPoles: Facilitating adaptive visual exploration of scientific publications by citation links. Journal of the Association for Information Science and Technology 70, 8(2019), 843–857.
    Lu He and Changyang He. 2022. Help Me #DebunkThis: Unpacking Individual and Community’s Collaborative Work in Information Credibility Assessment. Proc. ACM Hum.-Comput. Interact. 6, CSCW2, Article 413 (nov 2022), 31 pages.
    Andrew Head, Kyle Lo, Dongyeop Kang, Raymond Fok, Sam Skjonsberg, Daniel S. Weld, and Marti A. Hearst. 2021. Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 413, 18 pages.
    Andrew Head, Amber Xie, and Marti A. Hearst. 2022. Math Augmentation: How Authors Enhance the Readability of Formulas Using Novel Visual Design Practices. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 491, 18 pages.
    Kevin Heffernan and Simone Teufel. 2018. Identifying problems and solutions in scientific text. Scientometrics 116, 2 (2018), 1367–1382.
    Florian Heimerl, Qi Han, and Steffen Koch. 2016. CiteRivers: Visual Analytics of Citation Patterns. IEEE Transactions on Visualization and Computer Graphics 22, 1 (jan 2016), 190–199.
    Orland Hoeber, Dolinkumar Patel, and Dale Storie. 2019. A Study of Academic Search Scenarios and Information Seeking Behaviour. In Proceedings of the 2019 Conference on Human Information Interaction and Retrieval(Glasgow, Scotland UK) (CHIIR ’19). Association for Computing Machinery, New York, NY, USA, 231–235.
    Fred Hohman, Matthew Conlen, Jeffrey Heer, and Duen Horng (Polo) Chau. 2020. Communicating with Interactive Articles. Distill (2020).
    Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé, Miro Dudik, and Hanna Wallach. 2019. Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need?. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–16.
    Chien-yu Huang, Arlene Casey, Dorota Głowacka, and Alan Medlar. 2019. Holes in the Outline: Subject-Dependent Abstract Quality and Its Implications for Scientific Literature Search(CHIIR ’19). Association for Computing Machinery, New York, NY, USA, 289–293.
    Chien-yu Huang, Arlene Casey, Dorota Głowacka, and Alan Medlar. 2019. Holes in the Outline: Subject-Dependent Abstract Quality and Its Implications for Scientific Literature Search. In Proceedings of the 2019 Conference on Human Information Interaction and Retrieval (Glasgow, Scotland UK) (CHIIR ’19). Association for Computing Machinery, New York, NY, USA, 289–293.
    Sharon Favaro Ince, Christopher Hoadley, and Paul A. Kirschner. 2018. A Study of Search Practices in Doctoral Student Scholarly Workflows. In Proceedings of the 2018 Conference on Human Information Interaction; Retrieval (New Brunswick, NJ, USA) (CHIIR ’18). Association for Computing Machinery, New York, NY, USA, 245–248.
    Nanna Inie, Jonas Frich, and Peter Dalsgaard. 2022. How Researchers Manage Ideas. In Creativity and Cognition (Venice, Italy). Association for Computing Machinery, New York, NY, USA, 83–96.
    Emi Ishita, Yasuko Hagiwara, Yukiko Watanabe, and Yoichi Tomiura. 2018. Which Parts of Search Results Do Researchers Check When Selecting Academic Documents?. In Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (Fort Worth, Texas, USA) (JCDL ’18). Association for Computing Machinery, New York, NY, USA, 345–346.
    Charles Jacobs, Wil Li, Evan Schrier, David Bargeron, and David Salesin. 2004. Adaptive Document Layout. Commun. ACM 47, 8 (aug 2004), 60–66.
    Anthony Jameson and Barry Smyth. 2007. Recommendation to groups. In The adaptive web. Springer, 596–627.
    Nicole Janssen. 2022. The Data Science Talent Gap: Why It Exists And What Businesses Can Do About It.
    David Jurgens, Srijan Kumar, Raine Hoover, Dan McFarland, and Dan Jurafsky. 2018. Measuring the Evolution of a Scientific Field through Citation Frames. Transactions of the Association for Computational Linguistics 6 (2018), 391–406.
    Hyeonsu Kang, Joseph Chee Chang, Yongsung Kim, and Aniket Kittur. 2022. Threddy: An Interactive System for Personalized Thread-Based Exploration and Organization of Scientific Literature. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology (Bend, OR, USA) (UIST ’22). Association for Computing Machinery, New York, NY, USA, Article 94, 15 pages.
    Hyeonsu B Kang, Rafal Kocielnik, Andrew Head, Jiangjiang Yang, Matt Latzke, Aniket Kittur, Daniel S Weld, Doug Downey, and Jonathan Bragg. 2022. From Who You Know to What You Read: Augmenting Scientific Recommendations with Implicit Social Networks. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 302, 23 pages.
    Hyeonsu B Kang, Sheshera Mysore, Kevin J Huang, Haw-Shiuan Chang, Thorben Prein, Andrew McCallum, Aniket Kittur, and Elsa Olivetti. 2022. Augmenting Scientific Creativity with Retrieval across Knowledge Domains. In Second Workshop on Bridging Human-Computer Interaction and Natural Language Processing at NAACL 2022.
    Hyeonsu B. Kang, Xin Qian, Tom Hope, Dafna Shahaf, Joel Chan, and Aniket Kittur. 2022. Augmenting Scientific Creativity with an Analogical Search Engine. ACM Trans. Comput.-Hum. Interact. (mar 2022). Just Accepted.
    Mary Beth Kery, Bonnie E. John, Patrick O’Flaherty, Amber Horvath, and Brad A. Myers. 2019. Towards Effective Foraging by Data Scientists to Find Past Analysis Choices. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–13.
    Ian A. Knight, Max L. Wilson, David F. Brailsford, and Natasa Milic-Frayling. 2019. Enslaved to the Trapped Data: A Cognitive Work Analysis of Medical Systematic Reviews. In Proceedings of the 2019 Conference on Human Information Interaction and Retrieval (Glasgow, Scotland UK) (CHIIR ’19). Association for Computing Machinery, New York, NY, USA, 203–212.
    Laura M. Koesten, Emilia Kacprzak, Jenifer F. A. Tennison, and Elena Simperl. 2017. The Trials and Tribulations of Working with Structured Data: -A Study on Information Seeking Behaviour. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI ’17). Association for Computing Machinery, New York, NY, USA, 1277–1289.
    Kaisu Koivumäki, Timo Koivumäki, and Erkki Karvonen. 2020. "On Social Media Science Seems to Be More Human": Exploring Researchers as Digital Science Communicators. Media and Communication 8, 2 (2020), 425–439.
    Mario Krenn, Lorenzo Buffoni, Bruno Coutinho, Sagi Eppel, Jacob Gates Foster, Andrew Gritsevskiy, Harlin Lee, Yichao Lu, Joao P Moutinho, Nima Sanjabi, 2022. Predicting the Future of AI with AI: High-quality link prediction in an exponentially growing knowledge network. (2022).
    Satyapriya Krishna, Tessa Han, Alex Gu, Javin Pombra, Shahin Jabbari, Steven Wu, and Himabindu Lakkaraju. 2022. The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective. arXiv preprint arXiv:2202.01602(2022).
    Sean Kross and Philip J. Guo. 2019. Practitioners Teaching Data Science in Industry and Academia: Expectations, Workflows, and Challenges. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland UK) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–14.
    Carol Kuhlthau. 1993. Seeking Meaning: a process approach to library and information services" Ablex Publishing. (01 1993).
    Ilia Kuznetsov, Jan Buchmann, Max Eichler, and Iryna Gurevych. 2022. Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review. arXiv preprint arXiv:2204.10805(2022).
    Esther Landhuis. 2016. Scientific literature: Information overload. Nature 535, 7612 (2016), 457–458.
    Or Levi, Ido Guy, Fiana Raiber, and Oren Kurland. 2018. Selective Cluster Presentation on the Search Results Page. ACM Trans. Inf. Syst. 36, 3, Article 28 (feb 2018), 42 pages.
    Irene Li, Alexander R Fabbri, Robert R Tung, and Dragomir R Radev. 2019. What should i learn first: Introducing lecturebank for nlp education and prerequisite chain learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 6674–6681.
    Kevin Li, Haoyang Yang, Anish Upadhayay, Zhiyan Zhou, Jon Saad-Falcon, and Duen Horng Chau. 2021. Argo Scholar: Interactive Visual Exploration of Literature in Browsers. (2021).
    Michael Xieyang Liu, Aniket Kittur, and Brad A. Myers. 2021. To Reuse or Not To Reuse? A Framework and System for Evaluating Summarized Knowledge. Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 166 (apr 2021), 35 pages.
    Ying-Hsang Liu, Paul Thomas, Tom Gedeon, and Nicolay Rusnachenko. 2022. Search Interfaces for Biomedical Searching: How Do Gaze, User Perception, Search Behaviour and Search Performance Relate?. In ACM SIGIR Conference on Human Information Interaction and Retrieval (Regensburg, Germany) (CHIIR ’22). Association for Computing Machinery, New York, NY, USA, 78–89.
    Kelvin Luu, Xinyi Wu, Rik Koncel-Kedziorski, Kyle Lo, Isabel Cachola, and Noah A. Smith. 2021. Explaining Relationships Between Scientific Documents. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Online, 2130–2144.
    Gary Marchionini. 2018. Search, sense making and learning: closing gaps. Information and Learning Sciences(2018).
    Moshe Mash, Stephanie Rosenthal, and Reid Simmons. 2021. DSWorkFlow: A Framework for Capturing Data Scientists’ Workflows. Association for Computing Machinery, New York, NY, USA.
    Justin Matejka, Tovi Grossman, and George Fitzmaurice. 2021. Paper Forager: Supporting the Rapid Exploration of Research Document Collections. In Graphics Interface 2021.
    Lori McCay-Peet, Anabel Quan-Haase, and Dagmar Kern. 2015. Exploratory search in digital libraries: a preliminary examination of the use and role of interface features. Proceedings of the Association for Information Science and Technology 52, 1 (2015), 1–4.
    Graham McDonald, Craig Macdonald, and Iadh Ounis. 2022. Search results diversification for effective fair ranking in academic search. Information Retrieval Journal 25, 1 (2022), 1–26.
    Alan Medlar, Jing Li, and Dorota Głowacka. 2021. Query Suggestions as Summarization in Exploratory Search. In Proceedings of the 2021 Conference on Human Information Interaction and Retrieval (Canberra ACT, Australia) (CHIIR ’21). Association for Computing Machinery, New York, NY, USA, 119–128.
    Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model Cards for Model Reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (Atlanta, GA, USA) (FAT* ’19). Association for Computing Machinery, New York, NY, USA, 220–229.
    John S Morabito and Joel Chan. 2021. Managing Context during Scholarly Knowledge Synthesis: Process Patterns and System Mechanics. In Creativity and Cognition (Virtual Event, Italy). Association for Computing Machinery, New York, NY, USA, Article 39, 5 pages.
    Felipe Moraes, Sindunuraga Rikarno Putra, and Claudia Hauff. 2018. Contrasting Search as a Learning Activity with Instructor-Designed Learning. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (Torino, Italy) (CIKM ’18). Association for Computing Machinery, New York, NY, USA, 167–176.
    Meredith Ringel Morris. 2013. Collaborative Search Revisited. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work (San Antonio, Texas, USA) (CSCW ’13). Association for Computing Machinery, New York, NY, USA, 1181–1192.
    Michael Muller, Ingrid Lange, Dakuo Wang, David Piorkowski, Jason Tsay, Q. Vera Liao, Casey Dugan, and Thomas Erickson. 2019. How Data Science Workers Work with Data: Discovery, Capture, Curation, Design, Creation. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–15.
    Sheshera Mysore, Tim O’Gorman, Andrew McCallum, and Hamed Zamani. 2021. CSFCube - A Test Collection of Computer Science Research Articles for Faceted Query by Example. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2).
    Ya R Nedumov and Sergei D Kuznetsov. 2019. Exploratory search for scientific articles. Programming and Computer Software 45, 7 (2019), 405–416.
    Xi Niu, Bradley M Hemminger, Cory Lown, Stephanie Adams, Cecelia Brown, Allison Level, Merinda McLure, Audrey Powers, Michele R Tennant, and Tara Cataldo. 2010. National study of information seeking behavior of academic researchers in the United States. Journal of the American Society for Information Science and Technology 61, 5 (2010), 869–890.
    Fatima W. Nosheen, Irfan Ali, and Shazia Yasmeen. 2018. Keeping found things found. Information and Learning Science 119, 12 (2018), 712–720.
    Iadh Ounis, Craig MacDonald, and Ian Soboroff. 2021. On the TREC Blog Track. Proceedings of the International AAAI Conference on Web and Social Media 2, 1 (Sep. 2021), 93–101.
    Elisabeth Pain. 2016. How to keep up with the scientific literature. Science Careers 30(2016).
    Srishti Palani, Zijian Ding, Stephen MacNeil, and Steven P. Dow. 2021. The "Active Search" Hypothesis: How Search Strategies Relate to Creative Learning. Association for Computing Machinery, New York, NY, USA, 325–329.
    Srishti Palani, Zijian Ding, Austin Nguyen, Andrew Chuang, Stephen MacNeil, and Steven P. Dow. 2021. CoNotate: Suggesting Queries Based on Notes Promotes Knowledge Discovery. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 726, 14 pages.
    Jennifer Pearson, Tom Owen, Harold Thimbleby, and George R. Buchanan. 2012. Co-Reading: Investigating Collaborative Group Reading. In Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries (Washington, DC, USA) (JCDL ’12). Association for Computing Machinery, New York, NY, USA, 325–334.
    Timothy Persons. 2016. Data and Analytics Innovation: Emerging Opportunities and Challenges. (2016).
    Jinghua Piao, Guozhen Zhang, Fengli Xu, Zhilong Chen, Yu Zheng, Chen Gao, and Yong Li. 2021. Bringing Friends into the Loop of Recommender Systems: An Exploratory Study. Proc. ACM Hum.-Comput. Interact. 5, CSCW2, Article 439 (oct 2021), 26 pages.
    Peter Pirolli and Stuart Card. 2005. The sensemaking process and leverage points for analyst technology as identified through cognitive task analysis. In Proceedings of international conference on intelligence analysis, Vol. 5. McLean, VA, USA, 2–4.
    Antoine Ponsard, Francisco Escalona, and Tamara Munzner. 2016. PaperQuest: A Visualization Tool to Support Literature Review. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems (San Jose, California, USA) (CHI EA ’16). Association for Computing Machinery, New York, NY, USA, 2264–2271.
    Jason Portenoy, Marissa Radensky, Jevin D West, Eric Horvitz, Daniel S Weld, and Tom Hope. 2022. Bursting Scientific Filter Bubbles: Boosting Innovation via Novel Author Discovery. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 309, 13 pages.
    Gil Press. 2013. A Very Short History Of Data Science.
    Sihang Qiu, Ujwal Gadiraju, and Alessandro Bozzon. 2020. Towards Memorable Information Retrieval. In Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval (Virtual Event, Norway) (ICTIR ’20). Association for Computing Machinery, New York, NY, USA, 69–76.
    Napol Rachatasumrit, Jonathan Bragg, Amy X. Zhang, and Daniel S Weld. 2022. CiteRead: Integrating Localized Citation Contexts into Scientific Paper Reading. In 27th International Conference on Intelligent User Interfaces (Helsinki, Finland) (IUI ’22). Association for Computing Machinery, New York, NY, USA, 707–719.
    Behnam Rahdari and Peter Brusilovsky. 2021. PaperExplorer: Personalized Exploratory Search for Conference Proceedings. In IUI Workshops.
    Dheeraj Rajagopal, Xuchao Zhang, Michael Gamon, Sujay Kumar Jauhar, Diyi Yang, and Eduard Hovy. 2022. One Document, Many Revisions: A Dataset for Classification and Description of Edit Intents. In Proceedings of the Thirteenth Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France, 5517–5524.
    Nirmal Roy, Felipe Moraes, and Claudia Hauff. 2020. Exploring Users’ Learning Gains within Search Sessions. In Proceedings of the 2020 Conference on Human Information Interaction and Retrieval (Vancouver BC, Canada) (CHIIR ’20). Association for Computing Machinery, New York, NY, USA, 432–436.
    Nirmal Roy, Manuel Valle Torre, Ujwal Gadiraju, David Maxwell, and Claudia Hauff. 2021. Note the Highlight: Incorporating Active Reading Tools in a Search as Learning Environment. In Proceedings of the 2021 Conference on Human Information Interaction and Retrieval (Canberra ACT, Australia) (CHIIR ’21). Association for Computing Machinery, New York, NY, USA, 229–238.
    Daniel M. Russell, Mark J. Stefik, Peter Pirolli, and Stuart K. Card. 1993. The Cost Structure of Sensemaking. In Proceedings of the INTERACT ’93 and CHI ’93 Conference on Human Factors in Computing Systems (Amsterdam, The Netherlands) (CHI ’93). Association for Computing Machinery, New York, NY, USA, 269–276.
    Andrey Rzhetsky, Jacob G. Foster, Ian T. Foster, and James A. Evans. 2015. Choosing experiments to accelerate collective discovery. Proceedings of the National Academy of Sciences 112, 47(2015), 14569–14574. arXiv:
    Hemant Kumar Sahu and Surya Nath Singh. 2013. Information seeking behaviour of astronomy/astrophysics scientists. In Aslib Proceedings. Emerald Group Publishing Limited.
    Dafna Shahaf, Carlos Guestrin, and Eric Horvitz. 2012. Metro Maps of Science. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Beijing, China) (KDD ’12). Association for Computing Machinery, New York, NY, USA, 1122–1130.
    Rina Shaikh-Lesko. 2019. Web annotation tool Hypothesis hits a milestone. Nature 569, 7756 (2019), 295–296.
    Namit Shetty. 2017. Query Suggestions for Detailed Queries.
    Catherine L. Smith and Paul B. Kantor. 2008. User Adaptation: Good Results from Poor Systems. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval(Singapore, Singapore) (SIGIR ’08). Association for Computing Machinery, New York, NY, USA, 147–154.
    Catherine L. Smith and Soo Young Rieh. 2019. Knowledge-Context in Search Systems: Toward Information-Literate Actions. In Proceedings of the 2019 Conference on Human Information Interaction and Retrieval(Glasgow, Scotland UK) (CHIIR ’19). Association for Computing Machinery, New York, NY, USA, 55–62.
    Ayah Soufan, Ian Ruthven, and Leif Azzopardi. 2022. Searching the Literature: An Analysis of an Exploratory Search Task. In ACM SIGIR Conference on Human Information Interaction and Retrieval (Regensburg, Germany) (CHIIR ’22). Association for Computing Machinery, New York, NY, USA, 146–157.
    Sruti Srinivasa Ragavan, Sandeep Kaur Kuttal, Charles Hill, Anita Sarma, David Piorkowski, and Margaret Burnett. 2016. Foraging Among an Overabundance of Similar Variants. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’16). Association for Computing Machinery, New York, NY, USA, 3509–3521.
    Krishna Subramanian, Johannes Maas, and Jan Borchers. 2020. TRACTUS: Understanding and Supporting Source Code Experimentation in Hypothesis-Driven Data Science. Association for Computing Machinery, New York, NY, USA, 1–12.
    Hariharan Subramonyam, Colleen Seifert, Priti Shah, and Eytan Adar. 2020. TexSketch: Active Diagramming through Pen-and-Ink Annotations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–13.
    Nicole Sultanum, Christine Murad, and Daniel Wigdor. 2020. Understanding and Supporting Academic Literature Review Workflows with LitSense. In Proceedings of the International Conference on Advanced Visual Interfaces (Salerno, Italy) (AVI ’20). Association for Computing Machinery, New York, NY, USA, Article 67, 5 pages.
    Rohail Syed, Kevyn Collins-Thompson, Paul N. Bennett, Mengqiu Teng, Shane Williams, Wendy W. Tay, and Shamsi Iqbal. 2020. Improving Learning Outcomes with Gaze Tracking and Automatic Question Generation. In Proceedings of The Web Conference 2020 (Taipei, Taiwan) (WWW ’20). Association for Computing Machinery, New York, NY, USA, 1693–1703.
    Editorial Team. 2021. Distill Hiatus. Distill (2021).
    Kelsey Urgo and Jaime Arguello. 2022. Understanding the “Pathway” Towards a Searcher’s Learning Objective. ACM Trans. Inf. Syst. 40, 4, Article 77 (jan 2022), 42 pages.
    Pertti Vakkari. 2016. Searching as learning: A systematization based on literature. Journal of Information Science 42, 1 (2016), 7–18.
    Pertti Vakkari and Saila Huuskonen. 2012. Search effort degrades search output but improves task outcome. Journal of the American Society for Information Science and Technology 63, 4 (2012), 657–670.
    Pertti Vakkari, Mikko Pennanen, and Sami Serola. 2003. Changes of search terms and tactics while writing a research proposal: A longitudinal case study. Information Processing & Management 39, 3 (2003), 445–463.
    Richard Van Noorden. 2014. Global scientific output doubles every nine years. Nature news blog (2014).
    April Yi Wang, Dakuo Wang, Jaimie Drozdal, Xuye Liu, Soya Park, Steve Oney, and Christopher Brooks. 2021. What Makes a Well-Documented Notebook? A Case Study of Data Scientists’ Documentation Practices in Kaggle. Association for Computing Machinery, New York, NY, USA.
    Yun Wang, Dongyu Liu, Huamin Qu, Qiong Luo, and Xiaojuan Ma. 2016. A Guided Tour of Literature Review: Facilitating Academic Paper Reading with Narrative Visualization. In Proceedings of the 9th International Symposium on Visual Information Communication and Interaction (Dallas, TX, USA) (VINCI ’16). Association for Computing Machinery, New York, NY, USA, 17–24.
    Jevin D. West and Carl T. Bergstrom. 2021. Misinformation in and about science. Proceedings of the National Academy of Sciences 118, 15 (2021), e1912444117.
    Ryen W. White and Resa A. Roth. 2009. Exploratory search: Beyond the query-response paradigm. Synthesis Lectures on Information Concepts, Retrieval, and Services 1, 1 (2009), 1–98.
    Teena Willoughby, S. Alexandria Anderson, Eileen Wood, Julie Mueller, and Craig Ross. 2009. Fast searching for information on the Internet to use in a learning context: The impact of domain knowledge. Computers & Education 52, 3 (2009), 640–648.
    Longqi Yang, David Holtz, Sonia Jaffe, Siddharth Suri, Shilpi Sinha, Jeffrey Weston, Connor Joyce, Neha Shah, Kevin Sherman, Brent Hecht, et al. 2022. The effects of remote work on collaboration among information workers. Nature Human Behaviour 6, 1 (2022), 43–54.
    Amy X. Zhang and Justin Cranshaw. 2018. Making Sense of Group Chat through Collaborative Tagging and Summarization. Proc. ACM Hum.-Comput. Interact. 2, CSCW, Article 196 (nov 2018), 27 pages.
    Amy X. Zhang, Michael Muller, and Dakuo Wang. 2020. How Do Data Science Workers Collaborate? Roles, Workflows, and Tools. Proc. ACM Hum.-Comput. Interact. 4, CSCW1, Article 22 (may 2020), 23 pages.
    Huiwen Zhang, Dana McKay, and George Buchanan. 2021. I’ve Got All My Readers With Me: A Model of Reading as a Social Activity. In Proceedings of the 2021 Conference on Human Information Interaction and Retrieval (Canberra ACT, Australia) (CHIIR ’21). Association for Computing Machinery, New York, NY, USA, 185–195.
    Xiaoyu Zhang, Senthil Chandrasegaran, and Kwan-Liu Ma. 2021. ConceptScope: Organizing and Visualizing Knowledge in Documents Based on Domain Ontology. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 19, 13 pages.
    Xiaolong Zhang, Yan Qu, C. Lee Giles, and Piyou Song. 2008. CiteSense: Supporting Sensemaking of Research Literature. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Florence, Italy) (CHI ’08). Association for Computing Machinery, New York, NY, USA, 677–680.
    Yongfeng Zhang and Xu Chen. 2020. Explainable recommendation: A survey and new perspectives. Foundations and Trends® in Information Retrieval 14, 1 (2020), 1–101.
    Sacha Zyto, David Karger, Mark Ackerman, and Sanjoy Mahajan. 2012. Successful Classroom Deployment of a Social Document Annotation System. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Austin, Texas, USA) (CHI ’12). Association for Computing Machinery, New York, NY, USA, 1883–1892.

    Information & Contributors


    Published In

    cover image ACM Conferences
    CHIIR '23: Proceedings of the 2023 Conference on Human Information Interaction and Retrieval
    March 2023
    520 pages
    • Editors:
    • Jacek Gwizdka,
    • Soo Young Rieh
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].



    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 March 2023



    • Research-article
    • Research
    • Refereed limited

    CHIIR '23

    Acceptance Rates

    Overall Acceptance Rate 55 of 163 submissions, 34%

