Imitation of Life: A Search Engine for Biologically Inspired Design

Authors: Hen Emuna, Nadav Borenstein, Xin Qian, Hyeonsu Kang, Joel Chan, Aniket Kittur, Dafna Shahaf

Abstract: Biologically Inspired Design (BID), or Biomimicry, is a problem-solving methodology that applies analogies from nature to solve engineering challenges. For example, Speedo engineers designed swimsuits based on shark skin. Finding relevant biological solutions for real-world problems poses significant challenges, both due to the limited biological knowledge engineers and designers typically possess and to the limited BID resources. Existing BID datasets are hand-curated and small, and scaling them up requires costly human annotations. In this paper, we introduce BARcode (Biological Analogy Retriever), a search engine for automatically mining bio-inspirations from the web at scale. Using advances in natural language understanding and data programming, BARcode identifies potential inspirations for engineering challenges. Our experiments demonstrate that BARcode can retrieve inspirations that are valuable to engineers and designers tackling real-world problems, as well as recover famous historical BID examples. We release data and code; we view BARcode as a step towards addressing the challenges that have historically hindered the practical application of BID to engineering innovation.

Submitted 19 December, 2023; originally announced December 2023.

Comments: To be published in the AAAI 2024 Proceedings Main Track

arXiv:2312.11388 [pdf, other]

BioSpark: An End-to-End Generative System for Biological-Analogical Inspirations and Ideation

Authors: Hyeonsu B. Kang, David Chuan-En Lin, Nikolas Martelaro, Aniket Kittur, Yan-Ying Chen, Matthew K. Hong

Abstract: Nature is often used to inspire solutions for complex engineering problems, but achieving its full potential is challenging due to difficulties in discovering relevant analogies and synthesizing from them. Here, we present an end-to-end system, BioSpark, that generates biological-analogical mechanisms and provides an interactive interface to comprehend and synthesize from them. The BioSpark pipeline starts with a small seed set of mechanisms and expands it using iteratively constructed taxonomic hierarchies, overcoming data sparsity in manual expert curation and limited conceptual diversity in automated analogy generation via LLMs. The interface helps designers with recognizing and understanding relevant analogs to design problems using four main interaction features. We evaluate the biological-analogical mechanism generation pipeline and showcase the value of BioSpark through case studies. We end with discussion and implications for future work in this area.

Submitted 18 December, 2023; originally announced December 2023.

Comments: NeurIPS 2023 Workshop on Machine Learning for Creativity and Design

arXiv:2310.02161 [pdf, other]

doi 10.1145/3613904.3642149

Selenite: Scaffolding Online Sensemaking with Comprehensive Overviews Elicited from Large Language Models

Authors: Michael Xieyang Liu, Tongshuang Wu, Tianying Chen, Franklin Mingzhe Li, Aniket Kittur, Brad A. Myers

Abstract: Sensemaking in unfamiliar domains can be challenging, demanding considerable user effort to compare different options with respect to various criteria. Prior research and our formative study found that people would benefit from reading an overview of an information space upfront, including the criteria others previously found useful. However, existing sensemaking tools struggle with the "cold-start" problem -- it not only requires significant input from previous users to generate and share these overviews, but such overviews may also turn out to be biased and incomplete. In this work, we introduce a novel system, Selenite, which leverages Large Language Models (LLMs) as reasoning machines and knowledge retrievers to automatically produce a comprehensive overview of options and criteria to jumpstart users' sensemaking processes. Subsequently, Selenite also adapts as people use it, helping users find, read, and navigate unfamiliar information in a systematic yet personalized manner. Through three studies, we found that Selenite produced accurate and high-quality overviews reliably, significantly accelerated users' information processing, and effectively improved their overall comprehension and sensemaking experience.

Submitted 28 January, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: Accepted to CHI 2024

arXiv:2308.07517 [pdf, other]

doi 10.1145/3586183.3606759

Synergi: A Mixed-Initiative System for Scholarly Synthesis and Sensemaking

Authors: Hyeonsu B. Kang, Sherry Tongshuang Wu, Joseph Chee Chang, Aniket Kittur

Abstract: Efficiently reviewing scholarly literature and synthesizing prior art are crucial for scientific progress. Yet, the growing scale of publications and the burden of knowledge make synthesis of research threads more challenging than ever. While significant research has been devoted to helping scholars interact with individual papers, building research threads scattered across multiple papers remains a challenge. Most top-down synthesis (and LLMs) make it difficult to personalize and iterate on the output, while bottom-up synthesis is costly in time and effort. Here, we explore a new design space of mixed-initiative workflows. In doing so we develop a novel computational pipeline, Synergi, that ties together user input of relevant seed threads with citation graphs and LLMs, to expand and structure them, respectively. Synergi allows scholars to start with an entire threads-and-subthreads structure generated from papers relevant to their interests, and to iterate and customize on it as they wish. In our evaluation, we find that Synergi helps scholars efficiently make sense of relevant threads, broaden their perspectives, and increases their curiosity. We discuss future design implications for thread-based, mixed-initiative scholarly synthesis support tools.

Submitted 14 August, 2023; originally announced August 2023.

Comments: ACM UIST'23

arXiv:2303.14334 [pdf, other]

The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces

Authors: Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Kinney, et al. (30 additional authors not shown)

Abstract: Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has changed little in decades. The PDF format for sharing research papers is widely used due to its portability, but it has significant downsides including: static content, poor accessibility for low-vision readers, and difficulty reading on mobile devices. This paper explores the question "Can recent advances in AI and HCI power intelligent, interactive, and accessible reading interfaces -- even for legacy PDFs?" We describe the Semantic Reader Project, a collaborative effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers. Through this project, we've developed ten research prototype interfaces and conducted usability studies with more than 300 participants and real-world users showing improved reading experiences for scholars. We've also released a production reading interface for research papers that will incorporate the best features as they mature. We structure this paper around challenges scholars and the public face when reading research papers -- Discovery, Efficiency, Comprehension, Synthesis, and Accessibility -- and present an overview of our progress and remaining open challenges.

Submitted 23 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

arXiv:2208.14861 [pdf, other]

doi 10.1145/3526113.3545693

Fuse: In-Situ Sensemaking Support in the Browser

Authors: Andrew Kuznetsov, Joseph Chee Chang, Nathan Hahn, Napol Rachatasumrit, Bradley Breneisen, Julina Coupland, Aniket Kittur

Abstract: People spend a significant amount of time trying to make sense of the internet, collecting content from a variety of sources and organizing it to make decisions and achieve their goals. While humans are able to fluidly iterate on collecting and organizing information in their minds, existing tools and approaches introduce significant friction into the process. We introduce Fuse, a browser extension that externalizes users' working memory by combining low-cost collection with lightweight organization of content in a compact card-based sidebar that is always available. Fuse helps users simultaneously extract key web content and structure it in a lightweight and visual way. We discuss how these affordances help users externalize more of their mental model into the system (e.g., saving, annotating, and structuring items) and support fast reviewing and resumption of task contexts. Our 22-month public deployment and follow-up interviews provide longitudinal insights into the structuring behaviors of real-world users conducting information foraging tasks.

Submitted 31 August, 2022; originally announced August 2022.

arXiv:2208.03455 [pdf, other]

doi 10.1145/3526113.3545660

Threddy: An Interactive System for Personalized Thread-based Exploration and Organization of Scientific Literature

Authors: Hyeonsu B. Kang, Joseph Chee Chang, Yongsung Kim, Aniket Kittur

Abstract: Reviewing the literature to understand relevant threads of past work is a critical part of research and vehicle for learning. However, as the scientific literature grows the challenges for users to find and make sense of the many different threads of research grow as well. Previous work has helped scholars to find and group papers with citation information or textual similarity using standalone tools or overview visualizations. Instead, in this work we explore a tool integrated into users' reading process that helps them with leveraging authors' existing summarization of threads, typically in introduction or related work sections, in order to situate their own work's contributions. To explore this we developed a prototype that supports efficient extraction and organization of threads along with supporting evidence as scientists read research articles. The system then recommends further relevant articles based on user-created threads. We evaluate the system in a lab study and find that it helps scientists to follow and curate research threads without breaking out of their flow of reading, collect relevant papers and clips, and discover interesting new articles to further grow threads.

Submitted 16 August, 2022; v1 submitted 6 August, 2022; originally announced August 2022.

Comments: To appear at ACM UIST'22

arXiv:2208.00496 [pdf, other]

doi 10.1145/3526113.3545661

Wigglite: Low-cost Information Collection and Triage

Authors: Michael Xieyang Liu, Andrew Kuznetsov, Yongsung Kim, Joseph Chee Chang, Aniket Kittur, Brad A. Myers

Abstract: Consumers conducting comparison shopping, researchers making sense of competitive space, and developers looking for code snippets online all face the challenge of capturing the information they find for later use without interrupting their current flow. In addition, during many learning and exploration tasks, people need to externalize their mental context, such as estimating how urgent a topic is to follow up on, or rating a piece of evidence as a "pro" or "con," which helps scaffold subsequent deeper exploration. However, current approaches incur a high cost, often requiring users to select, copy, context switch, paste, and annotate information in a separate document without offering specific affordances that capture their mental context. In this work, we explore a new interaction technique called "wiggling," which can be used to fluidly collect, organize, and rate information during early sensemaking stages with a single gesture. Wiggling involves rapid back-and-forth movements of a pointer or up-and-down scrolling on a smartphone, which can indicate the information to be collected and its valence, using a single, light-weight gesture that does not interfere with other interactions that are already available. Through implementation and user evaluation, we found that wiggling helped participants accurately collect information and encode their mental context with a 58% reduction in operational cost while being 24% faster compared to a common baseline.

Submitted 31 July, 2022; originally announced August 2022.

arXiv:2206.01328 [pdf, other]

Augmenting Scientific Creativity with Retrieval across Knowledge Domains

Authors: Hyeonsu B. Kang, Sheshera Mysore, Kevin Huang, Haw-Shiuan Chang, Thorben Prein, Andrew McCallum, Aniket Kittur, Elsa Olivetti

Abstract: Exposure to ideas in domains outside a scientist's own may benefit her in reformulating existing research problems in novel ways and discovering new application domains for existing solution ideas. While improved performance in scholarly search engines can help scientists efficiently identify relevant advances in domains they may already be familiar with, it may fall short of helping them explore diverse ideas outside such domains. In this paper we explore the design of systems aimed at augmenting the end-user ability in cross-domain exploration with flexible query specification. To this end, we develop an exploratory search system in which end-users can select a portion of text core to their interest from a paper abstract and retrieve papers that have a high similarity to the user-selected core aspect but differ in terms of domains. Furthermore, end-users can "zoom in" to specific domain clusters to retrieve more papers from them and understand nuanced differences within the clusters. Our case studies with scientists uncover opportunities and design implications for systems aimed at facilitating cross-domain exploration and inspiration.

Submitted 14 December, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

Comments: NLP+HCI Workshop at NAACL 2022

arXiv:2205.15476 [pdf, other]

Augmenting Scientific Creativity with an Analogical Search Engine

Authors: Hyeonsu B. Kang, Xin Qian, Tom Hope, Dafna Shahaf, Joel Chan, Aniket Kittur

Abstract: Analogies have been central to creative problem-solving throughout the history of science and technology. As the number of scientific papers continues to increase exponentially, there is a growing opportunity for finding diverse solutions to existing problems. However, realizing this potential requires the development of a means for searching through a large corpus that goes beyond surface matches and simple keywords. Here we contribute the first end-to-end system for analogical search on scientific papers and evaluate its effectiveness with scientists' own problems. Using a human-in-the-loop AI system as a probe we find that our system facilitates creative ideation, and that ideation success is mediated by an intermediate level of matching on the problem abstraction (i.e., high versus low). We also demonstrate a fully automated AI search engine that achieves a similar accuracy with the human-in-the-loop system. We conclude with design implications for enabling automated analogical inspiration engines to accelerate scientific innovation.

Submitted 30 May, 2022; originally announced May 2022.

arXiv:2204.10254 [pdf, other]

doi 10.1145/3491102.3517470

From Who You Know to What You Read: Augmenting Scientific Recommendations with Implicit Social Networks

Authors: Hyeonsu B. Kang, Rafal Kocielnik, Andrew Head, Jiangjiang Yang, Matt Latzke, Aniket Kittur, Daniel S. Weld, Doug Downey, Jonathan Bragg

Abstract: The ever-increasing pace of scientific publication necessitates methods for quickly identifying relevant papers. While neural recommenders trained on user interests can help, they still result in long, monotonous lists of suggested papers. To improve the discovery experience we introduce multiple new methods for augmenting recommendations with textual relevance messages that highlight knowledge-graph connections between recommended papers and a user's publication and interaction history. We explore associations mediated by author entities and those using citations alone. In a large-scale, real-world study, we show how our approach significantly increases engagement -- and future engagement when mediated by authors -- without introducing bias towards highly-cited authors. To expand message coverage for users with less publication or interaction history, we develop a novel method that highlights connections with proxy authors of interest to users and evaluate it in a controlled lab study. Finally, we synthesize design implications for future graph-based messages.

Submitted 21 April, 2022; originally announced April 2022.

Comments: to be published in ACM SIGCHI 2022

arXiv:2202.02175 [pdf, other]

doi 10.1145/3491102.3501968

Crystalline: Lowering the Cost for Developers to Collect and Organize Information for Decision Making

Authors: Michael Xieyang Liu, Aniket Kittur, Brad A. Myers

Abstract: Developers perform online sensemaking on a daily basis, such as researching and choosing libraries and APIs. Prior research has introduced tools that help developers capture information from various sources and organize it into structures useful for subsequent decision-making. However, it remains a laborious process for developers to manually identify and clip content, maintaining its provenance and synthesizing it with other content. In this work, we introduce a new system called Crystalline that attempts to automatically collect and organize information into tabular structures as the user searches and browses the web. It leverages natural language processing to automatically group similar criteria together to reduce clutter as well as passive behavioral signals such as mouse movement and dwell time to infer what information to collect and how to visualize and prioritize it. Our user study suggests that developers are able to create comparison tables about 20% faster with a 60% reduction in operational cost without sacrificing the quality of the tables.

Submitted 4 February, 2022; originally announced February 2022.

Journal ref: CHI Conference on Human Factors in Computing Systems (CHI 2022)

arXiv:2111.07250 [pdf]

Metrics and Mechanisms: Measuring the Unmeasurable in the Science of Science

Authors: Lingfei Wu, Aniket Kittur, Hyejin Youn, Staša Milojević, Erin Leahey, Stephen M. Fiore, Yong Yeol Ahn

Abstract: What science does, what science could do, and how to make science work? If we want to know the answers to these questions, we need to be able to uncover the mechanisms of science, going beyond metrics that are easily collectible and quantifiable. In this perspective piece, we link metrics to mechanisms by demonstrating how emerging metrics of science not only offer complementaries to existing ones, but also shed light on the hidden structure and mechanisms of science. Based on fundamental properties of science, we classify existing theories and findings into: hot and cold science referring to attention shift between scientific fields, fast and slow science reflecting productivity of scientists and teams, soft and hard science revealing reproducibility of scientific research. We suggest that interest about mechanisms of science since Derek J. de Solla Price, Robert K. Merton, Eugene Garfield, and many others complement the zeitgeist in pursuing new, complex metrics without understanding the underlying processes. We propose that understanding and modeling the mechanisms of science condition effective development and application of metrics.

Submitted 9 April, 2022; v1 submitted 14 November, 2021; originally announced November 2021.

Comments: 20 pages, 1 figure

arXiv:2102.09761 [pdf, other]

Scaling Creative Inspiration with Fine-Grained Functional Aspects of Ideas

Authors: Tom Hope, Ronen Tamari, Hyeonsu Kang, Daniel Hershcovich, Joel Chan, Aniket Kittur, Dafna Shahaf

Abstract: Large repositories of products, patents and scientific papers offer an opportunity for building systems that scour millions of ideas and help users discover inspirations. However, idea descriptions are typically in the form of unstructured text, lacking key structure that is required for supporting creative innovation interactions. Prior work has explored idea representations that were either limited in expressivity, required significant manual effort from users, or dependent on curated knowledge bases with poor coverage. We explore a novel representation that automatically breaks up products into fine-grained functional aspects capturing the purposes and mechanisms of ideas, and use it to support important creative innovation interactions: functional search for ideas, and exploration of the design space around a focal problem by viewing related problem perspectives pooled from across many products. In user studies, our approach boosts the quality of creative search and inspirations, substantially outperforming strong baselines by 50-60%.

Submitted 17 February, 2022; v1 submitted 19 February, 2021; originally announced February 2021.

Comments: To appear in CHI 2022

Journal ref: CHI 2022

arXiv:2102.06231 [pdf, other]

doi 10.1145/3449240

To Reuse or Not To Reuse? A Framework and System for Evaluating Summarized Knowledge

Authors: Michael Xieyang Liu, Aniket Kittur, Brad A. Myers

Abstract: As the amount of information online continues to grow, a correspondingly important opportunity is for individuals to reuse knowledge which has been summarized by others rather than starting from scratch. However, appropriate reuse requires judging the relevance, trustworthiness, and thoroughness of others' knowledge in relation to an individual's goals and context. In this work, we explore augmenting judgements of the appropriateness of reusing knowledge in the domain of programming, specifically of reusing artifacts that result from other developers' searching and decision making. Through an analysis of prior research on sensemaking and trust, along with new interviews with developers, we synthesized a framework for reuse judgements. The interviews also validated that developers express a desire for help with judging whether to reuse an existing decision. From this framework, we developed a set of techniques for capturing the initial decision maker's behavior and visualizing signals calculated based on the behavior, to facilitate subsequent consumers' reuse decisions, instantiated in a prototype system called Strata. Results of a user study suggest that the system significantly improves the accuracy, depth, and speed of reusing decisions. These results have implications for systems involving user-generated content in which other users need to evaluate the relevance and trustworthiness of that content.

Submitted 18 February, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

Journal ref: Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 166 (April 2021), 35 pages

arXiv:1712.06880 [pdf, other]

Analogy Mining for Specific Design Needs

Authors: Karni Gilon, Felicia Y Ng, Joel Chan, Hila Lifshitz Assaf, Aniket Kittur, Dafna Shahaf

Abstract: Finding analogical inspirations in distant domains is a powerful way of solving problems. However, as the number of inspirations that could be matched and the dimensions on which that matching could occur grow, it becomes challenging for designers to find inspirations relevant to their needs. Furthermore, designers are often interested in exploring specific aspects of a product -- for example, one designer might be interested in improving the brewing capability of an outdoor coffee maker, while another might wish to optimize for portability. In this paper we introduce a novel system for targeting analogical search for specific needs. Specifically, we contribute a novel analogical search engine for expressing and abstracting specific design needs that returns more distant yet relevant inspirations than alternate approaches.

Submitted 19 December, 2017; originally announced December 2017.

arXiv:1706.05585 [pdf, other]

Accelerating Innovation Through Analogy Mining

Authors: Tom Hope, Joel Chan, Aniket Kittur, Dafna Shahaf

Abstract: The availability of large idea repositories (e.g., the U.S. patent database) could significantly accelerate innovation and discovery by providing people with inspiration from solutions to analogous problems. However, finding useful analogies in these large, messy, real-world repositories remains a persistent challenge for either human or automated methods. Previous approaches include costly hand-created databases that have high relational structure (e.g., predicate calculus representations) but are very sparse. Simpler machine-learning/information-retrieval similarity metrics can scale to large, natural-language datasets, but struggle to account for structural similarity, which is central to analogy. In this paper we explore the viability and value of learning simpler structural representations, specifically, "problem schemas", which specify the purpose of a product and the mechanisms by which it achieves that purpose. Our approach combines crowdsourcing and recurrent neural networks to extract purpose and mechanism vector representations from product descriptions. We demonstrate that these learned vectors allow us to find analogies with higher precision and recall than traditional information-retrieval methods. In an ideation experiment, analogies retrieved by our models significantly increased people's likelihood of generating creative ideas compared to analogies retrieved by traditional methods. Our results suggest a promising approach to enabling computational analogy at scale is to learn and leverage weaker structural representations.

Submitted 17 June, 2017; originally announced June 2017.

Comments: KDD 2017

arXiv:1110.6200 [pdf, other]

TopicViz: Semantic Navigation of Document Collections

Authors: Jacob Eisenstein, Duen Horng "Polo" Chau, Aniket Kittur, Eric P. Xing

Abstract: When people explore and manage information, they think in terms of topics and themes. However, the software that supports information exploration sees text at only the surface level. In this paper we show how topic modeling -- a technique for identifying latent themes across large collections of documents -- can support semantic exploration. We present TopicViz, an interactive environment for information exploration. TopicViz combines traditional search and citation-graph functionality with a range of novel interactive visualizations, centered around a force-directed layout that links documents to the latent themes discovered by the topic model. We describe several use scenarios in which TopicViz supports rapid sensemaking on large document collections.

Submitted 3 November, 2011; v1 submitted 27 October, 2011; originally announced October 2011.

Showing 1–18 of 18 results for author: Kittur, A