-
Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements
Authors:
Peter A. Zachares,
Vahan Hovhannisyan,
Alan Mosca,
Yarin Gal
Abstract:
This work focuses on the novel problem setting of generating graphs conditioned on a description of the graph's functional requirements in a downstream task. We pose the problem as a text-to-text generation problem and focus on the approach of fine-tuning a pretrained large language model (LLM) to generate graphs. We propose an inductive bias which incorporates information about the structure of the graph into the LLM's generation process by incorporating message passing layers into an LLM's architecture. To evaluate our proposed method, we design a novel set of experiments using publicly available and widely studied molecule and knowledge graph data sets. Results suggest our proposed approach generates graphs which more closely meet the requested functional requirements, outperforming baselines developed on similar tasks by a statistically significant margin.
Submitted 1 November, 2023;
originally announced November 2023.
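The abstract does not say how the message passing layers are parameterised or where they sit inside the LLM. Purely to illustrate what one message passing step does, the sketch below averages each node's feature vector with its neighbours' vectors on an undirected graph. It uses no learned weights. It is a generic mean-aggregation step, not the authors' layer, and `message_passing_step` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message_passing_step(
    features: Dict[int, Vector], edges: List[Tuple[int, int]]
) -> Dict[int, Vector]:
    """One round of mean-aggregation message passing on an undirected graph.

    Each node's new vector is the element-wise mean of its own vector and the
    vectors of its neighbours (no learned weights, no nonlinearity).
    """
    neighbours: Dict[int, List[int]] = {node: [] for node in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    updated: Dict[int, Vector] = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbours[node]]
        # zip(*group) walks the vectors one dimension at a time
        updated[node] = [sum(dim) / len(group) for dim in zip(*group)]
    return updated


# A 3-node path graph 0 - 1 - 2 with 2-dimensional node features.
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 0.0]}
result = message_passing_step(feats, [(0, 1), (1, 2)])
```

Stacking several such steps lets information travel more than one hop. A learned layer would typically add a weight matrix and a nonlinearity after the aggregation.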
-
A Hybrid Model for Forecasting Short-Term Electricity Demand
Authors:
Maria Eleni Athanasopoulou,
Justina Deveikyte,
Alan Mosca,
Ilaria Peri,
Alessandro Provetti
Abstract:
Currently the UK electricity market is guided by load (demand) forecasts published every thirty minutes by the regulator. A key factor in predicting demand is weather conditions, for which forecasts are published every hour. We present HYENA: a hybrid predictive model that combines feature engineering (selection of the candidate predictor features), mobile-window predictors and, finally, LSTM encoder-decoders to achieve higher accuracy than mainstream models from the literature. HYENA decreased MAPE loss by 16% and RMSE loss by 10% over the best available benchmark model, thus establishing a new state of the art for UK electric load (and price) forecasting.
Submitted 20 May, 2022;
originally announced May 2022.
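HYENA's headline results are reported as reductions in MAPE and RMSE. As a reminder of what those metrics measure, here is a minimal Python sketch of both, plus the relative-reduction arithmetic. Reading "decreased MAPE loss by 16%" as a relative reduction rather than a percentage-point drop is an assumption. The function names are illustrative and not taken from the paper.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error, in the units of the series (e.g. MW of load)."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional decrease of an error metric relative to a baseline model."""
    return (baseline - new) / baseline
```

For example, a benchmark RMSE of 10.0 that falls to 9.0 is a relative reduction of 0.10, i.e. 10%.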
-
Inferential Tasks as an Evaluation Technique for Visualization
Authors:
Ashley Suh,
Ab Mosca,
Shannon Robinson,
Quinn Pham,
Dylan Cashman,
Alvitta Ottley,
Remco Chang
Abstract:
Designing suitable tasks for visualization evaluation remains challenging. Traditional evaluation techniques commonly rely on 'low-level' or 'open-ended' tasks to assess the efficacy of a proposed visualization; however, nontrivial trade-offs exist between the two. Low-level tasks allow for robust quantitative evaluations, but are not indicative of the complex usage of a visualization. Open-ended tasks, while excellent for insight-based evaluations, are typically unstructured and require time-consuming interviews. Bridging this gap, we propose inferential tasks: a complementary task category based on inferential learning in psychology. Inferential tasks produce quantitative evaluation data in which users are prompted to form and validate their own findings with a visualization. We demonstrate the use of inferential tasks through a validation experiment on two well-known visualization tools.
Submitted 11 May, 2022;
originally announced May 2022.
-
A Grammar of Hypotheses for Visualization, Data, and Analysis
Authors:
Ashley Suh,
Ab Mosca,
Eugene Wu,
Remco Chang
Abstract:
We present a grammar for expressing hypotheses in visual data analysis to formalize the previously abstract notion of "analysis tasks." Through the lens of our grammar, we lay the groundwork for how a user's data analysis questions can be operationalized and automated as a set of hypotheses (a hypothesis space). We demonstrate that our grammar-based approach for analysis tasks can provide a systematic method towards unifying three disparate spaces in visualization research: the hypotheses a dataset can express (a data hypothesis space), the hypotheses a user would like to refine or verify through analysis (an analysis hypothesis space), and the hypotheses a visualization design is capable of supporting (a visualization hypothesis space). We illustrate how the formalization of these three spaces can inform future research in visualization evaluation, knowledge elicitation, analytic provenance, and visualization recommendation by using a shared language for hypotheses. Finally, we compare our proposed grammar-based approach with existing visual analysis models and discuss the potential of a new hypothesis-driven theory of visual analytics.
Submitted 3 April, 2023; v1 submitted 29 April, 2022;
originally announced April 2022.
-
INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision]
Authors:
Sihem Amer-Yahia,
Georgia Koutrika,
Frederic Bastian,
Theofilos Belmpas,
Martin Braschler,
Ursin Brunner,
Diego Calvanese,
Maximilian Fabricius,
Orest Gkini,
Catherine Kosten,
Davide Lanti,
Antonis Litke,
Hendrik Lücke-Tieke,
Francesco Alessandro Massucci,
Tarcisio Mendes de Farias,
Alessandro Mosca,
Francesco Multari,
Nikolaos Papadakis,
Dimitris Papadopoulos,
Yogendra Patil,
Aurélien Personnaz,
Guillem Rull,
Ana Sima,
Ellery Smith,
Dimitrios Skoutas,
et al. (3 additional authors not shown)
Abstract:
A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE -- an end-to-end data exploration system -- that leverages, on the one hand, Machine Learning and, on the other hand, semantics for the purpose of Data Management (DM). Our vision is to develop a classic unified, comprehensive platform that provides extensive access to open datasets, and we demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics. INODE offers sustainable services in (a) data modeling and linking, (b) integrated query processing using natural language, (c) guidance, and (d) data exploration through visualization, thus facilitating the user in discovering new insights. We demonstrate that our system is uniquely accessible to a wide range of users from larger scientific communities to the public. Finally, we briefly illustrate how this work paves the way for new research opportunities in DM.
Submitted 9 April, 2021;
originally announced April 2021.
-
Does Interaction Improve Bayesian Reasoning with Visualization?
Authors:
Ab Mosca,
Alvitta Ottley,
Remco Chang
Abstract:
Interaction enables users to navigate large amounts of data effectively, supports cognitive processing, and expands the available methods of data representation. However, there have been few attempts to empirically demonstrate whether adding interaction to a static visualization improves its function beyond popular beliefs. In this paper, we address this gap. We use a classic Bayesian reasoning task as a testbed for evaluating whether allowing users to interact with a static visualization can improve their reasoning. Through two crowdsourced studies, we show that adding interaction to a static Bayesian reasoning visualization does not improve participants' accuracy on a Bayesian reasoning task. In some cases, it can significantly detract from it. Moreover, we demonstrate that underlying visualization design modulates performance and that people with high versus low spatial ability respond differently to different interaction techniques and underlying base visualizations. Our work suggests that interaction is not as unambiguously good as we often believe; a well-designed static visualization can be as effective as, if not more effective than, an interactive one.
Submitted 5 March, 2021; v1 submitted 2 March, 2021;
originally announced March 2021.
-
Mapping Patterns for Virtual Knowledge Graphs
Authors:
Diego Calvanese,
Avigdor Gal,
Davide Lanti,
Marco Montali,
Alessandro Mosca,
Roee Shraga
Abstract:
Virtual Knowledge Graphs (VKG) constitute one of the most promising paradigms for integrating and accessing legacy data sources. A critical bottleneck in the integration process involves the definition, validation, and maintenance of mappings that link data sources to a domain ontology. To support the management of mappings throughout their entire lifecycle, we propose a comprehensive catalog of sophisticated mapping patterns that emerge when linking databases to ontologies. To do so, we build on well-established methodologies and patterns studied in data management, data analysis, and conceptual modeling. These are extended and refined through the analysis of concrete VKG benchmarks and real-world use cases, and by considering the inherent impedance mismatch between data sources and ontologies. We validate our catalog on the considered VKG scenarios, showing that it covers the vast majority of patterns present therein.
Submitted 11 August, 2023; v1 submitted 3 December, 2020;
originally announced December 2020.
-
RNNbow: Visualizing Learning via Backpropagation Gradients in Recurrent Neural Networks
Authors:
Dylan Cashman,
Genevieve Patterson,
Abigail Mosca,
Nathan Watts,
Shannon Robinson,
Remco Chang
Abstract:
We present RNNbow, an interactive tool for visualizing the gradient flow during backpropagation training in recurrent neural networks. RNNbow is a web application that displays the relative gradient contributions from Recurrent Neural Network (RNN) cells in a neighborhood of an element of a sequence. We describe the calculation of backpropagation through time (BPTT) that keeps track of itemized gradients, or gradient contributions from one element of a sequence to previous elements of a sequence. By visualizing the gradient, as opposed to activations, RNNbow offers insight into how the network is learning. We use it to explore the learning of an RNN that is trained to generate code in the C programming language. We show how it uncovers insights into the vanishing gradient as well as the evolution of training as the RNN works its way through a corpus.
Submitted 29 July, 2019;
originally announced July 2019.
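The abstract does not give RNNbow's exact bookkeeping. To make "itemized gradients" concrete, the sketch below uses a deliberately tiny model: a scalar linear RNN, h_t = w * h_{t-1} + u * x_t, with loss L = h_T. It splits dL/dw into one term per time step, and the terms sum to the full BPTT gradient. The model, the loss and the function name `itemized_bptt` are assumptions chosen for illustration.

```python
from typing import List, Sequence


def itemized_bptt(w: float, u: float, xs: Sequence[float], h0: float = 0.0) -> List[float]:
    """Per-time-step contributions to dL/dw for a scalar linear RNN.

    Model: h_t = w * h_{t-1} + u * x_t, with loss L = h_T (the final state).
    Entry k-1 of the result is the part of dL/dw that flows through step k;
    the entries sum to the full backpropagation-through-time gradient.
    """
    hs = [h0]
    for x in xs:                       # forward pass, keeping every hidden state
        hs.append(w * hs[-1] + u * x)

    T = len(xs)
    contributions = []
    for k in range(1, T + 1):
        dL_dhk = w ** (T - k)          # chain rule: each later step multiplies by w
        contributions.append(dL_dhk * hs[k - 1])  # direct dependence of h_k on w is h_{k-1}
    return contributions


terms = itemized_bptt(w=0.5, u=1.0, xs=[1.0, 1.0, 1.0])
```

Here h_3 = w^2 + w + 1, so the terms [0.0, 0.5, 1.5] sum to 2w + 1 = 2.0 at w = 0.5. With |w| < 1, the factor w^(T-k) shrinks geometrically for early steps, which is the vanishing-gradient effect the abstract refers to.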
-
At a Glance: Pixel Approximate Entropy as a Measure of Line Chart Complexity
Authors:
Gabriel Ryan,
Abigail Mosca,
Remco Chang,
Eugene Wu
Abstract:
When inspecting information visualizations under time-critical settings, such as emergency response or monitoring the heart rate in a surgery room, the user only has a small amount of time to view the visualization "at a glance". In these settings, it is important to provide a quantitative measure of the visualization to understand whether or not the visualization is too "complex" to accurately judge at a glance. This paper proposes Pixel Approximate Entropy (PAE), which adapts the approximate entropy statistical measure commonly used to quantify regularity and unpredictability in time-series data, as a measure of visual complexity for line charts. We show that PAE is correlated with user-perceived chart complexity, and that increased chart PAE correlates with reduced judgment accuracy. We also find that the correlation between PAE values and participants' judgment increases when the user has less time to examine the line charts.
Submitted 7 November, 2018;
originally announced November 2018.
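The abstract does not spell out PAE's pixel-level adaptation. As background, here is a sketch of the standard approximate entropy (ApEn) statistic it builds on, applied to a plain 1-D series. For a line chart, one would assume a series with one y-value per x position. Here the tolerance `r` is an absolute value in the series' units; in practice it is often set to a fraction of the series' standard deviation. The values of m, r and the pixel representation that PAE actually uses are not reproduced here.

```python
import math
from typing import Sequence


def approximate_entropy(series: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """Approximate entropy (ApEn) of a 1-D series.

    m: template (embedding) length; r: tolerance for counting two templates as similar.
    Self-matches are counted, so every logarithm argument is positive.
    Lower values mean a more regular, predictable series.
    """
    n = len(series)

    def phi(k: int) -> float:
        templates = [series[i:i + k] for i in range(n - k + 1)]
        total = 0.0
        for a in templates:
            # Chebyshev distance: the largest element-wise gap between two templates
            matches = sum(
                1 for b in templates if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)
```

A constant series has ApEn of exactly 0, and a strictly alternating series such as 0, 1, 0, 1, ... scores close to 0. Irregular series score higher.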
-
A User-based Visual Analytics Workflow for Exploratory Model Analysis
Authors:
Dylan Cashman,
Shah Rukh Humayoun,
Florian Heimerl,
Kendall Park,
Subhajit Das,
John Thompson,
Bahador Saket,
Abigail Mosca,
John Stasko,
Alex Endert,
Michael Gleicher,
Remco Chang
Abstract:
Many visual analytics systems allow users to interact with machine learning models towards the goals of data exploration and insight generation on a given dataset. However, in some situations, insights may be less important than the production of an accurate predictive model for future use. In that case, users are more interested in generating diverse and robust predictive models, verifying their performance on holdout data, and selecting the most suitable model for their usage scenario. In this paper, we consider the concept of Exploratory Model Analysis (EMA), which is defined as the process of discovering and selecting relevant models that can be used to make predictions on a data source. We delineate the differences between EMA and the well-known term exploratory data analysis in terms of the desired outcome of the analytic process: insights into the data or a set of deployable models. The contributions of this work are a visual analytics system workflow for EMA, a user study, and two use cases validating the effectiveness of the workflow. We found that our system workflow enabled users to generate complex models, to assess them for various qualities, and to select the most relevant model for their task.
Submitted 29 July, 2019; v1 submitted 27 September, 2018;
originally announced September 2018.
-
Mapping knowledge with ontologies: the case of obesity
Authors:
Montserrat Estañol,
Francesco Masucci,
Alessandro Mosca,
Ismael Ràfols
Abstract:
Scientometric techniques have been remarkably successful at mapping science, but they face important difficulties when mapping research for societal problems, possibly because they are derived only from scientific documents and thus do not rely on non-academic expert knowledge. Here we aim to explore how ontologies can be used in science mapping, thus enriching current algorithmic techniques with systematic domain expert knowledge. This study introduces the methodology behind the construction of an ontology and tests potential uses in science mapping. We use obesity as a case study.
Submitted 4 December, 2017;
originally announced December 2017.
-
Deep Incremental Boosting
Authors:
Alan Mosca,
George D Magoulas
Abstract:
This paper introduces Deep Incremental Boosting, a new technique derived from AdaBoost and specifically adapted to work with Deep Learning methods, that reduces the required training time and improves generalisation. We draw inspiration from Transfer of Learning approaches to reduce the start-up time for training each incremental Ensemble member. We present a set of experiments that outlines preliminary results on some common Deep Learning datasets and discuss the potential improvements Deep Incremental Boosting brings to traditional Ensemble methods in Deep Learning.
Submitted 11 August, 2017;
originally announced August 2017.
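Deep Incremental Boosting is described as derived from AdaBoost, but the abstract does not give its update rule. For reference, the sketch below shows one round of classic binary AdaBoost reweighting, the mechanism that decides which samples the next ensemble member should concentrate on. Multi-class variants and the transfer-of-learning initialisation of each new member are not shown. `adaboost_reweight` is a hypothetical name.

```python
import math
from typing import List, Tuple


def adaboost_reweight(weights: List[float], correct: List[bool]) -> Tuple[float, List[float]]:
    """One round of discrete (binary) AdaBoost reweighting.

    weights: current sample weights, summing to 1.
    correct: whether the newly trained ensemble member got each sample right.
    Returns (alpha, new_weights): the member's vote weight and the renormalised
    sample weights. Assumes the weighted error lies strictly between 0 and 1.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Down-weight samples the member got right, up-weight the ones it missed.
    unnormalised = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    z = sum(unnormalised)
    return alpha, [w / z for w in unnormalised]


alpha, new_weights = adaboost_reweight([0.25] * 4, [True, True, True, False])
```

After reweighting, the misclassified samples together carry exactly half of the total weight (here the single missed sample goes from 0.25 to 0.5). This is a standard property of this update, and it forces the next member to attend to past mistakes.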
-
Adapting Resilient Propagation for Deep Learning
Authors:
Alan Mosca,
George D. Magoulas
Abstract:
The Resilient Propagation (Rprop) algorithm has been very popular for backpropagation training of multilayer feed-forward neural networks in various applications. However, standard Rprop encounters difficulties in the context of deep neural networks, as typically happens with gradient-based learning algorithms. In this paper, we propose a modification of Rprop that combines standard Rprop steps with a special dropout technique. We apply the method to training Deep Neural Networks both as standalone components and in ensemble formulations. Results on the MNIST dataset show that the proposed modification alleviates standard Rprop's problems, demonstrating improved learning speed and accuracy.
Submitted 16 September, 2015; v1 submitted 15 September, 2015;
originally announced September 2015.
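The abstract does not describe the special dropout technique, so this sketch covers only the baseline that the paper modifies: a standard Rprop step. It implements the iRprop- variant of Igel and Hüsken, one of several published Rprop variants. The function name is hypothetical. The default hyperparameters (increase factor 1.2, decrease factor 0.5, step bounds 1e-6 and 50) are commonly used values, not values taken from the paper.

```python
from typing import List, Tuple


def irprop_minus_step(
    params: List[float],
    grads: List[float],
    prev_grads: List[float],
    steps: List[float],
    eta_plus: float = 1.2,
    eta_minus: float = 0.5,
    step_min: float = 1e-6,
    step_max: float = 50.0,
) -> Tuple[List[float], List[float], List[float]]:
    """One iRprop- update, applied element-wise.

    Only the sign of each gradient is used; every parameter keeps its own step size.
    Returns (new_params, new_steps, grads_to_store_for_next_call).
    """
    new_params, new_steps, stored = [], [], []
    for p, g, g_prev, s in zip(params, grads, prev_grads, steps):
        if g * g_prev > 0:        # same sign as last step: grow the step
            s = min(s * eta_plus, step_max)
        elif g * g_prev < 0:      # sign flipped: we overshot a minimum, shrink the step
            s = max(s * eta_minus, step_min)
            g = 0.0               # iRprop-: skip this update and forget the gradient
        sign = (g > 0) - (g < 0)
        new_params.append(p - sign * s)
        new_steps.append(s)
        stored.append(g)
    return new_params, new_steps, stored


# Minimise f(x) = (x - 3)^2 from x = 0, with an initial step size of 0.1.
x, step, prev = [0.0], [0.1], [0.0]
for _ in range(300):
    grad = [2.0 * (x[0] - 3.0)]
    x, step, prev = irprop_minus_step(x, grad, prev, step)
```

Because only gradient signs are used, the update is insensitive to gradient magnitude. The usual explanation for why plain Rprop struggles with mini-batch training of deep networks is that noisy mini-batch gradients keep flipping sign.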
-
Semi-structured data extraction and modelling: the WIA Project
Authors:
Gianluca Colombo,
Ettore Colombo,
Andrea Bonomi,
Alessandro Mosca,
Simone Bassis
Abstract:
Over the last decades, the amount of data of all kinds available electronically has increased dramatically. Data are accessible through a range of interfaces, including Web browsers, database query languages, and application-specific interfaces, built on top of a number of different data exchange formats. These data range from unstructured to highly structured. Very often, some of them have structure even if that structure is implicit and not as rigid or regular as that found in standard database systems. Spreadsheet documents are prototypical in this respect. Spreadsheets are a lightweight technology able to supply companies with easy-to-build business management and business intelligence applications, and business people widely adopt spreadsheets as smart vehicles for generating and sharing data files. In fact, the more spreadsheets grow in complexity (e.g., their use in product development plans and quoting), the more their arrangement, maintenance, and analysis become a knowledge-driven activity. The WIA project (Worksheets Intelligent Analyser) takes an algorithmic approach to the problem of automatically extracting data structure from spreadsheet documents (i.e., grid-structured and free topologically related data). The WIA-algorithm shows how to describe spreadsheet contents at higher levels of abstraction, or conceptualisations. In particular, the WIA-algorithm targets the extraction of i) the calculation workflow implemented in the spreadsheet formulas and ii) the logical role played by the data that take part in the calculation. The aim of the resulting conceptualisations is to provide spreadsheets with abstract representations useful for further model refinement and optimization through evolutionary algorithm computations.
Submitted 29 September, 2013;
originally announced September 2013.