
Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements

Authors: Peter A. Zachares, Vahan Hovhannisyan, Alan Mosca, Yarin Gal

Abstract: This work focuses on the novel problem setting of generating graphs conditioned on a description of the graph's functional requirements in a downstream task. We pose the problem as a text-to-text generation problem and focus on the approach of fine-tuning a pretrained large language model (LLM) to generate graphs. We propose an inductive bias which incorporates information about the structure of the graph into the LLM's generation process by incorporating message passing layers into an LLM's architecture. To evaluate our proposed method, we design a novel set of experiments using publicly available and widely studied molecule and knowledge graph data sets. Results suggest our proposed approach generates graphs which more closely meet the requested functional requirements, outperforming baselines developed on similar tasks by a statistically significant margin.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2205.10449

doi 10.1145/3490354.3494371

A Hybrid Model for Forecasting Short-Term Electricity Demand

Authors: Maria Eleni Athanasopoulou, Justina Deveikyte, Alan Mosca, Ilaria Peri, Alessandro Provetti

Abstract: Currently the UK Electric market is guided by load (demand) forecasts published every thirty minutes by the regulator. A key factor in predicting demand is weather conditions, with forecasts published every hour. We present HYENA: a hybrid predictive model that combines feature engineering (selection of the candidate predictor features), mobile-window predictors and finally LSTM encoder-decoders to achieve higher accuracy with respect to mainstream models from the literature. HYENA decreased MAPE loss by 16% and RMSE loss by 10% over the best available benchmark model, thus establishing a new state of the art for the UK electric load (and price) forecasting.

Submitted 20 May, 2022; originally announced May 2022.

ACM Class: I.2.6
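
The HYENA abstract reports its gains as reductions in MAPE and RMSE. As a reading aid, here is a minimal sketch of the two standard metrics and of a relative reduction between a baseline and a model. It assumes the reported 16% and 10% are relative reductions (the abstract does not say), and the demand figures are made up; this is not the paper's evaluation code.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], forecast: Sequence[float]) -> None:
    # Both series must be aligned: one forecast per observed value.
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equally long")


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (undefined if any actual value is 0)."""
    _check(actual, forecast)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error, in the units of the series (e.g. MW)."""
    _check(actual, forecast)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def relative_reduction(baseline_error: float, model_error: float) -> float:
    """Fractional decrease of an error metric with respect to a baseline (0.16 = 16%)."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical half-hourly demand values and forecasts, for illustration only.
demand = [100.0, 200.0]
predicted = [110.0, 190.0]
print(mape(demand, predicted), rmse(demand, predicted))  # about 7.5 and 10.0
```

MAPE is scale-free, so it compares well across periods of high and low demand, while RMSE stays in the units of the load and penalises large misses more heavily; reporting both, as the abstract does, covers both views.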

arXiv:2205.05712

Inferential Tasks as an Evaluation Technique for Visualization

Authors: Ashley Suh, Ab Mosca, Shannon Robinson, Quinn Pham, Dylan Cashman, Alvitta Ottley, Remco Chang

Abstract: Designing suitable tasks for visualization evaluation remains challenging. Traditional evaluation techniques commonly rely on 'low-level' or 'open-ended' tasks to assess the efficacy of a proposed visualization; however, nontrivial trade-offs exist between the two. Low-level tasks allow for robust quantitative evaluations but are not indicative of the complex usage of a visualization. Open-ended tasks, while excellent for insight-based evaluations, are typically unstructured and require time-consuming interviews. Bridging this gap, we propose inferential tasks: a complementary task category based on inferential learning in psychology. Inferential tasks produce quantitative evaluation data in which users are prompted to form and validate their own findings with a visualization. We demonstrate the use of inferential tasks through a validation experiment on two well-known visualization tools.

Submitted 11 May, 2022; originally announced May 2022.

Comments: EuroVis Short Paper 2022

arXiv:2204.14267

A Grammar of Hypotheses for Visualization, Data, and Analysis

Authors: Ashley Suh, Ab Mosca, Eugene Wu, Remco Chang

Abstract: We present a grammar for expressing hypotheses in visual data analysis to formalize the previously abstract notion of "analysis tasks." Through the lens of our grammar, we lay the groundwork for how a user's data analysis questions can be operationalized and automated as a set of hypotheses (a hypothesis space). We demonstrate that our grammar-based approach for analysis tasks can provide a systematic method towards unifying three disparate spaces in visualization research: the hypotheses a dataset can express (a data hypothesis space), the hypotheses a user would like to refine or verify through analysis (an analysis hypothesis space), and the hypotheses a visualization design is capable of supporting (a visualization hypothesis space). We illustrate how the formalization of these three spaces can inform future research in visualization evaluation, knowledge elicitation, analytic provenance, and visualization recommendation by using a shared language for hypotheses. Finally, we compare our proposed grammar-based approach with existing visual analysis models and discuss the potential of a new hypothesis-driven theory of visual analytics.

Submitted 3 April, 2023; v1 submitted 29 April, 2022; originally announced April 2022.

arXiv:2104.04194

INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision]

Authors: Sihem Amer-Yahia, Georgia Koutrika, Frederic Bastian, Theofilos Belmpas, Martin Braschler, Ursin Brunner, Diego Calvanese, Maximilian Fabricius, Orest Gkini, Catherine Kosten, Davide Lanti, Antonis Litke, Hendrik Lücke-Tieke, Francesco Alessandro Massucci, Tarcisio Mendes de Farias, Alessandro Mosca, Francesco Multari, Nikolaos Papadakis, Dimitris Papadopoulos, Yogendra Patil, Aurélien Personnaz, Guillem Rull, Ana Sima, Ellery Smith, Dimitrios Skoutas, et al. (3 additional authors not shown)

Abstract: A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE -- an end-to-end data exploration system -- that leverages, on the one hand, Machine Learning and, on the other hand, semantics for the purpose of Data Management (DM). Our vision is to develop a classic unified, comprehensive platform that provides extensive access to open datasets, and we demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics. INODE offers sustainable services in (a) data modeling and linking, (b) integrated query processing using natural language, (c) guidance, and (d) data exploration through visualization, thus facilitating the user in discovering new insights. We demonstrate that our system is uniquely accessible to a wide range of users from larger scientific communities to the public. Finally, we briefly illustrate how this work paves the way for new research opportunities in DM.

Submitted 9 April, 2021; originally announced April 2021.

Comments: 8 pages, 5 figures

ACM Class: I.2; H.2

arXiv:2103.01701

doi 10.1145/3411764.3445176

Does Interaction Improve Bayesian Reasoning with Visualization?

Authors: Ab Mosca, Alvitta Ottley, Remco Chang

Abstract: Interaction enables users to navigate large amounts of data effectively, supports cognitive processing, and increases data representation methods. However, there have been few attempts to empirically demonstrate whether adding interaction to a static visualization improves its function beyond popular beliefs. In this paper, we address this gap. We use a classic Bayesian reasoning task as a testbed for evaluating whether allowing users to interact with a static visualization can improve their reasoning. Through two crowdsourced studies, we show that adding interaction to a static Bayesian reasoning visualization does not improve participants' accuracy on a Bayesian reasoning task. In some cases, it can significantly detract from it. Moreover, we demonstrate that underlying visualization design modulates performance and that people with high versus low spatial ability respond differently to different interaction techniques and underlying base visualizations. Our work suggests that interaction is not as unambiguously good as we often believe; a well-designed static visualization can be as, if not more, effective than an interactive one.

Submitted 5 March, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

Comments: 14 pages, 11 figures, To be published in 2021 ACM CHI Virtual Conference on Human Factors in Computing Systems
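
For readers unfamiliar with the "classic Bayesian reasoning task" mentioned in the abstract above: participants are given a base rate, a hit rate and a false-positive rate and asked for the probability of a condition given a positive result. A minimal sketch of the normative answer follows. The numbers are those of the widely cited mammography problem and are used for illustration only; they are not necessarily the stimuli used in this paper.

```python
def posterior(base_rate: float, hit_rate: float, false_positive_rate: float) -> float:
    """P(condition | positive result) by Bayes' theorem.

    base_rate           = P(condition)
    hit_rate            = P(positive | condition)
    false_positive_rate = P(positive | no condition)
    """
    true_positives = base_rate * hit_rate
    false_positives = (1.0 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Illustrative numbers from the mammography problem.
p = posterior(base_rate=0.01, hit_rate=0.80, false_positive_rate=0.096)
print(f"{p:.3f}")  # 0.078
```

The posterior (under 8%) is far below the 80% hit rate because false positives from the large healthy group outnumber true positives, which is what makes the task a demanding test of whether a visualization supports the reasoning.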

arXiv:2012.01917

Mapping Patterns for Virtual Knowledge Graphs

Authors: Diego Calvanese, Avigdor Gal, Davide Lanti, Marco Montali, Alessandro Mosca, Roee Shraga

Abstract: Virtual Knowledge Graphs (VKG) constitute one of the most promising paradigms for integrating and accessing legacy data sources. A critical bottleneck in the integration process involves the definition, validation, and maintenance of mappings that link data sources to a domain ontology. To support the management of mappings throughout their entire lifecycle, we propose a comprehensive catalog of sophisticated mapping patterns that emerge when linking databases to ontologies. To do so, we build on well-established methodologies and patterns studied in data management, data analysis, and conceptual modeling. These are extended and refined through the analysis of concrete VKG benchmarks and real-world use cases, and considering the inherent impedance mismatch between data sources and ontologies. We validate our catalog on the considered VKG scenarios, showing that it covers the vast majority of patterns present therein.

Submitted 11 August, 2023; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: 40 pages

arXiv:1907.12545

doi 10.1109/MCG.2018.2878902

RNNbow: Visualizing Learning via Backpropagation Gradients in Recurrent Neural Networks

Authors: Dylan Cashman, Genevieve Patterson, Abigail Mosca, Nathan Watts, Shannon Robinson, Remco Chang

Abstract: We present RNNbow, an interactive tool for visualizing the gradient flow during backpropagation training in recurrent neural networks. RNNbow is a web application that displays the relative gradient contributions from Recurrent Neural Network (RNN) cells in a neighborhood of an element of a sequence. We describe the calculation of backpropagation through time (BPTT) that keeps track of itemized gradients, or gradient contributions from one element of a sequence to previous elements of a sequence. By visualizing the gradient, as opposed to activations, RNNbow offers insight into how the network is learning. We use it to explore the learning of an RNN that is trained to generate code in the C programming language. We show how it uncovers insights into the vanishing gradient as well as the evolution of training as the RNN works its way through a corpus.

Submitted 29 July, 2019; originally announced July 2019.

Journal ref: IEEE Computer Graphics and Applications (Volume: 38, Issue: 6, Nov.-Dec. 1 2018), pp. 39-50
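
The RNNbow abstract describes BPTT that keeps "itemized gradients", i.e. the share of a gradient that each earlier sequence element contributes. The sketch below shows that bookkeeping on a deliberately tiny, hypothetical model: a scalar linear RNN h_k = w * h_{k-1} + x_k with a squared loss on the last state only. It is not RNNbow's implementation; it only illustrates how the total gradient dL/dw splits into one term per time step, and why those terms shrink geometrically for |w| < 1 (the vanishing gradient).

```python
from typing import List, Sequence


def forward(w: float, xs: Sequence[float], h0: float = 0.0) -> List[float]:
    """Hidden states [h_0, ..., h_T] of the toy linear RNN h_k = w * h_{k-1} + x_k."""
    hs = [h0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    return hs


def itemized_gradients(w: float, xs: Sequence[float], y: float, h0: float = 0.0) -> List[float]:
    """Split dL/dw, with L = 0.5 * (h_T - y)**2, into one term per time step.

    The term for step k (k = 1..T) is the gradient that flows back from h_T to
    step k and enters w there: dL/dh_T * dh_T/dh_k * h_{k-1}, where
    dh_T/dh_k = w**(T - k) for this linear recurrence.
    """
    hs = forward(w, xs, h0)
    t_last = len(xs)
    dloss_dh_last = hs[t_last] - y
    return [dloss_dh_last * w ** (t_last - k) * hs[k - 1] for k in range(1, t_last + 1)]


contributions = itemized_gradients(w=0.5, xs=[1.0] * 8, y=0.0, h0=1.0)
for k, c in enumerate(contributions, start=1):
    print(f"step {k}: {c:.4f}")  # earlier steps receive geometrically smaller shares
```

Summing the itemized terms recovers the ordinary BPTT gradient, so the breakdown adds information without changing the training signal.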

arXiv:1811.03180

At a Glance: Pixel Approximate Entropy as a Measure of Line Chart Complexity

Authors: Gabriel Ryan, Abigail Mosca, Remco Chang, Eugene Wu

Abstract: When inspecting information visualizations under time-critical settings, such as emergency response or monitoring the heart rate in a surgery room, the user only has a small amount of time to view the visualization "at a glance". In these settings, it is important to provide a quantitative measure of the visualization to understand whether or not the visualization is too "complex" to accurately judge at a glance. This paper proposes Pixel Approximate Entropy (PAE), which adapts the approximate entropy statistical measure commonly used to quantify regularity and unpredictability in time-series data, as a measure of visual complexity for line charts. We show that PAE is correlated with user-perceived chart complexity, and that increased chart PAE correlates with reduced judgment accuracy. We also find that the correlation between PAE values and participants' judgment increases when the user has less time to examine the line charts.

Submitted 7 November, 2018; originally announced November 2018.
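
PAE builds on approximate entropy (ApEn), the time-series regularity statistic named in the abstract. The sketch below computes the standard ApEn(m, r) of a 1-D numeric series: it compares all windows of length m and m + 1 and measures how often windows that match for m points stop matching at the next point. It does not reproduce PAE itself (the paper's adaptation to rendered chart pixels is not shown), and the tolerance r here is an absolute value in the series' units; in practice it is often set relative to the series' standard deviation.

```python
import math
import random
from typing import Sequence


def approximate_entropy(series: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """ApEn(m, r) of a 1-D series; r is an absolute tolerance in the series' units."""
    u = list(series)
    n = len(u)
    if n < m + 2:
        raise ValueError("series too short for the chosen embedding length m")

    def phi(k: int) -> float:
        # All overlapping windows (templates) of length k.
        templates = [u[i:i + k] for i in range(n - k + 1)]
        total = 0.0
        for a in templates:
            # Fraction of templates within Chebyshev distance r of `a`; the
            # self-match is counted, so the fraction is never zero.
            matches = sum(
                1 for b in templates if max(abs(p - q) for p, q in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    # In a regular series, windows that match for m points keep matching for m + 1.
    return phi(m) - phi(m + 1)


regular = [0.0, 1.0] * 50                     # strictly alternating
rng = random.Random(0)                        # fixed seed for reproducibility
noisy = [rng.random() for _ in range(100)]    # irregular, uniform in [0, 1)
print(approximate_entropy(regular), approximate_entropy(noisy))  # the noisy series scores higher
```

This direct version is O(N^2 m), which is fine for short series; a constant series scores exactly 0.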

arXiv:1809.10782

doi 10.1111/cgf.13681

A User-based Visual Analytics Workflow for Exploratory Model Analysis

Authors: Dylan Cashman, Shah Rukh Humayoun, Florian Heimerl, Kendall Park, Subhajit Das, John Thompson, Bahador Saket, Abigail Mosca, John Stasko, Alex Endert, Michael Gleicher, Remco Chang

Abstract: Many visual analytics systems allow users to interact with machine learning models towards the goals of data exploration and insight generation on a given dataset. However, in some situations, insights may be less important than the production of an accurate predictive model for future use. In that case, users are more interested in generating diverse and robust predictive models, verifying their performance on holdout data, and selecting the most suitable model for their usage scenario. In this paper, we consider the concept of Exploratory Model Analysis (EMA), which is defined as the process of discovering and selecting relevant models that can be used to make predictions on a data source. We delineate the differences between EMA and the well-known term exploratory data analysis in terms of the desired outcome of the analytic process: insights into the data or a set of deployable models. The contributions of this work are a visual analytics system workflow for EMA, a user study, and two use cases validating the effectiveness of the workflow. We found that our system workflow enabled users to generate complex models, to assess them for various qualities, and to select the most relevant model for their task.

Submitted 29 July, 2019; v1 submitted 27 September, 2018; originally announced September 2018.

Journal ref: Computer Graphics Forum 38(3) 2019, The Eurographics Association and John Wiley & Sons Ltd

arXiv:1712.03081

Mapping knowledge with ontologies: the case of obesity

Authors: Montserrat Estañol, Francesco Masucci, Alessandro Mosca, Ismael Ràfols

Abstract: Scientometric techniques have been remarkably successful at mapping science, but they face important difficulties when mapping research for societal problems, possibly because they are derived only from scientific documents and thus do not rely on non-academic expert knowledge. Here we aim to explore how ontologies can be used in science mapping, thus enriching current algorithmic techniques with systematic domain expert knowledge. This study introduces the methodology behind the construction of an ontology and tests potential uses in science mapping. We use obesity as a case study.

Submitted 4 December, 2017; originally announced December 2017.

Comments: 2017 Science and Technology Indicators Proceedings, 10 pages, 6 figures

arXiv:1708.03704

Deep Incremental Boosting

Authors: Alan Mosca, George D Magoulas

Abstract: This paper introduces Deep Incremental Boosting, a new technique derived from AdaBoost, specifically adapted to work with Deep Learning methods, that reduces the required training time and improves generalisation. We draw inspiration from Transfer of Learning approaches to reduce the start-up time to training each incremental Ensemble member. We show a set of experiments that outlines some preliminary results on some common Deep Learning datasets and discuss the potential improvements Deep Incremental Boosting brings to traditional Ensemble methods in Deep Learning.

Submitted 11 August, 2017; originally announced August 2017.

Journal ref: Christoph Benzmüller, Geoff Sutcliffe and Raul Rojas (editors). GCAI 2016. 2nd Global Conference on Artificial Intelligence, vol 41, pages 293-302

arXiv:1509.04612

Adapting Resilient Propagation for Deep Learning

Authors: Alan Mosca, George D. Magoulas

Abstract: The Resilient Propagation (Rprop) algorithm has been very popular for backpropagation training of multilayer feed-forward neural networks in various applications. The standard Rprop, however, encounters difficulties in the context of deep neural networks, as typically happens with gradient-based learning algorithms. In this paper, we propose a modification of the Rprop that combines standard Rprop steps with a special dropout technique. We apply the method for training Deep Neural Networks as standalone components and in ensemble formulations. Results on the MNIST dataset show that the proposed modification alleviates standard Rprop's problems, demonstrating improved learning speed and accuracy.

Submitted 16 September, 2015; v1 submitted 15 September, 2015; originally announced September 2015.

Comments: Published in the proceedings of the UK workshop on Computational Intelligence 2015 (UKCI)
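
For context on the "standard Rprop steps" that the abstract above modifies: Rprop ignores gradient magnitudes and adapts a separate step size per weight from the sign of successive gradients. The sketch below implements the common iRprop- variant with its usual default hyperparameters (increase factor 1.2, decrease factor 0.5). It is an assumption-labelled illustration of the baseline family only and does not include the dropout-based modification proposed in the paper; the quadratic objective is a hypothetical toy.

```python
from typing import List


class IRpropMinus:
    """Per-parameter sign-based step adaptation (iRprop- variant of Rprop)."""

    def __init__(self, n_params: int, eta_plus: float = 1.2, eta_minus: float = 0.5,
                 step_init: float = 0.1, step_min: float = 1e-6, step_max: float = 50.0) -> None:
        self.eta_plus, self.eta_minus = eta_plus, eta_minus
        self.step_min, self.step_max = step_min, step_max
        self.steps = [step_init] * n_params
        self.prev_grads = [0.0] * n_params

    def update(self, params: List[float], grads: List[float]) -> None:
        """Update `params` in place using only the signs of `grads`."""
        for i, g in enumerate(grads):
            agreement = g * self.prev_grads[i]
            if agreement > 0:
                # Same sign as last time: grow the step.
                self.steps[i] = min(self.steps[i] * self.eta_plus, self.step_max)
            elif agreement < 0:
                # Sign flip means a minimum was overshot: shrink the step and
                # skip this update (iRprop- forgets the gradient).
                self.steps[i] = max(self.steps[i] * self.eta_minus, self.step_min)
                g = 0.0
            if g > 0:
                params[i] -= self.steps[i]
            elif g < 0:
                params[i] += self.steps[i]
            self.prev_grads[i] = g


# Toy objective f(w) = sum_i (w_i - t_i)^2, whose gradient is 2 * (w_i - t_i).
target = [1.0, 2.0]
weights = [5.0, -3.0]
opt = IRpropMinus(n_params=len(weights))
for _ in range(200):
    opt.update(weights, [2.0 * (w - t) for w, t in zip(weights, target)])
print(weights)  # close to [1.0, 2.0]
```

Because only gradient signs are used, the update is insensitive to gradient scale, which is the property that makes Rprop attractive and also what interacts with noisy or vanishing gradients in deep networks.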

arXiv:1309.7697

doi 10.4204/EPTCS.130.16

Semi-structured data extraction and modelling: the WIA Project

Authors: Gianluca Colombo, Ettore Colombo, Andrea Bonomi, Alessandro Mosca, Simone Bassis

Abstract: Over the last decades, the amount of data of all kinds available electronically has increased dramatically. Data are accessible through a range of interfaces including Web browsers, database query languages, application-specific interfaces, built on top of a number of different data exchange formats. These data range from unstructured to highly structured. Very often, some of them have structure even if the structure is implicit, and not as rigid or regular as that found in standard database systems. Spreadsheet documents are prototypical in this respect. Spreadsheets are the lightweight technology able to supply companies with easy-to-build business management and business intelligence applications, and business people largely adopt spreadsheets as smart vehicles for data file generation and sharing. In fact, the more spreadsheets grow in complexity (e.g., their use in product development plans and quoting), the more their arrangement, maintenance, and analysis appear as a knowledge-driven activity. The algorithmic approach to the problem of automatic data structure extraction from spreadsheet documents (i.e., grid-structured and free topological-related data) emerges from the WIA project: Worksheets Intelligent Analyser. The WIA-algorithm shows how to provide a description of spreadsheet contents in terms of higher levels of abstraction or conceptualisations.
In particular, the WIA-algorithm targets the extraction of i) the calculation workflow implemented in the spreadsheet formulas and ii) the logical role played by the data which take part in the calculation. The aim of the resulting conceptualisations is to provide spreadsheets with abstract representations useful for further model refinements and optimizations through evolutionary algorithm computations.

Submitted 29 September, 2013; originally announced September 2013.

Comments: In Proceedings Wivace 2013, arXiv:1309.7122

ACM Class: H.3; I.2; H.1.2

Journal ref: EPTCS 130, 2013, pp. 98-103
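
The WIA abstract above names two extraction targets: the calculation workflow implemented by spreadsheet formulas, and the logical role of the data taking part in it. The sketch below is a hypothetical, much simplified illustration of both ideas, not the WIA algorithm: it reads single-cell A1-style references out of formulas with a rough regular expression, orders cells so every formula follows its inputs (using Python 3.9+ `graphlib`), and labels each cell as input, intermediate, output or unused. The sheet contents and role names are invented for the example.

```python
import re
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Union

# Single-cell A1-style references such as A1 or $B$7. Ranges (A1:A9), sheet
# names and string literals are deliberately not handled in this sketch.
CELL_REF = re.compile(r"\$?\b([A-Z]{1,3})\$?(\d+)\b(?!\()")

Cell = Union[float, str]


def is_formula(value: Cell) -> bool:
    return isinstance(value, str) and value.startswith("=")


def references(formula: str) -> Set[str]:
    """Cells referenced by a formula body, normalised to plain A1 form."""
    return {col + row for col, row in CELL_REF.findall(formula)}


def calculation_order(cells: Dict[str, Cell]) -> List[str]:
    """Cells ordered so each formula comes after everything it reads.

    Raises graphlib.CycleError on circular references.
    """
    graph = {name: references(str(value)[1:]) if is_formula(value) else set()
             for name, value in cells.items()}
    return list(TopologicalSorter(graph).static_order())


def cell_roles(cells: Dict[str, Cell]) -> Dict[str, str]:
    """Label each cell as input, intermediate, output or unused."""
    formulas = {name for name, value in cells.items() if is_formula(value)}
    referenced: Set[str] = set().union(*(references(str(cells[n])[1:]) for n in formulas))
    roles = {}
    for name in cells:
        if name in formulas:
            roles[name] = "intermediate" if name in referenced else "output"
        else:
            roles[name] = "input" if name in referenced else "unused"
    return roles


sheet = {"A1": 10.0, "A2": 20.0, "A3": "=A1+A2", "B1": "=$A$3*2", "C1": 5.0}
print(calculation_order(sheet))
print(cell_roles(sheet))
```

A real extractor would need a proper formula parser (ranges, named ranges, cross-sheet references), but the dependency graph is the core structure from which a calculation workflow and cell roles can be read off.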

Showing 1–14 of 14 results for author: Mosca, A