-
Neuro-Symbolic Fusion of Wi-Fi Sensing Data for Passive Radar with Inter-Modal Knowledge Transfer
Authors:
Marco Cominelli,
Francesco Gringoli,
Lance M. Kaplan,
Mani B. Srivastava,
Trevor Bihl,
Erik P. Blasch,
Nandini Iyer,
Federico Cerutti
Abstract:
Wi-Fi devices, akin to passive radars, can discern human activities within indoor settings due to the human body's interaction with electromagnetic signals. Current Wi-Fi sensing applications predominantly employ data-driven learning techniques to associate the fluctuations in the physical properties of the communication channel with the human activity causing them. However, these techniques often…
▽ More
Wi-Fi devices, akin to passive radars, can discern human activities within indoor settings due to the human body's interaction with electromagnetic signals. Current Wi-Fi sensing applications predominantly employ data-driven learning techniques to associate the fluctuations in the physical properties of the communication channel with the human activity causing them. However, these techniques often lack the desired flexibility and transparency. This paper introduces DeepProbHAR, a neuro-symbolic architecture for Wi-Fi sensing, providing initial evidence that Wi-Fi signals can differentiate between simple movements, such as leg or arm movements, which are integral to human activities like running or walking. The neuro-symbolic approach affords gathering such evidence without needing additional specialised data collection or labelling. The training of DeepProbHAR is facilitated by declarative domain knowledge obtained from a camera feed and by fusing signals from various antennas of the Wi-Fi receivers. DeepProbHAR achieves results comparable to the state-of-the-art in human activity recognition. Moreover, as a by-product of the learning process, DeepProbHAR generates specialised classifiers for simple movements that match the accuracy of models trained on finely labelled datasets, which would be particularly costly.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Accurate Passive Radar via an Uncertainty-Aware Fusion of Wi-Fi Sensing Data
Authors:
Marco Cominelli,
Francesco Gringoli,
Lance M. Kaplan,
Mani B. Srivastava,
Federico Cerutti
Abstract:
Wi-Fi devices can effectively be used as passive radar systems that sense what happens in the surroundings and can even discern human activity. We propose, for the first time, a principled architecture which employs Variational Auto-Encoders for estimating a latent distribution responsible for generating the data, and Evidential Deep Learning for its ability to sense out-of-distribution activities…
▽ More
Wi-Fi devices can effectively be used as passive radar systems that sense what happens in the surroundings and can even discern human activity. We propose, for the first time, a principled architecture which employs Variational Auto-Encoders for estimating a latent distribution responsible for generating the data, and Evidential Deep Learning for its ability to sense out-of-distribution activities. We verify that the fused data processed by different antennas of the same Wi-Fi receiver results in increased accuracy of human activity recognition compared with the most recent benchmarks, while still being informative when facing out-of-distribution samples and enabling semantic interpretation of latent variables in terms of physical phenomena. The results of this paper are a first contribution toward the ultimate goal of providing a flexible, semantic characterisation of black-swan events, i.e., events for which we have limited to no training data.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Winning the Social Media Influence Battle: Uncertainty-Aware Opinions to Understand and Spread True Information via Competitive Influence Maximization
Authors:
Qi Zhang,
Lance M. Kaplan,
Audun Jøsang,
Dong Hyun. Jeong,
Feng Chen,
Jin-Hee Cho
Abstract:
Competitive Influence Maximization (CIM) involves entities competing to maximize influence in online social networks (OSNs). Current Deep Reinforcement Learning (DRL) methods in CIM rely on simplistic binary opinion models (i.e., an opinion is represented by either 0 or 1) and often overlook the complexity of users' behavioral characteristics and their prior knowledge. We propose a novel DRL-based…
▽ More
Competitive Influence Maximization (CIM) involves entities competing to maximize influence in online social networks (OSNs). Current Deep Reinforcement Learning (DRL) methods in CIM rely on simplistic binary opinion models (i.e., an opinion is represented by either 0 or 1) and often overlook the complexity of users' behavioral characteristics and their prior knowledge. We propose a novel DRL-based framework that enhances CIM analysis by integrating Subjective Logic (SL) to accommodate uncertain opinions, users' behaviors, and their preferences. This approach targets the mitigation of false information by effectively propagating true information. By modeling two competitive agents, one spreading true information and the other spreading false information, we capture the strategic interplay essential to CIM. Our framework utilizes an uncertainty-based opinion model (UOM) to assess the impact on information quality in OSNs, emphasizing the importance of user behavior alongside network topology in selecting influential seed nodes. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods, achieving faster and more influential results (i.e., outperforming over 20%) under realistic network conditions. Moreover, our method shows robust performance in partially observable networks, effectively doubling the performance when users are predisposed to disbelieve true information.
△ Less
Submitted 29 April, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Hyper Evidential Deep Learning to Quantify Composite Classification Uncertainty
Authors:
Changbin Li,
Kangshuo Li,
Yuzhe Ou,
Lance M. Kaplan,
Audun Jøsang,
Jin-Hee Cho,
Dong Hyun Jeong,
Feng Chen
Abstract:
Deep neural networks (DNNs) have been shown to perform well on exclusive, multi-class classification tasks. However, when different classes have similar visual features, it becomes challenging for human annotators to differentiate them. This scenario necessitates the use of composite class labels. In this paper, we propose a novel framework called Hyper-Evidential Neural Network (HENN) that explic…
▽ More
Deep neural networks (DNNs) have been shown to perform well on exclusive, multi-class classification tasks. However, when different classes have similar visual features, it becomes challenging for human annotators to differentiate them. This scenario necessitates the use of composite class labels. In this paper, we propose a novel framework called Hyper-Evidential Neural Network (HENN) that explicitly models predictive uncertainty due to composite class labels in training data in the context of the belief theory called Subjective Logic (SL). By placing a grouped Dirichlet distribution on the class probabilities, we treat predictions of a neural network as parameters of hyper-subjective opinions and learn the network that collects both single and composite evidence leading to these hyper-opinions by a deterministic DNN from data. We introduce a new uncertainty type called vagueness originally designed for hyper-opinions in SL to quantify composite classification uncertainty for DNNs. Our results demonstrate that HENN outperforms its state-of-the-art counterparts based on four image datasets. The code and datasets are available at: https://github.com/Hugo101/HyperEvidentialNN.
△ Less
Submitted 16 April, 2024;
originally announced April 2024.
-
GDTM: An Indoor Geospatial Tracking Dataset with Distributed Multimodal Sensors
Authors:
Ho Lyun Jeong,
Ziqi Wang,
Colin Samplawski,
Jason Wu,
Shiwei Fang,
Lance M. Kaplan,
Deepak Ganesan,
Benjamin Marlin,
Mani Srivastava
Abstract:
Constantly locating moving objects, i.e., geospatial tracking, is essential for autonomous building infrastructure. Accurate and robust geospatial tracking often leverages multimodal sensor fusion algorithms, which require large datasets with time-aligned, synchronized data from various sensor types. However, such datasets are not readily available. Hence, we propose GDTM, a nine-hour dataset for…
▽ More
Constantly locating moving objects, i.e., geospatial tracking, is essential for autonomous building infrastructure. Accurate and robust geospatial tracking often leverages multimodal sensor fusion algorithms, which require large datasets with time-aligned, synchronized data from various sensor types. However, such datasets are not readily available. Hence, we propose GDTM, a nine-hour dataset for multimodal object tracking with distributed multimodal sensors and reconfigurable sensor node placements. Our dataset enables the exploration of several research problems, such as optimizing architectures for processing multimodal data, and investigating models' robustness to adverse sensing conditions and sensor placement variances. A GitHub repository containing the code, sample data, and checkpoints of this work is available at https://github.com/nesl/GDTM.
△ Less
Submitted 21 February, 2024;
originally announced February 2024.
-
Knowledge from Uncertainty in Evidential Deep Learning
Authors:
Cai Davies,
Marc Roig Vilamala,
Alun D. Preece,
Federico Cerutti,
Lance M. Kaplan,
Supriyo Chakraborty
Abstract:
This work reveals an evidential signal that emerges from the uncertainty value in Evidential Deep Learning (EDL). EDL is one example of a class of uncertainty-aware deep learning approaches designed to provide confidence (or epistemic uncertainty) about the current test sample. In particular for computer vision and bidirectional encoder large language models, the `evidential signal' arising from t…
▽ More
This work reveals an evidential signal that emerges from the uncertainty value in Evidential Deep Learning (EDL). EDL is one example of a class of uncertainty-aware deep learning approaches designed to provide confidence (or epistemic uncertainty) about the current test sample. In particular for computer vision and bidirectional encoder large language models, the `evidential signal' arising from the Dirichlet strength in EDL can, in some cases, discriminate between classes, which is particularly strong when using large language models. We hypothesise that the KL regularisation term causes EDL to couple aleatoric and epistemic uncertainty. In this paper, we empirically investigate the correlations between misclassification and evaluated uncertainty, and show that EDL's `evidential signal' is due to misclassification bias. We critically evaluate EDL with other Dirichlet-based approaches, namely Generative Evidential Neural Networks (EDL-GEN) and Prior Networks, and show theoretically and empirically the differences between these loss functions. We conclude that EDL's coupling of uncertainty arises from these differences due to the use (or lack) of out-of-distribution samples during training.
△ Less
Submitted 19 October, 2023;
originally announced October 2023.
-
Uncertainty-Aware Reward-based Deep Reinforcement Learning for Intent Analysis of Social Media Information
Authors:
Zhen Guo,
Qi Zhang,
Xinwei An,
Qisheng Zhang,
Audun Jøsang,
Lance M. Kaplan,
Feng Chen,
Dong H. Jeong,
Jin-Hee Cho
Abstract:
Due to various and serious adverse impacts of spreading fake news, it is often known that only people with malicious intent would propagate fake news. However, it is not necessarily true based on social science studies. Distinguishing the types of fake news spreaders based on their intent is critical because it will effectively guide how to intervene to mitigate the spread of fake news with differ…
▽ More
Due to various and serious adverse impacts of spreading fake news, it is often known that only people with malicious intent would propagate fake news. However, it is not necessarily true based on social science studies. Distinguishing the types of fake news spreaders based on their intent is critical because it will effectively guide how to intervene to mitigate the spread of fake news with different approaches. To this end, we propose an intent classification framework that can best identify the correct intent of fake news. We will leverage deep reinforcement learning (DRL) that can optimize the structural representation of each tweet by removing noisy words from the input sequence when appending an actor to the long short-term memory (LSTM) intent classifier. Policy gradient DRL model (e.g., REINFORCE) can lead the actor to a higher delayed reward. We also devise a new uncertainty-aware immediate reward using a subjective opinion that can explicitly deal with multidimensional uncertainty for effective decision-making. Via 600K training episodes from a fake news tweets dataset with an annotated intent class, we evaluate the performance of uncertainty-aware reward in DRL. Evaluation results demonstrate that our proposed framework efficiently reduces the number of selected words to maintain a high 95\% multi-class accuracy.
△ Less
Submitted 18 February, 2023;
originally announced February 2023.
-
PPO-UE: Proximal Policy Optimization via Uncertainty-Aware Exploration
Authors:
Qisheng Zhang,
Zhen Guo,
Audun Jøsang,
Lance M. Kaplan,
Feng Chen,
Dong H. Jeong,
Jin-Hee Cho
Abstract:
Proximal Policy Optimization (PPO) is a highly popular policy-based deep reinforcement learning (DRL) approach. However, we observe that the homogeneous exploration process in PPO could cause an unexpected stability issue in the training phase. To address this issue, we propose PPO-UE, a PPO variant equipped with self-adaptive uncertainty-aware explorations (UEs) based on a ratio uncertainty level…
▽ More
Proximal Policy Optimization (PPO) is a highly popular policy-based deep reinforcement learning (DRL) approach. However, we observe that the homogeneous exploration process in PPO could cause an unexpected stability issue in the training phase. To address this issue, we propose PPO-UE, a PPO variant equipped with self-adaptive uncertainty-aware explorations (UEs) based on a ratio uncertainty level. The proposed PPO-UE is designed to improve convergence speed and performance with an optimized ratio uncertainty level. Through extensive sensitivity analysis by varying the ratio uncertainty level, our proposed PPO-UE considerably outperforms the baseline PPO in Roboschool continuous control tasks.
△ Less
Submitted 12 December, 2022;
originally announced December 2022.
-
Research Note on Uncertain Probabilities and Abstract Argumentation
Authors:
Pietro Baroni,
Federico Cerutti,
Massimiliano Giacomin,
Lance M. Kaplan,
Murat Sensoy
Abstract:
The sixth assessment of the international panel on climate change (IPCC) states that "cumulative net CO2 emissions over the last decade (2010-2019) are about the same size as the 11 remaining carbon budget likely to limit warming to 1.5C (medium confidence)." Such reports directly feed the public discourse, but nuances such as the degree of belief and of confidence are often lost. In this paper, w…
▽ More
The sixth assessment of the international panel on climate change (IPCC) states that "cumulative net CO2 emissions over the last decade (2010-2019) are about the same size as the 11 remaining carbon budget likely to limit warming to 1.5C (medium confidence)." Such reports directly feed the public discourse, but nuances such as the degree of belief and of confidence are often lost. In this paper, we propose a formal account for allowing such degrees of belief and the associated confidence to be used to label arguments in abstract argumentation settings. Differently from other proposals in probabilistic argumentation, we focus on the task of probabilistic inference over a chosen query building upon Sato's distribution semantics which has been already shown to encompass a variety of cases including the semantics of Bayesian networks. Borrowing from the vast literature on such semantics, we examine how such tasks can be dealt with in practice when considering uncertain probabilities, and discuss the connections with existing proposals for probabilistic argumentation.
△ Less
Submitted 23 August, 2022;
originally announced August 2022.
-
SOLBP: Second-Order Loopy Belief Propagation for Inference in Uncertain Bayesian Networks
Authors:
Conrad D. Hougen,
Lance M. Kaplan,
Magdalena Ivanovska,
Federico Cerutti,
Kumar Vijay Mishra,
Alfred O. Hero III
Abstract:
In second-order uncertain Bayesian networks, the conditional probabilities are only known within distributions, i.e., probabilities over probabilities. The delta-method has been applied to extend exact first-order inference methods to propagate both means and variances through sum-product networks derived from Bayesian networks, thereby characterizing epistemic uncertainty, or the uncertainty in t…
▽ More
In second-order uncertain Bayesian networks, the conditional probabilities are only known within distributions, i.e., probabilities over probabilities. The delta-method has been applied to extend exact first-order inference methods to propagate both means and variances through sum-product networks derived from Bayesian networks, thereby characterizing epistemic uncertainty, or the uncertainty in the model itself. Alternatively, second-order belief propagation has been demonstrated for polytrees but not for general directed acyclic graph structures. In this work, we extend Loopy Belief Propagation to the setting of second-order Bayesian networks, giving rise to Second-Order Loopy Belief Propagation (SOLBP). For second-order Bayesian networks, SOLBP generates inferences consistent with those generated by sum-product networks, while being more computationally efficient and scalable.
△ Less
Submitted 16 August, 2022;
originally announced August 2022.
-
Uncertain Bayesian Networks: Learning from Incomplete Data
Authors:
Conrad D. Hougen,
Lance M. Kaplan,
Federico Cerutti,
Alfred O. Hero III
Abstract:
When the historical data are limited, the conditional probabilities associated with the nodes of Bayesian networks are uncertain and can be empirically estimated. Second order estimation methods provide a framework for both estimating the probabilities and quantifying the uncertainty in these estimates. We refer to these cases as uncer tain or second-order Bayesian networks. When such data are com…
▽ More
When the historical data are limited, the conditional probabilities associated with the nodes of Bayesian networks are uncertain and can be empirically estimated. Second order estimation methods provide a framework for both estimating the probabilities and quantifying the uncertainty in these estimates. We refer to these cases as uncer tain or second-order Bayesian networks. When such data are complete, i.e., all variable values are observed for each instantiation, the conditional probabilities are known to be Dirichlet-distributed. This paper improves the current state-of-the-art approaches for handling uncertain Bayesian networks by enabling them to learn distributions for their parameters, i.e., conditional probabilities, with incomplete data. We extensively evaluate various methods to learn the posterior of the parameters through the desired and empirically derived strength of confidence bounds for various queries.
△ Less
Submitted 8 August, 2022;
originally announced August 2022.
-
A Survey on Uncertainty Reasoning and Quantification for Decision Making: Belief Theory Meets Deep Learning
Authors:
Zhen Guo,
Zelin Wan,
Qisheng Zhang,
Xujiang Zhao,
Feng Chen,
Jin-Hee Cho,
Qi Zhang,
Lance M. Kaplan,
Dong H. Jeong,
Audun Jøsang
Abstract:
An in-depth understanding of uncertainty is the first step to making effective decisions under uncertainty. Deep/machine learning (ML/DL) has been hugely leveraged to solve complex problems involved with processing high-dimensional data. However, reasoning and quantifying different types of uncertainties to achieve effective decision-making have been much less explored in ML/DL than in other Artif…
▽ More
An in-depth understanding of uncertainty is the first step to making effective decisions under uncertainty. Deep/machine learning (ML/DL) has been hugely leveraged to solve complex problems involved with processing high-dimensional data. However, reasoning and quantifying different types of uncertainties to achieve effective decision-making have been much less explored in ML/DL than in other Artificial Intelligence (AI) domains. In particular, belief/evidence theories have been studied in KRR since the 1960s to reason and measure uncertainties to enhance decision-making effectiveness. We found that only a few studies have leveraged the mature uncertainty research in belief/evidence theories in ML/DL to tackle complex problems under different types of uncertainty. In this survey paper, we discuss several popular belief theories and their core ideas dealing with uncertainty causes and types and quantifying them, along with the discussions of their applicability in ML/DL. In addition, we discuss three main approaches that leverage belief theories in Deep Neural Networks (DNNs), including Evidential DNNs, Fuzzy DNNs, and Rough DNNs, in terms of their uncertainty causes, types, and quantification methods along with their applicability in diverse problem domains. Based on our in-depth survey, we discuss insights, lessons learned, limitations of the current state-of-the-art bridging belief theories and ML/DL, and finally, future research directions.
△ Less
Submitted 13 June, 2022; v1 submitted 12 June, 2022;
originally announced June 2022.
-
Handling Epistemic and Aleatory Uncertainties in Probabilistic Circuits
Authors:
Federico Cerutti,
Lance M. Kaplan,
Angelika Kimmig,
Murat Sensoy
Abstract:
When collaborating with an AI system, we need to assess when to trust its recommendations. If we mistakenly trust it in regions where it is likely to err, catastrophic failures may occur, hence the need for Bayesian approaches for probabilistic reasoning in order to determine the confidence (or epistemic uncertainty) in the probabilities in light of the training data. We propose an approach to ove…
▽ More
When collaborating with an AI system, we need to assess when to trust its recommendations. If we mistakenly trust it in regions where it is likely to err, catastrophic failures may occur, hence the need for Bayesian approaches for probabilistic reasoning in order to determine the confidence (or epistemic uncertainty) in the probabilities in light of the training data. We propose an approach to overcome the independence assumption behind most of the approaches dealing with a large class of probabilistic reasoning that includes Bayesian networks as well as several instances of probabilistic logic. We provide an algorithm for Bayesian learning from sparse, albeit complete, observations, and for deriving inferences and their confidences keeping track of the dependencies between variables when they are manipulated within the unifying computational formalism provided by probabilistic circuits. Each leaf of such circuits is labelled with a beta-distributed random variable that provides us with an elegant framework for representing uncertain probabilities. We achieve better estimation of epistemic uncertainty than state-of-the-art approaches, including highly engineered ones, while being able to handle general circuits and with just a modest increase in the computational effort compared to using point probabilities.
△ Less
Submitted 22 February, 2021;
originally announced February 2021.
-
HONEM: Learning Embedding for Higher Order Networks
Authors:
Mandana Saebi,
Giovanni Luca Ciampaglia,
Lance M Kaplan,
Nitesh V Chawla
Abstract:
Representation learning on networks offers a powerful alternative to the oft painstaking process of manual feature engineering, and as a result, has enjoyed considerable success in recent years. However, all the existing representation learning methods are based on the first-order network (FON), that is, the network that only captures the pairwise interactions between the nodes. As a result, these…
▽ More
Representation learning on networks offers a powerful alternative to the oft painstaking process of manual feature engineering, and as a result, has enjoyed considerable success in recent years. However, all the existing representation learning methods are based on the first-order network (FON), that is, the network that only captures the pairwise interactions between the nodes. As a result, these methods may fail to incorporate non-Markovian higher-order dependencies in the network. Thus, the embeddings that are generated may not accurately represent of the underlying phenomena in a network, resulting in inferior performance in different inductive or transductive learning tasks. To address this challenge, this paper presents HONEM, a higher-order network embedding method that captures the non-Markovian higher-order dependencies in a network. HONEM is specifically designed for the higher-order network structure (HON) and outperforms other state-of-the-art methods in node classification, network re-construction, link prediction, and visualization for networks that contain non-Markovian higher-order dependencies.
△ Less
Submitted 3 September, 2020; v1 submitted 14 August, 2019;
originally announced August 2019.
-
Efficient modeling of higher-order dependencies in networks: from algorithm to application for anomaly detection
Authors:
Mandana Saebi,
Jian Xu,
Lance M. Kaplan,
Bruno Ribeiro,
Nitesh V. Chawla
Abstract:
Complex systems, represented as dynamic networks, comprise of components that influence each other via direct and/or indirect interactions. Recent research has shown the importance of using Higher-Order Networks (HONs) for modeling and analyzing such complex systems, as the typical Markovian assumption in developing the First Order Network (FON) can be limiting. This higher-order network represent…
▽ More
Complex systems, represented as dynamic networks, comprise of components that influence each other via direct and/or indirect interactions. Recent research has shown the importance of using Higher-Order Networks (HONs) for modeling and analyzing such complex systems, as the typical Markovian assumption in developing the First Order Network (FON) can be limiting. This higher-order network representation not only creates a more accurate representation of the underlying complex system, but also leads to more accurate network analysis. In this paper, we first present a scalable and accurate model, BuildHON+, for higher-order network representation of data derived from a complex system with various orders of dependencies. Then, we show that this higher-order network representation modeled by BuildHON+ is significantly more accurate in identifying anomalies than FON, demonstrating a need for the higher-order network representation and modeling of complex systems for deriving meaningful conclusions.
△ Less
Submitted 11 October, 2020; v1 submitted 27 December, 2017;
originally announced December 2017.
-
Attack Detection in Sensor Network Target Localization Systems with Quantized Data
Authors:
Jiangfan Zhang,
Xiaodong Wang,
Rick S. Blum,
Lance M. Kaplan
Abstract:
We consider a sensor network focused on target localization, where sensors measure the signal strength emitted from the target. Each measurement is quantized to one bit and sent to the fusion center. A general attack is considered at some sensors that attempts to cause the fusion center to produce an inaccurate estimation of the target location with a large mean-square-error. The attack is a combi…
▽ More
We consider a sensor network focused on target localization, where sensors measure the signal strength emitted from the target. Each measurement is quantized to one bit and sent to the fusion center. A general attack is considered at some sensors that attempts to cause the fusion center to produce an inaccurate estimation of the target location with a large mean-square-error. The attack is a combination of man-in-the-middle, hacking, and spoofing attacks that can effectively change both signals going into and coming out of the sensor nodes in a realistic manner. We show that the essential effect of attacks is to alter the estimated distance between the target and each attacked sensor to a different extent, giving rise to a geometric inconsistency among the attacked and unattacked sensors. Hence, with the help of two secure sensors, a class of detectors are proposed to detect the attacked sensors by scrutinizing the existence of the geometric inconsistency. We show that the false alarm and miss probabilities of the proposed detectors decrease exponentially as the number of measurement samples increases, which implies that for sufficiently large number of samples, the proposed detectors can identify the attacked and unattacked sensors with any required accuracy.
△ Less
Submitted 15 May, 2017;
originally announced May 2017.
-
MetaPAD: Meta Pattern Discovery from Massive Text Corpora
Authors:
Meng Jiang,
Jingbo Shang,
Taylor Cassidy,
Xiang Ren,
Lance M. Kaplan,
Timothy P. Hanratty,
Jiawei Han
Abstract:
Mining textual patterns in news, tweets, papers, and many other kinds of text corpora has been an active theme in text mining and NLP research. Previous studies adopt a dependency parsing-based pattern discovery approach. However, the parsing results lose rich context around entities in the patterns, and the process is costly for a corpus of large scale. In this study, we propose a novel typed tex…
▽ More
Mining textual patterns in news, tweets, papers, and many other kinds of text corpora has been an active theme in text mining and NLP research. Previous studies adopt a dependency parsing-based pattern discovery approach. However, the parsing results lose rich context around entities in the patterns, and the process is costly for a corpus of large scale. In this study, we propose a novel typed textual pattern structure, called meta pattern, which is extended to a frequent, informative, and precise subsequence pattern in certain context. We propose an efficient framework, called MetaPAD, which discovers meta patterns from massive corpora with three techniques: (1) it develops a context-aware segmentation method to carefully determine the boundaries of patterns with a learnt pattern quality assessment function, which avoids costly dependency parsing and generates high-quality patterns; (2) it identifies and groups synonymous meta patterns from multiple facets---their types, contexts, and extractions; and (3) it examines type distributions of entities in the instances extracted by each group of patterns, and looks for appropriate type levels to make discovered patterns precise. Experiments demonstrate that our proposed framework discovers high-quality typed textual patterns efficiently from different genres of massive corpora and facilitates information extraction.
△ Less
Submitted 14 March, 2017; v1 submitted 12 March, 2017;
originally announced March 2017.
-
Meta-Path Guided Embedding for Similarity Search in Large-Scale Heterogeneous Information Networks
Authors:
Jingbo Shang,
Meng Qu,
Jialu Liu,
Lance M. Kaplan,
Jiawei Han,
Jian Peng
Abstract:
Most real-world data can be modeled as heterogeneous information networks (HINs) consisting of vertices of multiple types and their relationships. Search for similar vertices of the same type in large HINs, such as bibliographic networks and business-review networks, is a fundamental problem with broad applications. Although similarity search in HINs has been studied previously, most existing appr…
▽ More
Most real-world data can be modeled as heterogeneous information networks (HINs) consisting of vertices of multiple types and their relationships. Search for similar vertices of the same type in large HINs, such as bibliographic networks and business-review networks, is a fundamental problem with broad applications. Although similarity search in HINs has been studied previously, most existing approaches neither explore rich semantic information embedded in the network structures nor take user's preference as a guidance.
In this paper, we re-examine similarity search in HINs and propose a novel embedding-based framework. It models vertices as low-dimensional vectors to explore network structure-embedded similarity. To accommodate user preferences at defining similarity semantics, our proposed framework, ESim, accepts user-defined meta-paths as guidance to learn vertex vectors in a user-preferred embedding space. Moreover, an efficient and parallel sampling-based optimization algorithm has been developed to learn embeddings in large-scale HINs. Extensive experiments on real-world large-scale HINs demonstrate a significant improvement on the effectiveness of ESim over several state-of-the-art algorithms as well as its scalability.
△ Less
Submitted 30 October, 2016;
originally announced October 2016.
-
Measuring Two-Event Structural Correlations on Graphs
Authors:
Ziyu Guan,
Xifeng Yan,
Lance M. Kaplan
Abstract:
Real-life graphs usually have various kinds of events happening on them, e.g., product purchases in online social networks and intrusion alerts in computer networks. The occurrences of events on the same graph could be correlated, exhibiting either attraction or repulsion. Such structural correlations can reveal important relationships between different events. Unfortunately, correlation relations…
▽ More
Real-life graphs usually have various kinds of events happening on them, e.g., product purchases in online social networks and intrusion alerts in computer networks. The occurrences of events on the same graph could be correlated, exhibiting either attraction or repulsion. Such structural correlations can reveal important relationships between different events. Unfortunately, correlation relationships on graph structures are not well studied and cannot be captured by traditional measures. In this work, we design a novel measure for assessing two-event structural correlations on graphs. Given the occurrences of two events, we choose uniformly a sample of "reference nodes" from the vicinity of all event nodes and employ the Kendall's tau rank correlation measure to compute the average concordance of event density changes. Significance can be efficiently assessed by tau's nice property of being asymptotically normal under the null hypothesis. In order to compute the measure in large scale networks, we develop a scalable framework using different sampling strategies. The complexity of these strategies is analyzed. Experiments on real graph datasets with both synthetic and real events demonstrate that the proposed framework is not only efficacious, but also efficient and scalable.
△ Less
Submitted 1 August, 2012;
originally announced August 2012.