-
NPGLM: A Non-Parametric Method for Temporal Link Prediction
Authors:
Sina Sajadmanesh,
Jiawei Zhang,
Hamid R. Rabiee
Abstract:
In this paper, we try to solve the problem of temporal link prediction in information networks. This implies predicting the time it takes for a link to appear in the future, given its features that have been extracted at the current network snapshot. To this end, we introduce a probabilistic non-parametric approach, called "Non-Parametric Generalized Linear Model" (NP-GLM), which infers the hidden…
▽ More
In this paper, we try to solve the problem of temporal link prediction in information networks. This implies predicting the time it takes for a link to appear in the future, given its features that have been extracted at the current network snapshot. To this end, we introduce a probabilistic non-parametric approach, called "Non-Parametric Generalized Linear Model" (NP-GLM), which infers the hidden underlying probability distribution of the link advent time given its features. We then present a learning algorithm for NP-GLM and an inference method to answer time-related queries. Extensive experiments conducted on both synthetic data and real-world Sina Weibo social network demonstrate the effectiveness of NP-GLM in solving temporal link prediction problem vis-a-vis competitive baselines.
△ Less
Submitted 21 June, 2017;
originally announced June 2017.
-
DANI: A Fast Diffusion Aware Network Inference Algorithm
Authors:
Maryam Ramezani,
Hamid R. Rabiee,
Maryam Tahani,
Arezoo Rajabi
Abstract:
The fast growth of social networks and their privacy requirements in recent years, has lead to increasing difficulty in obtaining complete topology of these networks. However, diffusion information over these networks is available and many algorithms have been proposed to infer the underlying networks by using this information. The previously proposed algorithms only focus on inferring more links…
▽ More
The fast growth of social networks and their privacy requirements in recent years, has lead to increasing difficulty in obtaining complete topology of these networks. However, diffusion information over these networks is available and many algorithms have been proposed to infer the underlying networks by using this information. The previously proposed algorithms only focus on inferring more links and do not pay attention to the important characteristics of the underlying social networks In this paper, we propose a novel algorithm, called DANI, to infer the underlying network structure while preserving its properties by using the diffusion information. Moreover, the running time of the proposed method is considerably lower than the previous methods. We applied the proposed method to both real and synthetic networks. The experimental results showed that DANI has higher accuracy and lower run time compared to well-known network inference methods.
△ Less
Submitted 3 June, 2017;
originally announced June 2017.
-
Sensor Selection Cost Optimization for Tracking Structurally Cyclic Systems: a P-Order Solution
Authors:
Mohammadreza Doostmohammadian,
Houman Zarrabi,
Hamid R. Rabiee
Abstract:
Measurements and sensing implementations impose certain cost in sensor networks. The sensor selection cost optimization is the problem of minimizing the sensing cost of monitoring a physical (or cyber- physical) system. Consider a given set of sensors tracking states of a dynamical system for estimation purposes. For each sensor assume different costs to measure different (realizable) states. The…
▽ More
Measurements and sensing implementations impose certain cost in sensor networks. The sensor selection cost optimization is the problem of minimizing the sensing cost of monitoring a physical (or cyber- physical) system. Consider a given set of sensors tracking states of a dynamical system for estimation purposes. For each sensor assume different costs to measure different (realizable) states. The idea is to assign sensors to measure states such that the global cost is minimized. The number and selection of sensor measurements need to ensure the observability to track the dynamic state of the system with bounded estimation error. The main question we address is how to select the state measurements to minimize the cost while satisfying the observability conditions. Relaxing the observability condition for structurally cyclic systems, the main contribution is to propose a graph theoretic approach to solve the problem in polynomial time. Note that, polynomial time algorithms are suitable for large-scale systems as their running time is upper-bounded by a polynomial expression in the size of input for the algorithm. We frame the problem as a linear sum assignment with solution complexity of O(m3).
△ Less
Submitted 26 May, 2017;
originally announced May 2017.
-
A Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics
Authors:
Seyed Ali Osia,
Ali Shahin Shamsabadi,
Sina Sajadmanesh,
Ali Taheri,
Kleomenis Katevas,
Hamid R. Rabiee,
Nicholas D. Lane,
Hamed Haddadi
Abstract:
Internet of Things (IoT) devices and applications are being deployed in our homes and workplaces. These devices often rely on continuous data collection to feed machine learning models. However, this approach introduces several privacy and efficiency challenges, as the service operator can perform unwanted inferences on the available data. Recently, advances in edge processing have paved the way f…
▽ More
Internet of Things (IoT) devices and applications are being deployed in our homes and workplaces. These devices often rely on continuous data collection to feed machine learning models. However, this approach introduces several privacy and efficiency challenges, as the service operator can perform unwanted inferences on the available data. Recently, advances in edge processing have paved the way for more efficient, and private, data processing at the source for simple tasks and lighter models, though they remain a challenge for larger, and more complicated models. In this paper, we present a hybrid approach for breaking down large, complex deep neural networks for cooperative, privacy-preserving analytics. To this end, instead of performing the whole operation on the cloud, we let an IoT device to run the initial layers of the neural network, and then send the output to the cloud to feed the remaining layers and produce the final result. In order to ensure that the user's device contains no extra information except what is necessary for the main task and preventing any secondary inference on the data, we introduce Siamese fine-tuning. We evaluate the privacy benefits of this approach based on the information exposed to the cloud service. We also assess the local inference cost of different layers on a modern handset. Our evaluations show that by using Siamese fine-tuning and at a small processing cost, we can greatly reduce the level of unnecessary, potentially sensitive information in the personal data, and thus achieving the desired trade-off between utility, privacy, and performance.
△ Less
Submitted 26 December, 2019; v1 submitted 8 March, 2017;
originally announced March 2017.
-
Recurrent Poisson Factorization for Temporal Recommendation
Authors:
Seyed Abbas Hosseini,
Keivan Alizadeh,
Ali Khodadadi,
Ali Arabzadeh,
Mehrdad Farajtabar,
Hongyuan Zha,
Hamid R. Rabiee
Abstract:
Poisson factorization is a probabilistic model of users and items for recommendation systems, where the so-called implicit consumer data is modeled by a factorized Poisson distribution. There are many variants of Poisson factorization methods who show state-of-the-art performance on real-world recommendation tasks. However, most of them do not explicitly take into account the temporal behavior and…
▽ More
Poisson factorization is a probabilistic model of users and items for recommendation systems, where the so-called implicit consumer data is modeled by a factorized Poisson distribution. There are many variants of Poisson factorization methods who show state-of-the-art performance on real-world recommendation tasks. However, most of them do not explicitly take into account the temporal behavior and the recurrent activities of users which is essential to recommend the right item to the right user at the right time. In this paper, we introduce Recurrent Poisson Factorization (RPF) framework that generalizes the classical PF methods by utilizing a Poisson process for modeling the implicit feedback. RPF treats time as a natural constituent of the model and brings to the table a rich family of time-sensitive factorization models. To elaborate, we instantiate several variants of RPF who are capable of handling dynamic user preferences and item specification (DRPF), modeling the social-aspect of product adoption (SRPF), and capturing the consumption heterogeneity among users and items (HRPF). We also develop a variational algorithm for approximate posterior inference that scales up to massive data sets. Furthermore, we demonstrate RPF's superior performance over many state-of-the-art methods on synthetic dataset, and large scale real-world datasets on music streaming logs, and user-item interactions in M-Commerce platforms.
△ Less
Submitted 4 March, 2017;
originally announced March 2017.
-
Continuous-Time User Modeling in the Presence of Badges: A Probabilistic Approach
Authors:
Ali Khodadadi,
Seyed Abbas Hosseini,
Erfan Tavakoli,
Hamid R. Rabiee
Abstract:
User modeling plays an important role in delivering customized web services to the users and improving their engagement. However, most user models in the literature do not explicitly consider the temporal behavior of users. More recently, continuous-time user modeling has gained considerable attention and many user behavior models have been proposed based on temporal point processes. However, typi…
▽ More
User modeling plays an important role in delivering customized web services to the users and improving their engagement. However, most user models in the literature do not explicitly consider the temporal behavior of users. More recently, continuous-time user modeling has gained considerable attention and many user behavior models have been proposed based on temporal point processes. However, typical point process based models often considered the impact of peer influence and content on the user participation and neglected other factors. Gamification elements, are among those factors that are neglected, while they have a strong impact on user participation in online services. In this paper, we propose interdependent multi-dimensional temporal point processes that capture the impact of badges on user participation besides the peer influence and content factors. We extend the proposed processes to model user actions over the community based question and answering websites, and propose an inference algorithm based on Variational-EM that can efficiently learn the model parameters. Extensive experiments on both synthetic and real data gathered from Stack Overflow show that our inference algorithm learns the parameters efficiently and the proposed method can better predict the user behavior compared to the alternatives.
△ Less
Submitted 7 February, 2017;
originally announced February 2017.
-
Spatio-Temporal Modeling of Users' Check-ins in Location-Based Social Networks
Authors:
Ali Zarezade,
Sina Jafarzadeh,
Hamid R. Rabiee
Abstract:
Social networks are getting closer to our real physical world. People share the exact location and time of their check-ins and are influenced by their friends. Modeling the spatio-temporal behavior of users in social networks is of great importance for predicting the future behavior of users, controlling the users' movements, and finding the latent influence network. It is observed that users have…
▽ More
Social networks are getting closer to our real physical world. People share the exact location and time of their check-ins and are influenced by their friends. Modeling the spatio-temporal behavior of users in social networks is of great importance for predicting the future behavior of users, controlling the users' movements, and finding the latent influence network. It is observed that users have periodic patterns in their movements. Also, they are influenced by the locations that their close friends recently visited. Leveraging these two observations, we propose a probabilistic model based on a doubly stochastic point process with a periodic decaying kernel for the time of check-ins and a time-varying multinomial distribution for the location of check-ins of users in the location-based social networks. We learn the model parameters using an efficient EM algorithm, which distributes over the users. Experiments on synthetic and real data gathered from Foursquare show that the proposed inference algorithm learns the parameters efficiently and our model outperforms the other alternatives in the prediction of time and location of check-ins.
△ Less
Submitted 10 April, 2017; v1 submitted 23 November, 2016;
originally announced November 2016.
-
Kissing Cuisines: Exploring Worldwide Culinary Habits on the Web
Authors:
Sina Sajadmanesh,
Sina Jafarzadeh,
Seyed Ali Osia,
Hamid R. Rabiee,
Hamed Haddadi,
Yelena Mejova,
Mirco Musolesi,
Emiliano De Cristofaro,
Gianluca Stringhini
Abstract:
Food and nutrition occupy an increasingly prevalent space on the web, and dishes and recipes shared online provide an invaluable mirror into culinary cultures and attitudes around the world. More specifically, ingredients, flavors, and nutrition information become strong signals of the taste preferences of individuals and civilizations. However, there is little understanding of these palate variet…
▽ More
Food and nutrition occupy an increasingly prevalent space on the web, and dishes and recipes shared online provide an invaluable mirror into culinary cultures and attitudes around the world. More specifically, ingredients, flavors, and nutrition information become strong signals of the taste preferences of individuals and civilizations. However, there is little understanding of these palate varieties. In this paper, we present a large-scale study of recipes published on the web and their content, aiming to understand cuisines and culinary habits around the world. Using a database of more than 157K recipes from over 200 different cuisines, we analyze ingredients, flavors, and nutritional values which distinguish dishes from different regions, and use this knowledge to assess the predictability of recipes from different cuisines. We then use country health statistics to understand the relation between these factors and health indicators of different nations, such as obesity, diabetes, migration, and health expenditure. Our results confirm the strong effects of geographical and cultural similarities on recipes, health indicators, and culinary preferences across the globe.
△ Less
Submitted 25 April, 2017; v1 submitted 26 October, 2016;
originally announced October 2016.
-
HNP3: A Hierarchical Nonparametric Point Process for Modeling Content Diffusion over Social Media
Authors:
Seyed Abbas Hosseini,
Ali Khodadadi,
Soheil Arabzade,
Hamid R. Rabiee
Abstract:
This paper introduces a novel framework for modeling temporal events with complex longitudinal dependency that are generated by dependent sources. This framework takes advantage of multidimensional point processes for modeling time of events. The intensity function of the proposed process is a mixture of intensities, and its complexity grows with the complexity of temporal patterns of data. Moreov…
▽ More
This paper introduces a novel framework for modeling temporal events with complex longitudinal dependency that are generated by dependent sources. This framework takes advantage of multidimensional point processes for modeling time of events. The intensity function of the proposed process is a mixture of intensities, and its complexity grows with the complexity of temporal patterns of data. Moreover, it utilizes a hierarchical dependent nonparametric approach to model marks of events. These capabilities allow the proposed model to adapt its temporal and topical complexity according to the complexity of data, which makes it a suitable candidate for real world scenarios. An online inference algorithm is also proposed that makes the framework applicable to a vast range of applications. The framework is applied to a real world application, modeling the diffusion of contents over networks. Extensive experiments reveal the effectiveness of the proposed framework in comparison with state-of-the-art methods.
△ Less
Submitted 2 October, 2016;
originally announced October 2016.
-
Predicting Anchor Links between Heterogeneous Social Networks
Authors:
Sina Sajadmanesh,
Hamid R. Rabiee,
Ali Khodadadi
Abstract:
People usually get involved in multiple social networks to enjoy new services or to fulfill their needs. Many new social networks try to attract users of other existing networks to increase the number of their users. Once a user (called source user) of a social network (called source network) joins a new social network (called target network), a new inter-network link (called anchor link) is forme…
▽ More
People usually get involved in multiple social networks to enjoy new services or to fulfill their needs. Many new social networks try to attract users of other existing networks to increase the number of their users. Once a user (called source user) of a social network (called source network) joins a new social network (called target network), a new inter-network link (called anchor link) is formed between the source and target networks. In this paper, we concentrated on predicting the formation of such anchor links between heterogeneous social networks. Unlike conventional link prediction problems in which the formation of a link between two existing users within a single network is predicted, in anchor link prediction, the target user is missing and will be added to the target network once the anchor link is created. To solve this problem, we use meta-paths as a powerful tool for utilizing heterogeneous information in both the source and target networks. To this end, we propose an effective general meta-path-based approach called Connector and Recursive Meta-Paths (CRMP). By using those two different categories of meta-paths, we model different aspects of social factors that may affect a source user to join the target network, resulting in the formation of a new anchor link. Extensive experiments on real-world heterogeneous social networks demonstrate the effectiveness of the proposed method against the recent methods.
△ Less
Submitted 1 August, 2016; v1 submitted 29 July, 2016;
originally announced July 2016.
-
Bayesian Overlapping Community Detection in Dynamic Networks
Authors:
Mahsa Ghorbani,
Hamid R. Rabiee,
Ali Khodadadi
Abstract:
Detecting community structures in social networks has gained considerable attention in recent years. However, lack of prior knowledge about the number of communities, and their overlapping nature have made community detection a challenging problem. Moreover, many of the existing methods only consider static networks, while most of real world networks are dynamic and evolve over time. Hence, findin…
▽ More
Detecting community structures in social networks has gained considerable attention in recent years. However, lack of prior knowledge about the number of communities, and their overlapping nature have made community detection a challenging problem. Moreover, many of the existing methods only consider static networks, while most of real world networks are dynamic and evolve over time. Hence, finding consistent overlapping communities in dynamic networks without any prior knowledge about the number of communities is still an interesting open research problem. In this paper, we present an overlapping community detection method for dynamic networks called Dynamic Bayesian Overlapping Community Detector (DBOCD). DBOCD assumes that in every snapshot of network, overlapping parts of communities are dense areas and utilizes link communities instead of common node communities. Using Recurrent Chinese Restaurant Process and community structure of the network in the last snapshot, DBOCD simultaneously extracts the number of communities and soft community memberships of nodes while maintaining the consistency of communities over time. We evaluated DBOCD on both synthetic and real dynamic data-sets to assess its ability to find overlapping communities in different types of network evolution. The results show that DBOCD outperforms the recent state of the art dynamic community detection methods.
△ Less
Submitted 8 May, 2016;
originally announced May 2016.
-
Fundamental Limits of Pooled-DNA Sequencing
Authors:
Amir Najafi,
Damoun Nashta-ali,
Seyed Abolfazl Motahari,
Mehrdad Khani,
Babak H. Khalaj,
Hamid R. Rabiee
Abstract:
In this paper, fundamental limits in sequencing of a set of closely related DNA molecules are addressed. This problem is called pooled-DNA sequencing which encompasses many interesting problems such as haplotype phasing, metageomics, and conventional pooled-DNA sequencing in the absence of tagging. From an information theoretic point of view, we have proposed fundamental limits on the number and l…
▽ More
In this paper, fundamental limits in sequencing of a set of closely related DNA molecules are addressed. This problem is called pooled-DNA sequencing which encompasses many interesting problems such as haplotype phasing, metageomics, and conventional pooled-DNA sequencing in the absence of tagging. From an information theoretic point of view, we have proposed fundamental limits on the number and length of DNA reads in order to achieve a reliable assembly of all the pooled DNA sequences. In particular, pooled-DNA sequencing from both noiseless and noisy reads are investigated in this paper. In the noiseless case, necessary and sufficient conditions on perfect assembly are derived. Moreover, asymptotically tight lower and upper bounds on the error probability of correct assembly are obtained under a biologically plausible probabilistic model. For the noisy case, we have proposed two novel DNA read denoising methods, as well as corresponding upper bounds on assembly error probabilities. It has been shown that, under mild circumstances, the performance of the reliable assembly converges to that of the noiseless regime when, for a given read length, the number of DNA reads is sufficiently large. Interestingly, the emergence of long DNA read technologies in recent years envisions the applicability of our results in real-world applications.
△ Less
Submitted 19 April, 2016; v1 submitted 16 April, 2016;
originally announced April 2016.
-
A Bayesian Approach to the Data Description Problem
Authors:
Alireza Ghasemi,
Hamid R. Rabiee,
Mohammad T. Manzuri,
M. H. Rohban
Abstract:
In this paper, we address the problem of data description using a Bayesian framework. The goal of data description is to draw a boundary around objects of a certain class of interest to discriminate that class from the rest of the feature space. Data description is also known as one-class learning and has a wide range of applications.
The proposed approach uses a Bayesian framework to precisely…
▽ More
In this paper, we address the problem of data description using a Bayesian framework. The goal of data description is to draw a boundary around objects of a certain class of interest to discriminate that class from the rest of the feature space. Data description is also known as one-class learning and has a wide range of applications.
The proposed approach uses a Bayesian framework to precisely compute the class boundary and therefore can utilize domain information in form of prior knowledge in the framework. It can also operate in the kernel space and therefore recognize arbitrary boundary shapes. Moreover, the proposed method can utilize unlabeled data in order to improve accuracy of discrimination.
We evaluate our method using various real-world datasets and compare it with other state of the art approaches of data description. Experiments show promising results and improved performance over other data description and one-class learning algorithms.
△ Less
Submitted 24 February, 2016;
originally announced February 2016.
-
Active Learning from Positive and Unlabeled Data
Authors:
Alireza Ghasemi,
Hamid R. Rabiee,
Mohsen Fadaee,
Mohammad T. Manzuri,
Mohammad H. Rohban
Abstract:
During recent years, active learning has evolved into a popular paradigm for utilizing user's feedback to improve accuracy of learning algorithms. Active learning works by selecting the most informative sample among unlabeled data and querying the label of that point from user. Many different methods such as uncertainty sampling and minimum risk sampling have been utilized to select the most infor…
▽ More
During recent years, active learning has evolved into a popular paradigm for utilizing user's feedback to improve accuracy of learning algorithms. Active learning works by selecting the most informative sample among unlabeled data and querying the label of that point from user. Many different methods such as uncertainty sampling and minimum risk sampling have been utilized to select the most informative sample in active learning. Although many active learning algorithms have been proposed so far, most of them work with binary or multi-class classification problems and therefore can not be applied to problems in which only samples from one class as well as a set of unlabeled data are available.
Such problems arise in many real-world situations and are known as the problem of learning from positive and unlabeled data. In this paper we propose an active learning algorithm that can work when only samples of one class as well as a set of unlabelled data are available. Our method works by separately estimating probability desnity of positive and unlabeled points and then computing expected value of informativeness to get rid of a hyper-parameter and have a better measure of informativeness./ Experiments and empirical analysis show promising results compared to other similar methods.
△ Less
Submitted 24 February, 2016;
originally announced February 2016.
-
Correlated Cascades: Compete or Cooperate
Authors:
Ali Zarezade,
Ali Khodadadi,
Mehrdad Farajtabar,
Hamid R. Rabiee,
Hongyuan Zha
Abstract:
In real world social networks, there are multiple cascades which are rarely independent. They usually compete or cooperate with each other. Motivated by the reinforcement theory in sociology we leverage the fact that adoption of a user to any behavior is modeled by the aggregation of behaviors of its neighbors. We use a multidimensional marked Hawkes process to model users product adoption and con…
▽ More
In real world social networks, there are multiple cascades which are rarely independent. They usually compete or cooperate with each other. Motivated by the reinforcement theory in sociology we leverage the fact that adoption of a user to any behavior is modeled by the aggregation of behaviors of its neighbors. We use a multidimensional marked Hawkes process to model users product adoption and consequently spread of cascades in social networks. The resulting inference problem is proved to be convex and is solved in parallel by using the barrier method. The advantage of the proposed model is twofold; it models correlated cascades and also learns the latent diffusion network. Experimental results on synthetic and two real datasets gathered from Twitter, URL shortening and music streaming services, illustrate the superior performance of the proposed model over the alternatives.
△ Less
Submitted 22 November, 2016; v1 submitted 4 October, 2015;
originally announced October 2015.
-
Diffusion-Aware Sampling and Estimation in Information Diffusion Networks
Authors:
Motahareh Eslami Mehdiabadi,
Hamid R. Rabiee,
Mostafa Salehi
Abstract:
Partially-observed data collected by sampling methods is often being studied to obtain the characteristics of information diffusion networks. However, these methods usually do not consider the behavior of diffusion process. In this paper, we propose a novel two-step (sampling/estimation) measurement framework by utilizing the diffusion process characteristics. To this end, we propose a link-tracin…
▽ More
Partially-observed data collected by sampling methods is often being studied to obtain the characteristics of information diffusion networks. However, these methods usually do not consider the behavior of diffusion process. In this paper, we propose a novel two-step (sampling/estimation) measurement framework by utilizing the diffusion process characteristics. To this end, we propose a link-tracing based sampling design which uses the infection times as local information without any knowledge about the latent structure of diffusion network. To correct the bias of sampled data, we introduce three estimators for different categories; link-based, node-based, and cascade-based. To the best of our knowledge, this is the first attempt to introduce a complete measurement framework for diffusion networks. We also show that the estimator plays an important role in correcting the bias of sampling from diffusion networks. Our comprehensive empirical analysis over large synthetic and real datasets demonstrates that in average, the proposed framework outperforms the common BFS and RW sampling methods in terms of link-based characteristics by about 37% and 35%, respectively.
△ Less
Submitted 29 May, 2014;
originally announced May 2014.
-
Sampling from Diffusion Networks
Authors:
Motahareh Eslami Mehdiabadi,
Hamid R. Rabiee,
Mostafa Salehi
Abstract:
The diffusion phenomenon has a remarkable impact on Online Social Networks (OSNs). Gathering diffusion data over these large networks encounters many challenges which can be alleviated by adopting a suitable sampling approach. The contributions of this paper is twofold. First we study the sampling approaches over diffusion networks, and for the first time, classify these approaches into two catego…
▽ More
The diffusion phenomenon has a remarkable impact on Online Social Networks (OSNs). Gathering diffusion data over these large networks encounters many challenges which can be alleviated by adopting a suitable sampling approach. The contributions of this paper is twofold. First we study the sampling approaches over diffusion networks, and for the first time, classify these approaches into two categories; (1) Structure-based Sampling (SBS), and (2) Diffusion-based Sampling (DBS). The dependency of the former approach to topological features of the network, and unavailability of real diffusion paths in the latter, converts the problem of choosing an appropriate sampling approach to a trade-off. Second, we formally define the diffusion network sampling problem and propose a number of new diffusion-based characteristics to evaluate introduced sampling approaches. Our experiments on large scale synthetic and real datasets show that although DBS performs much better than SBS in higher sampling rates (16% ~ 29% on average), their performances differ about 7% in lower sampling rates. Therefore, in real large scale systems with low sampling rate requirements, SBS would be a better choice according to its lower time complexity in gathering data compared to DBS. Moreover, we show that the introduced sampling approaches (SBS and DBS) play a more important role than the graph exploration techniques such as Breadth-First Search (BFS) and Random Walk (RW) in the analysis of diffusion processes.
△ Less
Submitted 28 May, 2014;
originally announced May 2014.
-
A Measurement Framework for Directed Networks
Authors:
Mostafa Salehi,
Hamid R. Rabiee
Abstract:
Partially-observed network data collected by link-tracing based sampling methods is often being studied to obtain the characteristics of a large complex network. However, little attention has been paid to sampling from directed networks such as WWW and Peer-to-Peer networks. In this paper, we propose a novel two-step (sampling/estimation) framework to measure nodal characteristics which can be def…
▽ More
Partially-observed network data collected by link-tracing based sampling methods is often being studied to obtain the characteristics of a large complex network. However, little attention has been paid to sampling from directed networks such as WWW and Peer-to-Peer networks. In this paper, we propose a novel two-step (sampling/estimation) framework to measure nodal characteristics which can be defined by an average target function in an arbitrary directed network. To this end, we propose a personalized PageRank-based algorithm to visit and sample nodes. This algorithm only uses already visited nodes as local information without any prior knowledge about the latent structure of the network. Moreover, we introduce a new estimator based on the approximate importance sampling to estimate average target functions. The proposed estimator utilizes calculated PageRank value of each sampled node as an approximation for the exact visiting probability. To the best of our knowledge, this is the first study on correcting the bias of a sampling method by re-weighting of measured values that considers the effect of approximation of visiting probabilities. Comprehensive theoretical and empirical analysis of the estimator demonstrate that it is asymptotically unbiased even in situations where stationary distribution of PageRank is poorly approximated.
△ Less
Submitted 27 May, 2014;
originally announced May 2014.
-
Patchwise Joint Sparse Tracking with Occlusion Detection
Authors:
Ali Zarezade,
Hamid R. Rabiee,
Ali Soltani-Farani,
Ahmad Khajenezhad
Abstract:
This paper presents a robust tracking approach to handle challenges such as occlusion and appearance change. Here, the target is partitioned into a number of patches. Then, the appearance of each patch is modeled using a dictionary composed of corresponding target patches in previous frames. In each frame, the target is found among a set of candidates generated by a particle filter, via a likeliho…
▽ More
This paper presents a robust tracking approach to handle challenges such as occlusion and appearance change. Here, the target is partitioned into a number of patches. Then, the appearance of each patch is modeled using a dictionary composed of corresponding target patches in previous frames. In each frame, the target is found among a set of candidates generated by a particle filter, via a likelihood measure that is shown to be proportional to the sum of patch-reconstruction errors of each candidate. Since the target's appearance often changes slowly in a video sequence, it is assumed that the target in the current frame and the best candidates of a small number of previous frames, belong to a common subspace. This is imposed using joint sparse representation to enforce the target and previous best candidates to have a common sparsity pattern. Moreover, an occlusion detection scheme is proposed that uses patch-reconstruction errors and a prior probability of occlusion, extracted from an adaptive Markov chain, to calculate the probability of occlusion per patch. In each frame, occluded patches are excluded when updating the dictionary. Extensive experimental results on several challenging sequences shows that the proposed method outperforms state-of-the-art trackers.
△ Less
Submitted 5 February, 2014;
originally announced February 2014.
-
Local Similarities, Global Coding: An Algorithm for Feature Coding and its Applications
Authors:
Amirreza Shaban,
Hamid R. Rabiee,
Mahyar Najibi
Abstract:
Data coding as a building block of several image processing algorithms has been received great attention recently. Indeed, the importance of the locality assumption in coding approaches is studied in numerous works and several methods are proposed based on this concept. We probe this assumption and claim that taking the similarity between a data point and a more global set of anchor points does no…
▽ More
Data coding as a building block of several image processing algorithms has been received great attention recently. Indeed, the importance of the locality assumption in coding approaches is studied in numerous works and several methods are proposed based on this concept. We probe this assumption and claim that taking the similarity between a data point and a more global set of anchor points does not necessarily weaken the coding method as long as the underlying structure of the anchor points are taken into account. Based on this fact, we propose to capture this underlying structure by assuming a random walker over the anchor points. We show that our method is a fast approximate learning algorithm based on the diffusion map kernel. The experiments on various datasets show that making different state-of-the-art coding algorithms aware of this structure boosts them in different learning tasks.
△ Less
Submitted 5 March, 2014; v1 submitted 23 November, 2013;
originally announced November 2013.
-
Spatial-Aware Dictionary Learning for Hyperspectral Image Classification
Authors:
Ali Soltani-Farani,
Hamid R. Rabiee,
Seyyed Abbas Hosseini
Abstract:
This paper presents a structured dictionary-based model for hyperspectral data that incorporates both spectral and contextual characteristics of a spectral sample, with the goal of hyperspectral image classification. The idea is to partition the pixels of a hyperspectral image into a number of spatial neighborhoods called contextual groups and to model each pixel with a linear combination of a few…
▽ More
This paper presents a structured dictionary-based model for hyperspectral data that incorporates both spectral and contextual characteristics of a spectral sample, with the goal of hyperspectral image classification. The idea is to partition the pixels of a hyperspectral image into a number of spatial neighborhoods called contextual groups and to model each pixel with a linear combination of a few dictionary elements learned from the data. Since pixels inside a contextual group are often made up of the same materials, their linear combinations are constrained to use common elements from the dictionary. To this end, dictionary learning is carried out with a joint sparse regularizer to induce a common sparsity pattern in the sparse coefficients of each contextual group. The sparse coefficients are then used for classification using a linear SVM. Experimental results on a number of real hyperspectral images confirm the effectiveness of the proposed representation for hyperspectral image classification. Moreover, experiments with simulated multispectral data show that the proposed model is capable of finding representations that may effectively be used for classification of multispectral-resolution samples.
△ Less
Submitted 6 August, 2013;
originally announced August 2013.
-
Incorporating Betweenness Centrality in Compressive Sensing for Congestion Detection
Authors:
Hoda S. Ayatollahi Tabatabaii,
Hamid R. Rabiee,
Mohammad Hossein Rohban,
Mostafa Salehi
Abstract:
This paper presents a new Compressive Sensing (CS) scheme for detecting network congested links. We focus on decreasing the required number of measurements to detect all congested links in the context of network tomography. We have expanded the LASSO objective function by adding a new term corresponding to the prior knowledge based on the relationship between the congested links and the correspond…
▽ More
This paper presents a new Compressive Sensing (CS) scheme for detecting network congested links. We focus on decreasing the required number of measurements to detect all congested links in the context of network tomography. We have expanded the LASSO objective function by adding a new term corresponding to the prior knowledge based on the relationship between the congested links and the corresponding link Betweenness Centrality (BC). The accuracy of the proposed model is verified by simulations on two real datasets. The results demonstrate that our model outperformed the state-of-the-art CS based method with significant improvements in terms of F-Score.
△ Less
Submitted 22 January, 2013;
originally announced January 2013.
-
Crowd Labeling: a survey
Authors:
Jafar Muhammadi,
Hamid Reza Rabiee,
Abbas Hosseini
Abstract:
Recently, there has been a burst in the number of research projects on human computation via crowdsourcing. Multiple choice (or labeling) questions could be referred to as a common type of problem which is solved by this approach. As an application, crowd labeling is applied to find true labels for large machine learning datasets. Since crowds are not necessarily experts, the labels they provide a…
▽ More
Recently, there has been a burst in the number of research projects on human computation via crowdsourcing. Multiple choice (or labeling) questions could be referred to as a common type of problem which is solved by this approach. As an application, crowd labeling is applied to find true labels for large machine learning datasets. Since crowds are not necessarily experts, the labels they provide are rather noisy and erroneous. This challenge is usually resolved by collecting multiple labels for each sample, and then aggregating them to estimate the true label. Although the mechanism leads to high-quality labels, it is not actually cost-effective. As a result, efforts are currently made to maximize the accuracy in estimating true labels, while fixing the number of acquired labels.
This paper surveys methods to aggregate redundant crowd labels in order to estimate unknown true labels. It presents a unified statistical latent model where the differences among popular methods in the field correspond to different choices for the parameters of the model. Afterwards, algorithms to make inference on these models will be surveyed. Moreover, adaptive methods which iteratively collect labels based on the previously collected labels and estimated models will be discussed. In addition, this paper compares the distinguished methods, and provides guidelines for future work required to address the current open issues.
△ Less
Submitted 3 September, 2014; v1 submitted 13 January, 2013;
originally announced January 2013.