Showing 1–21 of 21 results for author: Ting, M

  1. arXiv:2403.10802  [pdf, other]

    cs.LG

    Anomaly Detection Based on Isolation Mechanisms: A Survey

    Authors: Yang Cao, Haolong Xiang, Hang Zhang, Ye Zhu, Kai Ming Ting

    Abstract: Anomaly detection is a longstanding and active research area that has many applications in domains such as finance, security, and manufacturing. However, the efficiency and performance of anomaly detection algorithms are challenged by the large-scale, high-dimensional, and heterogeneous data that are prevalent in the era of big data. Isolation-based unsupervised anomaly detection is a novel and ef…

    Submitted 16 March, 2024; originally announced March 2024.

  2. arXiv:2310.05123  [pdf, other]

    cs.AI

    Distribution-Based Trajectory Clustering

    Authors: Zi Jing Wang, Ye Zhu, Kai Ming Ting

    Abstract: Trajectory clustering enables the discovery of common patterns in trajectory data. Current methods of trajectory clustering rely on a distance measure between two points in order to measure the dissimilarity between two trajectories. The distance measures employed have two challenges: high computational cost and low fidelity. Independent of the distance measure employed, existing clustering algori…

    Submitted 30 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  3. arXiv:2307.03930  [pdf, other]

    cs.LG cs.AR cs.PF cs.PL

    Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels

    Authors: Vikas Natesh, Andrew Sabot, H. T. Kung, Mark Ting

    Abstract: We propose Rosko -- row skipping outer products -- for deriving sparse matrix multiplication (SpMM) kernels in reducing computation and memory access requirements of deep neural networks (DNNs). Rosko allows skipping of entire row computations during program execution with low sparsity-management overheads. We analytically derive sparse CPU kernels that adapt to given hardware characteristics to e…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Rosko's CPU implementation can be found at https://github.com/vnatesh/Rosko

  4. arXiv:2301.06794  [pdf, other]

    cs.LG cs.SI

    Subgraph Centralization: A Necessary Step for Graph Anomaly Detection

    Authors: Zhong Zhuang, Kai Ming Ting, Guansong Pang, Shuaibin Song

    Abstract: Graph anomaly detection has attracted a lot of interest recently. Despite their successes, existing detectors have at least two of the three weaknesses: (a) high computational cost which limits them to small-scale networks only; (b) existing treatment of subgraphs produces suboptimal detection accuracy; and (c) unable to provide an explanation as to why a node is anomalous, once it is identified.…

    Submitted 17 January, 2023; originally announced January 2023.

    Comments: To be published in SDM2023

  5. arXiv:2301.00393  [pdf, other]

    cs.LG

    A principled distributional approach to trajectory similarity measurement

    Authors: Yufan Wang, Kai Ming Ting, Yuanyi Shang

    Abstract: Existing measures and representations for trajectories have two longstanding fundamental shortcomings, i.e., they are computationally expensive and they cannot guarantee the 'uniqueness' property of a distance function: dist(X,Y) = 0 if and only if X=Y, where X and Y are two trajectories. This paper proposes a simple yet powerful way to represent trajectories and measure the similarity betwee…

    Submitted 1 January, 2023; originally announced January 2023.

  6. Detecting Change Intervals with Isolation Distributional Kernel

    Authors: Yang Cao, Ye Zhu, Kai Ming Ting, Flora D. Salim, Hong Xian Li, Luxing Yang, Gang Li

    Abstract: Detecting abrupt changes in data distribution is one of the most significant tasks in streaming data analysis. Although many unsupervised Change-Point Detection (CPD) methods have been proposed recently to identify those changes, they still suffer from missing subtle changes, poor scalability, and/or sensitivity to outliers. To meet these challenges, we are the first to generalise the CPD problem…

    Submitted 18 January, 2024; v1 submitted 30 December, 2022; originally announced December 2022.

    Journal ref: Journal of Artificial Intelligence Research, 2024, 79: 273-306

  7. arXiv:2109.14198  [pdf, other]

    cs.LG

    Breaking the curse of dimensionality with Isolation Kernel

    Authors: Kai Ming Ting, Takashi Washio, Ye Zhu, Yang Xu

    Abstract: The curse of dimensionality has been studied in different aspects. However, breaking the curse has been elusive. We show for the first time that it is possible to break the curse using the recently introduced Isolation Kernel. We show that only Isolation Kernel performs consistently well in indexed search, spectral & density peaks clustering, SVM classification and t-SNE visualization in both low…

    Submitted 29 September, 2021; originally announced September 2021.

  8. The Impact of Isolation Kernel on Agglomerative Hierarchical Clustering Algorithms

    Authors: Xin Han, Ye Zhu, Kai Ming Ting, Gang Li

    Abstract: Agglomerative hierarchical clustering (AHC) is one of the popular clustering approaches. Existing AHC methods, which are based on a distance measure, have one key issue: they have difficulty in identifying adjacent clusters with varied densities, regardless of the cluster extraction methods applied on the resultant dendrogram. In this paper, we identify the root cause of this issue and show that the…

    Submitted 12 October, 2020; originally announced October 2020.

    Journal ref: Han, X., Zhu, Y., Ting, K. M., & Li, G. (2023). The impact of isolation kernel on agglomerative hierarchical clustering algorithms. Pattern Recognition, 139, 109517

  9. arXiv:2009.12196  [pdf, other]

    cs.LG stat.ML

    Isolation Distributional Kernel: A New Tool for Point & Group Anomaly Detection

    Authors: Kai Ming Ting, Bi-Cun Xu, Takashi Washio, Zhi-Hua Zhou

    Abstract: We introduce Isolation Distributional Kernel as a new way to measure the similarity between two distributions. Existing approaches based on kernel mean embedding, which convert a point kernel to a distributional kernel, have two key issues: the point kernel employed has a feature map with intractable dimensionality; and it is data independent. This paper shows that Isolation Distributional K…

    Submitted 24 September, 2020; originally announced September 2020.

    Comments: 14 pages

  10. arXiv:2004.13550   

    cs.LG stat.ML

    A new effective and efficient measure for outlying aspect mining

    Authors: Durgesh Samariya, Sunil Aryal, Kai Ming Ting

    Abstract: Outlying Aspect Mining (OAM) aims to find the subspaces (a.k.a. aspects) in which a given query is an outlier with respect to a given dataset. Existing OAM algorithms use traditional distance/density-based outlier scores to rank subspaces. Because these distance/density-based scores depend on the dimensionality of subspaces, they cannot be compared directly between subspaces of different dimension…

    Submitted 26 May, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

    Comments: Co-authors do not agree with the submission of the paper to arXiv

  11. arXiv:2002.05815  [pdf, other]

    cs.LG stat.ML

    Point-Set Kernel Clustering

    Authors: Kai Ming Ting, Jonathan R. Wells, Ye Zhu

    Abstract: Measuring similarity between two objects is the core operation in existing clustering algorithms in grouping similar objects into clusters. This paper introduces a new similarity measure called point-set kernel which computes the similarity between an object and a set of objects. The proposed clustering procedure utilizes this new measure to characterize every cluster grown from a seed object. We…

    Submitted 6 January, 2022; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: Updated the paper

  12. arXiv:1907.01104  [pdf, other]

    cs.LG stat.ML

    Isolation Kernel: The X Factor in Efficient and Effective Large Scale Online Kernel Learning

    Authors: Kai Ming Ting, Jonathan R. Wells, Takashi Washio

    Abstract: Large scale online kernel learning aims to build an efficient and scalable kernel-based predictive model incrementally from a sequence of potentially infinite data points. A current key approach focuses on ways to produce an approximate finite-dimensional feature map, assuming that the kernel used has a feature map with intractable dimensionality, an assumption traditionally held in kernel-based…

    Submitted 24 September, 2019; v1 submitted 1 July, 2019; originally announced July 2019.

    Comments: Textual updates. Restructured section 8.4 including additional experimental results

  13. Nearest-Neighbour-Induced Isolation Similarity and its Impact on Density-Based Clustering

    Authors: Xiaoyu Qin, Kai Ming Ting, Ye Zhu, Vincent CS Lee

    Abstract: A recent proposal of data dependent similarity called Isolation Kernel/Similarity has enabled SVM to produce better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity; and propose a nearest neighbour method instead. We formally prove the characteristic of Isolation Similarity with the use of the proposed method. The impact of Isolation Simila…

    Submitted 30 June, 2019; originally announced July 2019.

    Journal ref: Qin, Xiaoyu, et al. "Nearest-neighbour-induced isolation similarity and its impact on density-based clustering." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. No. 01. 2019

  14. arXiv:1906.09744  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Improving the Effectiveness and Efficiency of Stochastic Neighbour Embedding with Isolation Kernel

    Authors: Ye Zhu, Kai Ming Ting

    Abstract: This paper presents a new insight into improving the performance of Stochastic Neighbour Embedding (t-SNE) by using Isolation kernel instead of Gaussian kernel. Isolation kernel outperforms Gaussian kernel in two aspects. First, the use of Isolation kernel in t-SNE overcomes the drawback of misrepresenting some structures in the data, which often occurs when Gaussian kernel is applied in t-SNE. Th…

    Submitted 8 July, 2021; v1 submitted 24 June, 2019; originally announced June 2019.

    Journal ref: Zhu, Y., & Ting, K. M. (2021). Improving the effectiveness and efficiency of stochastic neighbour embedding with isolation kernel. Journal of Artificial Intelligence Research, 71, 667-695

  15. arXiv:1902.03402  [pdf, ps, other]

    cs.IR

    A new simple and effective measure for bag-of-word inter-document similarity measurement

    Authors: Sunil Aryal, Kai Ming Ting, Takashi Washio, Gholamreza Haffari

    Abstract: To measure the similarity of two documents in the bag-of-words (BoW) vector representation, different term weighting schemes are used to improve the performance of cosine similarity, the most widely used inter-document similarity measure in text mining. In this paper, we identify the shortcomings of the underlying assumptions of term weighting in the inter-document similarity measurement task; an…

    Submitted 9 February, 2019; originally announced February 2019.

  16. arXiv:1810.03393  [pdf, other]

    cs.LG cs.AI stat.ML

    Hierarchical clustering that takes advantage of both density-peak and density-connectivity

    Authors: Ye Zhu, Kai Ming Ting, Yuan Jin, Maia Angelova

    Abstract: This paper focuses on density-based clustering, particularly the Density Peak (DP) algorithm and the one based on density-connectivity DBSCAN; and proposes a new method which takes advantage of the individual strengths of these two methods to yield a density-based hierarchical clustering algorithm. Our investigation begins with formally defining the types of clusters DP and DBSCAN are designed to…

    Submitted 20 September, 2021; v1 submitted 8 October, 2018; originally announced October 2018.

    Journal ref: Zhu, Y., Ting, K. M., Jin, Y., & Angelova, M. (2022). Hierarchical clustering that takes advantage of both density-peak and density-connectivity. Information Systems, 103, 101871

  17. arXiv:1810.02897  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    CDF Transform-and-Shift: An effective way to deal with datasets of inhomogeneous cluster densities

    Authors: Ye Zhu, Kai Ming Ting, Mark Carman, Maia Angelova

    Abstract: The problem of inhomogeneous cluster densities has been a long-standing issue for distance-based and density-based algorithms in clustering and anomaly detection. These algorithms implicitly assume that all clusters have approximately the same density. As a result, they often exhibit a bias towards dense clusters in the presence of sparse clusters. Many remedies have been suggested; yet, we show t…

    Submitted 12 April, 2021; v1 submitted 5 October, 2018; originally announced October 2018.

    Comments: Pattern Recognition (2021)

    Journal ref: Zhu Y, Ting K M, Carman M J, et al. CDF Transform-and-Shift: An effective way to deal with datasets of inhomogeneous cluster densities. Pattern Recognition, 2021, 117: 107977

  18. arXiv:1809.01245  [pdf, other]

    cs.GT

    Maximizing net income of the auction waterfall with an abort decision tree

    Authors: Michael Ting, Nicolas Grislain

    Abstract: An online auction waterfall for an ad impression may contain auctions that are unlikely to result in a winning bid. Instead of always running through the full auction sequence, one could reduce the transaction cost by predicting and skipping these auctions. In this paper, we derive the auction abort rule that maximizes the net income of the waterfall under certain conditions, knowing only the publ…

    Submitted 4 September, 2018; originally announced September 2018.

    Comments: 4 pages, 2 figures

  19. arXiv:1707.00783  [pdf, other]

    cs.LG stat.ML

    A simple efficient density estimator that enables fast systematic search

    Authors: Jonathan R. Wells, Kai Ming Ting

    Abstract: This paper introduces a simple and efficient density estimator that enables fast systematic search. To show its advantage over the commonly used kernel density estimator, we apply it to outlying aspects mining. Outlying aspects mining discovers feature subsets (or subspaces) that describe how a query stands out from a given dataset. The task demands a systematic search of subspaces. We identify that ex…

    Submitted 12 September, 2017; v1 submitted 3 July, 2017; originally announced July 2017.

    Comments: Corrected typos in the reference section and added an acknowledgement on the first page

  20. arXiv:1605.09131  [pdf, ps, other]

    cs.LG

    Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees

    Authors: Xin Mu, Kai Ming Ting, Zhi-Hua Zhou

    Abstract: This paper investigates an important problem in stream mining, i.e., classification under streaming emerging new classes or SENC. The common approach is to treat it as a classification problem and solve it using either a supervised learner or a semi-supervised learner. We propose an alternative approach by using unsupervised learning as the basis to solve this problem. The SENC problem can be deco…

    Submitted 30 May, 2016; originally announced May 2016.

  21. Issues in Stacked Generalization

    Authors: K. M. Ting, I. H. Witten

    Abstract: Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a 'black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the…

    Submitted 26 May, 2011; originally announced May 2011.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 10, pages 271-289, 1999