subscribe to arXiv mailings

EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network

Authors: Yining Shi, Kun Jiang, Ke Wang, Kangan Qian, Yunlong Wang, Jiusi Li, Tuopu Wen, Mengmeng Yang, Yiliang Xu, Diange Yang

Abstract: 3D occupancy prediction (Occ) is a rapidly rising challenging perception task in the field of autonomous driving which represents the driving scene as uniformly partitioned 3D voxel grids with semantics. Compared to 3D object detection, grid perception has great advantage of better recognizing irregularly shaped, unknown category, or partially occluded general objects. However, existing 3D occupan… ▽ More 3D occupancy prediction (Occ) is a rapidly rising challenging perception task in the field of autonomous driving which represents the driving scene as uniformly partitioned 3D voxel grids with semantics. Compared to 3D object detection, grid perception has great advantage of better recognizing irregularly shaped, unknown category, or partially occluded general objects. However, existing 3D occupancy networks (occnets) are both computationally heavy and label-hungry. In terms of model complexity, occnets are commonly composed of heavy Conv3D modules or transformers on the voxel level. In terms of label annotations requirements, occnets are supervised with large-scale expensive dense voxel labels. Model and data inefficiency, caused by excessive network parameters and label annotations requirement, severely hinder the onboard deployment of occnets. This paper proposes an efficient 3d occupancy network (EFFOcc), that targets the minimal network complexity and label requirement while achieving state-of-the-art accuracy. EFFOcc only uses simple 2D operators, and improves Occ accuracy to the state-of-the-art on multiple large-scale benchmarks: Occ3D-nuScenes, Occ3D-Waymo, and OpenOccupancy-nuScenes. On Occ3D-nuScenes benchmark, EFFOcc has only 18.4M parameters, and achieves 50.46 in terms of mean IoU (mIoU), to our knowledge, it is the occnet with minimal parameters compared with related occnets. Moreover, we propose a two-stage active learning strategy to reduce the requirements of labelled data. Active EFFOcc trained with 6\% labelled voxels achieves 47.19 mIoU, which is 95.7% fully supervised performance. The proposed EFFOcc also supports improved vision-only occupancy prediction with the aid of region-decomposed distillation. Code and demo videos will be available at https://github.com/synsin0/EFFOcc. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: preprint under review

arXiv:2405.02008 [pdf, other]

DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model

Authors: Peijin Jia, Tuopu Wen, Ziang Luo, Mengmeng Yang, Kun Jiang, Zhiquan Lei, Xuewei Tang, Ziyuan Liu, Le Cui, Kehua Sheng, Bo Zhang, Diange Yang

Abstract: Constructing high-definition (HD) maps is a crucial requirement for enabling autonomous driving. In recent years, several map segmentation algorithms have been developed to address this need, leveraging advancements in Bird's-Eye View (BEV) perception. However, existing models still encounter challenges in producing realistic and consistent semantic map layouts. One prominent issue is the limited… ▽ More Constructing high-definition (HD) maps is a crucial requirement for enabling autonomous driving. In recent years, several map segmentation algorithms have been developed to address this need, leveraging advancements in Bird's-Eye View (BEV) perception. However, existing models still encounter challenges in producing realistic and consistent semantic map layouts. One prominent issue is the limited utilization of structured priors inherent in map segmentation masks. In light of this, we propose DiffMap, a novel approach specifically designed to model the structured priors of map segmentation masks using latent diffusion model. By incorporating this technique, the performance of existing semantic segmentation methods can be significantly enhanced and certain structural errors present in the segmentation outputs can be effectively rectified. Notably, the proposed module can be seamlessly integrated into any map segmentation model, thereby augmenting its capability to accurately delineate semantic information. Furthermore, through extensive visualization analysis, our model demonstrates superior proficiency in generating results that more accurately reflect real-world map layouts, further validating its efficacy in improving the quality of the generated maps. △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.18392 [pdf, other]

Dflow, a Python framework for constructing cloud-native AI-for-Science workflows

Authors: Xinzijian Liu, Yanbo Han, Zhuoyuan Li, Jiahao Fan, Chengqian Zhang, Jinzhe Zeng, Yifan Shan, Yannan Yuan, Wei-Hong Xu, Yun-Pei Liu, Yuzhi Zhang, Tongqi Wen, Darrin M. York, Zhicheng Zhong, Hang Zheng, Jun Cheng, Linfeng Zhang, Han Wang

Abstract: In the AI-for-science era, scientific computing scenarios such as concurrent learning and high-throughput computing demand a new generation of infrastructure that supports scalable computing resources and automated workflow management on both cloud and high-performance supercomputers. Here we introduce Dflow, an open-source Python toolkit designed for scientists to construct workflows with simple… ▽ More In the AI-for-science era, scientific computing scenarios such as concurrent learning and high-throughput computing demand a new generation of infrastructure that supports scalable computing resources and automated workflow management on both cloud and high-performance supercomputers. Here we introduce Dflow, an open-source Python toolkit designed for scientists to construct workflows with simple programming interfaces. It enables complex process control and task scheduling across a distributed, heterogeneous infrastructure, leveraging containers and Kubernetes for flexibility. Dflow is highly observable and can scale to thousands of concurrent nodes per workflow, enhancing the efficiency of complex scientific computing tasks. The basic unit in Dflow, known as an Operation (OP), is reusable and independent of the underlying infrastructure or context. Dozens of workflow projects have been developed based on Dflow, spanning a wide range of projects. We anticipate that the reusability of Dflow and its components will encourage more scientists to publish their workflows and OP components. These components, in turn, can be adapted and reused in various contexts, fostering greater collaboration and innovation in the scientific community. △ Less

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.06687 [pdf, other]

Fast and Accurate Relative Motion Tracking for Two Industrial Robots

Authors: Honglu He, Chen-lung Lu, Glenn Saunders, Pinghai Yang, Jeffrey Schoonover, John Wason, Santiago Paternain, Agung Julius, John T. Wen

Abstract: Industrial robotic applications such as spraying, welding, and additive manufacturing frequently require fast, accurate, and uniform motion along a 3D spatial curve. To increase process throughput, some manufacturers propose a dual-robot setup to overcome the speed limitation of a single robot. Industrial robot motion is programmed through waypoints connected by motion primitives (Cartesian linear… ▽ More Industrial robotic applications such as spraying, welding, and additive manufacturing frequently require fast, accurate, and uniform motion along a 3D spatial curve. To increase process throughput, some manufacturers propose a dual-robot setup to overcome the speed limitation of a single robot. Industrial robot motion is programmed through waypoints connected by motion primitives (Cartesian linear and circular paths and linear joint paths at constant Cartesian speed). The actual robot motion is affected by the blending between these motion primitives and the pose of the robot (an outstretched/close to singularity pose tends to have larger path-tracking errors). Choosing the waypoints and the speed along each motion segment to achieve the performance requirement is challenging. At present, there is no automated solution, and laborious manual tuning by robot experts is needed to approach the desired performance. In this paper, we present a systematic three-step approach to designing and programming a dual-robot system to optimize system performance. The first step is to select the relative placement between the two robots based on the specified relative motion path. The second step is to select the relative waypoints and the motion primitives. The final step is to update the waypoints iteratively based on the actual relative motion. Waypoint iteration is first executed in simulation and then completed using the actual robots. For performance measures, we use the mean path speed subject to the relative position and orientation constraints and the path speed uniformity constraint. We have demonstrated the effectiveness of this method with ABB and FANUC robots on two challenging test curves. The performance improvement over the current industrial practice baseline is over 300%. Compared to the optimized single-arm case that we have previously reported, the improvement is over 14%. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2401.12007 [pdf, other]

Tensor-view Topological Graph Neural Network

Authors: Tao Wen, Elynn Chen, Yuzhou Chen

Abstract: Graph classification is an important learning task for graph-structured data. Graph neural networks (GNNs) have recently gained growing attention in graph learning and have shown significant improvements in many important graph problems. Despite their state-of-the-art performances, existing GNNs only use local information from a very limited neighborhood around each node, suffering from loss of mu… ▽ More Graph classification is an important learning task for graph-structured data. Graph neural networks (GNNs) have recently gained growing attention in graph learning and have shown significant improvements in many important graph problems. Despite their state-of-the-art performances, existing GNNs only use local information from a very limited neighborhood around each node, suffering from loss of multi-modal information and overheads of excessive computation. To address these issues, we propose a novel Tensor-view Topological Graph Neural Network (TTG-NN), a class of simple yet effective topological deep learning built upon persistent homology, graph convolution, and tensor operations. This new method incorporates tensor learning to simultaneously capture Tensor-view Topological (TT), as well as Tensor-view Graph (TG) structural information on both local and global levels. Computationally, to fully exploit graph topology and structure, we propose two flexible TT and TG representation learning modules that disentangle feature tensor aggregation and transformation and learn to preserve multi-modal structure with less computation. Theoretically, we derive high probability bounds on both the out-of-sample and in-sample mean squared approximation errors for our proposed Tensor Transformation Layer (TTL). Real data experiments show that the proposed TTG-NN outperforms 20 state-of-the-art methods on various graph benchmarks. △ Less

Submitted 29 January, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: Accepted at AISTATS 2024

arXiv:2401.03685 [pdf, other]

Logits Poisoning Attack in Federated Distillation

Authors: Yuhan Tang, Zhiyuan Wu, Bo Gao, Tian Wen, Yuwei Wang, Sheng Sun

Abstract: Federated Distillation (FD) is a novel and promising distributed machine learning paradigm, where knowledge distillation is leveraged to facilitate a more efficient and flexible cross-device knowledge transfer in federated learning. By optimizing local models with knowledge distillation, FD circumvents the necessity of uploading large-scale model parameters to the central server, simultaneously pr… ▽ More Federated Distillation (FD) is a novel and promising distributed machine learning paradigm, where knowledge distillation is leveraged to facilitate a more efficient and flexible cross-device knowledge transfer in federated learning. By optimizing local models with knowledge distillation, FD circumvents the necessity of uploading large-scale model parameters to the central server, simultaneously preserving the raw data on local clients. Despite the growing popularity of FD, there is a noticeable gap in previous works concerning the exploration of poisoning attacks within this framework. This can lead to a scant understanding of the vulnerabilities to potential adversarial actions. To this end, we introduce FDLA, a poisoning attack method tailored for FD. FDLA manipulates logit communications in FD, aiming to significantly degrade model performance on clients through misleading the discrimination of private samples. Through extensive simulation experiments across a variety of datasets, attack scenarios, and FD configurations, we demonstrate that LPA effectively compromises client model accuracy, outperforming established baseline algorithms in this regard. Our findings underscore the critical need for robust defense mechanisms in FD settings to mitigate such adversarial threats. △ Less

Submitted 8 January, 2024; originally announced January 2024.

Comments: 13 pages, 3 figures, 5 tables

arXiv:2312.04166 [pdf, other]

Improving Communication Efficiency of Federated Distillation via Accumulating Local Updates

Authors: Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Tian Wen, Wen Wang

Abstract: As an emerging federated learning paradigm, federated distillation enables communication-efficient model training by transmitting only small-scale knowledge during the learning process. To further improve the communication efficiency of federated distillation, we propose a novel technique, ALU, which accumulates multiple rounds of local updates before transferring the knowledge to the central serv… ▽ More As an emerging federated learning paradigm, federated distillation enables communication-efficient model training by transmitting only small-scale knowledge during the learning process. To further improve the communication efficiency of federated distillation, we propose a novel technique, ALU, which accumulates multiple rounds of local updates before transferring the knowledge to the central server. ALU drastically decreases the frequency of communication in federated distillation, thereby significantly reducing the communication overhead during the training process. Empirical experiments demonstrate the substantial effect of ALU in improving the communication efficiency of federated distillation. △ Less

Submitted 7 December, 2023; originally announced December 2023.

Comments: 2 pages, 3 figures

arXiv:2311.16410 [pdf, other]

Reduced-order modeling for parameterized PDEs via implicit neural representations

Authors: Tianshu Wen, Kookjin Lee, Youngsoo Choi

Abstract: We present a new data-driven reduced-order modeling approach to efficiently solve parametrized partial differential equations (PDEs) for many-query problems. This work is inspired by the concept of implicit neural representation (INR), which models physics signals in a continuous manner and independent of spatial/temporal discretization. The proposed framework encodes PDE and utilizes a parametriz… ▽ More We present a new data-driven reduced-order modeling approach to efficiently solve parametrized partial differential equations (PDEs) for many-query problems. This work is inspired by the concept of implicit neural representation (INR), which models physics signals in a continuous manner and independent of spatial/temporal discretization. The proposed framework encodes PDE and utilizes a parametrized neural ODE (PNODE) to learn latent dynamics characterized by multiple PDE parameters. PNODE can be inferred by a hypernetwork to reduce the potential difficulties in learning PNODE due to a complex multilayer perceptron (MLP). The framework uses an INR to decode the latent dynamics and reconstruct accurate PDE solutions. Further, a physics-informed loss is also introduced to correct the prediction of unseen parameter instances. Incorporating the physics-informed loss also enables the model to be fine-tuned in an unsupervised manner on unseen PDE parameters. A numerical experiment is performed on a two-dimensional Burgers equation with a large variation of PDE parameters. We evaluate the proposed method at a large Reynolds number and obtain up to speedup of O(10^3) and ~1% relative error to the ground truth values. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: 9 pages, 5 figures, Machine Learning and the Physical Sciences Workshop, NeurIPS 2023

arXiv:2311.15643 [pdf, other]

A Survey on Monocular Re-Localization: From the Perspective of Scene Map Representation

Authors: Jinyu Miao, Kun Jiang, Tuopu Wen, Yunlong Wang, Peijing Jia, Xuhe Zhao, Qian Cheng, Zhongyang Xiao, Jin Huang, Zhihua Zhong, Diange Yang

Abstract: Monocular Re-Localization (MRL) is a critical component in autonomous applications, estimating 6 degree-of-freedom ego poses w.r.t. the scene map based on monocular images. In recent decades, significant progress has been made in the development of MRL techniques. Numerous algorithms have accomplished extraordinary success in terms of localization accuracy and robustness. In MRL, scene maps are re… ▽ More Monocular Re-Localization (MRL) is a critical component in autonomous applications, estimating 6 degree-of-freedom ego poses w.r.t. the scene map based on monocular images. In recent decades, significant progress has been made in the development of MRL techniques. Numerous algorithms have accomplished extraordinary success in terms of localization accuracy and robustness. In MRL, scene maps are represented in various forms, and they determine how MRL methods work and how MRL methods perform. However, to the best of our knowledge, existing surveys do not provide systematic reviews about the relationship between MRL solutions and their used scene map representation. This survey fills the gap by comprehensively reviewing MRL methods from such a perspective, promoting further research. 1) We commence by delving into the problem definition of MRL, exploring current challenges, and comparing ours with existing surveys. 2) Many well-known MRL methods are categorized and reviewed into five classes according to the representation forms of utilized map, i.e., geo-tagged frames, visual landmarks, point clouds, vectorized semantic map, and neural network-based map. 3) To quantitatively and fairly compare MRL methods with various map, we introduce some public datasets and provide the performances of some state-of-the-art MRL methods. The strengths and weakness of MRL methods with different map are analyzed. 4) We finally introduce some topics of interest in this field and give personal opinions. This survey can serve as a valuable referenced materials for MRL, and a continuously updated summary of this survey is publicly available to the community at: https://github.com/jinyummiao/map-in-mono-reloc. △ Less

Submitted 12 January, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: 33 pages, 10 tables, 16 figures, under review

arXiv:2311.09800 [pdf, other]

$\textit{Dial BeInfo for Faithfulness}$: Improving Factuality of Information-Seeking Dialogue via Behavioural Fine-Tuning

Authors: Evgeniia Razumovskaia, Ivan Vulić, Pavle Marković, Tomasz Cichy, Qian Zheng, Tsung-Hsien Wen, Paweł Budzianowski

Abstract: Factuality is a crucial requirement in information seeking dialogue: the system should respond to the user's queries so that the responses are meaningful and aligned with the knowledge provided to the system. However, most modern large language models suffer from hallucinations, that is, they generate responses not supported by or contradicting the knowledge source. To mitigate the issue and incre… ▽ More Factuality is a crucial requirement in information seeking dialogue: the system should respond to the user's queries so that the responses are meaningful and aligned with the knowledge provided to the system. However, most modern large language models suffer from hallucinations, that is, they generate responses not supported by or contradicting the knowledge source. To mitigate the issue and increase faithfulness of information-seeking dialogue systems, we introduce BeInfo, a simple yet effective method that applies behavioural tuning to aid information-seeking dialogue. Relying on three standard datasets, we show that models tuned with BeInfo} become considerably more faithful to the knowledge source both for datasets and domains seen during BeInfo-tuning, as well as on unseen domains, when applied in a zero-shot manner. In addition, we show that the models with 3B parameters (e.g., Flan-T5) tuned with BeInfo demonstrate strong performance on data from real `production' conversations and outperform GPT4 when tuned on a limited amount of such realistic in-domain dialogues. △ Less

Submitted 4 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2307.13122 [pdf, other]

Redundancy parameterization and inverse kinematics of 7-DOF revolute manipulators

Authors: Alexander J. Elias, John T. Wen

Abstract: Seven degree-of-freedom (DOF) robot arms have one redundant DOF which does not change the motion of the end effector. The redundant DOF offers greater manipulability of the arm configuration to avoid obstacles and singularities, but it must be parameterized to fully specify the joint angles for a given end effector pose. For 7-DOF revolute (7R) manipulators, we introduce a new concept of generaliz… ▽ More Seven degree-of-freedom (DOF) robot arms have one redundant DOF which does not change the motion of the end effector. The redundant DOF offers greater manipulability of the arm configuration to avoid obstacles and singularities, but it must be parameterized to fully specify the joint angles for a given end effector pose. For 7-DOF revolute (7R) manipulators, we introduce a new concept of generalized shoulder-elbow-wrist (SEW) angle, a generalization of the conventional SEW angle but with an arbitrary choice of the reference direction function. The SEW angle is widely used and easy for human operators to visualize as a rotation of the elbow about the shoulder-wrist line. Since other redundancy parameterizations including the conventional SEW angle encounter an algorithmic singularity along a line in the workspace, we introduce a special choice of the reference direction function called the stereographic SEW angle which has a singularity only along a half-line, which can be placed out of reach. We prove that such a singularity is unavoidable for any parameterization. We also include expressions for the SEW angle Jacobian along with singularity analysis. Finally, we provide efficient and singularity-robust inverse kinematics solutions for most known 7R manipulators using the general SEW angle and the subproblem decomposition method. These solutions are often closed-form but may sometimes involve a 1D or 2D search in the general case. Search-based solutions may be converted to finding zeros of a high-order polynomial. Inverse kinematics solutions, examples, and evaluations are available in a publicly accessible repository. △ Less

Submitted 20 March, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

Comments: 22 pages, 14 figures. Update: Sawyer IK using polynomial method, two video extensions, expanded related literature

arXiv:2305.13987 [pdf, other]

On Structural Expressive Power of Graph Transformers

Authors: Wenhao Zhu, Tianyu Wen, Guojie Song, Liang Wang, Bo Zheng

Abstract: Graph Transformer has recently received wide attention in the research community with its outstanding performance, yet its structural expressive power has not been well analyzed. Inspired by the connections between Weisfeiler-Lehman (WL) graph isomorphism test and graph neural network (GNN), we introduce \textbf{SEG-WL test} (\textbf{S}tructural \textbf{E}ncoding enhanced \textbf{G}lobal \textbf{W… ▽ More Graph Transformer has recently received wide attention in the research community with its outstanding performance, yet its structural expressive power has not been well analyzed. Inspired by the connections between Weisfeiler-Lehman (WL) graph isomorphism test and graph neural network (GNN), we introduce \textbf{SEG-WL test} (\textbf{S}tructural \textbf{E}ncoding enhanced \textbf{G}lobal \textbf{W}eisfeiler-\textbf{L}ehman test), a generalized graph isomorphism test algorithm as a powerful theoretical tool for exploring the structural discriminative power of graph Transformers. We theoretically prove that the SEG-WL test is an expressivity upper bound on a wide range of graph Transformers, and the representational power of SEG-WL test can be approximated by a simple Transformer network arbitrarily under certain conditions. With the SEG-WL test, we show how graph Transformers' expressive power is determined by the design of structural encodings, and present conditions that make the expressivity of graph Transformers beyond WL test and GNNs. Moreover, motivated by the popular shortest path distance encoding, we follow the theory-oriented principles and develop a provably stronger structural encoding method, Shortest Path Induced Subgraph (\textit{SPIS}) encoding. Our theoretical findings provide a novel and practical paradigm for investigating the expressive power of graph Transformers, and extensive synthetic and real-world experiments empirically verify the strengths of our proposed methods. △ Less

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.13732 [pdf, ps, other]

doi 10.3390/pr11051560

Task Containerization and Container Placement Optimization for MEC: A Joint Communication and Computing Perspective

Authors: Ao Liu, Shaoshi Yang, Jingsheng Tan, Zongze Liang, Jiasen Sun, Tao Wen, Hongyan Yan

Abstract: Containers are used by an increasing number of Internet service providers to deploy their applications in multi-access edge computing (MEC) systems. Although container-based virtualization technologies significantly increase application availability, they may suffer expensive communication overhead and resource use imbalances. However, so far there has been a scarcity of studies to conquer these d… ▽ More Containers are used by an increasing number of Internet service providers to deploy their applications in multi-access edge computing (MEC) systems. Although container-based virtualization technologies significantly increase application availability, they may suffer expensive communication overhead and resource use imbalances. However, so far there has been a scarcity of studies to conquer these difficulties. In this paper, we design a workflow-based mathematical model for applications built upon interdependent multitasking composition, formulate a multi-objective combinatorial optimization problem composed of two subproblems -- graph partitioning and multi-choice vector bin packing, and propose several joint task-containerization-and-container-placement methods to reduce communication overhead and balance multi-type computing resource utilization. The performance superiority of the proposed algorithms is demonstrated by comparison with the state-of-the-art task and container scheduling schemes. △ Less

Submitted 23 May, 2023; originally announced May 2023.

Comments: 11 pages, 8 figures

Journal ref: Processes 2023, 11(5), pp. 1-19, article number: 1560

arXiv:2305.04298 [pdf, other]

Poses as Queries: Image-to-LiDAR Map Localization with Transformers

Authors: Jinyu Miao, Kun Jiang, Yunlong Wang, Tuopu Wen, Zhongyang Xiao, Zheng Fu, Mengmeng Yang, Maolin Liu, Diange Yang

Abstract: High-precision vehicle localization with commercial setups is a crucial technique for high-level autonomous driving tasks. Localization with a monocular camera in LiDAR map is a newly emerged approach that achieves promising balance between cost and accuracy, but estimating pose by finding correspondences between such cross-modal sensor data is challenging, thereby damaging the localization accura… ▽ More High-precision vehicle localization with commercial setups is a crucial technique for high-level autonomous driving tasks. Localization with a monocular camera in LiDAR map is a newly emerged approach that achieves promising balance between cost and accuracy, but estimating pose by finding correspondences between such cross-modal sensor data is challenging, thereby damaging the localization accuracy. In this paper, we address the problem by proposing a novel Transformer-based neural network to register 2D images into 3D LiDAR map in an end-to-end manner. Poses are implicitly represented as high-dimensional feature vectors called pose queries and can be iteratively updated by interacting with the retrieved relevant information from cross-model features using attention mechanism in a proposed POse Estimator Transformer (POET) module. Moreover, we apply a multiple hypotheses aggregation method that estimates the final poses by performing parallel optimization on multiple randomly initialized pose queries to reduce the network uncertainty. Comprehensive analysis and experimental results on public benchmark conclude that the proposed image-to-LiDAR map localization network could achieve state-of-the-art performances in challenging cross-modal localization tasks. △ Less

Submitted 7 May, 2023; originally announced May 2023.

Comments: 8 pages, 3 figures, 4 tables

arXiv:2305.02866 [pdf, other]

Hierarchical Transformer for Scalable Graph Learning

Authors: Wenhao Zhu, Tianyu Wen, Guojie Song, Xiaojun Ma, Liang Wang

Abstract: Graph Transformer is gaining increasing attention in the field of machine learning and has demonstrated state-of-the-art performance on benchmarks for graph representation learning. However, as current implementations of Graph Transformer primarily focus on learning representations of small-scale graphs, the quadratic complexity of the global self-attention mechanism presents a challenge for full-… ▽ More Graph Transformer is gaining increasing attention in the field of machine learning and has demonstrated state-of-the-art performance on benchmarks for graph representation learning. However, as current implementations of Graph Transformer primarily focus on learning representations of small-scale graphs, the quadratic complexity of the global self-attention mechanism presents a challenge for full-batch training when applied to larger graphs. Additionally, conventional sampling-based methods fail to capture necessary high-level contextual information, resulting in a significant loss of performance. In this paper, we introduce the Hierarchical Scalable Graph Transformer (HSGT) as a solution to these challenges. HSGT successfully scales the Transformer architecture to node representation learning tasks on large-scale graphs, while maintaining high performance. By utilizing graph hierarchies constructed through coarsening techniques, HSGT efficiently updates and stores multi-scale information in node embeddings at different levels. Together with sampling-based training methods, HSGT effectively captures and aggregates multi-level information on the hierarchical graph using only Transformer blocks. Empirical evaluations demonstrate that HSGT achieves state-of-the-art performance on large-scale benchmarks with graphs containing millions of nodes with high efficiency. △ Less

Submitted 5 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

Comments: 11 pages; 3 figures

arXiv:2304.13338 [pdf, ps, other]

Game-Theoretically Secure Protocols for the Ordinal Random Assignment Problem

Authors: T-H. Hubert Chan, Ting Wen, Hao Xie, Quan Xue

Abstract: We study game-theoretically secure protocols for the classical ordinal assignment problem (aka matching with one-sided preference), in which each player has a total preference order on items. To achieve the fairness notion of equal treatment of equals, conventionally the randomness necessary to resolve conflicts between players is assumed to be generated by some trusted authority. However, in a di… ▽ More We study game-theoretically secure protocols for the classical ordinal assignment problem (aka matching with one-sided preference), in which each player has a total preference order on items. To achieve the fairness notion of equal treatment of equals, conventionally the randomness necessary to resolve conflicts between players is assumed to be generated by some trusted authority. However, in a distributed setting, the mutually untrusted players are responsible for generating the randomness themselves. In addition to standard desirable properties such as fairness and Pareto-efficiency, we investigate the game-theoretic notion of maximin security, which guarantees that an honest player following a protocol will not be harmed even if corrupted players deviate from the protocol. Our main contribution is an impossibility result that shows no maximin secure protocol can achieve both fairness and ordinal efficiency. Specifically, this implies that the well-known probabilistic serial (PS) mechanism by Bogomolnaia and Moulin cannot be realized by any maximin secure protocol. On the other hand, we give a maximin secure protocol that achieves fairness and stability (aka ex-post Pareto-efficiency). Moreover, inspired by the PS mechanism, we show that a variant known as the OnlinePSVar (varying rates) protocol can achieve fairness, stability and uniform dominance, which means that an honest player is guaranteed to receive an item distribution that is at least as good as a uniformly random item. In some sense, this is the best one can hope for in the case when all players have the same preference order. △ Less

Submitted 26 April, 2023; originally announced April 2023.

arXiv:2304.12828 [pdf, other]

A optimization framework for herbal prescription planning based on deep reinforcement learning

Authors: Kuo Yang, Zecong Yu, Xin Su, Xiong He, Ning Wang, Qiguang Zheng, Feidie Yu, Zhuang Liu, Tiancai Wen, Xuezhong Zhou

Abstract: Treatment planning for chronic diseases is a critical task in medical artificial intelligence, particularly in traditional Chinese medicine (TCM). However, generating optimized sequential treatment strategies for patients with chronic diseases in different clinical encounters remains a challenging issue that requires further exploration. In this study, we proposed a TCM herbal prescription plannin… ▽ More Treatment planning for chronic diseases is a critical task in medical artificial intelligence, particularly in traditional Chinese medicine (TCM). However, generating optimized sequential treatment strategies for patients with chronic diseases in different clinical encounters remains a challenging issue that requires further exploration. In this study, we proposed a TCM herbal prescription planning framework based on deep reinforcement learning for chronic disease treatment (PrescDRL). PrescDRL is a sequential herbal prescription optimization model that focuses on long-term effectiveness rather than achieving maximum reward at every step, thereby ensuring better patient outcomes. We constructed a high-quality benchmark dataset for sequential diagnosis and treatment of diabetes and evaluated PrescDRL against this benchmark. Our results showed that PrescDRL achieved a higher curative effect, with the single-step reward improving by 117% and 153% compared to doctors. Furthermore, PrescDRL outperformed the benchmark in prescription prediction, with precision improving by 40.5% and recall improving by 63%. Overall, our study demonstrates the potential of using artificial intelligence to improve clinical intelligent diagnosis and treatment in TCM. △ Less

Submitted 25 April, 2023; originally announced April 2023.

Comments: 13 pages, 4 figures

arXiv:2303.06644 [pdf, other]

Mitigating the Effect of Class Imbalance in Fault Localization Using Context-aware Generative Adversarial Network

Authors: Yan Lei, Tiantian Wen, Huan Xie, Lingfeng Fu, Chunyan Liu, Lei Xu, Hongxia Sun

Abstract: Fault localization (FL) analyzes the execution information of a test suite to pinpoint the root cause of a failure. The class imbalance of a test suite, i.e., the imbalanced class proportion between passing test cases (i.e., majority class) and failing ones (i.e., minority class), adversely affects FL effectiveness. To mitigate the effect of class imbalance in FL, we propose CGAN4FL: a data augmen… ▽ More Fault localization (FL) analyzes the execution information of a test suite to pinpoint the root cause of a failure. The class imbalance of a test suite, i.e., the imbalanced class proportion between passing test cases (i.e., majority class) and failing ones (i.e., minority class), adversely affects FL effectiveness. To mitigate the effect of class imbalance in FL, we propose CGAN4FL: a data augmentation approach using Context-aware Generative Adversarial Network for Fault Localization. Specifically, CGAN4FL uses program dependencies to construct a failure-inducing context showing how a failure is caused. Then, CGAN4FL leverages a generative adversarial network to analyze the failure-inducing context and synthesize the minority class of test cases (i.e., failing test cases). Finally, CGAN4FL augments the synthesized data into original test cases to acquire a class-balanced dataset for FL. Our experiments show that CGAN4FL significantly improves FL effectiveness, e.g., promoting MLP-FL by 200.00%, 25.49%, and 17.81% under the Top-1, Top-5, and Top-10 respectively. △ Less

Submitted 12 March, 2023; originally announced March 2023.

arXiv:2301.02348 [pdf, other]

High-Speed High-Accuracy Spatial Curve Tracking Using Motion Primitives in Industrial Robots

Authors: Honglu He, Chen-lung Lu, Yunshi Wen, Glenn Saunders, Pinghai Yang, Jeffrey Schoonover, Agung Julius, John T. Wen

Abstract: Industrial robots are increasingly deployed in applications requiring an end effector tool to closely track a specified path, such as in spraying and welding. Performance and productivity present possibly conflicting objectives: tracking accuracy, path speed, and motion uniformity. Industrial robots are programmed through motion primitives consisting of waypoints connected by pre-defined motion se… ▽ More Industrial robots are increasingly deployed in applications requiring an end effector tool to closely track a specified path, such as in spraying and welding. Performance and productivity present possibly conflicting objectives: tracking accuracy, path speed, and motion uniformity. Industrial robots are programmed through motion primitives consisting of waypoints connected by pre-defined motion segments, with specified parameters such as path speed and blending zone. The actual executed robot motion depends on the robot joint servo controller and joint motion constraints (velocity, acceleration, etc.) which are largely unknown to the users. Programming a robot to achieve the desired performance today is time-consuming and mostly manual, requiring tuning a large number of coupled parameters in the motion primitives. The performance also depends on the choice of additional parameters: possible redundant degrees of freedom, location of the target curve, and the robot configuration. This paper presents a systematic approach to optimize the robot motion primitives for performance. The approach first selects the static parameters, then the motion primitives, and finally iteratively update the waypoints to minimize the tracking error. The ultimate performance objective is to maximize the path speed subject to the tracking accuracy and speed uniformity constraints over the entire path. We have demonstrated the effectiveness of this approach in simulation for ABB and FANUC robots for two challenging example curves, and experimentally for an ABB robot. Comparing with the baseline using the current industry practice, the optimized performance shows over 200% performance improvement. △ Less

Submitted 5 January, 2023; originally announced January 2023.

arXiv:2211.05737 [pdf, other]

IK-Geo: Unified Robot Inverse Kinematics Using Subproblem Decomposition

Authors: Alexander J. Elias, John T. Wen

Abstract: This paper presents the open-source robot inverse kinematics (IK) solver IK-Geo, the fastest general IK solver based on published literature. In this unifying approach, IK for any 6-DOF all-revolute (6R) manipulator is decomposed into six canonical geometric subproblems solved by intersecting circles with other geometric objects. We present new efficient and singularity-robust solutions to these s… ▽ More This paper presents the open-source robot inverse kinematics (IK) solver IK-Geo, the fastest general IK solver based on published literature. In this unifying approach, IK for any 6-DOF all-revolute (6R) manipulator is decomposed into six canonical geometric subproblems solved by intersecting circles with other geometric objects. We present new efficient and singularity-robust solutions to these subproblems using geometric and linear algebra methods. IK-Geo finds all IK solutions including singular solutions and sometimes least-squares solutions by solving for subproblem solutions in all cases, including in a continuous and sometimes least-squares sense when a solution does not exist. Robots are classified into kinematic families based on cases of intersecting or parallel joint axes, and robots in the same family use the same IK algorithm. 6R robots with three intersecting or parallel axes are solved in closed form, and all solutions are found exactly without iteration. Other 6R robots are efficiently solved by searching for zeros of an error function of one or two joint angles. The subproblem and IK solutions are easy to understand, implement, test, and modify, meaning this method is readily ported to new languages and environments. We connect our geometric method with less efficient but more robust polynomial-based methods: rather than using search, subproblems and error functions may be written in terms of the tangent half-angle of one joint. This results in a system of multivariate polynomial equations from which the univariate polynomial with zeros corresponding to IK solutions is readily derived. △ Less

Submitted 19 February, 2024; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: 19 pages, 8 figures. Update 2: New polynomial IK method, 1D search for nonconsecutive intersecting axes. Update 1: New subproblem solution methods and timing results

arXiv:2208.13226 [pdf, other]

Map Container: A Map-based Framework for Cooperative Perception

Authors: Kun Jiang, Yining Shi, Benny Wijaya, Mengmeng Yang, Tuopu Wen, Zhongyang Xiao, Diange Yang

Abstract: The idea of cooperative perception is to benefit from shared perception data between multiple vehicles and overcome the limitations of on-board sensors on single vehicle. However, the fusion of multi-vehicle information is still challenging due to inaccurate localization, limited communication bandwidth and ambiguous fusion. Past practices simplify the problem by placing a precise GNSS localizatio… ▽ More The idea of cooperative perception is to benefit from shared perception data between multiple vehicles and overcome the limitations of on-board sensors on single vehicle. However, the fusion of multi-vehicle information is still challenging due to inaccurate localization, limited communication bandwidth and ambiguous fusion. Past practices simplify the problem by placing a precise GNSS localization system, manually specify the number of connected vehicles and determine the fusion strategy. This paper proposes a map-based cooperative perception framework, named map container, to improve the accuracy and robustness of cooperative perception, which ultimately overcomes this problem. The concept 'Map Container' denotes that the map serves as the platform to transform all information into the map coordinate space automatically and incorporate different sources of information in a distributed fusion architecture. In the proposed map container, the GNSS signal and the matching relationship between sensor feature and map feature are considered to optimize the estimation of environment states. Evaluation on simulation dataset and real-vehicle platform result validates the effectiveness of the proposed method. △ Less

Submitted 28 August, 2022; originally announced August 2022.

Comments: 10 pages, 11 figures

arXiv:2202.08498 [pdf, other]

doi 10.1109/ICFSP55781.2022.9925001

Mirror-Yolo: An attention-based instance segmentation and detection model for mirrors

Authors: Fengze Li, Jieming Ma, Zhongbei Tian, Ji Ge, Hai-Ning Liang, Yungang Zhang, Tianxi Wen

Abstract: Mirrors can degrade the performance of computer vision models, however to accurately detect mirrors in images remains challenging. YOLOv4 achieves phenomenal results both in object detection accuracy and speed, nevertheless the model often fails in detecting mirrors. In this paper, a novel mirror detection method `Mirror-YOLO' is proposed, which mainly targets on mirror detection. Based on YOLOv4,… ▽ More Mirrors can degrade the performance of computer vision models, however to accurately detect mirrors in images remains challenging. YOLOv4 achieves phenomenal results both in object detection accuracy and speed, nevertheless the model often fails in detecting mirrors. In this paper, a novel mirror detection method `Mirror-YOLO' is proposed, which mainly targets on mirror detection. Based on YOLOv4, the proposed model embeds an attention mechanism for better feature acquisition, and a hypercolumn-stairstep approach for feature map fusion. Mirror-YOLO can also produce accurate bounding polygons for instance segmentation. The effectiveness of our proposed model is demonstrated by our experiments, compared to the existing mirror detection methods, the proposed Mirror-YOLO achieves better performance in detection accuracy on the mirror image dataset. △ Less

Submitted 17 February, 2022; originally announced February 2022.

arXiv:2201.08714 [pdf, other]

Equivalent Distance Geometry Error for Molecular Conformation Comparison

Authors: Shuwen Yang, Tianyu Wen, Ziyao Li, Guojie Song

Abstract: Straight-forward conformation generation models, which generate 3-D structures directly from input molecular graphs, play an important role in various molecular tasks with machine learning, such as 3D-QSAR and virtual screening in drug design. However, existing loss functions in these models either cost overmuch time or fail to guarantee the equivalence during optimization, which means treating di… ▽ More Straight-forward conformation generation models, which generate 3-D structures directly from input molecular graphs, play an important role in various molecular tasks with machine learning, such as 3D-QSAR and virtual screening in drug design. However, existing loss functions in these models either cost overmuch time or fail to guarantee the equivalence during optimization, which means treating different items unfairly, resulting in poor local geometry in generated conformation. So, we propose Equivalent Distance Geometry Error (EDGE) to calculate the differential discrepancy between conformations where the essential factors of three kinds in conformation geometry (i.e. bond lengths, bond angles and dihedral angles) are equivalently optimized with certain weights. And in the improved version of our method, the optimization features minimizing linear transformations of atom-pair distances within 3-hop. Extensive experiments show that, compared with existing loss functions, EDGE performs effectively and efficiently in two tasks under the same backbones. △ Less

Submitted 15 March, 2022; v1 submitted 13 November, 2021; originally announced January 2022.

arXiv:2111.00943 [pdf, other]

SVBRDF Recovery From a Single Image With Highlights using a Pretrained Generative Adversarial Network

Authors: Tao Wen, Beibei Wang, Lei Zhang, Jie Guo, Nicolas Holzschuch

Abstract: Spatially-varying bi-directional reflectance distribution functions (SVBRDFs) are crucial for designers to incorporate new materials in virtual scenes, making them look more realistic. Reconstruction of SVBRDFs is a long-standing problem. Existing methods either rely on extensive acquisition system or require huge datasets which are nontrivial to acquire. We aim to recover SVBRDFs from a single im… ▽ More Spatially-varying bi-directional reflectance distribution functions (SVBRDFs) are crucial for designers to incorporate new materials in virtual scenes, making them look more realistic. Reconstruction of SVBRDFs is a long-standing problem. Existing methods either rely on extensive acquisition system or require huge datasets which are nontrivial to acquire. We aim to recover SVBRDFs from a single image, without any datasets. A single image contains incomplete information about the SVBRDF, making the reconstruction task highly ill-posed. It is also difficult to separate between the changes in color that are caused by the material and those caused by the illumination, without the prior knowledge learned from the dataset. In this paper, we use an unsupervised generative adversarial neural network (GAN) to recover SVBRDFs maps with a single image as input. To better separate the effects due to illumination from the effects due to the material, we add the hypothesis that the material is stationary and introduce a new loss function based on Fourier coefficients to enforce this stationarity. For efficiency, we train the network in two stages: reusing a trained model to initialize the SVBRDFs and fine-tune it based on the input image. Our method generates high-quality SVBRDFs maps from a single input photograph, and provides more vivid rendering results compared to previous work. The two-stage training boosts runtime performance, making it 8 times faster than previous work. △ Less

Submitted 29 October, 2021; originally announced November 2021.

arXiv:2110.07449 [pdf, other]

zk-Fabric, a Polylithic Syntax Zero Knowledge Joint Proof System

Authors: Sheng Sun, Dr. Tong Wen

Abstract: In this paper, we create a single-use and full syntax zero-knowledge proof system, a.k.a zk-Fabric. Comparing with zk-SNARKS and another variant zero-knowledge proofing system, zkBOO and it's variant zkBOO++. We present multiple new approaches on how to use partitioned garbled circuits to achieve a joint zero-knowledge proof system, with the benefits of less overhead and full syntax verification.… ▽ More In this paper, we create a single-use and full syntax zero-knowledge proof system, a.k.a zk-Fabric. Comparing with zk-SNARKS and another variant zero-knowledge proofing system, zkBOO and it's variant zkBOO++. We present multiple new approaches on how to use partitioned garbled circuits to achieve a joint zero-knowledge proof system, with the benefits of less overhead and full syntax verification. zk-Fabric based on partitioned garbled circuits has the advantage of being versatile and single-use, meaning it can be applied to arbitrary circuits with more comprehensive statements, and it can achieve the non-interactivity among all participants. One of the protocols proposed within is used for creating a new kind of partitioned garbled circuits to match the comprehensive Boolean logical expression with multiple variables, we use the term "polythitic syntax" to refer to the context-based multiple variables in a comprehensive statement. We also designed a joint zero knowledge proof protocol that uses partitioned garbled circuits △ Less

Submitted 14 October, 2021; originally announced October 2021.

Comments: 6 pages, 5 figures

arXiv:2110.05801

A Simple Way to Verify Linearizability of Concurrent Stacks

Authors: Tangliu Wen

Abstract: Linearizability is a commonly accepted correctness criterion for concurrent data structures. However, verifying linearizability of highly concurrent data structures is still a challenging task. In this paper, we present a simple and complete proof technique for verifying linearizability of concurrent stacks. Our proof technique reduces linearizability of concurrent stacks to establishing a set of… ▽ More Linearizability is a commonly accepted correctness criterion for concurrent data structures. However, verifying linearizability of highly concurrent data structures is still a challenging task. In this paper, we present a simple and complete proof technique for verifying linearizability of concurrent stacks. Our proof technique reduces linearizability of concurrent stacks to establishing a set of conditions. These conditions are based on the happened-before order of operations, intuitively express the LIFO semantics and can be proved by simple arguments. Designers of concurrent data structures can easily and quickly learn to use the proof technique. We have successfully applied the method to several challenging concurrent stacks: the TS stack, the HSY stack, and the FA stack, etc. △ Less

Submitted 9 July, 2024; v1 submitted 12 October, 2021; originally announced October 2021.

Comments: 1. Algorithm 1 is not actually an algorithm, it is a schema describing a family of special case algorithms, one for each possible length of sequence. 2.The proofs presented for the various algorithms that have been verified are very operational and often seem to treat method executions as whole

arXiv:2109.10126 [pdf, other]

ConvFiT: Conversational Fine-Tuning of Pretrained Language Models

Authors: Ivan Vulić, Pei-Hao Su, Sam Coope, Daniela Gerz, Paweł Budzianowski, Iñigo Casanueva, Nikola Mrkšić, Tsung-Hsien Wen

Abstract: Transformer-based language models (LMs) pretrained on large text collections are proven to store a wealth of semantic knowledge. However, 1) they are not effective as sentence encoders when used off-the-shelf, and 2) thus typically lag behind conversationally pretrained (e.g., via response selection) encoders on conversational tasks such as intent detection (ID). In this work, we propose ConvFiT,… ▽ More Transformer-based language models (LMs) pretrained on large text collections are proven to store a wealth of semantic knowledge. However, 1) they are not effective as sentence encoders when used off-the-shelf, and 2) thus typically lag behind conversationally pretrained (e.g., via response selection) encoders on conversational tasks such as intent detection (ID). In this work, we propose ConvFiT, a simple and efficient two-stage procedure which turns any pretrained LM into a universal conversational encoder (after Stage 1 ConvFiT-ing) and task-specialised sentence encoder (after Stage 2). We demonstrate that 1) full-blown conversational pretraining is not required, and that LMs can be quickly transformed into effective conversational encoders with much smaller amounts of unannotated data; 2) pretrained LMs can be fine-tuned into task-specialised sentence encoders, optimised for the fine-grained semantics of a particular task. Consequently, such specialised sentence encoders allow for treating ID as a simple semantic similarity task based on interpretable nearest neighbours retrieval. We validate the robustness and versatility of the ConvFiT framework with such similarity-based inference on the standard ID evaluation sets: ConvFiT-ed LMs achieve state-of-the-art ID performance across the board, with particular gains in the most challenging, few-shot setups. △ Less

Submitted 21 September, 2021; originally announced September 2021.

Comments: EMNLP 2021 (long paper)

arXiv:2104.08524 [pdf, other]

Multilingual and Cross-Lingual Intent Detection from Spoken Data

Authors: Daniela Gerz, Pei-Hao Su, Razvan Kusztos, Avishek Mondal, Michał Lis, Eshan Singhal, Nikola Mrkšić, Tsung-Hsien Wen, Ivan Vulić

Abstract: We present a systematic study on multilingual and cross-lingual intent detection from spoken data. The study leverages a new resource put forth in this work, termed MInDS-14, a first training and evaluation resource for the intent detection task with spoken data. It covers 14 intents extracted from a commercial system in the e-banking domain, associated with spoken examples in 14 diverse language… ▽ More We present a systematic study on multilingual and cross-lingual intent detection from spoken data. The study leverages a new resource put forth in this work, termed MInDS-14, a first training and evaluation resource for the intent detection task with spoken data. It covers 14 intents extracted from a commercial system in the e-banking domain, associated with spoken examples in 14 diverse language varieties. Our key results indicate that combining machine translation models with state-of-the-art multilingual sentence encoders (e.g., LaBSE) can yield strong intent detectors in the majority of target languages covered in MInDS-14, and offer comparative analyses across different axes: e.g., zero-shot versus few-shot learning, translation direction, and impact of speech recognition. We see this work as an important step towards more inclusive development and evaluation of multilingual intent detectors from spoken data, in a much wider spectrum of languages compared to prior work. △ Less

Submitted 17 April, 2021; originally announced April 2021.

arXiv:2011.12538 [pdf, other]

An Odor Labeling Convolutional Encoder-Decoder for Odor Sensing in Machine Olfaction

Authors: Tengteng Wen, Zhuofeng Mo, Jingshan Li, Qi Liu, Liming Wu, Dehan Luo

Abstract: Deep learning methods have been widely applied to visual and acoustic technology. In this paper, we proposed an odor labeling convolutional encoder-decoder (OLCE) for odor identification in machine olfaction. OLCE composes a convolutional neural network encoder and decoder where the encoder output is constrained to odor labels. An electronic nose was used for the data collection of gas responses f… ▽ More Deep learning methods have been widely applied to visual and acoustic technology. In this paper, we proposed an odor labeling convolutional encoder-decoder (OLCE) for odor identification in machine olfaction. OLCE composes a convolutional neural network encoder and decoder where the encoder output is constrained to odor labels. An electronic nose was used for the data collection of gas responses followed by a normative experimental procedure. Several evaluation indexes were calculated to evaluate the algorithm effectiveness: accuracy 92.57%, precision 92.29%, recall rate 92.06%, F1-Score 91.96%, and Kappa coefficient 90.76%. We also compared the model with some algorithms used in machine olfaction. The comparison result demonstrated that OLCE had the best performance among these algorithms. In the paper, some perspectives of machine olfactions have been also discussed. △ Less

Submitted 6 January, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

arXiv:2006.02038 [pdf, other]

Nested Scale Editing for Conditional Image Synthesis

Authors: Lingzhi Zhang, Jiancong Wang, Yinshuang Xu, Jie Min, Tarmily Wen, James C. Gee, Jianbo Shi

Abstract: We propose an image synthesis approach that provides stratified navigation in the latent code space. With a tiny amount of partial or very low-resolution image, our approach can consistently out-perform state-of-the-art counterparts in terms of generating the closest sampled image to the ground truth. We achieve this through scale-independent editing while expanding scale-specific diversity. Scale… ▽ More We propose an image synthesis approach that provides stratified navigation in the latent code space. With a tiny amount of partial or very low-resolution image, our approach can consistently out-perform state-of-the-art counterparts in terms of generating the closest sampled image to the ground truth. We achieve this through scale-independent editing while expanding scale-specific diversity. Scale-independence is achieved with a nested scale disentanglement loss. Scale-specific diversity is created by incorporating a progressive diversification constraint. We introduce semantic persistency across the scales by sharing common latent codes. Together they provide better control of the image synthesis process. We evaluate the effectiveness of our proposed approach through various tasks, including image outpainting, image superresolution, and cross-domain image translation. △ Less

Submitted 3 June, 2020; originally announced June 2020.

arXiv:1912.07293 [pdf, other]

Evaluating the Vulnerability of Communities in Social Networks by Gravity Model

Authors: Tao Wen

Abstract: With the development of network science, the various properties of complex networks have recently received extensive attention. Among these properties, the vulnerability of the communities (VoCs) is particularly important. In the conventional research, only parts of structural features of the community rather than multiple aspects are considered in the evaluating model. However, in reality, the im… ▽ More With the development of network science, the various properties of complex networks have recently received extensive attention. Among these properties, the vulnerability of the communities (VoCs) is particularly important. In the conventional research, only parts of structural features of the community rather than multiple aspects are considered in the evaluating model. However, in reality, the impact on the VoC is multifaceted, not only its own structure property, but also the influence of other communities. In order to better model the influence between communities, so as to evaluate the VoCs in the social network, a gravity-based community vulnerability evaluation model is proposed in this paper. In this proposed model, three different aspects of the factor are considered, i.e. the number of edges inside the community, the number of edges connected neighboring communities, and the gravity index (GI) of each community, which correspond to the interior information, small scale interaction relationship, and large scale interaction relationship of communities. By means of the Jensen-Shannon divergence (JSD) and log-sigmoid transition (LST) function, the abstract distance (AD) between each pair of communities can be calculated to construct the community network (CN). With the usage of gravity model, the GI of each community which describes the large scale interaction relationship can be obtained. Eventually, the community vulnerability degree and order can be calculated by this proposed model, and the sensitivity of weighting parameters is analyzed by Sobol' indices. In particular, this proposed method can degenerate to the classical method with the setting of weighting parameters. The effectiveness and reasonability of this proposed model are demonstrated by several real world complex networks. △ Less

Submitted 16 December, 2019; originally announced December 2019.

arXiv:1911.07471 [pdf, other]

Preparing Lessons: Improve Knowledge Distillation with Better Supervision

Authors: Tiancheng Wen, Shenqi Lai, Xueming Qian

Abstract: Knowledge distillation (KD) is widely used for training a compact model with the supervision of another large model, which could effectively improve the performance. Previous methods mainly focus on two aspects: 1) training the student to mimic representation space of the teacher; 2) training the model progressively or adding extra module like discriminator. Knowledge from teacher is useful, but i… ▽ More Knowledge distillation (KD) is widely used for training a compact model with the supervision of another large model, which could effectively improve the performance. Previous methods mainly focus on two aspects: 1) training the student to mimic representation space of the teacher; 2) training the model progressively or adding extra module like discriminator. Knowledge from teacher is useful, but it is still not exactly right compared with ground truth. Besides, overly uncertain supervision also influences the result. We introduce two novel approaches, Knowledge Adjustment (KA) and Dynamic Temperature Distillation (DTD), to penalize bad supervision and improve student model. Experiments on CIFAR-100, CINIC-10 and Tiny ImageNet show that our methods get encouraging performance compared with state-of-the-art methods. When combined with other KD-based methods, the performance will be further improved. △ Less

Submitted 24 July, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

arXiv:1911.03688 [pdf, other]

ConveRT: Efficient and Accurate Conversational Representations from Transformers

Authors: Matthew Henderson, Iñigo Casanueva, Nikola Mrkšić, Pei-Hao Su, Tsung-Hsien Wen, Ivan Vulić

Abstract: General-purpose pretrained sentence encoders such as BERT are not ideal for real-world conversational AI applications; they are computationally heavy, slow, and expensive to train. We propose ConveRT (Conversational Representations from Transformers), a pretraining framework for conversational tasks satisfying all the following requirements: it is effective, affordable, and quick to train. We pret… ▽ More General-purpose pretrained sentence encoders such as BERT are not ideal for real-world conversational AI applications; they are computationally heavy, slow, and expensive to train. We propose ConveRT (Conversational Representations from Transformers), a pretraining framework for conversational tasks satisfying all the following requirements: it is effective, affordable, and quick to train. We pretrain using a retrieval-based response selection task, effectively leveraging quantization and subword-level parameterization in the dual encoder to build a lightweight memory- and energy-efficient model. We show that ConveRT achieves state-of-the-art performance across widely established response selection tasks. We also demonstrate that the use of extended dialog history as context yields further performance gains. Finally, we show that pretrained representations from the proposed encoder can be transferred to the intent classification task, yielding strong results across three diverse data sets. ConveRT trains substantially faster than standard sentence encoders or previous state-of-the-art dual encoders. With its reduced size and superior performance, we believe this model promises wider portability and scalability for Conversational AI applications. △ Less

Submitted 29 April, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

arXiv:1910.11495 [pdf, other]

Deep Image Blending

Authors: Lingzhi Zhang, Tarmily Wen, Jianbo Shi

Abstract: Image composition is an important operation to create visual content. Among image composition tasks, image blending aims to seamlessly blend an object from a source image onto a target image with lightly mask adjustment. A popular approach is Poisson image blending, which enforces the gradient domain smoothness in the composite image. However, this approach only considers the boundary pixels of ta… ▽ More Image composition is an important operation to create visual content. Among image composition tasks, image blending aims to seamlessly blend an object from a source image onto a target image with lightly mask adjustment. A popular approach is Poisson image blending, which enforces the gradient domain smoothness in the composite image. However, this approach only considers the boundary pixels of target image, and thus can not adapt to texture of target image. In addition, the colors of the target image often seep through the original source object too much causing a significant loss of content of the source object. We propose a Poisson blending loss that achieves the same purpose of Poisson image blending. In addition, we jointly optimize the proposed Poisson blending loss as well as the style and content loss computed from a deep network, and reconstruct the blending region by iteratively updating the pixels using the L-BFGS solver. In the blending image, we not only smooth out gradient domain of the blending boundary but also add consistent texture into the blending region. User studies show that our method outperforms strong baselines as well as state-of-the-art approaches when placing objects onto both paintings and real-world images. △ Less

Submitted 24 October, 2019; originally announced October 2019.

arXiv:1909.13646 [pdf, other]

Vital Spreaders Identification in Complex Networks with Multi-Local Dimension

Authors: Tao Wen, Danilo Pelusi, Yong Deng

Abstract: The important nodes identification has been an interesting problem in this issue. Several centrality measures have been proposed to solve this problem, but most of previous methods have their own limitations. To address this problem more effectively, multi-local dimension (MLD) which is based on the fractal property is proposed to identify the vital spreaders in this paper. This proposed method co… ▽ More The important nodes identification has been an interesting problem in this issue. Several centrality measures have been proposed to solve this problem, but most of previous methods have their own limitations. To address this problem more effectively, multi-local dimension (MLD) which is based on the fractal property is proposed to identify the vital spreaders in this paper. This proposed method considers the information contained in the box and $q$ plays a weighting coefficient for this partition information. MLD would have different expressions with different value of $q$, and it would degenerate to local information dimension and variant of local dimension when $q = 1$ when $q = 0$ respectively, both of which have been effective identification method for influential nodes. Thus, MLD would be a more general method which can degenerate to some exiting centrality measures. In addition, different with classical methods, the node with low MLD would be more important in the network. Some comparison methods and real-world complex networks are applied in this paper to show the effectiveness and reasonableness of this proposed method. The experiment results show the superiority of this proposed method. △ Less

Submitted 30 September, 2019; originally announced September 2019.

arXiv:1909.13628 [pdf, ps, other]

The vulnerability of communities in complex network: An entropy approach

Authors: Tao Wen, Yong Deng

Abstract: Measuring the vulnerability of communities in complex network has become an important topic in the research of complex system. Numerous existing vulnerability measures have been proposed to solve such problems, however, most of these methods have their own shortcomings and limitations. Therefore, a new entropy-based approach is proposed in this paper to address such problems. This measure combines… ▽ More Measuring the vulnerability of communities in complex network has become an important topic in the research of complex system. Numerous existing vulnerability measures have been proposed to solve such problems, however, most of these methods have their own shortcomings and limitations. Therefore, a new entropy-based approach is proposed in this paper to address such problems. This measure combines the internal factors and external factors for each communities which can give the quantitative description of vulnerability of community. The internal factors contain the complexity degree of community and the number of edges inside the community, and the external factors contain the similarity degree between chosen community and other communities and the number of nodes outside the community. Considering community vulnerability from the perspective of entropy provides a new solution to such problem. Due to sufficient consideration of community information, more reasonable vulnerability result can be obtained. In order to show the performance and effectiveness of this proposed method, one example network and three real-world complex network is used to compare with some exiting methods, and the sensitivity of weight factors is analysed by Sobol' indices. The experiment results demonstrate the reasonableness and superiority of this proposed method. △ Less

Submitted 30 September, 2019; originally announced September 2019.

arXiv:1909.01296 [pdf, other]

PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking

Authors: Matthew Henderson, Ivan Vulić, Iñigo Casanueva, Paweł Budzianowski, Daniela Gerz, Sam Coope, Georgios Spithourakis, Tsung-Hsien Wen, Nikola Mrkšić, Pei-Hao Su

Abstract: We present PolyResponse, a conversational search engine that supports task-oriented dialogue. It is a retrieval-based approach that bypasses the complex multi-component design of traditional task-oriented dialogue systems and the use of explicit semantics in the form of task-specific ontologies. The PolyResponse engine is trained on hundreds of millions of examples extracted from real conversation… ▽ More We present PolyResponse, a conversational search engine that supports task-oriented dialogue. It is a retrieval-based approach that bypasses the complex multi-component design of traditional task-oriented dialogue systems and the use of explicit semantics in the form of task-specific ontologies. The PolyResponse engine is trained on hundreds of millions of examples extracted from real conversations: it learns what responses are appropriate in different conversational contexts. It then ranks a large index of text and visual responses according to their similarity to the given context, and narrows down the list of relevant entities during the multi-turn conversation. We introduce a restaurant search and booking system powered by the PolyResponse engine, currently available in 8 different languages. △ Less

Submitted 3 September, 2019; originally announced September 2019.

Comments: EMNLP 2019 (Demo paper)

arXiv:1908.11298 [pdf, ps, other]

doi 10.1016/j.ins.2019.10.003

Identification of influencers in complex networks by local information dimensionality

Authors: Tao Wen, Yong Deng

Abstract: The identification of influential spreaders in complex networks is a popular topic in studies of network characteristics. Many centrality measures have been proposed to address this problem, but most have limitations. In this paper, a method for identifying influencers in complex networks via the local information dimensionality. The proposed method considers the local structural properties around… ▽ More The identification of influential spreaders in complex networks is a popular topic in studies of network characteristics. Many centrality measures have been proposed to address this problem, but most have limitations. In this paper, a method for identifying influencers in complex networks via the local information dimensionality. The proposed method considers the local structural properties around the central node; therefore, the scale of locality only increases to half of the maximum value of the shortest distance from the central node. Thus, the proposed method considers the quasilocal information and reduces the computational complexity. The information (number of nodes) in boxes is described via the Shannon entropy, which is more reasonable. A node is more influential when its local information dimensionality is higher. In order to show the effectiveness of the proposed method, five existing centrality measures are used as comparison methods to rank influential nodes in six real world complex networks. In addition, a susceptible infected (SI) model and Kendall's tau coefficient are applied to show the correlation between different methods. Experiment results show the superiority of the proposed method. △ Less

Submitted 5 October, 2019; v1 submitted 29 August, 2019; originally announced August 2019.

Journal ref: Information Sciences 2019

arXiv:1908.11056 [pdf, other]

Targeted Source Detection for Environmental Data

Authors: Guanjie Zheng, Mengqi Liu, Tao Wen, Hongjian Wang, Huaxiu Yao, Susan L. Brantley, Zhenhui Li

Abstract: In the face of growing needs for water and energy, a fundamental understanding of the environmental impacts of human activities becomes critical for managing water and energy resources, remedying water pollution, and making regulatory policy wisely. Among activities that impact the environment, oil and gas production, wastewater transport, and urbanization are included. In addition to the occurren… ▽ More In the face of growing needs for water and energy, a fundamental understanding of the environmental impacts of human activities becomes critical for managing water and energy resources, remedying water pollution, and making regulatory policy wisely. Among activities that impact the environment, oil and gas production, wastewater transport, and urbanization are included. In addition to the occurrence of anthropogenic contamination, the presence of some contaminants (e.g., methane, salt, and sulfate) of natural origin is not uncommon. Therefore, scientists sometimes find it difficult to identify the sources of contaminants in the coupled natural and human systems. In this paper, we propose a technique to simultaneously conduct source detection and prediction, which outperforms other approaches in the interdisciplinary case study of the identification of potential groundwater contamination within a region of high-density shale gas development. △ Less

Submitted 29 August, 2019; originally announced August 2019.

Comments: 8 pages, 4 figures, 1 table

arXiv:1908.03269 [pdf, other]

Neural-Learning Trajectory Tracking Control of Flexible-Joint Robot Manipulators with Unknown Dynamics

Authors: Shuyang Chen, John T. Wen

Abstract: Fast and precise motion control is important for industrial robots in manufacturing applications. However, some collaborative robots sacrifice precision for safety, particular for high motion speed. The performance degradation is caused by the inability of the joint servo controller to address the uncertain nonlinear dynamics of the robot arm, e.g., due to joint flexibility. We consider two approa… ▽ More Fast and precise motion control is important for industrial robots in manufacturing applications. However, some collaborative robots sacrifice precision for safety, particular for high motion speed. The performance degradation is caused by the inability of the joint servo controller to address the uncertain nonlinear dynamics of the robot arm, e.g., due to joint flexibility. We consider two approaches to improve the trajectory tracking performance through feedforward compensation. The first approach uses iterative learning control, with the gradient-based iterative update generated from the robot forward dynamics model. The second approach uses dynamic inversion to directly compensate for the robot forward dynamics. If the forward dynamics is strictly proper or is non-minimum-phase (e.g., due to time delays), its stable inverse would be non-causal. Both approaches require robot dynamical models. This paper presents results of using recurrent neural networks (RNNs) to approximate these dynamical models-forward dynamics in the first case, inverse dynamics (possibly non-causal) in the second case. We use the bi-directional RNN to capture the noncausality. The RNNs are trained based on a collection of commanded trajectories and the actual robot responses. We use a Baxter robot to evaluate the two approaches. The Baxter robot exhibits significant joint flexibility due to the series-elastic joint actuators. Both approaches achieve sizable improvement over the uncompensated robot motion, for both random joint trajectories and Cartesian motion. The inverse dynamics method is particularly attractive as it may be used to more accurately track a user input as in teleoperation. △ Less

Submitted 8 August, 2019; originally announced August 2019.

Comments: Accepted by IROS 2019

arXiv:1908.01477 [pdf, other]

GDRQ: Group-based Distribution Reshaping for Quantization

Authors: Haibao Yu, Tuopu Wen, Guangliang Cheng, Jiankai Sun, Qi Han, Jianping Shi

Abstract: Low-bit quantization is challenging to maintain high performance with limited model capacity (e.g., 4-bit for both weights and activations). Naturally, the distribution of both weights and activations in deep neural network are Gaussian-like. Nevertheless, due to the limited bitwidth of low-bit model, uniform-like distributed weights and activations have been proved to be more friendly to quantiza… ▽ More Low-bit quantization is challenging to maintain high performance with limited model capacity (e.g., 4-bit for both weights and activations). Naturally, the distribution of both weights and activations in deep neural network are Gaussian-like. Nevertheless, due to the limited bitwidth of low-bit model, uniform-like distributed weights and activations have been proved to be more friendly to quantization while preserving accuracy~\cite{Han2015Learning}. Motivated by this, we propose Scale-Clip, a Distribution Reshaping technique that can reshape weights or activations into a uniform-like distribution in a dynamic manner. Furthermore, to increase the model capability for a low-bit model, a novel Group-based Quantization algorithm is proposed to split the filters into several groups. Different groups can learn different quantization parameters, which can be elegantly merged in to batch normalization layer without extra computational cost in the inference stage. Finally, we integrate Scale-Clip technique with Group-based Quantization algorithm and propose the Group-based Distribution Reshaping Quantization (GDQR) framework to further improve the quantization performance. Experiments on various networks (e.g. VGGNet and ResNet) and vision tasks (e.g. classification, detection and segmentation) demonstrate that our framework achieves good performance. △ Less

Submitted 5 August, 2019; originally announced August 2019.

arXiv:1906.01543 [pdf, other]

Training Neural Response Selection for Task-Oriented Dialogue Systems

Authors: Matthew Henderson, Ivan Vulić, Daniela Gerz, Iñigo Casanueva, Paweł Budzianowski, Sam Coope, Georgios Spithourakis, Tsung-Hsien Wen, Nikola Mrkšić, Pei-Hao Su

Abstract: Despite their popularity in the chatbot literature, retrieval-based models have had modest impact on task-oriented dialogue systems, with the main obstacle to their application being the low-data regime of most task-oriented dialogue tasks. Inspired by the recent success of pretraining in language modelling, we propose an effective method for deploying response selection in task-oriented dialogue.… ▽ More Despite their popularity in the chatbot literature, retrieval-based models have had modest impact on task-oriented dialogue systems, with the main obstacle to their application being the low-data regime of most task-oriented dialogue tasks. Inspired by the recent success of pretraining in language modelling, we propose an effective method for deploying response selection in task-oriented dialogue. To train response selection models for task-oriented dialogue tasks, we propose a novel method which: 1) pretrains the response selection model on large general-domain conversational corpora; and then 2) fine-tunes the pretrained model for the target dialogue domain, relying only on the small in-domain dataset to capture the nuances of the given dialogue domain. Our evaluation on six diverse application domains, ranging from e-commerce to banking, demonstrates the effectiveness of the proposed training method. △ Less

Submitted 7 June, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: ACL 2019 long paper

arXiv:1905.13628 [pdf, other]

Time Series Anomaly Detection Using Convolutional Neural Networks and Transfer Learning

Authors: Tailai Wen, Roy Keyes

Abstract: Time series anomaly detection plays a critical role in automated monitoring systems. Most previous deep learning efforts related to time series anomaly detection were based on recurrent neural networks (RNN). In this paper, we propose a time series segmentation approach based on convolutional neural networks (CNN) for anomaly detection. Moreover, we propose a transfer learning framework that pretr… ▽ More Time series anomaly detection plays a critical role in automated monitoring systems. Most previous deep learning efforts related to time series anomaly detection were based on recurrent neural networks (RNN). In this paper, we propose a time series segmentation approach based on convolutional neural networks (CNN) for anomaly detection. Moreover, we propose a transfer learning framework that pretrains a model on a large-scale synthetic univariate time series data set and then fine-tunes its weights on small-scale, univariate or multivariate data sets with previously unseen classes of anomalies. For the multivariate case, we introduce a novel network architecture. The approach was tested on multiple synthetic and real data sets successfully. △ Less

Submitted 31 May, 2019; originally announced May 2019.

Comments: 8 pages, 8 figures, AI for Internet of Things Workshop in IJCAI 2019

arXiv:1904.06472 [pdf, other]

A Repository of Conversational Datasets

Authors: Matthew Henderson, Paweł Budzianowski, Iñigo Casanueva, Sam Coope, Daniela Gerz, Girish Kumar, Nikola Mrkšić, Georgios Spithourakis, Pei-Hao Su, Ivan Vulić, Tsung-Hsien Wen

Abstract: Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using '1-of-100 accuracy'. The repository contains… ▽ More Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using '1-of-100 accuracy'. The repository contains scripts that allow researchers to reproduce the standard datasets, or to adapt the pre-processing and data filtering steps to their needs. We introduce and evaluate several competitive baselines for conversational response selection, whose implementations are shared in the repository, as well as a neural encoder model that is trained on the entire training set. △ Less

Submitted 28 May, 2019; v1 submitted 12 April, 2019; originally announced April 2019.

Journal ref: Proceedings of the Workshop on NLP for Conversational AI (2019)

arXiv:1903.00082 [pdf, other]

Industrial Robot Trajectory Tracking Using Multi-Layer Neural Networks Trained by Iterative Learning Control

Authors: Shuyang Chen, John T. Wen

Abstract: Fast and precise robot motion is needed in certain applications such as electronic manufacturing, additive manufacturing and assembly. Most industrial robot motion controllers allow externally commanded motion profile, but the trajectory tracking performance is affected by the robot dynamics and joint servo controllers which users have no direct access and little information. The performance is fu… ▽ More Fast and precise robot motion is needed in certain applications such as electronic manufacturing, additive manufacturing and assembly. Most industrial robot motion controllers allow externally commanded motion profile, but the trajectory tracking performance is affected by the robot dynamics and joint servo controllers which users have no direct access and little information. The performance is further compromised by time delays in transmitting the external command as a setpoint to the inner control loop. This paper presents an approach of combining neural networks and iterative learning control to improve the trajectory tracking performance for a multi-axis articulated industrial robot. For a given desired trajectory, the external command is iteratively refined using a high fidelity dynamical simulator to compensate for the robot inner loop dynamics. These desired trajectories and the corresponding refined input trajectories are then used to train multi-layer neural networks to emulate the dynamical inverse of the nonlinear inner loop dynamics. We show that with a sufficiently rich training set, the trained neural networks can generalize well to trajectories beyond the training set. In applying the trained neural networks to the physical robot, the tracking performance still improves but not as much as in the simulator. We show that transfer learning can effectively bridge the gap between simulation and the physical robot. In the end, we test the trained neural networks on other robot models in simulation and demonstrate the possibility of a general purpose network. Development and evaluation of this methodology is based on the ABB IRB6640-180 industrial robot and ABB RobotStudio software packages. △ Less

Submitted 4 March, 2019; v1 submitted 28 February, 2019; originally announced March 2019.

arXiv:1901.09658 [pdf, ps, other]

doi 10.1016/j.chaos.2019.01.011

Identifying influential nodes based on fuzzy local dimension in complex networks

Authors: Tao Wen, Wen Jiang

Abstract: How to identify influential nodes in complex networks is an important aspect in the study of complex network. In this paper, a novel fuzzy local dimension (FLD) is proposed to rank the influential nodes in complex networks, where a node with high fuzzy local dimension has high influential ability. This proposed method focuses on the influence of the distance from the center node on the local dimen… ▽ More How to identify influential nodes in complex networks is an important aspect in the study of complex network. In this paper, a novel fuzzy local dimension (FLD) is proposed to rank the influential nodes in complex networks, where a node with high fuzzy local dimension has high influential ability. This proposed method focuses on the influence of the distance from the center node on the local dimension of center node by fuzzy set, resulting in a change in influential ability. In order to show this proposed method's effectiveness and accuracy, four real-world networks are applied in this paper. Meanwhile, Susceptible-Infected (SI) is used to simulate the spreading process by FLD and other centrality measures, and Kendall's tau coefficient is used to describe the correlation between the influential nodes obtained by centrality and the results measured by SI model. Observing from the ranking lists and simulated results, this method is effective and accurate to rank the influential nodes. △ Less

Submitted 21 December, 2018; originally announced January 2019.

Comments: 38 pages, 6 figures

arXiv:1810.00278 [pdf, other]

MultiWOZ -- A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling

Authors: Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, Milica Gašić

Abstract: Even though machine learning has become the major scene in dialogue research community, the real breakthrough has been blocked by the scale of data available. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics. At a size of $10$k dialogues, it… ▽ More Even though machine learning has become the major scene in dialogue research community, the real breakthrough has been blocked by the scale of data available. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics. At a size of $10$k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora. The contribution of this work apart from the open-sourced dataset labelled with dialogue belief states and dialogue actions is two-fold: firstly, a detailed description of the data collection procedure along with a summary of data structure and analysis is provided. The proposed data-collection pipeline is entirely based on crowd-sourcing without the need of hiring professional annotators; secondly, a set of benchmark results of belief tracking, dialogue act and response generation is reported, which shows the usability of the data and sets a baseline for future studies. △ Less

Submitted 20 April, 2020; v1 submitted 29 September, 2018; originally announced October 2018.

Comments: Accepted for publication at EMNLP 2018

arXiv:1809.07070 [pdf, other]

Latent Topic Conversational Models

Authors: Tsung-Hsien Wen, Minh-Thang Luong

Abstract: Latent variable models have been a preferred choice in conversational modeling compared to sequence-to-sequence (seq2seq) models which tend to generate generic and repetitive responses. Despite so, training latent variable models remains to be difficult. In this paper, we propose Latent Topic Conversational Model (LTCM) which augments seq2seq with a neural latent topic component to better guide re… ▽ More Latent variable models have been a preferred choice in conversational modeling compared to sequence-to-sequence (seq2seq) models which tend to generate generic and repetitive responses. Despite so, training latent variable models remains to be difficult. In this paper, we propose Latent Topic Conversational Model (LTCM) which augments seq2seq with a neural latent topic component to better guide response generation and make training easier. The neural topic component encodes information from the source sentence to build a global "topic" distribution over words, which is then consulted by the seq2seq model at each generation step. We study in details how the latent representation is learnt in both the vanilla model and LTCM. Our extensive experiments contribute to better understanding and training of conditional latent models for languages. Our results show that by sampling from the learnt latent representations, LTCM can generate diverse and interesting responses. In a subjective human evaluation, the judges also confirm that LTCM is the overall preferred option. △ Less

Submitted 19 September, 2018; originally announced September 2018.

arXiv:1806.08206

Proving Linearizability Using Reduction

Authors: Tangliu Wen

Abstract: Lipton's reduction theory provides an intuitive and simple way for deducing the non-interference properties of concurrent programs, but it is difficult to directly apply the technique to verify linearizability of sophisticated fine-grained concurrent data structures. In this paper, we propose three reduction-based proof methods that can handle such data structures. The key idea behind our reductio… ▽ More Lipton's reduction theory provides an intuitive and simple way for deducing the non-interference properties of concurrent programs, but it is difficult to directly apply the technique to verify linearizability of sophisticated fine-grained concurrent data structures. In this paper, we propose three reduction-based proof methods that can handle such data structures. The key idea behind our reduction methods is that an irreducible operation can be viewed as an atomic operation at a higher level of abstraction. This allows us to focus on the reduction properties of an operation related to its abstract semantics. We have successfully applied the methods to verify 11 concurrent data structures including the most challenging ones: the Herlihy and Wing queue, the HSY elimination-based stack, and the time-stamped queue, and the lazy list. Our methods inherit intuition and simplicity of Lipton's reduction, and concurrent data structures designers can easily and quickly learn to use the methods. △ Less

Submitted 29 August, 2018; v1 submitted 21 June, 2018; originally announced June 2018.

Comments: the co-authors of the paper require me to withdraw it

arXiv:1806.08128 [pdf, other]

Strict Linearizability and Abstract Atomicity

Authors: Tangliu Wen

Abstract: Linearizability is a commonly accepted consistency condition for concurrent objects. Filipović et al. show that linearizability is equivalent to observational refinement. However, linearizability does not permit concurrent objects to share memory spaces with their client programs. We show that linearizability (or observational refinement) can be broken even though a client program of an object acc… ▽ More Linearizability is a commonly accepted consistency condition for concurrent objects. Filipović et al. show that linearizability is equivalent to observational refinement. However, linearizability does not permit concurrent objects to share memory spaces with their client programs. We show that linearizability (or observational refinement) can be broken even though a client program of an object accesses the shared memory spaces without interference from the methods of the object. In this paper, we present strict linearizability which lifts this limitation and can ensure client-side traces and final-states equivalence even in a relaxed program model allowing clients to directly access the states of concurrent objects. We also investigate several important properties of strict linearizability. At a high level of abstraction, a concurrent object can be viewed as a concurrent implementation of an abstract data type (ADT). We also present a correctness criterion for relating an ADT and its concurrent implementation, which is the combination of linearizability and data abstraction and can ensure observational equivalence. We also investigate its relationship with strict linearizability. △ Less

Submitted 21 June, 2018; originally announced June 2018.

Showing 1–50 of 73 results for author: Wen, T