Breaking Free: Efficient Multi-Party Private Set Union Without Non-Collusion Assumptions

Authors: Minglang Dong, Yu Chen, Cong Zhang, Yujie Bai

Abstract: A multi-party private set union (MPSU) protocol enables $m$ $(m > 2)$ parties, each holding a set, to collectively compute the union of their sets without revealing any additional information to other parties. There are two main categories of MPSU protocols: The first builds on public-key techniques. All existing works in this category involve a super-linear number of public-key operations, resulting in poor practical efficiency. The second builds on oblivious transfer and symmetric-key techniques. The only existing work in this category is proposed by Liu and Gao (ASIACRYPT 2023), which features the best concrete performance among all existing protocols, despite its super-linear computation and communication. Unfortunately, it does not achieve the standard semi-honest security, as it inherently relies on a non-collusion assumption, which is unlikely to hold in practice. Therefore, the problem of constructing a practical MPSU protocol based on oblivious transfer and symmetric-key techniques in the standard semi-honest model remains open. Furthermore, there is no MPSU protocol achieving both linear computation and linear communication complexity, which leaves another unresolved problem. In this work, we resolve these two open problems. We propose the first MPSU protocol based on oblivious transfer and symmetric-key techniques in the standard semi-honest model. This protocol is $4.9-9.3 \times$ faster than Liu and Gao in the LAN setting. Concretely, our protocol requires only $3.6$ seconds in the online phase for 3 parties with sets of $2^{20}$ items each.
We propose the first MPSU protocol achieving both linear computation and linear communication complexity, based on public-key operations. This protocol has the lowest overall communication costs and shows a factor of $3.0-36.5\times$ improvement in terms of overall communication compared to Liu and Gao.

Submitted 1 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.14174 [pdf, other]

Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model

Authors: Yuheng Shi, Minjing Dong, Chang Xu

Abstract: Despite the significant achievements of Vision Transformers (ViTs) in various vision tasks, they are constrained by the quadratic complexity. Recently, State Space Models (SSMs) have garnered widespread attention due to their global receptive field and linear complexity with respect to the input length, demonstrating substantial potential across fields including natural language processing and computer vision. To improve the performance of SSMs in vision tasks, a multi-scan strategy is widely adopted, which leads to significant redundancy of SSMs. For a better trade-off between efficiency and performance, we analyze the underlying reasons behind the success of the multi-scan strategy, where long-range dependency plays an important role. Based on the analysis, we introduce Multi-Scale Vision Mamba (MSVMamba) to preserve the superiority of SSMs in vision tasks with limited parameters. It employs a multi-scale 2D scanning technique on both original and downsampled feature maps, which not only benefits long-range dependency learning but also reduces computational costs. Additionally, we integrate a Convolutional Feed-Forward Network (ConvFFN) to address the lack of channel mixing. Our experiments demonstrate that MSVMamba is highly competitive, with the MSVMamba-Tiny model achieving 82.8% top-1 accuracy on ImageNet, 46.9% box mAP, and 42.2% instance mAP with the Mask R-CNN framework, 1x training schedule on COCO, and 47.6% mIoU with single-scale testing on ADE20K. Code is available at https://github.com/YuHengsss/MSVMamba.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.07527 [pdf, other]

Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models

Authors: Yubin Shi, Yixuan Chen, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Yujiang Wang, Robert P. Dick, Qin Lv, Yingying Zhao, Fan Yang, Tun Lu, Ning Gu, Li Shang

Abstract: Despite their prevalence in deep-learning communities, over-parameterized models impose high computational costs for proper training. This work studies the fine-grained, modular-level learning dynamics of over-parameterized models to attain a more efficient and fruitful training strategy. Empirical evidence reveals that when scaling down into network modules, such as heads in self-attention models, we can observe varying learning patterns implicitly associated with each module's trainability. To describe such modular-level learning capabilities, we introduce a novel concept dubbed modular neural tangent kernel (mNTK), and we demonstrate that the quality of a module's learning is tightly associated with its mNTK's principal eigenvalue $λ_{\max}$. A large $λ_{\max}$ indicates that the module learns features with better convergence, while those miniature ones may impact generalization negatively. Inspired by the discovery, we propose a novel training strategy termed Modular Adaptive Training (MAT) to selectively update those modules whose $λ_{\max}$ exceeds a dynamic threshold, concentrating the model on learning common features and ignoring those inconsistent ones. Unlike most existing training schemes with a complete BP cycle across all network modules, MAT can significantly save computations by its partially-updating strategy and can further improve performance. Experiments show that MAT nearly halves the computational cost of model training and outperforms the accuracy of baselines.

Submitted 13 May, 2024; originally announced May 2024.

Comments: Accepted at NeurIPS 2023

arXiv:2405.00527 [pdf, other]

ChatBI: Towards Natural Language to Complex Business Intelligence SQL

Authors: Jinqing Lian, Xinyi Liu, Yingxia Shao, Yang Dong, Ming Wang, Zhang Wei, Tianqi Wan, Ming Dong, Hailin Yan

Abstract: The Natural Language to SQL (NL2SQL) technology provides non-expert users who are unfamiliar with databases the opportunity to use SQL for data analysis. Converting Natural Language to Business Intelligence (NL2BI) is a popular practical scenario for NL2SQL in actual production systems. Compared to NL2SQL, NL2BI introduces more challenges. In this paper, we propose ChatBI, a comprehensive and efficient technology for solving the NL2BI task. First, we analyze the interaction mode, an important module where NL2SQL and NL2BI differ in use, and design a smaller and cheaper model to match this interaction mode. In BI scenarios, tables contain a huge number of columns, making it impossible for existing NL2SQL methods that rely on Large Language Models (LLMs) for schema linking to proceed due to token limitations. The higher proportion of ambiguous columns in BI scenarios also makes schema linking difficult. ChatBI combines existing view technology in the database community to first decompose the schema linking problem into a Single View Selection problem and then uses a smaller and cheaper machine learning model to select the single view with a significantly reduced number of columns. The columns of this single view are then passed as the required columns for schema linking into the LLM. Finally, ChatBI proposes a phased process flow different from existing process flows, which allows ChatBI to generate SQL containing complex semantics and comparison relations more accurately.
We have deployed ChatBI on Baidu's data platform and integrated it into multiple product lines for large-scale production task evaluation. The obtained results highlight its superiority in practicality, versatility, and efficiency. At the same time, compared with the current mainstream NL2SQL technology under our real BI scenario data tables and queries, it also achieved the best results.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.02106 [pdf, other]

Neural Ordinary Differential Equation based Sequential Image Registration for Dynamic Characterization

Authors: Yifan Wu, Mengjin Dong, Rohit Jena, Chen Qin, James C. Gee

Abstract: Deformable image registration (DIR) is crucial in medical image analysis, enabling the exploration of biological dynamics such as organ motions and longitudinal changes in imaging. Leveraging Neural Ordinary Differential Equations (ODE) for registration, this extension work discusses how this framework can aid in the characterization of sequential biological processes. Utilizing the Neural ODE's ability to model state derivatives with neural networks, our Neural Ordinary Differential Equation Optimization-based (NODEO) framework considers voxels as particles within a dynamic system, defining deformation fields through the integration of neural differential equations. This method learns dynamics directly from data, bypassing the need for physical priors, making it exceptionally suitable for medical scenarios where such priors are unavailable or inapplicable. Consequently, the framework can discern underlying dynamics and use sequence data to regularize the transformation trajectory. We evaluated our framework on two clinical datasets: one for cardiac motion tracking and another for longitudinal brain MRI analysis. Demonstrating its efficacy in both 2D and 3D imaging scenarios, our framework offers flexibility and model agnosticism, capable of managing image sequences and facilitating label propagation throughout these sequences. This study provides a comprehensive understanding of how the Neural ODE-based framework uniquely benefits the image registration challenge.

Submitted 2 April, 2024; originally announced April 2024.

Comments: Journal extension of NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration, CVPR 2022

arXiv:2403.10927 [pdf, ps, other]

Distributed Multi-Objective Dynamic Offloading Scheduling for Air-Ground Cooperative MEC

Authors: Yang Huang, Miaomiao Dong, Yijie Mao, Wenqiang Liu, Zhen Gao

Abstract: Utilizing unmanned aerial vehicles (UAVs) with edge servers to assist terrestrial mobile edge computing (MEC) has attracted tremendous attention. Nevertheless, state-of-the-art schemes based on deterministic optimizations or single-objective reinforcement learning (RL) cannot reduce the backlog of task bits and simultaneously improve energy efficiency in highly dynamic network environments, where the design problem amounts to a sequential decision-making problem. In order to address the aforementioned problems, as well as the curses of dimensionality introduced by the growing number of terrestrial users, this paper proposes a distributed multi-objective (MO) dynamic trajectory planning and offloading scheduling scheme, integrated with MORL and the kernel method. The design of n-step return is also applied to average fluctuations in the backlog. Numerical results reveal that the n-step return can benefit the proposed kernel-based approach, achieving significant improvement in the long-term average backlog performance, compared to the conventional 1-step return design. Due to such design and the kernel-based neural network, to which decision-making features can be continuously added, the kernel-based approach can outperform the approach based on a fully-connected deep neural network, yielding improvement in energy consumption and the backlog performance, as well as a significant reduction in decision-making and online learning time.

Submitted 16 March, 2024; originally announced March 2024.

Comments: This paper has been accepted for publication in the IEEE Transactions on Vehicular Technology
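
Note on the n-step return mentioned in the abstract above: the following is a minimal sketch of the textbook n-step return, $G_t^{(n)} = \sum_{k=0}^{n-1} \gamma^k r_{t+k} + \gamma^n V(s_{t+n})$, not the paper's multi-objective, kernel-based scheme. The function name `n_step_return`, the discount factor `gamma`, and the scalar `bootstrap_value` are illustrative assumptions and do not come from the paper.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Generic n-step return (hypothetical helper, not the authors' code).

    rewards         -- the n observed rewards r_t, ..., r_{t+n-1}
    bootstrap_value -- value estimate V(s_{t+n}) for the state reached after n steps
    gamma           -- discount factor in [0, 1]
    """
    g = bootstrap_value
    # Fold from the last reward backwards: g <- r + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# With a single reward this reduces to the conventional 1-step target
# r_t + gamma * V(s_{t+1}); longer reward windows rely less on the bootstrap estimate.
one_step = n_step_return([1.0], bootstrap_value=10.0, gamma=0.5)
three_step = n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
```

Averaging over several observed rewards before bootstrapping is one common reason an n-step target fluctuates less than a 1-step target, which is consistent with the motivation stated in the abstract.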

arXiv:2403.10002 [pdf, ps, other]

Fast Group Scheduling for Downlink Large-Scale Multi-Group Multicast Beamforming

Authors: Chong Zhang, Min Dong, Ben Liang, Ali Afana, Yahia Ahmed

Abstract: Next-generation wireless networks need to handle massive user access effectively. This paper addresses the problem of joint group scheduling and multicast beamforming for downlink transmission with many active user groups. Aiming to maximize the minimum user throughput, we propose a three-phase approach to tackle this difficult joint optimization problem efficiently. In Phase 1, we utilize the optimal multicast beamforming structure obtained recently to find the group-channel directions for all groups. We propose two low-complexity group scheduling algorithms in Phase 2, which determine the subset of groups in each time slot sequentially and the total number of time slots required for all groups. The first algorithm measures the level of spatial separation among groups and selects the dissimilar groups that maximize the minimum user rate into the same time slot. In contrast, the second algorithm first identifies the spatially correlated groups via a learning-based clustering method based on the group-channel directions, and then separates spatially similar groups into different time slots. Finally, the multicast beamformers for the scheduled groups are obtained in each time slot by a computationally efficient method. Simulation results show that our proposed scheduling methods can effectively capture the level of spatial separation among groups to improve the minimum user throughput over the conventional approach that serves all groups in a single time slot or one group per time slot, and can be executed with low computational complexity.

Submitted 24 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: 13 pages, 8 figures

arXiv:2403.08492 [pdf, other]

Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking

Authors: Ming Dong, Yujing Chen, Miao Zhang, Hao Sun, Tingting He

Abstract: Chinese Spell Checking (CSC) is a widely used technology, which plays a vital role in speech to text (STT) and optical character recognition (OCR). Most of the existing CSC approaches relying on the BERT architecture achieve excellent performance. However, limited by the scale of the foundation model, BERT-based methods do not work well in few-shot scenarios, showing certain limitations in practical applications. In this paper, we explore using an in-context learning method named RS-LLM (Rich Semantic based LLMs) to introduce large language models (LLMs) as the foundation model. Besides, we study the impact of introducing various Chinese rich semantic information in our framework. We found that by introducing a small number of specific Chinese rich semantic structures, LLMs achieve better performance than the BERT-based model on the few-shot CSC task. Furthermore, we conduct experiments on multiple datasets, and the experimental results verify the superiority of our proposed framework.

Submitted 7 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08484 [pdf, other]

Data-oriented Dynamic Fine-tuning Parameter Selection Strategy for FISH Mask based Efficient Fine-tuning

Authors: Ming Dong, Kang Xue, Bolong Zheng, Tingting He

Abstract: In view of the huge number of parameters of large language models (LLMs), tuning all parameters is very costly, and accordingly fine-tuning specific parameters is more sensible. Most parameter-efficient fine-tuning (PEFT) methods concentrate on parameter selection strategies, such as additive, selective, and reparametrization-based methods. However, few methods consider the impact of data samples on parameter selection, such as the FISH Mask based method. FISH Mask randomly chooses a part of the data samples and treats them equally during parameter selection, which makes it unable to dynamically select optimal parameters for inconstant data distributions. In this work, we adopt a data-oriented perspective and propose an IRD (Iterative sample-parameter Range Decreasing) algorithm to search for the best setting of the sample-parameter pair for FISH Mask. In each iteration, by searching the set of samples and parameters with larger Fisher information, IRD can find a better sample-parameter pair at most scales. We demonstrate the effectiveness and rationality of the proposed strategy by conducting experiments on the GLUE benchmark. Experimental results show that our strategy optimizes the parameter selection and achieves preferable performance.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.01450 [pdf, other]

Collision-Free Robot Navigation in Crowded Environments using Learning based Convex Model Predictive Control

Authors: Zhuanglei Wen, Mingze Dong, Xiai Chen

Abstract: Navigating robots safely and efficiently in crowded and complex environments remains a significant challenge. Due to the dynamic and intricate nature of these settings, planning efficient and collision-free paths for robots to track is particularly difficult. In this paper, we uniquely bridge the robot's perception, decision-making and control processes by utilizing the convex obstacle-free region computed from 2D LiDAR data. The overall pipeline is threefold: (1) We propose a robot navigation framework that utilizes deep reinforcement learning (DRL), conceptualizing the observation as the convex obstacle-free region, a departure from general reliance on raw sensor inputs. (2) We design the action space, derived from the intersection of the robot's kinematic limits and the convex region, to enable efficient sampling of inherently collision-free reference points. These actions assist in guiding the robot to move towards the goal and interact with other obstacles during navigation. (3) We employ model predictive control (MPC) to track the trajectory formed by the reference points while satisfying constraints imposed by the convex obstacle-free region and the robot's kinodynamic limits. The effectiveness of the proposed improvements has been validated through two sets of ablation studies and a comparative experiment against the Timed Elastic Band (TEB), demonstrating improved navigation performance in crowded and complex environments.

Submitted 14 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

arXiv:2402.15721 [pdf, other]

Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models

Authors: Chaoya Jiang, Wei Ye, Mengfan Dong, Hongrui Jia, Haiyang Xu, Ming Yan, Ji Zhang, Shikun Zhang

Abstract: Large Vision Language Models (LVLMs) exhibit remarkable capabilities but struggle with hallucinations: inconsistencies between images and their descriptions. Previous hallucination evaluation studies on LVLMs have identified hallucinations in terms of objects, attributes, and relations but overlooked complex hallucinations that create an entire narrative around a fictional entity. In this paper, we introduce a refined taxonomy of hallucinations, featuring a new category: Event Hallucination. We then utilize advanced LLMs to generate and filter fine-grained hallucinatory data consisting of various types of hallucinations, with a particular focus on event hallucinations, laying the groundwork for integrating discriminative and generative evaluation methods within our universal evaluation framework. The proposed benchmark distinctively assesses LVLMs' ability to tackle a broad spectrum of hallucinations, making it a reliable and comprehensive tool for gauging LVLMs' efficacy in handling hallucinations. We will release our code and data.

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.14140 [pdf, other]

QuantTM: Business-Centric Threat Quantification for Risk Management and Cyber Resilience

Authors: Jan von der Assen, Muriel F. Franco, Muyao Dong, Burkhard Stiller

Abstract: Threat modeling has emerged as a key process for understanding relevant threats within businesses. However, understanding the importance of threat events is rarely driven by the business incorporating the system. Furthermore, prioritization of threat events often occurs based on abstract and qualitative scoring. While such scores enable prioritization, they do not allow the results to be easily interpreted by decision-makers. This can hinder downstream activities, such as discussing security investments and a security control's economic applicability. This article introduces QuantTM, an approach that incorporates views from operational and strategic business representatives to collect threat information during the threat modeling process to measure potential financial loss incurred by a specific threat event. It empowers the analysis of threats' impacts and the applicability of security controls, thus supporting the threat analysis and prioritization from an economic perspective. QuantTM comprises an overarching process for data collection and aggregation and a method for business impact analysis. The performance and feasibility of the QuantTM approach are demonstrated in a real-world case study conducted in a Swiss SME to analyze the impacts of threats and economic benefits of security controls. It is also shown that employing business impact analysis is feasible and that the supporting prototype exhibits great usability.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2401.17585 [pdf, other]

Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks

Authors: Wenyue Hua, Jiang Guo, Mingwen Dong, Henghui Zhu, Patrick Ng, Zhiguo Wang

Abstract: Current approaches to knowledge editing struggle to effectively propagate updates to interconnected facts. In this work, we delve into the barriers that hinder the appropriate propagation of updated knowledge within these models for accurate reasoning. To support our analysis, we introduce a novel reasoning-based benchmark -- ReCoE (Reasoning-based Counterfactual Editing dataset) -- which covers six common reasoning schemes in the real world. We conduct a thorough analysis of existing knowledge editing techniques, including input augmentation, finetuning, and locate-and-edit. We found that all model editing methods show notably low performance on this dataset, especially in certain reasoning schemes. Our analysis of the chain-of-thought generation of edited models further uncovers key reasons behind the inadequacy of existing knowledge editing methods from a reasoning standpoint, involving aspects of fact-wise editing, fact recall ability, and coherence in generation. We will make our benchmark publicly available.

Submitted 30 January, 2024; originally announced January 2024.

Comments: 22 pages, 14 figures, 5 tables

arXiv:2401.00594 [pdf, ps, other]

Efficient Design for Multi-user Downlink Beamforming with Reconfigurable Intelligent Surface

Authors: Mohammad Ebrahimi, Min Dong

Abstract: This paper considers downlink multi-user transmission facilitated by a reconfigurable intelligent surface (RIS). First, focusing on the multi-group multicast beamforming scenario, we develop a fast and scalable algorithm for the joint base station (BS) and RIS beamforming optimization to minimize the transmit power subject to the user quality-of-service (QoS) constraints. By exploring the structure of this QoS problem, we show that the joint beamforming optimization can be naturally decomposed into a BS multicast beamforming QoS problem and an RIS passive multicast beamforming max-min-fair (MMF) problem. We propose an alternating multicast beamforming (AMBF) algorithm to solve the two subproblems alternatingly. For the BS QoS subproblem, we utilize the optimal multicast beamforming structure to obtain the BS beamformers efficiently. Furthermore, we reformulate the challenging RIS MMF subproblem and employ a first-order projected subgradient algorithm (PSA), which yields closed-form updates. The computational complexity of the AMBF algorithm grows linearly with the number of RIS elements and BS antennas. We further show that the AMBF approach is also an efficient method for the RIS-assisted downlink multi-user unicast beamforming problem, providing semi-closed-form updates. Next, we study the MMF problem for the RIS-assisted downlink beamforming design and propose a PSA-based fast algorithm to compute the BS and RIS beamforming solutions with closed-form updates per iteration, leading to a highly computationally efficient solution.
Simulation results show the efficacy of our proposed algorithms in both performance and computational cost compared to other alternative methods.

Submitted 29 February, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

Comments: 13 pages, 10 figures

arXiv:2312.16902 [pdf, other]

Joint Learning for Scattered Point Cloud Understanding with Hierarchical Self-Distillation

Authors: Kaiyue Zhou, Ming Dong, Peiyuan Zhi, Shengjin Wang

Abstract: Numerous point-cloud understanding techniques focus on whole entities and have succeeded in obtaining satisfactory results and limited sparsity tolerance. However, these methods are generally sensitive to incomplete point clouds that are scanned with flaws or large gaps. To address this issue, in this paper, we propose an end-to-end architecture that compensates for and identifies partial point clouds on the fly. First, we propose a cascaded solution that integrates both the upstream and downstream networks simultaneously, allowing the task-oriented downstream to identify the points generated by the completion-oriented upstream. These two streams complement each other, resulting in improved performance for both completion and downstream-dependent tasks. Second, to explicitly understand the predicted points' pattern, we introduce hierarchical self-distillation (HSD), which can be applied to arbitrary hierarchy-based point cloud methods. HSD ensures that the deepest classifier with a larger perceptual field and longer code length provides additional regularization to intermediate ones rather than simply aggregating the multi-scale features, thereby maximizing the mutual information between a teacher and students. We show the advantage of the self-distillation process in the hyperspaces based on the information bottleneck principle. On the classification task, our proposed method performs competitively on the synthetic dataset and achieves superior results on the challenging real-world benchmark when compared to the state-of-the-art models.
Additional experiments also demonstrate the superior performance and generality of our framework on the part segmentation task.

Submitted 28 December, 2023; originally announced December 2023.

Comments: Currently under review. Previously submitted to AAAI and got frustrated. Decisions: 1x weak reject, 2x weak accept, and 1 accept

arXiv:2312.15186 [pdf, other]

Efficient Asynchronous Federated Learning with Sparsification and Quantization

Authors: Juncheng Jia, Ji Liu, Chendi Zhou, Hao Tian, Mianxiong Dong, Dejing Dou

Abstract: While data is distributed in multiple edge devices, Federated Learning (FL) is attracting more and more attention to collaboratively train a machine learning model without transferring raw data. FL generally exploits a parameter server and a large number of edge devices during the whole process of the model training, while several devices are selected in each round. However, straggler devices may slow down the training process or even make the system crash during training. Meanwhile, other idle edge devices remain unused. As the bandwidth between the devices and the server is relatively low, the communication of intermediate data becomes a bottleneck. In this paper, we propose Time-Efficient Asynchronous federated learning with Sparsification and Quantization, i.e., TEASQ-Fed. TEASQ-Fed can fully exploit edge devices to asynchronously participate in the training process by actively applying for tasks. We utilize control parameters to choose an appropriate number of parallel edge devices, which simultaneously execute the training tasks. In addition, we introduce a caching mechanism and weighted averaging with respect to model staleness to further improve the accuracy. Furthermore, we propose a sparsification and quantization approach to compress the intermediate data to accelerate the training. The experimental results reveal that TEASQ-Fed improves the accuracy (up to 16.67% higher) while accelerating the convergence of model training (up to twice as fast).

Submitted 6 January, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

Comments: To appear in Concurrency and Computation: Practice and Experience (CCPE), 21 pages

arXiv:2312.13424 [pdf, ps, other]

Multi-Model Wireless Federated Learning with Downlink Beamforming

Authors: Chong Zhang, Min Dong, Ben Liang, Ali Afana, Yahia Ahmed

Abstract: This paper studies the design of wireless federated learning (FL) for simultaneously training multiple machine learning models. We consider round robin device-model assignment and downlink beamforming for concurrent multiple model updates. After formulating the joint downlink-uplink transmission process, we derive the per-model global update expression over communication rounds, capturing the effect of beamforming and noisy reception. To maximize the multi-model training convergence rate, we derive an upper bound on the optimality gap of the global model update and use it to formulate a multi-group multicast beamforming problem. We show that this problem can be converted to minimizing the sum of inverse received signal-to-interference-plus-noise ratios, which can be solved efficiently by projected gradient descent. Simulation shows that our proposed multi-model FL solution outperforms other alternatives, including conventional single-model sequential training and multi-model zero-forcing beamforming.

Submitted 14 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: 6 pages, 4 figures. Accepted by IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

arXiv:2312.12358 [pdf, other]

Localization and Discrete Beamforming with a Large Reconfigurable Intelligent Surface

Authors: Baojia Luo, Yili Deng, Miaomiao Dong, Zhongyi Huang, Xiang Chen, Wei Han, Bo Bai

Abstract: In millimeter-wave (mmWave) cellular systems, reconfigurable intelligent surfaces (RISs) are foreseeably deployed with a large number of reflecting elements to achieve high beamforming gains. The large-sized RIS will make radio links fall in the near-field localization regime with spatial non-stationarity issues. Moreover, the discrete phase restriction on the RIS reflection coefficient incurs exponential complexity for discrete beamforming. It remains an open problem to find the optimal RIS reflection coefficient design in polynomial time. To address these issues, we propose a scalable partitioned-far-field protocol that considers both the near-field non-stationarity and discrete beamforming. The protocol approximates near-field signal propagation using a partitioned-far-field representation to inherit the sparsity from the sophisticated far-field and facilitate the near-field localization scheme. To improve the theoretical localization performance, we propose a fast passive beamforming (FPB) algorithm that optimally solves the discrete RIS beamforming problem, reducing the search complexity from exponential order to linear order. Furthermore, by exploiting the partitioned structure of RIS, we introduce a two-stage coarse-to-fine localization algorithm that leverages both the time delay and angle information. Numerical results demonstrate that centimeter-level localization precision is achieved under medium and high signal-to-noise ratios (SNR), revealing that RISs can provide support for low-cost and high-precision localization in future cellular systems.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 13 pages

arXiv:2312.06968 [pdf, other]

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

Authors: Chaoya Jiang, Haiyang Xu, Mengfan Dong, Jiaxing Chen, Wei Ye, Ming Yan, Qinghao Ye, Ji Zhang, Fei Huang, Shikun Zhang

Abstract: Multi-modal large language models (MLLMs) have been shown to efficiently integrate natural language with visual information to handle multi-modal tasks. However, MLLMs still face a fundamental limitation of hallucinations, where they tend to generate erroneous or fabricated information. In this paper, we address hallucinations in MLLMs from a novel perspective of representation learning. We first analyzed the representation distribution of textual and visual tokens in MLLM, revealing two important findings: 1) there is a significant gap between textual and visual representations, indicating unsatisfactory cross-modal representation alignment; 2) representations of texts that contain and do not contain hallucinations are entangled, making it challenging to distinguish them. These two observations inspire us with a simple yet effective method to mitigate hallucinations. Specifically, we introduce contrastive learning into MLLMs and use text with hallucination as hard negative examples, naturally bringing representations of non-hallucinative text and visual samples closer while pushing away representations of non-hallucinating and hallucinative text. We evaluate our method quantitatively and qualitatively, showing its effectiveness in reducing hallucination occurrences and improving performance across multiple benchmarks. On the MMhal-Bench benchmark, our method obtains a 34.66%/29.5% improvement over the baseline MiniGPT-4/LLaVA. Our code is available on https://github.com/X-PLUG/mPLUG-HalOwl/tree/main/hacl.

Submitted 23 February, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.05219 [pdf, other]

Enhancing Facial Classification and Recognition using 3D Facial Models and Deep Learning

Authors: Houting Li, Mengxuan Dong, Lok Ming Lui

Abstract: Accurate analysis and classification of facial attributes are essential in various applications, from human-computer interaction to security systems. In this work, we propose a novel approach to enhance facial classification and recognition tasks through the integration of 3D facial models with deep learning methods. We extract the most useful information for various tasks using the 3D Facial Model, leading to improved classification accuracy. Combining 3D facial insights with ResNet architecture, our approach achieves notable results: 100% individual classification, 95.4% gender classification, and 83.5% expression classification accuracy. This method holds promise for advancing facial analysis and recognition research.

Submitted 8 December, 2023; originally announced December 2023.

Comments: arXiv admin note: text overlap with arXiv:1903.08527 by other authors

arXiv:2311.18251 [pdf, other]

Can Large Language Models Be Good Companions? An LLM-Based Eyewear System with Conversational Common Ground

Authors: Zhenyu Xu, Hailin Xu, Zhouyang Lu, Yingying Zhao, Rui Zhu, Yujiang Wang, Mingzhi Dong, Yuhu Chang, Qin Lv, Robert P. Dick, Fan Yang, Tun Lu, Ning Gu, Li Shang

Abstract: Developing chatbots as personal companions has long been a goal of artificial intelligence researchers. Recent advances in Large Language Models (LLMs) have delivered a practical solution for endowing chatbots with anthropomorphic language capabilities. However, it takes more than LLMs to enable chatbots that can act as companions. Humans use their understanding of individual personalities to drive conversations. Chatbots also require this capability to enable human-like companionship. They should act based on personalized, real-time, and time-evolving knowledge of their owner. We define such essential knowledge as the common ground between chatbots and their owners, and we propose to build a common-ground-aware dialogue system from an LLM-based module, named OS-1, to enable chatbot companionship. Hosted by eyewear, OS-1 can sense the visual and audio signals the user receives and extract real-time contextual semantics. Those semantics are categorized and recorded to formulate historical contexts from which the user's profile is distilled and evolves over time, i.e., OS-1 gradually learns about its user. OS-1 combines knowledge from real-time semantics, historical contexts, and user-specific profiles to produce a common-ground-aware prompt input into the LLM module. The LLM's output is converted to audio, spoken to the wearer when appropriate. We conduct laboratory and in-field studies to assess OS-1's ability to build common ground between the chatbot and its user. The technical feasibility and capabilities of the system are also evaluated. OS-1, with its common-ground awareness, can significantly improve user satisfaction and potentially lead to downstream tasks such as personal emotional support and assistance.

Submitted 29 November, 2023; originally announced November 2023.

Comments: 36 pages, 25 figures, Under review at ACM IMWUT

arXiv:2310.18619 [pdf, other]

Dense Retrieval as Indirect Supervision for Large-space Decision Making

Authors: Nan Xu, Fei Wang, Mingtao Dong, Muhao Chen

Abstract: Many discriminative natural language understanding (NLU) tasks have large label spaces. Learning such a process of large-space decision making is particularly challenging due to the lack of training instances per label and the difficulty of selection among many fine-grained labels. Inspired by dense retrieval methods for passage finding in open-domain QA, we propose a reformulation of large-space discriminative NLU tasks as a learning-to-retrieve task, leading to a novel solution named Dense Decision Retrieval (DDR). Instead of predicting fine-grained decisions as logits, DDR adopts a dual-encoder architecture that learns to predict by retrieving from a decision thesaurus. This approach not only leverages rich indirect supervision signals from easy-to-consume learning resources for dense retrieval, it also leads to enhanced prediction generalizability with a semantically meaningful representation of the large decision space. When evaluated on tasks with decision spaces ranging from hundreds to hundred-thousand scales, DDR outperforms strong baselines greatly by 27.54% in P@1 on two extreme multi-label classification tasks, 1.17% in F1 score on ultra-fine entity typing, and 1.26% in accuracy on three few-shot intent classification tasks on average. Code and resources are available at https://github.com/luka-group/DDR

Submitted 28 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 (Findings)

arXiv:2310.14784 [pdf, other]

An Efficient Imbalance-Aware Federated Learning Approach for Wearable Healthcare with Autoregressive Ratio Observation

Authors: Wenhao Yan, He Li, Kaoru Ota, Mianxiong Dong

Abstract: Widely available healthcare services are now getting popular because of advancements in wearable sensing techniques and mobile edge computing. People's health information is collected by edge devices such as smartphones and wearable bands for further analysis on servers, which then send back suggestions and alerts for abnormal conditions. The recent emergence of federated learning allows users to train private data on local devices while updating models collaboratively. However, the heterogeneous distribution of the health condition data may lead to significant risks to model performance due to class imbalance. Meanwhile, as FL training is powered by sharing gradients only with the server, training data is almost inaccessible. The conventional solutions to class imbalance do not work for federated learning. In this work, we propose a new federated learning framework FedImT, dedicated to addressing the challenges of class imbalance in federated learning scenarios. FedImT contains an online scheme that can estimate the data composition during each round of aggregation, then introduces a self-attenuating iterative equivalent to track variations of multiple estimations and promptly tweak the balance of the loss computing for minority classes. Experiments demonstrate the effectiveness of FedImT in solving the imbalance problem without extra energy consumption and avoiding privacy risks.

Submitted 30 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: submitted to IEEE OJCS in Oct. 2023, under review

arXiv:2310.07143 [pdf, other]

Imitation Learning from Purified Demonstration

Authors: Yunke Wang, Minjing Dong, Bo Du, Chang Xu

Abstract: Imitation learning has emerged as a promising approach for addressing sequential decision-making problems, with the assumption that expert demonstrations are optimal. However, in real-world scenarios, expert demonstrations are often imperfect, leading to challenges in effectively applying imitation learning. While existing research has focused on optimizing with imperfect demonstrations, the training typically requires a certain proportion of optimal demonstrations to guarantee performance. To tackle these problems, we propose to purify the potential perturbations in imperfect demonstrations and subsequently conduct imitation learning from purified demonstrations. Motivated by the success of diffusion models, we introduce a two-step purification via the diffusion process. In the first step, we apply a forward diffusion process to effectively smooth out the potential perturbations in imperfect demonstrations by introducing additional noise. Subsequently, a reverse generative process is utilized to recover the optimal expert demonstrations from the diffused ones. We provide theoretical evidence supporting our approach, demonstrating that the total variation distance between the purified and optimal demonstration distributions can be upper-bounded. The evaluation results on MuJoCo demonstrate the effectiveness of our method from different aspects.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.03142 [pdf, ps, other]

Design and Optimization of Heterogeneous Coded Distributed Computing with Nonuniform File Popularity

Authors: Yong Deng, Min Dong

Abstract: This paper studies MapReduce-based heterogeneous coded distributed computing (CDC) where, besides different computing capabilities at workers, input files to be accessed by computing jobs have nonuniform popularity. We propose a file placement strategy that can handle an arbitrary number of input files. Furthermore, we design a nested coded shuffling strategy that can efficiently manage the nonuniformity of file popularity to maximize the coded multicasting opportunity. We then formulate the joint optimization of the proposed file placement and nested shuffling design variables to optimize the proposed CDC scheme. To reduce the high computational complexity in solving the resulting mixed-integer linear programming (MILP) problem, we propose a simple two-file-group-based file placement approach to obtain an approximate solution. Numerical results show that the optimized CDC scheme outperforms other alternatives. Also, the proposed two-file-group-based approach achieves nearly the same performance as the conventional branch-and-cut method in solving the MILP problem but with substantially lower computational complexity that is scalable over the number of files and workers. For computing jobs with aggregate target functions that commonly appear in machine learning applications, we propose a heterogeneous compressed CDC (C-CDC) scheme to further improve the shuffling efficiency. The C-CDC scheme uses a local data aggregation technique to compress the data to be shuffled for the shuffling load reduction. We again optimize the proposed C-CDC scheme and explore the two-file-group-based low-complexity approach for an approximate solution. Numerical results show the proposed C-CDC scheme provides a considerable shuffling load reduction over the CDC scheme, and also, the two-file-group-based file placement approach maintains good performance.

Submitted 4 October, 2023; originally announced October 2023.

Comments: 15 pages, 7 figures, 3 tables

arXiv:2309.16207 [pdf, other]

Parameter-Saving Adversarial Training: Reinforcing Multi-Perturbation Robustness via Hypernetworks

Authors: Huihui Gong, Minjing Dong, Siqi Ma, Seyit Camtepe, Surya Nepal, Chang Xu

Abstract: Adversarial training serves as one of the most popular and effective methods to defend against adversarial perturbations. However, most defense mechanisms only consider a single type of perturbation while various attack methods might be adopted to perform stronger adversarial attacks against the deployed model in real-world scenarios, e.g., $\ell_2$ or $\ell_\infty$. Defending against various attacks can be a challenging problem since multi-perturbation adversarial training and its variants only achieve suboptimal robustness trade-offs, due to the theoretical limit to multi-perturbation robustness for a single model. Besides, it is impractical to deploy large models in some storage-efficient scenarios. To address these drawbacks, in this paper we propose a novel multi-perturbation adversarial training framework, parameter-saving adversarial training (PSAT), to reinforce multi-perturbation robustness with an advantageous side effect of saving parameters, which leverages hypernetworks to train specialized models against a single perturbation and aggregate these specialized models to defend against multiple perturbations. Finally, we extensively evaluate and compare our proposed method with state-of-the-art single/multi-perturbation robust methods against various latest attack methods on different datasets, showing the robustness superiority and parameter efficiency of our proposed method, e.g., for the CIFAR-10 dataset with ResNet-50 as the backbone, PSAT saves approximately 80% of parameters while achieving the state-of-the-art robustness trade-off accuracy.

Submitted 28 September, 2023; originally announced September 2023.

Comments: 9 pages, 2 figures

arXiv:2309.11133 [pdf, other]

Shape Anchor Guided Holistic Indoor Scene Understanding

Authors: Mingyue Dong, Linxi Huan, Hanjiang Xiong, Shuhan Shen, Xianwei Zheng

Abstract: This paper proposes a shape anchor guided learning strategy (AncLearn) for robust holistic indoor scene understanding. We observe that the search space constructed by current methods for proposal feature grouping and instance point sampling often introduces massive noise to instance detection and mesh reconstruction. Accordingly, we develop AncLearn to generate anchors that dynamically fit instance surfaces to (i) unmix noise and target-related features for offering reliable proposals at the detection stage, and (ii) reduce outliers in object point sampling for directly providing well-structured geometry priors without segmentation during reconstruction. We embed AncLearn into a reconstruction-from-detection learning system (AncRec) to generate high-quality semantic scene models in a purely instance-oriented manner. Experiments conducted on the challenging ScanNetv2 dataset demonstrate that our shape anchor-based method consistently achieves state-of-the-art performance in terms of 3D object detection, layout estimation, and shape reconstruction. The code will be available at https://github.com/Geo-Tell/AncRec.

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.09480 [pdf, other]

Stealthy Physical Masked Face Recognition Attack via Adversarial Style Optimization

Authors: Huihui Gong, Minjing Dong, Siqi Ma, Seyit Camtepe, Surya Nepal, Chang Xu

Abstract: Deep neural networks (DNNs) have achieved state-of-the-art performance on face recognition (FR) tasks in the last decade. In real scenarios, the deployment of DNNs requires taking various face accessories into consideration, like glasses, hats, and masks. In the COVID-19 pandemic era, wearing face masks is one of the most effective ways to defend against the novel coronavirus. However, DNNs are known to be vulnerable to adversarial examples with a small but elaborated perturbation. Thus, a facial mask with adversarial perturbations may pose a great threat to the widely used deep learning-based FR models. In this paper, we consider a challenging adversarial setting: targeted attack against FR models. We propose a new stealthy physical masked FR attack via adversarial style optimization. Specifically, we train an adversarial style mask generator that hides adversarial perturbations inside style masks. Moreover, to ameliorate the phenomenon of sub-optimization with one fixed style, we propose to discover the optimal style given a target through style optimization in a continuous relaxation manner. We simultaneously optimize the generator and the style selection for generating strong and stealthy adversarial style masks. We evaluated the effectiveness and transferability of our proposed method via extensive white-box and black-box digital experiments. Furthermore, we also conducted physical attack experiments against local FR models and online platforms.

Submitted 18 September, 2023; originally announced September 2023.

Comments: 11 pages, 7 figures

arXiv:2309.07581 [pdf, ps, other]

A Survey of Graph Pre-processing Methods: From Algorithmic to Hardware Perspectives

Authors: Zhengyang Lv, Mingyu Yan, Xin Liu, Mengyao Dong, Xiaochun Ye, Dongrui Fan, Ninghui Sun

Abstract: Graph-related applications have experienced significant growth in academia and industry, driven by the powerful representation capabilities of graphs. However, efficiently executing these applications faces various challenges, such as load imbalance, random memory access, etc. To address these challenges, researchers have proposed various acceleration systems, including software frameworks and hardware accelerators, all of which incorporate graph pre-processing (GPP). GPP serves as a preparatory step before the formal execution of applications, involving techniques such as sampling, reordering, etc. However, GPP execution often remains overlooked, as the primary focus is directed towards enhancing graph applications themselves. This oversight is concerning, especially considering the explosive growth of real-world graph data, where GPP becomes essential and even dominates system running overhead. Furthermore, GPP methods exhibit significant variations across devices and applications due to high customization. Unfortunately, no comprehensive work systematically summarizes GPP. To address this gap and foster a better understanding of GPP, we present a comprehensive survey dedicated to this area. We propose a double-level taxonomy of GPP, considering both algorithmic and hardware perspectives. Through listing relevant works, we illustrate our taxonomy and conduct a thorough analysis and summary of diverse GPP techniques. Lastly, we discuss challenges in GPP and potential future directions.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2308.11838 [pdf, other]

A Benchmark Study on Calibration

Authors: Linwei Tao, Younan Zhu, Haolan Guo, Minjing Dong, Chang Xu

Abstract: Deep neural networks are increasingly utilized in various machine learning tasks. However, as these models grow in complexity, they often face calibration issues, despite enhanced prediction accuracy. Many studies have endeavored to improve calibration performance through the use of specific loss functions, data preprocessing and training frameworks. Yet, investigations into calibration properties have been somewhat overlooked. Our study leverages the Neural Architecture Search (NAS) search space, offering an exhaustive model architecture space for thorough calibration properties exploration. We specifically create a model calibration dataset. This dataset evaluates 90 bin-based and 12 additional calibration measurements across 117,702 unique neural networks within the widely employed NATS-Bench search space. Our analysis aims to answer several longstanding questions in the field, using our proposed dataset: (i) Can model calibration be generalized across different datasets? (ii) Can robustness be used as a calibration measurement? (iii) How reliable are calibration metrics? (iv) Does a post-hoc calibration method affect all models uniformly? (v) How does calibration interact with accuracy? (vi) What is the impact of bin size on calibration measurement? (vii) Which architectural designs are beneficial for calibration? Additionally, our study bridges an existing gap by exploring calibration within NAS. By providing this dataset, we enable further research into NAS calibration. As far as we are aware, our research represents the first large-scale investigation into calibration properties and the premier study of calibration issues within NAS. The project page can be found at https://www.taolinwei.com/calibration-study
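To make the "bin-based calibration measurements" and the bin-size question (vi) in the abstract above concrete, here is a minimal sketch of the standard expected calibration error (ECE) with equal-width confidence bins. It is an illustration only, not the benchmark's own code: the function name `expected_calibration_error`, the half-open binning convention, and the toy inputs are assumptions made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-based ECE: sample-weighted mean of |accuracy - confidence| per bin.

    confidences: predicted top-class probabilities in [0, 1].
    correct:     booleans (or 0/1), True where the prediction was right.
    n_bins:      number of equal-width bins (assumed convention: [lo, hi),
                 with confidence 1.0 placed in the last bin).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign every prediction to a confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, bool(hit)))

    # Accumulate the weighted gap between accuracy and mean confidence.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy data: the same predictions scored with two different bin counts.
toy_conf = [0.6, 0.8]
toy_hit = [True, False]
ece_10_bins = expected_calibration_error(toy_conf, toy_hit, n_bins=10)
ece_1_bin = expected_calibration_error(toy_conf, toy_hit, n_bins=1)
```

On this toy input the two bin counts give different scores for identical predictions, which is the kind of sensitivity that question (vi) asks about.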

Submitted 22 March, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: ICLR 2024 poster

arXiv:2307.13209 [pdf]

Gait Cycle-Inspired Learning Strategy for Continuous Prediction of Knee Joint Trajectory from sEMG

Authors: Xueming Fu, Hao Zheng, Luyan Liu, Wenjuan Zhong, Haowen Liu, Wenxuan Xiong, Yuyang Zhang, Yifeng Chen, Dong Wei, Mingjie Dong, Yefeng Zheng, Mingming Zhang

Abstract: Predicting lower limb motion intent is vital for controlling exoskeleton robots and prosthetic limbs. Surface electromyography (sEMG) has attracted increasing attention in recent years as it enables ahead-of-time prediction of motion intentions before actual movement. However, the estimation performance of human joint trajectory remains a challenging problem due to the inter- and intra-subject variations. The former is related to physiological differences (such as height and weight) and preferred walking patterns of individuals, while the latter is mainly caused by irregular and gait-irrelevant muscle activity. This paper proposes a model integrating two gait cycle-inspired learning strategies to mitigate the challenge for predicting human knee joint trajectory. The first strategy is to decouple knee joint angles into motion patterns and amplitudes; the former exhibit low variability while the latter show high variability among individuals. By learning through separate network entities, the model manages to capture both the common and personalized gait features. In the second, muscle principal activation masks are extracted from gait cycles in a prolonged walk. These masks are used to filter out components unrelated to walking from raw sEMG and provide auxiliary guidance to capture more gait-related features. Experimental results indicate that our model could predict knee angles with the average root mean square error (RMSE) of 3.03(0.49) degrees and 50ms ahead of time. To our knowledge, this is the best performance reported in the relevant literature, reducing RMSE by at least 9.5%.

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.10653 [pdf, other]

Refining the Optimization Target for Automatic Univariate Time Series Anomaly Detection in Monitoring Services

Authors: Manqing Dong, Zhanxiang Zhao, Yitong Geng, Wentao Li, Wei Wang, Huai Jiang

Abstract: Time series anomaly detection is crucial for industrial monitoring services that handle a large volume of data, aiming to ensure reliability and optimize system performance. Existing methods often require extensive labeled resources and manual parameter selection, highlighting the need for automation. This paper proposes a comprehensive framework for automatic parameter optimization in time series anomaly detection models. The framework introduces three optimization targets: prediction score, shape score, and sensitivity score, which can be easily adapted to different model backbones without prior knowledge or manual labeling efforts. The proposed framework has been successfully applied online for over six months, serving more than 50,000 time series every minute. It simplifies the user's experience by requiring only an expected sensitive value, offering a user-friendly interface, and achieving desired detection results. Extensive evaluations conducted on public datasets and comparison with other methods further confirm the effectiveness of the proposed framework.

Submitted 20 July, 2023; originally announced July 2023.

Comments: Accepted by 2023 IJCAI Workshop

arXiv:2307.07919 [pdf, other]

Neural Architecture Retrieval

Authors: Xiaohuan Pei, Yanxi Li, Minjing Dong, Chang Xu

Abstract: With the increasing number of new neural architecture designs and substantial existing neural architectures, it becomes difficult for the researchers to situate their contributions compared with existing neural architectures or establish the connections between their designs and other relevant ones. To discover similar neural architectures in an efficient and automatic manner, we define a new problem Neural Architecture Retrieval which retrieves a set of existing neural architectures which have similar designs to the query neural architecture. Existing graph pre-training strategies cannot address the computational graph in neural architectures due to the graph size and motifs. To fulfill this potential, we propose to divide the graph into motifs which are used to rebuild the macro graph to tackle these issues, and introduce multi-level contrastive learning to achieve accurate graph representation learning. Extensive evaluations on both human-designed and synthesized neural architectures demonstrate the superiority of our algorithm. Such a dataset which contains 12k real-world network architectures, as well as their embedding, is built for neural architecture retrieval.

Submitted 17 March, 2024; v1 submitted 15 July, 2023; originally announced July 2023.

Comments: ICLR 2024

arXiv:2307.00315 [pdf, ps, other]

Joint Downlink-Uplink Beamforming for Wireless Multi-Antenna Federated Learning

Authors: Chong Zhang, Min Dong, Ben Liang, Ali Afana, Yahia Ahmed

Abstract: We study joint downlink-uplink beamforming design for wireless federated learning (FL) with a multi-antenna base station. Considering analog transmission over noisy channels and uplink over-the-air aggregation, we derive the global model update expression over communication rounds. We then obtain an upper bound on the expected global loss function, capturing the downlink and uplink beamforming and receiver noise effect. We propose a low-complexity joint beamforming algorithm to minimize this upper bound, which employs alternating optimization to break down the problem into three subproblems, each solved via closed-form gradient updates. Simulation under practical wireless system setup shows that our proposed joint beamforming design solution substantially outperforms the conventional separate-link design approach and nearly attains the performance of ideal FL with error-free communication links.

Submitted 1 July, 2023; originally announced July 2023.

Comments: 8 pages, 3 figures. Accepted by International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks (WiOpt), 2023

arXiv:2306.03730 [pdf, other]

Modality-Agnostic Learning for Medical Image Segmentation Using Multi-modality Self-distillation

Authors: Qisheng He, Nicholas Summerfield, Ming Dong, Carri Glide-Hurst

Abstract: Medical image segmentation of tumors and organs at risk is a time-consuming yet critical process in the clinic that utilizes multi-modality imaging (e.g., different acquisitions, data types, and sequences) to increase segmentation precision. In this paper, we propose a novel framework, Modality-Agnostic learning through Multi-modality Self-distillation (MAG-MS), to investigate the impact of input modalities on medical image segmentation. MAG-MS distills knowledge from the fusion of multiple modalities and applies it to enhance representation learning for individual modalities. Thus, it provides a versatile and efficient approach to handle limited modalities during testing. Our extensive experiments on benchmark datasets demonstrate the high efficiency of MAG-MS and its superior segmentation performance compared with current state-of-the-art methods. Furthermore, using MAG-MS, we provide valuable insight and guidance on selecting input modalities for medical image segmentation tasks.

Submitted 6 June, 2023; originally announced June 2023.

arXiv:2306.03271 [pdf, other]

Dual self-distillation of U-shaped networks for 3D medical image segmentation

Authors: Soumyanil Banerjee, Ming Dong, Carri Glide-Hurst

Abstract: U-shaped networks and their variants have demonstrated exceptional results for medical image segmentation. In this paper, we propose a novel dual self-distillation (DSD) framework for U-shaped networks for 3D medical image segmentation. DSD distills knowledge from the ground-truth segmentation labels to the decoder layers and also between the encoder and decoder layers of a single U-shaped network. DSD is a generalized training strategy that could be attached to the backbone architecture of any U-shaped network to further improve its segmentation performance. We attached DSD on two state-of-the-art U-shaped backbones, and extensive experiments on two public 3D medical image segmentation datasets (cardiac substructure and brain tumor) demonstrated significant improvement over those backbones. On average, after attaching DSD to the U-shaped backbones, we observed an improvement of 4.25% and 3.15% in Dice similarity score for cardiac substructure and brain tumor segmentation respectively.
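The Dice similarity score quoted in the abstract above is the standard overlap measure 2|A∩B| / (|A| + |B|) between a predicted and a reference mask. The sketch below computes it for flattened binary masks; the function name `dice_score`, the flat-list input format, and the choice to return 1.0 when both masks are empty are assumptions made for illustration and are not taken from the paper.

```python
def dice_score(pred_mask, true_mask):
    """Dice similarity between two flattened binary masks (sequences of 0/1).

    Returns 2 * |intersection| / (|pred| + |true|). When both masks are
    empty the score is defined here as 1.0 (an assumed convention).
    """
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of voxels")

    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    pred_size = sum(1 for p in pred_mask if p)
    true_size = sum(1 for t in true_mask if t)

    if pred_size + true_size == 0:
        return 1.0
    return 2.0 * intersection / (pred_size + true_size)


# Toy example: one of two predicted voxels overlaps the reference mask.
example = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
```

A 3D volume would be flattened to one sequence per structure before calling this helper, and the reported percentage-point improvements would then be differences between such scores averaged over cases.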

Submitted 5 June, 2023; originally announced June 2023.

Comments: 12 pages, 5 figures, 3 tables

arXiv:2305.16265 [pdf, other]

UNITE: A Unified Benchmark for Text-to-SQL Evaluation

Authors: Wuwei Lan, Zhiguo Wang, Anuj Chauhan, Henghui Zhu, Alexander Li, Jiang Guo, Sheng Zhang, Chung-Wei Hang, Joseph Lilien, Yiqun Hu, Lin Pan, Mingwen Dong, Jun Wang, Jiarong Jiang, Stephen Ash, Vittorio Castelli, Patrick Ng, Bing Xiang

Abstract: A practical text-to-SQL system should generalize well on a wide variety of natural language questions, unseen database schemas, and novel SQL query structures. To comprehensively evaluate text-to-SQL systems, we introduce a UNIfied benchmark for Text-to-SQL Evaluation (UNITE). It is composed of publicly available text-to-SQL datasets, containing natural language questions from more than 12 domains, SQL queries from more than 3.9K patterns, and 29K databases. Compared to the widely used Spider benchmark, we introduce $\sim$120K additional examples and a threefold increase in SQL patterns, such as comparative and boolean questions. We conduct a systematic study of six state-of-the-art (SOTA) text-to-SQL parsers on our new benchmark and show that: 1) Codex performs surprisingly well on out-of-domain datasets; 2) specially designed decoding methods (e.g. constrained beam search) can improve performance for both in-domain and out-of-domain settings; 3) explicitly modeling the relationship between questions and schemas further improves the Seq2Seq models. More importantly, our benchmark presents key challenges towards compositional generalization and robustness issues -- which these SOTA models cannot address well. Our code and data processing script are available at https://github.com/awslabs/unified-text2sql-benchmark

Submitted 14 July, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: 5 pages

arXiv:2305.14490 [pdf, other]

doi 10.1109/THMS.2023.3264247

Wital: A COTS WiFi Devices Based Vital Signs Monitoring System Using NLOS Sensing Model

Authors: Xiang Zhang, Yu Gu, Huan Yan, Yantong Wang, Mianxiong Dong, Kaoru Ota, Fuji Ren, Yusheng Ji

Abstract: Vital sign (breathing and heartbeat) monitoring is essential for patient care and sleep disease prevention. Most current solutions are based on wearable sensors or cameras; however, the former could affect sleep quality, while the latter often present privacy concerns. To address these shortcomings, we propose Wital, a contactless vital sign monitoring system based on low-cost and widespread commercial off-the-shelf (COTS) Wi-Fi devices. There are two challenges that need to be overcome. First, the torso deformations caused by breathing/heartbeats are weak. How can such deformations be effectively captured? Second, movements such as turning over affect the accuracy of vital sign monitoring. How can such detrimental effects be avoided? For the former, we propose a non-line-of-sight (NLOS) sensing model for modeling the relationship between the energy ratio of line-of-sight (LOS) to NLOS signals and the vital sign monitoring capability using Ricean K theory and use this model to guide the system construction to better capture the deformations caused by breathing/heartbeats. For the latter, we propose a motion segmentation method based on motion regularity detection that accurately distinguishes respiration from other motions, and we remove periods that include movements such as turning over to eliminate detrimental effects. We have implemented and validated Wital on low-cost COTS devices. The experimental results demonstrate the effectiveness of Wital in monitoring vital signs.

Submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted by IEEE THMS

Journal ref: IEEE Transactions on Human-Machine Systems, 2023

arXiv:2305.13665 [pdf, other]

Dual Focal Loss for Calibration

Authors: Linwei Tao, Minjing Dong, Chang Xu

Abstract: The use of deep neural networks in real-world applications requires well-calibrated networks with confidence scores that accurately reflect the actual probability. However, it has been found that these networks often provide over-confident predictions, which leads to poor calibration. Recent efforts have sought to address this issue by using focal loss to reduce over-confidence, but this approach can also lead to under-confident predictions. While different variants of focal loss have been explored, it is difficult to find a balance between over-confidence and under-confidence. In our work, we propose a new loss function by focusing on dual logits. Our method not only considers the ground truth logit, but also takes into account the highest logit ranked after the ground truth logit. By maximizing the gap between these two logits, our proposed dual focal loss can achieve a better balance between over-confidence and under-confidence. We provide theoretical evidence to support our approach and demonstrate its effectiveness through evaluations on multiple models and datasets, where it achieves state-of-the-art performance. Code is available at https://github.com/Linwei94/DualFocalLoss
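As a rough illustration of the "dual logits" idea in the abstract above, the sketch below contrasts ordinary focal loss with one plausible dual variant whose modulating factor also includes the largest class probability ranked below the ground-truth class. The exact formula, the handling of the case where no class ranks below the ground truth, and all function names here are assumptions made for this example; the authoritative definition is the one in the paper and its linked repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Standard focal loss for one sample: -(1 - p_gt)^gamma * log(p_gt)."""
    p_gt = softmax(logits)[target]
    return -((1.0 - p_gt) ** gamma) * math.log(p_gt)


def dual_focal_loss(logits, target, gamma=2.0):
    """Assumed dual form: -(1 - p_gt + p_next)^gamma * log(p_gt).

    p_next is the highest probability among classes ranked strictly below
    the ground-truth class; it is taken as 0.0 if no such class exists
    (an assumption made for this sketch).
    """
    probs = softmax(logits)
    p_gt = probs[target]
    lower = [p for k, p in enumerate(probs) if k != target and p < p_gt]
    p_next = max(lower) if lower else 0.0
    return -((1.0 - p_gt + p_next) ** gamma) * math.log(p_gt)


# Toy sample with three classes; class 0 is the ground truth.
sample_logits = [2.0, 1.0, 0.0]
cross_entropy = -math.log(softmax(sample_logits)[0])
fl = focal_loss(sample_logits, target=0, gamma=2.0)
dfl = dual_focal_loss(sample_logits, target=0, gamma=2.0)
```

Under this assumed form the dual loss sits between focal loss and plain cross-entropy for a given sample, which matches the abstract's stated aim of balancing over-confidence against the under-confidence that focal loss can introduce.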

Submitted 23 May, 2023; originally announced May 2023.

Comments: ICML 2023 Accept

arXiv:2305.13591 [pdf, other]

A Single Multi-Task Deep Neural Network with a Multi-Scale Feature Aggregation Mechanism for Manipulation Relationship Reasoning in Robotic Grasping

Authors: Mingshuai Dong, Yuxuan Bai, Shimin Wei, Xiuli Yu

Abstract: Grasping specific objects in complex and irregularly stacked scenes is still challenging for robotics, because the robot is not only required to identify the object's grasping posture but also needs to reason about the manipulation relationship between the objects. In this paper, we propose a manipulation relationship reasoning network with a multi-scale feature aggregation (MSFA) mechanism for robot grasping tasks. MSFA aggregates high-level semantic information and low-level spatial information in a cross-scale connection way to improve the generalization ability of the model. Furthermore, to improve the accuracy, we propose to use intersection features with rich location priors for manipulation relationship reasoning. Experiments are conducted on the VMRD dataset and in real environments, respectively. The experimental results demonstrate that our proposed method can accurately predict the manipulation relationship between objects in the scene of multi-object stacking. Compared with previous methods, it significantly improves reasoning speed and accuracy.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.09381 [pdf, other]

AMD: Autoregressive Motion Diffusion

Authors: Bo Han, Hao Peng, Minjing Dong, Yi Ren, Yixuan Shen, Chang Xu

Abstract: Human motion generation aims to produce plausible human motion sequences according to various conditional inputs, such as text or audio. Despite the feasibility of existing methods in generating motion based on short prompts and simple motion patterns, they encounter difficulties when dealing with long prompts or complex motions. The challenges are two-fold: 1) the scarcity of human motion-captured data for long prompts and complex motions; 2) the high diversity of human motions in the temporal domain and the substantial divergence of distributions from conditional modalities, leading to a many-to-many mapping problem when generating motion with complex and long texts. In this work, we address these gaps by 1) elaborating the first dataset pairing long textual descriptions and 3D complex motions (HumanLong3D), and 2) proposing an autoregressive motion diffusion model (AMD). Specifically, AMD integrates the text prompt at the current timestep with the text prompt and action sequences at the previous timestep as conditional information to predict the current action sequences in an iterative manner. Furthermore, we present its generalization for X-to-Motion with "No Modality Left Behind", enabling the generation of high-definition and high-fidelity human motions based on user-defined modality input.

Submitted 26 December, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: accepted by AAAI2024

arXiv:2305.08348 [pdf, other]

Coreference-aware Double-channel Attention Network for Multi-party Dialogue Reading Comprehension

Authors: Yanling Li, Bowei Zou, Yifan Fan, Mengxing Dong, Yu Hong

Abstract: We tackle Multi-party Dialogue Reading Comprehension (abbr., MDRC). MDRC stands for an extractive reading comprehension task grounded on a batch of dialogues among multiple interlocutors. It is challenging due to the requirement of understanding cross-utterance contexts and relationships in a multi-turn multi-party conversation. Previous studies have made great efforts on the utterance profiling of a single interlocutor and graph-based interaction modeling. The corresponding solutions contribute to the answer-oriented reasoning on a series of well-organized and thread-aware conversational contexts. However, the current MDRC models still suffer from two bottlenecks. On the one hand, a pronoun like "it" most probably produces multi-skip reasoning throughout the utterances of different interlocutors. On the other hand, an MDRC encoder is potentially puzzled by fuzzy features, i.e., the mixture of inner linguistic features in utterances and external interactive features among utterances. To overcome the bottlenecks, we propose a coreference-aware attention modeling method to strengthen the reasoning ability. In addition, we construct a two-channel encoding network. It separately encodes utterance profiles and interactive relationships, so as to relieve the confusion among heterogeneous features. We experiment on the benchmark corpora Molweni and FriendsQA. Experimental results demonstrate that our approach yields substantial improvements on both corpora, compared to the fine-tuned BERT and ELECTRA baselines. The maximum performance gain is about 2.5% F1-score. Besides, our MDRC models outperform the state-of-the-art in most cases.

Submitted 22 May, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: IJCNN2023

arXiv:2305.03711 [pdf, other]

Medical records condensation: a roadmap towards healthcare data democratisation

Authors: Yujiang Wang, Anshul Thakur, Mingzhi Dong, Pingchuan Ma, Stavros Petridis, Li Shang, Tingting Zhu, David A. Clifton

Abstract: The prevalence of artificial intelligence (AI) has envisioned an era of healthcare democratisation that promises every stakeholder a new and better way of life. However, the advancement of clinical AI research is significantly hurdled by the dearth of data democratisation in healthcare. To truly democratise data for AI studies, challenges are two-fold: 1. the sensitive information in clinical data should be anonymised appropriately, and 2. AI-oriented clinical knowledge should flow freely across organisations. This paper considers a recent deep-learning advent, dataset condensation (DC), as a stone that kills two birds in democratising healthcare data. The condensed data after DC, which can be viewed as statistical metadata, abstracts original clinical records and irreversibly conceals sensitive information at individual levels; nevertheless, it still preserves adequate knowledge for learning deep neural networks (DNNs). More favourably, the compressed volumes and the accelerated model learnings of condensed data portray a more efficient clinical knowledge sharing and flowing system, as necessitated by data democratisation. We underline DC's prospects for democratising clinical data, specifically electrical healthcare records (EHRs), for AI research through experimental results and analysis across three healthcare datasets of varying data types.

Submitted 8 January, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

arXiv:2305.01670 [pdf]

Fears about AI-mediated communication are grounded in different expectations for one's own versus others' use

Authors: Zoe A. Purcell, Mengchen Dong, Anne-Marie Nussberger, Nils Köbis, Maurice Jakesch

Abstract: The rapid development of AI-mediated communication technologies (AICTs), which are digital tools that use AI to augment interpersonal messages, has raised concerns about the future of interpersonal trust and prompted discussions about disclosure and uptake. This paper contributes to this discussion by assessing perceptions about the acceptability and use of open and secret AICTs for oneself and others. In two studies with representative samples (UK: N=477, US: N=765), we found that secret AICT use is deemed less acceptable than open AICT use, people tend to overestimate others' AICT use, and people expect others to use AICTs irresponsibly. Thus, we raise concerns about the potential for misperceptions and different expectations for others to drive self-fulfilling pessimistic outlooks about AI-mediated communication.

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2304.04673 [pdf]

Regional Deep Atrophy: a Self-Supervised Learning Method to Automatically Identify Regions Associated With Alzheimer's Disease Progression From Longitudinal MRI

Authors: Mengjin Dong, Long Xie, Sandhitsu R. Das, Jiancong Wang, Laura E. M. Wisse, Robin deFlores, David A. Wolk, Paul A. Yushkevich

Abstract: Longitudinal assessment of brain atrophy, particularly in the hippocampus, is a well-studied biomarker for neurodegenerative diseases, such as Alzheimer's disease (AD). In clinical trials, estimation of brain progressive rates can be applied to track therapeutic efficacy of disease modifying treatments. However, most state-of-the-art measurements calculate changes directly by segmentation and/or deformable registration of MRI images, and may misreport head motion or MRI artifacts as neurodegeneration, impacting their accuracy. In our previous study, we developed a deep learning method DeepAtrophy that uses a convolutional neural network to quantify differences between longitudinal MRI scan pairs that are associated with time. DeepAtrophy has high accuracy in inferring temporal information from longitudinal MRI scans, such as temporal order or relative inter-scan interval. DeepAtrophy also provides an overall atrophy score that was shown to perform well as a potential biomarker of disease progression and treatment efficacy. However, DeepAtrophy is not interpretable, and it is unclear what changes in the MRI contribute to progression measurements. In this paper, we propose Regional Deep Atrophy (RDA), which combines the temporal inference approach from DeepAtrophy with a deformable registration neural network and attention mechanism that highlights regions in the MRI image where longitudinal changes are contributing to temporal inference. RDA has similar prediction accuracy to DeepAtrophy, but its additional interpretability makes it more acceptable for use in clinical settings, and may lead to more sensitive biomarkers for disease monitoring in clinical trials of early AD.

Submitted 10 April, 2023; originally announced April 2023.

Comments: Submitted to NeuroImage for review

arXiv:2304.04264 [pdf]

doi 10.3390/s23146609

RGB-T Tracking Based on Mixed Attention

Authors: Yang Luo, Xiqing Guo, Mingtao Dong, Jin Yu

Abstract: RGB-T tracking involves the use of images from both visible and thermal modalities. The primary objective is to adaptively leverage the relatively dominant modality in varying conditions to achieve more robust tracking compared to single-modality tracking. An RGB-T tracker based on mixed attention mechanism to achieve complementary fusion of modalities (referred to as MACFT) is proposed in this paper. In the feature extraction stage, we utilize different transformer backbone branches to extract specific and shared information from different modalities. By performing mixed attention operations in the backbone to enable information interaction and self-enhancement between the template and search images, it constructs a robust feature representation that better understands the high-level semantic features of the target. Then, in the feature fusion stage, a modality-adaptive fusion is achieved through a mixed attention-based modality fusion network, which suppresses the low-quality modality noise while enhancing the information of the dominant modality. Evaluation on multiple RGB-T public datasets demonstrates that our proposed tracker outperforms other RGB-T trackers on general evaluation metrics while also being able to adapt to long-term tracking scenarios.

Submitted 17 April, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

Comments: 14 pages, 10 figures

Journal ref: Sensors 23, no. 14: 6609 (2023)

arXiv:2303.03182 [pdf, ps, other]

doi 10.1109/TNET.2023.3284347

Decentralized Caching under Nonuniform File Popularity and Size: Memory-Rate Tradeoff Characterization

Authors: Yong Deng, Min Dong

Abstract: This paper aims to characterize the memory-rate tradeoff for decentralized caching under nonuniform file popularity and size. We consider a recently proposed decentralized modified coded caching scheme (D-MCCS) and formulate the cache placement optimization problem to minimize the average rate for the D-MCCS. To solve this challenging non-convex optimization problem, we first propose a successive Geometric Programming (GP) approximation algorithm, which guarantees convergence to a stationary point but has high computational complexity. Next, we develop a low-complexity file-group-based approach, where we propose a popularity-first and size-aware (PF-SA) cache placement strategy to partition files into two groups, taking into account the nonuniformity in file popularity and size. Neither algorithm requires knowledge of the active users beforehand for cache placement. Numerical results show that the two algorithms perform very closely to each other. We further develop a lower bound for decentralized caching under nonuniform file popularity and size as a non-convex optimization problem and solve it using a similar successive GP approximation algorithm. We show that the D-MCCS with the optimized cache placement attains this lower bound when no more than two active users request files at a time. The same is true for files with uniform size but nonuniform popularity and the optimal cache placement being symmetric among files. In these cases, the optimized D-MCCS characterizes the exact memory-rate tradeoff for decentralized caching. For general cases, our numerical results show that the average rate achieved by the optimized D-MCCS is very close to the lower bound.

Submitted 26 June, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: 16 pages, 7 figures, 7 tables. Accepted to IEEE/ACM Transactions on Networking

arXiv:2302.14336 [pdf, other]

doi 10.1109/OJCOMS.2024.3372893

Beamforming and Device Selection Design in Federated Learning with Over-the-air Aggregation

Authors: Faeze Moradi Kalarde, Min Dong, Ben Liang, Yahia A. Eldemerdash Ahmed, Ho Ting Cheng

Abstract: Federated learning (FL) with over-the-air computation can efficiently utilize the communication bandwidth but is susceptible to analog aggregation error. Excluding those devices with weak channel conditions can reduce the aggregation error, but it also limits the amount of local training data for FL, which can reduce the training convergence rate. In this work, we jointly design uplink receiver beamforming and device selection for over-the-air FL over time-varying wireless channels to maximize the training convergence rate. We reformulate this stochastic optimization problem into a mixed-integer program using an upper bound on the global training loss over communication rounds. We then propose a Greedy Spatial Device Selection (GSDS) approach, which uses a sequential procedure to select devices based on a measure capturing both the channel strength and the channel correlation to the selected devices. We show that given the selected devices, the receiver beamforming optimization problem is equivalent to downlink single-group multicast beamforming. To reduce the computational complexity, we also propose an Alternating-optimization-based Device Selection and Beamforming (ADSBF) approach, which solves the receiver beamforming and device selection subproblems alternatingly. In particular, despite the device selection being an integer problem, we are able to develop an efficient algorithm to find its optimal solution. Simulation results with real-world image classification demonstrate that our proposed methods achieve faster convergence with significantly lower computational complexity than existing alternatives. Furthermore, although ADSBF shows marginally inferior performance to GSDS, it offers the advantage of lower computational complexity when the number of devices is large.

Submitted 6 March, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: 12 pages, 8 figures

arXiv:2302.08934 [pdf, other]

Active RIS Aided ISAC Systems: Beamforming Design and Performance Analysis

Authors: Zhiyuan Yu, Hong Ren, Cunhua Pan, Gui Zhou, Boshi Wang, Mianxiong Dong, Jiangzhou Wang

Abstract: This paper considers an active reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) system. We aim to maximize radar signal-to-interference-plus-noise-ratio (SINR) by jointly optimizing the beamforming matrix at the dual-function radar-communication (DFRC) base station (BS) and the reflecting coefficients at the active RIS subject to the quality of service (QoS) constraints of communication users (UE) and the transmit power constraints of active RIS and DFRC BS. To tackle the optimization problem, the majorization-minimization (MM) algorithm is applied to address the nonconvex radar SINR objective function, and the resulting quartic problem is solved by developing a semidefinite relaxation (SDR)-based approach. Moreover, we derive the scaling order of the radar SINR with a large number of reflecting elements. Next, the transmit power allocation problem and the deployment strategy of the active RIS are studied with a moderate number of reflecting elements. Finally, we validate the potential of the active RIS in ISAC systems compared to passive RIS. Additionally, we deliberate on several open problems that remain for future research.

Submitted 3 February, 2024; v1 submitted 17 February, 2023; originally announced February 2023.

Comments: 17 pages, 11 figures, accepted by IEEE TCOM. The manuscript has been revised to correct several typographical errors

arXiv:2302.06245 [pdf, other]

Calibrating a Deep Neural Network with Its Predecessors

Authors: Linwei Tao, Minjing Dong, Daochang Liu, Changming Sun, Chang Xu

Abstract: Confidence calibration - the process to calibrate the output probability distribution of neural networks - is essential for safety-critical applications of such networks. Recent works verify the link between mis-calibration and overfitting. However, early stopping, as a well-known technique to mitigate overfitting, fails to calibrate networks. In this work, we study the limitations of early stopping and comprehensively analyze the overfitting problem of a network considering each individual block. We then propose a novel regularization method, predecessor combination search (PCS), to improve calibration by searching a combination of best-fitting block predecessors, where block predecessors are the corresponding network blocks with weight parameters from earlier training stages. PCS achieves the state-of-the-art calibration performance on multiple datasets and architectures. In addition, PCS improves model robustness under dataset distribution shift.

Submitted 23 May, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: Accepted at IJCAI 2023

Showing 1–50 of 173 results for author: Dong, M