
DiffCom: Channel Received Signal is a Natural Condition to Guide Diffusion Posterior Sampling

Authors: Sixian Wang, Jincheng Dai, Kailin Tan, Xiaoqi Qin, Kai Niu, Ping Zhang

Abstract: End-to-end visual communication systems typically optimize a trade-off between channel bandwidth costs and signal-level distortion metrics. However, under challenging physical conditions, this traditional discriminative communication paradigm often results in unrealistic reconstructions with perceptible blurring and aliasing artifacts, despite the inclusion of perceptual or adversarial losses during optimization. This issue primarily stems from the receiver's limited knowledge about the underlying data manifold and the use of deterministic decoding mechanisms. To address these limitations, this paper introduces DiffCom, a novel end-to-end generative communication paradigm that utilizes off-the-shelf generative priors and probabilistic diffusion models for decoding, thereby improving perceptual quality without heavily relying on bandwidth costs and received signal quality. Unlike traditional systems that rely on deterministic decoders optimized solely for distortion metrics, our DiffCom leverages the raw channel-received signal as a fine-grained condition to guide stochastic posterior sampling. Our approach ensures that reconstructions remain on the manifold of real data with a novel confirming constraint, enhancing the robustness and reliability of the generated outcomes. Furthermore, DiffCom incorporates a blind posterior sampling technique to address scenarios with unknown forward transmission characteristics. Extensive experimental validations demonstrate that DiffCom not only produces realistic reconstructions with details faithful to the original data but also achieves superior robustness against diverse wireless transmission degradations. Collectively, these advancements establish DiffCom as a new benchmark in designing generative communication systems that offer enhanced robustness and superior generalization.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06446

Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency

Authors: Jincheng Dai, Xiaoqi Qin, Sixian Wang, Lexi Xu, Kai Niu, Ping Zhang

Abstract: Information theory and machine learning are inextricably linked and have even been referred to as "two sides of the same coin". One particularly elegant connection is the essential equivalence between probabilistic generative modeling and data compression or transmission. In this article, we reveal the dual functionality of deep generative models, which reshape both data compression for efficiency and transmission error concealment for resiliency. We present how the contextual predictive capabilities of powerful generative models position them well as strong compressors and estimators. In this sense, we advocate viewing the deep generative modeling problem through the lens of end-to-end communications, and we evaluate the compression and error restoration capabilities of foundation generative models. We show that the kernel of many large generative models is a powerful predictor that can capture complex relationships among semantic latent variables, and that the communication viewpoint provides novel insights into semantic feature tokenization, contextual learning, and the usage of deep generative models. In summary, our article highlights the essential connections of generative AI to source and channel coding techniques, and motivates researchers to further explore this emerging topic.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Publication in IEEE Wireless Communications
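The equivalence between generative modeling and compression described in this abstract can be made concrete: an arithmetic coder driven by a predictive model spends about $\sum_t -\log_2 p(x_t \mid x_{<t})$ bits, so a better predictor gives a shorter code. The sketch below is a generic illustration, not the article's method; a toy adaptive add-one (Laplace) count model stands in for a deep generative model, and the function name `adaptive_code_length` is hypothetical.

```python
from collections import Counter
from math import log2


def adaptive_code_length(sequence, alphabet):
    """Ideal code length in bits when an arithmetic coder is driven by an
    adaptive add-one (Laplace) predictive model: sum of -log2 p(x_t | x_<t)."""
    counts = Counter()
    total_bits = 0.0
    for t, sym in enumerate(sequence):
        # Predictive probability from the symbols seen so far (add-one smoothing)
        p = (counts[sym] + 1) / (t + len(alphabet))
        total_bits += -log2(p)
        counts[sym] += 1  # update the model only after "coding" the symbol
    return total_bits


bits = adaptive_code_length("aaaaaaab", "ab")
print(f"{bits:.2f} bits with the adaptive model vs 8 bits at 1 bit/symbol")
```

For "aaaaaaab" the predictive probabilities multiply to $1/72$, so the ideal length is $\log_2 72 \approx 6.17$ bits; any model that predicts this source better would compress it further.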

arXiv:2406.06045

Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training

Authors: Ke Niu, Haiyang Yu, Xuelin Qian, Teng Fu, Bin Li, Xiangyang Xue

Abstract: Existing person re-identification (Re-ID) methods principally deploy the ImageNet-1K dataset for model initialization, which inevitably leads to sub-optimal results due to the large domain gap. One of the key challenges is that building large-scale person Re-ID datasets is time-consuming. Some previous efforts address this problem by collecting person images from the internet, e.g., LUPerson, but they struggle to learn from unlabeled, uncontrollable, and noisy data. In this paper, we present a novel paradigm, Diffusion-ReID, to efficiently augment and generate diverse images based on known identities without any data collection or annotation cost. Technically, this paradigm unfolds in two stages: generation and filtering. During the generation stage, we propose Language Prompts Enhancement (LPE) to ensure ID consistency between the input image sequence and the generated images. In the diffusion process, we propose a Diversity Injection (DI) module to increase attribute diversity. To give the generated data higher quality, we apply a Re-ID confidence threshold filter to further remove low-quality images. Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset, Diff-Person, which consists of over 777K images from 5,183 identities. Next, we build a stronger person Re-ID backbone pre-trained on our Diff-Person. Extensive experiments are conducted on four person Re-ID benchmarks in six widely used settings. Compared with other pre-training and self-supervised competitors, our approach shows significant superiority.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.02962

Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models

Authors: Qiang Sun, Yuanyi Luo, Wenxiao Zhang, Sirui Li, Jichunyang Li, Kai Niu, Xiangrui Kong, Wei Liu

Abstract: Even by a conservative estimate, 80% of enterprise data reside in unstructured files, stored in data lakes that accommodate heterogeneous formats. Classical search engines can no longer meet information-seeking needs, especially when the task is to browse and explore for insight formulation. In other words, there are no obvious search keywords to use. Knowledge graphs, due to their natural visual appeal that reduces the human cognitive load, become the winning candidate for heterogeneous data integration and knowledge representation. In this paper, we introduce Docs2KG, a novel framework designed to extract multimodal information from diverse and heterogeneous unstructured documents, including emails, web pages, PDF files, and Excel files. By dynamically generating a unified knowledge graph that represents the extracted key information, Docs2KG enables efficient querying and exploration of document data lakes. Unlike existing approaches that focus on domain-specific data sources or pre-designed schemas, Docs2KG offers a flexible and extensible solution that can adapt to various document structures and content types. The proposed framework unifies data processing, supporting a multitude of downstream tasks with improved domain interpretability. Docs2KG is publicly accessible at https://docs2kg.ai4wa.com, and a demonstration video is available at https://docs2kg.ai4wa.com/Video.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.19542

Anatomical Region Recognition and Real-time Bone Tracking Methods by Dynamically Decoding A-Mode Ultrasound Signals

Authors: Bangyu Lan, Stefano Stramigioli, Kenan Niu

Abstract: Accurate bone tracking is crucial for kinematic analysis in orthopedic surgery and prosthetic robotics. Traditional methods (e.g., skin markers) are subject to soft tissue artifacts, and the bone pins used in surgery introduce the risk of additional trauma and infection. Electromyography (EMG) cannot directly measure joint angles, so it requires complex algorithms for kinematic estimation. To address these issues, A-mode ultrasound-based tracking has been proposed as a non-invasive and safe alternative. However, this approach suffers from limited accuracy in peak detection when processing received ultrasound signals. To build a precise and real-time bone tracking approach, this paper introduces a deep learning-based method for anatomical region recognition and bone tracking using A-mode ultrasound signals, specifically focused on the knee joint. The algorithm is capable of simultaneously performing bone tracking and identifying the anatomical region where the A-mode ultrasound transducer is placed. It uses full connections between all encoding and decoding layers of the cascaded U-Nets to focus only on the signal region that is most likely to contain the bone peak, thus pinpointing the exact location of the peak and classifying the anatomical region of the signal. The experiment showed a 97% accuracy in the classification of the anatomical regions and a precision of around 0.5$\pm$1 mm under dynamic tracking conditions for various anatomical areas surrounding the knee joint. In general, this approach shows great potential beyond the traditional method, both in the accuracy achieved and in the additional ability to recognize the anatomical region where the ultrasound transducer is attached.

Submitted 31 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

arXiv:2403.05879

Deep Learning based acoustic measurement approach for robotic applications on orthopedics

Authors: Bangyu Lan, Momen Abayazid, Nico Verdonschot, Stefano Stramigioli, Kenan Niu

Abstract: In Total Knee Replacement Arthroplasty (TKA), surgical robotics can provide image-guided navigation to fit implants with high precision. Its tracking approach relies heavily on inserting bone pins into the bones, which are then tracked by the optical tracking system. This is normally done in invasive, radiative ways (implantable markers and CT scans), which introduce unnecessary trauma and prolong the preparation time for patients. To tackle this issue, ultrasound-based bone tracking could offer an alternative. In this study, we proposed a novel deep learning structure to improve the accuracy of bone tracking by A-mode ultrasound (US). We first obtained an ultrasound dataset from a cadaver experiment, where the ground truth locations of bones were calculated using bone pins. These data were used to train the proposed CasAtt-UNet to predict bone location automatically and robustly. The ground truth bone locations and the US locations were recorded simultaneously, so we could label bone peaks in the raw US signals. As a result, our method achieved sub-millimeter precision across all eight bone areas, with the only exception of one channel in the ankle. This method enables the robust measurement of lower extremity bone positions from 1D raw ultrasound signals. It shows great potential for applying A-mode ultrasound in orthopedic surgery from safe, convenient, and efficient perspectives.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2401.14634

Semantic Huffman Coding using Synonymous Mapping

Authors: Jin Xu, Kai Niu, Zijian Liang, Ping Zhang

Abstract: Semantic communication stands out as a highly promising avenue for future developments in communications. Theoretically, source compression coding based on semantics can achieve lower rates than the Shannon entropy. This paper introduces semantic Huffman coding built upon semantic information theory. By incorporating synonymous mapping and synonymous sets, semantic Huffman coding can achieve shorter average code lengths. Furthermore, we demonstrate that semantic Huffman coding theoretically has the capability to approximate semantic entropy. Experimental results indicate that, under the condition of semantically lossless compression, semantic Huffman coding exhibits clear advantages in compression efficiency over classical Huffman coding.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 6 pages, 3 figures. This paper is submitted to the 2024 IEEE International Symposium on Information Theory (ISIT 2024)
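To illustrate why coding over synonymous sets shortens the average code length, the sketch below builds an ordinary binary Huffman code twice: once over individual symbols and once over the sets a synonymous mapping groups them into, where each set's probability is the sum of its members'. It assumes, as a simplification, that the set-level code transmits only which synonymous set occurred and the receiver may output any member; the source, the mapping, and the helper names are hypothetical, and the paper's actual construction may differ.

```python
import heapq
from itertools import count


def huffman_code_lengths(probs):
    """Return {symbol: codeword length} for a binary Huffman code over probs."""
    if len(probs) == 1:
        return {s: 1 for s in probs}  # a lone symbol still needs one bit
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1  # each merge adds one bit to every member's codeword
        heapq.heappush(heap, (p1 + p2, next(tiebreak), group1 + group2))
    return lengths


def average_length(probs, lengths):
    return sum(probs[s] * lengths[s] for s in probs)


# Hypothetical syntactic source and hypothetical synonymous mapping
symbol_probs = {"big": 0.2, "large": 0.2, "huge": 0.1, "small": 0.3, "tiny": 0.2}
synonym_map = {"big": "BIG", "large": "BIG", "huge": "BIG",
               "small": "SMALL", "tiny": "SMALL"}

# Probability of each synonymous set = sum over its member symbols
set_probs = {}
for sym, p in symbol_probs.items():
    key = synonym_map[sym]
    set_probs[key] = set_probs.get(key, 0.0) + p

classic = average_length(symbol_probs, huffman_code_lengths(symbol_probs))
semantic = average_length(set_probs, huffman_code_lengths(set_probs))
print(f"symbol-level Huffman: {classic:.2f} bits, set-level Huffman: {semantic:.2f} bits")
```

On this toy source the symbol-level code averages 2.30 bits per symbol, while the set-level code needs 1.00 bit, because two equiprobable synonymous sets remain.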

arXiv:2401.14633

Semantic Arithmetic Coding using Synonymous Mappings

Authors: Zijian Liang, Kai Niu, Jin Xu, Ping Zhang

Abstract: Recent semantic communication methods explore effective ways to expand the communication paradigm and improve the performance of communication systems. Nonetheless, a common problem of these methods is that the essence of semantics is not explicitly pointed out and directly utilized. A new epistemology suggests that synonymy, which is revealed as the fundamental feature of semantics, guides the establishment of semantic information theory from a novel viewpoint. Building on this theoretical basis, this paper proposes a semantic arithmetic coding (SAC) method for semantically lossless compression using intuitive semantic synonymy. By constructing reasonable synonymous mappings and performing arithmetic coding procedures over synonymous sets, SAC can achieve higher compression efficiency for meaning-contained source sequences at the semantic level and thereby approximate the semantic entropy limits. Experimental results on edge texture map compression show an evident improvement in coding efficiency using SAC without semantic losses, compared to traditional arithmetic coding, which demonstrates its effectiveness.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 6 pages, 4 figures. This paper is submitted to the 2024 IEEE International Symposium on Information Theory (ISIT 2024)
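A minimal way to see what "arithmetic coding over synonymous sets" buys is to track the coding interval exactly: each coded item shrinks the interval by its probability, and roughly $\lceil -\log_2 w \rceil + 1$ bits identify an interval of final width $w$. The sketch below compares symbol-level and set-level interval widths for a toy message. It is an assumption-laden illustration, not SAC itself: the model, the mapping, and all function names are hypothetical, and the decoder would recover only the sequence of synonymous sets.

```python
from fractions import Fraction
from math import ceil, log2


def cumulative(model):
    """Map each item to its [low, high) slice of [0, 1), in insertion order."""
    ranges, low = {}, Fraction(0)
    for item, p in model.items():
        ranges[item] = (low, low + p)
        low += p
    return ranges


def arithmetic_interval(sequence, model):
    """Narrow [0, 1) once per coded item; return the final (low, width)."""
    ranges = cumulative(model)
    low, width = Fraction(0), Fraction(1)
    for item in sequence:
        s_low, s_high = ranges[item]
        low += width * s_low
        width *= s_high - s_low  # width ends up as the product of probabilities
    return low, width


def code_length_bits(width):
    # ceil(-log2 w) + 1 bits always suffice to name a point inside the interval
    return ceil(-log2(width)) + 1


# Hypothetical symbol model and hypothetical synonymous mapping
symbol_model = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
synonyms = {"a": "A", "b": "A", "c": "C"}

# Set-level model: each synonymous set carries the total mass of its members
set_model = {}
for sym, p in symbol_model.items():
    set_model[synonyms[sym]] = set_model.get(synonyms[sym], Fraction(0)) + p

message = "abca"
set_sequence = [synonyms[s] for s in message]
_, w_symbols = arithmetic_interval(message, symbol_model)
_, w_sets = arithmetic_interval(set_sequence, set_model)
print(code_length_bits(w_symbols), code_length_bits(w_sets))
```

Exact `Fraction` arithmetic is used so the interval arithmetic is visible without rounding; a practical coder would use finite-precision integer ranges with renormalization instead.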

arXiv:2401.14160

A Mathematical Theory of Semantic Communication: Overview

Authors: Kai Niu, Ping Zhang

Abstract: Semantic communication initiates a new direction for future communication. In this paper, we aim to establish a systematic framework of semantic information theory (SIT). First, we propose a semantic communication model and define the synonymous mapping to indicate the critical relationship between semantic information and syntactic information. Based on this core concept, we introduce the measures of semantic information, such as semantic entropy $H_s(\tilde{U})$, up/down semantic mutual information $I^s(\tilde{X};\tilde{Y})$ $(I_s(\tilde{X};\tilde{Y}))$, semantic capacity $C_s=\max_{p(x)}I^s(\tilde{X};\tilde{Y})$, and semantic rate-distortion function $R_s(D)=\min_{p(\hat{x}|x):\mathbb{E}d_s(\tilde{x},\hat{\tilde{x}})\leq D}I_s(\tilde{X};\hat{\tilde{X}})$. Furthermore, we prove three coding theorems of SIT, namely the semantic source coding theorem, the semantic channel coding theorem, and the semantic rate-distortion coding theorem. We find that the limits of information theory are extended by using synonymous mapping, that is, $H_s(\tilde{U})\leq H(U)$, $C_s\geq C$ and $R_s(D)\leq R(D)$. All these results constitute the basis of semantic information theory. In summary, the theoretical framework proposed in this paper is a natural extension of classic information theory and may reveal great performance potential for future communication.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 6 pages, 2 figures. This paper is submitted to the 2024 IEEE International Symposium on Information Theory (ISIT 2024). arXiv admin note: substantial text overlap with arXiv:2401.13387
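The first inequality, $H_s(\tilde{U})\leq H(U)$, follows in one line from the chain rule, assuming (our reading, not stated in the abstract) that semantic entropy is the ordinary entropy of the synonymous-set index $f(U)$ produced by a deterministic synonymous mapping $f$:

```latex
H(U) = H\bigl(U, f(U)\bigr)
     = H\bigl(f(U)\bigr) + H\bigl(U \mid f(U)\bigr)
     \geq H\bigl(f(U)\bigr) = H_s(\tilde{U})
```

The first equality holds because $f(U)$ is a function of $U$, and the inequality uses $H(U \mid f(U)) \geq 0$. Equality holds exactly when each synonymous set contains at most one symbol of nonzero probability, i.e. when the mapping merges nothing.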

arXiv:2401.13387

A Mathematical Theory of Semantic Communication

Authors: Kai Niu, Ping Zhang

Abstract: The year 1948 witnessed the birth of classic information theory (CIT). Guided by CIT, modern communication techniques have approached the theoretical limits, such as the entropy function $H(U)$, channel capacity $C=\max_{p(x)}I(X;Y)$ and rate-distortion function $R(D)=\min_{p(\hat{x}|x):\mathbb{E}d(x,\hat{x})\leq D} I(X;\hat{X})$. Semantic communication paves a new direction for future communication techniques, but a guiding theory is missing. In this paper, we try to establish a systematic framework of semantic information theory (SIT). We investigate the behavior of semantic communication and find that synonymy is its basic feature, so we define the synonymous mapping between semantic information and syntactic information. Stemming from this core concept, the synonymous mapping $f$, we introduce the measures of semantic information, such as semantic entropy $H_s(\tilde{U})$, up/down semantic mutual information $I^s(\tilde{X};\tilde{Y})$ $(I_s(\tilde{X};\tilde{Y}))$, semantic capacity $C_s=\max_{f_{xy}}\max_{p(x)}I^s(\tilde{X};\tilde{Y})$, and semantic rate-distortion function $R_s(D)=\min_{\{f_x,f_{\hat{x}}\}}\min_{p(\hat{x}|x):\mathbb{E}d_s(\tilde{x},\hat{\tilde{x}})\leq D}I_s(\tilde{X};\hat{\tilde{X}})$. Furthermore, we prove three coding theorems of SIT by using random coding and (jointly) typical decoding/encoding, namely the semantic source coding theorem, the semantic channel coding theorem, and the semantic rate-distortion coding theorem. We find that the limits of SIT are extended by using synonymous mapping, that is, $H_s(\tilde{U})\leq H(U)$, $C_s\geq C$ and $R_s(D)\leq R(D)$. All these results constitute the basis of semantic information theory. In addition, we discuss the semantic information measures in the continuous case. For the band-limited Gaussian channel, we obtain a new channel capacity formula, $C_s=B\log\left[S^4\left(1+\frac{P}{N_0B}\right)\right]$.

Submitted 26 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: (version 2.0 updated) 96 pages, 18 figures. This paper is submitted to IEEE Transactions on Information Theory (TIT)
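Expanding the logarithm separates the new band-limited Gaussian capacity formula into the classical Shannon capacity plus an extra term. The abstract does not define $S$, so the sketch below only manipulates the expression as given:

```latex
C_s = B\log\left[S^4\left(1+\frac{P}{N_0B}\right)\right]
    = B\log\left(1+\frac{P}{N_0B}\right) + 4B\log S
    = C + 4B\log S
```

Here $C = B\log\left(1+\frac{P}{N_0B}\right)$ is the Shannon-Hartley capacity of the same channel, so for $B>0$ the stated inequality $C_s\geq C$ corresponds to $S\geq 1$ in this formula.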

arXiv:2312.08862

Semantics-Division Duplexing: A Novel Full-Duplex Paradigm

Authors: Kai Niu, Zijian Liang, Chao Dong, Jincheng Dai, Zhongwei Si, Ping Zhang

Abstract: In-band full-duplex (IBFD) is a theoretically effective solution to increase the overall throughput of future wireless communication systems by enabling transmission and reception over the same time-frequency resources. However, reliable source reconstruction remains a great challenge in practical IBFD systems due to the non-ideal elimination of self-interference and the inherent limitations of separate source and channel coding methods. On the other hand, artificial intelligence-enabled semantic communication can provide a viable direction for optimizing IBFD systems. This article introduces a novel IBFD paradigm guided by semantic communication, called semantics-division duplexing (SDD). It utilizes semantic domain processing to further suppress self-interference, distinguish the expected semantic information, and recover the desired sources. Digital and semantic domain processing can be further integrated to achieve intelligent and concise communications. We present the advantages of the SDD paradigm with theoretical explanations and provide some visualized results to verify its effectiveness.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 9 pages, 5 figures, submitted to IEEE Wireless Communications Magazine

arXiv:2312.02456

Watermarking for Neural Radiation Fields by Invertible Neural Network

Authors: Wenquan Sun, Jia Liu, Weina Dong, Lifeng Chen, Ke Niu

Abstract: To protect the copyright of the 3D scene represented by the neural radiation field, the embedding and extraction of the neural radiation field watermark are treated as a pair of inverse problems of image transformations. A scheme for protecting the copyright of the neural radiation field is proposed using invertible neural network watermarking, which utilizes watermarking techniques for 2D images to protect the 3D scene. The scheme embeds the watermark in the training images of the neural radiation field through the forward process of the invertible network and extracts the watermark from the images rendered by the neural radiation field using the inverse process, realizing copyright protection of both the neural radiation field and the 3D scene. Since the rendering process of the neural radiation field can cause loss of watermark information, the scheme incorporates an image quality enhancement module, which utilizes a neural network to recover the rendered image before the watermark is extracted. The scheme embeds a watermark in each training image used to train the neural radiation field and enables the extraction of watermark information from multiple viewpoints. Simulation results demonstrate the effectiveness of the method.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2310.07121

Motion Vector-Domain Video Steganalysis Exploiting Skipped Macroblocks

Authors: Jun Li, Minqing Zhang, Ke Niu, Yingnan Zhang, Xiaoyuan Yang

Abstract: Video steganography has the potential to be used to convey illegal information, and video steganalysis is a vital tool to detect the presence of this illicit act. Currently, all motion vector (MV)-based video steganalysis algorithms extract feature sets directly from the MVs, ignoring that the steganographic operation may perturb the statistical distribution of other video encoding elements, such as the skipped macroblocks (which have no direct MVs). Based on this observation, this paper proposes a novel 11-dimensional feature set to detect MV-based video steganography. The proposed feature is extracted from the skipped macroblocks by recompression calibration. Specifically, the feature consists of two components. The first is the probability distribution of the motion vector prediction (MVP) difference, and the second is the probability distribution of partition state transfer. Extensive experiments under different conditions demonstrate that the proposed feature set achieves good detection accuracy, especially at lower embedding capacities. In addition, the loss of detection performance caused by recompression calibration using mismatched quantization parameters (QP) is within an acceptable range, so the proposed method can be used in practical scenarios.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2309.11836

Pre-configured Error Pattern Ordered Statistics Decoding for CRC-Polar Codes

Authors: Xuanyu Li, Kai Niu, Yuxin Han, Jincheng Dai, Zhiyuan Tan, Zhiheng Guo

Abstract: In this paper, we propose a pre-configured error pattern ordered statistics decoding (PEPOSD) algorithm and discuss its application to short cyclic redundancy check (CRC)-polar codes. Unlike traditional OSD, which changes the most reliable independent symbols, we regard the decoding process as testing error patterns (EPs), as in guessing random additive noise decoding (GRAND). Also, the pre-configurator borrowed from ordered reliability bits (ORB) GRAND can better control the range and testing order of EPs. An offline-online structure accelerates the decoding process. Additionally, we introduce two orders to optimize the search order for testing EPs. Compared with CRC-aided OSD and list decoding, PEPOSD achieves a better trade-off between accuracy and complexity.

Submitted 23 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.
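The idea of decoding by testing error patterns in a pre-configured order can be sketched with a generic ORBGRAND-style loop: offline, patterns over reliability ranks are sorted once by logistic weight (the sum of the ranks they flip); online, ranks are mapped to bit positions by sorting $|{\rm LLR}|$, and patterns are applied to the hard decision until the syndrome vanishes. This is not PEPOSD: there is no OSD reprocessing or CRC-polar code here, a (7,4) Hamming code stands in as the code, and all names and the weight budget are hypothetical.

```python
from itertools import combinations

# Parity-check matrix of the (7,4) Hamming code, used only as a stand-in code
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome_is_zero(word):
    return all(sum(h * w for h, w in zip(row, word)) % 2 == 0 for row in H)


def preconfigure_patterns(n, max_weight):
    """Offline stage: patterns over 1-based reliability ranks, sorted by
    logistic weight (sum of ranks); independent of any received signal."""
    patterns = [
        combo
        for k in range(n + 1)
        for combo in combinations(range(1, n + 1), k)
        if sum(combo) <= max_weight
    ]
    patterns.sort(key=sum)  # stable sort keeps generation order on ties
    return patterns


def grand_decode(llrs, patterns):
    """Online stage: test patterns in order until a codeword is found.
    LLR convention: log P(bit=0)/P(bit=1), so LLR >= 0 means hard bit 0."""
    hard = [0 if llr >= 0 else 1 for llr in llrs]
    # order[r - 1] is the position with reliability rank r (least reliable first)
    order = sorted(range(len(llrs)), key=lambda i: abs(llrs[i]))
    for queries, rank_pattern in enumerate(patterns, start=1):
        candidate = hard[:]
        for r in rank_pattern:
            candidate[order[r - 1]] ^= 1
        if syndrome_is_zero(candidate):
            return candidate, queries
    return None, None  # abandon: no codeword within the pattern budget


PATTERNS = preconfigure_patterns(7, max_weight=12)
# All-zero codeword sent; bit 2 received with a weak wrong-sign LLR
decoded, queries = grand_decode([2.1, 1.7, -0.3, 2.5, 1.9, 0.8, 2.2], PATTERNS)
print(decoded, queries)
```

Here the empty pattern fails the syndrome check and the first non-empty pattern (flip the least reliable bit) succeeds on the second query. Precomputing the rank-ordered list mirrors, loosely, the offline-online split the abstract describes; enumerating all subsets is only practical for tiny block lengths like this one.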

arXiv:2309.04682

DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions

Authors: Teng Fu, Xiaocong Wang, Haiyang Yu, Ke Niu, Bin Li, Xiangyang Xue

Abstract: Multiple object tracking (MOT) becomes more challenging when severe occlusions occur. In this paper, we analyze the limitations of traditional Convolutional Neural Network-based methods and Transformer-based methods in handling occlusions and propose DNMOT, an end-to-end trainable DeNoising Transformer for MOT. To address the challenge of occlusions, we explicitly simulate the scenarios in which occlusions occur. Specifically, we augment the trajectory with noise during training and make our model learn the denoising process in an encoder-decoder architecture, so that our model exhibits strong robustness and performs well in crowded scenes. Additionally, we propose a Cascaded Mask strategy to better coordinate the interaction between different types of queries in the decoder and prevent mutual suppression between neighboring trajectories in crowded scenes. Notably, the proposed method requires no additional modules, such as a matching strategy or motion state estimation, during inference. We conduct extensive experiments on the MOT17, MOT20, and DanceTrack datasets, and the experimental results show that our method outperforms previous state-of-the-art methods by a clear margin.

Submitted 9 September, 2023; originally announced September 2023.

Comments: ACM Multimedia 2023

arXiv:2309.00885

doi 10.1016/j.media.2023.102945

A Generic Fundus Image Enhancement Network Boosted by Frequency Self-supervised Representation Learning

Authors: Heng Li, Haofeng Liu, Huazhu Fu, Yanwu Xu, Hui Shu, Ke Niu, Yan Hu, Jiang Liu

Abstract: Fundus photography is prone to image quality degradation that impacts clinical examination performed by ophthalmologists or intelligent systems. Though enhancement algorithms have been developed to promote fundus observation on degraded images, high data demands and limited applicability hinder their clinical deployment. To circumvent this bottleneck, a generic fundus image enhancement network (GFE-Net) is developed in this study to robustly correct unknown fundus images without supervised or extra data. Leveraging image frequency information, self-supervised representation learning is conducted to learn robust structure-aware representations from degraded images. Then, with a seamless architecture that couples representation learning and image enhancement, GFE-Net can accurately correct fundus images while preserving retinal structures. Comprehensive experiments are conducted to demonstrate the effectiveness and advantages of GFE-Net. Compared with state-of-the-art algorithms, GFE-Net achieves superior performance in data dependency, enhancement performance, deployment efficiency, and scale generalizability. Follow-up fundus image analysis is also facilitated by GFE-Net, whose modules are each verified to be effective for image enhancement.

Submitted 2 September, 2023; originally announced September 2023.

Comments: Accepted by Medical Image Analysis in August 2023

Journal ref: Medical Image Analysis, 2023, 90:102945

arXiv:2308.06464

A One-dimensional HEVC video steganalysis method using the Optimality of Predicted Motion Vectors

Authors: Jun Li, Minqing Zhang, Ke Niu, Yingnan Zhang, Xiaoyuan Yang

Abstract: Among steganalysis techniques, detecting motion vector (MV) domain-based video steganography in the High Efficiency Video Coding (HEVC) standard remains a hot and challenging issue. To improve detection performance, this paper proposes a one-dimensional steganalysis feature based on the optimality of predicted MVs. First, we point out that the motion vector prediction (MVP) of a prediction unit (PU) encoded using the Advanced Motion Vector Prediction (AMVP) technique satisfies local optimality in the cover video. Second, we show that in HEVC video, message embedding using either the MVP index or motion vector differences (MVD) may destroy this optimality of the MVP. Third, we define the optimal rate of MVP in HEVC video as a steganalysis feature. Finally, we conduct steganalysis detection experiments on two general datasets for three popular steganography methods and compare the performance with four state-of-the-art steganalysis methods. The experimental results show that the proposed optimal rate of MVP is 100% for all cover videos, while it is less than 100% for all stego videos. Therefore, the proposed steganalysis scheme can accurately distinguish between cover videos and stego videos, and it can be efficiently applied to practical scenarios with no model training and low computational complexity.
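To make the one-dimensional feature concrete, the following is a minimal Python sketch of an "optimal rate of MVP" computation, not the authors' implementation. It assumes, hypothetically, that each AMVP-coded PU is available as its final motion vector, its candidate list, and the signalled MVP index, and it uses the L1 norm of the MVD as a stand-in for the encoder's real MVD bit cost. The names `PredictionUnit`, `mvd_cost`, `is_mvp_optimal`, and `mvp_optimal_rate` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple

MV = Tuple[int, int]


@dataclass
class PredictionUnit:
    mv: MV                # final motion vector of the PU
    candidates: List[MV]  # AMVP candidate list (two entries in HEVC)
    mvp_idx: int          # index of the MVP actually signalled


def mvd_cost(mv: MV, mvp: MV) -> int:
    # Hypothetical proxy for the MVD coding cost: L1 norm of mv - mvp.
    return abs(mv[0] - mvp[0]) + abs(mv[1] - mvp[1])


def is_mvp_optimal(pu: PredictionUnit) -> bool:
    # The signalled MVP is "locally optimal" if no other candidate is cheaper.
    chosen = mvd_cost(pu.mv, pu.candidates[pu.mvp_idx])
    return all(chosen <= mvd_cost(pu.mv, c) for c in pu.candidates)


def mvp_optimal_rate(pus: List[PredictionUnit]) -> float:
    # One scalar feature per video: fraction of PUs whose MVP is optimal.
    if not pus:
        raise ValueError("no AMVP-coded prediction units")
    return sum(is_mvp_optimal(p) for p in pus) / len(pus)


cover = [
    PredictionUnit(mv=(4, 2), candidates=[(4, 1), (0, 0)], mvp_idx=0),
    PredictionUnit(mv=(-3, 5), candidates=[(0, 0), (-2, 5)], mvp_idx=1),
]
# Embedding a bit by flipping the second PU's MVP index breaks optimality.
stego = [cover[0], PredictionUnit(mv=(-3, 5), candidates=[(0, 0), (-2, 5)], mvp_idx=0)]
print(mvp_optimal_rate(cover))  # 1.0
print(mvp_optimal_rate(stego))  # 0.5
```

Under this proxy, a simple detector would flag any video whose rate falls below 1.0, which mirrors the cover/stego separation reported in the abstract without any model training.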

Submitted 12 August, 2023; originally announced August 2023.

Comments: Submitted to TCSVT

arXiv:2303.14640

NeurJSCC Enabled Semantic Communications: Paradigms, Applications, and Potentials

Authors: Sixian Wang, Jincheng Dai, Xiaoqi Qin, Kai Niu, Ping Zhang

Abstract: Recent advances in deep learning have led to increased interest in solving high-efficiency end-to-end transmission problems using methods that exploit the nonlinear property of neural networks. These techniques, which we call neural joint source-channel coding (NeurJSCC), extract latent semantic features of the source signal across space and time, and design corresponding variable-length NeurJSCC approaches to transmit latent features over wireless communication channels. Rapid progress has led to numerous research papers, but a consolidation of the discovered knowledge has not yet emerged. In this article, we gather diverse ideas to categorize the expansive aspects of NeurJSCC into two paradigms, i.e., explicit and implicit NeurJSCC. We first focus on these two paradigms by identifying their common and different components in building end-to-end communication systems. We then turn to typical applications of NeurJSCC to various communication tasks. Our article highlights the improved quality, flexibility, and capability brought by NeurJSCC, and we also point out future directions.

Submitted 23 June, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

arXiv:2303.14637

Improved Nonlinear Transform Source-Channel Coding to Catalyze Semantic Communications

Authors: Sixian Wang, Jincheng Dai, Xiaoqi Qin, Zhongwei Si, Kai Niu, Ping Zhang

Abstract: Recent deep learning methods have led to increased interest in solving high-efficiency end-to-end transmission problems. These methods, which we call nonlinear transform source-channel coding (NTSCC), extract the semantic latent features of the source signal and learn an entropy model to guide variable-rate joint source-channel coding of the latent features over wireless channels. In this paper, we propose a comprehensive framework for improving NTSCC that achieves higher system coding gain, better model versatility, and a more flexible adaptation strategy aligned with semantic guidance. This new, more sophisticated NTSCC model is ready to support large-size data interaction in emerging XR, which catalyzes the application of semantic communications. Specifically, we propose three improvement approaches. First, we introduce a contextual entropy model to better capture the spatial correlations among the semantic latent features; more accurate rate allocation and contextual joint source-channel coding are developed accordingly to enable higher coding gain. On that basis, we further propose response network architectures to formulate versatile NTSCC, i.e., a once-trained model supports various rates and channel states, which benefits practical deployment. Following this, we propose an online latent feature editing method to enable more flexible coding rate control aligned with specific semantic guidance.
By comprehensively applying the above three improvement methods for NTSCC, we obtain a deployment-friendly semantic coded transmission system. Our improved NTSCC system has been experimentally verified to achieve considerable bandwidth savings versus the state-of-the-art engineered VTM + 5G LDPC coded transmission system, with lower processing latency.
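The idea of an entropy model steering variable-rate joint source-channel coding can be illustrated with a short Python sketch. It assumes a Gaussian conditional model for integer-quantized latents (a common choice in hyperprior-based learned compression, but not stated in the abstract), estimates each feature's information content as -log2 of its bin probability, and then maps that estimate to a channel-symbol budget by scaling with a hypothetical factor `eta` and snapping to a hypothetical set of allowed `levels`. All function names are illustrative.

```python
import math
from typing import List, Sequence


def gaussian_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def estimated_bits(y_hat: float, mu: float, sigma: float) -> float:
    """-log2 P(y_hat) for an integer-quantized latent modeled as N(mu, sigma^2).

    The probability mass of the quantization bin [y_hat - 0.5, y_hat + 0.5]
    is a difference of two CDF values (assumed Gaussian conditional).
    """
    upper = gaussian_cdf((y_hat - mu + 0.5) / sigma)
    lower = gaussian_cdf((y_hat - mu - 0.5) / sigma)
    return -math.log2(max(upper - lower, 1e-12))  # clamp to avoid log2(0)


def allocate_symbols(bits: Sequence[float], eta: float, levels: Sequence[int]) -> List[int]:
    """Hypothetical rate allocation: scale each feature's estimated bits by eta
    and snap to the nearest allowed number of channel symbols."""
    return [min(levels, key=lambda k: abs(k - eta * b)) for b in bits]


if __name__ == "__main__":
    latents = [0.0, 2.0, -5.0]  # quantized latent values
    scales = [1.0, 1.0, 1.0]    # predicted sigmas (e.g. from a hyperprior)
    bits = [estimated_bits(y, 0.0, s) for y, s in zip(latents, scales)]
    print([round(b, 2) for b in bits])
    print(allocate_symbols(bits, eta=4.0, levels=[0, 4, 8, 16, 32]))
```

Less probable (more informative) latents receive more channel symbols, which is the qualitative behavior the abstract attributes to entropy-guided rate allocation; the contextual entropy model proposed in the paper would replace the fixed `mu` and `sigma` used here.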

Submitted 18 August, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

arXiv:2303.14614

doi 10.23919/JCC.2023.02.015

A Golden Decade of Polar Codes: From Basic Principle to 5G Applications

Authors: Kai Niu, Ping Zhang, Jincheng Dai, Zhongwei Si, Chao Dong

Abstract: After seventy years of pursuit, the invention of polar codes indicates that we have found the first capacity-achieving codes with low-complexity construction and decoding, the great breakthrough in coding theory of the past two decades. In this survey, we review the history of polar codes and summarize the advances of the past ten years. First, the primary principle of channel polarization is investigated, and the basic construction, encoding method, and classic successive cancellation (SC) decoding are reviewed. Second, to improve performance at finite code lengths, we introduce the guiding principle and summarize five design criteria for the construction, design, and implementation of polar codes in practical communication systems, based on exemplar schemes in the literature. In particular, we explain the design principle behind the concatenated coding and rate matching of polar codes in a 5G wireless system. Furthermore, improved SC decoding algorithms, such as SC list (SCL) decoding and SC stack (SCS) decoding, are investigated and compared. Finally, the research prospects of polar codes for future 6G communication systems are explored, including the optimization of short polar codes, code construction in fading channels, polar coded modulation and HARQ, and polar coded transmission, namely polar processing. As a new coding methodology, polar codes can be expected to shine a light on communication theory and unveil a revolution in transmission technology.
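The basic construction and encoding reviewed in this survey can be sketched in a few lines of Python. This is a textbook illustration, not code from the survey. It assumes a binary erasure channel BEC(e), for which the Bhattacharyya parameters of the synthesized bit-channels follow the exact recursion Z- = 2Z - Z^2 and Z+ = Z^2, picks the K most reliable indices as the information set (all other bits frozen to 0), and encodes with x = u F^{(x)n}, F = [[1, 0], [1, 1]], over GF(2). The bit-reversal permutation is omitted; it only reorders codeword bits over i.i.d. channels and does not change bit-channel reliabilities.

```python
from typing import List, Sequence


def bec_bhattacharyya(n: int, erasure: float) -> List[float]:
    """Bhattacharyya parameters of the N = 2**n bit-channels of BEC(erasure).

    Index 2j is the degraded ("minus") child and 2j + 1 the upgraded ("plus")
    child of channel j from the previous level (natural ordering).
    """
    z = [erasure]
    for _ in range(n):
        nxt: List[float] = []
        for zi in z:
            nxt.append(2 * zi - zi * zi)  # W^-: worse channel
            nxt.append(zi * zi)           # W^+: better channel
        z = nxt
    return z


def information_set(n: int, k: int, erasure: float) -> List[int]:
    """Indices of the k most reliable bit-channels (smallest Z), sorted."""
    z = bec_bhattacharyya(n, erasure)
    return sorted(sorted(range(len(z)), key=lambda i: z[i])[:k])


def polar_transform(u: Sequence[int]) -> List[int]:
    """Compute x = u * F^{(x)n} over GF(2) with an in-place butterfly."""
    x = list(u)
    size = len(x)
    step = 1
    while step < size:
        for start in range(0, size, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]  # [a, b] * F = [a XOR b, b]
        step *= 2
    return x


def polar_encode(message: Sequence[int], info_set: Sequence[int], n: int) -> List[int]:
    """Place message bits on the information set, freeze the rest to 0, encode."""
    if len(message) != len(info_set):
        raise ValueError("message length must equal the information-set size")
    u = [0] * (1 << n)
    for bit, idx in zip(message, info_set):
        u[idx] = bit
    return polar_transform(u)
```

For example, `information_set(3, 4, 0.5)` returns `[3, 5, 6, 7]`, the familiar (8, 4) information set on BEC(0.5). Practical 5G constructions instead use a fixed reliability sequence, which this sketch does not reproduce.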

Submitted 25 March, 2023; originally announced March 2023.

Comments: 29 pages, 21 figures, Published in China Communications

Journal ref: China Communications, vol.20, no. 2, pp. 94-121, 2023

arXiv:2212.05294

Variational Speech Waveform Compression to Catalyze Semantic Communications

Authors: Shengshi Yao, Zixuan Xiao, Sixian Wang, Jincheng Dai, Kai Niu, Ping Zhang

Abstract: We propose a novel neural waveform compression method to catalyze emerging speech semantic communications. By introducing a nonlinear transform and variational modeling, we effectively capture the dependencies within speech frames and estimate the probability distribution of the speech features more accurately, giving rise to better compression performance. In particular, the speech signals are analyzed and synthesized by a pair of nonlinear transforms, yielding latent features. An entropy model with a hyperprior is built to capture the probability distribution of the latent features, followed by quantization and entropy coding. The proposed waveform codec can be flexibly optimized toward arbitrary rates, and another appealing feature is that it can easily be optimized for any differentiable loss function, including the perceptual losses used in semantic communications. To further improve fidelity, we incorporate residual coding to mitigate the degradation arising from quantization distortion in the latent space. Results indicate that, at the same performance, the proposed method saves up to 27% of the coding rate compared with the widely used adaptive multi-rate wideband (AMR-WB) codec as well as emerging neural waveform coding methods.

Submitted 13 December, 2022; v1 submitted 10 December, 2022; originally announced December 2022.

arXiv:2211.14541

RL-Based Guidance in Outpatient Hysteroscopy Training: A Feasibility Study

Authors: Vladimir Poliakov, Kenan Niu, Emmanuel Vander Poorten, Dzmitry Tsetserukou

Abstract: This work presents an RL-based agent for outpatient hysteroscopy training. Hysteroscopy is a gynecological procedure for examination of the uterine cavity. Recent advancements have enabled this type of intervention to be performed in the outpatient setting without anaesthesia. While beneficial to the patient, this approach introduces new challenges for clinicians, who must take additional measures to maintain patient comfort and prevent tissue damage. Our prior work presented a platform for hysteroscopic training focused on the passage of the cervical canal. With this work, we aim to extend the functionality of the platform by designing a subsystem that autonomously performs the passage of the cervical canal. This feature can later be used as a virtual instructor to provide educational cues for trainees and assess their performance. The developed algorithm is based on the soft actor-critic approach to smooth the learning curve of the agent and ensure uniform exploration of the workspace. The designed algorithm was tested against the performance of five clinicians. Overall, the algorithm demonstrated high efficiency and reliability, succeeding in 98% of trials and outperforming the expert group in three out of four measured metrics.

Submitted 26 November, 2022; originally announced November 2022.

arXiv:2211.04339

Toward Adaptive Semantic Communications: Efficient Data Transmission via Online Learned Nonlinear Transform Source-Channel Coding

Authors: Jincheng Dai, Sixian Wang, Ke Yang, Kailin Tan, Xiaoqi Qin, Zhongwei Si, Kai Niu, Ping Zhang

Abstract: The emerging field of semantic communication is driving research on end-to-end data transmission. By utilizing the powerful representation ability of deep learning models, learned data transmission schemes have exhibited superior performance to established source and channel coding methods. So far, however, research efforts have mainly concentrated on architecture and model improvements toward a static target domain. Despite their successes, such learned models are still suboptimal due to limitations in model capacity and imperfect optimization and generalization, particularly when the testing data distribution or channel response differs from that adopted for model training, as is likely to be the case in the real world. To tackle this, we propose a novel online learned joint source and channel coding approach that leverages the overfitting property of deep learning models. Specifically, we update off-the-shelf pre-trained models after deployment in a lightweight online fashion to adapt to distribution shifts in the source data and environment domain. We take the overfitting concept to the extreme, proposing a series of implementation-friendly methods to adapt the codec model or representations to an individual data or channel state instance, which can further lead to substantial gains in bandwidth ratio-distortion performance. The proposed methods enable communication-efficient adaptation of all parameters in the network without sacrificing decoding speed.
Our experiments, including a user study, on continually changing target source data and wireless channel environments demonstrate the effectiveness and efficiency of our approach, which outperforms the existing state-of-the-art engineered transmission scheme (VVC combined with 5G LDPC coded transmission).

Submitted 24 May, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

Comments: Accepted by IEEE JSAC

arXiv:2211.02283

Wireless Deep Speech Semantic Transmission

Authors: Zixuan Xiao, Shengshi Yao, Jincheng Dai, Sixian Wang, Kai Niu, Ping Zhang

Abstract: In this paper, we propose a new class of high-efficiency semantic coded transmission methods for end-to-end speech transmission over wireless channels. We name the whole system deep speech semantic transmission (DSST). Specifically, we introduce a nonlinear transform to map the speech source to a semantic latent space and feed the semantic features into a source-channel encoder to generate the channel-input sequence. Guided by the variational modeling idea, we build an entropy model on the latent space to estimate the importance diversity among semantic feature embeddings. Accordingly, semantic features of different importance can be allocated different coding rates, which maximizes the system coding gain. Furthermore, we introduce a channel signal-to-noise ratio (SNR) adaptation mechanism so that a single model can be applied over various channel states. The end-to-end optimization of our model leads to a flexible rate-distortion (RD) trade-off, supporting versatile wireless speech semantic transmission. Experimental results verify that our DSST system clearly outperforms current engineered speech transmission systems on both objective and subjective metrics. Compared with existing neural speech semantic transmission methods, our model saves up to 75% of channel bandwidth costs when achieving the same quality. An intuitive comparison of audio demos can be found at https://ximoo123.github.io/DSST.

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2211.00937

WITT: A Wireless Image Transmission Transformer for Semantic Communications

Authors: Ke Yang, Sixian Wang, Jincheng Dai, Kailin Tan, Kai Niu, Ping Zhang

Abstract: In this paper, we aim to redesign the vision Transformer (ViT) as a new backbone to realize semantic image transmission, termed the wireless image transmission transformer (WITT). Previous works build upon convolutional neural networks (CNNs), which are inefficient at capturing global dependencies, resulting in degraded end-to-end transmission performance, especially for high-resolution images. To tackle this, the proposed WITT employs Swin Transformers as a more capable backbone to extract long-range information. Unlike ViTs in image classification tasks, WITT is highly optimized for image transmission while considering the effect of the wireless channel. Specifically, we propose a spatial modulation module to scale the latent representations according to channel state information, which enhances the ability of a single model to deal with various channel conditions. Extensive experiments verify that our WITT attains better performance across different image resolutions, distortion metrics, and channel conditions. The code is available at https://github.com/KeYang8/WITT.

Submitted 2 November, 2022; originally announced November 2022.

arXiv:2210.16741

Versatile Semantic Coded Transmission over MIMO Fading Channels

Authors: Shengshi Yao, Sixian Wang, Jincheng Dai, Kai Niu, Ping Zhang

Abstract: Semantic communications have shown great potential to boost end-to-end transmission performance. To further improve system efficiency, in this paper, we propose a class of novel semantic coded transmission (SCT) schemes over multiple-input multiple-output (MIMO) fading channels. In particular, we propose a high-efficiency SCT system supporting concurrent transmission of multiple streams, which can maximize the multiplexing gain of the end-to-end semantic communication system. By jointly considering the entropy distribution of the source semantic features and the wireless MIMO channel states, we design a spatial multiplexing mechanism to realize adaptive coding rate allocation and stream mapping. As a result, source content and channel environment are seamlessly coupled, which maximizes the coding gain of the SCT system. Moreover, our SCT system is versatile: a single model can support various transmission rates. The whole model is optimized under the constraint of the transmission rate-distortion (RD) tradeoff. Experimental results verify that our scheme substantially increases the throughput of the semantic communication system. It also outperforms traditional MIMO communication systems under realistic fading channels.

Submitted 30 October, 2022; originally announced October 2022.

arXiv:2209.08294

A Survey on the Network Models applied in the Industrial Network Optimization

Authors: Chao Dong, Xiaoxiong Xiong, Qiulin Xue, Zhengzhen Zhang, Kai Niu, Ping Zhang

Abstract: Network architecture design is very important for the optimization of industrial networks. Network architectures can be divided into small-scale and large-scale networks according to their scale. Graph theory is an efficient mathematical tool for network topology modeling. Small-scale networks often have a regular topology. For large-scale networks, existing research mainly focuses on the random characteristics of network nodes and edges; popular models include random networks, small-world networks, and scale-free networks. Starting from the scale of the network, this survey summarizes and analyzes network modeling methods based on graph theory and their practical application in industrial scenarios. Furthermore, this survey proposes a novel network performance metric, system entropy. From the perspective of mathematical properties, its non-negativity, monotonicity, and concavity/convexity are analyzed. The advantage of system entropy is that it covers existing regular networks, random networks, small-world networks, and scale-free networks, and thus has strong generality. The simulation results show that this metric enables the comparison of various industrial networks under different models.

Submitted 17 September, 2022; originally announced September 2022.

Comments: 26 pages, 11 figures, Journal

arXiv:2209.01744

Investigation on Principles for Cost Assignment in Motion Vector-based Video Steganography

Authors: Jun Li, Minqing Zhang, Ke Niu, Xiaoyuan Yang

Abstract: Cost assignment in the motion vector domain remains a research focus in video steganography. Recent studies in image steganography have summarized many principles for cost assignment and achieved good results, but the basic principles for cost assignment in motion vector-based video steganography have not yet been fully discussed. First, this paper proposes three principles for cost assignment in the motion vector domain: local optimality, non-consistency in the block group, and complexity priority. Second, three corresponding novel practical distortion functions are designed according to these three principles. Finally, a joint distortion function is constructed based on all three principles to increase overall performance. The experimental results show that not only can the three independent distortion functions effectively resist the corresponding steganalysis attacks, but the final joint distortion function can also resist the three steganalysis features simultaneously. In addition, it achieves good visual quality and coding efficiency and can be applied to practical scenarios.

Submitted 4 September, 2022; originally announced September 2022.

Comments: 16 pages, 8 figures

arXiv:2208.02481

Communication Beyond Transmitting Bits: Semantics-Guided Source and Channel Coding

Authors: Jincheng Dai, Ping Zhang, Kai Niu, Sixian Wang, Zhongwei Si, Xiaoqi Qin

Abstract: Classical communication paradigms focus on accurately transmitting bits over a noisy channel, and Shannon theory provides a fundamental theoretical limit on the rate of reliable communications. In this approach, bits are treated equally, and the communication system is oblivious to what meaning these bits convey or how they will be used. Future communications oriented toward intelligence and conciseness will predictably play a dominant role, and the proliferation of connected intelligent agents requires a radical rethinking of the coded transmission paradigm to support the new communication morphology on the horizon. The recent concept of "semantic communications" offers a promising research direction. Injecting semantic guidance into coded transmission design to achieve semantics-aware communications shows great potential for further breakthroughs in effectiveness and reliability. This article sheds light on semantics-guided source and channel coding as a transmission paradigm of semantic communications, which exploits both data semantics diversity and wireless channel diversity to boost overall system performance. We present the general system architecture and key techniques, and indicate some open issues on this topic.

Submitted 4 August, 2022; originally announced August 2022.

Comments: IEEE Wireless Communications, text overlap with arXiv:2112.03093

arXiv:2205.13129

Wireless Deep Video Semantic Transmission

Authors: Sixian Wang, Jincheng Dai, Zijian Liang, Kai Niu, Zhongwei Si, Chao Dong, Xiaoqi Qin, Ping Zhang

Abstract: In this paper, we design a new class of high-efficiency deep joint source-channel coding methods to achieve end-to-end video transmission over wireless channels. The proposed methods exploit a nonlinear transform and a conditional coding architecture to adaptively extract semantic features across video frames, and transmit semantic feature domain representations over wireless channels via deep joint source-channel coding. Our framework is named deep video semantic transmission (DVST). In particular, benefiting from the strong temporal prior provided by the feature domain context, the learned nonlinear transform function becomes temporally adaptive, resulting in a richer and more accurate entropy model guiding the transmission of the current frame. Accordingly, a novel rate-adaptive transmission mechanism is developed to customize deep joint source-channel coding for video sources. It learns to allocate the limited channel bandwidth within and among video frames to maximize the overall transmission performance. The whole DVST design is formulated as an optimization problem whose goal is to optimize the end-to-end transmission rate-distortion performance under perceptual quality metrics or machine vision task performance metrics. Across standard video source test sequences and various communication scenarios, experiments show that our DVST can generally surpass traditional wireless video coded transmission schemes.
The proposed DVST framework is well suited to supporting future semantic communications owing to its video content awareness and machine vision task integration abilities.

Submitted 2 November, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: published in IEEE JSAC

arXiv:2205.13120

Perceptual Learned Source-Channel Coding for High-Fidelity Image Semantic Transmission

Authors: Jun Wang, Sixian Wang, Jincheng Dai, Zhongwei Si, Dekun Zhou, Kai Niu

Abstract: As a novel approach to realizing end-to-end wireless image semantic transmission, the deep learning-based joint source-channel coding (deep JSCC) method is emerging in both the deep learning and communication communities. However, current deep JSCC image transmission systems are typically optimized for traditional distortion metrics such as peak signal-to-noise ratio (PSNR) or multi-scale structural similarity (MS-SSIM). At low transmission rates, due to the imperfect wireless channel, these distortion metrics lose significance as they favor pixel-wise preservation. To account for human visual perception in semantic communications, it is of great importance to develop new deep JSCC systems optimized beyond traditional PSNR and MS-SSIM metrics. In this paper, we introduce adversarial losses to optimize deep JSCC, which tends to preserve global semantic information and local texture. Our new deep JSCC architecture combines encoder, wireless channel, decoder/generator, and discriminator, which are jointly learned under both perceptual and adversarial losses. Our method yields results that are much more visually pleasing to humans than those of state-of-the-art engineered image coded transmission systems and traditional deep JSCC systems. A user study confirms that, for perceptually similar end-to-end image transmission quality, the proposed method can save about 50% of the wireless channel bandwidth cost.

Submitted 25 May, 2022; originally announced May 2022.

arXiv:2205.03534

Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information

Authors: Zhipeng Zhang, Xinglin Hou, Kai Niu, Zhongzhen Huang, Tiezheng Ge, Yuning Jiang, Qi Wu, Peng Wang

Abstract: Recently, online shopping has gradually become a common way of shopping for people all over the world. Wonderful merchandise advertisements often attract more people to buy. These advertisements properly integrate multimodal multi-structured information of commodities, such as visual spatial information and fine-grained structure information. However, traditional multimodal text generation focuses on the conventional description of what existed and happened, which does not match the requirements of advertisement copywriting in the real world, because advertisement copywriting has a vivid language style and higher requirements of faithfulness. Unfortunately, there is a lack of reusable evaluation frameworks and a scarcity of datasets. Therefore, we present a dataset, E-MMAD (e-commercial multimodal multi-structured advertisement copywriting), which requires and supports much more detailed information in text generation. Notably, it is one of the largest video captioning datasets in this field. Accordingly, we propose a baseline method and a faithfulness evaluation metric based on structured information reasoning to meet this real-world demand on this dataset. It surpasses previous methods by a large margin on all metrics. The dataset and method are coming soon at https://e-mmad.github.io/e-mmad.net/index.html.

Submitted 6 May, 2022; originally announced May 2022.

arXiv:2204.07435

Performance and Construction of Polar Codes: The Perspective of Bit Error Probability

Authors: Bolin Wu, Kai Niu, Jincheng Dai

Abstract: Most existing work on polar codes focuses on the analysis of block error probability. However, in many scenarios, bit error probability is also important for evaluating the performance of channel codes. In this paper, we establish a new framework to analyze the bit error probability of polar codes. Specifically, by revisiting the error event of a bit-channel, we first introduce the conditional bit error probability as a metric to evaluate the reliability of a bit-channel for both systematic and non-systematic polar codes. Guided by the concept of polar subcode, we then derive an upper bound on the conditional bit error probability of each bit-channel and, accordingly, an upper bound on the bit error probability of polar codes. Based on these, two types of construction metrics aiming at minimizing the bit error probability of polar codes are proposed, which have linear computational complexity and explicit forms. Simulation results show that the polar codes constructed by the proposed methods can outperform those constructed by conventional methods.

Submitted 15 April, 2022; originally announced April 2022.
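Both the systematic and non-systematic polar codes analyzed above build on the polar transform. As background only, here is a minimal sketch of non-systematic polar encoding, x = u·F^{⊗n} over GF(2) with kernel F = [[1, 0], [1, 1]], in natural (non-bit-reversed) index order. The function name `polar_encode` is hypothetical, and the paper's bit-error-probability bounds and construction metrics are not reproduced.

```python
from typing import List, Sequence


def polar_encode(u: Sequence[int]) -> List[int]:
    """Non-systematic polar encoding x = u * F^{(x)n} over GF(2).

    Assumes natural index order (no bit-reversal permutation) and
    kernel F = [[1, 0], [1, 1]]; len(u) must be a power of two.
    """
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    x = [b & 1 for b in u]
    step = 1
    while step < n:
        # Butterfly stage: the upper branch absorbs the XOR of the lower branch.
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


# The last row of F^{(x)2} is all ones, so u = e_3 maps to [1, 1, 1, 1].
print(polar_encode([0, 0, 0, 1]))  # [1, 1, 1, 1]
```

Because F·F = I over GF(2), applying the transform twice returns u, which is a convenient sanity check for any implementation.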

arXiv:2204.03535

Practical Issues and Challenges in CSI-based Integrated Sensing and Communication

Authors: Daqing Zhang, Dan Wu, Kai Niu, Xuanzhi Wang, Fusang Zhang, Jian Yao, Dajie Jiang, Fei Qin

Abstract: The next-generation mobile communication network (i.e., 6G) has been envisioned to go beyond classical communication functionality and provide integrated sensing and communication (ISAC) capability to enable more emerging applications, such as smart cities, connected vehicles, AIoT, and health care/elder care. Among all the ISAC proposals, the most practical and promising approach is to empower existing wireless networks (e.g., WiFi, 4G/5G) with the augmented ability to sense surrounding humans and the environment, and to evolve wireless communication networks into intelligent communication and sensing networks (e.g., 6G). In this paper, based on our experience with CSI-based wireless sensing using WiFi/4G/5G signals, we identify ten major practical and theoretical problems that hinder the real deployment of ISAC applications, and provide possible solutions to those critical challenges. We hope this work will inspire further research to evolve existing WiFi/4G/5G networks into the next-generation intelligent wireless network (i.e., 6G).

Submitted 17 March, 2022; originally announced April 2022.

Comments: ICC 2022 workshop on integrated sensing and communication (ISAC)

arXiv:2204.03125

Deep transfer learning for system identification using long short-term memory neural networks

Authors: Kaicheng Niu, Mi Zhou, Chaouki T. Abdallah, Mohammad Hayajneh

Abstract: Recurrent neural networks (RNNs) have many advantages over more traditional system identification techniques. They may be applied to linear and nonlinear systems, and they require fewer modeling assumptions. However, these neural network models may also need larger amounts of data to learn and generalize. Furthermore, neural network training is a time-consuming process. Hence, building upon long short-term memory (LSTM) neural networks, this paper proposes using two types of deep transfer learning, namely parameter fine-tuning and freezing, to reduce the data and computation requirements for system identification. We apply these techniques to identify two dynamical systems, namely a second-order linear system and a Wiener-Hammerstein nonlinear system. Results show that compared with direct learning, our method accelerates learning by 10% to 50%, which also saves data and computing resources.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2203.06692

Towards Semantic Communications: A Paradigm Shift

Authors: Kai Niu, Jincheng Dai, Shengshi Yao, Sixian Wang, Zhongwei Si, Xiaoqi Qin, Ping Zhang

Abstract: The last seventy years have witnessed the transition of communication from Shannon's theoretical concept to today's highly efficient practical systems. Classical communication systems address the capability-deficiency issue mainly by module-stacking and technique-densification with ever-increasing complexity. In this traditional viewpoint, classical source coding only uses explicit probabilistic models to compress data, regardless of the meaning of the transmitted source messages. Likewise, channel-coded transmission does not identify the source content. In this sense, state-of-the-art communication systems work merely at the technical level, as summarized by Weaver. Unlike the traditional system design philosophy, this article proposes a new route to boost system capabilities towards intelligence-endogenous and primitive-concise communications. The communication paradigm upgrades to the semantic level, which is radically different since all the key techniques imply the use of the meanings of transmitted data, thus deeply changing the design of the communication system. This paradigm shift unveils a promising direction due to its ability to offer an identical quality of service with much lower data transmission requirements. Different from other similar works, this article constitutes a brief tutorial on the framework of semantic communications, its gain analyzed from the information theory perspective, a method to calculate the semantic compression bound, and an exemplary use case of semantic communications.

Submitted 30 March, 2022; v1 submitted 13 March, 2022; originally announced March 2022.

arXiv:2202.14018

Description Logic EL++ Embeddings with Intersectional Closure

Authors: Xi Peng, Zhenwei Tang, Maxat Kulmanov, Kexin Niu, Robert Hoehndorf

Abstract: Many ontologies, in particular in the biomedical domain, are based on the Description Logic EL++. Several efforts have been made to interpret and exploit EL++ ontologies by distributed representation learning. Specifically, concepts within EL++ theories have been represented as n-balls within an n-dimensional embedding space. However, intersectional closure is not satisfied when using n-balls to represent concepts, because the intersection of two n-balls is not an n-ball. This leads to challenges when measuring the distance between concepts and inferring equivalence between concepts. To this end, we developed EL Box Embedding (ELBE) to learn Description Logic EL++ embeddings using axis-parallel boxes. We generate specially designed box-based geometric constraints from EL++ axioms for model training. Since the intersection of boxes remains a box, intersectional closure is satisfied. We report extensive experimental results on three datasets and present a case study to demonstrate the effectiveness of the proposed method.

Submitted 28 February, 2022; originally announced February 2022.
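The closure argument above is easy to check concretely: intersecting two axis-parallel boxes amounts to taking the coordinate-wise maximum of the lower corners and the coordinate-wise minimum of the upper corners, which is again a box (or empty). A minimal sketch, assuming a lower/upper-corner parameterization chosen here for simplicity; the `Box` class and `intersect` function are hypothetical, and ELBE's own parameterization and training losses are not reproduced.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-parallel box given by its lower and upper corners (same dimension)."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the intersection of two boxes, or None if they do not overlap."""
    lower = tuple(max(x, y) for x, y in zip(a.lower, b.lower))
    upper = tuple(min(x, y) for x, y in zip(a.upper, b.upper))
    if any(lo > hi for lo, hi in zip(lower, upper)):
        return None  # empty in at least one dimension
    return Box(lower, upper)  # still an axis-parallel box: closure holds


print(intersect(Box((0.0, 0.0), (2.0, 2.0)), Box((1.0, 1.0), (3.0, 3.0))))
# Box(lower=(1.0, 1.0), upper=(2.0, 2.0))
```

By contrast, two partially overlapping n-balls intersect in a lens-shaped region that no single n-ball describes exactly.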

arXiv:2201.10340

Distributed Image Transmission using Deep Joint Source-Channel Coding

Authors: Sixian Wang, Ke Yang, Jincheng Dai, Kai Niu

Abstract: We study the problem of deep joint source-channel coding (D-JSCC) for correlated image sources, where each source is transmitted through a noisy independent channel to a common receiver. In particular, we consider a pair of images captured by two cameras with possibly overlapping fields of view, transmitted over wireless channels and reconstructed at the central node. The challenge lies in designing a practical code that utilizes both source and channel correlations to improve transmission efficiency without additional transmission overhead. To tackle this, we need to consider the common information across the two stereo images as well as the differences between the two transmission channels. We propose a deep neural network solution that includes lightweight edge encoders and a powerful center decoder. In the decoder, we further propose a novel channel-state-information-aware cross-attention module to highlight the overlapping fields and leverage the relevance between the two noisy feature maps. Our results show an impressive improvement in reconstruction quality on both links by exploiting the noisy representations of the other link. Moreover, the proposed scheme shows competitive results compared with separated schemes using capacity-achieving channel codes.

Submitted 25 January, 2022; originally announced January 2022.

Comments: ICASSP 2022

arXiv:2201.03801

Winning solutions and post-challenge analyses of the ChaLearn AutoDL challenge 2019

Authors: Zhengying Liu, Adrien Pavao, Zhen Xu, Sergio Escalera, Fabio Ferreira, Isabelle Guyon, Sirui Hong, Frank Hutter, Rongrong Ji, Julio C. S. Jacques Junior, Ge Li, Marius Lindauer, Zhipeng Luo, Meysam Madadi, Thomas Nierhoff, Kangning Niu, Chunguang Pan, Danny Stoll, Sebastien Treguer, Jin Wang, Peng Wang, Chenglin Wu, Youcheng Xiong, Arber Zela, Yang Zhang

Abstract: This paper reports the results and post-challenge analyses of ChaLearn's AutoDL challenge series, which helped sort out a profusion of AutoML solutions for Deep Learning (DL) that had been introduced in a variety of settings but lacked fair comparisons. All input data modalities (time series, images, videos, text, tabular) were formatted as tensors, and all tasks were multi-label classification problems. Code submissions were executed on hidden tasks with limited time and computational resources, favoring solutions that get results quickly. In this setting, DL methods dominated, though popular Neural Architecture Search (NAS) was impractical. Solutions relied on fine-tuned pre-trained networks, with architectures matching the data modality. Post-challenge tests did not reveal improvements beyond the imposed time limit. While no component is particularly original or novel, a high-level modular organization emerged featuring a "meta-learner", "data ingestor", "model selector", "model/learner", and "evaluator". This modularity enabled ablation studies, which revealed the importance of (off-platform) meta-learning, ensembling, and efficient data management. Experiments on heterogeneous module combinations further confirm the (local) optimality of the winning solutions. Our challenge legacy includes an everlasting benchmark (http://autodl.chalearn.org), the open-sourced code of the winners, and a free "AutoDL self-service".

Submitted 11 January, 2022; originally announced January 2022.

Comments: The first three authors contributed equally; This is only a draft version

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) 2021

arXiv:2201.02924

Joint Successive Cancellation List Decoding for the Double Polar codes

Authors: Yanfei Dong, Kai Niu, Jincheng Dai, Sen Wang, Yifei Yuan

Abstract: The double polar (D-Polar) codes have recently been proposed as a new joint source-channel coding scheme. In this letter, a novel joint source-channel decoder, namely the joint successive cancellation list (J-SCL) decoder, is proposed to improve the decoding performance of the D-Polar codes. We merge the trellis of the source polar code and that of the channel polar code to construct a compound trellis. In this compound trellis, the joint source-channel (JSC) nodes represent both the information bits and the high-entropy bits. Based on the compound trellis, the J-SCL decoder is designed to recover the source messages by combining source SCL decoding and channel SCL decoding. The J-SCL decoder doubles the number of decoding paths at each decoding level and then retains the L paths with the smallest joint path metric (JPM). For a JSC node, the JPM is updated considering both the channel decision log-likelihood ratios (LLRs) and the source decision LLRs. Simulation results show that the J-SCL decoder outperforms the turbo-like BP (TL-BP) decoder with lower complexity.

Submitted 8 January, 2022; originally announced January 2022.
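The "double the paths, keep the L best" step can be illustrated with ordinary SCL list management. This minimal sketch, with hypothetical helper names, uses the widely used LLR-based path-metric increment ln(1 + exp(-(1 - 2û)λ)) for a decision û given the path's LLR λ = ln(P(0)/P(1)). The J-SCL joint path metric, which also folds in the source decision LLRs at JSC nodes, is defined in the paper and is not reproduced here; frozen bits, which are not split, are also omitted for brevity.

```python
import math
from typing import List, Tuple

# A decoding path: (bits decided so far, accumulated path metric; lower is better).
Path = Tuple[Tuple[int, ...], float]


def pm_increment(bit: int, llr: float) -> float:
    """Penalty ln(1 + exp(-(1 - 2*bit) * llr)) for deciding `bit`, computed stably."""
    x = -(1 - 2 * bit) * llr
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def extend_and_prune(paths: List[Path], llrs: List[float], list_size: int) -> List[Path]:
    """Split every path on bit 0 and bit 1, then keep the `list_size` smallest metrics.

    llrs[i] is the decision LLR computed along paths[i].
    """
    candidates: List[Path] = []
    for (bits, pm), llr in zip(paths, llrs):
        for bit in (0, 1):
            candidates.append((bits + (bit,), pm + pm_increment(bit, llr)))
    candidates.sort(key=lambda path: path[1])
    return candidates[:list_size]


paths: List[Path] = [((), 0.0)]
paths = extend_and_prune(paths, [2.0], list_size=2)
paths = extend_and_prune(paths, [-3.0, 1.0], list_size=2)
print([bits for bits, _ in paths])  # [(0, 1), (1, 0)]
```

Sorting all 2L candidates keeps the sketch short; practical decoders typically use a partial selection instead.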

arXiv:2112.10961

Nonlinear Transform Source-Channel Coding for Semantic Communications

Authors: Jincheng Dai, Sixian Wang, Kailin Tan, Zhongwei Si, Xiaoqi Qin, Kai Niu, Ping Zhang

Abstract: In this paper, we propose a class of high-efficiency deep joint source-channel coding methods that can closely adapt to the source distribution under a nonlinear transform; we collect them under the name nonlinear transform source-channel coding (NTSCC). In the considered model, the transmitter first learns a nonlinear analysis transform to map the source data into a latent space, then transmits the latent representation to the receiver via deep joint source-channel coding. Our model incorporates the nonlinear transform as a strong prior to effectively extract the source semantic features and provide side information for source-channel coding. Unlike existing conventional deep joint source-channel coding methods, the proposed NTSCC essentially learns both the source latent representation and an entropy model as the prior on the latent representation. Accordingly, novel adaptive rate transmission and hyperprior-aided codec refinement mechanisms are developed to upgrade deep joint source-channel coding. The whole system design is formulated as an optimization problem whose goal is to minimize the end-to-end transmission rate-distortion performance under established perceptual quality metrics. Across test image sources with various resolutions, we find that the proposed NTSCC transmission method generally outperforms both analog transmission using standard deep joint source-channel coding and classical separation-based digital transmission.
Notably, the proposed NTSCC method can potentially support future semantic communications due to its content-aware ability and perceptual optimization goal.

Submitted 2 November, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

Comments: published in IEEE JSAC

arXiv:2112.03093

Communication Beyond Transmitting Bits: Semantics-Guided Source and Channel Coding

Authors: Jincheng Dai, Ping Zhang, Kai Niu, Sixian Wang, Zhongwei Si, Xiaoqi Qin

Abstract: Classical communication paradigms focus on accurately transmitting bits over a noisy channel, and Shannon theory provides a fundamental theoretical limit on the rate of reliable communications. In this approach, bits are treated equally, and the communication system is oblivious to what meaning these bits convey or how they would be used. Future communications towards intelligence and conciseness will predictably play a dominant role, and the proliferation of connected intelligent agents requires a radical rethinking of the coded transmission paradigm to support the new communication morphology on the horizon. The recent concept of "semantic communications" offers a promising research direction. Injecting semantic guidance into coded transmission design to achieve semantics-aware communications shows great potential for further breakthroughs in effectiveness and reliability. This article sheds light on semantics-guided source and channel coding as a transmission paradigm of semantic communications, which exploits both data semantics diversity and wireless channel diversity to boost overall system performance. We present the general system architecture and key techniques, and indicate some open issues on this topic.

Submitted 1 June, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

arXiv:2110.12224

Generalized Polarization Transform: A Novel Coded Transmission Paradigm

Authors: Bolin Wu, Jincheng Dai, Kai Niu, Zhongwei Si, Ping Zhang, Sen Wang, Yifei Yuan, Chih-Lin I

Abstract: For the upcoming 6G wireless networks, a new wave of applications and services will demand ultra-high data rates and reliability. To this end, future wireless systems are expected to pave the way for entirely new fundamental air interface technologies to attain a breakthrough in spectrum efficiency (SE). This article discusses a new paradigm, named generalized polarization transform (GPT), to achieve an integrated design of coding, modulation, multi-antenna, multiple access, etc., in a real sense. The GPT-enabled air interface yields the far-reaching insight that jointly optimizing the critical air interface ingredients can achieve remarkable SE gains compared with the state-of-the-art module-stacking design.

Submitted 27 April, 2022; v1 submitted 23 October, 2021; originally announced October 2021.

arXiv:2110.08268

Explainable Student Performance Prediction With Personalized Attention for Explaining Why A Student Fails

Authors: Kun Niu, Xipeng Cao, Yicong Yu

Abstract: As student failure rates continue to increase in higher education, predicting student performance in the following semester has become a significant need. Personalized student performance prediction helps educators gain a comprehensive view of student status and effectively intervene in advance. However, existing works scarcely consider the explainability of student performance prediction, which is what educators are most concerned about. In this paper, we propose a novel Explainable Student performance prediction method with Personalized Attention (ESPA) by utilizing relationships in student profiles and prior knowledge of related courses. The designed Bidirectional Long Short-Term Memory (BiLSTM) architecture extracts the semantic information in paths with specific patterns. To leverage the internal relations of similar paths, a local- and global-level attention mechanism is proposed to distinguish the influence of different students or courses on predictions. Hence, valid reasoning on paths can be applied to predict the performance of students. ESPA consistently outperforms other state-of-the-art models for student performance prediction, and the results are intuitively explainable. This work can help educators better understand the different impacts of behavior on students' studies.

Submitted 15 October, 2021; originally announced October 2021.

Comments: AAAI 2021 Workshop on AI Education/TIPCE 2021

arXiv:2109.12965

Text-based Person Search in Full Images via Semantic-Driven Proposal Generation

Authors: Shizhou Zhang, De Cheng, Wenlong Luo, Yinghui Xing, Duo Long, Hao Li, Kai Niu, Guoqiang Liang, Yanning Zhang

Abstract: Finding target persons in full scene images with a text-description query has important practical applications in intelligent video surveillance. However, unlike real-world scenarios where bounding boxes are not available, existing text-based person retrieval methods mainly focus on cross-modal matching between the query text descriptions and a gallery of cropped pedestrian images. To close this gap, we study the problem of text-based person search in full images by proposing a new end-to-end learning framework that jointly optimizes the pedestrian detection, identification, and visual-semantic feature embedding tasks. To take full advantage of the query text, the semantic features are leveraged to instruct the Region Proposal Network to pay more attention to the text-described proposals. In addition, a cross-scale visual-semantic embedding mechanism is utilized to improve performance. To validate the proposed method, we collect and annotate two large-scale benchmark datasets based on the widely adopted image-based person search datasets CUHK-SYSU and PRW. Comprehensive experiments on the two datasets show that, compared with the baseline methods, our method achieves state-of-the-art performance.

Submitted 25 February, 2024; v1 submitted 27 September, 2021; originally announced September 2021.

arXiv:2108.03508

The Effect of Training Parameters and Mechanisms on Decentralized Federated Learning based on MNIST Dataset

Authors: Zhuofan Zhang, Mi Zhou, Kaicheng Niu, Chaouki Abdallah

Abstract: Federated Learning is an algorithm suited for training models on decentralized data, but the requirement of a central "server" node is a bottleneck. In this document, we first introduce the notion of Decentralized Federated Learning (DFL). We then perform various experiments on different setups, such as changing the model aggregation frequency, switching from independent and identically distributed (IID) dataset partitioning to non-IID partitioning with partial global sharing, using different optimization methods across clients, and breaking models into segments with partial sharing. All experiments are run on the MNIST handwritten digits dataset. We observe that these altered training procedures are generally robust, albeit non-optimal. We also observe training failures when the variance between model weights is too large. The open-source experiment code is accessible through GitHub (code was uploaded at https://github.com/zhzhang2018/DecentralizedFL).

Submitted 7 August, 2021; originally announced August 2021.
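Removing the central server means each client aggregates directly with its peers. A minimal sketch of one synchronous aggregation round, assuming uniform averaging over each client's graph neighborhood; the function name and topology format are hypothetical, and the aggregation scheme in the released repository is not reproduced. In this picture, the "aggregation frequency" varied in the experiments would correspond to how many local training steps run between such rounds.

```python
from typing import Dict, List

Weights = List[float]  # flattened model parameters of one client


def decentralized_average(weights: Dict[int, Weights],
                          neighbors: Dict[int, List[int]]) -> Dict[int, Weights]:
    """One synchronous round: every client replaces its parameters with the
    uniform average of its own and its neighbors' parameters (no server)."""
    updated: Dict[int, Weights] = {}
    for node, own in weights.items():
        group = [node] + neighbors[node]
        updated[node] = [
            sum(weights[peer][k] for peer in group) / len(group)
            for k in range(len(own))
        ]
    return updated


# Three clients on a line graph 0 - 1 - 2, each holding a single parameter.
w = {0: [0.0], 1: [3.0], 2: [6.0]}
topology = {0: [1], 1: [0, 2], 2: [1]}
print(decentralized_average(w, topology))  # {0: [1.5], 1: [3.0], 2: [4.5]}
```

Repeating the round on a connected graph drives all clients toward the same parameters without any node ever seeing the full set of models.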

arXiv:2108.03495

Game Theory and Machine Learning in UAVs-Assisted Wireless Communication Networks: A Survey

Authors: M. Zhou, Y. Guan, M. Hayajneh, K. Niu, C. Abdallah

Abstract: In recent years, Unmanned Aerial Vehicles (UAVs) have been used in fields such as architecture, business delivery, military and civilian theaters, and many others. With increased applications comes increased demand for advanced algorithms for resource allocation and energy management. As is well known, game theory and machine learning are two powerful tools already widely used in the wireless communication field, and there are numerous surveys of game theory and machine learning usage in wireless communication. Existing surveys, however, focus on either game theory or machine learning; therefore, the current article surveys both game-theoretic and machine learning algorithms for use by UAVs in Wireless Communication Networks (U-WCNs). We also discuss how to combine game theory and machine learning for solving problems in U-WCNs and identify several future research directions.

Submitted 7 August, 2021; originally announced August 2021.

arXiv:2104.05178

Polar-Precoding: A Unitary Finite-Feedback Transmit Precoder for Polar-Coded MIMO Systems

Authors: Jinnan Piao, Kai Niu, Jincheng Dai, Lajos Hanzo

Abstract: We propose a unitary precoding scheme, namely polar-precoding, to improve the performance of polar-coded MIMO systems. In contrast to the traditional design of MIMO precoding criteria, the proposed polar-precoding scheme relies on the polarization criterion. In particular, the precoding matrix design comprises two steps. After selecting a basic matrix for maximizing the capacity in the first step, we design a unitary matrix for maximizing the polarization effect among the data streams without degrading the capacity. Our simulation results show that the proposed polar-precoding scheme outperforms the state-of-the-art DFT precoding scheme.

Submitted 13 September, 2021; v1 submitted 11 April, 2021; originally announced April 2021.

Comments: Polar-coded MIMO system, polarization criterion, precoding, unitary matrix

arXiv:2102.03828

Learning to Decode Protograph LDPC Codes

Authors: Jincheng Dai, Kailin Tan, Zhongwei Si, Kai Niu, Mingzhe Chen, H. Vincent Poor, Shuguang Cui

Abstract: The recent development of deep learning methods provides a new approach to optimize the belief propagation (BP) decoding of linear codes. However, the limitation of existing works is that the scale of neural networks increases rapidly with the codelength, thus they can only support short to moderate codelengths. From the point view of practicality, we propose a high-performance neural min-sum (MS)… ▽ More The recent development of deep learning methods provides a new approach to optimize the belief propagation (BP) decoding of linear codes. However, the limitation of existing works is that the scale of neural networks increases rapidly with the codelength, thus they can only support short to moderate codelengths. From the point view of practicality, we propose a high-performance neural min-sum (MS) decoding method that makes full use of the lifting structure of protograph low-density parity-check (LDPC) codes. By this means, the size of the parameter array of each layer in the neural decoder only equals the number of edge-types for arbitrary codelengths. In particular, for protograph LDPC codes, the proposed neural MS decoder is constructed in a special way such that identical parameters are shared by a bundle of edges derived from the same edge-type. To reduce the complexity and overcome the vanishing gradient problem in training the proposed neural MS decoder, an iteration-by-iteration (i.e., layer-by-layer in neural networks) greedy training method is proposed. With this, the proposed neural MS decoder tends to be optimized with faster convergence, which is aligned with the early termination mechanism widely used in practice. To further enhance the generalization ability of the proposed neural MS decoder, a codelength/rate compatible training method is proposed, which randomly selects samples from a set of codes lifted from the same base code. 
As a theoretical performance evaluation tool, a trajectory-based extrinsic information transfer (T-EXIT) chart is developed for various decoders. Both T-EXIT and simulation results show that the optimized MS decoding can provide faster convergence and up to 1 dB gain compared with the plain MS decoding and its variants, with only slightly increased complexity. In addition, it can even outperform the sum-product algorithm for some short codes.
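To make the parameter-sharing idea concrete, here is a minimal Python sketch. It assumes a quasi-cyclic lifting in which each base-matrix entry is a cyclic shift (with -1 marking no edge). It also assumes a flooding-schedule min-sum with one multiplicative weight per edge type per iteration, and the convention that a positive LLR means bit 0. The functions `lift_protograph` and `neural_min_sum` are hypothetical names. The weights are fixed placeholders rather than trained values, and the paper's exact parameterization and training procedure may differ. What the sketch shows is that the weight table has one entry per base-graph edge, whatever the lifting size `z`.

```python
def lift_protograph(base, z):
    """Expand a base matrix into lifted Tanner-graph edges.

    base[i][j] == -1 means "no edge"; otherwise it is a cyclic shift in [0, z).
    Returns (edges, num_types), where each edge is (check, var, edge_type) and
    edge_type is the index of the base-graph edge it was copied from.
    """
    edges = []
    num_types = 0
    for i, row in enumerate(base):
        for j, shift in enumerate(row):
            if shift < 0:
                continue  # no edge between base check i and base variable j
            for r in range(z):
                # r-th circulant copy of base edge (i, j)
                edges.append((i * z + r, j * z + (r + shift) % z, num_types))
            num_types += 1
    return edges, num_types


def neural_min_sum(llr, edges, num_checks, weights):
    """Weighted min-sum; weights[t][e] scales check messages of edge type e at iteration t.

    Assumes every check node has degree >= 2 and that positive LLR means bit 0.
    """
    n = len(llr)
    by_check = [[] for _ in range(num_checks)]
    by_var = [[] for _ in range(n)]
    for k, (c, v, _) in enumerate(edges):
        by_check[c].append(k)
        by_var[v].append(k)
    c2v = [0.0] * len(edges)
    for alpha in weights:  # one weight vector per iteration (network layer)
        # Variable-to-check: channel LLR plus all incoming check messages except the target edge.
        total = [llr[v] + sum(c2v[k] for k in by_var[v]) for v in range(n)]
        v2c = [total[v] - c2v[k] for k, (_, v, _) in enumerate(edges)]
        # Check-to-variable: sign product and minimum magnitude over the other edges,
        # scaled by the weight shared by every edge of the same edge type.
        for ks in by_check:
            for k in ks:
                others = [v2c[m] for m in ks if m != k]
                sign = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
                c2v[k] = alpha[edges[k][2]] * sign * min(abs(x) for x in others)
    posterior = [llr[v] + sum(c2v[k] for k in by_var[v]) for v in range(n)]
    return [0 if p >= 0 else 1 for p in posterior]


base = [[0, 0, 0], [0, 1, -1]]  # toy base matrix: 2 checks x 3 variables, 5 edge types
edges, num_types = lift_protograph(base, z=3)
weights = [[0.8] * num_types for _ in range(2)]  # 2 iterations, placeholder weights
llr = [2.0] * 9
llr[6] = -1.0  # one unreliable position of the all-zero codeword
print(len(edges), num_types)  # 15 5
print(neural_min_sum(llr, edges, len(base) * 3, weights))  # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Lifting the same base matrix with `z=8` instead of `z=3` yields 40 edges but still 5 edge types, so each iteration's weight vector keeps the same length. The greedy iteration-by-iteration training described above would fit `weights[t]` one iteration at a time. That step is not shown here.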

Submitted 10 February, 2021; v1 submitted 7 February, 2021; originally announced February 2021.

Comments: To appear in the IEEE JSAC Series on Machine Learning in Communications and Networks

arXiv:2011.10308

Progressive Rate-Filling: A Framework for Agile Construction of Multilevel Polar-Coded Modulation

Authors: Jincheng Dai, Jinnan Piao, Kai Niu

Abstract: In this letter, we propose a progressive rate-filling method as a framework to study agile construction of multilevel polar-coded modulation. We show that the bit indices within each component polar code can follow a fixed, precomputed ranking sequence, e.g., the Polar sequence in the 5G standard, while their allocated rates (i.e., the number of information bits of each component polar code) can be computed quickly by exploiting the target sum-rate approximation and proper rate-filling methods. In particular, we develop two rate-filling strategies, one based on the capacity and one based on the rate with the finite block-length effect taken into account. The proposed construction methods can be performed independently of the actual channel condition with $O(m)$ complexity ($m$ denotes the modulation order) and are robust to the diverse modulation and coding schemes of the 5G standard, a desirable feature for practical systems.
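The capacity-based allocation idea can be illustrated with a simplified Python sketch. It assumes the bit-level capacities of the component polar codes are already known (in multilevel coding they sum to the constellation capacity by the chain rule). It splits the total number of information bits in proportion to those capacities, using largest-remainder rounding. This is a hypothetical illustration of per-level rate allocation, not the letter's algorithm. The function name `capacity_rate_filling` and the capacity values in the example are made up, and the finite block-length strategy is not covered.

```python
import math


def capacity_rate_filling(k_total, n, level_capacities):
    """Split k_total information bits across component polar codes of length n.

    Level i ideally gets k_total * C_i / sum(C), its share of the sum capacity.
    Floors are taken first; the bits lost to rounding go to the levels with the
    largest fractional parts (largest-remainder rounding).
    """
    c_sum = sum(level_capacities)
    # Requiring k_total / n <= sum(C) also guarantees no level exceeds n bits.
    if not 0 <= k_total <= n * c_sum:
        raise ValueError("target sum-rate exceeds the sum of level capacities")
    ideal = [k_total * c / c_sum for c in level_capacities]
    alloc = [math.floor(x) for x in ideal]
    leftover = k_total - sum(alloc)
    by_remainder = sorted(range(len(ideal)), key=lambda i: ideal[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


# Hypothetical capacities for a 4-level scheme, 256 coded bits per level, 512 information bits.
print(capacity_rate_filling(512, 256, [0.3, 0.7, 0.9, 1.0]))  # [53, 124, 159, 176]
```

Within each level, the information-bit positions could then be taken as the most reliable indices of a fixed ranking sequence, such as the 5G Polar sequence mentioned above. The sort makes this sketch O(m log m) in the number of levels rather than the O(m) stated for the letter's methods. That difference is immaterial for the handful of levels used in practice.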

Submitted 20 November, 2020; originally announced November 2020.
