-
TCM-FTP: Fine-Tuning Large Language Models for Herbal Prescription Prediction
Authors:
Xingzhi Zhou,
Xin Dong,
Chunhao Li,
Yuning Bai,
Yulong Xu,
Ka Chun Cheung,
Simon See,
Xinpeng Song,
Runshun Zhang,
Xuezhong Zhou,
Nevin L. Zhang
Abstract:
Traditional Chinese medicine (TCM) relies on specific combinations of herbs in prescriptions to treat symptoms and signs, a practice that spans thousands of years. Predicting TCM prescriptions presents a fascinating technical challenge with practical implications. However, this task faces limitations due to the scarcity of high-quality clinical datasets and the intricate relationship between symptoms and herbs. To address these issues, we introduce DigestDS, a new dataset containing practical medical records from experienced experts in digestive system diseases. We also propose a method, TCM-FTP (TCM Fine-Tuning Pre-trained), to leverage pre-trained large language models (LLMs) through supervised fine-tuning on DigestDS. Additionally, we enhance computational efficiency using a low-rank adaptation technique. TCM-FTP also incorporates data augmentation by permuting herbs within prescriptions, capitalizing on their order-agnostic properties. Impressively, TCM-FTP achieves an F1-score of 0.8031, surpassing previous methods significantly. Furthermore, it demonstrates remarkable accuracy in dosage prediction, achieving a normalized mean square error of 0.0604. In contrast, LLMs without fine-tuning perform poorly. Although LLMs have shown capabilities on a wide range of tasks, this work illustrates the importance of fine-tuning for TCM prescription prediction, and we have proposed an effective way to do that.
Submitted 15 July, 2024;
originally announced July 2024.
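The herb-permutation augmentation mentioned in the TCM-FTP abstract can be illustrated with a minimal Python sketch. The record layout (a list of (herb, dosage) pairs), the function name `augment_prescription`, and the placeholder herb names are assumptions made for illustration; this is not the authors' implementation or the DigestDS format.

```python
import random


def augment_prescription(herbs, n_copies, seed=0):
    """Return up to n_copies distinct reorderings of one prescription.

    Because a prescription is treated as order-agnostic, every reordering
    is an equally valid training target for the same symptom description.
    """
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    seen = {tuple(herbs)}              # never emit the original order again
    augmented = []
    attempts = 0
    while len(augmented) < n_copies and attempts < 10 * n_copies:
        attempts += 1
        candidate = herbs[:]           # copy so the input list is left untouched
        rng.shuffle(candidate)
        if tuple(candidate) not in seen:
            seen.add(tuple(candidate))
            augmented.append(candidate)
    return augmented


# Hypothetical record: placeholder herb names with placeholder dosages.
record = [("herb_a", 10), ("herb_b", 6), ("herb_c", 3)]
augmented = augment_prescription(record, n_copies=3)
for variant in augmented:
    print(variant)
```

Each variant keeps the same herbs and dosages and changes only their order, so the augmented targets differ in surface form but not in content.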
-
Protecting NeRFs' Copyright via Plug-And-Play Watermarking Base Model
Authors:
Qi Song,
Ziyuan Luo,
Ka Chun Cheung,
Simon See,
Renjie Wan
Abstract:
Neural Radiance Fields (NeRFs) have become a key method for 3D scene representation. With the rising prominence and influence of NeRF, safeguarding its intellectual property has become increasingly important. In this paper, we propose NeRFProtector, which adopts a plug-and-play strategy to protect NeRF's copyright during its creation. NeRFProtector utilizes a pre-trained watermarking base model, enabling NeRF creators to embed binary messages directly while creating their NeRF. Our plug-and-play property ensures NeRF creators can flexibly choose NeRF variants without excessive modifications. Leveraging our newly designed progressive distillation, we demonstrate performance on par with several leading-edge neural rendering methods. Our project is available at: https://qsong2001.github.io/NeRFProtector.
Submitted 10 July, 2024;
originally announced July 2024.
-
Unlocking Continual Learning Abilities in Language Models
Authors:
Wenyu Du,
Shuang Cheng,
Tongxu Luo,
Zihan Qiu,
Zeyu Huang,
Ka Chun Cheung,
Reynold Cheng,
Jie Fu
Abstract:
Language models (LMs) exhibit impressive performance and generalization capabilities. However, LMs struggle with the persistent challenge of catastrophic forgetting, which undermines their long-term sustainability in continual learning (CL). Existing approaches usually address the issue by incorporating old task data or task-wise inductive bias into LMs. However, old data and accurate task information are often unavailable or costly to collect, hindering the availability of current CL approaches for LMs. To address this limitation, we introduce MIGU (MagnItude-based Gradient Updating for continual learning), a rehearsal-free and task-label-free method that only updates the model parameters with large magnitudes of output in LMs' linear layers. MIGU is based on our observation that the L1-normalized magnitude distribution of the output in LMs' linear layers is different when the LMs deal with different task data. By imposing this simple constraint on the gradient update process, we can leverage the inherent behaviors of LMs, thereby unlocking their innate CL abilities. Our experiments demonstrate that MIGU is universally applicable to all three LM architectures (T5, RoBERTa, and Llama2), delivering state-of-the-art or on-par performance across continual finetuning and continual pre-training settings on four CL benchmarks. For example, MIGU brings a 15.2% average accuracy improvement over conventional parameter-efficient finetuning baselines in a 15-task CL benchmark. MIGU can also seamlessly integrate with all three existing CL types to further enhance performance. Code is available at https://github.com/wenyudu/MIGU.
Submitted 24 June, 2024;
originally announced June 2024.
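The idea in the MIGU abstract, updating only the parameters associated with large-magnitude outputs of a linear layer, can be sketched in plain Python. Everything specific here is an assumption made for illustration: the mapping from an output unit to its weight row, the `top_k` selection rule, the helper names, and the toy numbers. The authors' actual procedure is in their repository and may differ.

```python
def linear_forward(weight, x):
    """Output of a bias-free linear layer: one value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def magnitude_mask(outputs, top_k):
    """Mark the top_k output units by L1-normalized magnitude with 1.0."""
    total = sum(abs(o) for o in outputs) or 1.0     # avoid division by zero
    normalized = [abs(o) / total for o in outputs]
    ranked = sorted(range(len(outputs)), key=lambda i: normalized[i], reverse=True)
    keep = set(ranked[:top_k])
    return [1.0 if i in keep else 0.0 for i in range(len(outputs))]


def masked_sgd_step(weight, grad, mask, lr):
    """Plain SGD step in which rows with mask 0.0 are left unchanged."""
    return [
        [w - lr * m * g for w, g in zip(w_row, g_row)]
        for w_row, g_row, m in zip(weight, grad, mask)
    ]


# Toy layer with three output units and two inputs (made-up values).
weight = [[1.0, 0.0], [0.0, 0.1], [2.0, 2.0]]
x = [1.0, 1.0]
grad = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]         # stand-in gradient

outputs = linear_forward(weight, x)
mask = magnitude_mask(outputs, top_k=2)
updated = masked_sgd_step(weight, grad, mask, lr=0.1)
print(mask)
print(updated)
```

In this toy case the second output unit has the smallest normalized magnitude, so its weight row receives no update while the other two rows take an ordinary gradient step.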
-
Reconfiguration Algorithms for Cubic Modular Robots with Realistic Movement Constraints
Authors:
MIT--NASA Space Robots Team,
Josh Brunner,
Kenneth C. Cheung,
Erik D. Demaine,
Jenny Diomidova,
Christine Gregg,
Della H. Hendrickson,
Irina Kostitsyna
Abstract:
We introduce and analyze a model for self-reconfigurable robots made up of unit-cube modules. Compared to past models, our model aims to newly capture two important practical aspects of real-world robots. First, modules often do not occupy an exact unit cube, but rather have features like bumps extending outside the allotted space so that modules can interlock. Thus, for example, our model forbids modules from squeezing in between two other modules that are one unit distance apart. Second, our model captures the practical scenario of many passive modules assembled by a single robot, instead of requiring all modules to be able to move on their own.
We prove two universality results. First, with a supply of auxiliary modules, we show that any connected polycube structure can be constructed by a carefully aligned plane sweep. Second, without additional modules, we show how to construct any structure for which a natural notion of external feature size is at least a constant; this property largely consolidates forbidden-pattern properties used in previous works on reconfigurable modular robots.
Submitted 24 May, 2024;
originally announced May 2024.
-
RegionGPT: Towards Region Understanding Vision Language Model
Authors:
Qiushan Guo,
Shalini De Mello,
Hongxu Yin,
Wonmin Byeon,
Ka Chun Cheung,
Yizhou Yu,
Ping Luo,
Sifei Liu
Abstract:
Vision language models (VLMs) have experienced rapid advancements through the integration of large language models (LLMs) with image-text pairs, yet they struggle with detailed regional visual understanding due to limited spatial awareness of the vision encoder, and the use of coarse-grained training data that lacks detailed, region-specific captions. To address this, we introduce RegionGPT (RGPT for short), a novel framework designed for complex region-level captioning and understanding. RGPT enhances the spatial awareness of regional representation with simple yet effective modifications to existing visual encoders in VLMs. We further improve performance on tasks requiring a specific output scope by integrating task-guided instruction prompts during both training and inference phases, while maintaining the model's versatility for general-purpose tasks. Additionally, we develop an automated region caption data generation pipeline, enriching the training set with detailed region-level captions. We demonstrate that a universal RGPT model can be effectively applied to, and significantly enhance performance across, a range of region-level tasks, including but not limited to complex region descriptions, reasoning, object classification, and referring expression comprehension.
Submitted 4 March, 2024;
originally announced March 2024.
-
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
Authors:
Xiaoyu Shi,
Zhaoyang Huang,
Fu-Yun Wang,
Weikang Bian,
Dasong Li,
Yi Zhang,
Manyuan Zhang,
Ka Chun Cheung,
Simon See,
Hongwei Qin,
Jifeng Dai,
Hongsheng Li
Abstract:
We introduce Motion-I2V, a novel framework for consistent and controllable image-to-video generation (I2V). In contrast to previous methods that directly learn the complicated image-to-video mapping, Motion-I2V factorizes I2V into two stages with explicit motion modeling. For the first stage, we propose a diffusion-based motion field predictor, which focuses on deducing the trajectories of the reference image's pixels. For the second stage, we propose motion-augmented temporal attention to enhance the limited 1-D temporal attention in video latent diffusion models. This module can effectively propagate the reference image's features to synthesized frames with the guidance of predicted trajectories from the first stage. Compared with existing methods, Motion-I2V can generate more consistent videos even in the presence of large motion and viewpoint variation. By training a sparse trajectory ControlNet for the first stage, Motion-I2V lets users precisely control motion trajectories and motion regions with sparse trajectory and region annotations. This offers more controllability of the I2V process than solely relying on textual instructions. Additionally, Motion-I2V's second stage naturally supports zero-shot video-to-video translation. Both qualitative and quantitative comparisons demonstrate the advantages of Motion-I2V over prior approaches in consistent and controllable image-to-video generation. Please see our project page at https://xiaoyushi97.github.io/Motion-I2V/.
Submitted 31 January, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Resilient Practical Test-Time Adaptation: Soft Batch Normalization Alignment and Entropy-driven Memory Bank
Authors:
Xingzhi Zhou,
Zhiliang Tian,
Ka Chun Cheung,
Simon See,
Nevin L. Zhang
Abstract:
Test-time domain adaptation effectively adjusts the source domain model to accommodate unseen domain shifts in a target domain during inference. However, the model performance can be significantly impaired by continuous distribution changes in the target domain and non-independent and identically distributed (non-i.i.d.) test samples often encountered in practical scenarios. While existing memory bank methodologies use memory to store samples and mitigate non-i.i.d. effects, they do not inherently prevent potential model degradation. To address this issue, we propose a resilient practical test-time adaptation (ResiTTA) method focused on parameter resilience and data quality. Specifically, we develop a resilient batch normalization with estimation on normalization statistics and soft alignments to mitigate overfitting and model degradation. We use an entropy-driven memory bank that accounts for timeliness, the persistence of over-confident samples, and sample uncertainty for high-quality data in adaptation. Our framework periodically adapts the source domain model using a teacher-student model through a self-training loss on the memory samples, incorporating soft alignment losses on batch normalization. We empirically validate ResiTTA across various benchmark datasets, demonstrating state-of-the-art performance.
Submitted 25 January, 2024;
originally announced January 2024.
-
Low-count Time Series Anomaly Detection
Authors:
Philipp Renz,
Kurt Cutajar,
Niall Twomey,
Gavin K. C. Cheung,
Hanting Xie
Abstract:
Low-count time series describe sparse or intermittent events, which are prevalent in large-scale online platforms that capture and monitor diverse data types. Several distinct challenges surface when modelling low-count time series, particularly low signal-to-noise ratios (when anomaly signatures are provably undetectable), and non-uniform performance (when average metrics are not representative of local behaviour). The time series anomaly detection community currently lacks explicit tooling and processes to model and reliably detect anomalies in these settings. We address this gap by introducing a novel generative procedure for creating benchmark datasets comprising low-count time series with anomalous segments. Via a mixture of theoretical and empirical analysis, our work explains how widely-used algorithms struggle with the distribution overlap between normal and anomalous segments. In order to mitigate this shortcoming, we then leverage our findings to demonstrate how anomaly score smoothing consistently improves performance. The practical utility of our analysis and recommendation is validated on a real-world dataset containing sales data for retail stores.
Submitted 24 August, 2023;
originally announced August 2023.
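The anomaly score smoothing mentioned in the low-count time series abstract can be illustrated with a short Python sketch. The abstract does not say which smoother is used, so the trailing moving average, the `window` parameter, and the toy scores below are assumptions chosen only to show the general idea.

```python
def smooth_scores(scores, window):
    """Trailing moving average over per-time-step anomaly scores.

    Each output is the mean of the current score and up to window - 1
    preceding scores, so isolated spikes are damped while sustained
    runs of high scores stay high.
    """
    smoothed = []
    for i in range(len(scores)):
        start = max(0, i - window + 1)      # shorter window at the start
        chunk = scores[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up raw scores: one isolated spike, then a sustained anomalous run.
raw = [0.0, 3.0, 0.0, 0.0, 3.0, 3.0, 3.0, 0.0]
print(smooth_scores(raw, window=3))
```

With a threshold applied after smoothing, a single noisy spike is less likely to trigger an alert than a run of consecutive high scores, which is the intuition behind smoothing when normal and anomalous distributions overlap.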
-
CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields
Authors:
Ziyuan Luo,
Qing Guo,
Ka Chun Cheung,
Simon See,
Renjie Wan
Abstract:
Neural Radiance Fields (NeRF) have the potential to be a major representation of media. Since training a NeRF has never been an easy task, the protection of its model copyright should be a priority. In this paper, by analyzing the pros and cons of possible copyright protection solutions, we propose to protect the copyright of NeRF models by replacing the original color representation in NeRF with a watermarked color representation. Then, a distortion-resistant rendering scheme is designed to guarantee robust message extraction in 2D renderings of NeRF. Our proposed method can directly protect the copyright of NeRF models while maintaining high rendering quality and bit accuracy compared with alternative solutions.
Submitted 29 July, 2023; v1 submitted 21 July, 2023;
originally announced July 2023.
-
TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses
Authors:
Xuesong Chen,
Shaoshuai Shi,
Chao Zhang,
Benjin Zhu,
Qiang Wang,
Ka Chun Cheung,
Simon See,
Hongsheng Li
Abstract:
3D multi-object tracking (MOT) is vital for many applications including autonomous driving vehicles and service robots. With the commonly used tracking-by-detection paradigm, 3D MOT has made important progress in recent years. However, these methods only use the detection boxes of the current frame to obtain trajectory-box association results, which makes it impossible for the tracker to recover objects missed by the detector. In this paper, we present TrajectoryFormer, a novel point-cloud-based 3D MOT framework. To recover objects missed by the detector, we generate multiple trajectory hypotheses with hybrid candidate boxes, including temporally predicted boxes and current-frame detection boxes, for trajectory-box association. The predicted boxes can propagate an object's historical trajectory information to the current frame, and thus the network can tolerate short-term missed detections of the tracked objects. We combine long-term object motion features and short-term object appearance features to create per-hypothesis feature embeddings, which reduces the computational overhead for spatial-temporal encoding. Additionally, we introduce a Global-Local Interaction Module to conduct information interaction among all hypotheses and model their spatial relations, leading to accurate estimation of hypotheses. Our TrajectoryFormer achieves state-of-the-art performance on the Waymo 3D MOT benchmarks. Code is available at https://github.com/poodarchu/EFG.
Submitted 18 August, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
Authors:
Xiaoyu Shi,
Zhaoyang Huang,
Weikang Bian,
Dasong Li,
Manyuan Zhang,
Ka Chun Cheung,
Simon See,
Hongwei Qin,
Jifeng Dai,
Hongsheng Li
Abstract:
We introduce VideoFlow, a novel optical flow estimation framework for videos. In contrast to previous methods that learn to estimate optical flow from two frames, VideoFlow concurrently estimates bi-directional optical flows for multiple frames that are available in videos by sufficiently exploiting temporal cues. We first propose a TRi-frame Optical Flow (TROF) module that estimates bi-directional optical flows for the center frame in a three-frame manner. The information of the frame triplet is iteratively fused onto the center frame. To extend TROF for handling more frames, we further propose a MOtion Propagation (MOP) module that bridges multiple TROFs and propagates motion features between adjacent TROFs. With the iterative flow estimation refinement, the information fused in individual TROFs can be propagated into the whole sequence via MOP. By effectively exploiting video information, VideoFlow presents extraordinary performance, ranking 1st on all public benchmarks. On the Sintel benchmark, VideoFlow achieves 1.649 and 0.991 average end-point-error (AEPE) on the final and clean passes, a 15.1% and 7.6% error reduction from the best-published results (1.943 and 1.073 from FlowFormer++). On the KITTI-2015 benchmark, VideoFlow achieves an F1-all error of 3.65%, a 19.2% error reduction from the best-published result (4.52% from FlowFormer++). Code is released at https://github.com/XiaoyuShi97/VideoFlow.
Submitted 20 August, 2023; v1 submitted 14 March, 2023;
originally announced March 2023.
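The error-reduction percentages quoted in the VideoFlow abstract follow from the reported numbers as a relative reduction, (baseline - new) / baseline. The short Python check below reproduces them; the function name `relative_reduction` is just an illustrative helper, and all figures are taken from the abstract.

```python
def relative_reduction(baseline, new):
    """Percentage by which `new` is lower than `baseline`."""
    return 100.0 * (baseline - new) / baseline


# (benchmark, FlowFormer++ result, VideoFlow result), as quoted in the abstract
comparisons = [
    ("Sintel final AEPE", 1.943, 1.649),
    ("Sintel clean AEPE", 1.073, 0.991),
    ("KITTI-2015 F1-all (%)", 4.52, 3.65),
]

for name, baseline, new in comparisons:
    print(f"{name}: {relative_reduction(baseline, new):.1f}% reduction")
# Sintel final AEPE: 15.1% reduction
# Sintel clean AEPE: 7.6% reduction
# KITTI-2015 F1-all (%): 19.2% reduction
```

Note that the KITTI figure is a relative reduction of a percentage-valued metric (from 4.52% to 3.65%), not a drop of 19.2 percentage points.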
-
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
Authors:
Xiaoyu Shi,
Zhaoyang Huang,
Dasong Li,
Manyuan Zhang,
Ka Chun Cheung,
Simon See,
Hongwei Qin,
Jifeng Dai,
Hongsheng Li
Abstract:
FlowFormer introduces a transformer architecture into optical flow estimation and achieves state-of-the-art performance. The core component of FlowFormer is the transformer-based cost-volume encoder. Inspired by the recent success of masked autoencoding (MAE) pretraining in unleashing transformers' capacity of encoding visual representation, we propose Masked Cost Volume Autoencoding (MCVA) to enhance FlowFormer by pretraining the cost-volume encoder with a novel MAE scheme. Firstly, we introduce a block-sharing masking strategy to prevent masked information leakage, as the cost maps of neighboring source pixels are highly correlated. Secondly, we propose a novel pre-text reconstruction task, which encourages the cost-volume encoder to aggregate long-range information and ensures pretraining-finetuning consistency. We also show how to modify the FlowFormer architecture to accommodate masks during pretraining. Pretrained with MCVA, FlowFormer++ ranks 1st among published methods on both Sintel and KITTI-2015 benchmarks. Specifically, FlowFormer++ achieves 1.07 and 1.94 average end-point error (AEPE) on the clean and final passes of the Sintel benchmark, leading to 7.76% and 7.18% error reductions from FlowFormer. FlowFormer++ obtains 4.52 F1-all on the KITTI-2015 test set, improving FlowFormer by 0.16.
Submitted 2 March, 2023;
originally announced March 2023.
-
NAS-LID: Efficient Neural Architecture Search with Local Intrinsic Dimension
Authors:
Xin He,
Jiangchao Yao,
Yuxin Wang,
Zhenheng Tang,
Ka Chu Cheung,
Simon See,
Bo Han,
Xiaowen Chu
Abstract:
One-shot neural architecture search (NAS) substantially improves the search efficiency by training one supernet to estimate the performance of every possible child architecture (i.e., subnet). However, the inconsistency of characteristics among subnets incurs serious interference in the optimization, resulting in poor performance ranking correlation of subnets. Subsequent explorations decompose supernet weights via a particular criterion, e.g., gradient matching, to reduce the interference; yet they suffer from huge computational cost and low space separability. In this work, we propose a lightweight and effective local intrinsic dimension (LID)-based method NAS-LID. NAS-LID evaluates the geometrical properties of architectures by calculating the low-cost LID features layer-by-layer, and the similarity characterized by LID enjoys better separability compared with gradients, which thus effectively reduces the interference among subnets. Extensive experiments on NASBench-201 indicate that NAS-LID achieves superior performance with better efficiency. Specifically, compared to the gradient-driven method, NAS-LID can save up to 86% of GPU memory overhead when searching on NASBench-201. We also demonstrate the effectiveness of NAS-LID on ProxylessNAS and OFA spaces. Source code: https://github.com/marsggbo/NAS-LID.
Submitted 24 November, 2022; v1 submitted 23 November, 2022;
originally announced November 2022.
-
SVD-PINNs: Transfer Learning of Physics-Informed Neural Networks via Singular Value Decomposition
Authors:
Yihang Gao,
Ka Chun Cheung,
Michael K. Ng
Abstract:
Physics-informed neural networks (PINNs) have attracted significant attention for solving partial differential equations (PDEs) in recent years because they alleviate the curse of dimensionality that appears in traditional methods. However, the main disadvantage of PINNs is that one neural network corresponds to one PDE. In practice, we usually need to solve a class of PDEs, not just one. With the explosive growth of deep learning, many useful techniques in general deep learning tasks are also suitable for PINNs. Transfer learning methods may reduce the cost for PINNs in solving a class of PDEs. In this paper, we propose a transfer learning method for PINNs that keeps singular vectors and optimizes singular values (namely SVD-PINNs). Numerical experiments on high-dimensional PDEs (10-d linear parabolic equations and 10-d Allen-Cahn equations) show that SVD-PINNs work for solving a class of PDEs with different but close right-hand-side functions.
Submitted 14 March, 2024; v1 submitted 16 November, 2022;
originally announced November 2022.
-
Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation
Authors:
Dongkyu Lee,
Ka Chun Cheung,
Nevin L. Zhang
Abstract:
Overconfidence has been shown to impair generalization and calibration of a neural network. Previous studies remedy this issue by adding a regularization term to a loss function, preventing a model from making a peaked distribution. Label smoothing smooths target labels with a pre-defined prior label distribution; as a result, a model learns to maximize the likelihood of predicting the soft label. Nonetheless, the amount of smoothing is the same in all samples and remains fixed in training. In other words, label smoothing does not reflect the change in probability distribution mapped by a model over the course of training. To address this issue, we propose a regularization scheme that brings dynamic nature into the smoothing parameter by taking model probability distribution into account, thereby varying the parameter per instance. A model in training self-regulates the extent of smoothing on the fly during forward propagation. Furthermore, inspired by recent work in bridging label smoothing and knowledge distillation, our work utilizes self-knowledge as a prior label distribution in softening target labels, and presents theoretical support for the regularization effect by knowledge distillation and the dynamic smoothing parameter. Our regularizer is validated comprehensively, and the result illustrates marked improvements in model generalization and calibration, enhancing robustness and trustworthiness of a model.
Submitted 22 October, 2022;
originally announced October 2022.
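The fixed label smoothing that the adaptive label smoothing abstract takes as its starting point can be written in a few lines of Python. This sketch shows only the standard formulation, soft = (1 - alpha) * one_hot + alpha * prior; the function name, the uniform default prior, and the example values are assumptions for illustration. The paper's contribution, choosing alpha per instance from the model's own distribution and using self-knowledge as the prior, is not reproduced here because the abstract does not give its formula.

```python
def smooth_label(target_index, num_classes, alpha, prior=None):
    """Mix a one-hot target with a prior label distribution.

    alpha = 0.0 returns the hard one-hot label; larger alpha moves
    probability mass from the target class toward the prior.
    """
    if prior is None:
        prior = [1.0 / num_classes] * num_classes   # uniform prior by default
    return [
        (1.0 - alpha) * (1.0 if i == target_index else 0.0) + alpha * prior[i]
        for i in range(num_classes)
    ]


# Standard label smoothing: the same alpha for every training instance.
soft = smooth_label(target_index=0, num_classes=4, alpha=0.2)
print(soft)

# A per-instance scheme would pass a different alpha (and possibly a
# different prior) for each example; the values here are placeholders.
for alpha in (0.05, 0.3):
    print(alpha, smooth_label(0, 4, alpha))
```

Because both the one-hot vector and the prior sum to one, the smoothed label remains a valid probability distribution for any alpha between 0 and 1.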
-
Hard Gate Knowledge Distillation -- Leverage Calibration for Robust and Reliable Language Model
Authors:
Dongkyu Lee,
Zhiliang Tian,
Yingxiu Zhao,
Ka Chun Cheung,
Nevin L. Zhang
Abstract:
In knowledge distillation, a student model is trained with supervision from both a teacher's knowledge and observations drawn from a training data distribution. Knowledge of a teacher is considered a subject that holds inter-class relations which provide meaningful supervision to a student; hence, much effort has been put into finding such knowledge to be distilled. In this paper, we explore a question that has been given little attention: "when to distill such knowledge." The question is answered in our work with the concept of model calibration; we view a teacher model not only as a source of knowledge but also as a gauge to detect miscalibration of a student. This simple and yet novel view leads to a hard gate knowledge distillation scheme that switches between learning from a teacher model and training data. We verify the gating mechanism in the context of natural language generation at both the token-level and the sentence-level. Empirical comparisons with strong baselines show that hard gate knowledge distillation not only improves model generalization, but also significantly lowers model calibration error.
Submitted 22 October, 2022;
originally announced October 2022.
-
NeuralMarker: A Framework for Learning General Marker Correspondence
Authors:
Zhaoyang Huang,
Xiaokun Pan,
Weihong Pan,
Weikang Bian,
Yan Xu,
Ka Chun Cheung,
Guofeng Zhang,
Hongsheng Li
Abstract:
We tackle the problem of estimating correspondences from a general marker, such as a movie poster, to an image that captures such a marker. Conventionally, this problem is addressed by fitting a homography model based on sparse feature matching. However, such methods can only handle plane-like markers, and the sparse features do not sufficiently utilize appearance information. In this paper, we propose a novel framework, NeuralMarker, which trains a neural network to estimate dense marker correspondences under various challenging conditions, such as marker deformation and harsh lighting. We also propose a novel marker correspondence evaluation method circumventing annotations on real marker-image pairs and create a new benchmark. We show that NeuralMarker significantly outperforms previous methods and enables new interesting applications, including Augmented Reality (AR) and video editing.
Submitted 19 September, 2022;
originally announced September 2022.
-
Learning Degradation Representations for Image Deblurring
Authors:
Dasong Li,
Yi Zhang,
Ka Chun Cheung,
Xiaogang Wang,
Hongwei Qin,
Hongsheng Li
Abstract:
In various learning-based image restoration tasks, such as image denoising and image super-resolution, the degradation representations were widely used to model the degradation process and handle complicated degradation patterns. However, they are less explored in learning-based image deblurring as blur kernel estimation cannot perform well in real-world challenging cases. We argue that it is particularly necessary for image deblurring to model degradation representations since blurry patterns typically show much larger variations than noisy patterns or high-frequency textures. In this paper, we propose a framework to learn spatially adaptive degradation representations of blurry images. A novel joint image reblurring and deblurring learning process is presented to improve the expressiveness of degradation representations. To make learned degradation representations effective in reblurring and deblurring, we propose a Multi-Scale Degradation Injection Network (MSDI-Net) to integrate them into the neural networks. With the integration, MSDI-Net can handle various and complicated blurry patterns adaptively. Experiments on the GoPro and RealBlur datasets demonstrate that our proposed deblurring framework with the learned degradation representations outperforms state-of-the-art methods with appealing improvements. The code is released at https://github.com/dasongli1/Learning_degradation.
Submitted 10 August, 2022;
originally announced August 2022.
-
A Simple Baseline for Video Restoration with Grouped Spatial-temporal Shift
Authors:
Dasong Li,
Xiaoyu Shi,
Yi Zhang,
Ka Chun Cheung,
Simon See,
Xiaogang Wang,
Hongwei Qin,
Hongsheng Li
Abstract:
Video restoration, which aims to restore clear frames from degraded videos, has numerous important applications. The key to video restoration depends on utilizing inter-frame information. However, existing deep learning methods often rely on complicated network architectures, such as optical flow estimation, deformable convolution, and cross-frame self-attention layers, resulting in high computational costs. In this study, we propose a simple yet effective framework for video restoration. Our approach is based on grouped spatial-temporal shift, which is a lightweight and straightforward technique that can implicitly capture inter-frame correspondences for multi-frame aggregation. By introducing grouped spatial shift, we attain expansive effective receptive fields. Combined with basic 2D convolution, this simple framework can effectively aggregate inter-frame information. Extensive experiments demonstrate that our framework outperforms the previous state-of-the-art method, while using less than a quarter of its computational cost, on both video deblurring and video denoising tasks. These results indicate the potential for our approach to significantly reduce computational overhead while maintaining high-quality results. Code is available at https://github.com/dasongli1/Shift-Net.
Submitted 22 May, 2023; v1 submitted 21 June, 2022;
originally announced June 2022.
-
MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection
Authors:
Xuesong Chen,
Shaoshuai Shi,
Benjin Zhu,
Ka Chun Cheung,
Hang Xu,
Hongsheng Li
Abstract:
Accurate and reliable 3D detection is vital for many applications including autonomous driving vehicles and service robots. In this paper, we present a flexible and high-performance 3D detection framework, named MPPNet, for 3D temporal object detection with point cloud sequences. We propose a novel three-hierarchy framework with proxy points for multi-frame feature encoding and interactions to achieve better detection. The three hierarchies conduct per-frame feature encoding, short-clip feature fusion, and whole-sequence feature aggregation, respectively. To enable processing long-sequence point clouds with reasonable computational resources, intra-group feature mixing and inter-group feature attention are proposed to form the second and third feature encoding hierarchies, which are recurrently applied for aggregating multi-frame trajectory features. The proxy points not only act as consistent object representations for each frame, but also serve as the courier to facilitate feature interaction between frames. Experiments on the large Waymo Open dataset show that our approach outperforms state-of-the-art methods by large margins when applied to both short (e.g., 4-frame) and long (e.g., 16-frame) point cloud sequences. Code is available at https://github.com/open-mmlab/OpenPCDet.
Submitted 2 September, 2022; v1 submitted 12 May, 2022;
originally announced May 2022.
-
FlowFormer: A Transformer Architecture for Optical Flow
Authors:
Zhaoyang Huang,
Xiaoyu Shi,
Chao Zhang,
Qiang Wang,
Ka Chun Cheung,
Hongwei Qin,
Jifeng Dai,
Hongsheng Li
Abstract:
We introduce optical Flow transFormer, dubbed as FlowFormer, a transformer-based neural network architecture for learning optical flow. FlowFormer tokenizes the 4D cost volume built from an image pair, encodes the cost tokens into a cost memory with alternate-group transformer (AGT) layers in a novel latent space, and decodes the cost memory via a recurrent transformer decoder with dynamic positional cost queries. On the Sintel benchmark, FlowFormer achieves 1.159 and 2.088 average end-point-error (AEPE) on the clean and final pass, a 16.5% and 15.5% error reduction from the best published result (1.388 and 2.47). Besides, FlowFormer also achieves strong generalization performance. Without being trained on Sintel, FlowFormer achieves 1.01 AEPE on the clean pass of Sintel training set, outperforming the best published result (1.29) by 21.7%.
Submitted 21 September, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Massively parallel pixel-by-pixel nanophotonic optimization using a Green's function formalism
Authors:
Jiahui Wang,
Alfred K. C. Cheung,
Aleksandra Spyra,
Ian A. D. Williamson,
Jian Guan,
Martin F. Schubert
Abstract:
We introduce an efficient parallelization scheme to implement pixel-by-pixel nanophotonic optimization using a Green's function based formalism. The crucial insight in our proposal is the reframing of the optimization algorithm as a large-scale data processing pipeline, which allows for the efficient distribution of computational tasks across thousands of workers. We demonstrate the utility of our implementation by exercising it to optimize a high numerical aperture focusing metalens at problem sizes that would otherwise be far out of reach for the Green's function based method. Finally, we highlight the connection to powerful ideas from reinforcement learning as a natural corollary of reinterpreting the nanophotonic inverse design problem as a graph traversal enabled by the pixel-by-pixel optimization paradigm.
Submitted 10 February, 2022;
originally announced February 2022.
-
Inverse design of photonic devices with strict foundry fabrication constraints
Authors:
Martin F. Schubert,
Alfred K. C. Cheung,
Ian A. D. Williamson,
Aleksandra Spyra,
David H. Alexander
Abstract:
We introduce a new method for inverse design of nanophotonic devices which guarantees that resulting designs satisfy strict length scale constraints - including minimum width and spacing constraints required by commercial semiconductor foundries. The method adopts several concepts from machine learning to transform the problem of topology optimization with strict length scale constraints to an unconstrained stochastic gradient optimization problem. Specifically, we introduce a conditional generator for feasible designs and adopt a straight-through estimator for backpropagation of gradients to a latent design. We demonstrate the performance and reliability of our method by designing several common integrated photonic components.
Submitted 13 June, 2022; v1 submitted 30 January, 2022;
originally announced January 2022.
-
Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification
Authors:
Yixiao Ge,
Xiao Zhang,
Ching Lam Choi,
Ka Chun Cheung,
Peipei Zhao,
Feng Zhu,
Xiaogang Wang,
Rui Zhao,
Hongsheng Li
Abstract:
Recent studies of knowledge distillation have discovered that ensembling the "dark knowledge" from multiple teachers or students contributes to creating better soft targets for training, but at the cost of significantly more computations and/or parameters. In this work, we present BAtch Knowledge Ensembling (BAKE) to produce refined soft targets for anchor images by propagating and ensembling the knowledge of the other samples in the same mini-batch. Specifically, for each sample of interest, the propagation of knowledge is weighted in accordance with the inter-sample affinities, which are estimated on-the-fly with the current network. The propagated knowledge can then be ensembled to form a better soft target for distillation. In this way, our BAKE framework achieves online knowledge ensembling across multiple samples with only a single network. It requires minimal computational and memory overhead compared to existing knowledge ensembling methods. Extensive experiments demonstrate that the lightweight yet effective BAKE consistently boosts the classification performance of various architectures on multiple datasets, e.g., a significant +0.7% gain of Swin-T on ImageNet with only +1.5% computational overhead and zero additional parameters. BAKE not only improves the vanilla baselines, but also surpasses the single-network state of the art on all the benchmarks.
Submitted 20 November, 2021; v1 submitted 27 April, 2021;
originally announced April 2021.
-
LIFE: Lighting Invariant Flow Estimation
Authors:
Zhaoyang Huang,
Xiaokun Pan,
Runsen Xu,
Yan Xu,
Ka Chun Cheung,
Guofeng Zhang,
Hongsheng Li
Abstract:
We tackle the problem of estimating flow between two images with large lighting variations. Recent learning-based flow estimation frameworks have shown remarkable performance on image pairs with small displacement and constant illuminations, but cannot work well on cases with large viewpoint change and lighting variations because of the lack of pixel-wise flow annotations for such cases. We observe that via the Structure-from-Motion (SfM) techniques, one can easily estimate relative camera poses between image pairs with large viewpoint change and lighting variations. We propose a novel weakly supervised framework LIFE to train a neural network for estimating accurate lighting-invariant flows between image pairs. Sparse correspondences are conventionally established via feature matching with descriptors encoding local image contents. However, local image contents are inevitably ambiguous and error-prone during the cross-image feature matching process, which hinders downstream tasks. We propose to guide feature matching with the flows predicted by LIFE, which addresses the ambiguous matching by utilizing abundant context information in the image pairs. We show that LIFE outperforms previous flow learning frameworks by large margins in challenging scenarios, consistently improves feature matching, and benefits downstream tasks.
Submitted 19 April, 2021; v1 submitted 7 April, 2021;
originally announced April 2021.
-
Understanding Top-k Sparsification in Distributed Deep Learning
Authors:
Shaohuai Shi,
Xiaowen Chu,
Ka Chun Cheung,
Simon See
Abstract:
Distributed stochastic gradient descent (SGD) algorithms are widely deployed in training large-scale deep learning models, while the communication overhead among workers becomes the new system bottleneck. Recently proposed gradient sparsification techniques, especially Top-$k$ sparsification with error compensation (TopK-SGD), can significantly reduce the communication traffic without an obvious impact on the model accuracy. Some theoretical studies have been carried out to analyze the convergence property of TopK-SGD. However, existing studies do not dive into the details of Top-$k$ operator in gradient sparsification and use relaxed bounds (e.g., exact bound of Random-$k$) for analysis; hence the derived results cannot well describe the real convergence performance of TopK-SGD. To this end, we first study the gradient distributions of TopK-SGD during the training process through extensive experiments. We then theoretically derive a tighter bound for the Top-$k$ operator. Finally, we exploit the property of gradient distribution to propose an approximate top-$k$ selection algorithm, which is computing-efficient for GPUs, to improve the scaling efficiency of TopK-SGD by significantly reducing the computing overhead. Codes are available at: https://github.com/hclhkbu/GaussianK-SGD.
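As a rough illustration of what "Top-$k$ sparsification with error compensation" means, the following is a minimal single-worker sketch in plain Python. It is not the authors' implementation and it does not include their approximate top-$k$ selection; the function names `topk_sparsify` and `compensated_step` are hypothetical, and exact sorting is assumed for the selection step.

```python
def topk_sparsify(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def compensated_step(grad, residual, k):
    """One sparsification step with error compensation (hypothetical helper).

    The residual left over from earlier steps is added back to the fresh
    gradient before selection, so dropped entries are delayed, not lost.
    """
    acc = [g + r for g, r in zip(grad, residual)]
    sparse = topk_sparsify(acc, k)  # what a worker would communicate
    new_residual = [a - s for a, s in zip(acc, sparse)]  # kept locally
    return sparse, new_residual


# Two illustrative steps on a 4-element gradient with k = 2.
residual = [0.0, 0.0, 0.0, 0.0]
sent1, residual = compensated_step([0.25, -2.0, 0.5, 3.0], residual, k=2)
sent2, residual = compensated_step([0.25, 0.0, 0.75, 0.0], residual, k=2)
```

In the second step the small entries dropped in the first step are accumulated and eventually transmitted, which is the intuition behind error compensation.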
Submitted 20 November, 2019;
originally announced November 2019.
-
Folding Polyominoes with Holes into a Cube
Authors:
Oswin Aichholzer,
Hugo A. Akitaya,
Kenneth C. Cheung,
Erik D. Demaine,
Martin L. Demaine,
Sándor P. Fekete,
Linda Kleist,
Irina Kostitsyna,
Maarten Löffler,
Zuzana Masárová,
Klara Mundilova,
Christiane Schmidt
Abstract:
When can a polyomino piece of paper be folded into a unit cube? Prior work studied tree-like polyominoes, but polyominoes with holes remain an intriguing open problem. We present sufficient conditions for a polyomino with one or several holes to fold into a cube, and conditions under which cube folding is impossible. In particular, we show that all but five special simple holes guarantee foldability.
Submitted 2 July, 2020; v1 submitted 22 October, 2019;
originally announced October 2019.
-
Simulating with AcCoRD: Actor-Based Communication via Reaction-Diffusion
Authors:
Adam Noel,
Karen C. Cheung,
Robert Schober,
Dimitrios Makrakis,
Abdelhakim Hafid
Abstract:
This paper introduces AcCoRD (Actor-based Communication via Reaction-Diffusion) version 1.0. AcCoRD is a sandbox reaction-diffusion solver designed for the study of molecular communication systems. It uses a hybrid of microscopic and mesoscopic simulation models that enables scalability via user control of local accuracy. AcCoRD is developed in C as an open source command line tool and includes utilities to process simulation output in MATLAB. The latest code and links to user documentation can be found at https://github.com/adamjgnoel/AcCoRD/. This paper provides an overview of AcCoRD's design, including the motivation for developing a specialized reaction-diffusion solver. The corresponding algorithms are presented in detail, including the computational complexity of the microscopic and mesoscopic models. Other novel derivations include the transition rates between adjacent mesoscopic subvolumes of different sizes. Simulation results demonstrate the use of AcCoRD as both an accurate reaction-diffusion solver and one that is catered to the analysis of molecular communication systems. A link is included to videos that demonstrate many of the simulated scenarios. Additional insights from the simulation results include the selection of suitable hybrid model parameters, the impact of reactive surfaces that are in the proximity of a hybrid interface, and the size of a bounded environment that is necessary to assume that it is unbounded. The development of AcCoRD is ongoing, so its future direction is also discussed in order to highlight improvements that will expand its potential areas of application. New features that are being planned at the time of writing include a fluid flow model and more complex actor behavior.
Submitted 2 February, 2017; v1 submitted 1 December, 2016;
originally announced December 2016.
-
Modeling and Simulation of Molecular Communication Systems with a Reversible Adsorption Receiver
Authors:
Yansha Deng,
Adam Noel,
Maged Elkashlan,
Arumugam Nallanathan,
Karen C. Cheung
Abstract:
In this paper, we present an analytical model for the diffusive molecular communication (MC) system with a reversible adsorption receiver in a fluid environment. The widely used concentration shift keying (CSK) is considered for modulation. The time-varying spatial distribution of the information molecules under the reversible adsorption and desorption reaction at the surface of a receiver is analytically characterized. Based on the spatial distribution, we derive the net number of newly-adsorbed information molecules expected in any time duration. We further derive the number of newly-adsorbed molecules expected at the steady state to demonstrate the equilibrium concentration. Given the number of newly-adsorbed information molecules, the bit error probability of the proposed MC system is analytically approximated. Importantly, we present a simulation framework for the proposed model that accounts for the diffusion and reversible reaction. Simulation results show the accuracy of our derived expressions, and demonstrate the positive effect of the adsorption rate and the negative effect of the desorption rate on the error probability of reversible adsorption receiver with last transmit bit-1. Moreover, our analytical results simplify to the special cases of a full adsorption receiver and a partial adsorption receiver, both of which do not include desorption.
Submitted 22 June, 2016; v1 submitted 4 January, 2016;
originally announced January 2016.
-
Molecular Communication with a Reversible Adsorption Receiver
Authors:
Yansha Deng,
Adam Noel,
Maged Elkashlan,
Arumugam Nallanathan,
Karen C. Cheung
Abstract:
In this paper, we present an analytical model for a diffusive molecular communication (MC) system with a reversible adsorption receiver in a fluid environment. The time-varying spatial distribution of the information molecules under the reversible adsorption and desorption reaction at the surface of a bio-receiver is analytically characterized. Based on the spatial distribution, we derive the number of newly-adsorbed information molecules expected in any time duration. Importantly, we present a simulation framework for the proposed model that accounts for the diffusion and reversible reaction. Simulation results show the accuracy of our derived expressions, and demonstrate the positive effect of the adsorption rate and the negative effect of the desorption rate on the net number of newly-adsorbed information molecules expected. Moreover, our analytical results simplify to the special case of an absorbing receiver.
Submitted 7 April, 2016; v1 submitted 30 November, 2015;
originally announced November 2015.
-
On the Statistics of Reaction-Diffusion Simulations for Molecular Communication
Authors:
Adam Noel,
Karen C. Cheung,
Robert Schober
Abstract:
A molecule traveling in a realistic propagation environment can experience stochastic interactions with other molecules and the environment boundary. The statistical behavior of some isolated phenomena, such as dilute unbounded molecular diffusion, is well understood. However, the coupling of multiple interactions can impede closed-form analysis, such that simulations are required to determine the statistics. This paper compares the statistics of molecular reaction-diffusion simulation models from the perspective of molecular communication systems. Microscopic methods track the location and state of every molecule, whereas mesoscopic methods partition the environment into virtual containers that hold molecules. The properties of each model are described and compared with a hybrid of both models. Simulation results also assess the accuracy of Poisson and Gaussian approximations of the underlying Binomial statistics.
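To make the last sentence concrete, the sketch below compares a Binomial distribution with its Poisson and Gaussian approximations in plain Python. It is only an illustration under assumed numbers: the molecule count `n = 1000`, the observation probability `p = 0.005`, and the helper names are hypothetical and are not taken from the paper. The Gaussian is evaluated as a density at integer counts without continuity correction.

```python
import math


def binom_pmf(k, n, p):
    """Binomial(n, p) probability of k successes, computed in log space."""
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c + k * math.log(p) + (n - k) * math.log(1.0 - p))


def poisson_pmf(k, lam):
    """Poisson(lam) probability of k events, computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def gauss_pdf(x, mean, var):
    """Gaussian density with the given mean and variance, evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def half_l1_distance(n, p, approx):
    """Half the summed absolute difference between the Binomial pmf and approx(k)."""
    return 0.5 * sum(abs(binom_pmf(k, n, p) - approx(k)) for k in range(n + 1))


# Assumed example: 1000 released molecules, each observed with probability 0.005.
n, p = 1000, 0.005
mean, var = n * p, n * p * (1.0 - p)
d_poisson = half_l1_distance(n, p, lambda k: poisson_pmf(k, mean))
d_gauss = half_l1_distance(n, p, lambda k: gauss_pdf(k, mean, var))
print(f"Poisson: {d_poisson:.4f}  Gaussian: {d_gauss:.4f}")
```

With a small success probability and a small expected count, the Poisson approximation tracks the Binomial more closely than the Gaussian does; the comparison can shift as the expected count grows.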
Submitted 19 May, 2015;
originally announced May 2015.
-
Multi-Scale Stochastic Simulation for Diffusive Molecular Communication
Authors:
Adam Noel,
Karen C. Cheung,
Robert Schober
Abstract:
Recently, hybrid models have emerged that combine microscopic and mesoscopic regimes in a single stochastic reaction-diffusion simulation. Microscopic simulations track every individual molecule and are generally more accurate. Mesoscopic simulations partition the environment into subvolumes, track when molecules move between adjacent subvolumes, and are generally more computationally efficient. In this paper, we present the foundation of a multi-scale stochastic simulator from the perspective of molecular communication, for both mesoscopic and hybrid models, where we emphasize simulation accuracy at the receiver and efficiency in regions that are far from the communication link. Our multi-scale models use subvolumes of different sizes, between which we derive the diffusion event transition rate. Simulation results compare the accuracy and efficiency of traditional approaches with that of a regular hybrid method and with those of our proposed multi-scale methods.
Submitted 19 May, 2015; v1 submitted 15 October, 2014;
originally announced December 2014.
-
Joint Channel Parameter Estimation via Diffusive Molecular Communication
Authors:
Adam Noel,
Karen C. Cheung,
Robert Schober
Abstract:
The design and analysis of diffusive molecular communication systems generally requires knowledge of the environment's physical and chemical properties. Furthermore, prospective applications might rely on the timely detection of changes in the local system parameters. This paper studies the local estimation of channel parameters for diffusive molecular communication when a transmitter releases molecules that are observed by a receiver. The Fisher information matrix of the joint parameter estimation problem is derived so that the Cramer-Rao lower bound on the variance of locally unbiased estimation can be found. The joint estimation problem can be reduced to the estimation of any subset of the channel parameters. Maximum likelihood estimation leads to closed-form solutions for some single-parameter estimation problems and can otherwise be determined numerically. Peak-based estimators are proposed for low-complexity estimation of a single unknown parameter.
Submitted 23 June, 2015; v1 submitted 15 October, 2014;
originally announced October 2014.
-
Bounds on Distance Estimation via Diffusive Molecular Communication
Authors:
Adam Noel,
Karen C. Cheung,
Robert Schober
Abstract:
This paper studies distance estimation for diffusive molecular communication. The Cramer-Rao lower bound on the variance of the distance estimation error is derived. The lower bound is derived for a physically unbounded environment with molecule degradation and steady uniform flow. The maximum likelihood distance estimator is derived and its accuracy is shown via simulation to perform very close to the Cramer-Rao lower bound. An existing protocol is shown to be equivalent to the maximum likelihood distance estimator if only one observation is made. Simulation results also show the accuracy of existing protocols with respect to the Cramer-Rao lower bound.
Submitted 15 October, 2014; v1 submitted 11 April, 2014;
originally announced April 2014.
-
A Unifying Model for External Noise Sources and ISI in Diffusive Molecular Communication
Authors:
Adam Noel,
Karen C. Cheung,
Robert Schober
Abstract:
This paper considers the impact of external noise sources, including interfering transmitters, on a diffusive molecular communication system, where the impact is measured as the number of noise molecules expected to be observed at a passive receiver. A unifying model for noise, multiuser interference, and intersymbol interference is presented, where, under certain circumstances, interference can be approximated as a noise source that is emitting continuously. The model includes the presence of advection and molecule degradation. The time-varying and asymptotic impact is derived for a series of special cases, some of which facilitate closed-form solutions. Simulation results show the accuracy of the expressions derived for the impact of a continuously-emitting noise source, and show how approximating intersymbol interference as a noise source can simplify the calculation of the expected bit error probability of a weighted sum detector.
Submitted 7 July, 2014; v1 submitted 22 October, 2013;
originally announced October 2013.
-
Diffusive Molecular Communication with Disruptive Flows
Authors:
Adam Noel,
Karen C. Cheung,
Robert Schober
Abstract:
In this paper, we study the performance of detectors in a diffusive molecular communication environment where steady uniform flow is present. We derive the expected number of information molecules to be observed in a passive spherical receiver, and determine the impact of flow on the assumption that the concentration of molecules throughout the receiver is uniform. Simulation results show the impact of advection on detector performance as a function of the flow's magnitude and direction. We highlight that there are disruptive flows, i.e., flows that are not in the direction of information transmission, that lead to an improvement in detector performance as long as the disruptive flow does not dominate diffusion and sufficient samples are taken.
Submitted 11 February, 2014; v1 submitted 20 September, 2013;
originally announced September 2013.
-
Optimal Receiver Design for Diffusive Molecular Communication With Flow and Additive Noise
Authors:
Adam Noel,
Karen C. Cheung,
Robert Schober
Abstract:
In this paper, we perform receiver design for a diffusive molecular communication environment. Our model includes flow in any direction, sources of information molecules in addition to the transmitter, and enzymes in the propagation environment to mitigate intersymbol interference. We characterize the mutual information between receiver observations to show how often independent observations can be made. We derive the maximum likelihood sequence detector to provide a lower bound on the bit error probability. We propose the family of weighted sum detectors for more practical implementation and derive their expected bit error probability. Under certain conditions, the performance of the optimal weighted sum detector is shown to be equivalent to a matched filter. Receiver simulation results show the tradeoff in detector complexity versus achievable bit error probability, and that a slow flow in any direction can improve the performance of a weighted sum detector.
Submitted 7 July, 2014; v1 submitted 1 August, 2013;
originally announced August 2013.
-
Improving Receiver Performance of Diffusive Molecular Communication with Enzymes
Authors:
Adam Noel,
Karen C. Cheung,
Robert Schober
Abstract:
This paper studies the mitigation of intersymbol interference in a diffusive molecular communication system using enzymes that freely diffuse in the propagation environment. The enzymes form reaction intermediates with information molecules and then degrade them so that they cannot interfere with future transmissions. A lower bound expression on the expected number of molecules measured at the receiver is derived. A simple binary receiver detection scheme is proposed where the number of observed molecules is sampled at the time when the maximum number of molecules is expected. Insight is also provided into the selection of an appropriate bit interval. The expected bit error probability is derived as a function of the current and all previously transmitted bits. Simulation results show the accuracy of the bit error probability expression and the improvement in communication performance by having active enzymes present.
Submitted 16 December, 2013; v1 submitted 8 May, 2013;
originally announced May 2013.
-
Using Dimensional Analysis to Assess Scalability and Accuracy in Molecular Communication
Authors:
Adam Noel,
Karen C. Cheung,
Robert Schober
Abstract:
In this paper, we apply dimensional analysis to study a diffusive molecular communication system that uses diffusing enzymes in the propagation environment to mitigate intersymbol interference. The enzymes bind to information molecules and then degrade them so that they cannot interfere with the detection of future transmissions at the receiver. We determine when it is accurate to assume that the concentration of information molecules throughout the receiver is constant and equal to that expected at the center of the receiver. We show that a lower bound on the expected number of molecules observed at the receiver can be arbitrarily scaled over the environmental parameters, and generalize how the accuracy of the lower bound is qualitatively impacted by those parameters.
Submitted 8 May, 2013;
originally announced May 2013.
-
Improving Diffusion-Based Molecular Communication with Unanchored Enzymes
Authors:
Adam Noel,
Karen C. Cheung,
Robert Schober
Abstract:
In this paper, we propose adding enzymes to the propagation environment of a diffusive molecular communication system as a strategy for mitigating intersymbol interference. The enzymes form reaction intermediates with information molecules and then degrade them so that they have a smaller chance of interfering with future transmissions. We present the reaction-diffusion dynamics of this proposed system and derive a lower bound expression for the expected number of molecules observed at the receiver. We justify a particle-based simulation framework, and present simulation results that show both the accuracy of our expression and the potential for enzymes to improve communication performance.
Submitted 8 May, 2013;
originally announced May 2013.