SeqBalance: Congestion-Aware Load Balancing with no Reordering for RoCE

Authors: Huimin Luo, Jiao Zhang, Mingxuan Yu, Yongchen Pan, Tian Pan, Tao Huang

Abstract: Remote Direct Memory Access (RDMA) is widely used in data center networks because of its high performance. However, due to the characteristics of RDMA's retransmission strategy and the traffic mode of AI training, current load balancing schemes for data center networks are unsuitable for RDMA. In this paper, we propose SeqBalance, a load balancing framework designed for RDMA. SeqBalance implements fine-grained load balancing for RDMA through a reasonable design and does not cause reordering problems. SeqBalance's designs are all based on existing commercial RNICs and commercial programmable switches, so they are compatible with existing data center networks. We have implemented SeqBalance in Mellanox CX-6 RNICs and Tofino switches. The results of hardware testbed experiments and large-scale simulations show that compared with existing load balancing schemes, SeqBalance improves 18.7% and 33.2% on average FCT and 99th percentile FCT.

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2406.14742 [pdf, other]

Latent Variable Sequence Identification for Cognitive Models with Neural Bayes Estimation

Authors: Ti-Fen Pan, Jing-Jing Li, Bill Thompson, Anne Collins

Abstract: Extracting time-varying latent variables from computational cognitive models is a key step in model-based neural analysis, which aims to understand the neural correlates of cognitive processes. However, existing methods only allow researchers to infer latent variables that explain subjects' behavior in a relatively small class of cognitive models. For example, a broad class of relevant cognitive models with analytically intractable likelihood is currently out of reach from standard techniques, based on Maximum a Posteriori parameter estimation. Here, we present an approach that extends neural Bayes estimation to learn a direct mapping between experimental data and the targeted latent variable space using recurrent neural networks and simulated datasets. We show that our approach achieves competitive performance in inferring latent variable sequences in both tractable and intractable models. Furthermore, the approach is generalizable across different computational models and is adaptable for both continuous and discrete latent spaces. We then demonstrate its applicability in real world datasets. Our work underscores that combining recurrent neural networks and simulation-based inference to identify latent variable sequences can enable researchers to access a wider class of cognitive models for model-based neural analyses, and thus test a broader set of theories.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.10538 [pdf, other]

Large Reasoning Models for 3D Floorplanning in EDA: Learning from Imperfections

Authors: Fin Amin, Nirjhor Rouf, Tse-Han Pan, Md Kamal Ibn Shafi, Paul D. Franzon

Abstract: In this paper, we introduce Dreamweaver, which belongs to a new class of auto-regressive decision-making models known as large reasoning models (LRMs). Dreamweaver is designed to improve 3D floorplanning in electronic design automation (EDA) via an architecture that melds advancements in sequence-to-sequence reinforcement learning algorithms. A significant advantage of our approach is its ability to effectively reason over large discrete action spaces, which is essential for handling the numerous potential positions for various functional blocks in floorplanning. Additionally, Dreamweaver demonstrates strong performance even when trained on entirely random trajectories, showcasing its capacity to leverage sub-optimal or non-expert trajectories to enhance its results. This innovative approach contributes to streamlining the integrated circuit (IC) design flow and reducing the high computational costs typically associated with floorplanning. We evaluate its performance against a current state-of-the-art method, highlighting notable improvements.

Submitted 21 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.07497 [pdf]

A pilot protocol and cohort for the investigation of non-pathological variability in speech

Authors: Nicholas Cummins, Lauren L. White, Zahia Rahman, Catriona Lucas, Tian Pan, Ewan Carr, Faith Matcham, Johnny Downs, Richard J. Dobson, Judith Dineley

Abstract: Background: Speech-based biomarkers have potential as a means for regular, objective assessment of symptom severity, remotely and in-clinic in combination with advanced analytical models. However, the complex nature of speech and the often subtle changes associated with health mean that findings are highly dependent on methodological and cohort choices. These are often not reported adequately in studies investigating speech-based health assessment. Objective: To develop and apply an exemplar protocol to generate a pilot dataset of healthy speech with detailed metadata for the assessment of factors in the speech recording-analysis pipeline, including device choice, speech elicitation task and non-pathological variability. Methods: We developed our collection protocol and choice of exemplar speech features based on a thematic literature review. Our protocol includes the elicitation of three different speech types. With a focus towards remote applications, we also choose to collect speech with three different microphone types. We developed a pipeline to extract a set of 14 exemplar speech features. Results: We collected speech from 28 individuals three times in one day, repeated at the same times 8-11 weeks later, and from 25 healthy individuals three times in one week. Participant characteristics collected included sex, age, native language status and voice use habits of the participant. A preliminary set of 14 speech features covering timing, prosody, voice quality, articulation and spectral moment characteristics were extracted that provide a resource of normative values. Conclusions: There are multiple methodological factors involved in the collection, processing and analysis of speech recordings. Consistent reporting and greater harmonisation of study protocols are urgently required to aid the translation of speech processing into clinical research and practice.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 29 pages. Pre peer review

ACM Class: J.3

arXiv:2406.03641 [pdf, other]

Task and Motion Planning for Execution in the Real

Authors: Tianyang Pan, Rahul Shome, Lydia E. Kavraki

Abstract: Task and motion planning represents a powerful set of hybrid planning methods that combine reasoning over discrete task domains and continuous motion generation. Traditional reasoning necessitates task domain models and enough information to ground actions to motion planning queries. Gaps in this knowledge often arise from sources like occlusion or imprecise modeling. This work generates task and motion plans that include actions cannot be fully grounded at planning time. During execution, such an action is handled by a provided human-designed or learned closed-loop behavior. Execution combines offline planned motions and online behaviors till reaching the task goal. Failures of behaviors are fed back as constraints to find new plans. Forty real-robot trials and motivating demonstrations are performed to evaluate the proposed framework and compare against state-of-the-art. Results show faster execution time, less number of actions, and more success in problems where diverse gaps arise. The experiment data is shared for researchers to simulate these settings. The work shows promise in expanding the applicable class of realistic partially grounded problems that robots can address.

Submitted 13 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: 15 pages, 14 figures, 2 tables, accepted by IEEE Transactions on Robotics

ACM Class: I.2.9; I.2.8

arXiv:2405.08573 [pdf, other]

ViSTooth: A Visualization Framework for Tooth Segmentation on Panoramic Radiograph

Authors: Shenji Zhu, Miaoxin Hu, Tianya Pan, Yue Hong, Bin Li, Zhiguang Zhou, Ting Xu

Abstract: Tooth segmentation is a key step for computer aided diagnosis of dental diseases. Numerous machine learning models have been employed for tooth segmentation on dental panoramic radiograph. However, it is a difficult task to achieve accurate tooth segmentation due to complex tooth shapes, diverse tooth categories and incomplete sample set for machine learning. In this paper, we propose ViSTooth, a visualization framework for tooth segmentation on dental panoramic radiograph. First, we employ Mask R-CNN to conduct preliminary tooth segmentation, and a set of domain metrics are proposed to estimate the accuracy of the segmented teeth, including tooth shape, tooth position and tooth angle. Then, we represent the teeth with high-dimensional vectors and visualize their distribution in a low-dimensional space, in which experts can easily observe those teeth with specific metrics. Further, we expand the sample set with the expert-specified teeth and train the tooth segmentation model iteratively. Finally, we conduct case study and expert study to demonstrate the effectiveness and usability of our ViSTooth, in aiding experts to implement accurate tooth segmentation guided by expert knowledge.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.05714 [pdf, other]

Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning

Authors: Rui Zhao, Bin Shi, Jianfei Ruan, Tianze Pan, Bo Dong

Abstract: In noisy label learning, estimating noisy class posteriors plays a fundamental role for developing consistent classifiers, as it forms the basis for estimating clean class posteriors and the transition matrix. Existing methods typically learn noisy class posteriors by training a classification model with noisy labels. However, when labels are incorrect, these models may be misled to overemphasize the feature parts that do not reflect the instance characteristics, resulting in significant errors in estimating noisy class posteriors. To address this issue, this paper proposes to augment the supervised information with part-level labels, encouraging the model to focus on and integrate richer information from various parts. Specifically, our method first partitions features into distinct parts by cropping instances, yielding part-level labels associated with these various parts. Subsequently, we introduce a novel single-to-multiple transition matrix to model the relationship between the noisy and part-level labels, which incorporates part-level labels into a classifier-consistent framework. Utilizing this framework with part-level labels, we can learn the noisy class posteriors more precisely by guiding the model to integrate information from various parts, ultimately improving the classification performance. Our method is theoretically sound, while experiments show that it is empirically effective in synthetic and real-world noisy benchmarks.

Submitted 2 July, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: CVPR 2024

arXiv:2404.16874 [pdf, other]

Worldwide wildfire spreading and its severity described by the SIR model

Authors: Tong Pan, Hongjun Wang, Jiyuan Chen, Xuan Song

Abstract: Global wildfire spreading dynamics and severity are analyzed using the susceptible-infected-recovered (SIR) compartment model. We use the novel FireTracks (FT) Scientific Dataset covering the wildfire time series of 2002-2023.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.15514 [pdf, ps, other]

Numerical study of transitions in lid-driven flows in semicircular cavities

Authors: Tsorng-Whay Pan, Ang Li, Shang-Huan Chiu

Abstract: In this article, three-dimensional (3D) lid-driven flows in semicircular cavities are studied. The numerical solution of the Navier-Stokes equations modeling incompressible viscous fluid flow in cavities is obtained via a methodology combining a first-order accurate operator-splitting scheme, a fictitious domain formulation, and finite element space approximations. The critical Reynolds numbers (Re_{cr}) for having oscillatory flow (a Hopf bifurcation) are obtained. The associated oscillating motion in a semicircular cavity with length equal to width has been studied in detail. Based on the averaged velocity field in one period of oscillating motion, the flow difference (called oscillation mode) between the velocity field and averaged one at several time instances in such period shows almost the same flow pattern for the Reynolds numbers close to Re_{cr}. This oscillation mode in a semicircular cavity shows a close similarity to the one obtained in a shallow cavity, but with some difference in a shallow cavity which is triggered by the presence of two vertical side walls and downstream wall.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.12577 [pdf]

In-tube micro-pyramidal silicon nanopore for inertial-kinetic sensing of single molecules

Authors: Jianxin Yang, Tianle Pan, Zhenming Xie, Wu Yuan, Ho-Pui Ho

Abstract: Electrokinetic force has been the major choice for driving the translocation of molecules through a nanopore. However, the use of this approach is limited by an uncontrollable translocation speed, resulting in non-uniform conductance signals with low conformational sensitivity, which hinders the accurate discrimination of the molecules. Here, we show the first use of inertial-kinetic translocation induced by spinning an in-tube micro-pyramidal silicon nanopore fabricated using photovoltaic electrochemical etch-stop technique for biomolecular sensing. By adjusting the kinetic properties of a funnel-shaped centrifugal force field while maintaining a counter-balanced state of electrophoretic and electroosmotic effect in the nanopore, we achieved regulated translocation of proteins and obtained stable signals of long and adjustable dwell times and high conformational sensitivity. Moreover, we demonstrated instantaneous sensing and discrimination of molecular conformations and longitudinal monitoring of molecular reactions and conformation changes by wirelessly measuring characteristic features in current blockade readouts using the in-tube nanopore device.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2403.13260 [pdf, other]

A Bayesian Approach for Selecting Relevant External Data (BASE): Application to a study of Long-Term Outcomes in a Hemophilia Gene Therapy Trial

Authors: Tianyu Pan, Xiang Zhang, Weining Shen, Ting Ye

Abstract: Gene therapies aim to address the root causes of diseases, particularly those stemming from rare genetic defects that can be life-threatening or severely debilitating. While there has been notable progress in the development of gene therapies in recent years, understanding their long-term effectiveness remains challenging due to a lack of data on long-term outcomes, especially during the early stages of their introduction to the market. To address the critical question of estimating long-term efficacy without waiting for the completion of lengthy clinical trials, we propose a novel Bayesian framework. This framework selects pertinent data from external sources, often early-phase clinical trials with more comprehensive longitudinal efficacy data that could lead to an improved inference of the long-term efficacy outcome. We apply this methodology to predict the long-term factor IX (FIX) levels of HEMGENIX (etranacogene dezaparvovec), the first FDA-approved gene therapy to treat adults with severe Hemophilia B, in a phase 3 study. Our application showcases the capability of the framework to estimate the 5-year FIX levels following HEMGENIX therapy, demonstrating sustained FIX levels induced by HEMGENIX infusion. Additionally, we provide theoretical insights into the methodology by establishing its posterior convergence properties.

Submitted 9 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.01493 [pdf, other]

ConvTimeNet: A Deep Hierarchical Fully Convolutional Model for Multivariate Time Series Analysis

Authors: Mingyue Cheng, Jiqian Yang, Tingyue Pan, Qi Liu, Zhi Li

Abstract: This paper introduces ConvTimeNet, a novel deep hierarchical fully convolutional network designed to serve as a general-purpose model for time series analysis. The key design of this network is twofold, designed to overcome the limitations of traditional convolutional networks. Firstly, we propose an adaptive segmentation of time series into sub-series level patches, treating these as fundamental modeling units. This setting avoids the sparsity semantics associated with raw point-level time steps. Secondly, we design a fully convolutional block by skillfully integrating deepwise and pointwise convolution operations, following the advanced building block style employed in Transformer encoders. This backbone network allows for the effective capture of both global sequence and cross-variable dependence, as it not only incorporates the advancements of Transformer architecture but also inherits the inherent properties of convolution. Furthermore, multi-scale representations of given time series instances can be learned by controlling the kernel size flexibly. Extensive experiments are conducted on both time series forecasting and classification tasks. The results consistently outperformed strong baselines in most situations in terms of effectiveness. The code is publicly available.

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2312.09128 [pdf, other]

Tokenize Anything via Prompting

Authors: Ting Pan, Lulu Tang, Xinlong Wang, Shiguang Shan

Abstract: We present a unified, promptable model capable of simultaneously segmenting, recognizing, and captioning anything. Unlike SAM, we aim to build a versatile region representation in the wild via visual prompting. To achieve this, we train a generalizable model with massive segmentation masks, e.g., SA-1B masks, and semantic priors from a pre-trained CLIP model with 5 billion parameters. Specifically, we construct a promptable image decoder by adding a semantic token to each mask token. The semantic token is responsible for learning the semantic priors in a predefined concept space. Through joint optimization of segmentation on mask tokens and concept prediction on semantic tokens, our model exhibits strong regional recognition and localization capabilities. For example, an additional 38M-parameter causal text decoder trained from scratch sets a new record with a CIDEr score of 150.7 on the Visual Genome region captioning task. We believe this model can be a versatile region-level image tokenizer, capable of encoding general-purpose region context for a broad range of perception tasks. Code and models are available at https://github.com/baaivision/tokenize-anything.

Submitted 14 December, 2023; originally announced December 2023.

Comments: code, model, and demo: https://github.com/baaivision/tokenize-anything

arXiv:2312.05927 [pdf, other]

The survival of scientific stylization

Authors: Yuanyuan Shu, Tianxing Pan

Abstract: This study elaborates a text-based metric to quantify the unique position of stylized scientific research, characterized by its innovative integration of diverse knowledge components and potential to pivot established scientific paradigms. Our analysis reveals a concerning decline in stylized research, highlighted by its comparative undervaluation in terms of citation counts and protracted peer-review duration. Despite facing these challenges, the disruptive potential of stylized research remains robust, consistently introducing groundbreaking questions and theories. This paper posits that substantive reforms are necessary to incentivize and recognize the value of stylized research, including optimizations to the peer-review process and the criteria for evaluating scientific impact. Embracing these changes may be imperative to halt the downturn in stylized research and ensure enduring scholarly exploration in endless frontiers.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 55 pages (23 main text, 32 SI)

arXiv:2311.10924 [pdf, other]

Faster Streaming and Scalable Algorithms for Finding Directed Dense Subgraphs in Large Graphs

Authors: Slobodan Mitrović, Theodore Pan

Abstract: Finding dense subgraphs is a fundamental algorithmic tool in data mining, community detection, and clustering. In this problem, one aims to find an induced subgraph whose edge-to-vertex ratio is maximized. We study the directed case of this question in the context of semi-streaming and massively parallel algorithms. In particular, we show that it is possible to find a $(2+ε)$ approximation on randomized streams even in a single pass by using $O(n \cdot {\rm poly} \log n)$ memory on $n$-vertex graphs. Our result improves over prior works, which were designed for arbitrary-ordered streams: the algorithm by Bahmani et al. (VLDB 2012) which uses $O(\log n)$ passes, and the work by Esfandiari et al. (2015) which makes one pass but uses $O(n^{3/2})$ memory. Moreover, our techniques extend to the Massively Parallel Computation model yielding $O(1)$ rounds in the super-linear and $O(\sqrt{\log n})$ rounds in the nearly-linear memory regime. This constitutes a quadratic improvement over state-of-the-art bounds by Bahmani et al. (VLDB 2012 and WAW 2014), which require $O(\log n)$ rounds even in the super-linear memory regime. Finally, we empirically evaluate our single-pass semi-streaming algorithm on $6$ benchmarks and show that, even on non-randomly ordered streams, the quality of its output is essentially the same as that of Bahmani et al. (VLDB 2012) while it is $2$ times faster on large graphs.

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.10036 [pdf]

Dynamic CBCT Imaging using Prior Model-Free Spatiotemporal Implicit Neural Representation (PMF-STINR)

Authors: Hua-Chieh Shao, Mengke Tielige, Tinsu Pan, You Zhang

Abstract: Dynamic cone-beam computed tomography (CBCT) can capture high-spatial-resolution, time-varying images for motion monitoring, patient setup, and adaptive planning of radiotherapy. However, dynamic CBCT reconstruction is an extremely ill-posed spatiotemporal inverse problem, as each CBCT volume in the dynamic sequence is only captured by one or a few X-ray projections. We developed a machine learning-based technique, prior-model-free spatiotemporal implicit neural representation (PMF-STINR), to reconstruct dynamic CBCTs from sequentially acquired X-ray projections. PMF-STINR employs a joint image reconstruction and registration approach to address the under-sampling challenge. Specifically, PMF-STINR uses spatial implicit neural representation to reconstruct a reference CBCT volume, and it applies temporal INR to represent the intra-scan dynamic motion with respect to the reference CBCT to yield dynamic CBCTs. PMF-STINR couples the temporal INR with a learning-based B-spline motion model to capture time-varying deformable motion during the reconstruction. Compared with previous methods, the spatial INR, the temporal INR, and the B-spline model of PMF-STINR are all learned on the fly during reconstruction in a one-shot fashion, without using any patient-specific prior knowledge or motion sorting/binning.
PMF-STINR was evaluated via digital phantom simulations, physical phantom measurements, and a multi-institutional patient dataset featuring various imaging protocols (half-fan/full-fan, full sampling/sparse sampling, different energy and mAs settings, etc.). The results showed that the one-shot learning-based PMF-STINR can accurately and robustly reconstruct dynamic CBCTs and capture highly irregular motion with high temporal (~0.1s) resolution and sub-millimeter accuracy. It can be a promising tool for motion management by offering richer motion information than traditional 4D-CBCTs.

Submitted 4 December, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2310.14592 [pdf, other]

Pre-Training LiDAR-Based 3D Object Detectors Through Colorization

Authors: Tai-Yu Pan, Chenyang Ma, Tianle Chen, Cheng Perng Phoo, Katie Z Luo, Yurong You, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao

Abstract: Accurate 3D object detection and understanding for self-driving cars heavily relies on LiDAR point clouds, necessitating large amounts of labeled data to train. In this work, we introduce an innovative pre-training approach, Grounded Point Colorization (GPC), to bridge the gap between data and labels by teaching the model to colorize LiDAR point clouds, equipping it with valuable semantic cues. To tackle challenges arising from color variations and selection bias, we incorporate color as "context" by providing ground-truth colors as hints during colorization. Experimental results on the KITTI and Waymo datasets demonstrate GPC's remarkable effectiveness. Even with limited labeled data, GPC significantly improves fine-tuning performance; notably, on just 20% of the KITTI dataset, GPC outperforms training from scratch with the entire dataset. In sum, we introduce a fresh perspective on pre-training for 3D object detection, aligning the objective with the model's intended role and ultimately advancing the accuracy and efficiency of 3D object detection for autonomous vehicles.

Submitted 25 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: Accepted to ICLR 2024

arXiv:2310.04153 [pdf, other]

Fair coins tend to land on the same side they started: Evidence from 350,757 flips

Authors: František Bartoš, Alexandra Sarafoglou, Henrik R. Godmann, Amir Sahrani, David Klein Leunk, Pierre Y. Gui, David Voss, Kaleem Ullah, Malte J. Zoubek, Franziska Nippold, Frederik Aust, Felipe F. Vieira, Chris-Gabriel Islam, Anton J. Zoubek, Sara Shabani, Jonas Petter, Ingeborg B. Roos, Adam Finnemann, Aaron B. Lob, Madlen F. Hoffstadt, Jason Nak, Jill de Ron, Koen Derks, Karoline Huth, Sjoerd Terpstra, et al. (25 additional authors not shown)

Abstract: Many people have flipped coins but few have stopped to ponder the statistical and physical intricacies of the process. In a preregistered study we collected $350{,}757$ coin flips to test the counterintuitive prediction from a physics model of human coin tossing developed by Diaconis, Holmes, and Montgomery (DHM; 2007). The model asserts that when people flip an ordinary coin, it tends to land on the same side it started -- DHM estimated the probability of a same-side outcome to be about 51%. Our data lend strong support to this precise prediction: the coins landed on the same side more often than not, $\text{Pr}(\text{same side}) = 0.508$, 95% credible interval (CI) [$0.506$, $0.509$], $\text{BF}_{\text{same-side bias}} = 2359$. Furthermore, the data revealed considerable between-people variation in the degree of this same-side bias. Our data also confirmed the generic prediction that when people flip an ordinary coin -- with the initial side-up randomly determined -- it is equally likely to land heads or tails: $\text{Pr}(\text{heads}) = 0.500$, 95% CI [$0.498$, $0.502$], $\text{BF}_{\text{heads-tails bias}} = 0.182$. Furthermore, this lack of heads-tails bias does not appear to vary across coins. Additional exploratory analyses revealed that the within-people same-side bias decreased as more coins were flipped, an effect that is consistent with the possibility that practice makes people flip coins in a less wobbly fashion.
Our data therefore provide strong evidence that when some (but not all) people flip a fair coin, it tends to land on the same side it started. Our data provide compelling statistical support for the DHM physics model of coin tossing.

Submitted 2 June, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2309.12842 [pdf, other]

SRFNet: Monocular Depth Estimation with Fine-grained Structure via Spatial Reliability-oriented Fusion of Frames and Events

Authors: Tianbo Pan, Zidong Cao, Lin Wang

Abstract: Monocular depth estimation is a crucial task to measure distance relative to a camera, which is important for applications such as robot navigation and self-driving. Traditional frame-based methods suffer from performance drops due to the limited dynamic range and motion blur. Therefore, recent works leverage novel event cameras to complement or guide the frame modality via frame-event feature fusion. However, event streams exhibit spatial sparsity, leaving some areas unperceived, especially in regions with marginal light changes. Therefore, direct fusion methods, e.g., RAMNet, often ignore the contribution of the most confident regions of each modality. This leads to structural ambiguity in the modality fusion process, thus degrading the depth estimation performance. In this paper, we propose a novel Spatial Reliability-oriented Fusion Network (SRFNet) that can estimate depth with fine-grained structure at both daytime and nighttime. Our method consists of two key technical components. Firstly, we propose an attention-based interactive fusion (AIF) module that applies spatial priors of events and frames as the initial masks and learns the consensus regions to guide the inter-modal feature fusion. The fused features are then fed back to enhance the frame and event feature learning. Meanwhile, it utilizes an output head to generate a fused mask, which is iteratively updated for learning consensual spatial priors.
Secondly, we propose the Reliability-oriented Depth Refinement (RDR) module to estimate dense depth with the fine-grained structure based on the fused features and masks. We evaluate the effectiveness of our method on the synthetic and real-world datasets, which shows that, even without pretraining, our method outperforms the prior methods, e.g., RAMNet, especially in night scenes. Our project homepage: https://vlislab22.github.io/SRFNet.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.11091 [pdf, other]

doi 10.1145/3474085.3475301

Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval

Authors: Chen Jiang, Kaiming Huang, Sifeng He, Xudong Yang, Wei Zhang, Xiaobo Zhang, Yuan Cheng, Lei Yang, Qing Wang, Furong Xu, Tan Pan, Wei Chu

Abstract: With the explosive growth of web videos in recent years, large-scale Content-Based Video Retrieval (CBVR) becomes increasingly essential in video filtering, recommendation, and copyright protection. Segment-level CBVR (S-CBVR) locates the start and end time of similar segments in finer granularity, which is beneficial for user browsing efficiency and infringement detection especially in long video scenarios. The challenge of S-CBVR task is how to achieve high temporal alignment accuracy with efficient computation and low storage consumption. In this paper, we propose a Segment Similarity and Alignment Network (SSAN) in dealing with the challenge which is firstly trained end-to-end in S-CBVR. SSAN is based on two newly proposed modules in video retrieval: (1) An efficient Self-supervised Keyframe Extraction (SKE) module to reduce redundant frame features, (2) A robust Similarity Pattern Detection (SPD) module for temporal alignment. In comparison with uniform frame extraction, SKE not only saves feature storage and search time, but also introduces comparable accuracy and limited extra computation time. In terms of temporal alignment, SPD localizes similar segments with higher accuracy and efficiency than existing deep learning methods. Furthermore, we jointly train SSAN with SKE and SPD and achieve an end-to-end improvement. Meanwhile, the two key modules SKE and SPD can also be effectively inserted into other video retrieval pipelines and gain considerable performance improvements.
Experimental results on public datasets show that SSAN can obtain higher alignment accuracy while saving storage and online query computational cost compared to existing methods.

Submitted 20 September, 2023; originally announced September 2023.

Comments: Accepted by ACM MM 2021

arXiv:2309.06667 [pdf]

doi 10.1038/s41467-023-41773-x

Visualizing moiré ferroelectricity via plasmons and nano-photocurrent in graphene/twisted-WSe2 structures

Authors: Shuai Zhang, Yang Liu, Zhiyuan Sun, Xinzhong Chen, Baichang Li, S. L. Moore, Song Liu, Zhiying Wang, S. E. Rossi, Ran Jing, Jordan Fonseca, Birui Yang, Yinming Shao, Chun-Ying Huang, Taketo Handa, Lin Xiong, Matthew Fu, Tsai-Chun Pan, Dorri Halbertal, Xinyi Xu, Wenjun Zheng, P. J. Schuck, A. N. Pasupathy, C. R. Dean, Xiaoyang Zhu, et al. (6 additional authors not shown)

Abstract: Ferroelectricity, a spontaneous and reversible electric polarization, is found in certain classes of van der Waals (vdW) material heterostructures. The discovery of ferroelectricity in twisted vdW layers provides new opportunities to engineer spatially dependent electric and optical properties associated with the configuration of moiré superlattice domains and the network of domain walls. Here, we employ near-field infrared nano-imaging and nano-photocurrent measurements to study ferroelectricity in minimally twisted WSe2. The ferroelectric domains are visualized through the imaging of the plasmonic response in a graphene monolayer adjacent to the moiré WSe2 bilayers. Specifically, we find that the ferroelectric polarization in moiré domains is imprinted on the plasmonic response of the graphene. Complementary nano-photocurrent measurements demonstrate that the optoelectronic properties of graphene are also modulated by the proximal ferroelectric domains. Our approach represents an alternative strategy for studying moiré ferroelectricity at native length scales and opens promising prospects for (opto)electronic devices.

Submitted 12 September, 2023; originally announced September 2023.

Comments: 19 pages, 3 figures

Journal ref: Nature Communications 14, 6200 (2023)

arXiv:2308.14295 [pdf, other]

Traffic Light Control with Reinforcement Learning

Authors: Taoyu Pan

Abstract: Traffic light control is important for reducing congestion in urban mobility systems. This paper proposes a real-time traffic light control method using deep Q learning. Our approach incorporates a reward function considering queue lengths, delays, travel time, and throughput. The model dynamically decides phase changes based on current traffic conditions. The training of the deep Q network involves an offline stage from pre-generated data with fixed schedules and an online stage using real-time traffic data. A deep Q network structure with a "phase gate" component is used to simplify the model's learning task under different phases. A "memory palace" mechanism is used to address sample imbalance during the training process. We validate our approach using both synthetic and real-world traffic flow data at a road intersection in Hangzhou, China. Results demonstrate significant performance improvements of the proposed method in reducing vehicle waiting time (57.1% to 100%), queue lengths (40.9% to 100%), and total travel time (16.8% to 68.0%) compared to traditional fixed signal plans.

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.05493 [pdf, other]

Look at the Neighbor: Distortion-aware Unsupervised Domain Adaptation for Panoramic Semantic Segmentation

Authors: Xu Zheng, Tianbo Pan, Yunhao Luo, Lin Wang

Abstract: Endeavors have been recently made to transfer knowledge from the labeled pinhole image domain to the unlabeled panoramic image domain via Unsupervised Domain Adaptation (UDA). The aim is to tackle the domain gaps caused by the style disparities and distortion problem from the non-uniformly distributed pixels of equirectangular projection (ERP). Previous works typically focus on transferring knowledge based on geometric priors with specially designed multi-branch network architectures. As a result, considerable computational costs are induced, and meanwhile, their generalization abilities are profoundly hindered by the variation of distortion among pixels. In this paper, we find that the pixels' neighborhood regions of the ERP indeed introduce less distortion. Intuitively, we propose a novel UDA framework that can effectively address the distortion problems for panoramic semantic segmentation. In comparison, our method is simpler, easier to implement, and more computationally efficient. Specifically, we propose distortion-aware attention (DA) capturing the neighboring pixel distribution without using any geometric constraints. Moreover, we propose a class-wise feature aggregation (CFA) module to iteratively update the feature representations with a memory bank. As such, the feature similarity between two domains can be consistently optimized. Extensive experiments show that our method achieves new state-of-the-art performance while remarkably reducing 80% parameters.

Submitted 10 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV 2023

arXiv:2307.08271 [pdf, ps, other]

doi 10.1088/1674-1137/ace821

$B_{(s)} \rightarrow D^{**}_{(s)}$ form factors in HQEFT and model independent analysis of relevant semileptonic decays with NP effects

Authors: Ya-Bing Zuo, Hong-Yao Jin, Jing-Ying Tian, Jia Yi, Han-Yu Gong, Ting-Ting Pan

Abstract: The form factors of $B_{(s)}$ decays into P-wave excited charmed mesons (including $D^*_0(2300)$, $D_1(2430)$, $D_1(2420)$, $D^*_2(2460)$ and their strange counterparts, denoted generically as $D^{**}_{(s)}$) are systematically calculated via the QCD sum rules in the framework of heavy quark effective field theory (HQEFT). We consider contributions up to the next leading order of heavy quark expansion and give all the relevant form factors, including the scalar and tensor ones only relevant for possible new physics effects. The expressions for the form factors in terms of several universal wave functions are derived via heavy quark expansion. These universal functions can be evaluated through QCD sum rules. Then, the numerical results of the form factors are presented. With the form factors given here, a model independent analysis of relevant semileptonic decays $B_{(s)} \rightarrow D^{**}_{(s)} l \bar{\nu}_l$ is performed, including the contributions from possible new physics effects. Our predictions for the differential decay widths, branching fractions and ratios of branching fractions $R(D^{**}_{(s)})$ may be tested in more precise experiments in the future.

Submitted 29 August, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: 38 pages, 8 figures, 12 tables

Journal ref: Chinese Physics C 47 (2023) 103104

arXiv:2307.05904 [pdf]

Twofold Symmetry Observed in Bi$_{2}$Te$_{3}$/FeTe Interfacial Superconductor

Authors: Xinru Han, Hailang Qin, Tianluo Pan, Bin Guo, Kaige Shi, Zijin Huang, Jie Jiang, Hangyu Yin, Hongtao He, Fei Ye, Wei-Qiang Chen, Jia-Wei Mei, Gan Wang

Abstract: Superconducting pairing symmetry is crucial in understanding the microscopic superconducting mechanism of a superconductor. Here we report the observation of a twofold superconducting gap symmetry in an interfacial superconductor Bi$_{2}$Te$_{3}$/FeTe, by employing quasiparticle interference (QPI) technique in scanning tunneling microscopy and macroscopic magnetoresistance measurements. The QPI patterns corresponding to energies inside and outside the gap reveal a clear anisotropic superconducting gap. Furthermore, both the in-plane angle-dependent magnetoresistance and in-plane upper critical field exhibit a clear twofold symmetry. This twofold symmetry aligns with the Te-Te direction in FeTe, which weakens the possible generation by bi-collinear antiferromagnetic order. Our finding provides key information in further understanding of the topological properties in Bi$_{2}$Te$_{3}$/FeTe superconducting system and propels further theoretical interests in the pairing mechanism in the system.

Submitted 25 August, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

arXiv:2306.16201 [pdf, other]

Low-Confidence Samples Mining for Semi-supervised Object Detection

Authors: Guandu Liu, Fangyuan Zhang, Tianxiang Pan, Bin Wang

Abstract: Reliable pseudo-labels from unlabeled data play a key role in semi-supervised object detection (SSOD). However, the state-of-the-art SSOD methods all rely on pseudo-labels with high confidence, which ignore valuable pseudo-labels with lower confidence. Additionally, the insufficient excavation for unlabeled data results in an excessively low recall rate thus hurting the network training. In this paper, we propose a novel Low-confidence Samples Mining (LSM) method to utilize low-confidence pseudo-labels efficiently. Specifically, we develop an additional pseudo information mining (PIM) branch on account of low-resolution feature maps to extract reliable large-area instances, the IoUs of which are higher than small-area ones. Owing to the complementary predictions between PIM and the main branch, we further design self-distillation (SD) to compensate for both in a mutually-learning manner. Meanwhile, the extensibility of the above approaches enables our LSM to apply to Faster-RCNN and Deformable-DETR respectively. On the MS-COCO benchmark, our method achieves 3.54% mAP improvement over state-of-the-art methods under 5% labeling ratios.

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.09246 [pdf, other]

Dynamic Modulation of Electromagnetically Induced Transparency Metamaterials through Mode Coupling and Stretchable Design

Authors: Sihong Chen, Taisong Pan, Zhengcheng Mou, Bing-Zhong Wang, Yuan Lin

Abstract: The active control of electromagnetically induced transparency (EIT) metamaterials (MM) has the potential to revolutionize communication networks without relying on quantum technology. However, current reconfigurable systems offer limited flexibility and have high fabrication costs and difficulties. In this study, we examine a classical EIT metamaterial and discover a novel modulation mechanism that leverages mode coupling to dynamically adjust the bandwidth and group delay of the EIT MM. This mechanism is verified through analyses of the electric field and surface charge density distributions. Additionally, a robust coupled Lorentz oscillator model is used to explain the coupling mechanism, with results that are in good agreement with simulations and experiments. To capitalize on this mechanism, we propose a block-definition approach where the MM is divided into stretchable sections, allowing for dynamic modulation of the bandwidth and group delay by stretching the EIT MM. Furthermore, the fabrication process is highly compatible with traditional flexible printed circuit board techniques. Our block-definition EIT MM offers unprecedented tunability and flexibility, requiring no complex components or specialized materials, making it a promising candidate for tunable slow-wave devices and other reconfigurable microwave applications.

Submitted 22 May, 2023; originally announced June 2023.

arXiv:2306.08062 [pdf, other]

Block definition design for stretchable metamaterials: enabling configurable sensitivity to deformation

Authors: Sihong Chen, Taisong Pan, Zhengcheng Mou, Mingde Du, Tianxiang Wang, Bing-Zhong Wang, Yuan Lin

Abstract: The sensitivity to deformation plays a key role in determining the applicability of stretchable metamaterials (MMs) to be used for conformal integration or mechanical reconfiguration. Typically, different unit designs are required to achieve the desired sensitivity, but this article proposes a block definition design for stretchable MMs that enables regulation of the MMs' response to deformation by defining various block arrangements with the same precursor structure. The article demonstrates a stretchable MM that employs the block definition design to show the mechanical reconfigurability of resonant frequency. Different block definitions result in modulation ranges of resonant frequency ranging from 39% to 85% when applying a 20% tensile strain. Additionally, the proposed design is also used to realize another MM with contradictory sensitivity to the deformation and electromagnetically induced transparency (EIT) MMs with configurable transmission bandwidth to the deformation, indicating its potential for broader applications.

Submitted 22 May, 2023; originally announced June 2023.

arXiv:2305.16804 [pdf, other]

Towards Open-World Segmentation of Parts

Authors: Tai-Yu Pan, Qing Liu, Wei-Lun Chao, Brian Price

Abstract: Segmenting object parts such as cup handles and animal bodies is important in many real-world applications but requires more annotation effort. The largest dataset nowadays contains merely two hundred object categories, implying the difficulty to scale up part segmentation to an unconstrained setting. To address this, we propose to explore a seemingly simplified but empirically useful and scalable task, class-agnostic part segmentation. In this problem, we disregard the part class labels in training and instead treat all of them as a single part class. We argue and demonstrate that models trained without part classes can better localize parts and segment them on objects unseen in training. We then present two further improvements. First, we propose to make the model object-aware, leveraging the fact that parts are "compositions", whose extents are bounded by the corresponding objects and whose appearances are by nature not independent but bundled. Second, we introduce a novel approach to improve part segmentation on unseen objects, inspired by an interesting finding -- for unseen objects, the pixel-wise features extracted by the model often reveal high-quality part segments. To this end, we propose a novel self-supervised procedure that iterates between pixel clustering and supervised contrastive learning that pulls pixels closer or pushes them away. Via extensive experiments on PartImageNet and Pascal-Part, we show notable and consistent gains by our approach, essentially a critical step towards open-world part segmentation.

Submitted 26 May, 2023; originally announced May 2023.

Comments: Accepted to CVPR 2023

arXiv:2305.02610 [pdf, other]

Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval

Authors: Tan Pan, Furong Xu, Xudong Yang, Sifeng He, Chen Jiang, Qingpei Guo, Feng Qian, Xiaobo Zhang, Yuan Cheng, Lei Yang, Wei Chu

Abstract: Image retrieval plays an important role in the Internet world. Usually, the core parts of mainstream visual retrieval systems include an online service of the embedding model and a large-scale vector database. For traditional model upgrades, the old model will not be replaced by the new one until the embeddings of all the images in the database are re-computed by the new model, which takes days or weeks for a large amount of data. Recently, backward-compatible training (BCT) enables the new model to be immediately deployed online by making the new embeddings directly comparable to the old ones. For BCT, improving the compatibility of two models with less negative impact on retrieval performance is the key challenge. In this paper, we introduce AdvBCT, an Adversarial Backward-Compatible Training method with an elastic boundary constraint that takes both compatibility and discrimination into consideration. We first employ adversarial learning to minimize the distribution disparity between embeddings of the new model and the old model. Meanwhile, we add an elastic boundary constraint during training to improve compatibility and discrimination efficiently. Extensive experiments on GLDv2, Revisited Oxford (ROxford), and Revisited Paris (RParis) demonstrate that our method outperforms other BCT methods on both compatibility and discrimination. The implementation of AdvBCT will be publicly available at https://github.com/Ashespt/AdvBCT.

Submitted 4 May, 2023; originally announced May 2023.

Comments: accepted by CVPR 2023

arXiv:2302.08890 [pdf, other]

Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks

Authors: Xu Zheng, Yexin Liu, Yunfan Lu, Tongyan Hua, Tianbo Pan, Weiming Zhang, Dacheng Tao, Lin Wang

Abstract: Event cameras are bio-inspired sensors that capture the per-pixel intensity changes asynchronously and produce event streams encoding the time, pixel position, and polarity (sign) of the intensity changes. Event cameras possess a myriad of advantages over canonical frame-based cameras, such as high temporal resolution, high dynamic range, low latency, etc. Being capable of capturing information in challenging visual conditions, event cameras have the potential to overcome the limitations of frame-based cameras in the computer vision and robotics community. In very recent years, deep learning (DL) has been brought to this emerging field and inspired active research endeavors in mining its potential. However, there is still a lack of taxonomies in DL techniques for event-based vision. We first scrutinize the typical event representations with quality enhancement methods as they play a pivotal role as inputs to the DL models. We then provide a comprehensive survey of existing DL-based methods by structurally grouping them into two major categories: 1) image/video reconstruction and restoration; 2) event-based scene understanding and 3D vision. We conduct benchmark experiments for the existing methods in some representative research directions, i.e., image reconstruction, deblurring, and object recognition, to identify some critical insights and problems. Finally, we have discussions regarding the challenges and provide new perspectives for inspiring more research studies.

Submitted 11 April, 2024; v1 submitted 17 February, 2023; originally announced February 2023.

arXiv:2301.09897 [pdf, ps, other]

Stochastic heat equations on moving domains

Authors: Tianyi Pan, Wei Wang, Jianliang Zhai, Tusheng Zhang

Abstract: In this paper, we establish the well-posedness of stochastic heat equations on moving domains, which amounts to a study of infinite dimensional interacting systems. The main difficulty is to deal with the problems caused by the time-varying state spaces and the interaction of the particle systems. The interaction still occurs even in the case of additive noise. This is in contrast to stochastic heat equations in a fixed domain.

Submitted 24 January, 2023; originally announced January 2023.

arXiv:2212.10282 [pdf, ps, other]

Large deviations of fully local monotone stochastic partial differential equations driven by gradient-dependent noise

Authors: Tianyi Pan, Shijie Shang, Jianliang Zhai, Tusheng Zhang

Abstract: Consider stochastic partial differential equations (SPDEs) with fully local monotone coefficients in a Gelfand triple $V\subseteq H\subseteq V^*$ $$ \left\{ \begin{aligned} &dX_t=A(t,X_t)dt+B(t,X_t)dW_t,\ t\in (0,T]\\ & X_0=x\in H, \end{aligned} \right. $$ where $$A: [0,T] \times V\rightarrow V^*,\ \ B:[0,T]\times V\rightarrow\ L_2(U,H)$$ are measurable maps, $L_2(U,H)$ is the space of Hilbert-Schmidt operators from $U$ to $H$ and $W$ is a $U$-cylindrical Wiener process. In this paper, we establish a small noise large deviation principle (LDP) for the solutions $\{u^\varepsilon\}_{\varepsilon>0}$ of the above SPDEs. The main contribution of this paper is that our framework is much more general than that of the existing results. In particular, the diffusion coefficient $B(t,\cdot)$ may depend on the gradient of the solutions, which is of great interest in the field of SPDEs, but there are few existing results on the topic of LDP. The broader scope of the fully local monotone setting leads us to use different strategies and techniques. A combination of the pseudomonotone technique and compactness argument plays a crucial role in the whole paper. Our framework is general enough to include many interesting models that could not be covered by existing work, including stochastic quasilinear SPDEs, stochastic convection diffusion equation, stochastic 2D Liquid crystal equation, stochastic $p$-Laplace equation with gradient-dependent noise, stochastic 2D Navier-Stokes equation with gradient-dependent noise, etc.

Submitted 10 January, 2024; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: 36 pages

MSC Class: 60H15 (Primary) 60F10; 35R60 (Secondary)

arXiv:2212.06988 [pdf, other]

Efficient Exploration in Resource-Restricted Reinforcement Learning

Authors: Zhihai Wang, Taoxing Pan, Qi Zhou, Jie Wang

Abstract: In many real-world applications of reinforcement learning (RL), performing actions requires consuming certain types of resources that are non-replenishable in each episode. Typical applications include robotic control with limited energy and video games with consumable items. In tasks with non-replenishable resources, we observe that popular RL methods such as soft actor critic suffer from poor sample efficiency. The major reason is that they tend to exhaust resources fast and thus the subsequent exploration is severely restricted due to the absence of resources. To address this challenge, we first formalize the aforementioned problem as resource-restricted reinforcement learning, and then propose a novel resource-aware exploration bonus (RAEB) to make reasonable usage of resources. An appealing feature of RAEB is that it can significantly reduce unnecessary resource-consuming trials while effectively encouraging the agent to explore unvisited states. Experiments demonstrate that the proposed RAEB significantly outperforms state-of-the-art exploration strategies in resource-restricted reinforcement learning environments, improving the sample efficiency by up to an order of magnitude.

Submitted 13 December, 2022; originally announced December 2022.

Comments: Accepted to AAAI 2023

arXiv:2211.15182 [pdf, other]

Easy Begun is Half Done: Spatial-Temporal Graph Modeling with ST-Curriculum Dropout

Authors: Hongjun Wang, Jiyuan Chen, Tong Pan, Zipei Fan, Boyuan Zhang, Renhe Jiang, Lingyu Zhang, Yi Xie, Zhongyi Wang, Xuan Song

Abstract: Spatial-temporal (ST) graph modeling, such as traffic speed forecasting and taxi demand prediction, is an important task in the deep learning area. However, for the nodes in a graph, their ST patterns can vary greatly in modeling difficulty, owing to the heterogeneous nature of ST data. We argue that unveiling the nodes to the model in a meaningful order, from easy to complex, can provide performance improvements over the traditional training procedure. The idea has its roots in Curriculum Learning, which suggests that in the early stage of training, models can be sensitive to noise and difficult samples. In this paper, we propose ST-Curriculum Dropout, a novel and easy-to-implement strategy for spatial-temporal graph modeling. Specifically, we evaluate the learning difficulty of each node in high-level feature space and drop those difficult ones out to ensure the model only needs to handle fundamental ST relations at the beginning, before gradually moving to hard ones. Our strategy can be applied to any canonical deep learning architecture without extra trainable parameters, and extensive experiments on a wide range of datasets are conducted to illustrate that, by controlling the difficulty level of ST relations as the training progresses, the model is able to capture better representation of the data and thus yields better generalization.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2211.11888 [pdf, ps, other]

Precision education: A Bayesian nonparametric approach for handling item and examinee heterogeneity in assessment data

Authors: Tianyu Pan, Weining Shen, Clintin P. Davis-Stober, Guanyu Hu

Abstract: We propose a novel nonparametric Bayesian IRT model in this paper by introducing the clustering effect at question level and further assume heterogeneity at examinee level under each question cluster, characterized by the mixture of Binomial distributions. The main contribution of this work is threefold: (1) We demonstrate that the model is identifiable. (2) The clustering effect can be captured asymptotically and the parameters of interest that measure the proficiency of examinees in solving certain questions can be estimated at a root n rate (up to a log term). (3) We present a tractable sampling algorithm to obtain valid posterior samples from our proposed model. We evaluate our model via a series of simulations and apply it to an English assessment dataset. This data analysis example nicely illustrates how our model can be used by test makers to distinguish different types of students and aid in the design of future tests.

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2209.06167 [pdf, other]

PET image denoising based on denoising diffusion probabilistic models

Authors: Kuang Gong, Keith A. Johnson, Georges El Fakhri, Quanzheng Li, Tinsu Pan

Abstract: Due to various physical degradation factors and limited counts received, PET image quality needs further improvements. The denoising diffusion probabilistic models (DDPM) are distribution learning-based models, which try to transform a normal distribution into a specific data distribution based on iterative refinements. In this work, we proposed and evaluated different DDPM-based methods for PET image denoising. Under the DDPM framework, one way to perform PET image denoising is to provide the PET image and/or the prior image as the network input. Another way is to supply the prior image as the input with the PET image included in the refinement steps, which can fit scenarios of different noise levels. 120 18F-FDG datasets and 140 18F-MK-6240 datasets were utilized to evaluate the proposed DDPM-based methods. Quantification shows that the DDPM-based frameworks with PET information included can generate better results than the nonlocal mean and Unet-based denoising methods. Adding an MR prior to the model can help achieve better performance and further reduce the uncertainty during image denoising. Solely relying on the MR prior while ignoring the PET information can result in large bias. Regional and surface quantification shows that employing the MR prior as the network input while embedding the PET image as a data-consistency constraint during inference can achieve the best performance.
In summary, DDPM-based PET image denoising is a flexible framework, which can efficiently utilize prior information and achieve better performance than the nonlocal mean and Unet-based denoising methods.

Submitted 14 September, 2022; v1 submitted 13 September, 2022; originally announced September 2022.

Comments: 8 figures

arXiv:2209.05726 [pdf, other]

Data efficient reinforcement learning and adaptive optimal perimeter control of network traffic dynamics

Authors: C. Chen, Y. P. Huang, W. H. K. Lam, T. L. Pan, S. C. Hsu, A. Sumalee, R. X. Zhong

Abstract: Existing data-driven and feedback traffic control strategies do not consider the heterogeneity of real-time data measurements. Besides, traditional reinforcement learning (RL) methods for traffic control usually converge slowly due to a lack of data efficiency. Moreover, conventional optimal perimeter control schemes require exact knowledge of the system dynamics and thus would be fragile to endogenous uncertainties. To handle these challenges, this work proposes an integral reinforcement learning (IRL) based approach to learning the macroscopic traffic dynamics for adaptive optimal perimeter control. This work makes the following primary contributions to the transportation literature: (a) A continuous-time control is developed with discrete gain updates to adapt to the discrete-time sensor data. (b) To reduce the sampling complexity and use the available data more efficiently, the experience replay (ER) technique is introduced to the IRL algorithm. (c) The proposed method relaxes the requirement on model calibration in a "model-free" manner that enables robustness against modeling uncertainty and enhances the real-time performance via a data-driven RL algorithm. (d) The convergence of the IRL-based algorithms and the stability of the controlled traffic dynamics are proven via the Lyapunov theory. The optimal control law is parameterized and then approximated by neural networks (NN), which moderates the computational complexity. Both state and input constraints are considered while no model linearization is required.
Numerical examples and simulation experiments are presented to verify the effectiveness and efficiency of the proposed method.

Submitted 13 September, 2022; originally announced September 2022.

arXiv:2209.03300 [pdf, ps, other]

Spach Transformer: Spatial and Channel-wise Transformer Based on Local and Global Self-attentions for PET Image Denoising

Authors: Se-In Jang, Tinsu Pan, Ye Li, Pedram Heidari, Junyu Chen, Quanzheng Li, Kuang Gong

Abstract: Positron emission tomography (PET) is widely used in clinics and research due to its quantitative merits and high sensitivity, but suffers from low signal-to-noise ratio (SNR). Recently convolutional neural networks (CNNs) have been widely used to improve PET image quality. Though successful and efficient in local feature extraction, CNN cannot capture long-range dependencies well due to its limited receptive field. Global multi-head self-attention (MSA) is a popular approach to capture long-range information. However, the calculation of global MSA for 3D images has high computational costs. In this work, we proposed an efficient spatial and channel-wise encoder-decoder transformer, Spach Transformer, that can leverage spatial and channel information based on local and global MSAs. Experiments based on datasets of different PET tracers, i.e., $^{18}$F-FDG, $^{18}$F-ACBC, $^{18}$F-DCFPyL, and $^{68}$Ga-DOTATATE, were conducted to evaluate the proposed framework. Quantitative results show that the proposed Spach Transformer framework outperforms state-of-the-art deep learning architectures. Our codes are available at https://github.com/sijang/SpachTransformer

Submitted 10 December, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: 15 pages

arXiv:2208.03526 [pdf, other]

Multiplex-detection Based Multiple Instance Learning Network for Whole Slide Image Classification

Authors: Zhikang Wang, Yue Bi, Tong Pan, Xiaoyu Wang, Chris Bain, Richard Bassed, Seiya Imoto, Jianhua Yao, Jiangning Song

Abstract: Multiple instance learning (MIL) is a powerful approach to classify whole slide images (WSIs) for diagnostic pathology. A fundamental challenge of MIL on WSI classification is to discover the critical instances that trigger the bag label. However, previous methods are primarily designed under the independent and identical distribution (i.i.d.) hypothesis, ignoring either the correlations between instances or heterogeneity of tumours. In this paper, we propose a novel multiplex-detection-based multiple instance learning (MDMIL) to tackle the issues above. Specifically, MDMIL is constructed by the internal query generation module (IQGM) and the multiplex detection module (MDM) and assisted by the memory-based contrastive loss during training. Firstly, IQGM gives the probability of instances and generates the internal query (IQ) for the subsequent MDM by aggregating highly reliable features after the distribution analysis. Secondly, the multiplex-detection cross-attention (MDCA) and multi-head self-attention (MHSA) in MDM cooperate to generate the final representations for the WSI. In this process, the IQ and trainable variational query (VQ) successfully build up the connections between instances and significantly improve the model's robustness toward heterogeneous tumours. At last, to further enforce constraints in the feature space and stabilize the training process, we adopt a memory-based contrastive loss, which is practicable for WSI classification even with a single sample as input in each iteration.
We conduct experiments on three computational pathology datasets, namely CAMELYON16, TCGA-NSCLC, and TCGA-RCC. The superior accuracy and AUC demonstrate the superiority of our proposed MDMIL over other state-of-the-art methods.

Submitted 31 August, 2022; v1 submitted 6 August, 2022; originally announced August 2022.

arXiv:2207.02385 [pdf, ps, other]

Large deviations of stochastic heat equations with logarithmic nonlinearity

Authors: Tianyi Pan, Shijie Shang, Tusheng Zhang

Abstract: In this paper, we establish a large deviation principle for the solutions to the stochastic heat equations with logarithmic nonlinearity driven by Brownian motion, which is neither locally Lipschitz nor locally monotone. Nonlinear versions of Gronwall's inequalities and Log-Sobolev inequalities play an important role.

Submitted 5 July, 2022; originally announced July 2022.

Comments: 27 pages

MSC Class: 60H15 (Primary) 60F10; 35R60 (Secondary)

arXiv:2203.02654 [pdf, other]

A Large-scale Comprehensive Dataset and Copy-overlap Aware Evaluation Protocol for Segment-level Video Copy Detection

Authors: Sifeng He, Xudong Yang, Chen Jiang, Gang Liang, Wei Zhang, Tan Pan, Qing Wang, Furong Xu, Chunguang Li, Jingxiong Liu, Hui Xu, Kaiming Huang, Yuan Cheng, Feng Qian, Xiaobo Zhang, Lei Yang

Abstract: In this paper, we introduce VCSL (Video Copy Segment Localization), a new comprehensive segment-level annotated video copy dataset. Compared with existing copy detection datasets restricted by either video-level annotation or small scale, VCSL not only has two orders of magnitude more segment-level labelled data, with 160k realistic video copy pairs containing more than 280k localized copied segment pairs, but also covers a variety of video categories and a wide range of video duration. All the copied segments inside each collected video pair are manually extracted and accompanied by precisely annotated starting and ending timestamps. Alongside the dataset, we also propose a novel evaluation protocol that better measures the prediction accuracy of copy overlapping segments between a video pair and shows improved adaptability in different scenarios. By benchmarking several baseline and state-of-the-art segment-level video copy detection methods with the proposed dataset and evaluation metric, we provide a comprehensive analysis that uncovers the strengths and weaknesses of current approaches, hoping to open up promising directions for future work. The VCSL dataset, metric and benchmark codes are all publicly available at https://github.com/alipay/VCSL.

Submitted 16 June, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: Accepted by CVPR 2022. Codes are all publicly available at https://github.com/alipay/VCSL

arXiv:2202.11124 [pdf, other]

Learning with Free Object Segments for Long-Tailed Instance Segmentation

Authors: Cheng Zhang, Tai-Yu Pan, Tianle Chen, Jike Zhong, Wenjin Fu, Wei-Lun Chao

Abstract: One fundamental challenge in building an instance segmentation model for a large number of classes in complex scenes is the lack of training examples, especially for rare objects. In this paper, we explore the possibility of increasing the training examples without laborious data collection and annotation. We find that an abundance of instance segments can potentially be obtained freely from object-centric images, according to two insights: (i) an object-centric image usually contains one salient object in a simple background; (ii) objects from the same class often share similar appearances or similar contrasts to the background. Motivated by these insights, we propose a simple and scalable framework FreeSeg for extracting and leveraging these "free" object foreground segments to facilitate model training in long-tailed instance segmentation. Concretely, we investigate the similarity among object-centric images of the same class to propose candidate segments of foreground instances, followed by a novel ranking of segment quality. The resulting high-quality object segments can then be used to augment the existing long-tailed datasets, e.g., by copying and pasting the segments onto the original training images. Extensive experiments show that FreeSeg yields substantial improvements on top of strong baselines and achieves state-of-the-art accuracy for segmenting rare object categories.

Submitted 4 October, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

Comments: Accepted to ECCV 2022

arXiv:2202.07028 [pdf, other]

One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones

Authors: Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M. Sadler, Wei-Lun Chao, Yu Su

Abstract: We study the problem of developing autonomous agents that can follow human instructions to infer and perform a sequence of actions to complete the underlying task. Significant progress has been made in recent years, especially for tasks with short horizons. However, when it comes to long-horizon tasks with extended sequences of actions, an agent can easily ignore some instructions or get stuck in the middle of the long instructions and eventually fail the task. To address this challenge, we propose a model-agnostic milestone-based task tracker (M-TRACK) to guide the agent and monitor its progress. Specifically, we propose a milestone builder that tags the instructions with navigation and interaction milestones which the agent needs to complete step by step, and a milestone checker that systemically checks the agent's progress in its current milestone and determines when to proceed to the next. On the challenging ALFRED dataset, our M-TRACK leads to a notable 33% and 52% relative improvement in unseen success rate over two competitive base models.

Submitted 10 June, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

Comments: 10 pages, 5 figures. Accepted to CVPR 2022

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 15482-15491

arXiv:2201.06972 [pdf, other]

Representation Learning on Heterostructures via Heterogeneous Anonymous Walks

Authors: Xuan Guo, Pengfei Jiao, Ting Pan, Wang Zhang, Mengyu Jia, Danyang Shi, Wenjun Wang

Abstract: Capturing structural similarity has been a hot topic in the field of network embedding recently due to its great help in understanding the node functions and behaviors. However, existing works have paid much attention to learning structures on homogeneous networks, while the related study on heterogeneous networks is still a void. In this paper, we try to take the first step for representation learning on heterostructures, which is very challenging due to their highly diverse combinations of node types and underlying structures. To effectively distinguish diverse heterostructures, we first propose a theoretically guaranteed technique called heterogeneous anonymous walk (HAW) and its variant coarse HAW (CHAW). Then, we devise the heterogeneous anonymous walk embedding (HAWE) and its variant coarse HAWE in a data-driven manner to circumvent using an extremely large number of possible walks and train embeddings by predicting occurring walks in the neighborhood of each node. Finally, we design and apply extensive and illustrative experiments on synthetic and real-world networks to build a benchmark on heterostructure learning and evaluate the effectiveness of our methods. The results demonstrate our methods achieve outstanding performance compared with both homogeneous and heterogeneous classic methods, and can be applied on large-scale networks.

Submitted 18 January, 2022; originally announced January 2022.

Comments: 13 pages, 6 figures, 5 tables

MSC Class: 68T30 ACM Class: J.4.3

arXiv:2111.10733 [pdf, other]

On the DLM/FD methods for simulating neutrally buoyant swimmer motion in non-Newtonian shear thinning fluids

Authors: Ang Li, Tsorng-Whay Pan, Roland Glowinski

Abstract: In this article we discuss the generalization of a Lagrange multiplier based fictitious domain (DLM/FD) method to simulating the motion of neutrally buoyant particles of non-symmetric shape in non-Newtonian shear thinning fluids. Numerical solutions of steady Poiseuille flow of non-Newtonian shear thinning fluids are compared with the exact solutions in a two-dimensional channel. Concerning a self-propelled swimmer formed by two disks, the effect of shear thinning makes the swimmer move faster and decreases the critical Reynolds number (for the moving direction changing to the opposite one) when decreasing the value of the power index in the Carreau-Bird model.

Submitted 20 November, 2021; originally announced November 2021.

Comments: 24 pages, 15 figures, and 2 tables

arXiv:2109.05156 [pdf]

doi 10.1016/j.cemconres.2022.106785

Effect of shaping plate apparatus on mechanical properties of 3D printed cement-based materials: Experimental and numerical studies

Authors: Tinghong Pan, Huaijin Teng, Hengcheng Liao, Yaqing Jiang, Chunxiang Qian, Yu Wang

Abstract: Precisely controlling the shape of the printed-layers, eliminating the curved sides and internal stress concentration, and increasing the mechanical properties are essential to guarantee the quality of 3D printed cement-based structures. This work aims at achieving the above-mentioned targets through a specially designed shaping plate apparatus. The pressure (stress) distribution in the printed structure with a shaping plate apparatus (SP-3DPC), and the cross-sectional shape, microstructure and mechanical properties of SP-3DPC were systematically investigated. Results indicate that using the shaping plate apparatus may slightly reduce the printing speed, but it can effectively constrain the free expansion of extrudate, control its cross-sectional geometry, and improve the surface finish quality and mechanical properties of the printed structure. This study provides a theoretical basis and technical guidance for the design and application of the shaping plate apparatus.

Submitted 23 March, 2022; v1 submitted 10 September, 2021; originally announced September 2021.

Comments: 41 pages, 23 figures

Journal ref: Cement and Concrete Research (2022)

arXiv:2109.05154 [pdf]

doi 10.1016/j.conbuildmat.2022.127151

Interlayer bonding investigation of 3D printing cementitious materials with fluidity retaining polycarboxylate superplasticizer and high dispersion polycarboxylate superplasticizer

Authors: Tinghong Pan, Yaqing Jiang

Abstract: Special requirements exist for the rheological properties and time varying characteristics of 3D printing cementitious materials (3DPC). In this study, high dispersion polycarboxylate superplasticizer (HD PC) and fluidity retaining polycarboxylate superplasticizer (FR PC) were used to control the rheological behaviors of 3DPC. The correlation of the time interval, the time varying characteristics of the rheological properties and the interlayer bonding strength were investigated. The results indicated that FR PC improved the fluidity retention ability and thixotropy of the fresh pastes. The thixotropic hysteresis loop area and reflocculation rate (Rthix) of FR PC were 101.9% and 80.4% higher than those of HD PC, respectively. Furthermore, the FR PC polymer has a positive effect on the interlayer bonding and may reduce the negative effect caused by extending the time interval. During the time interval of 20 s to 30 min, the interlayer bonding strength with FR PC decreases by 14.1%, while that with HD PC decreases by 50.0%.

Submitted 23 March, 2022; v1 submitted 10 September, 2021; originally announced September 2021.

Comments: 41 pages, 19 figures

Journal ref: Construction and Building Materials (2022)

arXiv:2108.04517 [pdf, other]

doi 10.1016/mri.2022.01.012.

Iterative Self-consistent Parallel Magnetic Resonance Imaging Reconstruction based on Nonlocal Low-Rank Regularization

Authors: Ting Pan, Jizhong Duan, Junfeng Wang, Yu Liu

Abstract: Iterative self-consistent parallel imaging reconstruction (SPIRiT) is an effective self-calibrated reconstruction model for parallel magnetic resonance imaging (PMRI). The joint L1 norm of wavelet coefficients and joint total variation (TV) regularization terms are incorporated into the SPIRiT model to improve the reconstruction performance. The simultaneous two-directional low-rankness (STDLR) in k-space data is incorporated into SPIRiT to realize improved reconstruction. Recent methods have exploited the nonlocal self-similarity (NSS) of images by imposing nonlocal low-rankness of similar patches to achieve a superior performance. To fully utilize both the NSS in Magnetic resonance (MR) images and calibration consistency in the k-space domain, we propose a nonlocal low-rank (NLR)-SPIRiT model by incorporating NLR regularization into the SPIRiT model. We apply the weighted nuclear norm (WNN) as a surrogate of the rank and employ the Nash equilibrium (NE) formulation and alternating direction method of multipliers (ADMM) to efficiently solve the NLR-SPIRiT model. The experimental results demonstrate the superior performance of NLR-SPIRiT over the state-of-the-art methods via three objective metrics and visual comparison.

Submitted 17 April, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

Journal ref: Magnetic Resonance Imaging, vol. 88, pp. 62-75, 2022

arXiv:2107.08379 [pdf, other]

A Survey on Role-Oriented Network Embedding

Authors: Pengfei Jiao, Xuan Guo, Ting Pan, Wang Zhang, Yulong Pei

Abstract: Recently, Network Embedding (NE) has become one of the most attractive research topics in machine learning and data mining. NE approaches have achieved promising performance in various graph mining tasks including link prediction and node clustering and classification. A wide variety of NE methods focus on the proximity of networks. They learn community-oriented embedding for each node, where the corresponding representations are similar if two nodes are closer to each other in the network. Meanwhile, there is another type of structural similarity, i.e., role-based similarity, which is usually complementary and completely different from the proximity. In order to preserve the role-based structural similarity, the problem of role-oriented NE is raised. However, compared to community-oriented NE problem, there are only a few role-oriented embedding approaches proposed recently. Although less explored, considering the importance of roles in analyzing networks and many applications that role-oriented NE can shed light on, it is necessary and timely to provide a comprehensive overview of existing role-oriented NE methods. In this review, we first clarify the differences between community-oriented and role-oriented network embedding. Afterwards, we propose a general framework for understanding role-oriented NE and a two-level categorization to better classify existing methods. Then, we select some representative methods according to the proposed categorization and briefly introduce them by discussing their motivation, development and differences.
Moreover, we conduct comprehensive experiments to empirically evaluate these methods on a variety of role-related tasks including node classification and clustering (role discovery), top-k similarity search and visualization using some widely used synthetic and real-world datasets...

Submitted 18 July, 2021; originally announced July 2021.

Comments: 20 pages, 9 figures, 5 tables

ACM Class: J.4.3

Showing 1–50 of 115 results for author: Pan, T