-
EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis
Authors:
Ruijie Yang,
Yan Zhu,
Peiyao Fu,
Yizhe Zhang,
Zhihua Wang,
Quanlin Li,
Pinghong Zhou,
Xian Yang,
Shuo Wang
Abstract:
Determining the necessity of resecting malignant polyps during colonoscopy screening is crucial for patient outcomes, yet challenging due to the time-consuming and costly nature of histopathology examination. While deep learning-based classification models have shown promise in achieving optical biopsy with endoscopic images, they often suffer from a lack of explainability. To overcome this limitation, we introduce EndoFinder, a content-based image retrieval framework to find the 'digital twin' polyp in the reference database given a newly detected polyp. The clinical semantics of the new polyp can be inferred by referring to the matched ones. EndoFinder pioneers a polyp-aware image encoder that is pre-trained on a large polyp dataset in a self-supervised way, merging masked image modeling with contrastive learning. This results in a generic embedding space ready for different downstream clinical tasks based on image retrieval. We validate the framework on polyp re-identification and optical biopsy tasks, with extensive experiments demonstrating that EndoFinder not only achieves explainable diagnostics but also matches the performance of supervised classification models. EndoFinder's reliance on image retrieval has the potential to support diverse downstream decision-making tasks during real-time colonoscopy procedures.
Submitted 16 July, 2024;
originally announced July 2024.
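The retrieval-based inference described in the EndoFinder abstract above (find the closest reference polyps, then infer the new polyp's semantics from the matches) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional embeddings, the case identifiers, the label names, the cosine-similarity metric, and the majority vote over the top-k matches are all assumptions made for illustration. In the actual framework the embeddings would come from the pre-trained polyp-aware encoder.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_and_vote(query, reference, k=3):
    """Return (predicted_label, top_k_matches) for a query embedding.

    `reference` is a list of (case_id, embedding, label) tuples.
    Each match is a (similarity, case_id, label) tuple, most similar first.
    """
    scored = sorted(
        ((cosine_similarity(query, emb), case_id, label)
         for case_id, emb, label in reference),
        reverse=True,
    )
    top_k = scored[:k]
    # Majority vote over the labels of the retrieved cases (an assumed rule).
    votes = Counter(label for _, _, label in top_k)
    predicted = votes.most_common(1)[0][0]
    return predicted, top_k


# Hypothetical toy reference database; real embeddings would be encoder outputs.
reference_db = [
    ("ref-001", [0.9, 0.1, 0.0], "adenoma"),
    ("ref-002", [0.8, 0.2, 0.1], "adenoma"),
    ("ref-003", [0.1, 0.9, 0.2], "hyperplastic"),
    ("ref-004", [0.0, 0.8, 0.3], "hyperplastic"),
]
new_polyp = [0.85, 0.15, 0.05]

label, matches = retrieve_and_vote(new_polyp, reference_db, k=3)
print(label)
for similarity, case_id, case_label in matches:
    print(f"{case_id}  {case_label}  similarity={similarity:.3f}")
```

Returning the matched cases alongside the predicted label is what makes this style of prediction inspectable: a reader can look at the retrieved reference cases rather than only at a class score.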
-
Think out Loud: Emotion Deducing Explanation in Dialogues
Authors:
Jiangnan Li,
Zheng Lin,
Lanrui Wang,
Qingyi Si,
Yanan Cao,
Mo Yu,
Peng Fu,
Weiping Wang,
Jie Zhou
Abstract:
Humans convey emotions through daily dialogues, making emotion understanding a crucial step of affective intelligence. To understand emotions in dialogues, machines are asked to recognize the emotion of an utterance (Emotion Recognition in Dialogues, ERD) and then, based on that emotion, to find the utterances that cause it (Emotion Cause Extraction in Dialogues, ECED). This setting requires ERD first and then ECED, ignoring the mutual complementarity between emotion and cause. To fix this, some new tasks have been proposed to extract them simultaneously. Although current research on these tasks has made excellent progress, simply identifying emotion-related factors by classification modeling fails to capture, in an explainable way, the specific thinking process by which causes stimulate the emotion. This thinking process, especially as reflected in the reasoning ability of Large Language Models (LLMs), is under-explored. To this end, we propose a new task, "Emotion Deducing Explanation in Dialogues" (EDEN). EDEN recognizes emotions and causes through explicit reasoning. That is, models need to generate an explanation text that first summarizes the causes, then analyzes the inner activities of the speakers triggered by the causes using common sense, and finally guesses the emotion accordingly. To support the study of EDEN, we construct two EDEN datasets by human effort, based on existing ECED resources. We further evaluate different models on EDEN and find that LLMs are more competent than conventional PLMs. Moreover, EDEN can help LLMs achieve better recognition of emotions and causes, which opens a new research direction of explainable emotion understanding in dialogues.
Submitted 7 June, 2024;
originally announced June 2024.
-
Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning
Authors:
Naibin Gu,
Peng Fu,
Xiyu Liu,
Bowen Shen,
Zheng Lin,
Weiping Wang
Abstract:
Parameter-efficient fine-tuning (PEFT) has emerged as the predominant technique for fine-tuning in the era of large language models. However, existing PEFT methods still have inadequate training efficiency. Firstly, the utilization of large-scale foundation models during the training process is excessively redundant for certain fine-tuning tasks. Secondly, as the model size increases, the growth in trainable parameters of empirically added PEFT modules becomes non-negligible and redundant, leading to inefficiency. To achieve task-specific efficient fine-tuning, we propose the Light-PEFT framework, which includes two methods: Masked Early Pruning of the Foundation Model and Multi-Granularity Early Pruning of PEFT. The Light-PEFT framework allows for the simultaneous estimation of redundant parameters in both the foundation model and PEFT modules during the early stage of training. These parameters can then be pruned for more efficient fine-tuning. We validate our approach on GLUE, SuperGLUE, QA tasks, and various models. With Light-PEFT, parameters of the foundation model can be pruned by over 40%, while keeping trainable parameters at only 25% of those of the original PEFT method. Compared to utilizing the PEFT method directly, Light-PEFT achieves training and inference speedup, reduces memory usage, and maintains comparable performance and the plug-and-play feature of PEFT.
Submitted 6 June, 2024;
originally announced June 2024.
-
Towards Generalizable Multi-Object Tracking
Authors:
Zheng Qin,
Le Wang,
Sanping Zhou,
Panpan Fu,
Gang Hua,
Wei Tang
Abstract:
Multi-Object Tracking (MOT) encompasses various tracking scenarios, each characterized by unique traits. Effective trackers should demonstrate a high degree of generalizability across diverse scenarios. However, existing trackers struggle to accommodate all aspects or necessitate hypothesis and experimentation to customize the association information (motion and/or appearance) for a given scenario, leading to narrowly tailored solutions with limited generalizability. In this paper, we investigate the factors that influence trackers' generalization to different scenarios and concretize them into a set of tracking scenario attributes to guide the design of more generalizable trackers. Furthermore, we propose a point-wise to instance-wise relation framework for MOT, i.e., GeneralTrack, which can generalize across diverse scenarios while eliminating the need to balance motion and appearance. Thanks to its superior generalizability, our proposed GeneralTrack achieves state-of-the-art performance on multiple benchmarks and demonstrates the potential for domain generalization. Code is available at https://github.com/qinzheng2000/GeneralTrack.git
Submitted 1 June, 2024;
originally announced June 2024.
-
BeACONS: A Blockchain-enabled Authentication and Communications Network for Scalable IoV
Authors:
Qi Shi,
Jingyi Sun,
Hanwei Fu,
Peizhe Fu,
Jiayuan Ma,
Hao Xu,
Erwu Liu
Abstract:
This paper introduces a novel blockchain-enabled authentication and communications network for scalable Internet of Vehicles, which aims to bolster security and confidentiality, diminish communications latency, and reduce dependence on centralised infrastructures like Certificate Authorities and Public Key Infrastructures by leveraging Blockchain-enabled Domain Name Services and Blockchain-enabled Mutual Authentication. The proposed network is structured into a primary layer, consisting of Road Side Units and edge servers as servers of Blockchain-enabled Domain Name Services for managing inter-vehicle communications identities, and a sub-layer within each vehicle for intra-vehicle communications via the Blockchain-enabled Mutual Authentication Protocol. This design facilitates secure connections across vehicles by coordinating between the layers, significantly improving communications security and efficiency. This study also evaluates Road Side Unit availability against the random distribution of Road Side Units along the route of different vehicles. The proposed model presents a novel pathway towards a decentralised, secure, and efficient Internet of Vehicles ecosystem, contributing to the advancement of autonomous and trustworthy vehicular networks.
Submitted 14 May, 2024;
originally announced May 2024.
-
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
Authors:
Chen Duan,
Pei Fu,
Shan Guo,
Qianyi Jiang,
Xiaoming Wei
Abstract:
In recent years, text-image joint pre-training techniques have shown promising results in various tasks. However, in Optical Character Recognition (OCR) tasks, aligning text instances with their corresponding text regions in images poses a challenge, as it requires effective alignment between text and OCR-Text (referring to the text in images as OCR-Text to distinguish from the text in natural language) rather than a holistic understanding of the overall image content. In this paper, we propose a new pre-training method called OCR-Text Destylization Modeling (ODM) that transfers diverse styles of text found in images to a uniform style based on the text prompt. With ODM, we achieve better alignment between text and OCR-Text and enable pre-trained models to adapt to the complex and diverse styles of scene text detection and spotting tasks. Additionally, we have designed a new labeling generation method specifically for ODM and combined it with our proposed Text-Controller module to address the challenge of annotation costs in OCR tasks, allowing a larger amount of unlabeled data to participate in pre-training. Extensive experiments on multiple public datasets demonstrate that our method significantly improves performance and outperforms current pre-training methods in scene text detection and spotting tasks. Code is available at https://github.com/PriNing/ODM.
Submitted 17 April, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Are Large Language Models Table-based Fact-Checkers?
Authors:
Hangwen Zhang,
Qingyi Si,
Peng Fu,
Zheng Lin,
Weiping Wang
Abstract:
Table-based Fact Verification (TFV) aims to extract the entailment relation between statements and structured tables. Existing TFV methods based on small-scale models suffer from insufficient labeled data and weak zero-shot ability. Recently, Large Language Models (LLMs) have attracted considerable attention across research fields. They have shown powerful zero-shot and in-context learning abilities on several NLP tasks, but their potential on TFV is still unknown. In this work, we conduct a preliminary study of whether LLMs are table-based fact-checkers. In detail, we design diverse prompts to explore how in-context learning can help LLMs in TFV, i.e., their zero-shot and few-shot TFV capability. In addition, we carefully design and construct TFV instructions to study the performance gain brought by the instruction tuning of LLMs. Experimental results demonstrate that LLMs can achieve acceptable results on zero-shot and few-shot TFV with prompt engineering, while instruction tuning can stimulate the TFV capability significantly. We also make some valuable findings about the format of zero-shot prompts and the number of in-context examples. Finally, we analyze some possible directions for improving the accuracy of TFV via LLMs, which is beneficial to further research on table reasoning.
Submitted 4 February, 2024;
originally announced February 2024.
-
Object Attribute Matters in Visual Question Answering
Authors:
Peize Li,
Qingyi Si,
Peng Fu,
Zheng Lin,
Yan Wang
Abstract:
Visual question answering is a multimodal task that requires the joint comprehension of visual and textual information. However, integrating visual and textual semantics solely through attention layers is insufficient to comprehensively understand and align information from both modalities. Intuitively, object attributes can naturally serve as a bridge to unify them, which has been overlooked in previous research. In this paper, we propose a novel VQA approach from the perspective of utilizing object attributes, aiming to achieve better object-level visual-language alignment and multimodal scene understanding. Specifically, we design an attribute fusion module and a contrastive knowledge distillation module. The attribute fusion module constructs a multimodal graph neural network to fuse attributes and visual features through message passing. The enhanced object-level visual features contribute to solving fine-grained problems such as counting questions. The better object-level visual-language alignment aids in understanding multimodal scenes, thereby improving the model's robustness. Furthermore, to augment scene understanding and out-of-distribution performance, the contrastive knowledge distillation module introduces a series of implicit knowledge. We distill knowledge into attributes through contrastive loss, which further strengthens the representation learning of attribute features and facilitates visual-linguistic alignment. Extensive experiments on six datasets, COCO-QA, VQAv2, VQA-CPv2, VQA-CPv1, VQAvs and TDIUC, show the superiority of the proposed method.
Submitted 20 December, 2023;
originally announced January 2024.
-
A Spatio-Temporal Graph Convolutional Network for Gesture Recognition from High-Density Electromyography
Authors:
Wenjuan Zhong,
Yuyang Zhang,
Peiwen Fu,
Wenxuan Xiong,
Mingming Zhang
Abstract:
Accurate hand gesture prediction is crucial for effective upper-limb prosthetic control. Owing to the high flexibility and multiple degrees of freedom exhibited by human hands, there has been a growing interest in integrating deep networks with high-density surface electromyography (HD-sEMG) grids to enhance gesture recognition capabilities. However, many existing methods fall short of fully exploiting the specific spatial topology and temporal dependencies present in HD-sEMG data. Additionally, these studies are often limited to a small number of gestures and lack generality. Hence, this study introduces a novel gesture recognition method, named STGCN-GR, which leverages spatio-temporal graph convolution networks for HD-sEMG-based human-machine interfaces. Firstly, we construct muscle networks based on functional connectivity between channels, creating a graph representation of HD-sEMG recordings. Subsequently, a temporal convolution module is applied to capture the temporal dependencies in the HD-sEMG series and a spatial graph convolution module is employed to effectively learn the intrinsic spatial topology information among distinct HD-sEMG channels. We evaluate our proposed model on a public HD-sEMG dataset comprising a substantial number of gestures (i.e., 65). Our results demonstrate the capability of the STGCN-GR method, which achieves an accuracy of 91.07% in predicting gestures and surpasses state-of-the-art deep learning methods applied to the same dataset.
Submitted 1 December, 2023;
originally announced December 2023.
-
Predicting Continuous Locomotion Modes via Multidimensional Feature Learning from sEMG
Authors:
Peiwen Fu,
Wenjuan Zhong,
Yuyang Zhang,
Wenxuan Xiong,
Yuzhou Lin,
Yanlong Tai,
Lin Meng,
Mingming Zhang
Abstract:
Walking-assistive devices require adaptive control methods to ensure smooth transitions between various modes of locomotion. For this purpose, detecting human locomotion modes (e.g., level walking or stair ascent) in advance is crucial for improving the intelligence and transparency of such robotic systems. This study proposes Deep-STF, a unified end-to-end deep learning model designed for integrated feature extraction in spatial, temporal, and frequency dimensions from surface electromyography (sEMG) signals. Our model enables accurate and robust continuous prediction of nine locomotion modes and 15 transitions at varying prediction time intervals, ranging from 100 to 500 ms. In addition, we introduce the concept of 'stable prediction time' as a distinct metric to quantify prediction efficiency. This term refers to the duration during which consistent and accurate predictions of mode transitions are made, measured from the time of the fifth correct prediction to the occurrence of the critical event leading to the task transition. This distinction between stable prediction time and prediction time is vital as it underscores our focus on the precision and reliability of mode transition predictions. Experimental results showcase Deep-STF's cutting-edge prediction performance across diverse locomotion modes and transitions, relying solely on sEMG data. When forecasting 100 ms ahead, Deep-STF surpassed CNN and other machine learning techniques, achieving an average prediction accuracy of 96.48%. Even with an extended 500 ms prediction horizon, accuracy only marginally decreased to 93.00%. The averaged stable prediction times for detecting upcoming transitions spanned from 28.15 to 372.21 ms across the 100-500 ms time advances.
Submitted 13 November, 2023;
originally announced November 2023.
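The 'stable prediction time' metric defined in the Deep-STF abstract above can be made concrete with a small sketch. The abstract does not say whether the five correct predictions must be consecutive, so this sketch assumes they are: it takes the final unbroken run of correct predictions before the critical event and measures from the fifth prediction in that run. The function name, the 50 ms prediction step, and the mode labels are hypothetical, and this is not the authors' code.

```python
def stable_prediction_time(times_ms, predictions, target, event_time_ms,
                           n_required=5):
    """Time (ms) from the n-th correct prediction of the final unbroken
    run of correct predictions to the critical event.

    Assumption: 'consistent' means the run of correct predictions is not
    interrupted before the event. Returns 0.0 if that run contains fewer
    than `n_required` predictions.
    """
    # Keep only predictions issued at or before the critical event.
    pairs = [(t, p) for t, p in zip(times_ms, predictions)
             if t <= event_time_ms]

    # Walk backwards to find where the final unbroken correct run starts.
    run_start = len(pairs)
    while run_start > 0 and pairs[run_start - 1][1] == target:
        run_start -= 1
    run = pairs[run_start:]

    if len(run) < n_required:
        return 0.0
    return float(event_time_ms - run[n_required - 1][0])


# Hypothetical stream: one prediction every 50 ms, transition event at 450 ms.
times = [0, 50, 100, 150, 200, 250, 300, 350, 400]
preds = ["walk", "stair", "walk", "stair", "stair",
         "stair", "stair", "stair", "stair"]

spt = stable_prediction_time(times, preds, target="stair", event_time_ms=450)
print(spt)
```

In this toy stream the isolated correct prediction at 50 ms does not count, because it is followed by an incorrect one; only the unbroken run starting at 150 ms contributes.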
-
Revisiting the Knowledge Injection Frameworks
Authors:
Peng Fu,
Yiming Zhang,
Haobo Wang,
Weikang Qiu,
Junbo Zhao
Abstract:
In recent years, large language models (LLMs), such as GPTs, have attained great impact worldwide. However, how to adapt these LLMs to better suit vertical domain-specific tasks by utilizing external knowledge remains an open problem. Indeed, a few works have emerged along this line, most of which rely on an alignment heuristic that is built to inject the corresponding knowledge tuple into the associated text sample.
However, despite the promise, we identify a pivotal problem that is ubiquitous in this line of work. Simply put, we find that injecting unaligned (i.e., random) knowledge tuples into the LLMs achieves comparable (and sometimes better) results than injecting the aligned knowledge. We therefore conduct a thorough investigation of this frustrating finding on a variety of related prior work and further provide a chain of potential interpretations for the phenomenon. Based on all that, we offer a simple remediated technique. Briefly, the core of this technique is an emphasis on pruning and purifying the external knowledge base to be injected into LLMs. Finally, we show that integrating this technique into most (if not all) knowledge injection frameworks and recent LLMs overcomes the aforementioned sanity problem and further pushes the boundary of the performance of domain-adaptive LLMs.
Submitted 2 November, 2023;
originally announced November 2023.
-
Knowledge Extraction and Distillation from Large-Scale Image-Text Colonoscopy Records Leveraging Large Language and Vision Models
Authors:
Shuo Wang,
Yan Zhu,
Xiaoyuan Luo,
Zhiwei Yang,
Yizhe Zhang,
Peiyao Fu,
Manning Wang,
Zhijian Song,
Quanlin Li,
Pinghong Zhou,
Yike Guo
Abstract:
The development of artificial intelligence systems for colonoscopy analysis often necessitates expert-annotated image datasets. However, limitations in dataset size and diversity impede model performance and generalisation. Image-text colonoscopy records from routine clinical practice, comprising millions of images and text reports, serve as a valuable data source, though annotating them is labour-intensive. Here we leverage recent advancements in large language and vision models and propose EndoKED, a data mining paradigm for deep knowledge extraction and distillation. EndoKED automates the transformation of raw colonoscopy records into image datasets with pixel-level annotation. We validate EndoKED using multi-centre datasets of raw colonoscopy records (~1 million images), demonstrating its superior performance in training polyp detection and segmentation models. Furthermore, the EndoKED pre-trained vision backbone enables data-efficient and generalisable learning for optical biopsy, achieving expert-level performance in both retrospective and prospective validation.
Submitted 17 October, 2023;
originally announced October 2023.
-
Enhancing cardiovascular risk prediction through AI-enabled calcium-omics
Authors:
Ammar Hoori,
Sadeer Al-Kindi,
Tao Hu,
Yingnan Song,
Hao Wu,
Juhwan Lee,
Nour Tashtish,
Pingfu Fu,
Robert Gilkeson,
Sanjay Rajagopalan,
David L. Wilson
Abstract:
Background. Coronary artery calcium (CAC) is a powerful predictor of major adverse cardiovascular events (MACE). Traditional Agatston score simply sums the calcium, albeit in a non-linear way, leaving room for improved calcification assessments that will more fully capture the extent of disease.
Objective. To determine if AI methods using detailed calcification features (i.e., calcium-omics) can improve MACE prediction.
Methods. We investigated additional features of calcification including assessment of mass, volume, density, spatial distribution, territory, etc. We used a Cox model with elastic-net regularization on 2457 CT calcium score (CTCS) enriched for MACE events obtained from a large no-cost CLARIFY program (ClinicalTrials.gov Identifier: NCT04075162). We employed sampling techniques to enhance model training. We also investigated Cox models with selected features to identify explainable high-risk characteristics.
Results. Our proposed calcium-omics model with modified synthetic down sampling and up sampling gave C-index (80.5%/71.6%) and two-year AUC (82.4%/74.8%) for (80:20, training/testing), respectively (sampling was applied to the training set only). Results compared favorably to Agatston which gave C-index (71.3%/70.3%) and AUC (71.8%/68.8%), respectively. Among calcium-omics features, numbers of calcifications, LAD mass, and diffusivity (a measure of spatial distribution) were important determinants of increased risk, with dense calcification (>1000HU) associated with lower risk. The calcium-omics model reclassified 63% of MACE patients to the high risk group in a held-out test. The categorical net-reclassification index was NRI=0.153.
Conclusions. AI analysis of coronary calcification can lead to improved results as compared to Agatston scoring. Our findings suggest the utility of calcium-omics in improved prediction of risk.
Submitted 23 August, 2023;
originally announced August 2023.
-
Towards an induction principle for nested data types
Authors:
Peng Fu,
Peter Selinger
Abstract:
A well-known problem in the theory of dependent types is how to handle so-called nested data types. These data types are difficult to program and to reason about in total dependently typed languages such as Agda and Coq. In particular, it is not easy to derive a canonical induction principle for such types. Working towards a solution to this problem, we introduce dependently typed folds for nested data types. Using the nested data type Bush as a guiding example, we show how to derive its dependently typed fold and induction principle. We also discuss the relationship between dependently typed folds and the more traditional higher-order folds.
Submitted 16 June, 2023;
originally announced June 2023.
-
Marine Microalgae Detection in Microscopy Images: A New Dataset
Authors:
Shizheng Zhou,
Juntao Jiang,
Xiaohan Hong,
Yajun Fang,
Yan Hong,
Pengcheng Fu
Abstract:
Marine microalgae are widespread in the ocean and play a crucial role in the ecosystem. Automatic identification and localization of marine microalgae in microscopy images would help establish marine ecological environment monitoring and water quality evaluation systems. A new dataset for marine microalgae detection is proposed in this paper. Six classes of microalgae commonly found in the ocean (Bacillariophyta, Chlorella pyrenoidosa, Platymonas, Dunaliella salina, Chrysophyta, Symbiodiniaceae) are microscopically imaged in real-time. Images of Symbiodiniaceae in three physiological states known as normal, bleaching, and translating are also included. We annotated these images with bounding boxes using Labelme software and split them into the training and testing sets. The total number of images in the dataset is 937 and all the objects in these images were annotated. The total number of annotated objects is 4201. The training set contains 537 images and the testing set contains 430 images. Baselines of different object detection algorithms are trained, validated and tested on this dataset. The dataset can be accessed via tianchi.aliyun.com/competition/entrance/532036/information.
Submitted 14 November, 2022;
originally announced November 2022.
-
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
Authors:
Qingyi Si,
Yuanxin Liu,
Zheng Lin,
Peng Fu,
Weiping Wang
Abstract:
Despite the excellent performance of vision-language pre-trained models (VLPs) on conventional VQA tasks, they still suffer from two problems: First, VLPs tend to rely on language biases in datasets and fail to generalize to out-of-distribution (OOD) data. Second, they are inefficient in terms of memory footprint and computation. Although promising progress has been made on both problems, most existing works tackle them independently. To facilitate the application of VLPs to VQA tasks, it is imperative to jointly study VLP compression and OOD robustness, which, however, has not yet been explored. This paper investigates whether a VLP can be compressed and debiased simultaneously by searching sparse and robust subnetworks. To this end, we systematically study the design of a training and compression pipeline to search the subnetworks, as well as the assignment of sparsity to different modality-specific modules. Our experiments involve 3 VLPs, 2 compression methods, 4 training methods, 2 datasets and a range of sparsity levels and random seeds. Our results show that there indeed exist sparse and robust subnetworks, which are competitive with the debiased full VLP and clearly outperform the debiasing SoTAs with fewer parameters on OOD datasets VQA-CP v2 and VQA-VS. The code can be found at https://github.com/PhoebusSi/Compress-Robust-VQA.
Submitted 11 October, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Question-Interlocutor Scope Realized Graph Modeling over Key Utterances for Dialogue Reading Comprehension
Authors:
Jiangnan Li,
Mo Yu,
Fandong Meng,
Zheng Lin,
Peng Fu,
Weiping Wang,
Jie Zhou
Abstract:
In this work, we focus on dialogue reading comprehension (DRC), a task extracting answer spans for questions from dialogues. Dialogue context modeling in DRC is tricky due to complex speaker information and noisy dialogue context. To solve the two problems, previous research proposes two self-supervised tasks respectively: guessing who a randomly masked speaker is according to the dialogue and predicting which utterance in the dialogue contains the answer. Although these tasks are effective, there are still urging problems: (1) randomly masking speakers regardless of the question cannot map the speaker mentioned in the question to the corresponding speaker in the dialogue, and ignores the speaker-centric nature of utterances. This leads to wrong answer extraction from utterances in unrelated interlocutors' scopes; (2) the single utterance prediction, preferring utterances similar to the question, is limited in finding answer-contained utterances not similar to the question. To alleviate these problems, we first propose a new key utterances extracting method. It performs prediction on the unit formed by several contiguous utterances, which can realize more answer-contained utterances. Based on utterances in the extracted units, we then propose Question-Interlocutor Scope Realized Graph (QuISG) modeling. As a graph constructed on the text of utterances, QuISG additionally involves the question and question-mentioning speaker names as nodes. To realize interlocutor scopes, speakers in the dialogue are connected with the words in their corresponding utterances. Experiments on the benchmarks show that our method can achieve better and competitive results against previous works.
Submitted 26 October, 2022;
originally announced October 2022.
-
A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models
Authors:
Yuanxin Liu,
Fandong Meng,
Zheng Lin,
Jiangnan Li,
Peng Fu,
Yanan Cao,
Weiping Wang,
Jie Zhou
Abstract:
Despite the remarkable success of pre-trained language models (PLMs), they still face two challenges: First, large-scale PLMs are inefficient in terms of memory footprint and computation. Second, on the downstream tasks, PLMs tend to rely on the dataset bias and struggle to generalize to out-of-distribution (OOD) data. In response to the efficiency problem, recent studies show that dense PLMs can be replaced with sparse subnetworks without hurting the performance. Such subnetworks can be found in three scenarios: 1) the fine-tuned PLMs, 2) the raw PLMs and then fine-tuned in isolation, and even inside 3) PLMs without any parameter fine-tuning. However, these results are only obtained in the in-distribution (ID) setting. In this paper, we extend the study on PLMs subnetworks to the OOD setting, investigating whether sparsity and robustness to dataset bias can be achieved simultaneously. To this end, we conduct extensive experiments with the pre-trained BERT model on three natural language understanding (NLU) tasks. Our results demonstrate that sparse and robust subnetworks (SRNets) can consistently be found in BERT, across the aforementioned three scenarios, using different training and compression methods. Furthermore, we explore the upper bound of SRNets using the OOD information and show that there exist sparse and almost unbiased BERT subnetworks. Finally, we present 1) an analytical study that provides insights on how to promote the efficiency of SRNets searching process and 2) a solution to improve subnetworks' performance at high sparsity. The code is available at https://github.com/llyx97/sparse-and-robust-PLM.
Submitted 11 October, 2022;
originally announced October 2022.
-
Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA
Authors:
Qingyi Si,
Fandong Meng,
Mingyu Zheng,
Zheng Lin,
Yuanxin Liu,
Peng Fu,
Yanan Cao,
Weiping Wang,
Jie Zhou
Abstract:
Visual Question Answering (VQA) models are prone to learn the shortcut solution formed by dataset biases rather than the intended solution. To evaluate the VQA models' reasoning ability beyond shortcut learning, the VQA-CP v2 dataset introduces a distribution shift between the training and test set given a question type. In this way, the model cannot use the training set shortcut (from question type to answer) to perform well on the test set. However, VQA-CP v2 only considers one type of shortcut and thus still cannot guarantee that the model relies on the intended solution rather than a solution specific to this shortcut. To overcome this limitation, we propose a new dataset that considers varying types of shortcuts by constructing different distribution shifts in multiple OOD test sets. In addition, we overcome the three troubling practices in the use of VQA-CP v2, e.g., selecting models using OOD test sets, and further standardize OOD evaluation procedure. Our benchmark provides a more rigorous and comprehensive testbed for shortcut learning in VQA. We benchmark recent methods and find that methods specifically designed for particular shortcuts fail to simultaneously generalize to our varying OOD test sets. We also systematically study the varying shortcuts and provide several valuable findings, which may promote the exploration of shortcut learning in VQA.
Submitted 10 October, 2022;
originally announced October 2022.
-
Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning
Authors:
Qingyi Si,
Yuanxin Liu,
Fandong Meng,
Zheng Lin,
Peng Fu,
Yanan Cao,
Weiping Wang,
Jie Zhou
Abstract:
Models for Visual Question Answering (VQA) often rely on the spurious correlations, i.e., the language priors, that appear in the biased samples of training set, which make them brittle against the out-of-distribution (OOD) test data. Recent methods have achieved promising progress in overcoming this problem by reducing the impact of biased samples on model training. However, these models reveal a trade-off that the improvements on OOD data severely sacrifice the performance on the in-distribution (ID) data (which is dominated by the biased samples). Therefore, we propose a novel contrastive learning approach, MMBS, for building robust VQA models by Making the Most of Biased Samples. Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlation from the original training samples and explore several strategies to use the constructed positive samples for training. Instead of undermining the importance of biased samples in model training, our approach precisely exploits the biased samples for unbiased information that contributes to reasoning. The proposed method is compatible with various VQA backbones. We validate our contributions by achieving competitive performance on the OOD dataset VQA-CP v2 while preserving robust performance on the ID dataset VQA v2.
Submitted 10 October, 2022;
originally announced October 2022.
-
Novel Radiomic Measurements of Tumor-Associated Vasculature Morphology on Clinical Imaging as a Biomarker of Treatment Response in Multiple Cancers
Authors:
Nathaniel Braman,
Prateek Prasanna,
Kaustav Bera,
Mehdi Alilou,
Mohammadhadi Khorrami,
Patrick Leo,
Maryam Etesami,
Manasa Vulchi,
Paulette Turk,
Amit Gupta,
Prantesh Jain,
Pingfu Fu,
Nathan Pennell,
Vamsidhar Velcheti,
Jame Abraham,
Donna Plecha,
Anant Madabhushi
Abstract:
Purpose: Tumor-associated vasculature differs from healthy blood vessels by its chaotic architecture and twistedness, which promotes treatment resistance. Measurable differences in these attributes may help stratify patients by likely benefit of systemic therapy (e.g. chemotherapy). In this work, we present a new category of radiomic biomarkers called quantitative tumor-associated vasculature (QuanTAV) features, and demonstrate their ability to predict response and survival across multiple cancers, imaging modalities, and treatment regimens.
Experimental Design: We segmented tumor vessels and computed mathematical measurements of twistedness and organization on routine pre-treatment radiology (CT or contrast-enhanced MRI) from 558 patients, who received one of four first-line chemotherapy-based therapeutic intervention strategies for breast (n=371) or non-small cell lung cancer (NSCLC, n=187).
Results: Across 4 chemotherapy-based treatment strategies, classifiers of QuanTAV measurements significantly (p<.05) predicted response in held out testing cohorts alone (AUC=0.63-0.71) and increased AUC by 0.06-0.12 when added to models of significant clinical variables alone. QuanTAV risk scores were prognostic of recurrence free survival in treatment cohorts chemotherapy for breast cancer (p=0.002, HR=1.25, 95% CI 1.08-1.44, C-index=.66) and chemoradiation for NSCLC (p=0.039, HR=1.28, 95% CI 1.01-1.62, C-index=0.66). Categorical QuanTAV risk groups were independently prognostic among all treatment groups, including NSCLC patients receiving chemotherapy (p=0.034, HR=2.29, 95% CI 1.07-4.94, C-index=0.62).
Conclusions: Across these domains, we observed an association of vascular morphology on radiology with treatment outcome. Our findings suggest the potential of tumor-associated vasculature shape and structure as a prognostic and predictive biomarker for multiple cancers and treatments.
Submitted 5 October, 2022;
originally announced October 2022.
-
Transfer Learning and Vision Transformer based State-of-Health prediction of Lithium-Ion Batteries
Authors:
Pengyu Fu,
Liang Chu,
Zhuoran Hou,
Jincheng Hu,
Yanjun Huang,
Yuanjian Zhang
Abstract:
In recent years, significant progress has been made in transportation electrification. And lithium-ion batteries (LIB), as the main energy storage devices, have received widespread attention. Accurately predicting the state of health (SOH) can not only ease the anxiety of users about the battery life but also provide important information for the management of the battery. This paper presents a prediction method for SOH based on Vision Transformer (ViT) model. First, discrete charging data of a predefined voltage range is used as an input data matrix. Then, the cycle features of the battery are captured by the ViT which can obtain the global features, and the SOH is obtained by combining the cycle features with the full connection (FC) layer. At the same time, transfer learning (TL) is introduced, and the prediction model based on source task battery training is further fine-tuned according to the early cycle data of the target task battery to provide an accurate prediction. Experiments show that our method can obtain better feature expression compared with existing deep learning methods so that better prediction effect and transfer effect can be achieved.
Submitted 7 September, 2022;
originally announced September 2022.
-
A Transferable Intersection Reconstruction Network for Traffic Speed Prediction
Authors:
Pengyu Fu,
Liang Chu,
Zhuoran Hou,
Jincheng Hu,
Yanjun Huang,
Yuanjian Zhang
Abstract:
Traffic speed prediction is the key to many valuable applications, and it is also a challenging task because of its various influencing factors. Recent work attempts to obtain more information through various hybrid models, thereby improving the prediction accuracy. However, the spatial information acquisition schemes of these methods have two-level differentiation problems. Either the modeling is simple but contains little spatial information, or the modeling is complete but lacks flexibility. In order to introduce more spatial information on the basis of ensuring flexibility, this paper proposes IRNet (Transferable Intersection Reconstruction Network). First, this paper reconstructs the intersection into a virtual intersection with the same structure, which simplifies the topology of the road network. Then, the spatial information is subdivided into intersection information and sequence information of traffic flow direction, and spatiotemporal features are obtained through various models. Third, a self-attention mechanism is used to fuse spatiotemporal features for prediction. In the comparison experiment with the baseline, not only the prediction effect, but also the transfer performance has obvious advantages.
Submitted 22 July, 2022;
originally announced July 2022.
-
On the Lambek embedding and the category of product-preserving presheaves
Authors:
Peng Fu,
Kohei Kishida,
Neil J. Ross,
Peter Selinger
Abstract:
It is well-known that the category of presheaf functors is complete and cocomplete, and that the Yoneda embedding into the presheaf category preserves products. However, the Yoneda embedding does not preserve coproducts. It is perhaps less well-known that if we restrict the codomain of the Yoneda embedding to the full subcategory of limit-preserving functors, then this embedding preserves colimits, while still enjoying most of the other useful properties of the Yoneda embedding. We call this modified embedding the Lambek embedding. The category of limit-preserving functors is known to be a reflective subcategory of the category of all functors, i.e., there is a left adjoint for the inclusion functor. In the literature, the existence of this left adjoint is often proved non-constructively, e.g., by an application of Freyd's adjoint functor theorem. In this paper, we provide an alternative, more constructive proof of this fact. We first explain the Lambek embedding and why it preserves coproducts. Then we review some concepts from multi-sorted algebras and observe that there is a one-to-one correspondence between product-preserving presheaves and certain multi-sorted term algebras. We provide a construction that freely turns any presheaf functor into a product-preserving one, hence giving an explicit definition of the left adjoint functor of the inclusion. Finally, we sketch how to extend our method to prove that the subcategory of limit-preserving functors is also reflective.
Submitted 12 May, 2022;
originally announced May 2022.
-
Neutral Utterances are Also Causes: Enhancing Conversational Causal Emotion Entailment with Social Commonsense Knowledge
Authors:
Jiangnan Li,
Fandong Meng,
Zheng Lin,
Rui Liu,
Peng Fu,
Yanan Cao,
Weiping Wang,
Jie Zhou
Abstract:
Conversational Causal Emotion Entailment aims to detect causal utterances for a non-neutral targeted utterance from a conversation. In this work, we build conversations as graphs to overcome implicit contextual modelling of the original entailment style. Following the previous work, we further introduce the emotion information into graphs. Emotion information can markedly promote the detection of causal utterances whose emotion is the same as the targeted utterance. However, it is still hard to detect causal utterances with different emotions, especially neutral ones. The reason is that models are limited in reasoning causal clues and passing them between utterances. To alleviate this problem, we introduce social commonsense knowledge (CSK) and propose a Knowledge Enhanced Conversation graph (KEC). KEC propagates the CSK between two utterances. As not all CSK is emotionally suitable for utterances, we therefore propose a sentiment-realized knowledge selecting strategy to filter CSK. To process KEC, we further construct the Knowledge Enhanced Directed Acyclic Graph networks. Experimental results show that our method outperforms baselines and infers more causes with different emotions from the targeted utterance.
Submitted 7 May, 2022; v1 submitted 2 May, 2022;
originally announced May 2022.
-
Proto-Quipper with dynamic lifting
Authors:
Peng Fu,
Kohei Kishida,
Neil J. Ross,
Peter Selinger
Abstract:
Quipper is a functional programming language for quantum computing. Proto-Quipper is a family of languages aiming to provide a formal foundation for Quipper. In this paper, we extend Proto-Quipper-M with a construct called dynamic lifting, which is present in Quipper. By virtue of being a circuit description language, Proto-Quipper has two separate runtimes: circuit generation time and circuit execution time. Values that are known at circuit generation time are called parameters, and values that are known at circuit execution time are called states. Dynamic lifting is an operation that enables a state, such as the result of a measurement, to be lifted to a parameter, where it can influence the generation of the next portion of the circuit. As a result, dynamic lifting enables Proto-Quipper programs to interleave classical and quantum computation. We describe the syntax of a language we call Proto-Quipper-Dyn. Its type system uses a system of modalities to keep track of the use of dynamic lifting. We also provide an operational semantics, as well as an abstract categorical semantics for dynamic lifting based on enriched category theory. We prove that both the type system and the operational semantics are sound with respect to our categorical semantics. Finally, we give some examples of Proto-Quipper-Dyn programs that make essential use of dynamic lifting.
Submitted 8 November, 2022; v1 submitted 27 April, 2022;
originally announced April 2022.
-
A Biset-Enriched Categorical Model for Proto-Quipper with Dynamic Lifting
Authors:
Peng Fu,
Kohei Kishida,
Neil J. Ross,
Peter Selinger
Abstract:
Quipper and Proto-Quipper are a family of quantum programming languages that, by their nature as circuit description languages, involve two runtimes: one at which the program generates a circuit and one at which the circuit is executed, normally with probabilistic results due to measurements. Accordingly, the language distinguishes two kinds of data: parameters, which are known at circuit generation time, and states, which are known at circuit execution time. Sometimes, it is desirable for the results of measurements to control the generation of the next part of the circuit. Therefore, the language needs to turn states, such as measurement outcomes, into parameters, an operation we call dynamic lifting. The goal of this paper is to model this interaction between the runtimes by providing a general categorical structure enriched in what we call "bisets". We demonstrate that the biset-enriched structure achieves a proper semantics of the two runtimes and their interaction, by showing that it models a variant of Proto-Quipper with dynamic lifting. The present paper deals with the concrete categorical semantics of this language, whereas a companion paper deals with the syntax, type system, operational semantics, and abstract categorical semantics.
Submitted 15 November, 2023; v1 submitted 27 April, 2022;
originally announced April 2022.
-
Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training
Authors:
Yuanxin Liu,
Fandong Meng,
Zheng Lin,
Peng Fu,
Yanan Cao,
Weiping Wang,
Jie Zhou
Abstract:
Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have similar transfer learning performance as the original PLM. These subnetworks are found using magnitude-based pruning. In this paper, we find that the BERT subnetworks have even more potential than these studies have shown. Firstly, we discover that the success of magnitude pruning can be attributed to the preserved pre-training performance, which correlates with the downstream transferability. Inspired by this, we propose to directly optimize the subnetwork structure towards the pre-training objectives, which can better preserve the pre-training performance. Specifically, we train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, which is agnostic to any specific downstream tasks. We then fine-tune the subnetworks on the GLUE benchmark and the SQuAD dataset. The results show that, compared with magnitude pruning, mask training can effectively find BERT subnetworks with improved overall performance on downstream tasks. Moreover, our method is also more efficient in searching subnetworks and more advantageous when fine-tuning within a certain range of data scarcity. Our code is available at https://github.com/llyx97/TAMT.
Submitted 29 May, 2022; v1 submitted 24 April, 2022;
originally announced April 2022.
-
6GAN: IPv6 Multi-Pattern Target Generation via Generative Adversarial Nets with Reinforcement Learning
Authors:
Tianyu Cui,
Gaopeng Gou,
Gang Xiong,
Chang Liu,
Peipei Fu,
Zhen Li
Abstract:
Global IPv6 scanning has always been a challenge for researchers because of the limited network speed and computational power. Target generation algorithms are recently proposed to overcome the problem for Internet assessments by predicting a candidate set to scan. However, IPv6 custom address configuration emerges diverse addressing patterns discouraging algorithmic inference. Widespread IPv6 alias could also mislead the algorithm to discover aliased regions rather than valid host targets. In this paper, we introduce 6GAN, a novel architecture built with Generative Adversarial Net (GAN) and reinforcement learning for multi-pattern target generation. 6GAN forces multiple generators to train with a multi-class discriminator and an alias detector to generate non-aliased active targets with different addressing pattern types. The rewards from the discriminator and the alias detector help supervise the address sequence decision-making process. After adversarial training, 6GAN's generators could keep a strong imitating ability for each pattern and 6GAN's discriminator obtains outstanding pattern discrimination ability with a 0.966 accuracy. Experiments indicate that our work outperformed the state-of-the-art target generation algorithms by reaching a higher-quality candidate set.
Submitted 20 April, 2022;
originally announced April 2022.
-
Deep Learning-Accelerated 3D Carbon Storage Reservoir Pressure Forecasting Based on Data Assimilation Using Surface Displacement from InSAR
Authors:
Hewei Tang,
Pengcheng Fu,
Honggeun Jo,
Su Jiang,
Christopher S. Sherman,
François Hamon,
Nicholas A. Azzolina,
Joseph P. Morris
Abstract:
Fast forecasting of reservoir pressure distribution in geologic carbon storage (GCS) by assimilating monitoring data is a challenging problem. Due to high drilling cost, GCS projects usually have spatially sparse measurements from wells, leading to high uncertainties in reservoir pressure prediction. To address this challenge, we propose to use low-cost Interferometric Synthetic-Aperture Radar (InSAR) data as monitoring data to infer reservoir pressure build up. We develop a deep learning-accelerated workflow to assimilate surface displacement maps interpreted from InSAR and to forecast dynamic reservoir pressure. Employing an Ensemble Smoother Multiple Data Assimilation (ES-MDA) framework, the workflow updates three-dimensional (3D) geologic properties and predicts reservoir pressure with quantified uncertainties. We use a synthetic commercial-scale GCS model with bimodally distributed permeability and porosity to demonstrate the efficacy of the workflow. A two-step CNN-PCA approach is employed to parameterize the bimodal fields. The computational efficiency of the workflow is boosted by two residual U-Net based surrogate models for surface displacement and reservoir pressure predictions, respectively. The workflow can complete data assimilation and reservoir pressure forecasting in half an hour on a personal computer.
Submitted 26 January, 2022; v1 submitted 21 January, 2022;
originally announced January 2022.
-
Machine learning-based porosity estimation from spectral decomposed seismic data
Authors:
Honggeun Jo,
Yongchae Cho,
Michael J. Pyrcz,
Hewei Tang,
Pengcheng Fu
Abstract:
Estimating porosity models via seismic data is challenging due to the signal noise and insufficient resolution of seismic data. Although impedance inversion is often used by combining with well logs, several hurdles remain to retrieve sub-seismic scale porosity. As an alternative, we propose a machine learning-based workflow to convert seismic data to porosity models. A ResUNet++ based workflow is designed to take three seismic data in different frequencies (i.e., decomposed seismic data) and estimate their corresponding porosity model. The workflow is successfully demonstrated in the 3D channelized reservoir to estimate the porosity model with more than 0.9 in R2 score for training and validating data. Moreover, the application is extended for a stress test by adding signal noise to the seismic data, and the workflow results show a robust estimation even with 5% of noise. Another two ResUNet++ are trained to take either the lowest or highest resolution seismic data only to estimate the porosity model, but they show under- and over-fitting results, supporting the importance of using decomposed seismic data in porosity estimation.
Submitted 22 November, 2021;
originally announced November 2021.
-
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data
Authors:
Gilad Baruch,
Zhuoyuan Chen,
Afshin Dehghan,
Tal Dimry,
Yuri Feigin,
Peter Fu,
Thomas Gebauer,
Brandon Joffe,
Daniel Kurz,
Arik Schwartz,
Elad Shulman
Abstract:
Scene understanding is an active research area. Commercial depth sensors, such as Kinect, have enabled the release of several RGB-D datasets over the past few years which spawned novel methods in 3D scene understanding. More recently with the launch of the LiDAR sensor in Apple's iPads and iPhones, high quality RGB-D data is accessible to millions of people on a device they commonly use. This opens a whole new era in scene understanding for the Computer Vision community as well as app developers. The fundamental research in scene understanding together with the advances in machine learning can now impact people's everyday experiences. However, transforming these scene understanding methods to real-world experiences requires additional innovation and development. In this paper we introduce ARKitScenes. It is not only the first RGB-D dataset that is captured with a now widely available depth sensor, but to our best knowledge, it also is the largest indoor scene understanding data released. In addition to the raw and processed data from the mobile device, ARKitScenes includes high resolution depth maps captured using a stationary laser scanner, as well as manually labeled 3D oriented bounding boxes for a large taxonomy of furniture. We further analyze the usefulness of the data for two downstream tasks: 3D object detection and color-guided depth upsampling. We demonstrate that our dataset can help push the boundaries of existing state-of-the-art methods and it introduces new challenges that better represent real-world scenarios.
Submitted 12 January, 2022; v1 submitted 16 November, 2021;
originally announced November 2021.
-
Check It Again: Progressive Visual Question Answering via Visual Entailment
Authors:
Qingyi Si,
Zheng Lin,
Mingyu Zheng,
Peng Fu,
Weiping Wang
Abstract:
While sophisticated Visual Question Answering models have achieved remarkable success, they tend to answer questions only according to superficial correlations between question and answer. Several recent approaches have been developed to address this language priors problem. However, most of them predict the correct answer according to one best output without checking the authenticity of answers. Besides, they only explore the interaction between image and question, ignoring the semantics of candidate answers. In this paper, we propose a select-and-rerank (SAR) progressive framework based on Visual Entailment. Specifically, we first select the candidate answers relevant to the question or the image, then we rerank the candidate answers by a visual entailment task, which verifies whether the image semantically entails the synthetic statement of the question and each candidate answer. Experimental results show the effectiveness of our proposed framework, which establishes a new state-of-the-art accuracy on VQA-CP v2 with a 7.55% improvement.
Submitted 8 June, 2021;
originally announced June 2021.
-
A Deep Learning-Accelerated Data Assimilation and Forecasting Workflow for Commercial-Scale Geologic Carbon Storage
Authors:
Hewei Tang,
Pengcheng Fu,
Christopher S. Sherman,
Jize Zhang,
Xin Ju,
François Hamon,
Nicholas A. Azzolina,
Matthew Burton-Kelly,
Joseph P. Morris
Abstract:
Fast assimilation of monitoring data to update forecasts of pressure buildup and carbon dioxide (CO2) plume migration under geologic uncertainties is a challenging problem in geologic carbon storage. The high computational cost of data assimilation with a high-dimensional parameter space impedes fast decision-making for commercial-scale reservoir management. We propose to leverage physical understandings of porous medium flow behavior with deep learning techniques to develop a fast history matching-reservoir response forecasting workflow. Applying an Ensemble Smoother Multiple Data Assimilation framework, the workflow updates geologic properties and predicts reservoir performance with quantified uncertainty from pressure history and CO2 plumes interpreted through seismic inversion. As the most computationally expensive component in such a workflow is reservoir simulation, we developed surrogate models to predict dynamic pressure and CO2 plume extents under multi-well injection. The surrogate models employ deep convolutional neural networks, specifically, a wide residual network and a residual U-Net. The workflow is validated against a flat three-dimensional reservoir model representative of a clastic shelf depositional environment. Intelligent treatments are applied to bridge between quantities in a true-3D reservoir model and those in a single-layer reservoir model. The workflow can complete history matching and reservoir forecasting with uncertainty quantification in less than one hour on a mainstream personal workstation.
Submitted 10 January, 2022; v1 submitted 9 May, 2021;
originally announced May 2021.
-
A Hierarchical Transformer with Speaker Modeling for Emotion Recognition in Conversation
Authors:
Jiangnan Li,
Zheng Lin,
Peng Fu,
Qingyi Si,
Weiping Wang
Abstract:
Emotion Recognition in Conversation (ERC) is a more challenging task than conventional text emotion recognition. It can be regarded as a personalized and interactive emotion recognition task, which is supposed to consider not only the semantic information of text but also the influences from speakers. The current method models speakers' interactions by building a relation between every two speakers. However, this fine-grained but complicated modeling is computationally expensive, hard to extend, and can only consider local context. To address this problem, we simplify the complicated modeling to a binary version: Intra-Speaker and Inter-Speaker dependencies, without identifying every unique speaker for the targeted speaker. To better achieve the simplified interaction modeling of speakers in Transformer, which shows excellent ability to settle long-distance dependency, we design three types of masks and respectively utilize them in three independent Transformer blocks. The designed masks respectively model the conventional context modeling, Intra-Speaker dependency, and Inter-Speaker dependency. Furthermore, different speaker-aware information extracted by Transformer blocks diversely contributes to the prediction, and therefore we utilize the attention mechanism to automatically weight them. Experiments on two ERC datasets indicate that our model is efficacious to achieve better performance.
Submitted 29 December, 2020;
originally announced December 2020.
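The three mask types named in the abstract above (conventional context, Intra-Speaker, Inter-Speaker) can be illustrated with a minimal sketch. The abstract does not give the exact mask definitions, so the rules below are assumptions: the conventional mask lets every utterance attend to every other, the intra-speaker mask keeps only pairs from the same speaker, and the inter-speaker mask keeps only pairs from different speakers. The function name `speaker_masks` is hypothetical.

```python
def speaker_masks(speakers):
    """Build three n x n attention masks (1 = may attend, 0 = blocked).

    `speakers` holds one speaker label per utterance, in conversation order.
    The mask rules are illustrative assumptions, not the paper's definitions.
    """
    n = len(speakers)
    # Conventional context: every utterance may attend to every utterance.
    conventional = [[1] * n for _ in range(n)]
    # Intra-speaker: attend only to utterances by the same speaker.
    intra = [[1 if speakers[i] == speakers[j] else 0 for j in range(n)]
             for i in range(n)]
    # Inter-speaker: attend only to utterances by a different speaker.
    inter = [[1 - intra[i][j] for j in range(n)] for i in range(n)]
    return conventional, intra, inter


if __name__ == "__main__":
    for name, mask in zip(("conventional", "intra", "inter"),
                          speaker_masks(["A", "B", "A"])):
        print(name, mask)
```

Because the masks depend only on whether two utterances share a speaker, their cost does not grow with the number of distinct speakers, which is the simplification the abstract describes. Under these assumed rules an inter-speaker row is all zeros in a single-speaker conversation, so a real model would need to handle that case before the softmax.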
-
SPOC learner's final grade prediction based on a novel sampling batch normalization embedded neural network method
Authors:
Zhuonan Liang,
Ziheng Liu,
Huaze Shi,
Yunlong Chen,
Yanbin Cai,
Yating Liang,
Yafan Feng,
Yuqing Yang,
Jing Zhang,
Peng Fu
Abstract:
Recent years have witnessed the rapid growth of Small Private Online Courses (SPOCs), which can be highly customized and personalized to adapt to variable educational requests, and in which machine learning techniques are explored to summarize and predict the learner's performance, mostly focusing on the final grade. However, the final grades of learners on SPOCs are generally seriously imbalanced, which handicaps the training of prediction models. To solve this problem, a sampling batch normalization embedded deep neural network (SBNEDNN) method is developed in this paper. First, a combined indicator is defined to measure the distribution of the data, then a rule is established to guide the sampling process. Second, the batch normalization (BN) modified layers are embedded into a fully connected neural network to solve the data imbalance problem. Experimental comparisons with three other deep learning methods demonstrate the superiority of the proposed method.
Submitted 11 November, 2022; v1 submitted 15 December, 2020;
originally announced December 2020.
-
A survey of sketches in traffic measurement: Design, Optimization, Application and Implementation
Authors:
Shangsen Li,
Lailong Luo,
Deke Guo,
Qianzhen Zhang,
Pengtao Fu
Abstract:
Network measurement probes the underlying network to support upper-level decisions such as network management, network update, network maintenance, network defense and beyond. Due to the massive, speedy, unpredictable features of network flows, sketches are widely implemented in measurement nodes to approximately record the frequency or estimate the cardinality of flows. At their core, sketches usually maintain one or multiple counter array(s), and rely on hash functions to select the counter(s) for each flow. Then the space-efficient sketches from the distributed measurement nodes are aggregated to provide statistics of the ongoing flows. Currently, numerous redesigns and optimizations have been proposed to improve the sketches for better network measurement performance. However, existing reviews or surveys mainly focus on one particular aspect of measurement tasks. Researchers and engineers in the network measurement community desire an all-in-one survey that covers the entire processing pipeline of sketch-based network measurement. To this end, we present the first comprehensive survey of this area. We first introduce the preparation of flows for measurement, then detail the most recent investigations of design, aggregation, decoding, application and implementation of sketches for network measurement. To summarize the existing efforts, we carry out an in-depth study of the existing literature, covering more than 90 sketch designs and optimization strategies. Furthermore, we conduct a comprehensive analysis and qualitative/quantitative comparison of the sketch designs. Finally, we highlight the open issues for future sketch-based network measurement research.
Submitted 20 July, 2021; v1 submitted 13 December, 2020;
originally announced December 2020.
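The core structure described in the abstract above (counter arrays, hash functions that select a counter per flow, and aggregation across measurement nodes) can be sketched with a count-min sketch, one well-known design of this kind. This is a toy illustration and is not taken from the survey: the class name, default sizes, and the use of salted BLAKE2b digests as per-row hash functions are all assumptions.

```python
import hashlib


class CountMinSketch:
    """Toy count-min sketch: `depth` counter arrays, each of size `width`."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, flow_id):
        # Salt the key with the row number to get one hash function per row.
        digest = hashlib.blake2b(f"{row}:{flow_id}".encode(),
                                 digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, flow_id, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, flow_id)] += count

    def estimate(self, flow_id):
        # Collisions can only inflate counters, so the minimum over rows
        # is the tightest available estimate (never an undercount).
        return min(self.rows[row][self._index(row, flow_id)]
                   for row in range(self.depth))

    def merge(self, other):
        # Sketches with the same shape and hashes aggregate by
        # element-wise addition of their counters.
        assert (self.width, self.depth) == (other.width, other.depth)
        for mine, theirs in zip(self.rows, other.rows):
            for i, value in enumerate(theirs):
                mine[i] += value


if __name__ == "__main__":
    node_a, node_b = CountMinSketch(), CountMinSketch()
    node_a.add("10.0.0.1->10.0.0.2", 5)
    node_b.add("10.0.0.1->10.0.0.2", 3)
    node_a.merge(node_b)
    print(node_a.estimate("10.0.0.1->10.0.0.2"))
```

Memory use is fixed at `width * depth` counters regardless of how many flows are observed, which is the space efficiency the abstract refers to; the price is that estimates can overcount when flows collide.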
-
Learning Class-Transductive Intent Representations for Zero-shot Intent Detection
Authors:
Qingyi Si,
Yuanxin Liu,
Peng Fu,
Zheng Lin,
Jiangnan Li,
Weiping Wang
Abstract:
Zero-shot intent detection (ZSID) aims to deal with the continuously emerging intents without annotated training data. However, existing ZSID systems suffer from two limitations: 1) They are not good at modeling the relationship between seen and unseen intents. 2) They cannot effectively recognize unseen intents under the generalized intent detection (GZSID) setting. A critical problem behind these limitations is that the representations of unseen intents cannot be learned in the training stage. To address this problem, we propose a novel framework that utilizes unseen class labels to learn Class-Transductive Intent Representations (CTIR). Specifically, we allow the model to predict unseen intents during training, with the corresponding label names serving as input utterances. On this basis, we introduce a multi-task learning objective, which encourages the model to learn the distinctions among intents, and a similarity scorer, which estimates the connections among intents more accurately. CTIR is easy to implement and can be integrated with existing methods. Experiments on two real-world datasets show that CTIR brings considerable improvement to the baseline systems.
Submitted 8 June, 2021; v1 submitted 3 December, 2020;
originally announced December 2020.
-
A lateral semicircular canal segmentation based geometric calibration for human temporal bone CT Image
Authors:
Xiaoguang Li,
Peng Fu,
Hongxia Yin,
ZhenChang Wang,
Li Zhuo,
Hui Zhang
Abstract:
Computed Tomography (CT) of the temporal bone has become an important method for diagnosing ear diseases. Due to the different posture of the subject and the settings of CT scanners, the CT image of the human temporal bone should be geometrically calibrated to ensure the symmetry of the bilateral anatomical structure. Manual calibration is a time-consuming task for radiologists and an important pre-processing step for further computer-aided CT analysis. We propose an automatic calibration algorithm for temporal bone CT images. The lateral semicircular canals (LSCs) are segmented as anchors at first. Then, we define a standard 3D coordinate system. The key step is the LSC segmentation. We design a novel 3D LSC segmentation encoder-decoder network, which introduces a 3D dilated convolution and a multi-pooling scheme for feature fusion in the encoding stage. The experimental results show that our LSC segmentation network achieved a higher segmentation accuracy. Our proposed method can help to perform calibration of temporal bone CT images efficiently.
Submitted 28 June, 2020;
originally announced June 2020.
-
A tutorial introduction to quantum circuit programming in dependently typed Proto-Quipper
Authors:
Peng Fu,
Kohei Kishida,
Neil J. Ross,
Peter Selinger
Abstract:
We introduce dependently typed Proto-Quipper, or Proto-Quipper-D for short, an experimental quantum circuit programming language with linear dependent types. We give several examples to illustrate how linear dependent types can help in the construction of correct quantum circuits. Specifically, we show how dependent types enable programming families of circuits, and how dependent types solve the problem of type-safe uncomputation of garbage qubits. We also discuss other language features along the way.
Submitted 12 December, 2020; v1 submitted 17 May, 2020;
originally announced May 2020.
-
Linear Dependent Type Theory for Quantum Programming Languages
Authors:
Peng Fu,
Kohei Kishida,
Peter Selinger
Abstract:
Modern quantum programming languages integrate quantum resources and classical control. They must, on the one hand, be linearly typed to reflect the no-cloning property of quantum resources. On the other hand, high-level and practical languages should also support quantum circuits as first-class citizens, as well as families of circuits that are indexed by some classical parameters. Quantum programming languages thus need linear dependent type theory. This paper defines a general semantic structure for such a type theory via certain fibrations of monoidal categories. The categorical model of the quantum circuit description language Proto-Quipper-M by Rios and Selinger (2017) constitutes an example of such a fibration, which means that the language can readily be integrated with dependent types. We then devise both a general linear dependent type system and a dependently typed extension of Proto-Quipper-M, and provide them with operational semantics as well as a prototype implementation.
Submitted 6 September, 2022; v1 submitted 28 April, 2020;
originally announced April 2020.
-
VOC-ReID: Vehicle Re-identification based on Vehicle-Orientation-Camera
Authors:
Xiangyu Zhu,
Zhenbo Luo,
Pei Fu,
Xiang Ji
Abstract:
Vehicle re-identification is a challenging task due to high intra-class variances and small inter-class variances. In this work, we focus on the failure cases caused by similar background and shape. They impose a severe bias on similarity, making it easier to neglect fine-grained information. To reduce the bias, we propose an approach named VOC-ReID, taking the triplet vehicle-orientation-camera as a whole and reforming background/shape similarity as camera/orientation re-identification. First, we train models for vehicle, orientation and camera re-identification respectively. Then we use orientation and camera similarity as a penalty to get the final similarity. In addition, we propose a high-performance baseline boosted by a bag of tricks and weakly supervised data augmentation. Our algorithm achieves second place in vehicle re-identification at the NVIDIA AI City Challenge 2020.
Submitted 15 May, 2020; v1 submitted 20 April, 2020;
originally announced April 2020.
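The penalty step in the abstract above (orientation and camera similarity used as a penalty on vehicle similarity) can be illustrated with a minimal sketch. The abstract does not state how the penalty is applied, so the weighted subtraction, the weight values, and the function name `penalized_similarity` are assumptions made for illustration only.

```python
def penalized_similarity(vehicle_sim, orientation_sim, camera_sim,
                         w_orientation=0.1, w_camera=0.1):
    """Combine three similarity scores into one final score.

    Assumed form: vehicle similarity minus weighted orientation and camera
    similarities. The weights are hypothetical, not values from the paper.
    """
    return (vehicle_sim
            - w_orientation * orientation_sim
            - w_camera * camera_sim)


if __name__ == "__main__":
    # Two gallery images with the same raw vehicle similarity to the query.
    # The first shares the query's orientation and camera, so part of its
    # similarity may come from shape or background rather than identity.
    same_view = penalized_similarity(0.9, orientation_sim=0.8, camera_sim=0.9)
    other_view = penalized_similarity(0.9, orientation_sim=0.1, camera_sim=0.1)
    print(same_view < other_view)
```

Under this assumed form, a gallery image that looks similar mainly because it was taken from the same viewpoint by the same camera is ranked below one that reaches the same vehicle similarity from a different viewpoint.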
-
Deep learning-based prediction of response to HER2-targeted neoadjuvant chemotherapy from pre-treatment dynamic breast MRI: A multi-institutional validation study
Authors:
Nathaniel Braman,
Mohammed El Adoui,
Manasa Vulchi,
Paulette Turk,
Maryam Etesami,
Pingfu Fu,
Kaustav Bera,
Stylianos Drisis,
Vinay Varadan,
Donna Plecha,
Mohammed Benjelloun,
Jame Abraham,
Anant Madabhushi
Abstract:
Predicting response to neoadjuvant therapy is a vexing challenge in breast cancer. In this study, we evaluate the ability of deep learning to predict response to HER2-targeted neoadjuvant chemotherapy (NAC) from pre-treatment dynamic contrast-enhanced (DCE) MRI. In a retrospective study encompassing DCE-MRI data from a total of 157 HER2+ breast cancer patients from 5 institutions, we developed and validated a deep learning approach for predicting pathological complete response (pCR) to HER2-targeted NAC prior to treatment. 100 patients who received HER2-targeted neoadjuvant chemotherapy at a single institution were used to train (n=85) and tune (n=15) a convolutional neural network (CNN) to predict pCR. A multi-input CNN leveraging both pre-contrast and late post-contrast DCE-MRI acquisitions was identified to achieve optimal response prediction within the validation set (AUC=0.93). This model was then tested on two independent testing cohorts with pre-treatment DCE-MRI data. It achieved strong performance in a 28-patient testing set from a second institution (AUC=0.85, 95% CI 0.67-1.0, p=0.0008) and a 29-patient multicenter trial including data from 3 additional institutions (AUC=0.77, 95% CI 0.58-0.97, p=0.006). The deep learning-based response prediction model was found to exceed a multivariable model incorporating predictive clinical variables (AUC < 0.65 in testing cohorts) and a model of semi-quantitative DCE-MRI pharmacokinetic measurements (AUC < 0.60 in testing cohorts). The results presented in this work across multiple sites suggest that with further validation deep learning could provide an effective and reliable tool to guide targeted therapy in breast cancer, thus reducing overtreatment among HER2+ patients.
Submitted 22 January, 2020;
originally announced January 2020.
-
Dependently Typed Folds for Nested Data Types
Authors:
Peng Fu,
Peter Selinger
Abstract:
We present an approach to develop folds for nested data types using dependent types. We call such folds $\textit{dependently typed folds}$; they have the following properties. (1) Dependently typed folds are defined by well-founded recursion and they can be defined in a total dependently typed language. (2) Dependently typed folds do not depend on maps; map functions and many terminating functions can be defined using dependently typed folds. (3) The induction principles for nested data types follow from the definitions of dependently typed folds and the programs defined by dependently typed folds can be formally verified. (4) Dependently typed folds exist for any nested data types and they can be specialized to the traditional $\textit{higher-order folds}$. Using a variety of examples, we show how to program and reason about dependently typed folds. We also show how to obtain dependently typed folds in general and how to specialize them to the corresponding higher-order folds.
Submitted 13 June, 2018;
originally announced June 2018.
-
A Type Checking Algorithm for Higher-rank, Impredicative and Second-order Types
Authors:
Peng Fu
Abstract:
We study a type checking algorithm that is able to type check a nontrivial subclass of functional programs that use features such as higher-rank, impredicative and second-order types. The only place the algorithm requires type annotation is before each function declaration. We prove the soundness of the type checking algorithm with respect to System $\mathbf{F}_ω$, i.e. if the program is type checked, then the type checker will produce a well-typed annotated System $\mathbf{F}_ω$ term. We extend the basic algorithm to handle pattern matching and let-bindings. We implement a prototype type checker and test it on a variety of functional programs.
Submitted 13 November, 2017;
originally announced November 2017.
-
Deep Residual Text Detection Network for Scene Text
Authors:
Xiangyu Zhu,
Yingying Jiang,
Shuli Yang,
Xiaobing Wang,
Wei Li,
Pei Fu,
Hua Wang,
Zhenbo Luo
Abstract:
Scene text detection is a challenging problem in computer vision. In this paper, we propose a novel text detection network based on prevalent object detection frameworks. In order to obtain stronger semantic features, we adopt ResNet as feature extraction layers and exploit multi-level features by combining hierarchical convolutional networks. A vertical proposal mechanism is utilized to avoid proposal classification, while the regression layer remains working to improve localization accuracy. Evaluated on the ICDAR2013 dataset, our approach achieves an F-measure of 0.91, which outperforms previous state-of-the-art results in scene text detection.
Submitted 11 November, 2017;
originally announced November 2017.
-
R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
Authors:
Yingying Jiang,
Xiangyu Zhu,
Xiaobing Wang,
Shuli Yang,
Wei Li,
Hua Wang,
Pei Fu,
Zhenbo Luo
Abstract:
In this paper, we propose a novel method called Rotational Region CNN (R2CNN) for detecting arbitrary-oriented texts in natural scene images. The framework is based on Faster R-CNN [1] architecture. First, we use the Region Proposal Network (RPN) to generate axis-aligned bounding boxes that enclose the texts with different orientations. Second, for each axis-aligned text box proposed by RPN, we extract its pooled features with different pooled sizes and the concatenated features are used to simultaneously predict the text/non-text score, axis-aligned box and inclined minimum area box. At last, we use an inclined non-maximum suppression to get the detection results. Our approach achieves competitive results on text detection benchmarks: ICDAR 2015 and ICDAR 2013.
Submitted 30 June, 2017; v1 submitted 29 June, 2017;
originally announced June 2017.
-
Representing Nonterminating Rewriting with $\mathbf{F}_2^μ$
Authors:
Peng Fu
Abstract:
We specify a second-order type system $\mathbf{F}_2^μ$ that is tailored for representing nonterminations. The nonterminating trace of a term $t$ in a rewrite system $\mathcal{R}$ corresponds to a productive inhabitant $e$ such that $Γ_{\mathcal{R}} \vdash e : t$ in $\mathbf{F}_2^μ$, where $Γ_{\mathcal{R}}$ is the environment representing the rewrite system. We prove that the productivity checking in $\mathbf{F}_2^μ$ is decidable via a mapping to the $λ$-Y calculus. We develop a type checking algorithm for $\mathbf{F}_2^μ$ based on second-order matching. We implement the type checking algorithm in a proof-of-concept type checker.
Submitted 2 June, 2017;
originally announced June 2017.
-
Operational Semantics of Resolution and Productivity in Horn Clause Logic
Authors:
Peng Fu,
Ekaterina Komendantskaya
Abstract:
This paper presents a study of operational and type-theoretic properties of different resolution strategies in Horn clause logic. We distinguish four different kinds of resolution: resolution by unification (SLD-resolution), resolution by term-matching, the recently introduced structural resolution, and partial (or lazy) resolution. We express them all uniformly as abstract reduction systems, which allows us to undertake a thorough comparative analysis of their properties. To match this small-step semantics, we propose to take Howard's System H as a type-theoretic semantic counterpart. Using System H, we interpret Horn formulas as types, and a derivation for a given formula as the proof term inhabiting the type given by the formula. We prove soundness of these abstract reduction systems relative to System H, and we show completeness of SLD-resolution and structural resolution relative to System H. We identify conditions under which structural resolution is operationally equivalent to SLD-resolution. We show correspondence between term-matching resolution for Horn clause programs without existential variables and term rewriting.
Submitted 17 August, 2016; v1 submitted 14 April, 2016;
originally announced April 2016.
-
Proof Relevant Corecursive Resolution
Authors:
Peng Fu,
Ekaterina Komendantskaya,
Tom Schrijvers,
Andrew Pond
Abstract:
Resolution lies at the foundation of both logic programming and type class context reduction in functional languages. Terminating derivations by resolution have well-defined inductive meaning, whereas some non-terminating derivations can be understood coinductively. Cycle detection is a popular method to capture a small subset of such derivations. We show that in fact cycle detection is a restricted form of coinductive proof, in which the atomic formula forming the cycle plays the role of coinductive hypothesis.
This paper introduces a heuristic method for obtaining richer coinductive hypotheses in the form of Horn formulas. Our approach subsumes cycle detection and gives coinductive meaning to a larger class of derivations. For this purpose we extend resolution with Horn formula resolvents and corecursive evidence generation. We illustrate our method on non-terminating type class resolution problems.
Submitted 30 November, 2015;
originally announced November 2015.