subscribe to arXiv mailings

Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

Authors: Chensen Huang, Guibo Zhu, Xuepeng Wang, Yifei Luo, Guojing Ge, Haoran Chen, Dong Yi, Jinqiao Wang

Abstract: To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity. This work introduces a method called Recurrent Context Compression (RCC), designed to efficiently expand the context window length of LLMs within constrained storage space. We also invest… ▽ More To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity. This work introduces a method called Recurrent Context Compression (RCC), designed to efficiently expand the context window length of LLMs within constrained storage space. We also investigate the issue of poor model responses when both instructions and context are compressed in downstream tasks, and propose an instruction reconstruction method to mitigate this problem. We validated the effectiveness of our approach on multiple tasks, achieving a compression rate of up to 32x on text reconstruction tasks with a BLEU4 score close to 0.95, and nearly 100\% accuracy on a passkey retrieval task with a sequence length of 1M. Finally, our method demonstrated competitive performance in long-text question-answering tasks compared to non-compressed methods, while significantly saving storage resources in long-text inference tasks. Our code, models, and demo are available at https://github.com/WUHU-G/RCC_Transformer △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2402.03047 [pdf, other]

PFDM: Parser-Free Virtual Try-on via Diffusion Model

Authors: Yunfang Niu, Dong Yi, Lingxiang Wu, Zhiwei Liu, Pengxiang Cai, Jinqiao Wang

Abstract: Virtual try-on can significantly improve the garment shopping experiences in both online and in-store scenarios, attracting broad interest in computer vision. However, to achieve high-fidelity try-on performance, most state-of-the-art methods still rely on accurate segmentation masks, which are often produced by near-perfect parsers or manual labeling. To overcome the bottleneck, we propose a pars… ▽ More Virtual try-on can significantly improve the garment shopping experiences in both online and in-store scenarios, attracting broad interest in computer vision. However, to achieve high-fidelity try-on performance, most state-of-the-art methods still rely on accurate segmentation masks, which are often produced by near-perfect parsers or manual labeling. To overcome the bottleneck, we propose a parser-free virtual try-on method based on the diffusion model (PFDM). Given two images, PFDM can "wear" garments on the target person seamlessly by implicitly warping without any other information. To learn the model effectively, we synthesize many pseudo-images and construct sample pairs by wearing various garments on persons. Supervised by the large-scale expanded dataset, we fuse the person and garment features using a proposed Garment Fusion Attention (GFA) mechanism. Experiments demonstrate that our proposed PFDM can successfully handle complex cases, synthesize high-fidelity images, and outperform both state-of-the-art parser-free and parser-based models. △ Less

Submitted 5 February, 2024; originally announced February 2024.

Comments: Accepted by IEEE ICASSP 2024

arXiv:2312.10920 [pdf, other]

Domain adaption and physical constrains transfer learning for shale gas production

Authors: Zhaozhong Yang, Liangjie Gou, Chao Min, Duo Yi, Xiaogang Li, Guoquan Wen

Abstract: Effective prediction of shale gas production is crucial for strategic reservoir development. However, in new shale gas blocks, two main challenges are encountered: (1) the occurrence of negative transfer due to insufficient data, and (2) the limited interpretability of deep learning (DL) models. To tackle these problems, we propose a novel transfer learning methodology that utilizes domain adaptat… ▽ More Effective prediction of shale gas production is crucial for strategic reservoir development. However, in new shale gas blocks, two main challenges are encountered: (1) the occurrence of negative transfer due to insufficient data, and (2) the limited interpretability of deep learning (DL) models. To tackle these problems, we propose a novel transfer learning methodology that utilizes domain adaptation and physical constraints. This methodology effectively employs historical data from the source domain to reduce negative transfer from the data distribution perspective, while also using physical constraints to build a robust and reliable prediction model that integrates various types of data. The methodology starts by dividing the production data from the source domain into multiple subdomains, thereby enhancing data diversity. It then uses Maximum Mean Discrepancy (MMD) and global average distance measures to decide on the feasibility of transfer. Through domain adaptation, we integrate all transferable knowledge, resulting in a more comprehensive target model. Lastly, by incorporating drilling, completion, and geological data as physical constraints, we develop a hybrid model. This model, a combination of a multi-layer perceptron (MLP) and a Transformer (Transformer-MLP), is designed to maximize interpretability. Experimental validation in China's southwestern region confirms the method's effectiveness. △ Less

Submitted 17 December, 2023; originally announced December 2023.

arXiv:2311.01149 [pdf, other]

ChineseWebText: Large-scale High-quality Chinese Web Text Extracted with Effective Evaluation Model

Authors: Jianghao Chen, Pu Jian, Tengxiao Xi, Dongyi Yi, Qianlong Du, Chenglin Ding, Guibo Zhu, Chengqing Zong, Jinqiao Wang, Jiajun Zhang

Abstract: During the development of large language models (LLMs), the scale and quality of the pre-training data play a crucial role in shaping LLMs' capabilities. To accelerate the research of LLMs, several large-scale datasets, such as C4 [1], Pile [2], RefinedWeb [3] and WanJuan [4], have been released to the public. However, most of the released corpus focus mainly on English, and there is still lack of… ▽ More During the development of large language models (LLMs), the scale and quality of the pre-training data play a crucial role in shaping LLMs' capabilities. To accelerate the research of LLMs, several large-scale datasets, such as C4 [1], Pile [2], RefinedWeb [3] and WanJuan [4], have been released to the public. However, most of the released corpus focus mainly on English, and there is still lack of complete tool-chain for extracting clean texts from web data. Furthermore, fine-grained information of the corpus, e.g. the quality of each text, is missing. To address these challenges, we propose in this paper a new complete tool-chain EvalWeb to extract Chinese clean texts from noisy web data. First, similar to previous work, manually crafted rules are employed to discard explicit noisy texts from the raw crawled web contents. Second, a well-designed evaluation model is leveraged to assess the remaining relatively clean data, and each text is assigned a specific quality score. Finally, we can easily utilize an appropriate threshold to select the high-quality pre-training data for Chinese. Using our proposed approach, we release the largest and latest large-scale high-quality Chinese web text ChineseWebText, which consists of 1.42 TB and each text is associated with a quality score, facilitating the LLM researchers to choose the data according to the desired quality thresholds. We also release a much cleaner subset of 600 GB Chinese data with the quality exceeding 90%. △ Less

Submitted 10 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

arXiv:2306.12108 [pdf]

Complex accident, clear responsibility

Authors: Dexin Yi

Abstract: The problem of allocating accident responsibility for autonomous driving is a difficult issue in the field of autonomous driving. Due to the complexity of autonomous driving technology, most of the research on the responsibility of autonomous driving accidents has remained at the theoretical level. When encountering actual autonomous driving accidents, a proven and fair solution is needed. To addr… ▽ More The problem of allocating accident responsibility for autonomous driving is a difficult issue in the field of autonomous driving. Due to the complexity of autonomous driving technology, most of the research on the responsibility of autonomous driving accidents has remained at the theoretical level. When encountering actual autonomous driving accidents, a proven and fair solution is needed. To address this problem, this study proposes a multi-subject responsibility allocation optimization method based on the RCModel (Risk Chain Model), which analyzes the responsibility of each actor from a technical perspective and promotes a more reasonable and fair allocation of responsibility. △ Less

Submitted 21 June, 2023; originally announced June 2023.

Comments: 7 pages, 7 figures

arXiv:2211.09027 [pdf, other]

LLEDA -- Lifelong Self-Supervised Domain Adaptation

Authors: Mamatha Thota, Dewei Yi, Georgios Leontidis

Abstract: Humans and animals have the ability to continuously learn new information over their lifetime without losing previously acquired knowledge. However, artificial neural networks struggle with this due to new information conflicting with old knowledge, resulting in catastrophic forgetting. The complementary learning systems (CLS) theory suggests that the interplay between hippocampus and neocortex sy… ▽ More Humans and animals have the ability to continuously learn new information over their lifetime without losing previously acquired knowledge. However, artificial neural networks struggle with this due to new information conflicting with old knowledge, resulting in catastrophic forgetting. The complementary learning systems (CLS) theory suggests that the interplay between hippocampus and neocortex systems enables long-term and efficient learning in the mammalian brain, with memory replay facilitating the interaction between these two systems to reduce forgetting. The proposed Lifelong Self-Supervised Domain Adaptation (LLEDA) framework draws inspiration from the CLS theory and mimics the interaction between two networks: a DA network inspired by the hippocampus that quickly adjusts to changes in data distribution and an SSL network inspired by the neocortex that gradually learns domain-agnostic general representations. LLEDA's latent replay technique facilitates communication between these two networks by reactivating and replaying the past memory latent representations to stabilise long-term generalisation and retention without interfering with the previously learned information. Extensive experiments demonstrate that the proposed method outperforms several other methods resulting in a long-term adaptation while being less prone to catastrophic forgetting when transferred to new domains. △ Less

Submitted 7 August, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

Comments: 19 pages, 6 figures, 6 tables; V2 added more experiments on more domains and fixed typos

arXiv:2209.04811 [pdf, other]

Probing for Understanding of English Verb Classes and Alternations in Large Pre-trained Language Models

Authors: David K. Yi, James V. Bruno, Jiayu Han, Peter Zukerman, Shane Steinert-Threlkeld

Abstract: We investigate the extent to which verb alternation classes, as described by Levin (1993), are encoded in the embeddings of Large Pre-trained Language Models (PLMs) such as BERT, RoBERTa, ELECTRA, and DeBERTa using selectively constructed diagnostic classifiers for word and sentence-level prediction tasks. We follow and expand upon the experiments of Kann et al. (2019), which aim to probe whether… ▽ More We investigate the extent to which verb alternation classes, as described by Levin (1993), are encoded in the embeddings of Large Pre-trained Language Models (PLMs) such as BERT, RoBERTa, ELECTRA, and DeBERTa using selectively constructed diagnostic classifiers for word and sentence-level prediction tasks. We follow and expand upon the experiments of Kann et al. (2019), which aim to probe whether static embeddings encode frame-selectional properties of verbs. At both the word and sentence level, we find that contextual embeddings from PLMs not only outperform non-contextual embeddings, but achieve astonishingly high accuracies on tasks across most alternation classes. Additionally, we find evidence that the middle-to-upper layers of PLMs achieve better performance on average than the lower layers across all probing tasks. △ Less

Submitted 11 September, 2022; originally announced September 2022.

Comments: 8 pages, 6 figures

arXiv:2111.00042 [pdf, other]

CvS: Classification via Segmentation For Small Datasets

Authors: Nooshin Mojab, Philip S. Yu, Joelle A. Hallak, Darvin Yi

Abstract: Deep learning models have shown promising results in a wide range of computer vision applications across various domains. The success of deep learning methods relies heavily on the availability of a large amount of data. Deep neural networks are prone to overfitting when data is scarce. This problem becomes even more severe for neural network with classification head with access to only a few data… ▽ More Deep learning models have shown promising results in a wide range of computer vision applications across various domains. The success of deep learning methods relies heavily on the availability of a large amount of data. Deep neural networks are prone to overfitting when data is scarce. This problem becomes even more severe for neural network with classification head with access to only a few data points. However, acquiring large-scale datasets is very challenging, laborious, or even infeasible in some domains. Hence, developing classifiers that are able to perform well in small data regimes is crucial for applications with limited data. This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps. We employ the label propagation method to achieve a fully segmented dataset with only a handful of manually segmented data. We evaluate the effectiveness of our framework on diverse problems showing that CvS is able to achieve much higher classification results compared to previous methods when given only a handful of examples. △ Less

Submitted 29 October, 2021; originally announced November 2021.

arXiv:2108.01254 [pdf, other]

doi 10.1109/RO-MAN46459.2019.8956243

Desk Organization: Effect of Multimodal Inputs on Spatial Relational Learning

Authors: Ryan Rowe, Shivam Singhal, Daqing Yi, Tapomayukh Bhattacharjee, Siddhartha S. Srinivasa

Abstract: For robots to operate in a three dimensional world and interact with humans, learning spatial relationships among objects in the surrounding is necessary. Reasoning about the state of the world requires inputs from many different sensory modalities including vision ($V$) and haptics ($H$). We examine the problem of desk organization: learning how humans spatially position different objects on a pl… ▽ More For robots to operate in a three dimensional world and interact with humans, learning spatial relationships among objects in the surrounding is necessary. Reasoning about the state of the world requires inputs from many different sensory modalities including vision ($V$) and haptics ($H$). We examine the problem of desk organization: learning how humans spatially position different objects on a planar surface according to organizational ''preference''. We model this problem by examining how humans position objects given multiple features received from vision and haptic modalities. However, organizational habits vary greatly between people both in structure and adherence. To deal with user organizational preferences, we add an additional modality, ''utility'' ($U$), which informs on a particular human's perceived usefulness of a given object. Models were trained as generalized (over many different people) or tailored (per person). We use two types of models: random forests, which focus on precise multi-task classification, and Markov logic networks, which provide an easily interpretable insight into organizational habits. The models were applied to both synthetic data, which proved to be learnable when using fixed organizational constraints, and human-study data, on which the random forest achieved over 90% accuracy. Over all combinations of $\{H, U, V\}$ modalities, $UV$ and $HUV$ were the most informative for organization. In a follow-up study, we gauged participants preference of desk organizations by a generalized random forest organization vs. by a random model. On average, participants rated the random forest models as 4.15 on a 5-point Likert scale compared to 1.84 for the random model △ Less

Submitted 2 August, 2021; originally announced August 2021.

Comments: 8 pages, 7 figures

ACM Class: I.2.9

Journal ref: 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) (pp. 1-8). IEEE

arXiv:2107.14654 [pdf, other]

Brain-Inspired Deep Imitation Learning for Autonomous Driving Systems

Authors: Hasan Bayarov Ahmedov, Dewei Yi, Jie Sui

Abstract: Autonomous driving has attracted great attention from both academics and industries. To realise autonomous driving, Deep Imitation Learning (DIL) is treated as one of the most promising solutions, because it improves autonomous driving systems by automatically learning a complex mapping from human driving data, compared to manually designing the driving policy. However, existing DIL methods cannot… ▽ More Autonomous driving has attracted great attention from both academics and industries. To realise autonomous driving, Deep Imitation Learning (DIL) is treated as one of the most promising solutions, because it improves autonomous driving systems by automatically learning a complex mapping from human driving data, compared to manually designing the driving policy. However, existing DIL methods cannot generalise well across domains, that is, a network trained on the data of source domain gives rise to poor generalisation on the data of target domain. In the present study, we propose a novel brain-inspired deep imitation method that builds on the evidence from human brain functions, to improve the generalisation ability of deep neural networks so that autonomous driving systems can perform well in various scenarios. Specifically, humans have a strong generalisation ability which is beneficial from the structural and functional asymmetry of the two sides of the brain. Here, we design dual Neural Circuit Policy (NCP) architectures in deep neural networks based on the asymmetry of human neural networks. Experimental results demonstrate that our brain-inspired method outperforms existing methods regarding generalisation when dealing with unseen data. Our source codes and pretrained models are available at https://github.com/Intenzo21/Brain-Inspired-Deep-Imitation-Learning-for-Autonomous-Driving-Systems}{https://github.com/Intenzo21/Brain-Inspired-Deep-Imitation-Learning-for-Autonomous-Driving-Systems. △ Less

Submitted 30 July, 2021; originally announced July 2021.

arXiv:2106.03905 [pdf, other]

AutoPtosis

Authors: Abdullah Aleem, Manoj Prabhakar Nallabothula, Pete Setabutr, Joelle A. Hallak, Darvin Yi

Abstract: Blepharoptosis, or ptosis as it is more commonly referred to, is a condition of the eyelid where the upper eyelid droops. The current diagnosis for ptosis involves cumbersome manual measurements that are time-consuming and prone to human error. In this paper, we present AutoPtosis, an artificial intelligence based system with interpretable results for rapid diagnosis of ptosis. We utilize a divers… ▽ More Blepharoptosis, or ptosis as it is more commonly referred to, is a condition of the eyelid where the upper eyelid droops. The current diagnosis for ptosis involves cumbersome manual measurements that are time-consuming and prone to human error. In this paper, we present AutoPtosis, an artificial intelligence based system with interpretable results for rapid diagnosis of ptosis. We utilize a diverse dataset collected from the Illinois Ophthalmic Database Atlas (I-ODA) to develop a robust deep learning model for prediction and also develop a clinically inspired model that calculates the marginal reflex distance and iris ratio. AutoPtosis achieved 95.5% accuracy on physician verified data that had an equal class balance. The proposed algorithm can help in the rapid and timely diagnosis of ptosis, significantly reduce the burden on the healthcare system, and save the patients and clinics valuable resources. △ Less

Submitted 9 June, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

arXiv:2104.02609 [pdf, other]

I-ODA, Real-World Multi-modal Longitudinal Data for OphthalmicApplications

Authors: Nooshin Mojab, Vahid Noroozi, Abdullah Aleem, Manoj P. Nallabothula, Joseph Baker, Dimitri T. Azar, Mark Rosenblatt, RV Paul Chan, Darvin Yi, Philip S. Yu, Joelle A. Hallak

Abstract: Data from clinical real-world settings is characterized by variability in quality, machine-type, setting, and source. One of the primary goals of medical computer vision is to develop and validate artificial intelligence (AI) based algorithms on real-world data enabling clinical translations. However, despite the exponential growth in AI based applications in healthcare, specifically in ophthalmol… ▽ More Data from clinical real-world settings is characterized by variability in quality, machine-type, setting, and source. One of the primary goals of medical computer vision is to develop and validate artificial intelligence (AI) based algorithms on real-world data enabling clinical translations. However, despite the exponential growth in AI based applications in healthcare, specifically in ophthalmology, translations to clinical settings remain challenging. Limited access to adequate and diverse real-world data inhibits the development and validation of translatable algorithms. In this paper, we present a new multi-modal longitudinal ophthalmic imaging dataset, the Illinois Ophthalmic Database Atlas (I-ODA), with the goal of advancing state-of-the-art computer vision applications in ophthalmology, and improving upon the translatable capacity of AI based applications across different clinical settings. We present the infrastructure employed to collect, annotate, and anonymize images from multiple sources, demonstrating the complexity of real-world retrospective data and its limitations. I-ODA includes 12 imaging modalities with a total of 3,668,649 ophthalmic images of 33,876 individuals from the Department of Ophthalmology and Visual Sciences at the Illinois Eye and Ear Infirmary of the University of Illinois Chicago (UIC) over the course of 12 years. △ Less

Submitted 29 March, 2021; originally announced April 2021.

arXiv:2007.12672 [pdf, other]

Real-World Multi-Domain Data Applications for Generalizations to Clinical Settings

Authors: Nooshin Mojab, Vahid Noroozi, Darvin Yi, Manoj Prabhakar Nallabothula, Abdullah Aleem, Phillip S. Yu, Joelle A. Hallak

Abstract: With promising results of machine learning based models in computer vision, applications on medical imaging data have been increasing exponentially. However, generalizations to complex real-world clinical data is a persistent problem. Deep learning models perform well when trained on standardized datasets from artificial settings, such as clinical trials. However, real-world data is different and… ▽ More With promising results of machine learning based models in computer vision, applications on medical imaging data have been increasing exponentially. However, generalizations to complex real-world clinical data is a persistent problem. Deep learning models perform well when trained on standardized datasets from artificial settings, such as clinical trials. However, real-world data is different and translations are yielding varying results. The complexity of real-world applications in healthcare could emanate from a mixture of different data distributions across multiple device domains alongside the inevitable noise sourced from varying image resolutions, human errors, and the lack of manual gradings. In addition, healthcare applications not only suffer from the scarcity of labeled data, but also face limited access to unlabeled data due to HIPAA regulations, patient privacy, ambiguity in data ownership, and challenges in collecting data from different sources. These limitations pose additional challenges to applying deep learning algorithms in healthcare and clinical translations. In this paper, we utilize self-supervised representation learning methods, formulated effectively in transfer learning settings, to address limited data availability. Our experiments verify the importance of diverse real-world data for generalization to clinical settings. We show that by employing a self-supervised approach with transfer learning on a multi-domain real-world dataset, we can achieve 16% relative improvement on a standardized dataset over supervised baselines. △ Less

Submitted 24 July, 2020; originally announced July 2020.

arXiv:2002.09809 [pdf, other]

Random Bundle: Brain Metastases Segmentation Ensembling through Annotation Randomization

Authors: Darvin Yi, Endre Grøvik, Michael Iv, Elizabeth Tong, Greg Zaharchuk, Daniel Rubin

Abstract: We introduce a novel ensembling method, Random Bundle (RB), that improves performance for brain metastases segmentation. We create our ensemble by training each network on our dataset with 50% of our annotated lesions censored out. We also apply a lopsided bootstrap loss to recover performance after inducing an in silico 50% false negative rate and make our networks more sensitive. We improve our… ▽ More We introduce a novel ensembling method, Random Bundle (RB), that improves performance for brain metastases segmentation. We create our ensemble by training each network on our dataset with 50% of our annotated lesions censored out. We also apply a lopsided bootstrap loss to recover performance after inducing an in silico 50% false negative rate and make our networks more sensitive. We improve our network detection of lesions's mAP value by 39% and more than triple the sensitivity at 80% precision. We also show slight improvements in segmentation quality through DICE score. Further, RB ensembling improves performance over baseline by a larger margin than a variety of popular ensembling strategies. Finally, we show that RB ensembling is computationally efficient by comparing its performance to a single network when both systems are constrained to have the same compute. △ Less

Submitted 28 April, 2020; v1 submitted 22 February, 2020; originally announced February 2020.

arXiv:2001.09501 [pdf, other]

Brain Metastasis Segmentation Network Trained with Robustness to Annotations with Multiple False Negatives

Authors: Darvin Yi, Endre Grøvik, Michael Iv, Elizabeth Tong, Greg Zaharchuk, Daniel Rubin

Abstract: Deep learning has proven to be an essential tool for medical image analysis. However, the need for accurately labeled input data, often requiring time- and labor-intensive annotation by experts, is a major limitation to the use of deep learning. One solution to this challenge is to allow for use of coarse or noisy labels, which could permit more efficient and scalable labeling of images. In this w… ▽ More Deep learning has proven to be an essential tool for medical image analysis. However, the need for accurately labeled input data, often requiring time- and labor-intensive annotation by experts, is a major limitation to the use of deep learning. One solution to this challenge is to allow for use of coarse or noisy labels, which could permit more efficient and scalable labeling of images. In this work, we develop a lopsided loss function based on entropy regularization that assumes the existence of a nontrivial false negative rate in the target annotations. Starting with a carefully annotated brain metastasis lesion dataset, we simulate data with false negatives by (1) randomly censoring the annotated lesions and (2) systematically censoring the smallest lesions. The latter better models true physician error because smaller lesions are harder to notice than the larger ones. Even with a simulated false negative rate as high as 50%, applying our loss function to randomly censored data preserves maximum sensitivity at 97% of the baseline with uncensored training data, compared to just 10% for a standard loss function. For the size-based censorship, performance is restored from 17% with the current standard to 88% with our lopsided bootstrap loss. Our work will enable more efficient scaling of the image labeling process, in parallel with other approaches on creating more efficient user interfaces and tools for annotation. △ Less

Submitted 26 January, 2020; originally announced January 2020.

arXiv:1912.11966 [pdf]

Handling Missing MRI Input Data in Deep Learning Segmentation of Brain Metastases: A Multi-Center Study

Authors: Endre Grøvik, Darvin Yi, Michael Iv, Elizabeth Tong, Line Brennhaug Nilsen, Anna Latysheva, Cathrine Saxhaug, Kari Dolven Jacobsen, Åslaug Helland, Kyrre Eeg Emblem, Daniel Rubin, Greg Zaharchuk

Abstract: The purpose was to assess the clinical value of a novel DropOut model for detecting and segmenting brain metastases, in which a neural network is trained on four distinct MRI sequences using an input dropout layer, thus simulating the scenario of missing MRI data by training on the full set and all possible subsets of the input data. This retrospective, multi-center study, evaluated 165 patients w… ▽ More The purpose was to assess the clinical value of a novel DropOut model for detecting and segmenting brain metastases, in which a neural network is trained on four distinct MRI sequences using an input dropout layer, thus simulating the scenario of missing MRI data by training on the full set and all possible subsets of the input data. This retrospective, multi-center study, evaluated 165 patients with brain metastases. A deep learning based segmentation model for automatic segmentation of brain metastases, named DropOut, was trained on multi-sequence MRI from 100 patients, and validated/tested on 10/55 patients. The segmentation results were compared with the performance of a state-of-the-art DeepLabV3 model. The MR sequences in the training set included pre- and post-gadolinium (Gd) T1-weighted 3D fast spin echo, post-Gd T1-weighted inversion recovery (IR) prepped fast spoiled gradient echo, and 3D fluid attenuated inversion recovery (FLAIR), whereas the test set did not include the IR prepped image-series. The ground truth were established by experienced neuroradiologists. The results were evaluated using precision, recall, Dice score, and receiver operating characteristics (ROC) curve statistics, while the Wilcoxon rank sum test was used to compare the performance of the two neural networks. The area under the ROC curve (AUC), averaged across all test cases, was 0.989+-0.029 for the DropOut model and 0.989+-0.023 for the DeepLabV3 model (p=0.62). The DropOut model showed a significantly higher Dice score compared to the DeepLabV3 model (0.795+-0.105 vs. 0.774+-0.104, p=0.017), and a significantly lower average false positive rate of 3.6/patient vs. 7.0/patient (p<0.001) using a 10mm3 lesion-size limit. The DropOut model may facilitate accurate detection and segmentation of brain metastases on a multi-center basis, even when the test cohort is missing MRI input data. △ Less

Submitted 26 December, 2019; originally announced December 2019.

arXiv:1912.08775 [pdf, other]

MRI Pulse Sequence Integration for Deep-Learning Based Brain Metastasis Segmentation

Authors: Darvin Yi, Endre Grøvik, Michael Iv, Elizabeth Tong, Kyrre Eeg Emblem, Line Brennhaug Nilsen, Cathrine Saxhaug, Anna Latysheva, Kari Dolven Jacobsen, Åslaug Helland, Greg Zaharchuk, Daniel Rubin

Abstract: Magnetic resonance (MR) imaging is an essential diagnostic tool in clinical medicine. Recently, a variety of deep learning methods have been applied to segmentation tasks in medical images, with promising results for computer-aided diagnosis. For MR images, effectively integrating different pulse sequences is important to optimize performance. However, the best way to integrate different pulse seq… ▽ More Magnetic resonance (MR) imaging is an essential diagnostic tool in clinical medicine. Recently, a variety of deep learning methods have been applied to segmentation tasks in medical images, with promising results for computer-aided diagnosis. For MR images, effectively integrating different pulse sequences is important to optimize performance. However, the best way to integrate different pulse sequences remains unclear. In this study, we evaluate multiple architectural features and characterize their effects in the task of metastasis segmentation. Specifically, we consider (1) different pulse sequence integration schemas, (2) different modes of weight sharing for parallel network branches, and (3) a new approach for enabling robustness to missing pulse sequences. We find that levels of integration and modes of weight sharing that favor low variance work best in our regime of small data (n = 100). By adding an input-level dropout layer, we could preserve the overall performance of these networks while allowing for inference on inputs with missing pulse sequence. We illustrate not only the generalizability of the network but also the utility of this robustness when applying the trained model to data from a different center, which does not use the same pulse sequences. Finally, we apply network visualization methods to better understand which input features are most important for network performance. Together, these results provide a framework for building networks with enhanced robustness to missing data while maintaining comparable performance in medical imaging applications. △ Less

Submitted 18 December, 2019; originally announced December 2019.

Comments: In the IEEE transactions format for submission to IEEE-TMI

arXiv:1912.07248 [pdf, other]

doi 10.1109/TIP.2020.3010631

Latent Complete Row Space Recovery for Multi-view Subspace Clustering

Authors: Hong Tao, Chenping Hou, Yuhua Qian, Jubo Zhu, Dongyun Yi

Abstract: Multi-view subspace clustering has been applied to applications such as image processing and video surveillance, and has attracted increasing attention. Most existing methods learn view-specific self-representation matrices, and construct a combined affinity matrix from multiple views. The affinity construction process is time-consuming, and the combined affinity matrix is not guaranteed to reflec… ▽ More Multi-view subspace clustering has been applied to applications such as image processing and video surveillance, and has attracted increasing attention. Most existing methods learn view-specific self-representation matrices, and construct a combined affinity matrix from multiple views. The affinity construction process is time-consuming, and the combined affinity matrix is not guaranteed to reflect the whole true subspace structure. To overcome these issues, the Latent Complete Row Space Recovery (LCRSR) method is proposed. Concretely, LCRSR is based on the assumption that the multi-view observations are generated from an underlying latent representation, which is further assumed to collect the authentic samples drawn exactly from multiple subspaces. LCRSR is able to recover the row space of the latent representation, which not only carries complete information from multiple views but also determines the subspace membership under certain conditions. LCRSR does not involve the graph construction procedure and is solved with an efficient and convergent algorithm, thereby being more scalable to large-scale datasets. The effectiveness and efficiency of LCRSR are validated by clustering various kinds of multi-view data and illustrated in the background subtraction task. △ Less

Submitted 16 December, 2019; originally announced December 2019.

arXiv:1905.04457 [pdf, ps, other]

Triplet Distillation for Deep Face Recognition

Authors: Yushu Feng, Huan Wang, Daniel T. Yi, Roland Hu

Abstract: Convolutional neural networks (CNNs) have achieved a great success in face recognition, which unfortunately comes at the cost of massive computation and storage consumption. Many compact face recognition networks are thus proposed to resolve this problem. Triplet loss is effective to further improve the performance of those compact models. However, it normally employs a fixed margin to all the sam… ▽ More Convolutional neural networks (CNNs) have achieved a great success in face recognition, which unfortunately comes at the cost of massive computation and storage consumption. Many compact face recognition networks are thus proposed to resolve this problem. Triplet loss is effective to further improve the performance of those compact models. However, it normally employs a fixed margin to all the samples, which neglects the informative similarity structures between different identities. In this paper, we propose an enhanced version of triplet loss, named triplet distillation, which exploits the capability of a teacher model to transfer the similarity information to a small model by adaptively varying the margin between positive and negative pairs. Experiments on LFW, AgeDB, and CPLFW datasets show the merits of our method compared to the original triplet loss. △ Less

Submitted 19 May, 2019; v1 submitted 11 May, 2019; originally announced May 2019.

Comments: 5 pages, 2 tables, accpeted by ICML 2019 ODML-CDNNR Workshop

arXiv:1904.11595 [pdf, other]

DeepPerimeter: Indoor Boundary Estimation from Posed Monocular Sequences

Authors: Ameya Phalak, Zhao Chen, Darvin Yi, Khushi Gupta, Vijay Badrinarayanan, Andrew Rabinovich

Abstract: We present DeepPerimeter, a deep learning based pipeline for inferring a full indoor perimeter (i.e. exterior boundary map) from a sequence of posed RGB images. Our method relies on robust deep methods for depth estimation and wall segmentation to generate an exterior boundary point cloud, and then uses deep unsupervised clustering to fit wall planes to obtain a final boundary map of the room. We… ▽ More We present DeepPerimeter, a deep learning based pipeline for inferring a full indoor perimeter (i.e. exterior boundary map) from a sequence of posed RGB images. Our method relies on robust deep methods for depth estimation and wall segmentation to generate an exterior boundary point cloud, and then uses deep unsupervised clustering to fit wall planes to obtain a final boundary map of the room. We demonstrate that DeepPerimeter results in excellent visual and quantitative performance on the popular ScanNet and FloorNet datasets and works for room shapes of various complexities as well as in multiroom scenarios. We also establish important baselines for future work on indoor perimeter estimation, topics which will become increasingly prevalent as application areas like augmented reality and robotics become more significant. △ Less

Submitted 1 July, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

arXiv:1903.07988 [pdf]

doi 10.1002/jmri.26766

Deep Learning Enables Automatic Detection and Segmentation of Brain Metastases on Multi-Sequence MRI

Authors: Endre Grøvik, Darvin Yi, Michael Iv, Elisabeth Tong, Daniel L. Rubin, Greg Zaharchuk

Abstract: Detecting and segmenting brain metastases is a tedious and time-consuming task for many radiologists, particularly with the growing use of multi-sequence 3D imaging. This study demonstrates automated detection and segmentation of brain metastases on multi-sequence MRI using a deep learning approach based on a fully convolution neural network (CNN). In this retrospective study, a total of 156 patie… ▽ More Detecting and segmenting brain metastases is a tedious and time-consuming task for many radiologists, particularly with the growing use of multi-sequence 3D imaging. This study demonstrates automated detection and segmentation of brain metastases on multi-sequence MRI using a deep learning approach based on a fully convolution neural network (CNN). In this retrospective study, a total of 156 patients with brain metastases from several primary cancers were included. Pre-therapy MR images (1.5T and 3T) included pre- and post-gadolinium T1-weighted 3D fast spin echo, post-gadolinium T1-weighted 3D axial IR-prepped FSPGR, and 3D fluid attenuated inversion recovery. The ground truth was established by manual delineation by two experienced neuroradiologists. CNN training/development was performed using 100 and 5 patients, respectively, with a 2.5D network based on a GoogLeNet architecture. The results were evaluated in 51 patients, equally separated into those with few (1-3), multiple (4-10), and many (>10) lesions. Network performance was evaluated using precision, recall, Dice/F1 score, and ROC-curve statistics. For an optimal probability threshold, detection and segmentation performance was assessed on a per metastasis basis. The area under the ROC-curve (AUC), averaged across all patients, was 0.98. The AUC in the subgroups was 0.99, 0.97, and 0.97 for patients having 1-3, 4-10, and >10 metastases, respectively. Using an average optimal probability threshold determined by the development set, precision, recall, and Dice-score were 0.79, 0.53, and 0.79, respectively. At the same probability threshold, the network showed an average false positive rate of 8.3/patient (no lesion-size limit) and 3.4/patient (10 mm3 lesion size limit). In conclusion, a deep learning approach using multi-sequence MRI can aid in the detection and segmentation of brain metastases. △ Less

Submitted 18 March, 2019; originally announced March 2019.

arXiv:1812.10012 [pdf, ps, other]

doi 10.1109/TCYB.2019.2953564

Joint Embedding Learning and Low-Rank Approximation: A Framework for Incomplete Multi-view Learning

Authors: Hong Tao, Chenping Hou, Dongyun Yi, Jubo Zhu, Dewen Hu

Abstract: In real-world applications, not all instances in multi-view data are fully represented. To deal with incomplete data, Incomplete Multi-view Learning (IML) rises. In this paper, we propose the Joint Embedding Learning and Low-Rank Approximation (JELLA) framework for IML. The JELLA framework approximates the incomplete data by a set of low-rank matrices and learns a full and common embedding by line… ▽ More In real-world applications, not all instances in multi-view data are fully represented. To deal with incomplete data, Incomplete Multi-view Learning (IML) rises. In this paper, we propose the Joint Embedding Learning and Low-Rank Approximation (JELLA) framework for IML. The JELLA framework approximates the incomplete data by a set of low-rank matrices and learns a full and common embedding by linear transformation. Several existing IML methods can be unified as special cases of the framework. More interestingly, some linear transformation based complete multi-view methods can be adapted to IML directly with the guidance of the framework. Thus, the JELLA framework improves the efficiency of processing incomplete multi-view data, and bridges the gap between complete multi-view learning and IML. Moreover, the JELLA framework can provide guidance for developing new algorithms. For illustration, within the framework, we propose the Incomplete Multi-view Learning with Block Diagonal Representation (IML-BDR) method. Assuming that the sampled examples have approximate linear subspace structure, IML-BDR uses the block diagonal structure prior to learn the full embedding, which would lead to more correct clustering. A convergent alternating iterative algorithm with the Successive Over-Relaxation optimization technique is devised for optimization. Experimental results on various datasets demonstrate the effectiveness of IML-BDR. △ Less

Submitted 16 December, 2019; v1 submitted 24 December, 2018; originally announced December 2018.

arXiv:1811.11226 [pdf, ps, other]

CT organ segmentation using GPU data augmentation, unsupervised labels and IOU loss

Authors: Blaine Rister, Darvin Yi, Kaushik Shivakumar, Tomomi Nobashi, Daniel L. Rubin

Abstract: Fully-convolutional neural networks have achieved superior performance in a variety of image segmentation tasks. However, their training requires laborious manual annotation of large datasets, as well as acceleration by parallel processors with high-bandwidth memory, such as GPUs. We show that simple models can achieve competitive accuracy for organ segmentation on CT images when trained with exte… ▽ More Fully-convolutional neural networks have achieved superior performance in a variety of image segmentation tasks. However, their training requires laborious manual annotation of large datasets, as well as acceleration by parallel processors with high-bandwidth memory, such as GPUs. We show that simple models can achieve competitive accuracy for organ segmentation on CT images when trained with extensive data augmentation, which leverages existing graphics hardware to quickly apply geometric and photometric transformations to 3D image data. On 3 mm^3 CT volumes, our GPU implementation is 2.6-8X faster than a widely-used CPU version, including communication overhead. We also show how to automatically generate training labels using rudimentary morphological operations, which are efficiently computed by 3D Fourier transforms. We combined fully-automatic labels for the lungs and bone with semi-automatic ones for the liver, kidneys and bladder, to create a dataset of 130 labeled CT scans. To achieve the best results from data augmentation, our model uses the intersection-over-union (IOU) loss function, a close relative of the Dice loss. We discuss its mathematical properties and explain why it outperforms the usual weighted cross-entropy loss for unbalanced segmentation tasks. We conclude that there is no unique IOU loss function, as the naive one belongs to a broad family of functions with the same essential properties. When combining data augmentation with the IOU loss, our model achieves a Dice score of 78-92% for each organ. The trained model, code and dataset will be made publicly available, to further medical imaging research. △ Less

Submitted 27 November, 2018; originally announced November 2018.

Comments: Journal submission pre-print

arXiv:1806.03018 [pdf, other]

Large-scale Bisample Learning on ID Versus Spot Face Recognition

Authors: Xiangyu Zhu, Hao Liu, Zhen Lei, Hailin Shi, Fan Yang, Dong Yi, Guojun Qi, Stan Z. Li

Abstract: In real-world face recognition applications, there is a tremendous amount of data with two images for each person. One is an ID photo for face enrollment, and the other is a probe photo captured on spot. Most existing methods are designed for training data with limited breadth (a relatively small number of classes) and sufficient depth (many samples for each class). They would meet great challenge… ▽ More In real-world face recognition applications, there is a tremendous amount of data with two images for each person. One is an ID photo for face enrollment, and the other is a probe photo captured on spot. Most existing methods are designed for training data with limited breadth (a relatively small number of classes) and sufficient depth (many samples for each class). They would meet great challenges on ID versus Spot (IvS) data, including the under-represented intra-class variations and an excessive demand on computing devices. In this paper, we propose a deep learning based large-scale bisample learning (LBL) method for IvS face recognition. To tackle the bisample problem with only two samples for each class, a classification-verification-classification (CVC) training strategy is proposed to progressively enhance the IvS performance. Besides, a dominant prototype softmax (DP-softmax) is incorporated to make the deep learning scalable on large-scale classes. We conduct LBL on a IvS face dataset with more than two million identities. Experimental results show the proposed method achieves superior performance to previous ones, validating the effectiveness of LBL on IvS face recognition. △ Less

Submitted 13 February, 2019; v1 submitted 8 June, 2018; originally announced June 2018.

Comments: Accepted by special issue on Deep Learning for Face Analysis. International Journal of Computer Vision (IJCV), 2019

arXiv:1805.07719 [pdf, other]

Balancing Shared Autonomy with Human-Robot Communication

Authors: Rosario Scalise, Yonatan Bisk, Maxwell Forbes, Daqing Yi, Yejin Choi, Siddhartha Srinivasa

Abstract: Robotic agents that share autonomy with a human should leverage human domain knowledge and account for their preferences when completing a task. This extra knowledge can dramatically improve plan efficiency and user-satisfaction, but these gains are lost if communicating with a robot is taxing and unnatural. In this paper, we show how viewing humanrobot language through the lens of shared autonomy… ▽ More Robotic agents that share autonomy with a human should leverage human domain knowledge and account for their preferences when completing a task. This extra knowledge can dramatically improve plan efficiency and user-satisfaction, but these gains are lost if communicating with a robot is taxing and unnatural. In this paper, we show how viewing humanrobot language through the lens of shared autonomy explains the efficiency versus cognitive load trade-offs humans make when deciding how cooperative and explicit to make their instructions. △ Less

Submitted 20 May, 2018; originally announced May 2018.

arXiv:1710.06092 [pdf, other]

Generalizing Informed Sampling for Asymptotically Optimal Sampling-based Kinodynamic Planning via Markov Chain Monte Carlo

Authors: Daqing Yi, Rohan Thakker, Cole Gulino, Oren Salzman, Siddhartha Srinivasa

Abstract: Asymptotically-optimal motion planners such as RRT* have been shown to incrementally approximate the shortest path between start and goal states. Once an initial solution is found, their performance can be dramatically improved by restricting subsequent samples to regions of the state space that can potentially improve the current solution. When the motion planning problem lies in a Euclidean spac… ▽ More Asymptotically-optimal motion planners such as RRT* have been shown to incrementally approximate the shortest path between start and goal states. Once an initial solution is found, their performance can be dramatically improved by restricting subsequent samples to regions of the state space that can potentially improve the current solution. When the motion planning problem lies in a Euclidean space, this region $X_{inf}$, called the informed set, can be sampled directly. However, when planning with differential constraints in non-Euclidean state spaces, no analytic solutions exists to sampling $X_{inf}$ directly. State-of-the-art approaches to sampling $X_{inf}$ in such domains such as Hierarchical Rejection Sampling (HRS) may still be slow in high-dimensional state space. This may cause the planning algorithm to spend most of its time trying to produces samples in $X_{inf}$ rather than explore it. In this paper, we suggest an alternative approach to produce samples in the informed set $X_{inf}$ for a wide range of settings. Our main insight is to recast this problem as one of sampling uniformly within the sub-level-set of an implicit non-convex function. This recasting enables us to apply Monte Carlo sampling methods, used very effectively in the Machine Learning and Optimization communities, to solve our problem. We show for a wide range of scenarios that using our sampler can accelerate the convergence rate to high-quality solutions in high-dimensional problems. △ Less

Submitted 17 October, 2017; originally announced October 2017.

arXiv:1709.05929 [pdf]

Institutionally Distributed Deep Learning Networks

Authors: Ken Chang, Niranjan Balachandar, Carson K Lam, Darvin Yi, James M Brown, Andrew Beers, Bruce R Rosen, Daniel L Rubin, Jayashree Kalpathy-Cramer

Abstract: Deep learning has become a promising approach for automated medical diagnoses. When medical data samples are limited, collaboration among multiple institutions is necessary to achieve high algorithm performance. However, sharing patient data often has limitations due to technical, legal, or ethical concerns. In such cases, sharing a deep learning model is a more attractive alternative. The best me… ▽ More Deep learning has become a promising approach for automated medical diagnoses. When medical data samples are limited, collaboration among multiple institutions is necessary to achieve high algorithm performance. However, sharing patient data often has limitations due to technical, legal, or ethical concerns. In such cases, sharing a deep learning model is a more attractive alternative. The best method of performing such a task is unclear, however. In this study, we simulate the dissemination of learning deep learning network models across four institutions using various heuristics and compare the results with a deep learning model trained on centrally hosted patient data. The heuristics investigated include ensembling single institution models, single weight transfer, and cyclical weight transfer. We evaluated these approaches for image classification in three independent image collections (retinal fundus photos, mammography, and ImageNet). We find that cyclical weight transfer resulted in a performance (testing accuracy = 77.3%) that was closest to that of centrally hosted patient data (testing accuracy = 78.7%). We also found that there is an improvement in the performance of cyclical weight transfer heuristic with high frequency of weight transfer. △ Less

Submitted 10 September, 2017; originally announced September 2017.

arXiv:1705.06362 [pdf, other]

Optimizing and Visualizing Deep Learning for Benign/Malignant Classification in Breast Tumors

Authors: Darvin Yi, Rebecca Lynn Sawyer, David Cohn III, Jared Dunnmon, Carson Lam, Xuerong Xiao, Daniel Rubin

Abstract: Breast cancer has the highest incidence and second highest mortality rate for women in the US. Our study aims to utilize deep learning for benign/malignant classification of mammogram tumors using a subset of cases from the Digital Database of Screening Mammography (DDSM). Though it was a small dataset from the view of Deep Learning (about 1000 patients), we show that currently state of the art ar… ▽ More Breast cancer has the highest incidence and second highest mortality rate for women in the US. Our study aims to utilize deep learning for benign/malignant classification of mammogram tumors using a subset of cases from the Digital Database of Screening Mammography (DDSM). Though it was a small dataset from the view of Deep Learning (about 1000 patients), we show that currently state of the art architectures of deep learning can find a robust signal, even when trained from scratch. Using convolutional neural networks (CNNs), we are able to achieve an accuracy of 85% and an ROC AUC of 0.91, while leading hand-crafted feature based methods are only able to achieve an accuracy of 71%. We investigate an amalgamation of architectures to show that our best result is reached with an ensemble of the lightweight GoogLe Nets tasked with interpreting both the coronal caudal view and the mediolateral oblique view, simply averaging the probability scores of both views to make the final prediction. In addition, we have created a novel method to visualize what features the neural network detects for the benign/malignant classification, and have correlated those features with well known radiological features, such as spiculation. Our algorithm significantly improves existing classification methods for mammography lesions and identifies features that correlate with established clinical markers. △ Less

Submitted 17 May, 2017; originally announced May 2017.

arXiv:1702.05663 [pdf, other]

The Game Imitation: Deep Supervised Convolutional Networks for Quick Video Game AI

Authors: Zhao Chen, Darvin Yi

Abstract: We present a vision-only model for gaming AI which uses a late integration deep convolutional network architecture trained in a purely supervised imitation learning context. Although state-of-the-art deep learning models for video game tasks generally rely on more complex methods such as deep-Q learning, we show that a supervised model which requires substantially fewer resources and training time… ▽ More We present a vision-only model for gaming AI which uses a late integration deep convolutional network architecture trained in a purely supervised imitation learning context. Although state-of-the-art deep learning models for video game tasks generally rely on more complex methods such as deep-Q learning, we show that a supervised model which requires substantially fewer resources and training time can already perform well at human reaction speeds on the N64 classic game Super Smash Bros. We frame our learning task as a 30-class classification problem, and our CNN model achieves 80% top-1 and 95% top-3 validation accuracy. With slight test-time fine-tuning, our model is also competitive during live simulation with the highest-level AI built into the game. We will further show evidence through network visualizations that the network is successfully leveraging temporal information during inference to aid in decision making. Our work demonstrates that supervised CNN models can provide good performance in challenging policy prediction tasks while being significantly simpler and more lightweight than alternatives. △ Less

Submitted 18 February, 2017; originally announced February 2017.

Comments: 11 pages, 12 figures

arXiv:1611.04534 [pdf, other]

3-D Convolutional Neural Networks for Glioblastoma Segmentation

Authors: Darvin Yi, Mu Zhou, Zhao Chen, Olivier Gevaert

Abstract: Convolutional Neural Networks (CNN) have emerged as powerful tools for learning discriminative image features. In this paper, we propose a framework of 3-D fully CNN models for Glioblastoma segmentation from multi-modality MRI data. By generalizing CNN models to true 3-D convolutions in learning 3-D tumor MRI data, the proposed approach utilizes a unique network architecture to decouple image pixe… ▽ More Convolutional Neural Networks (CNN) have emerged as powerful tools for learning discriminative image features. In this paper, we propose a framework of 3-D fully CNN models for Glioblastoma segmentation from multi-modality MRI data. By generalizing CNN models to true 3-D convolutions in learning 3-D tumor MRI data, the proposed approach utilizes a unique network architecture to decouple image pixels. Specifically, we design a convolutional layer with pre-defined Difference- of-Gaussian (DoG) filters to perform true 3-D convolution incorporating local neighborhood information at each pixel. We then use three trained convolutional layers that act to decouple voxels from the initial 3-D convolution. The proposed framework allows identification of high-level tumor structures on MRI. We evaluate segmentation performance on the BRATS segmentation dataset with 274 tumor samples. Extensive experimental results demonstrate encouraging performance of the proposed approach comparing to the state-of-the-art methods. Our data-driven approach achieves a median Dice score accuracy of 89% in whole tumor glioblastoma segmentation, revealing a generalized low-bias possibility to learn from medium-size MRI datasets. △ Less

Submitted 14 November, 2016; originally announced November 2016.

arXiv:1607.04884 [pdf, other]

doi 10.1007/s11192-017-2359-1

Modeling the coevolution between citations and coauthorships in scientific papers

Authors: Zheng Xie, Zonglin Xie, Miao Li, Jianping Li, Dongyun Yi

Abstract: Collaborations and citations within scientific research grow simultaneously and interact dynamically. Modelling the coevolution between them helps to study many phenomena that can be approached only through combining citation and coauthorship data. A geometric graph for the coevolution is proposed, the mechanism of which synthetically expresses the interactive impacts of authors and papers in a ge… ▽ More Collaborations and citations within scientific research grow simultaneously and interact dynamically. Modelling the coevolution between them helps to study many phenomena that can be approached only through combining citation and coauthorship data. A geometric graph for the coevolution is proposed, the mechanism of which synthetically expresses the interactive impacts of authors and papers in a geometrical way. The model is validated against a data set of papers published on PNAS during 2007-2015. The validation shows the ability to reproduce a range of features observed with citation and coauthorship data combined and separately. Particularly, in the empirical distribution of citations per author there exist two limits, in which the distribution appears as a generalized Poisson and a power-law respectively. Our model successfully reproduces the shape of the distribution, and provides an explanation for how the shape emerges via the decisions of authors. The model also captures the empirically positive correlations between the numbers of authors' papers, citations and collaborators. △ Less

Submitted 28 September, 2017; v1 submitted 17 July, 2016; originally announced July 2016.

Journal ref: Scientometrics (2017) 112: 483-507

arXiv:1604.08891 [pdf, other]

doi 10.1002/asi.23935

Modelling transition phenomena of scientific coauthorship networks

Authors: Zheng Xie, Enming Dong, Dongyun Yi, Ouyang Zhenzheng, Jianping Li

Abstract: In a range of scientific coauthorship networks, transitions emerge in degree distributions, correlations between degrees and local clustering coefficients, etc. The existence of those transitions could be regarded as a result of the diversity in collaboration behaviours of scientific fields. A growing geometric hypergraph built on a cluster of concentric circles is proposed to model two specific c… ▽ More In a range of scientific coauthorship networks, transitions emerge in degree distributions, correlations between degrees and local clustering coefficients, etc. The existence of those transitions could be regarded as a result of the diversity in collaboration behaviours of scientific fields. A growing geometric hypergraph built on a cluster of concentric circles is proposed to model two specific collaboration behaviours, namely the behaviour of leaders and that of other members in research teams. The model successfully predicts the transitions, as well as many common features of coauthorship networks. Particulary, it realizes a process of deriving the complex "scale-free" property from the simple "yes/no" experiments. Moreover, it gives a reasonable explanation for the emergence of transitions with the difference of collaboration behaviours between leaders and other members. The difference emerges in the evolution of research teams, which synthetically addresses several specific factors of generating collaborations, namely the communications between research teams, the academic impacts and homophily of authors. △ Less

Submitted 15 June, 2018; v1 submitted 29 April, 2016; originally announced April 2016.

Comments: 19 Pages, 8 figures

Journal ref: Journal of the Association for Information Science & Technology, 69(2):305-317 (2016)

arXiv:1504.05408 [pdf, ps, other]

Effective Discriminative Feature Selection with Non-trivial Solutions

Authors: Hong Tao, Chenping Hou, Feiping Nie, Yuanyuan Jiao, Dongyun Yi

Abstract: Feature selection and feature transformation, the two main ways to reduce dimensionality, are often presented separately. In this paper, a feature selection method is proposed by combining the popular transformation based dimensionality reduction method Linear Discriminant Analysis (LDA) and sparsity regularization. We impose row sparsity on the transformation matrix of LDA through ${\ell}_{2,1}$-… ▽ More Feature selection and feature transformation, the two main ways to reduce dimensionality, are often presented separately. In this paper, a feature selection method is proposed by combining the popular transformation based dimensionality reduction method Linear Discriminant Analysis (LDA) and sparsity regularization. We impose row sparsity on the transformation matrix of LDA through ${\ell}_{2,1}$-norm regularization to achieve feature selection, and the resultant formulation optimizes for selecting the most discriminative features and removing the redundant ones simultaneously. The formulation is extended to the ${\ell}_{2,p}$-norm regularized case: which is more likely to offer better sparsity when $0<p<1$. Thus the formulation is a better approximation to the feature selection problem. An efficient algorithm is developed to solve the ${\ell}_{2,p}$-norm based optimization problem and it is proved that the algorithm converges when $0<p\le 2$. Systematical experiments are conducted to understand the work of the proposed method. Promising experimental results on various types of real-world data sets demonstrate the effectiveness of our algorithm. △ Less

Submitted 21 April, 2015; originally announced April 2015.

arXiv:1504.02351 [pdf, other]

When Face Recognition Meets with Deep Learning: an Evaluation of Convolutional Neural Networks for Face Recognition

Authors: Guosheng Hu, Yongxin Yang, Dong Yi, Josef Kittler, William Christmas, Stan Z. Li, Timothy Hospedales

Abstract: Deep learning, in particular Convolutional Neural Network (CNN), has achieved promising results in face recognition recently. However, it remains an open question: why CNNs work well and how to design a 'good' architecture. The existing works tend to focus on reporting CNN architectures that work well for face recognition rather than investigate the reason. In this work, we conduct an extensive ev… ▽ More Deep learning, in particular Convolutional Neural Network (CNN), has achieved promising results in face recognition recently. However, it remains an open question: why CNNs work well and how to design a 'good' architecture. The existing works tend to focus on reporting CNN architectures that work well for face recognition rather than investigate the reason. In this work, we conduct an extensive evaluation of CNN-based face recognition systems (CNN-FRS) on a common ground to make our work easily reproducible. Specifically, we use public database LFW (Labeled Faces in the Wild) to train CNNs, unlike most existing CNNs trained on private databases. We propose three CNN architectures which are the first reported architectures trained using LFW data. This paper quantitatively compares the architectures of CNNs and evaluate the effect of different implementation choices. We identify several useful properties of CNN-FRS. For instance, the dimensionality of the learned features can be significantly reduced without adverse effect on face recognition accuracy. In addition, traditional metric learning method exploiting CNN-learned features is evaluated. Experiments show two crucial factors to good CNN-FRS performance are the fusion of multiple CNNs and metric learning. To make our work reproducible, source code and models will be made publicly available. △ Less

Submitted 9 April, 2015; originally announced April 2015.

Comments: 7 pages, 4 figures, 7 tables

arXiv:1411.7923 [pdf, other]

Learning Face Representation from Scratch

Authors: Dong Yi, Zhen Lei, Shengcai Liao, Stan Z. Li

Abstract: Pushing by big data and deep convolutional neural network (CNN), the performance of face recognition is becoming comparable to human. Using private large scale training datasets, several groups achieve very high performance on LFW, i.e., 97% to 99%. While there are many open source implementations of CNN, none of large scale face dataset is publicly available. The current situation in the field of… ▽ More Pushing by big data and deep convolutional neural network (CNN), the performance of face recognition is becoming comparable to human. Using private large scale training datasets, several groups achieve very high performance on LFW, i.e., 97% to 99%. While there are many open source implementations of CNN, none of large scale face dataset is publicly available. The current situation in the field of face recognition is that data is more important than algorithm. To solve this problem, this paper proposes a semi-automatical way to collect face images from Internet and builds a large scale dataset containing about 10,000 subjects and 500,000 images, called CASIAWebFace. Based on the database, we use a 11-layer CNN to learn discriminative representation and obtain state-of-theart accuracy on LFW and YTF. The publication of CASIAWebFace will attract more research groups entering this field and accelerate the development of face recognition in the wild. △ Less

Submitted 28 November, 2014; originally announced November 2014.

arXiv:1407.4979 [pdf, other]

Deep Metric Learning for Practical Person Re-Identification

Authors: Dong Yi, Zhen Lei, Stan Z. Li

Abstract: Various hand-crafted features and metric learning methods prevail in the field of person re-identification. Compared to these methods, this paper proposes a more general way that can learn a similarity metric from image pixels directly. By using a "siamese" deep neural network, the proposed method can jointly learn the color feature, texture feature and metric in a unified framework. The network h… ▽ More Various hand-crafted features and metric learning methods prevail in the field of person re-identification. Compared to these methods, this paper proposes a more general way that can learn a similarity metric from image pixels directly. By using a "siamese" deep neural network, the proposed method can jointly learn the color feature, texture feature and metric in a unified framework. The network has a symmetry structure with two sub-networks which are connected by Cosine function. To deal with the big variations of person images, binomial deviance is used to evaluate the cost between similarities and labels, which is proved to be robust to outliers. Compared to existing researches, a more practical setting is studied in the experiments that is training and test on different datasets (cross dataset person re-identification). Both in "intra dataset" and "cross dataset" settings, the superiorities of the proposed method are illustrated on VIPeR and PRID. △ Less

Submitted 18 July, 2014; originally announced July 2014.

arXiv:1406.1247 [pdf, other]

Shared Representation Learning for Heterogeneous Face Recognition

Authors: Dong Yi, Zhen Lei, Shengcai Liao, Stan Z. Li

Abstract: After intensive research, heterogenous face recognition is still a challenging problem. The main difficulties are owing to the complex relationship between heterogenous face image spaces. The heterogeneity is always tightly coupled with other variations, which makes the relationship of heterogenous face images highly nonlinear. Many excellent methods have been proposed to model the nonlinear relat… ▽ More After intensive research, heterogenous face recognition is still a challenging problem. The main difficulties are owing to the complex relationship between heterogenous face image spaces. The heterogeneity is always tightly coupled with other variations, which makes the relationship of heterogenous face images highly nonlinear. Many excellent methods have been proposed to model the nonlinear relationship, but they apt to overfit to the training set, due to limited samples. Inspired by the unsupervised algorithms in deep learning, this paper proposes an novel framework for heterogeneous face recognition. We first extract Gabor features at some localized facial points, and then use Restricted Boltzmann Machines (RBMs) to learn a shared representation locally to remove the heterogeneity around each facial point. Finally, the shared representations of local RBMs are connected together and processed by PCA. Two problems (Sketch-Photo and NIR-VIS) and three databases are selected to evaluate the proposed method. For Sketch-Photo problem, we obtain perfect results on the CUFS database. For NIR-VIS problem, we produce new state-of-the-art performance on the CASIA HFB and NIR-VIS 2.0 databases. △ Less

Submitted 4 June, 2014; originally announced June 2014.

arXiv:1307.1253 [pdf, other]

doi 10.1103/PhysRevE.89.042811

Network robustness of multiplex networks with interlayer degree correlations

Authors: Byungjoon Min, Su Do Yi, Kyu-Min Lee, K. -I. Goh

Abstract: We study the robustness properties of multiplex networks consisting of multiple layers of distinct types of links, focusing on the role of correlations between degrees of a node in different layers. We use generating function formalism to address various notions of the network robustness relevant to multiplex networks such as the resilience of ordinary- and mutual connectivity under random or targ… ▽ More We study the robustness properties of multiplex networks consisting of multiple layers of distinct types of links, focusing on the role of correlations between degrees of a node in different layers. We use generating function formalism to address various notions of the network robustness relevant to multiplex networks such as the resilience of ordinary- and mutual connectivity under random or targeted node removals as well as the biconnectivity. We found that correlated coupling can affect the structural robustness of multiplex networks in diverse fashion. For example, for maximally-correlated duplex networks, all pairs of nodes in the giant component are connected via at least two independent paths and network structure is highly resilient to random failure. In contrast, anti-correlated duplex networks are on one hand robust against targeted attack on high-degree nodes, but on the other hand they can be vulnerable to random failure. △ Less

Submitted 8 April, 2014; v1 submitted 4 July, 2013; originally announced July 2013.

Comments: 9 pages, 9 figures, accepted for publication in Phys. Rev. E

Journal ref: Phys. Rev. E 89, 042811 (2014)

arXiv:1302.7180 [pdf, other]

Fast Matching by 2 Lines of Code for Large Scale Face Recognition Systems

Authors: Dong Yi, Zhen Lei, Yang Hu, Stan Z. Li

Abstract: In this paper, we propose a method to apply the popular cascade classifier into face recognition to improve the computational efficiency while keeping high recognition rate. In large scale face recognition systems, because the probability of feature templates coming from different subjects is very high, most of the matching pairs will be rejected by the early stages of the cascade. Therefore, the… ▽ More In this paper, we propose a method to apply the popular cascade classifier into face recognition to improve the computational efficiency while keeping high recognition rate. In large scale face recognition systems, because the probability of feature templates coming from different subjects is very high, most of the matching pairs will be rejected by the early stages of the cascade. Therefore, the cascade can improve the matching speed significantly. On the other hand, using the nested structure of the cascade, we could drop some stages at the end of feature to reduce the memory and bandwidth usage in some resources intensive system while not sacrificing the performance too much. The cascade is learned by two steps. Firstly, some kind of prepared features are grouped into several nested stages. And then, the threshold of each stage is learned to achieve user defined verification rate (VR). In the paper, we take a landmark based Gabor+LDA face recognition system as baseline to illustrate the process and advantages of the proposed method. However, the use of this method is very generic and not limited in face recognition, which can be easily generalized to other biometrics as a post-processing module. Experiments on the FERET database show the good performance of our baseline and an experiment on a self-collected large scale database illustrates that the cascade can improve the matching speed significantly. △ Less

Submitted 28 February, 2013; originally announced February 2013.

Showing 1–39 of 39 results for author: Yi, D