subscribe to arXiv mailings

Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

Authors: Rong Bao, Rui Zheng, Shihan Dou, Xiao Wang, Enyu Zhou, Bo Wang, Qi Zhang, Liang Ding, Dacheng Tao

Abstract: In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals. However, it is highly challenging for AI to understand human intentions and societal values, and provide accurate preference feedback based on these. Current AI feedback methods rely on powerful LLMs, carefully designed specific principles t… ▽ More In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals. However, it is highly challenging for AI to understand human intentions and societal values, and provide accurate preference feedback based on these. Current AI feedback methods rely on powerful LLMs, carefully designed specific principles to describe human intentions, and are easily influenced by position bias. To address these issues, we propose a self-reference-based AI feedback framework that enables a 13B Llama2-Chat to provide high-quality feedback under simple and general principles such as ``best for humanity``. Specifically, we allow the AI to first respond to the user's instructions, then generate criticism of other answers based on its own response as a reference, and finally determine which answer better fits human preferences according to the criticism. Additionally, we use a self-consistency method to further reduce the impact of position bias, and employ semantic perplexity to calculate the preference strength differences between different answers. Experimental results show that our method enables 13B and 70B Llama2-Chat annotators to provide high-quality preference feedback, and the policy models trained based on these preference data achieve significant advantages in benchmark datasets through reinforcement learning. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 19 pages, 3 figures

arXiv:2405.06275 [pdf, other]

Pruning as a Domain-specific LLM Extractor

Authors: Nan Zhang, Yanchi Liu, Xujiang Zhao, Wei Cheng, Runxue Bao, Rui Zhang, Prasenjit Mitra, Haifeng Chen

Abstract: Large Language Models (LLMs) have exhibited remarkable proficiency across a wide array of NLP tasks. However, the escalation in model size also engenders substantial deployment costs. While few efforts have explored model pruning techniques to reduce the size of LLMs, they mainly center on general or task-specific weights. This leads to suboptimal performance due to lacking specificity on the targ… ▽ More Large Language Models (LLMs) have exhibited remarkable proficiency across a wide array of NLP tasks. However, the escalation in model size also engenders substantial deployment costs. While few efforts have explored model pruning techniques to reduce the size of LLMs, they mainly center on general or task-specific weights. This leads to suboptimal performance due to lacking specificity on the target domain or generality on different tasks when applied to domain-specific challenges. This work introduces an innovative unstructured dual-pruning methodology, D-Pruner, for domain-specific compression on LLM. It extracts a compressed, domain-specific, and task-agnostic LLM by identifying LLM weights that are pivotal for general capabilities, like linguistic capability and multi-task solving, and domain-specific knowledge. More specifically, we first assess general weight importance by quantifying the error incurred upon their removal with the help of an open-domain calibration dataset. Then, we utilize this general weight importance to refine the training loss, so that it preserves generality when fitting into a specific domain. Moreover, by efficiently approximating weight importance with the refined training loss on a domain-specific calibration dataset, we obtain a pruned model emphasizing generality and specificity. Our comprehensive experiments across various tasks in healthcare and legal domains show the effectiveness of D-Pruner in domain-specific compression. Our code is available at https://github.com/psunlpgroup/D-Pruner. △ Less

Submitted 10 May, 2024; originally announced May 2024.

Comments: NAACL 2024 Findings

arXiv:2403.16055 [pdf, other]

Modal-adaptive Knowledge-enhanced Graph-based Financial Prediction from Monetary Policy Conference Calls with LLM

Authors: Kun Ouyang, Yi Liu, Shicheng Li, Ruihan Bao, Keiko Harimoto, Xu Sun

Abstract: Financial prediction from Monetary Policy Conference (MPC) calls is a new yet challenging task, which targets at predicting the price movement and volatility for specific financial assets by analyzing multimodal information including text, video, and audio. Although the existing work has achieved great success using cross-modal transformer blocks, it overlooks the potential external financial know… ▽ More Financial prediction from Monetary Policy Conference (MPC) calls is a new yet challenging task, which targets at predicting the price movement and volatility for specific financial assets by analyzing multimodal information including text, video, and audio. Although the existing work has achieved great success using cross-modal transformer blocks, it overlooks the potential external financial knowledge, the varying contributions of different modalities to financial prediction, as well as the innate relations among different financial assets. To tackle these limitations, we propose a novel Modal-Adaptive kNowledge-enhAnced Graph-basEd financial pRediction scheme, named MANAGER. Specifically, MANAGER resorts to FinDKG to obtain the external related knowledge for the input text. Meanwhile, MANAGER adopts BEiT-3 and Hidden-unit BERT (HuBERT) to extract the video and audio features, respectively. Thereafter, MANAGER introduces a novel knowledge-enhanced cross-modal graph that fully characterizes the semantic relations among text, external knowledge, video and audio, to adaptively utilize the information in different modalities, with ChatGLM2 as the backbone. Extensive experiments on a publicly available dataset Monopoly verify the superiority of our model over cutting-edge methods. △ Less

Submitted 21 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

Comments: Accepted by LREC Coling 2024 -FinNLP (oral)

arXiv:2403.14729 [pdf, other]

Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch

Authors: Xidong Wu, Shangqian Gao, Zeyu Zhang, Zhenzhen Li, Runxue Bao, Yanfu Zhang, Xiaoqian Wang, Heng Huang

Abstract: Current techniques for deep neural network (DNN) pruning often involve intricate multi-step processes that require domain-specific expertise, making their widespread adoption challenging. To address the limitation, the Only-Train-Once (OTO) and OTOv2 are proposed to eliminate the need for additional fine-tuning steps by directly training and compressing a general DNN from scratch. Nevertheless, th… ▽ More Current techniques for deep neural network (DNN) pruning often involve intricate multi-step processes that require domain-specific expertise, making their widespread adoption challenging. To address the limitation, the Only-Train-Once (OTO) and OTOv2 are proposed to eliminate the need for additional fine-tuning steps by directly training and compressing a general DNN from scratch. Nevertheless, the static design of optimizers (in OTO) can lead to convergence issues of local optima. In this paper, we proposed the Auto-Train-Once (ATO), an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs. During the model training phase, our approach not only trains the target model but also leverages a controller network as an architecture generator to guide the learning of target model weights. Furthermore, we developed a novel stochastic gradient algorithm that enhances the coordination between model training and controller network training, thereby improving pruning performance. We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures (including ResNet18, ResNet34, ResNet50, ResNet56, and MobileNetv2) on standard benchmark datasets (CIFAR-10, CIFAR-100, and ImageNet). △ Less

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2402.16513 [pdf]

Photonic Neural Network Fabricated on Thin Film Lithium Niobate for High-Fidelity and Power-Efficient Matrix Computation

Authors: Yong Zheng, Rongbo Wu, Yuan Ren, Rui Bao, Jian Liu, Yu Ma, Min Wang, Ya Cheng

Abstract: Photonic neural networks (PNNs) have emerged as a promising platform to address the energy consumption issue that comes with the advancement of artificial intelligence technology, and thin film lithium niobate (TFLN) offers an attractive solution as a material platform mainly for its combined characteristics of low optical loss and large electro-optic (EO) coefficients. Here, we present the first… ▽ More Photonic neural networks (PNNs) have emerged as a promising platform to address the energy consumption issue that comes with the advancement of artificial intelligence technology, and thin film lithium niobate (TFLN) offers an attractive solution as a material platform mainly for its combined characteristics of low optical loss and large electro-optic (EO) coefficients. Here, we present the first implementation of an EO tunable PNN based on the TFLN platform. Our device features ultra-high fidelity, high computation speed, and exceptional power efficiency. We benchmark the performance of our device with several deep learning missions including in-situ training of Circle and Moons nonlinear datasets classification, Iris flower species recognition, and handwriting digits recognition. Our work paves the way for sustainable up-scaling of high-speed, energy-efficient PNNs. △ Less

Submitted 26 February, 2024; originally announced February 2024.

Comments: 27 pages,10 figures

arXiv:2402.11441 [pdf, other]

InfuserKI: Enhancing Large Language Models with Knowledge Graphs via Infuser-Guided Knowledge Integration

Authors: Fali Wang, Runxue Bao, Suhang Wang, Wenchao Yu, Yanchi Liu, Wei Cheng, Haifeng Chen

Abstract: Though Large Language Models (LLMs) have shown remarkable open-generation capabilities across diverse domains, they struggle with knowledge-intensive tasks. To alleviate this issue, knowledge integration methods have been proposed to enhance LLMs with domain-specific knowledge graphs using external modules. However, they suffer from data inefficiency as they require both known and unknown knowledg… ▽ More Though Large Language Models (LLMs) have shown remarkable open-generation capabilities across diverse domains, they struggle with knowledge-intensive tasks. To alleviate this issue, knowledge integration methods have been proposed to enhance LLMs with domain-specific knowledge graphs using external modules. However, they suffer from data inefficiency as they require both known and unknown knowledge for fine-tuning. Thus, we study a novel problem of integrating unknown knowledge into LLMs efficiently without unnecessary overlap of known knowledge. Injecting new knowledge poses the risk of forgetting previously acquired knowledge. To tackle this, we propose a novel Infuser-Guided Knowledge Integration (InfuserKI) framework that utilizes transformer internal states to determine whether to enhance the original LLM output with additional information, thereby effectively mitigating knowledge forgetting. Evaluations on the UMLS-2.5k and MetaQA domain knowledge graphs demonstrate that InfuserKI can effectively acquire new knowledge and outperform state-of-the-art baselines by 9% and 6%, respectively, in reducing knowledge forgetting. △ Less

Submitted 17 February, 2024; originally announced February 2024.

Comments: 12 pages, 5 figures

arXiv:2402.09345 [pdf, other]

InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling

Authors: Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao

Abstract: Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models with human values, reward hacking, also termed reward overoptimization, remains a critical challenge. This issue primarily arises from reward misgeneralization, where reward models (RMs) compute reward using spurious features that are irrelevant to human preferences. In this work, we tackle this pr… ▽ More Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models with human values, reward hacking, also termed reward overoptimization, remains a critical challenge. This issue primarily arises from reward misgeneralization, where reward models (RMs) compute reward using spurious features that are irrelevant to human preferences. In this work, we tackle this problem from an information-theoretic perspective and propose a framework for reward modeling, namely InfoRM, by introducing a variational information bottleneck objective to filter out irrelevant information. Notably, we further identify a correlation between overoptimization and outliers in the IB latent space of InfoRM, establishing it as a promising tool for detecting reward overoptimization. Inspired by this finding, we propose the Cluster Separation Index (CSI), which quantifies deviations in the IB latent space, as an indicator of reward overoptimization to facilitate the development of online mitigation strategies. Extensive experiments on a wide range of settings and RM scales (70M, 440M, 1.4B, and 7B) demonstrate the effectiveness of InfoRM. Further analyses reveal that InfoRM's overoptimization detection mechanism is not only effective but also robust across a broad range of datasets, signifying a notable advancement in the field of RLHF. The code will be released upon acceptance. △ Less

Submitted 23 May, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: 35 pages, 28 figures

arXiv:2402.01987 [pdf, other]

Online Transfer Learning for RSV Case Detection

Authors: Yiming Sun, Yuhe Gao, Runxue Bao, Gregory F. Cooper, Jessi Espino, Harry Hochheiser, Marian G. Michaels, John M. Aronis, Chenxi Song, Ye Ye

Abstract: Transfer learning has become a pivotal technique in machine learning and has proven to be effective in various real-world applications. However, utilizing this technique for classification tasks with sequential data often faces challenges, primarily attributed to the scarcity of class labels. To address this challenge, we introduce Multi-Source Adaptive Weighting (MSAW), an online multi-source tra… ▽ More Transfer learning has become a pivotal technique in machine learning and has proven to be effective in various real-world applications. However, utilizing this technique for classification tasks with sequential data often faces challenges, primarily attributed to the scarcity of class labels. To address this challenge, we introduce Multi-Source Adaptive Weighting (MSAW), an online multi-source transfer learning method. MSAW integrates a dynamic weighting mechanism into an ensemble framework, enabling automatic adjustment of weights based on the relevance and contribution of each source (representing historical knowledge) and target model (learning from newly acquired data). We demonstrate the effectiveness of MSAW by applying it to detect Respiratory Syncytial Virus cases within Emergency Department visits, utilizing multiple years of electronic health records from the University of Pittsburgh Medical Center. Our method demonstrates performance improvements over many baselines, including refining pre-trained models with online learning as well as three static weighting approaches, showing MSAW's capacity to integrate historical knowledge with progressively accumulated new data. This study indicates the potential of online transfer learning in healthcare, particularly for developing machine learning models that dynamically adapt to evolving situations where new data is incrementally accumulated. △ Less

Submitted 7 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 10 pages, 2 figures

arXiv:2311.06276 [pdf, other]

Enhancing the machine vision performance with multi-spectral light sources

Authors: Feng Zhang, Rui Bao, Congqi Dai, Wanlu Zhang, Shu Liu, Ruiqian Guo

Abstract: This study mainly focuses on the performance of different multi-spectral light sources on different object colors in machine vision and tries to enhance machine vision with multi-spectral light sources. Using different color pencils as samples, by recognizing the collected images with two classical neural networks, AlexNet and VGG19, the performance was investigated under 35 different multi-spectr… ▽ More This study mainly focuses on the performance of different multi-spectral light sources on different object colors in machine vision and tries to enhance machine vision with multi-spectral light sources. Using different color pencils as samples, by recognizing the collected images with two classical neural networks, AlexNet and VGG19, the performance was investigated under 35 different multi-spectral light sources. The results show that for both models there are always some non-pure white light sources, whose accuracy is better than pure white light, which suggests the potential of multi-spectral light sources to further enhance the effectiveness of machine vision. The comparison of both models is also performed, and surprised to find that the overall performance of VGG19 is lower than that of AlexNet, which shows that the importance of the choice of multi-spectral light sources and models. △ Less

Submitted 20 October, 2023; originally announced November 2023.

Comments: 12 pages, 7 figures

arXiv:2310.14152 [pdf, other]

Orthogonal Subspace Learning for Language Model Continual Learning

Authors: Xiao Wang, Tianze Chen, Qiming Ge, Han Xia, Rong Bao, Rui Zheng, Qi Zhang, Tao Gui, Xuanjing Huang

Abstract: Benefiting from massive corpora and advanced hardware, large language models (LLMs) exhibit remarkable capabilities in language understanding and generation. However, their performance degrades in scenarios where multiple tasks are encountered sequentially, also known as catastrophic forgetting. In this paper, we propose orthogonal low-rank adaptation (O-LoRA), a simple and efficient approach for… ▽ More Benefiting from massive corpora and advanced hardware, large language models (LLMs) exhibit remarkable capabilities in language understanding and generation. However, their performance degrades in scenarios where multiple tasks are encountered sequentially, also known as catastrophic forgetting. In this paper, we propose orthogonal low-rank adaptation (O-LoRA), a simple and efficient approach for continual learning in language models, effectively mitigating catastrophic forgetting while learning new tasks. Specifically, O-LoRA learns tasks in different (low-rank) vector subspaces that are kept orthogonal to each other in order to minimize interference. Our method induces only marginal additional parameter costs and requires no user data storage for replay. Experimental results on continual learning benchmarks show that our method outperforms state-of-the-art methods. Furthermore, compared to previous approaches, our method excels in preserving the generalization ability of LLMs on unseen tasks. △ Less

Submitted 21 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 findings

arXiv:2310.08459 [pdf, other]

A Survey of Heterogeneous Transfer Learning

Authors: Runxue Bao, Yiming Sun, Yuhe Gao, Jindong Wang, Qiang Yang, Haifeng Chen, Zhi-Hong Mao, Ye Ye

Abstract: The application of transfer learning, an approach utilizing knowledge from a source domain to enhance model performance in a target domain, has seen a tremendous rise in recent years, underpinning many real-world scenarios. The key to its success lies in the shared common knowledge between the domains, a prerequisite in most transfer learning methodologies. These methods typically presuppose ident… ▽ More The application of transfer learning, an approach utilizing knowledge from a source domain to enhance model performance in a target domain, has seen a tremendous rise in recent years, underpinning many real-world scenarios. The key to its success lies in the shared common knowledge between the domains, a prerequisite in most transfer learning methodologies. These methods typically presuppose identical feature spaces and label spaces in both domains, known as homogeneous transfer learning, which, however, is not always a practical assumption. Oftentimes, the source and target domains vary in feature spaces, data distributions, and label spaces, making it challenging or costly to secure source domain data with identical feature and label spaces as the target domain. Arbitrary elimination of these differences is not always feasible or optimal. Thus, heterogeneous transfer learning, acknowledging and dealing with such disparities, has emerged as a promising approach for a variety of tasks. Despite the existence of a survey in 2017 on this topic, the fast-paced advances post-2017 necessitate an updated, in-depth review. We therefore present a comprehensive survey of recent developments in heterogeneous transfer learning methods, offering a systematic guide for future research. Our paper reviews methodologies for diverse learning scenarios, discusses the limitations of current studies, and covers various application contexts, including Natural Language Processing, Computer Vision, Multimodality, and Biomedicine, to foster a deeper understanding and spur future research. △ Less

Submitted 15 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2309.05608 [pdf, other]

Incorporating Pre-trained Model Prompting in Multimodal Stock Volume Movement Prediction

Authors: Ruibo Chen, Zhiyuan Zhang, Yi Liu, Ruihan Bao, Keiko Harimoto, Xu Sun

Abstract: Multimodal stock trading volume movement prediction with stock-related news is one of the fundamental problems in the financial area. Existing multimodal works that train models from scratch face the problem of lacking universal knowledge when modeling financial news. In addition, the models ability may be limited by the lack of domain-related knowledge due to insufficient data in the datasets. To… ▽ More Multimodal stock trading volume movement prediction with stock-related news is one of the fundamental problems in the financial area. Existing multimodal works that train models from scratch face the problem of lacking universal knowledge when modeling financial news. In addition, the models ability may be limited by the lack of domain-related knowledge due to insufficient data in the datasets. To handle this issue, we propose the Prompt-based MUltimodal Stock volumE prediction model (ProMUSE) to process text and time series modalities. We use pre-trained language models for better comprehension of financial news and adopt prompt learning methods to leverage their capability in universal knowledge to model textual information. Besides, simply fusing two modalities can cause harm to the unimodal representations. Thus, we propose a novel cross-modality contrastive alignment while reserving the unimodal heads beside the fusion head to mitigate this problem. Extensive experiments demonstrate that our proposed ProMUSE outperforms existing baselines. Comprehensive analyses further validate the effectiveness of our architecture compared to potential variants and learning mechanisms. △ Less

Submitted 11 September, 2023; originally announced September 2023.

Comments: 9 pages, 3 figures, 7 tables. Accepted by 2023 KDD Workshop on Machine Learning in Finance

arXiv:2307.06520 [pdf]

doi 10.1109/ACCESS.2024.3360412

A Framework for Migrating to Post-Quantum Cryptography: Security Dependency Analysis and Case Studies

Authors: Khondokar Fida Hasan, Leonie Simpson, Mir Ali Rezazadeh Baee, Chadni Islam, Ziaur Rahman, Warren Armstrong, Praveen Gauravaram, Matthew McKague

Abstract: Quantum computing is emerging as a significant threat to information protected by widely used cryptographic systems. Cryptographic methods, once deemed secure for decades, are now at risk of being compromised, posing a massive threat to the security of sensitive data and communications across enterprises worldwide. As a result, there is an urgent need to migrate to quantum-resistant cryptographic… ▽ More Quantum computing is emerging as a significant threat to information protected by widely used cryptographic systems. Cryptographic methods, once deemed secure for decades, are now at risk of being compromised, posing a massive threat to the security of sensitive data and communications across enterprises worldwide. As a result, there is an urgent need to migrate to quantum-resistant cryptographic systems. This is no simple task. Migrating to a quantum-safe state is a complex process, and many organisations lack the in-house expertise to navigate this transition without guidance. In this paper, we present a comprehensive framework designed to assist enterprises with this migration. Our framework outlines essential steps involved in the cryptographic migration process, and leverages existing organisational inventories. The framework facilitates the efficient identification of cryptographic assets and can be integrated with other enterprise frameworks smoothly. To underscore its practicality and effectiveness, we have incorporated case studies that utilise graph-theoretic techniques to pinpoint and assess cryptographic dependencies. This is useful in prioritising crypto-systems for replacement. △ Less

Submitted 21 February, 2024; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: 24 Pages

arXiv:2306.17257 [pdf, other]

Prediction of COVID-19 Patients' Emergency Room Revisit using Multi-Source Transfer Learning

Authors: Yuelyu Ji, Yuhe Gao, Runxue Bao, Qi Li, Disheng Liu, Yiming Sun, Ye Ye

Abstract: The coronavirus disease 2019 (COVID-19) has led to a global pandemic of significant severity. In addition to its high level of contagiousness, COVID-19 can have a heterogeneous clinical course, ranging from asymptomatic carriers to severe and potentially life-threatening health complications. Many patients have to revisit the emergency room (ER) within a short time after discharge, which significa… ▽ More The coronavirus disease 2019 (COVID-19) has led to a global pandemic of significant severity. In addition to its high level of contagiousness, COVID-19 can have a heterogeneous clinical course, ranging from asymptomatic carriers to severe and potentially life-threatening health complications. Many patients have to revisit the emergency room (ER) within a short time after discharge, which significantly increases the workload for medical staff. Early identification of such patients is crucial for helping physicians focus on treating life-threatening cases. In this study, we obtained Electronic Health Records (EHRs) of 3,210 encounters from 13 affiliated ERs within the University of Pittsburgh Medical Center between March 2020 and January 2021. We leveraged a Natural Language Processing technique, ScispaCy, to extract clinical concepts and used the 1001 most frequent concepts to develop 7-day revisit models for COVID-19 patients in ERs. The research data we collected from 13 ERs may have distributional differences that could affect the model development. To address this issue, we employed a classic deep transfer learning method called the Domain Adversarial Neural Network (DANN) and evaluated different modeling strategies, including the Multi-DANN algorithm, the Single-DANN algorithm, and three baseline methods. Results showed that the Multi-DANN models outperformed the Single-DANN models and baseline models in predicting revisits of COVID-19 patients to the ER within 7 days after discharge. Notably, the Multi-DANN strategy effectively addressed the heterogeneity among multiple source domains and improved the adaptation of source data to the target domain. Moreover, the high performance of Multi-DANN models indicates that EHRs are informative for developing a prediction model to identify COVID-19 patients who are very likely to revisit an ER within 7 days after discharge. △ Less

Submitted 29 June, 2023; originally announced June 2023.

Comments: to appear at ICHI 2023

arXiv:2304.09324 [pdf, other]

Computer-Vision Benchmark Segment-Anything Model (SAM) in Medical Images: Accuracy in 12 Datasets

Authors: Sheng He, Rina Bao, Jingpeng Li, Jeffrey Stout, Atle Bjornerud, P. Ellen Grant, Yangming Ou

Abstract: Background: The segment-anything model (SAM), introduced in April 2023, shows promise as a benchmark model and a universal solution to segment various natural images. It comes without previously-required re-training or fine-tuning specific to each new dataset. Purpose: To test SAM's accuracy in various medical image segmentation tasks and investigate potential factors that may affect its accurac… ▽ More Background: The segment-anything model (SAM), introduced in April 2023, shows promise as a benchmark model and a universal solution to segment various natural images. It comes without previously-required re-training or fine-tuning specific to each new dataset. Purpose: To test SAM's accuracy in various medical image segmentation tasks and investigate potential factors that may affect its accuracy in medical images. Methods: SAM was tested on 12 public medical image segmentation datasets involving 7,451 subjects. The accuracy was measured by the Dice overlap between the algorithm-segmented and ground-truth masks. SAM was compared with five state-of-the-art algorithms specifically designed for medical image segmentation tasks. Associations of SAM's accuracy with six factors were computed, independently and jointly, including segmentation difficulties as measured by segmentation ability score and by Dice overlap in U-Net, image dimension, size of the target region, image modality, and contrast. Results: The Dice overlaps from SAM were significantly lower than the five medical-image-based algorithms in all 12 medical image segmentation datasets, by a margin of 0.1-0.5 and even 0.6-0.7 Dice. SAM-Semantic was significantly associated with medical image segmentation difficulty and the image modality, and SAM-Point and SAM-Box were significantly associated with image segmentation difficulty, image dimension, target region size, and target-vs-background contrast. All these 3 variations of SAM were more accurate in 2D medical images, larger target region sizes, easier cases with a higher Segmentation Ability score and higher U-Net Dice, and higher foreground-background contrast. △ Less

Submitted 5 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: Technical Report

arXiv:2304.01401 [pdf, other]

U-Netmer: U-Net meets Transformer for medical image segmentation

Authors: Sheng He, Rina Bao, P. Ellen Grant, Yangming Ou

Abstract: The combination of the U-Net based deep learning models and Transformer is a new trend for medical image segmentation. U-Net can extract the detailed local semantic and texture information and Transformer can learn the long-rang dependencies among pixels in the input image. However, directly adapting the Transformer for segmentation has ``token-flatten" problem (flattens the local patches into 1D… ▽ More The combination of the U-Net based deep learning models and Transformer is a new trend for medical image segmentation. U-Net can extract the detailed local semantic and texture information and Transformer can learn the long-rang dependencies among pixels in the input image. However, directly adapting the Transformer for segmentation has ``token-flatten" problem (flattens the local patches into 1D tokens which losses the interaction among pixels within local patches) and ``scale-sensitivity" problem (uses a fixed scale to split the input image into local patches). Compared to directly combining U-Net and Transformer, we propose a new global-local fashion combination of U-Net and Transformer, named U-Netmer, to solve the two problems. The proposed U-Netmer splits an input image into local patches. The global-context information among local patches is learnt by the self-attention mechanism in Transformer and U-Net segments each local patch instead of flattening into tokens to solve the `token-flatten" problem. The U-Netmer can segment the input image with different patch sizes with the identical structure and the same parameter. Thus, the U-Netmer can be trained with different patch sizes to solve the ``scale-sensitivity" problem. We conduct extensive experiments in 7 public datasets on 7 organs (brain, heart, breast, lung, polyp, pancreas and prostate) and 4 imaging modalities (MRI, CT, ultrasound, and endoscopy) to show that the proposed U-Netmer can be generally applied to improve accuracy of medical image segmentation. These experimental results show that U-Netmer provides state-of-the-art performance compared to baselines and other models. In addition, the discrepancy among the outputs of U-Netmer with different scales is linearly correlated to the segmentation accuracy which can be considered as a confidence score to rank test images by difficulty without ground-truth. △ Less

Submitted 3 April, 2023; originally announced April 2023.

Comments: 10 pages, 5 figures, under review

arXiv:2212.08568 [pdf, other]

Biomedical image analysis competitions: The state of current participation practice

Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali, Anubha Gupta, Jan Kybic, Alison Noble, Carlos Ortiz de Solórzano, Samiksha Pachade, Caroline Petitjean, Daniel Sage, Donglai Wei, Elizabeth Wilden, Deepak Alapatt, Vincent Andrearczyk, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano , et al. (331 additional authors not shown)

Abstract: The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis,… ▽ More The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps. △ Less

Submitted 12 September, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

arXiv:2211.11843 [pdf, other]

SLUGBOT, an Aplysia-inspired Robotic Grasper for Studying Control

Authors: Kevin Dai, Ravesh Sukhnandan, Michael Bennington, Karen Whirley, Ryan Bao, Lu Li, Jeffrey P. Gill, Hillel J. Chiel, Victoria A. Webster-Wood

Abstract: Living systems can use a single periphery to perform a variety of tasks and adapt to a dynamic environment. This multifunctionality is achieved through the use of neural circuitry that adaptively controls the reconfigurable musculature. Current robotic systems struggle to flexibly adapt to unstructured environments. Through mimicry of the neuromechanical coupling seen in living organisms, robotic… ▽ More Living systems can use a single periphery to perform a variety of tasks and adapt to a dynamic environment. This multifunctionality is achieved through the use of neural circuitry that adaptively controls the reconfigurable musculature. Current robotic systems struggle to flexibly adapt to unstructured environments. Through mimicry of the neuromechanical coupling seen in living organisms, robotic systems could potentially achieve greater autonomy. The tractable neuromechanics of the sea slug $\textit{Aplysia californica's}$ feeding apparatus, or buccal mass, make it an ideal candidate for applying neuromechanical principles to the control of a soft robot. In this work, a robotic grasper was designed to mimic specific morphology of the $\textit{Aplysia}$ feeding apparatus. These include the use of soft actuators akin to biological muscle, a deformable grasping surface, and a similar muscular architecture. A previously developed Boolean neural controller was then adapted for the control of this soft robotic system. The robot was capable of qualitatively replicating swallowing behavior by cyclically ingesting a plastic tube. The robot's normalized translational and rotational kinematics of the odontophore followed profiles observed $\textit{in vivo}$ despite morphological differences. This brings $\textit{Aplysia}$-inspired control $\textit{in roboto}$ one step closer to multifunctional neural control schema $\textit{in vivo}$ and $\textit{in silico}$. Future additions may improve SLUGBOT's viability as a neuromechanical research platform. △ Less

Submitted 21 November, 2022; originally announced November 2022.

Comments: Submitted and accepted to Living Machines 2022 conference

arXiv:2211.03013 [pdf, other]

doi 10.18653/v1/2022.acl-long.157

Robust Lottery Tickets for Pre-trained Language Models

Authors: Rui Zheng, Rong Bao, Yuhao Zhou, Di Liang, Sirui Wang, Wei Wu, Tao Gui, Qi Zhang, Xuanjing Huang

Abstract: Recent works on Lottery Ticket Hypothesis have shown that pre-trained language models (PLMs) contain smaller matching subnetworks(winning tickets) which are capable of reaching accuracy comparable to the original models. However, these tickets are proved to be notrobust to adversarial examples, and even worse than their PLM counterparts. To address this problem, we propose a novel method based on… ▽ More Recent works on Lottery Ticket Hypothesis have shown that pre-trained language models (PLMs) contain smaller matching subnetworks(winning tickets) which are capable of reaching accuracy comparable to the original models. However, these tickets are proved to be notrobust to adversarial examples, and even worse than their PLM counterparts. To address this problem, we propose a novel method based on learning binary weight masks to identify robust tickets hidden in the original PLMs. Since the loss is not differentiable for the binary mask, we assign the hard concrete distribution to the masks and encourage their sparsity using a smoothing approximation of L0 regularization.Furthermore, we design an adversarial loss objective to guide the search for robust tickets and ensure that the tickets perform well bothin accuracy and robustness. Experimental results show the significant improvement of the proposed method over previous work on adversarial robustness evaluation. △ Less

Submitted 5 November, 2022; originally announced November 2022.

Comments: ACL 2022. https://aclanthology.org/2022.acl-long.157

MSC Class: 68-06

arXiv:2211.01762 [pdf, ps, other]

Stock Trading Volume Prediction with Dual-Process Meta-Learning

Authors: Ruibo Chen, Wei Li, Zhiyuan Zhang, Ruihan Bao, Keiko Harimoto, Xu Sun

Abstract: Volume prediction is one of the fundamental objectives in the Fintech area, which is helpful for many downstream tasks, e.g., algorithmic trading. Previous methods mostly learn a universal model for different stocks. However, this kind of practice omits the specific characteristics of individual stocks by applying the same set of parameters for different stocks. On the other hand, learning differe… ▽ More Volume prediction is one of the fundamental objectives in the Fintech area, which is helpful for many downstream tasks, e.g., algorithmic trading. Previous methods mostly learn a universal model for different stocks. However, this kind of practice omits the specific characteristics of individual stocks by applying the same set of parameters for different stocks. On the other hand, learning different models for each stock would face data sparsity or cold start problems for many stocks with small capitalization. To take advantage of the data scale and the various characteristics of individual stocks, we propose a dual-process meta-learning method that treats the prediction of each stock as one task under the meta-learning framework. Our method can model the common pattern behind different stocks with a meta-learner, while modeling the specific pattern for each stock across time spans with stock-dependent parameters. Furthermore, we propose to mine the pattern of each stock in the form of a latent variable which is then used for learning the parameters for the prediction module. This makes the prediction procedure aware of the data pattern. Extensive experiments on volume predictions show that our method can improve the performance of various baseline models. Further analyses testify the effectiveness of our proposed meta-learning framework. △ Less

Submitted 11 October, 2022; originally announced November 2022.

Comments: 16 pages, 3 figures, 5 tables. Published in ECML-PKDD 2022

arXiv:2210.12689 [pdf, other]

Face Emotion Recognization Using Dataset Augmentation Based on Neural Network

Authors: Mengyu Rao, Ruyi Bao, Liangshun Dong

Abstract: Facial expression is one of the most external indications of a person's feelings and emotions. In daily conversation, according to the psychologist, only 7% and 38% of information is communicated through words and sounds respective, while up to 55% is through facial expression. It plays an important role in coordinating interpersonal relationships. Ekman and Friesen recognized six essential emotio… ▽ More Facial expression is one of the most external indications of a person's feelings and emotions. In daily conversation, according to the psychologist, only 7% and 38% of information is communicated through words and sounds respective, while up to 55% is through facial expression. It plays an important role in coordinating interpersonal relationships. Ekman and Friesen recognized six essential emotions in the nineteenth century depending on a cross-cultural study, which indicated that people feel each basic emotion in the same fashion despite culture. As a branch of the field of analyzing sentiment, facial expression recognition offers broad application prospects in a variety of domains, including the interaction between humans and computers, healthcare, and behavior monitoring. Therefore, many researchers have devoted themselves to facial expression recognition. In this paper, an effective hybrid data augmentation method is used. This approach is operated on two public datasets, and four benchmark models see some remarkable results. △ Less

Submitted 21 November, 2022; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: 5 pages, 8 figures, 3 tables

arXiv:2208.10251 [pdf, other]

Rethinking Textual Adversarial Defense for Pre-trained Language Models

Authors: Jiayi Wang, Rongzhou Bao, Zhuosheng Zhang, Hai Zhao

Abstract: Although pre-trained language models (PrLMs) have achieved significant success, recent studies demonstrate that PrLMs are vulnerable to adversarial attacks. By generating adversarial examples with slight perturbations on different levels (sentence / word / character), adversarial attacks can fool PrLMs to generate incorrect predictions, which questions the robustness of PrLMs. However, we find tha… ▽ More Although pre-trained language models (PrLMs) have achieved significant success, recent studies demonstrate that PrLMs are vulnerable to adversarial attacks. By generating adversarial examples with slight perturbations on different levels (sentence / word / character), adversarial attacks can fool PrLMs to generate incorrect predictions, which questions the robustness of PrLMs. However, we find that most existing textual adversarial examples are unnatural, which can be easily distinguished by both human and machine. Based on a general anomaly detector, we propose a novel metric (Degree of Anomaly) as a constraint to enable current adversarial attack approaches to generate more natural and imperceptible adversarial examples. Under this new constraint, the success rate of existing attacks drastically decreases, which reveals that the robustness of PrLMs is not as fragile as they claimed. In addition, we find that four types of randomization can invalidate a large portion of textual adversarial examples. Based on anomaly detector and randomization, we design a universal defense framework, which is among the first to perform textual adversarial defense without knowing the specific attack. Empirical results show that our universal defense framework achieves comparable or even higher after-attack accuracy with other specific defenses, while preserving higher original accuracy at the same time. Our work discloses the essence of textual adversarial attacks, and indicates that (1) further works of adversarial attacks should focus more on how to overcome the detection and resist the randomization, otherwise their adversarial examples would be easily detected and invalidated; and (2) compared with the unnatural and perceptible adversarial examples, it is those undetectable adversarial examples that pose real risks for PrLMs and require more attention for future robustness-enhancing strategies. △ Less

Submitted 21 July, 2022; originally announced August 2022.

arXiv:2208.08056 [pdf, other]

doi 10.13140/RG.2.2.21905.92008

Sampling Through the Lens of Sequential Decision Making

Authors: Jason Xiaotian Dou, Alvin Qingkai Pan, Runxue Bao, Haiyi Harry Mao, Lei Luo, Zhi-Hong Mao

Abstract: Sampling is ubiquitous in machine learning methodologies. Due to the growth of large datasets and model complexity, we want to learn and adapt the sampling process while training a representation. Towards achieving this grand goal, a variety of sampling techniques have been proposed. However, most of them either use a fixed sampling scheme or adjust the sampling scheme based on simple heuristics.… ▽ More Sampling is ubiquitous in machine learning methodologies. Due to the growth of large datasets and model complexity, we want to learn and adapt the sampling process while training a representation. Towards achieving this grand goal, a variety of sampling techniques have been proposed. However, most of them either use a fixed sampling scheme or adjust the sampling scheme based on simple heuristics. They cannot choose the best sample for model training in different stages. Inspired by "Think, Fast and Slow" (System 1 and System 2) in cognitive science, we propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR) to tackle this challenge. To the best of our knowledge, this is the first work utilizing reinforcement learning (RL) to address the sampling problem in representation learning. Our approach optimally adjusts the sampling process to achieve optimal performance. We explore geographical relationships among samples by distance-based sampling to maximize overall cumulative reward. We apply ASR to the long-standing sampling problems in similarity-based loss functions. Empirical results in information retrieval and clustering demonstrate ASR's superb performance across different datasets. We also discuss an engrossing phenomenon which we name as "ASR gravity well" in experiments. △ Less

Submitted 13 December, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

arXiv:2208.07232 [pdf, other]

Distributional Correlation--Aware Knowledge Distillation for Stock Trading Volume Prediction

Authors: Lei Li, Zhiyuan Zhang, Ruihan Bao, Keiko Harimoto, Xu Sun

Abstract: Traditional knowledge distillation in classification problems transfers the knowledge via class correlations in the soft label produced by teacher models, which are not available in regression problems like stock trading volume prediction. To remedy this, we present a novel distillation framework for training a light-weight student model to perform trading volume prediction given historical transa… ▽ More Traditional knowledge distillation in classification problems transfers the knowledge via class correlations in the soft label produced by teacher models, which are not available in regression problems like stock trading volume prediction. To remedy this, we present a novel distillation framework for training a light-weight student model to perform trading volume prediction given historical transaction data. Specifically, we turn the regression model into a probabilistic forecasting model, by training models to predict a Gaussian distribution to which the trading volume belongs. The student model can thus learn from the teacher at a more informative distributional level, by matching its predicted distributions to that of the teacher. Two correlational distillation objectives are further introduced to encourage the student to produce consistent pair-wise relationships with the teacher model. We evaluate the framework on a real-world stock volume dataset with two different time window settings. Experiments demonstrate that our framework is superior to strong baseline models, compressing the model size by $5\times$ while maintaining $99.6\%$ prediction accuracy. The extensive analysis further reveals that our framework is more effective than vanilla distillation methods under low-resource scenarios. △ Less

Submitted 4 August, 2022; originally announced August 2022.

Comments: ECML-PKDD 2022, our code and data will be available at https://github.com/lancopku/DCKD

arXiv:2208.06058 [pdf, other]

An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification

Authors: Runxue Bao, Bin Gu, Heng Huang

Abstract: Sparsity regularized loss minimization problems play an important role in various fields including machine learning, data mining, and modern statistics. Proximal gradient descent method and coordinate descent method are the most popular approaches to solving the minimization problem. Although existing methods can achieve implicit model identification, aka support set identification, in a finite nu… ▽ More Sparsity regularized loss minimization problems play an important role in various fields including machine learning, data mining, and modern statistics. Proximal gradient descent method and coordinate descent method are the most popular approaches to solving the minimization problem. Although existing methods can achieve implicit model identification, aka support set identification, in a finite number of iterations, these methods still suffer from huge computational costs and memory burdens in high-dimensional scenarios. The reason is that the support set identification in these methods is implicit and thus cannot explicitly identify the low-complexity structure in practice, namely, they cannot discard useless coefficients of the associated features to achieve algorithmic acceleration via dimension reduction. To address this challenge, we propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization problems, which can reduce the number of block iterations by eliminating inactive coefficients during the optimization process and eventually achieve faster explicit model identification and improve the algorithm efficiency. Theoretically, we first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity. More importantly, we prove that ADSGD can achieve a linear rate of explicit model identification. Numerically, experimental results on benchmark datasets confirm the efficiency of our proposed method. △ Less

Submitted 11 August, 2022; originally announced August 2022.

arXiv:2206.09576 [pdf, other]

FedSSO: A Federated Server-Side Second-Order Optimization Algorithm

Authors: Xin Ma, Renyi Bao, Jinpeng Jiang, Yang Liu, Arthur Jiang, Jun Yan, Xin Liu, Zhisong Pan

Abstract: In this work, we propose FedSSO, a server-side second-order optimization method for federated learning (FL). In contrast to previous works in this direction, we employ a server-side approximation for the Quasi-Newton method without requiring any training data from the clients. In this way, we not only shift the computation burden from clients to server, but also eliminate the additional communicat… ▽ More In this work, we propose FedSSO, a server-side second-order optimization method for federated learning (FL). In contrast to previous works in this direction, we employ a server-side approximation for the Quasi-Newton method without requiring any training data from the clients. In this way, we not only shift the computation burden from clients to server, but also eliminate the additional communication for second-order updates between clients and server entirely. We provide theoretical guarantee for convergence of our novel method, and empirically demonstrate our fast convergence and communication savings in both convex and non-convex settings. △ Less

Submitted 22 August, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

arXiv:2204.10981 [pdf, other]

Distributed Dynamic Safe Screening Algorithms for Sparse Regularization

Authors: Runxue Bao, Xidong Wu, Wenhan Xian, Heng Huang

Abstract: Distributed optimization has been widely used as one of the most efficient approaches for model training with massive samples. However, large-scale learning problems with both massive samples and high-dimensional features widely exist in the era of big data. Safe screening is a popular technique to speed up high-dimensional models by discarding the inactive features with zero coefficients. Neverth… ▽ More Distributed optimization has been widely used as one of the most efficient approaches for model training with massive samples. However, large-scale learning problems with both massive samples and high-dimensional features widely exist in the era of big data. Safe screening is a popular technique to speed up high-dimensional models by discarding the inactive features with zero coefficients. Nevertheless, existing safe screening methods are limited to the sequential setting. In this paper, we propose a new distributed dynamic safe screening (DDSS) method for sparsity regularized models and apply it on shared-memory and distributed-memory architecture respectively, which can achieve significant speedup without any loss of accuracy by simultaneously enjoying the sparsity of the model and dataset. To the best of our knowledge, this is the first work of distributed safe dynamic screening method. Theoretically, we prove that the proposed method achieves the linear convergence rate with lower overall complexity and can eliminate almost all the inactive features in a finite number of iterations almost surely. Finally, extensive experimental results on benchmark datasets confirm the superiority of our proposed method. △ Less

Submitted 22 April, 2022; originally announced April 2022.

arXiv:2203.11199 [pdf, other]

Distinguishing Non-natural from Natural Adversarial Samples for More Robust Pre-trained Language Model

Authors: Jiayi Wang, Rongzhou Bao, Zhuosheng Zhang, Hai Zhao

Abstract: Recently, the problem of robustness of pre-trained language models (PrLMs) has received increasing research interest. Latest studies on adversarial attacks achieve high attack success rates against PrLMs, claiming that PrLMs are not robust. However, we find that the adversarial samples that PrLMs fail are mostly non-natural and do not appear in reality. We question the validity of current evaluati… ▽ More Recently, the problem of robustness of pre-trained language models (PrLMs) has received increasing research interest. Latest studies on adversarial attacks achieve high attack success rates against PrLMs, claiming that PrLMs are not robust. However, we find that the adversarial samples that PrLMs fail are mostly non-natural and do not appear in reality. We question the validity of current evaluation of robustness of PrLMs based on these non-natural adversarial samples and propose an anomaly detector to evaluate the robustness of PrLMs with more natural adversarial samples. We also investigate two applications of the anomaly detector: (1) In data augmentation, we employ the anomaly detector to force generating augmented data that are distinguished as non-natural, which brings larger gains to the accuracy of PrLMs. (2) We apply the anomaly detector to a defense framework to enhance the robustness of PrLMs. It can be used to defend all types of attacks and achieves higher accuracy on both adversarial samples and compliant samples than other defense frameworks. △ Less

Submitted 19 March, 2022; originally announced March 2022.

Comments: Accepted by findings of ACL 2022

arXiv:2108.12848 [pdf, other]

Span Fine-tuning for Pre-trained Language Models

Authors: Rongzhou Bao, Zhuosheng Zhang, Hai Zhao

Abstract: Pre-trained language models (PrLM) have to carefully manage input units when training on a very large text with a vocabulary consisting of millions of words. Previous works have shown that incorporating span-level information over consecutive words in pre-training could further improve the performance of PrLMs. However, given that span-level clues are introduced and fixed in pre-training, previous… ▽ More Pre-trained language models (PrLM) have to carefully manage input units when training on a very large text with a vocabulary consisting of millions of words. Previous works have shown that incorporating span-level information over consecutive words in pre-training could further improve the performance of PrLMs. However, given that span-level clues are introduced and fixed in pre-training, previous methods are time-consuming and lack of flexibility. To alleviate the inconvenience, this paper presents a novel span fine-tuning method for PrLMs, which facilitates the span setting to be adaptively determined by specific downstream tasks during the fine-tuning phase. In detail, any sentences processed by the PrLM will be segmented into multiple spans according to a pre-sampled dictionary. Then the segmentation information will be sent through a hierarchical CNN module together with the representation outputs of the PrLM and ultimately generate a span-enhanced representation. Experiments on GLUE benchmark show that the proposed span fine-tuning method significantly enhances the PrLM, and at the same time, offer more flexibility in an efficient way. △ Less

Submitted 15 September, 2021; v1 submitted 29 August, 2021; originally announced August 2021.

Comments: Accepted by EMNLP 2021 Finding(early version)

arXiv:2108.11318 [pdf, other]

Long-term, Short-term and Sudden Event: Trading Volume Movement Prediction with Graph-based Multi-view Modeling

Authors: Liang Zhao, Wei Li, Ruihan Bao, Keiko Harimoto, YunfangWu, Xu Sun

Abstract: Trading volume movement prediction is the key in a variety of financial applications. Despite its importance, there is few research on this topic because of its requirement for comprehensive understanding of information from different sources. For instance, the relation between multiple stocks, recent transaction data and suddenly released events are all essential for understanding trading market.… ▽ More Trading volume movement prediction is the key in a variety of financial applications. Despite its importance, there is few research on this topic because of its requirement for comprehensive understanding of information from different sources. For instance, the relation between multiple stocks, recent transaction data and suddenly released events are all essential for understanding trading market. However, most of the previous methods only take the fluctuation information of the past few weeks into consideration, thus yielding poor performance. To handle this issue, we propose a graphbased approach that can incorporate multi-view information, i.e., long-term stock trend, short-term fluctuation and sudden events information jointly into a temporal heterogeneous graph. Besides, our method is equipped with deep canonical analysis to highlight the correlations between different perspectives of fluctuation for better prediction. Experiment results show that our method outperforms strong baselines by a large margin. △ Less

Submitted 22 August, 2021; originally announced August 2021.

Comments: Accepted as a main track paper by IJCAI 21

arXiv:2108.08976 [pdf, other]

doi 10.1016/j.neucom.2022.12.013

ASAT: Adaptively Scaled Adversarial Training in Time Series

Authors: Zhiyuan Zhang, Wei Li, Ruihan Bao, Keiko Harimoto, Yunfang Wu, Xu Sun

Abstract: Adversarial training is a method for enhancing neural networks to improve the robustness against adversarial examples. Besides the security concerns of potential adversarial examples, adversarial training can also improve the generalization ability of neural networks, train robust neural networks, and provide interpretability for neural networks. In this work, we introduce adversarial training in… ▽ More Adversarial training is a method for enhancing neural networks to improve the robustness against adversarial examples. Besides the security concerns of potential adversarial examples, adversarial training can also improve the generalization ability of neural networks, train robust neural networks, and provide interpretability for neural networks. In this work, we introduce adversarial training in time series analysis to enhance the neural networks for better generalization ability by taking the finance field as an example. Rethinking existing research on adversarial training, we propose the adaptively scaled adversarial training (ASAT) in time series analysis, by rescaling data at different time slots with adaptive scales. Experimental results show that the proposed ASAT can improve both the generalization ability and the adversarial robustness of neural networks compared to the baselines. Compared to the traditional adversarial training algorithm, ASAT can achieve better generalization ability and similar adversarial robustness. △ Less

Submitted 19 December, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

Comments: Accepted to Neurocomputing

Journal ref: Neurocomputing 522 (2023) pp. 11-23

arXiv:2105.14553 [pdf, other]

Defending Pre-trained Language Models from Adversarial Word Substitutions Without Performance Sacrifice

Authors: Rongzhou Bao, Jiayi Wang, Hai Zhao

Abstract: Pre-trained contextualized language models (PrLMs) have led to strong performance gains in downstream natural language understanding tasks. However, PrLMs can still be easily fooled by adversarial word substitution, which is one of the most challenging textual adversarial attack methods. Existing defence approaches suffer from notable performance loss and complexities. Thus, this paper presents a… ▽ More Pre-trained contextualized language models (PrLMs) have led to strong performance gains in downstream natural language understanding tasks. However, PrLMs can still be easily fooled by adversarial word substitution, which is one of the most challenging textual adversarial attack methods. Existing defence approaches suffer from notable performance loss and complexities. Thus, this paper presents a compact and performance-preserved framework, Anomaly Detection with Frequency-Aware Randomization (ADFAR). In detail, we design an auxiliary anomaly detection classifier and adopt a multi-task learning procedure, by which PrLMs are able to distinguish adversarial input samples. Then, in order to defend adversarial word substitution, a frequency-aware randomization process is applied to those recognized adversarial input samples. Empirical results show that ADFAR significantly outperforms those newly proposed defense methods over various tasks with much higher inference speed. Remarkably, ADFAR does not impair the overall performance of PrLMs. The code is available at https://github.com/LilyNLP/ADFAR △ Less

Submitted 30 May, 2021; originally announced May 2021.

Comments: Findings of ACL: ACL 2021

arXiv:2103.12516 [pdf, other]

Edge-Cloud Collaboration Enabled Video Service Enhancement: A Hybrid Human-Artificial Intelligence Scheme

Authors: Dapeng Wu, Ruili Bao, Zhidu Li, Honggang Wang, Ruyan Wang

Abstract: In this paper, a video service enhancement strategy is investigated under an edge-cloud collaboration framework, where video caching and delivery decisions are made in the cloud and edge respectively. We aim to guarantee the user fairness in terms of video coding rate under statistical delay constraint and edge caching capacity constraint. A hybrid human-artificial intelligence approach is develop… ▽ More In this paper, a video service enhancement strategy is investigated under an edge-cloud collaboration framework, where video caching and delivery decisions are made in the cloud and edge respectively. We aim to guarantee the user fairness in terms of video coding rate under statistical delay constraint and edge caching capacity constraint. A hybrid human-artificial intelligence approach is developed to improve the user hit rate for video caching. Specifically, individual user interest is first characterized by merging factorization machine (FM) model and multi-layer perceptron (MLP) model, where both low-order and high-order features can be well learned simultaneously. Thereafter, a social aware similarity model is constructed to transferred individual user interest to group interest, based on which, videos can be selected to cache. Furthermore, a double bisection exploration scheme is proposed to optimize wireless resource allocation and video coding rate. The effectiveness of the proposed video caching scheme and video delivery scheme is finally validated by extensive experiments with a real-world data set. △ Less

Submitted 14 January, 2021; originally announced March 2021.

Comments: This paper has been submitted to IEEE Transactions on Multimedia for review

arXiv:2101.06563 [pdf, other]

doi 10.1080/01691864.2020.1869586

Stereo Camera Visual SLAM with Hierarchical Masking and Motion-state Classification at Outdoor Construction Sites Containing Large Dynamic Objects

Authors: Runqiu Bao, Ren Komatsu, Renato Miyagusuku, Masaki Chino, Atsushi Yamashita, Hajime Asama

Abstract: At modern construction sites, utilizing GNSS (Global Navigation Satellite System) to measure the real-time location and orientation (i.e. pose) of construction machines and navigate them is very common. However, GNSS is not always available. Replacing GNSS with on-board cameras and visual simultaneous localization and mapping (visual SLAM) to navigate the machines is a cost-effective solution. Nev… ▽ More At modern construction sites, utilizing GNSS (Global Navigation Satellite System) to measure the real-time location and orientation (i.e. pose) of construction machines and navigate them is very common. However, GNSS is not always available. Replacing GNSS with on-board cameras and visual simultaneous localization and mapping (visual SLAM) to navigate the machines is a cost-effective solution. Nevertheless, at construction sites, multiple construction machines will usually work together and side-by-side, causing large dynamic occlusions in the cameras' view. Standard visual SLAM cannot handle large dynamic occlusions well. In this work, we propose a motion segmentation method to efficiently extract static parts from crowded dynamic scenes to enable robust tracking of camera ego-motion. Our method utilizes semantic information combined with object-level geometric constraints to quickly detect the static parts of the scene. Then, we perform a two-step coarse-to-fine ego-motion tracking with reference to the static parts. This leads to a novel dynamic visual SLAM formation. We test our proposals through a real implementation based on ORB-SLAM2, and datasets we collected from real construction sites. The results show that when standard visual SLAM fails, our method can still retain accurate camera ego-motion tracking in real-time. Comparing to state-of-the-art dynamic visual SLAM methods, ours shows outstanding efficiency and competitive result trajectory accuracy. △ Less

Submitted 16 January, 2021; originally announced January 2021.

Comments: This is an Accepted Manuscript of an article published by Taylor & Francis in Advanced Robotics on Jan. 11th, 2021, available online: https://www.tandfonline.com/doi/full/10.1080/01691864.2020.1869586 [Article DOI:10.1080/01691864.2020.1869586]

Journal ref: Advanced Robotics (2021) 1-14

arXiv:2012.15070 [pdf, other]

Enhancing Pre-trained Language Model with Lexical Simplification

Authors: Rongzhou Bao, Jiayi Wang, Zhuosheng Zhang, Hai Zhao

Abstract: For both human readers and pre-trained language models (PrLMs), lexical diversity may lead to confusion and inaccuracy when understanding the underlying semantic meanings of given sentences. By substituting complex words with simple alternatives, lexical simplification (LS) is a recognized method to reduce such lexical diversity, and therefore to improve the understandability of sentences. In this… ▽ More For both human readers and pre-trained language models (PrLMs), lexical diversity may lead to confusion and inaccuracy when understanding the underlying semantic meanings of given sentences. By substituting complex words with simple alternatives, lexical simplification (LS) is a recognized method to reduce such lexical diversity, and therefore to improve the understandability of sentences. In this paper, we leverage LS and propose a novel approach which can effectively improve the performance of PrLMs in text classification. A rule-based simplification process is applied to a given sentence. PrLMs are encouraged to predict the real label of the given sentence with auxiliary inputs from the simplified version. Using strong PrLMs (BERT and ELECTRA) as baselines, our approach can still further improve the performance in various text classification tasks. △ Less

Submitted 30 December, 2020; originally announced December 2020.

arXiv:2012.13489 [pdf, other]

Learning Robust Representation for Clustering through Locality Preserving Variational Discriminative Network

Authors: Ruixuan Luo, Wei Li, Zhiyuan Zhang, Ruihan Bao, Keiko Harimoto, Xu Sun

Abstract: Clustering is one of the fundamental problems in unsupervised learning. Recent deep learning based methods focus on learning clustering oriented representations. Among those methods, Variational Deep Embedding achieves great success in various clustering tasks by specifying a Gaussian Mixture prior to the latent space. However, VaDE suffers from two problems: 1) it is fragile to the input noise; 2… ▽ More Clustering is one of the fundamental problems in unsupervised learning. Recent deep learning based methods focus on learning clustering oriented representations. Among those methods, Variational Deep Embedding achieves great success in various clustering tasks by specifying a Gaussian Mixture prior to the latent space. However, VaDE suffers from two problems: 1) it is fragile to the input noise; 2) it ignores the locality information between the neighboring data points. In this paper, we propose a joint learning framework that improves VaDE with a robust embedding discriminator and a local structure constraint, which are both helpful to improve the robustness of our model. Experiment results on various vision and textual datasets demonstrate that our method outperforms the state-of-the-art baseline models in all metrics. Further detailed analysis shows that our proposed model is very robust to the adversarial inputs, which is a desirable property for practical applications. △ Less

Submitted 10 March, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

Comments: Accepted by AAAI RSEML 2021

arXiv:2012.03662 [pdf, other]

Confidence-aware Non-repetitive Multimodal Transformers for TextCaps

Authors: Zhaokai Wang, Renda Bao, Qi Wu, Si Liu

Abstract: When describing an image, reading text in the visual scene is crucial to understand the key information. Recent work explores the TextCaps task, i.e. image captioning with reading Optical Character Recognition (OCR) tokens, which requires models to read text and cover them in generated captions. Existing approaches fail to generate accurate descriptions because of their (1) poor reading ability; (… ▽ More When describing an image, reading text in the visual scene is crucial to understand the key information. Recent work explores the TextCaps task, i.e. image captioning with reading Optical Character Recognition (OCR) tokens, which requires models to read text and cover them in generated captions. Existing approaches fail to generate accurate descriptions because of their (1) poor reading ability; (2) inability to choose the crucial words among all extracted OCR tokens; (3) repetition of words in predicted captions. To this end, we propose a Confidence-aware Non-repetitive Multimodal Transformers (CNMT) to tackle the above challenges. Our CNMT consists of a reading, a reasoning and a generation modules, in which Reading Module employs better OCR systems to enhance text reading ability and a confidence embedding to select the most noteworthy tokens. To address the issue of word redundancy in captions, our Generation Module includes a repetition mask to avoid predicting repeated word in captions. Our model outperforms state-of-the-art models on TextCaps dataset, improving from 81.0 to 93.0 in CIDEr. Our source code is publicly available. △ Less

Submitted 21 March, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

Comments: 9 pages; Accepted by AAAI 2021

arXiv:2006.16433 [pdf, ps, other]

Fast OSCAR and OWL Regression via Safe Screening Rules

Authors: Runxue Bao, Bin Gu, Heng Huang

Abstract: Ordered Weighted $L_{1}$ (OWL) regularized regression is a new regression analysis for high-dimensional sparse learning. Proximal gradient methods are used as standard approaches to solve OWL regression. However, it is still a burning issue to solve OWL regression due to considerable computational cost and memory usage when the feature or sample size is large. In this paper, we propose the first s… ▽ More Ordered Weighted $L_{1}$ (OWL) regularized regression is a new regression analysis for high-dimensional sparse learning. Proximal gradient methods are used as standard approaches to solve OWL regression. However, it is still a burning issue to solve OWL regression due to considerable computational cost and memory usage when the feature or sample size is large. In this paper, we propose the first safe screening rule for OWL regression by exploring the order of the primal solution with the unknown order structure via an iterative strategy, which overcomes the difficulties of tackling the non-separable regularizer. It effectively avoids the updates of the parameters whose coefficients must be zero during the learning process. More importantly, the proposed screening rule can be easily applied to standard and stochastic proximal gradient methods. Moreover, we prove that the algorithms with our screening rule are guaranteed to have identical results with the original algorithms. Experimental results on a variety of datasets show that our screening rule leads to a significant computational gain without any loss of accuracy, compared to existing competitive algorithms. △ Less

Submitted 19 October, 2021; v1 submitted 29 June, 2020; originally announced June 2020.

Comments: Correct the error of the optimality conditions

arXiv:2006.13693 [pdf, other]

PECAIQR: A Model for Infectious Disease Applied to the Covid-19 Epidemic

Authors: Richard Bao, August Chen, Jethin Gowda, Shiva Mudide

Abstract: The Covid-19 pandemic has made clear the need to improve modern multivariate time-series forecasting models. Current state of the art predictions of future daily deaths and, especially, hospital resource usage have confidence intervals that are unacceptably wide. Policy makers and hospitals require accurate forecasts to make informed decisions on passing legislation and allocating resources. We us… ▽ More The Covid-19 pandemic has made clear the need to improve modern multivariate time-series forecasting models. Current state of the art predictions of future daily deaths and, especially, hospital resource usage have confidence intervals that are unacceptably wide. Policy makers and hospitals require accurate forecasts to make informed decisions on passing legislation and allocating resources. We used US county-level data on daily deaths and population statistics to forecast future deaths. We extended the SIR epidemiological model to a novel model we call the PECAIQR model. It adds several new variables and parameters to the naive SIR model by taking into account the ramifications of the partial quarantining implemented in the US. We fitted data to the model parameters with numerical integration. Because of the fit degeneracy in parameter space and non-constant nature of the parameters, we developed several methods to optimize our fit, such as training on the data tail and training on specific policy regimes. We use cross-validation to tune our hyper parameters at the county level and generate a CDF for future daily deaths. For predictions made from training data up to May 25th, we consistently obtained an averaged pinball loss score of 0.096 on a 14 day forecast. We finally present examples of possible avenues for utility from our model. We generate longer-time horizon predictions over various 1-month windows in the past, forecast how many medical resources such as ventilators and ICU beds will be needed in counties, and evaluate the efficacy of our model in other countries. △ Less

Submitted 17 June, 2020; originally announced June 2020.

arXiv:2004.00991 [pdf, other]

Computational Performance of a Germline Variant Calling Pipeline for Next Generation Sequencing

Authors: Jie Liu, Xiaotian Wu, Kai Zhang, Bing Liu, Renyi Bao, Xiao Chen, Yiran Cai, Yiming Shen, Xinjun He, Jun Yan, Weixing Ji

Abstract: With the booming of next generation sequencing technology and its implementation in clinical practice and life science research, the need for faster and more efficient data analysis methods becomes pressing in the field of sequencing. Here we report on the evaluation of an optimized germline mutation calling pipeline, HummingBird, by assessing its performance against the widely accepted BWA-GATK p… ▽ More With the booming of next generation sequencing technology and its implementation in clinical practice and life science research, the need for faster and more efficient data analysis methods becomes pressing in the field of sequencing. Here we report on the evaluation of an optimized germline mutation calling pipeline, HummingBird, by assessing its performance against the widely accepted BWA-GATK pipeline. We found that the HummingBird pipeline can significantly reduce the running time of the primary data analysis for whole genome sequencing and whole exome sequencing while without significantly sacrificing the variant calling accuracy. Thus, we conclude that expansion of such software usage will help to improve the primary data analysis efficiency for next generation sequencing. △ Less

Submitted 1 April, 2020; originally announced April 2020.

Comments: 6 pages, 6 figures, 3 tables

MSC Class: cs.PF; q-bio.GN ACM Class: C.4; D.4.8; J.3

arXiv:2001.05272 [pdf]

FGN: Fusion Glyph Network for Chinese Named Entity Recognition

Authors: Zhenyu Xuan, Rui Bao, Shengyi Jiang

Abstract: Chinese NER is a challenging task. As pictographs, Chinese characters contain latent glyph information, which is often overlooked. In this paper, we propose the FGN, Fusion Glyph Network for Chinese NER. Except for adding glyph information, this method may also add extra interactive information with the fusion mechanism. The major innovations of FGN include: (1) a novel CNN structure called CGS-CN… ▽ More Chinese NER is a challenging task. As pictographs, Chinese characters contain latent glyph information, which is often overlooked. In this paper, we propose the FGN, Fusion Glyph Network for Chinese NER. Except for adding glyph information, this method may also add extra interactive information with the fusion mechanism. The major innovations of FGN include: (1) a novel CNN structure called CGS-CNN is proposed to capture both glyph information and interactive information between glyphs from neighboring characters. (2) we provide a method with sliding window and Slice-Attention to fuse the BERT representation and glyph representation for a character, which may capture potential interactive knowledge between context and glyph. Experiments are conducted on four NER datasets, showing that FGN with LSTM-CRF as tagger achieves new state-of-the-arts performance for Chinese NER. Further, more experiments are conducted to investigate the influences of various components and settings in FGN. △ Less

Submitted 8 October, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

arXiv:1910.05078 [pdf, other]

Incorporating Fine-grained Events in Stock Movement Prediction

Authors: Deli Chen, Yanyan Zou, Keiko Harimoto, Ruihan Bao, Xuancheng Ren, Xu Sun

Abstract: Considering event structure information has proven helpful in text-based stock movement prediction. However, existing works mainly adopt the coarse-grained events, which loses the specific semantic information of diverse event types. In this work, we propose to incorporate the fine-grained events in stock movement prediction. Firstly, we propose a professional finance event dictionary built by dom… ▽ More Considering event structure information has proven helpful in text-based stock movement prediction. However, existing works mainly adopt the coarse-grained events, which loses the specific semantic information of diverse event types. In this work, we propose to incorporate the fine-grained events in stock movement prediction. Firstly, we propose a professional finance event dictionary built by domain experts and use it to extract fine-grained events automatically from finance news. Then we design a neural model to combine finance news with fine-grained event structure and stock trade data to predict the stock movement. Besides, in order to improve the generalizability of the proposed method, we design an advanced model that uses the extracted fine-grained events as the distant supervised label to train a multi-task framework of event extraction and stock prediction. The experimental results show that our method outperforms all the baselines and has good generalizability. △ Less

Submitted 11 October, 2019; originally announced October 2019.

Comments: Accepted by 2th ECONLP workshop in EMNLP2019

arXiv:1910.05032 [pdf, other]

Group, Extract and Aggregate: Summarizing a Large Amount of Finance News for Forex Movement Prediction

Authors: Deli Chen, Shuming ma, Keiko Harimoto, Ruihan Bao, Qi Su, Xu Sun

Abstract: Incorporating related text information has proven successful in stock market prediction. However, it is a huge challenge to utilize texts in the enormous forex (foreign currency exchange) market because the associated texts are too redundant. In this work, we propose a BERT-based Hierarchical Aggregation Model to summarize a large amount of finance news to predict forex movement. We firstly group… ▽ More Incorporating related text information has proven successful in stock market prediction. However, it is a huge challenge to utilize texts in the enormous forex (foreign currency exchange) market because the associated texts are too redundant. In this work, we propose a BERT-based Hierarchical Aggregation Model to summarize a large amount of finance news to predict forex movement. We firstly group news from different aspects: time, topic and category. Then we extract the most crucial news in each group by the SOTA extractive summarization method. Finally, we conduct interaction between the news and the trade data with attention to predict the forex movement. The experimental results show that the category based method performs best among three grouping methods and outperforms all the baselines. Besides, we study the influence of essential news attributes (category and region) by statistical analysis and summarize the influence patterns for different currency pairs. △ Less

Submitted 11 October, 2019; originally announced October 2019.

Comments: Accepted by 2th ECONLP workshop in EMNLP2019

arXiv:1807.06093 [pdf, ps, other]

Prognostics Estimations with Dynamic States

Authors: Rong-Jing Bao, Hai-Jun Rong, Zhi-Xin Yang, Badong Chen

Abstract: The health state assessment and remaining useful life (RUL) estimation play very important roles in prognostics and health management (PHM), owing to their abilities to reduce the maintenance and improve the safety of machines or equipment. However, they generally suffer from this problem of lacking prior knowledge to pre-define the exact failure thresholds for a machinery operating in a dynamic e… ▽ More The health state assessment and remaining useful life (RUL) estimation play very important roles in prognostics and health management (PHM), owing to their abilities to reduce the maintenance and improve the safety of machines or equipment. However, they generally suffer from this problem of lacking prior knowledge to pre-define the exact failure thresholds for a machinery operating in a dynamic environment with a high level of uncertainty. In this case, dynamic thresholds depicted by the discrete states is a very attractive way to estimate the RUL of a dynamic machinery. Currently, there are only very few works considering the dynamic thresholds, and these studies adopted different algorithms to determine the discrete states and predict the continuous states separately, which largely increases the complexity of the learning process. In this paper, we propose a novel prognostics approach for RUL estimation of aero-engines with self-joint prediction of continuous and discrete states, wherein the prediction of continuous and discrete states are conducted simultaneously and dynamically within one learning framework. △ Less

Submitted 23 September, 2018; v1 submitted 16 July, 2018; originally announced July 2018.

arXiv:1805.07862 [pdf, other]

Featurized Bidirectional GAN: Adversarial Defense via Adversarially Learned Semantic Inference

Authors: Ruying Bao, Sihang Liang, Qingcan Wang

Abstract: Deep neural networks have been demonstrated to be vulnerable to adversarial attacks, where small perturbations intentionally added to the original inputs can fool the classifier. In this paper, we propose a defense method, Featurized Bidirectional Generative Adversarial Networks (FBGAN), to extract the semantic features of the input and filter the non-semantic perturbation. FBGAN is pre-trained on… ▽ More Deep neural networks have been demonstrated to be vulnerable to adversarial attacks, where small perturbations intentionally added to the original inputs can fool the classifier. In this paper, we propose a defense method, Featurized Bidirectional Generative Adversarial Networks (FBGAN), to extract the semantic features of the input and filter the non-semantic perturbation. FBGAN is pre-trained on the clean dataset in an unsupervised manner, adversarially learning a bidirectional mapping between the high-dimensional data space and the low-dimensional semantic space; also mutual information is applied to disentangle the semantically meaningful features. After the bidirectional mapping, the adversarial data can be reconstructed to denoised data, which could be fed into any pre-trained classifier. We empirically show the quality of reconstruction images and the effectiveness of defense. △ Less

Submitted 29 September, 2018; v1 submitted 20 May, 2018; originally announced May 2018.

arXiv:1802.00237 [pdf, other]

Face Aging with Contextual Generative Adversarial Nets

Authors: Si Liu, Yao Sun, Defa Zhu, Renda Bao, Wei Wang, Xiangbo Shu, Shuicheng Yan

Abstract: Face aging, which renders aging faces for an input face, has attracted extensive attention in the multimedia research. Recently, several conditional Generative Adversarial Nets (GANs) based methods have achieved great success. They can generate images fitting the real face distributions conditioned on each individual age group. However, these methods fail to capture the transition patterns, e.g.,… ▽ More Face aging, which renders aging faces for an input face, has attracted extensive attention in the multimedia research. Recently, several conditional Generative Adversarial Nets (GANs) based methods have achieved great success. They can generate images fitting the real face distributions conditioned on each individual age group. However, these methods fail to capture the transition patterns, e.g., the gradual shape and texture changes between adjacent age groups. In this paper, we propose a novel Contextual Generative Adversarial Nets (C-GANs) to specifically take it into consideration. The C-GANs consists of a conditional transformation network and two discriminative networks. The conditional transformation network imitates the aging procedure with several specially designed residual blocks. The age discriminative network guides the synthesized face to fit the real conditional distribution. The transition pattern discriminative network is novel, aiming to distinguish the real transition patterns with the fake ones. It serves as an extra regularization term for the conditional transformation network, ensuring the generated image pairs to fit the corresponding real transition pattern distribution. Experimental results demonstrate the proposed framework produces appealing results by comparing with the state-of-the-art and ground truth. We also observe performance gain for cross-age face verification. △ Less

Submitted 1 February, 2018; originally announced February 2018.

Comments: accepted at ACM Multimedia 2017

arXiv:1611.09587 [pdf, other]

Surveillance Video Parsing with Single Frame Supervision

Authors: Si Liu, Changhu Wang, Ruihe Qian, Han Yu, Renda Bao

Abstract: Surveillance video parsing, which segments the video frames into several labels, e.g., face, pants, left-leg, has wide applications. However,pixel-wisely annotating all frames is tedious and inefficient. In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in training stage. To parse one particular frame, the video segment preceding th… ▽ More Surveillance video parsing, which segments the video frames into several labels, e.g., face, pants, left-leg, has wide applications. However,pixel-wisely annotating all frames is tedious and inefficient. In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in training stage. To parse one particular frame, the video segment preceding the frame is jointly considered. SVP (1) roughly parses the frames within the video segment, (2) estimates the optical flow between frames and (3) fuses the rough parsing results warped by optical flow to produce the refined parsing result. The three components of SVP, namely frame parsing, optical flow estimation and temporal fusion are integrated in an end-to-end manner. Experimental results on two surveillance video datasets show the superiority of SVP over state-of-the-arts. △ Less

Submitted 29 November, 2016; originally announced November 2016.

Showing 1–47 of 47 results for author: Bao, R