MVOC: a training-free multiple video object composition method with diffusion models

Authors: Wei Wang, Yaosen Chen, Yuegen Liu, Qi Yuan, Shubin Yang, Yanru Zhang

Abstract: Video composition is the core task of video editing. Although image composition based on diffusion models has been highly successful, it is not straightforward to extend the achievement to video object composition tasks, which not only exhibit corresponding interaction effects but also ensure that the objects in the composited video maintain motion and identity consistency, which is necessary to composite a physical harmony video. To address this challenge, we propose a Multiple Video Object Composition (MVOC) method based on diffusion models. Specifically, we first perform DDIM inversion on each video object to obtain the corresponding noise features. Secondly, we combine and edit each object by image editing methods to obtain the first frame of the composited video. Finally, we use the image-to-video generation model to composite the video with feature and attention injections in the Video Object Dependence Module, which is a training-free conditional guidance operation for video generation, and enables the coordination of features and attention maps between various objects that can be non-independent in the composited video. The final generative model not only constrains the objects in the generated video to be consistent with the original object motion and identity, but also introduces interaction effects between objects. Extensive experiments have demonstrated that the proposed method outperforms existing state-of-the-art approaches. Project page: https://sobeymil.github.io/mvoc.com.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.06603 [pdf, other]

FPN-fusion: Enhanced Linear Complexity Time Series Forecasting Model

Authors: Chu Li, Pingjia Xiao, Qiping Yuan

Abstract: This study presents a novel time series prediction model, FPN-fusion, designed with linear computational complexity, demonstrating superior predictive performance compared to DLiner without increasing parameter count or computational demands. Our model introduces two key innovations: first, a Feature Pyramid Network (FPN) is employed to effectively capture time series data characteristics, bypassing the traditional decomposition into trend and seasonal components. Second, a multi-level fusion structure is developed to integrate deep and shallow features seamlessly. Empirically, FPN-fusion outperforms DLiner in 31 out of 32 test cases on eight open-source datasets, with an average reduction of 16.8% in mean squared error (MSE) and 11.8% in mean absolute error (MAE). Additionally, compared to the transformer-based PatchTST, FPN-fusion achieves 10 best MSE and 15 best MAE results, using only 8% of PatchTST's total computational load in the 32 test projects.

Submitted 6 June, 2024; originally announced June 2024.

Comments: FPN,time series,fusion. arXiv admin note: text overlap with arXiv:2401.03001 by other authors

arXiv:2405.04964 [pdf, other]

Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

Authors: Yi Xiao, Qiangqiang Yuan, Kui Jiang, Yuzeng Chen, Qiang Zhang, Chia-Wen Lin

Abstract: Recent progress in remote sensing image (RSI) super-resolution (SR) has exhibited remarkable performance using deep neural networks, e.g., Convolutional Neural Networks and Transformers. However, existing SR methods often suffer from either a limited receptive field or quadratic computational overhead, resulting in sub-optimal global representation and unacceptable computational costs in large-scale RSI. To alleviate these issues, we develop the first attempt to integrate the Vision State Space Model (Mamba) for RSI-SR, which specializes in processing large-scale RSI by capturing long-range dependency with linear complexity. To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR, to explore the spatial and frequent correlations. In particular, our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM) to grasp their merits for effective spatial-frequency fusion. Recognizing that global and local dependencies are complementary and both beneficial for SR, we further recalibrate these multi-level features for accurate feature fusion via learnable scaling adaptors. Extensive experiments on AID, DOTA, and DIOR benchmarks demonstrate that our FMSR outperforms state-of-the-art Transformer-based methods HAT-L in terms of PSNR by 0.11 dB on average, while consuming only 28.05% and 19.08% of its memory consumption and complexity, respectively.

Submitted 8 May, 2024; originally announced May 2024.

Comments: Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

arXiv:2405.02826 [pdf, other]

Nip in the Bud: Forecasting and Interpreting Post-exploitation Attacks in Real-time through Cyber Threat Intelligence Reports

Authors: Tiantian Zhu, Jie Ying, Tieming Chen, Chunlin Xiong, Wenrui Cheng, Qixuan Yuan, Aohan Zheng, Mingqi Lv, Yan Chen

Abstract: Advanced Persistent Threat (APT) attacks have caused significant damage worldwide. Various Endpoint Detection and Response (EDR) systems are deployed by enterprises to fight against potential threats. However, EDR suffers from high false positives. In order not to affect normal operations, analysts need to investigate and filter detection results before taking countermeasures, in which heavy manual labor and alarm fatigue cause analysts to miss the optimal response time, thereby leading to information leakage and destruction. Therefore, we propose Endpoint Forecasting and Interpreting (EFI), a real-time attack forecast and interpretation system, which can automatically predict the next move during post-exploitation and explain it at the technique level, then dispatch strategies to EDR for advance reinforcement. First, we use Cyber Threat Intelligence (CTI) reports to extract the attack scene graph (ASG) that can be mapped to low-level system logs to strengthen attack samples. Second, we build a serialized graph forecast model, which is combined with the attack provenance graph (APG) provided by EDR to generate an attack forecast graph (AFG) to predict the next move. Finally, we utilize the attack template graph (ATG) and graph alignment plus algorithm for technique-level interpretation to automatically dispatch strategies for EDR to reinforce the system in advance. EFI can avoid the impact of existing EDR false positives, and can reduce the attack surface of the system without affecting normal operations.
We collect a total of 3,484 CTI reports, generate 1,429 ASGs, label 8,000 sentences, tag 10,451 entities, and construct 256 ATGs. Experimental results on both DARPA Engagement and a large-scale CTI dataset show that the alignment score between the AFG predicted by EFI and the real attack graph is able to exceed 0.8, and the forecast and interpretation precision of EFI can reach 91.8%.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.02629 [pdf, other]

SPARSE: Semantic Tracking and Path Analysis for Attack Investigation in Real-time

Authors: Jie Ying, Tiantian Zhu, Wenrui Cheng, Qixuan Yuan, Mingjun Ma, Chunlin Xiong, Tieming Chen, Mingqi Lv, Yan Chen

Abstract: As the complexity and destructiveness of Advanced Persistent Threat (APT) increase, there is a growing tendency to identify a series of actions undertaken to achieve the attacker's target, called attack investigation. Currently, analysts construct the provenance graph to perform causality analysis on Point-Of-Interest (POI) event for capturing critical events (related to the attack). However, due to the vast size of the provenance graph and the rarity of critical events, existing attack investigation methods suffer from problems of high false positives, high overhead, and high latency. To this end, we propose SPARSE, an efficient and real-time system for constructing critical component graphs (i.e., consisting of critical events) from streaming logs. Our key observation is 1) Critical events exist in a suspicious semantic graph (SSG) composed of interaction flows between suspicious entities, and 2) Information flows that accomplish attacker's goal exist in the form of paths. Therefore, SPARSE uses a two-stage framework to implement attack investigation (i.e., constructing the SSG and performing path-level contextual analysis). First, SPARSE operates in a state-based mode where events are consumed as streams, allowing easy access to the SSG related to the POI event through semantic transfer rule and storage strategy. Then, SPARSE identifies all suspicious flow paths (SFPs) related to the POI event from the SSG, quantifies the influence of each path to filter irrelevant events.
Our evaluation on a real large-scale attack dataset shows that SPARSE can generate a critical component graph (~ 113 edges) in 1.6 seconds, which is 2014 X smaller than the backtracking graph (~ 227,589 edges). SPARSE is 25 X more effective than other state-of-the-art techniques in filtering irrelevant edges.

Submitted 4 May, 2024; originally announced May 2024.

arXiv:2404.16313 [pdf, ps, other]

Further Investigations on Nonlinear Complexity of Periodic Binary Sequences

Authors: Qin Yuan, Chunlei Li, Xiangyong Zeng, Tor Helleseth, Debiao He

Abstract: Nonlinear complexity is an important measure for assessing the randomness of sequences. In this paper we investigate how circular shifts affect the nonlinear complexities of finite-length binary sequences and then reveal a more explicit relation between nonlinear complexities of finite-length binary sequences and their corresponding periodic sequences. Based on the relation, we propose two algorithms that can generate all periodic binary sequences with any prescribed nonlinear complexity.

Submitted 24 April, 2024; originally announced April 2024.
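
The nonlinear complexity mentioned in the abstract above can be illustrated with a small brute-force sketch. It assumes the common definition (the length of the shortest feedback shift register, linear or not, that generates the sequence) and the convention that a constant sequence has complexity 0; the function names are hypothetical and this is not the paper's algorithm.

```python
def nonlinear_complexity(seq, periodic=False):
    """Smallest m such that every length-m window determines the next symbol.

    Assumed definition: the order of the shortest (possibly nonlinear)
    feedback shift register generating `seq`. With periodic=True the
    sequence is read cyclically, i.e. as one period of a periodic sequence.
    """
    n = len(seq)
    for m in range(n + 1):
        successor = {}
        consistent = True
        # A finite sequence has n - m windows with a successor; a periodic one has n.
        for i in range(n if periodic else n - m):
            window = tuple(seq[(i + k) % n] for k in range(m))
            nxt = seq[(i + m) % n]
            # Two equal windows with different successors rule out order m.
            if successor.setdefault(window, nxt) != nxt:
                consistent = False
                break
        if consistent:
            return m
    return n


def shift_profile(seq):
    """Finite-length nonlinear complexity of every circular shift of seq."""
    return [nonlinear_complexity(seq[k:] + seq[:k]) for k in range(len(seq))]


if __name__ == "__main__":
    s = [0, 0, 0, 1]
    print(shift_profile(s))                        # complexity of each shift
    print(nonlinear_complexity(s, periodic=True))  # same word read cyclically
```

For the word 0001 the shifts 0001, 0010, 0100, 1000 have finite-length complexities 3, 2, 2, 1, which shows in miniature that circular shifts can change the complexity of a finite sequence.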

arXiv:2404.09624 [pdf, other]

AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception

Authors: Yipo Huang, Xiangfei Sheng, Zhichao Yang, Quan Yuan, Zhichao Duan, Pengfei Chen, Leida Li, Weisi Lin, Guangming Shi

Abstract: The highly abstract nature of image aesthetics perception (IAP) poses significant challenge for current multimodal large language models (MLLMs). The lack of human-annotated multi-modality aesthetic data further exacerbates this dilemma, resulting in MLLMs falling short of aesthetics perception capabilities. To address the above challenge, we first introduce a comprehensively annotated Aesthetic Multi-Modality Instruction Tuning (AesMMIT) dataset, which serves as the footstone for building multi-modality aesthetics foundation models. Specifically, to align MLLMs with human aesthetics perception, we construct a corpus-rich aesthetic critique database with 21,904 diverse-sourced images and 88K human natural language feedbacks, which are collected via progressive questions, ranging from coarse-grained aesthetic grades to fine-grained aesthetic descriptions. To ensure that MLLMs can handle diverse queries, we further prompt GPT to refine the aesthetic critiques and assemble the large-scale aesthetic instruction tuning dataset, i.e. AesMMIT, which consists of 409K multi-typed instructions to activate stronger aesthetic capabilities. Based on the AesMMIT database, we fine-tune the open-sourced general foundation models, achieving multi-modality Aesthetic Expert models, dubbed AesExpert. Extensive experiments demonstrate that the proposed AesExpert models deliver significantly better aesthetic perception performances than the state-of-the-art MLLMs, including the most advanced GPT-4V and Gemini-Pro-Vision.
Source data will be available at https://github.com/yipoh/AesExpert.

Submitted 18 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

arXiv:2403.17853 [pdf, other]

Using Domain Knowledge to Guide Dialog Structure Induction via Neural Probabilistic Soft Logic

Authors: Connor Pryor, Quan Yuan, Jeremiah Liu, Mehran Kazemi, Deepak Ramachandran, Tania Bedrax-Weiss, Lise Getoor

Abstract: Dialog Structure Induction (DSI) is the task of inferring the latent dialog structure (i.e., a set of dialog states and their temporal transitions) of a given goal-oriented dialog. It is a critical component for modern dialog system design and discourse analysis. Existing DSI approaches are often purely data-driven, deploy models that infer latent states without access to domain knowledge, underperform when the training corpus is limited/noisy, or have difficulty when test dialogs exhibit distributional shifts from the training domain. This work explores a neural-symbolic approach as a potential solution to these problems. We introduce Neural Probabilistic Soft Logic Dialogue Structure Induction (NEUPSL DSI), a principled approach that injects symbolic knowledge into the latent space of a generative neural model. We conduct a thorough empirical investigation on the effect of NEUPSL DSI learning on hidden representation quality, few-shot learning, and out-of-domain generalization performance. Over three dialog structure induction datasets and across unsupervised and semi-supervised settings for standard and cross-domain generalization, the injection of symbolic knowledge using NEUPSL DSI provides a consistent boost in performance over the canonical baselines.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2401.08276 [pdf, other]

AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception

Authors: Yipo Huang, Quan Yuan, Xiangfei Sheng, Zhichao Yang, Haoning Wu, Pengfei Chen, Yuzhe Yang, Leida Li, Weisi Lin

Abstract: With collective endeavors, multimodal large language models (MLLMs) are undergoing a flourishing development. However, their performances on image aesthetics perception remain indeterminate, which is highly desired in real-world applications. An obvious obstacle lies in the absence of a specific benchmark to evaluate the effectiveness of MLLMs on aesthetic perception. This blind groping may impede the further development of more advanced MLLMs with aesthetic perception capacity. To address this dilemma, we propose AesBench, an expert benchmark aiming to comprehensively evaluate the aesthetic perception capacities of MLLMs through elaborate design across dual facets. (1) We construct an Expert-labeled Aesthetics Perception Database (EAPD), which features diversified image contents and high-quality annotations provided by professional aesthetic experts. (2) We propose a set of integrative criteria to measure the aesthetic perception abilities of MLLMs from four perspectives, including Perception (AesP), Empathy (AesE), Assessment (AesA) and Interpretation (AesI). Extensive experimental results underscore that the current MLLMs only possess rudimentary aesthetic perception ability, and there is still a significant gap between MLLMs and humans. We hope this work can inspire the community to engage in deeper explorations on the aesthetic potentials of MLLMs. Source data will be available at https://github.com/yipoh/AesBench.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.07139 [pdf, other]

doi 10.1109/TGRS.2023.3291822

Deep Blind Super-Resolution for Satellite Video

Authors: Yi Xiao, Qiangqiang Yuan, Qiang Zhang, Liangpei Zhang

Abstract: Recent efforts have witnessed remarkable progress in Satellite Video Super-Resolution (SVSR). However, most SVSR methods usually assume the degradation is fixed and known, e.g., bicubic downsampling, which makes them vulnerable in real-world scenes with multiple and unknown degradations. To alleviate this issue, blind SR has thus become a research hotspot. Nevertheless, existing approaches are mainly engaged in blur kernel estimation while losing sight of another critical aspect for VSR tasks: temporal compensation, especially compensating for blurry and smooth pixels with vital sharpness from severely degraded satellite videos. Therefore, this paper proposes a practical Blind SVSR algorithm (BSVSR) to explore more sharp cues by considering the pixel-wise blur levels in a coarse-to-fine manner. Specifically, we employed multi-scale deformable convolution to coarsely aggregate the temporal redundancy into adjacent frames by window-slid progressive fusion. Then the adjacent features are finely merged into mid-feature using deformable attention, which measures the blur levels of pixels and assigns more weights to the informative pixels, thus inspiring the representation of sharpness. Moreover, we devise a pyramid spatial transformation module to adjust the solution space of sharp mid-feature, resulting in flexible feature adaptation in multi-level domains. Quantitative and qualitative evaluations on both simulated and real-world satellite videos demonstrate that our BSVSR performs favorably against state-of-the-art non-blind and blind SR models.
Code will be available at https://github.com/XY-boy/Blind-Satellite-VSR

Submitted 13 January, 2024; originally announced January 2024.

Comments: Published in IEEE TGRS

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1-16, 2023, Art no. 5516316

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.13622 [pdf, other]

TDiffDe: A Truncated Diffusion Model for Remote Sensing Hyperspectral Image Denoising

Authors: Jiang He, Yajie Li, Jie L, Qiangqiang Yuan

Abstract: Hyperspectral images play a crucial role in precision agriculture, environmental monitoring or ecological analysis. However, due to sensor equipment and the imaging environment, the observed hyperspectral images are often inevitably corrupted by various noise. In this study, we proposed a truncated diffusion model, called TDiffDe, to recover the useful information in hyperspectral images gradually. Rather than starting from pure noise, the input data in hyperspectral image denoising already contains image information. Thus, we cut the trained diffusion model from small steps to avoid the destruction of valid information.

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2310.19288 [pdf, other]

EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution

Authors: Yi Xiao, Qiangqiang Yuan, Kui Jiang, Jiang He, Xianyu Jin, Liangpei Zhang

Abstract: Recently, convolutional networks have achieved remarkable development in remote sensing image Super-Resolution (SR) by minimizing the regression objectives, e.g., MSE loss. However, despite achieving impressive performance, these methods often suffer from poor visual quality with over-smooth issues. Generative adversarial networks have the potential to infer intricate details, but they are easy to collapse, resulting in undesirable artifacts. To mitigate these issues, in this paper, we first introduce Diffusion Probabilistic Model (DPM) for efficient remote sensing image SR, dubbed EDiffSR. EDiffSR is easy to train and maintains the merits of DPM in generating perceptual-pleasant images. Specifically, different from previous works using heavy UNet for noise prediction, we develop an Efficient Activation Network (EANet) to achieve favorable noise prediction performance by simplified channel attention and simple gate operation, which dramatically reduces the computational budget. Moreover, to introduce more valuable prior knowledge into the proposed EDiffSR, a practical Conditional Prior Enhancement Module (CPEM) is developed to help extract an enriched condition. Unlike most DPM-based SR models that directly generate conditions by amplifying LR images, the proposed CPEM helps to retain more informative cues for accurate SR. Extensive experiments on four remote sensing datasets demonstrate that EDiffSR can restore visual-pleasant images on simulated and real-world remote sensing images, both quantitatively and qualitatively.
The code of EDiffSR will be available at https://github.com/XY-boy/EDiffSR

Submitted 30 October, 2023; originally announced October 2023.

Comments: Submitted to IEEE TGRS

arXiv:2309.16372 [pdf, other]

Aperture Diffraction for Compact Snapshot Spectral Imaging

Authors: Tao Lv, Hao Ye, Quan Yuan, Zhan Shi, Yibo Wang, Shuming Wang, Xun Cao

Abstract: We demonstrate a compact, cost-effective snapshot spectral imaging system named Aperture Diffraction Imaging Spectrometer (ADIS), which consists only of an imaging lens with an ultra-thin orthogonal aperture mask and a mosaic filter sensor, requiring no additional physical footprint compared to common RGB cameras. Then we introduce a new optical design in which each point in the object space is multiplexed to discrete encoding locations on the mosaic filter sensor by diffraction-based spatial-spectral projection engineering generated from the orthogonal mask. The orthogonal projection is uniformly accepted to obtain a weakly calibration-dependent data form to enhance modulation robustness. Meanwhile, the Cascade Shift-Shuffle Spectral Transformer (CSST) with strong perception of the diffraction degeneration is designed to solve a sparsity-constrained inverse problem, realizing the volume reconstruction from 2D measurements with a large amount of aliasing. Our system is evaluated by elaborating the imaging optical theory and reconstruction algorithm and by demonstrating experimental imaging under a single exposure. Ultimately, we achieve the sub-super-pixel spatial resolution and high spectral resolution imaging. The code will be available at: https://github.com/Krito-ex/CSST.

Submitted 27 September, 2023; originally announced September 2023.

Comments: accepted by International Conference on Computer Vision (ICCV) 2023

arXiv:2308.15299 [pdf, other]

TaskLAMA: Probing the Complex Task Understanding of Language Models

Authors: Quan Yuan, Mehran Kazemi, Xin Xu, Isaac Noble, Vaiva Imbrasaite, Deepak Ramachandran

Abstract: Structured Complex Task Decomposition (SCTD) is the problem of breaking down a complex real-world task (such as planning a wedding) into a directed acyclic graph over individual steps that contribute to achieving the task, with edges specifying temporal dependencies between them. SCTD is an important component of assistive planning tools, and a challenge for commonsense reasoning systems. We probe how accurately SCTD can be done with the knowledge extracted from Large Language Models (LLMs). We introduce a high-quality human-annotated dataset for this problem and novel metrics to fairly assess performance of LLMs against several baselines. Our experiments reveal that LLMs are able to decompose complex tasks into individual steps effectively, with a relative improvement of 15% to 280% over the best baseline. We also propose a number of approaches to further improve their performance, with a relative improvement of 7% to 37% over the base model. However, we find that LLMs still struggle to predict pairwise temporal dependencies, which reveals a gap in their understanding of complex tasks.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2307.00729 [pdf, other]

An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023

Authors: Sheng Zhao, Qilong Yuan, Yibo Duan, Zhuoyue Chen

Abstract: The task of synthetic speech generation is to generate language content from a given text and then simulate a fake human voice. The key factors that determine the effect of synthetic speech generation mainly include speed of generation, accuracy of word segmentation, naturalness of synthesized speech, etc. This paper builds an end-to-end multi-module synthetic speech generation model, including a speaker encoder, a synthesizer based on Tacotron2, and a vocoder based on WaveRNN. In addition, we perform many comparative experiments on different datasets and various model structures. Finally, we won first place in the ADD 2023 challenge Track 1.1 with a weighted deception success rate (WDSR) of 44.97%.

Submitted 2 July, 2023; originally announced July 2023.

arXiv:2306.09245 [pdf]

doi 10.1504/IJBIC.2023.133505

Image encryption for Offshore wind power based on 2D-LCLM and Zhou Yi Eight Trigrams

Authors: Lei Kou, Jinbo Wu, Fangfang Zhang, Peng Ji, Wende Ke, Junhe Wan, Hailin Liu, Yang Li, Quande Yuan

Abstract: Offshore wind power is an important part of the new power system. Due to the complex and changing conditions at sea, its normal operation and maintenance cannot be done without information such as images; it is therefore especially important to transmit the correct image in the process of information transmission. In this paper, we propose a new encryption algorithm for offshore wind power based on two-dimensional lagged complex logistic mapping (2D-LCLM) and Zhou Yi Eight Trigrams. Firstly, the initial value of the 2D-LCLM is constructed by SHA-256 to associate the 2D-LCLM with the plaintext. Secondly, a new encryption rule is proposed from the Zhou Yi Eight Trigrams to obfuscate the pixel values and generate the round key. Then, 2D-LCLM is combined with the Zigzag to form an S-box. Finally, the simulation experiment of the algorithm is accomplished. The experimental results demonstrate that the algorithm can resist common attacks and has perfect encryption performance.

Submitted 27 June, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: accepted by Int. J. of Bio-Inspired Computation

MSC Class: 68P25 ACM Class: E.3

Journal ref: International Journal of Bio-Inspired Computation, vol. 22, no. 1, pp. 53-64 (2023)

arXiv:2306.07934 [pdf, other]

BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information

Authors: Mehran Kazemi, Quan Yuan, Deepti Bhatia, Najoung Kim, Xin Xu, Vaiva Imbrasaite, Deepak Ramachandran

Abstract: Automated reasoning with unstructured natural text is a key requirement for many potential applications of NLP and for developing robust AI systems. Recently, Language Models (LMs) have demonstrated complex reasoning capacities even without any finetuning. However, existing evaluation for automated reasoning assumes access to a consistent and coherent set of information over which models reason. When reasoning in the real-world, the available information is frequently inconsistent or contradictory, and therefore models need to be equipped with a strategy to resolve such conflicts when they arise. One widely-applicable way of resolving conflicts is to impose preferences over information sources (e.g., based on source credibility or information recency) and adopt the source with higher preference. In this paper, we formulate the problem of reasoning with contradictory information guided by preferences over sources as the classical problem of defeasible reasoning, and develop a dataset called BoardgameQA for measuring the reasoning capacity of LMs in this setting. BoardgameQA also incorporates reasoning with implicit background knowledge, to better reflect reasoning problems in downstream applications. We benchmark various LMs on BoardgameQA and the results reveal a significant gap in the reasoning capacity of state-of-the-art LMs on this problem, showing that reasoning with conflicting information does not surface out-of-the-box in LMs. While performance can be improved with finetuning, it nevertheless remains poor.

Submitted 13 June, 2023; originally announced June 2023.

arXiv:2305.13918 [pdf]

Development and Whole-Body Validation of Personalizable Female and Male Pedestrian SAFER Human Body Models

Authors: Natalia Lindgren, Qiantailang Yuan, Bengt Pipkorn, Svein Kleiven, Xiaogai Li

Abstract: Vulnerable road users are overrepresented in the worldwide number of road-traffic injury victims. Developing biofidelic male and female pedestrian HBMs representing a range of anthropometries is imperative to follow through with the efforts to increase road safety and propose intervention strategies. In this study, a 50th percentile male and female pedestrian of the SAFER HBM was developed via a newly developed image registration-based mesh morphing framework for subject personalization. The HBM and its accompanying personalization framework were evaluated by means of a set of cadaver experiments, where subjects were struck laterally by a generic sedan buck. In the simulated whole-body pedestrian collisions, the personalized HBMs demonstrate a good capability of reproducing the trajectories and head kinematics observed in lateral impacts. The presented pedestrian HBMs and personalization framework provide robust means to thoroughly and accurately reconstruct and evaluate pedestrian-to-vehicle collisions.

Submitted 11 May, 2023; originally announced May 2023.

arXiv:2304.04421 [pdf, other]

doi 10.1109/TCSVT.2023.3312321

Local-Global Temporal Difference Learning for Satellite Video Super-Resolution

Authors: Yi Xiao, Qiangqiang Yuan, Kui Jiang, Xianyu Jin, Jiang He, Liangpei Zhang, Chia-wen Lin

Abstract: Optical-flow-based and kernel-based approaches have been extensively explored for temporal compensation in satellite Video Super-Resolution (VSR). However, these techniques are less generalized in large-scale or complex scenarios, especially in satellite videos. In this paper, we propose to exploit the well-defined temporal difference for efficient and effective temporal compensation. To fully utilize the local and global temporal information within frames, we systematically modeled the short-term and long-term temporal discrepancies since we observed that these discrepancies offer distinct and mutually complementary properties. Specifically, we devise a Short-term Temporal Difference Module (S-TDM) to extract local motion representations from RGB difference maps between adjacent frames, which yields more clues for accurate texture representation. To explore the global dependency in the entire frame sequence, a Long-term Temporal Difference Module (L-TDM) is proposed, where the differences between forward and backward segments are incorporated and activated to guide the modulation of the temporal feature, leading to a holistic global compensation. Moreover, we further propose a Difference Compensation Unit (DCU) to enrich the interaction between the spatial distribution of the target frame and temporal compensated results, which helps maintain spatial consistency while refining the features to avoid misalignment. Rigorous objective and subjective evaluations conducted across five mainstream video satellites demonstrate that our method performs favorably against state-of-the-art approaches. Code will be available at https://github.com/XY-boy/LGTD

Submitted 30 October, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: Accepted by IEEE TCSVT

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, 2023

arXiv:2304.02401 [pdf, other]

PrivGraph: Differentially Private Graph Data Publication by Exploiting Community Information

Authors: Quan Yuan, Zhikun Zhang, Linkang Du, Min Chen, Peng Cheng, Mingyang Sun

Abstract: Graph data is used in a wide range of applications, while analyzing graph data without protection is prone to privacy breach risks. To mitigate the privacy risks, we resort to the standard technique of differential privacy to publish a synthetic graph. However, existing differentially private graph synthesis approaches either introduce excessive noise by directly perturbing the adjacency matrix, or suffer significant information loss during the graph encoding process. In this paper, we propose an effective graph synthesis algorithm PrivGraph by exploiting the community information. Concretely, PrivGraph differentially privately partitions the private graph into communities, extracts intra-community and inter-community information, and reconstructs the graph from the extracted graph information. We validate the effectiveness of PrivGraph on six real-world graph datasets and seven commonly used graph metrics.

Submitted 13 October, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

Comments: The extended version of the USENIX Security '23 paper

arXiv:2303.08774 [pdf, other]

GPT-4 Technical Report

Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.

Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 100 pages; updated authors list; fixed author names and added citation

arXiv:2302.05807 [pdf, other]

Pushing the Accuracy-Group Robustness Frontier with Introspective Self-play

Authors: Jeremiah Zhe Liu, Krishnamurthy Dj Dvijotham, Jihyeon Lee, Quan Yuan, Martin Strobel, Balaji Lakshminarayanan, Deepak Ramachandran

Abstract: Standard empirical risk minimization (ERM) training can produce deep neural network (DNN) models that are accurate on average but under-perform in under-represented population subgroups, especially when there are imbalanced group distributions in the long-tailed training data. Therefore, approaches that improve the accuracy-group robustness trade-off frontier of a DNN model (i.e. improving worst-group accuracy without sacrificing average accuracy, or vice versa) are of crucial importance. Uncertainty-based active learning (AL) can potentially improve the frontier by preferentially sampling underrepresented subgroups to create a more balanced training dataset. However, the quality of uncertainty estimates from modern DNNs tends to degrade in the presence of spurious correlations and dataset bias, compromising the effectiveness of AL for sampling tail groups. In this work, we propose Introspective Self-play (ISP), a simple approach to improve the uncertainty estimation of a deep neural network under dataset bias, by adding an auxiliary introspection task requiring a model to predict the bias for each data point in addition to the label. We show that ISP provably improves the bias-awareness of the model representation and the resulting uncertainty estimates. On two real-world tabular and language tasks, ISP serves as a simple "plug-in" for AL model training, consistently improving both the tail-group sampling rate and the final accuracy-fairness trade-off frontier of popular AL methods.

Submitted 11 February, 2023; originally announced February 2023.

Comments: Accepted to ICLR 2023. Included additional contribution from Martin Strobel

arXiv:2302.03916 [pdf, other]

QS-ADN: Quasi-Supervised Artifact Disentanglement Network for Low-Dose CT Image Denoising by Local Similarity Among Unpaired Data

Authors: Yuhui Ruan, Qiao Yuan, Chuang Niu, Chen Li, Yudong Yao, Ge Wang, Yueyang Teng

Abstract: Deep learning has been successfully applied to low-dose CT (LDCT) image denoising for reducing potential radiation risk. However, the widely reported supervised LDCT denoising networks require a training set of paired images, which is expensive to obtain and cannot be perfectly simulated. Unsupervised learning utilizes unpaired data and is highly desirable for LDCT denoising. As an example, an artifact disentanglement network (ADN) relies on unpaired images and obviates the need for supervision, but the results of artifact reduction are not as good as those through supervised learning. An important observation is that there is often hidden similarity among unpaired data that can be utilized. This paper introduces a new learning mode, called quasi-supervised learning, to empower the ADN for LDCT image denoising. For every LDCT image, the best matched image is first found from an unpaired normal-dose CT (NDCT) dataset. Then, the matched pairs and the corresponding matching degree as prior information are used to construct and train our ADN-type network for LDCT denoising. The proposed method is different from (but compatible with) supervised and semi-supervised learning modes and can be easily implemented by modifying existing networks. The experimental results show that the method is competitive with state-of-the-art methods in terms of noise suppression and contextual fidelity. The code and working dataset are publicly available at https://github.com/ruanyuhui/ADN-QSDL.git.

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2301.12230 [pdf, other]

Continual Graph Learning: A Survey

Authors: Qiao Yuan, Sheng-Uei Guan, Pin Ni, Tianlun Luo, Ka Lok Man, Prudence Wong, Victor Chang

Abstract: Research on continual learning (CL) mainly focuses on data represented in the Euclidean space, while research on graph-structured data is scarce. Furthermore, most graph learning models are tailored for static graphs. However, graphs usually evolve continually in the real world. Catastrophic forgetting also emerges in graph learning models when being trained incrementally. This leads to the need to develop robust, effective and efficient continual graph learning approaches. Continual graph learning (CGL) is an emerging area aiming to realize continual learning on graph-structured data. This survey is written to shed light on this emerging area. It introduces the basic concepts of CGL and highlights two unique challenges brought by graphs. Then it reviews and categorizes recent state-of-the-art approaches, analyzing their strategies to tackle the unique challenges in CGL. Besides, it discusses the main concerns in each family of CGL methods, offering potential solutions. Finally, it explores the open issues and potential applications of CGL.

Submitted 28 January, 2023; originally announced January 2023.

Comments: 38 pages, 7 figures

arXiv:2212.05891 [pdf]

Text Mining-Based Patent Analysis for Automated Rule Checking in AEC

Authors: Zhe Zheng, Bo-Rui Kang, Qi-Tian Yuan, Yu-Cheng Zhou, Xin-Zheng Lu, Jia-Rui Lin

Abstract: Automated rule checking (ARC), which is expected to promote the efficiency of the compliance checking process in the architecture, engineering, and construction (AEC) industry, is gaining increasing attention. Throwing light on the ARC application hotspots and forecasting its trends are useful to the related research and drive innovations. Therefore, this study takes the patents from the Derwent Innovations Index (DII) database and China national knowledge infrastructure (CNKI) as data sources and then carries out a three-step analysis including (1) quantitative characteristics (i.e., annual distribution analysis) of patents, (2) identification of ARC topics using latent Dirichlet allocation (LDA), and (3) SNA-based co-occurrence analysis of ARC topics. The results show that the research hotspots and trends of Chinese and English patents are different. The contributions of this study have three aspects: (1) an approach to a comprehensive analysis of patents by integrating multiple text mining methods (i.e., SNA and LDA) is introduced; (2) the application hotspots and development trends of ARC are reviewed based on patent analysis; and (3) a signpost for technological development and innovation of ARC is provided.

Submitted 12 December, 2022; originally announced December 2022.

arXiv:2211.14422 [pdf]

doi 10.3389/fenrg.2022.88535

Quantitative Method for Security Situation of the Power Information Network Based on the Evolutionary Neural Network

Authors: Quande Yuan, Yuzhen Pi, Lei Kou, Fangfang Zhang, Bo Ye

Abstract: Cybersecurity is the security cornerstone of digital transformation of the power grid and construction of new power systems. The traditional network security situation quantification method only analyzes from the perspective of network performance, ignoring the impact of various power application services on the security situation, so the quantification results cannot fully reflect the power information network risk state. This study proposes a method for quantifying security situation of the power information network based on the evolutionary neural network. First, the security posture system architecture is designed by analyzing the business characteristics of power information network applications. Second, combining the importance of power application business, the spatial element index system of coupled interconnection is established from three dimensions of network reliability, threat, and vulnerability. Then, the BP neural network optimized by the genetic evolutionary algorithm is incorporated into the element index calculation process, and the quantitative model of security posture of the power information network based on the evolutionary neural network is constructed. Finally, a simulation experiment environment is built according to a power sector network topology, and the effectiveness and robustness of the method proposed in the study are verified.

Submitted 25 November, 2022; originally announced November 2022.

Comments: Frontiers in Energy Research

MSC Class: 68T99 ACM Class: I.2

arXiv:2211.03789 [pdf]

doi 10.3389/fenrg.2021.708456

A Random Forest and Current Fault Texture Feature-Based Method for Current Sensor Fault Diagnosis in Three-Phase PWM VSR

Authors: Lei Kou, Xiao-dong Gong, Yi Zheng, Xiu-hui Ni, Yang Li, Quan-de Yuan, Ya-nan Dong

Abstract: Three-phase PWM voltage-source rectifier (VSR) systems have been widely used in various energy conversion systems, where current sensors are the key component for state monitoring and system control. Current sensor faults may bring hidden danger or damage to the whole system; therefore, this paper proposes a random forest (RF) and current fault texture feature-based method for current sensor fault diagnosis in three-phase PWM VSR systems. First, the three-phase alternating currents (ACs) of the three-phase PWM VSR are collected to extract the current fault texture features, and no additional hardware sensors are needed to avoid causing additional unstable factors. Then, the current fault texture features are adopted to train the random forest current sensor fault detection and diagnosis (CSFDD) classifier, which is a data-driven CSFDD classifier. Finally, the effectiveness of the proposed method is verified by simulation experiments. The result shows that the current sensor faults can be detected and located successfully and that it can effectively provide fault locations for maintenance personnel to keep the stable operation of the whole system.

Submitted 7 November, 2022; originally announced November 2022.

Comments: Frontiers in Energy Research

MSC Class: 68Q04 ACM Class: I.2

arXiv:2211.02631 [pdf]

doi 10.1049/iet-pel.2020.0226

Data-driven design of fault diagnosis for three-phase PWM rectifier using random forests technique with transient synthetic features

Authors: Lei Kou, Chuang Liu, Guo-wei Cai, Jia-ning Zhou, Quan-de Yuan

Abstract: A three-phase pulse-width modulation (PWM) rectifier can usually maintain operation when open-circuit faults occur in insulated-gate bipolar transistors (IGBTs), which will make the system unstable and unsafe. Aiming at this problem, based on random forests with transient synthetic features, a data-driven online fault diagnosis method is proposed in this study to locate the open-circuit faults of IGBTs promptly and effectively. Firstly, by analysing the open-circuit fault features of IGBTs in the three-phase PWM rectifier, it is found that the occurrence of the fault features is related to the fault location and time, and the fault features do not always appear immediately with the occurrence of the fault. Secondly, different data-driven fault diagnosis methods are compared and evaluated; the performance of the random forests algorithm is better than that of support vector machines or artificial neural networks. Meanwhile, the accuracy of the fault diagnosis classifier trained by transient synthetic features is higher than that trained by original features. Also, the random forests fault diagnosis classifier trained by multiplicative features is the best, with a fault diagnosis accuracy that can reach 98.32%. Finally, the online fault diagnosis experiments are carried out and the results demonstrate the effectiveness of the proposed method, which can accurately locate the open-circuit faults in IGBTs while ensuring system safety.

Submitted 2 November, 2022; originally announced November 2022.

Comments: IET Power Electronics

MSC Class: 68T99 ACM Class: I.2

arXiv:2211.00221 [pdf]

doi 10.3390/s22082822

Review on Monitoring, Operation and Maintenance of Smart Offshore Wind Farms

Authors: Lei Kou, Yang Li, Fangfang Zhang, Xiaodong Gong, Yinghong Hu, Quande Yuan, Wende Ke

Abstract: In recent years, with the development of wind energy, the number and scale of wind farms have been growing rapidly. Because offshore wind farms offer stable wind speed, are clean, renewable, and non-polluting, and do not occupy cultivated land, they have gradually become a new trend in the wind power industry all over the world. The operation and maintenance mode of offshore wind power is developing in the direction of digitization and intelligence. It is of great significance to carry out research on the monitoring, operation and maintenance of offshore wind farms, which will help reduce operation and maintenance costs, improve power generation efficiency, improve the stability of offshore wind farm systems, and build smart offshore wind farms. This paper mainly analyzes and summarizes the monitoring, operation and maintenance of offshore wind farms, especially from the following points: monitoring of "offshore wind power engineering & biological & environment", the monitoring of power equipment, and the operation & maintenance of smart offshore wind farms. Finally, the future research challenges about monitoring, operation and maintenance of smart offshore wind farms are proposed, and the future research directions in this field are outlined.

Submitted 31 October, 2022; originally announced November 2022.

Comments: accepted by Sensors

MSC Class: 90B25 ACM Class: I.2

Journal ref: Sensors 2022, 22, 2822

arXiv:2210.17057 [pdf]

doi 10.1049/iet-pel.2019.0835

Fault diagnosis for open-circuit faults in NPC inverter based on knowledge-driven and data-driven approaches

Authors: Lei Kou, Chuang Liu, Guo-wei Cai, Jia-ning Zhou, Quan-de Yuan, Si-miao Pang

Abstract: In this study, the diagnosis and location of open-circuit faults in neutral-point-clamped (NPC) inverters are analysed. A novel fault diagnosis approach combining knowledge-driven and data-driven methods is presented for open-circuit faults in the insulated-gate bipolar transistors (IGBTs) of an NPC inverter; the Concordia transform (knowledge driven) and the random forests (RFs) technique (data driven) are employed to improve the robustness of the fault diagnosis classifier. First, the fault feature data of AC in either normal state or open-circuit fault states of the NPC inverter are analysed and extracted. Second, the Concordia transform is used to process the fault samples, and it has been verified that the slopes of current trajectories are not affected by different loads in this study, which helps the proposed method reduce overdependence on fault data. Then, the transformed fault samples are used to train the RFs fault diagnosis classifier, and the fault diagnosis results show that the classification accuracy and robustness of the classifier are improved. Finally, the results of online fault diagnosis experiments show that the proposed classifier can locate open-circuit faults of IGBTs in an NPC inverter under different loads.
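
The Concordia transform named in the abstract maps three-phase currents onto a two-axis (alpha, beta) plane. The following is a minimal sketch, assuming the standard power-invariant form of the transform and an idealized balanced, fault-free load; the function names and the test signal are illustrative and are not taken from the paper.

```python
import math

def concordia(ia, ib, ic):
    """Power-invariant Concordia transform of one three-phase current sample."""
    i_alpha = math.sqrt(2.0 / 3.0) * (ia - 0.5 * ib - 0.5 * ic)
    i_beta = (ib - ic) / math.sqrt(2.0)
    return i_alpha, i_beta

def balanced_currents(amplitude, theta):
    """Hypothetical balanced three-phase currents at electrical angle theta."""
    ia = amplitude * math.cos(theta)
    ib = amplitude * math.cos(theta - 2.0 * math.pi / 3.0)
    ic = amplitude * math.cos(theta + 2.0 * math.pi / 3.0)
    return ia, ib, ic

# Trace one electrical period and record the distance of each point from the origin.
radii = [
    math.hypot(*concordia(*balanced_currents(10.0, k * 2.0 * math.pi / 100)))
    for k in range(100)
]
```

Under these assumptions the trajectory is a circle whose radius is sqrt(3/2) times the phase amplitude, so a change in load rescales the circle without changing its shape.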

Submitted 31 October, 2022; originally announced October 2022.

Comments: IET Power Electronics

MSC Class: 68T05 ACM Class: I.2

arXiv:2208.07059 [pdf, other]

UPST-NeRF: Universal Photorealistic Style Transfer of Neural Radiance Fields for 3D Scene

Authors: Yaosen Chen, Qi Yuan, Zhiqiang Li, Yuegen Liu, Wei Wang, Chaoping Xie, Xuming Wen, Qien Yu

Abstract: Photorealistic stylization of 3D scenes aims to generate photorealistic images from arbitrary novel views according to a given style image while ensuring consistency when rendering from different viewpoints. Some existing stylization methods with neural radiance fields can effectively predict stylized scenes by combining the features of the style image with multi-view images to train 3D scenes. However, these methods generate novel view images that contain objectionable artifacts. Besides, they cannot achieve universal photorealistic stylization for a 3D scene: each new style image requires retraining a 3D scene representation network based on a neural radiance field. We propose a novel 3D scene photorealistic style transfer framework to address these issues. It can realize photorealistic 3D scene style transfer with a 2D style image. We first pre-train a 2D photorealistic style transfer network, which can perform photorealistic style transfer between any given content image and style image. Then, we use voxel features to optimize a 3D scene and obtain the geometric representation of the scene. Finally, we jointly optimize a hyper network to realize photorealistic style transfer of the scene for arbitrary style images. In the transfer stage, we use the pre-trained 2D photorealistic network to constrain the photorealistic style of different views and different style images in the 3D scene. The experimental results show that our method not only realizes 3D photorealistic style transfer of arbitrary style images but also outperforms the existing methods in terms of visual quality and consistency. Project page: https://semchan.github.io/UPST_NeRF.

Submitted 21 August, 2022; v1 submitted 15 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2205.12183 by other authors

arXiv:2203.11383 [pdf, other]

doi 10.1145/3477495.3531660

DIANES: A DEI Audit Toolkit for News Sources

Authors: Xiaoxiao Shang, Zhiyuan Peng, Qiming Yuan, Sabiq Khan, Lauren Xie, Yi Fang, Subramaniam Vincent

Abstract: Professional news media organizations have always touted the importance that they give to multiple perspectives. However, in practice the traditional approach to all-sides has favored people in the dominant culture. Hence it has come under ethical critique under the new norms of diversity, equity, and inclusion (DEI). When DEI is applied to journalism, it goes beyond conventional notions of impartiality and bias and instead democratizes the journalistic practice of sourcing -- who is quoted or interviewed, who is not, how often, from which demographic group, gender, and so forth. There is currently no real-time or on-demand tool in the hands of reporters to analyze the persons they quote. In this paper, we present DIANES, a DEI Audit Toolkit for News Sources. It consists of a natural language processing pipeline on the backend to extract quotes, speakers, titles, and organizations from news articles in real time. On the frontend, DIANES offers the WordPress plugins, a Web monitor, and a DEI annotation API service, to help news media monitor their own quoting patterns and push themselves towards DEI norms.

Submitted 28 April, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

arXiv:2202.03632 [pdf, other]

doi 10.34133/research.0153

ECRECer: Enzyme Commission Number Recommendation and Benchmarking based on Multiagent Dual-core Learning

Authors: Zhenkun Shi, Qianqian Yuan, Ruoyu Wang, Hoaran Li, Xiaoping Liao, Hongwu Ma

Abstract: Enzyme Commission (EC) numbers, which associate a protein sequence with the biochemical reactions it catalyzes, are essential for the accurate understanding of enzyme functions and cellular metabolism. Many ab-initio computational approaches were proposed to predict EC numbers for given input sequences directly. However, the prediction performance (accuracy, recall, precision), usability, and efficiency of existing methods still have much room for improvement. Here, we report ECRECer, a cloud platform for accurately predicting EC numbers based on novel deep learning techniques. To build ECRECer, we evaluate different protein representation methods and adopt a protein language model for protein sequence embedding. After embedding, we propose a multi-agent hierarchy deep learning-based framework to learn the proposed tasks in a multi-task manner. Specifically, we used an extreme multi-label classifier to perform the EC prediction and employed a greedy strategy to integrate and fine-tune the final model. Comparative analyses against four representative methods demonstrate that ECRECer delivers the highest performance, improving accuracy and F1 score by 70% and 20% over the state-of-the-art, respectively. With ECRECer, we can annotate numerous enzymes in the Swiss-Prot database with incomplete EC numbers to their full fourth level. Taking UniProt protein "A0A0U5GJ41" as an example (1.14.-.-), ECRECer annotated it with "1.14.11.38", which is supported by further protein structure analysis based on AlphaFold2. Finally, we established a webserver (https://ecrecer.biodesign.ac.cn) and provided an offline bundle to improve usability.

Submitted 7 February, 2022; originally announced February 2022.

Comments: 16 pages, 14 figures

Report number: research.0153 MSC Class: I.2.6

Journal ref: Research. 2023:6;0153

arXiv:2201.10005 [pdf, other]

Text and Code Embeddings by Contrastive Pre-Training

Authors: Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, Peter Welinder, Lilian Weng

Abstract: Text embeddings are useful features in many applications such as semantic search and computing text similarity. Previous work typically trains models customized for different use cases, varying in dataset choice, training objective and model architecture. In this work, we show that contrastive pre-training on unsupervised data at scale leads to high quality vector representations of text and code. The same unsupervised text embeddings that achieve new state-of-the-art results in linear-probe classification also display impressive semantic search capabilities and sometimes even perform competitively with fine-tuned models. On linear-probe classification accuracy averaging over 7 tasks, our best unsupervised model achieves a relative improvement of 4% and 1.8% over previous best unsupervised and supervised text embedding models respectively. The same text embeddings, when evaluated on large-scale semantic search, attain a relative improvement of 23.4%, 14.7%, and 10.6% over previous best unsupervised methods on MSMARCO, Natural Questions and TriviaQA benchmarks, respectively. Similarly to text embeddings, we train code embedding models on (text, code) pairs, obtaining a 20.8% relative improvement over prior best work on code search.
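
The semantic search use case mentioned in the abstract amounts to ranking documents by how similar their embeddings are to a query embedding. The following is a minimal sketch, assuming cosine similarity as the score and using tiny hand-written vectors in place of real model outputs; `cosine`, `rank`, and the toy data are hypothetical and do not come from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return document ids sorted by descending similarity to the query."""
    return sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]), reverse=True)

# Toy 3-dimensional "embeddings" standing in for model outputs.
toy_docs = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.7, 0.7, 0.0],
    "c": [0.0, 0.0, 1.0],
}
toy_query = [0.9, 0.1, 0.0]
ranking = rank(toy_query, toy_docs)
```

In practice the vectors would come from an embedding model and the ranking would run over an index rather than a dictionary; the scoring step itself stays the same.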

Submitted 24 January, 2022; originally announced January 2022.

arXiv:2112.04263 [pdf, other]

Artificial Intelligence Powered Mobile Networks: From Cognition to Decision

Authors: Guiyang Luo, Quan Yuan, Jinglin Li, Shangguang Wang, Fangchun Yang

Abstract: Mobile networks (MN) are anticipated to provide unprecedented opportunities to enable a new world of connected experiences and radically shift the way people interact with everything. MN are becoming more and more complex, driven by ever-increasingly complicated configuration issues and blossoming new service requirements. This complexity poses significant challenges in deployment, management, operation, optimization, and maintenance, since they require a complete understanding and cognition of MN. Artificial intelligence (AI), which deals with the simulation of intelligent behavior in computers, has demonstrated enormous success in many application domains, suggesting its potential in cognizing the state of MN and making intelligent decisions. In this paper, we first propose an AI-powered mobile network architecture and discuss challenges in terms of cognition complexity, decisions with high-dimensional action space, and self-adaption to system dynamics. Then, potential solutions that are associated with AI are discussed. Finally, we propose a deep learning approach that directly maps the state of MN to perceived QoS, integrating cognition with the decision. Our proposed approach helps operators in making more intelligent decisions to guarantee QoS. Meanwhile, the effectiveness and advantages of our proposed approach are demonstrated on a real-world dataset, involving $31261$ users over $77$ stations within $5$ days.

Submitted 8 December, 2021; originally announced December 2021.

Journal ref: IEEE Network 2021

arXiv:2110.08702 [pdf, other]

SIN: Superpixel Interpolation Network

Authors: Qing Yuan, Songfeng Lu, Yan Huang, Wuxin Sha

Abstract: Superpixels have been widely used in computer vision tasks due to their representational and computational efficiency. Meanwhile, deep learning and end-to-end frameworks have made great progress in various fields including computer vision. However, existing superpixel algorithms cannot be integrated into subsequent tasks in an end-to-end way. Traditional algorithms and deep learning-based algorithms are the two main streams in superpixel segmentation. The former are non-differentiable and the latter need a non-differentiable post-processing step to enforce connectivity, which constrains the integration of superpixels and downstream tasks. In this paper, we propose a deep learning-based superpixel segmentation algorithm, SIN, which can be integrated with downstream tasks in an end-to-end way. Because some downstream tasks such as visual tracking require real-time speed, the speed of generating superpixels is also important. To remove the post-processing step, our algorithm enforces spatial connectivity from the start. Superpixels are initialized by sampled pixels and other pixels are assigned to superpixels through multiple updating steps. Each step consists of a horizontal and a vertical interpolation, which is the key to enforcing spatial connectivity. Multi-layer outputs of a fully convolutional network are utilized to predict association scores for interpolations. Experimental results show that our approach runs at about 80fps and performs favorably against state-of-the-art methods. Furthermore, we design a simple but effective loss function which substantially reduces training time. The improvements of superpixel-based tasks demonstrate the effectiveness of our algorithm. We hope SIN will be integrated into downstream tasks in an end-to-end way and benefit the superpixel-based community. Code is available at: https://github.com/yuanqqq/SIN.

Submitted 16 October, 2021; originally announced October 2021.

Comments: 15 pages, 8 figures, to be published in PRICAI-2021

arXiv:2108.07200 [pdf, other]

Continuous-Time Spatiotemporal Calibration of a Rolling Shutter Camera-IMU System

Authors: Jianzhu Huai, Yuan Zhuang, Qicheng Yuan, Yukai Lin

Abstract: The rolling shutter (RS) mechanism is widely used by consumer-grade cameras, which are essential parts in smartphones and autonomous vehicles. The RS effect leads to image distortion upon relative motion between a camera and the scene. This effect needs to be considered in video stabilization, structure from motion, and vision-aided odometry, for which recent studies have improved earlier global shutter (GS) methods by accounting for the RS effect. However, it is still unclear how the RS affects spatiotemporal calibration of the camera in a sensor assembly, which is crucial to good performance in the aforementioned applications. This work takes the camera-IMU system as an example and looks into the RS effect on its spatiotemporal calibration. To this end, we develop a calibration method for an RS-camera-IMU system with continuous-time B-splines by using a calibration target. Unlike in calibrating GS cameras, every observation of a landmark on the target has a unique camera pose fitted by continuous-time B-splines. With simulated data generated from four sets of public calibration data, we show that RS can noticeably affect the extrinsic parameters, causing errors of about 1$^\circ$ in orientation and 2 cm in translation with an RS setting as in common smartphone cameras. With real data collected by two industrial camera-IMU systems, we find that considering the RS effect gives more accurate and consistent spatiotemporal calibration. Moreover, our method also accurately calibrates the inter-line delay of the RS. The code for simulation and calibration is publicly available.

Submitted 16 August, 2021; originally announced August 2021.

Comments: 11 pages, 9 figures

arXiv:2108.06073 [pdf]

doi 10.1109/MGRS.2021.3135954

Coupling Model-Driven and Data-Driven Methods for Remote Sensing Image Restoration and Fusion

Authors: Huanfeng Shen, Menghui Jiang, Jie Li, Chenxia Zhou, Qiangqiang Yuan, Liangpei Zhang

Abstract: In the fields of image restoration and image fusion, model-driven methods and data-driven methods are the two representative frameworks. However, both approaches have their respective advantages and disadvantages. The model-driven methods consider the imaging mechanism, which is deterministic and theoretically reasonable; however, they cannot easily model complicated nonlinear problems. The data-driven methods have a stronger prior knowledge learning capability for huge data, especially for nonlinear statistical features; however, the interpretability of the networks is poor, and they are over-dependent on training data. In this paper, we systematically investigate the coupling of model-driven and data-driven methods, which has rarely been considered in the remote sensing image restoration and fusion communities. We are the first to summarize the coupling approaches into the following three categories: 1) data-driven and model-driven cascading methods; 2) variational models with embedded learning; and 3) model-constrained network learning methods. The typical existing and potential coupling methods for remote sensing image restoration and fusion are introduced with application examples. This paper also gives some new insights into the potential future directions, in terms of both methods and applications.

Submitted 13 August, 2021; originally announced August 2021.

Journal ref: IEEE Geoscience and Remote Sensing Magazine, vol. 10, no. 2, pp. 231-249, June 2022

arXiv:2107.13848 [pdf, other]

Revisiting Swapping in User-space with Lightweight Threading

Authors: Kan Zhong, Wenlin Cui, Youyou Lu, Quanzhang Liu, Xiaodan Yan, Qizhao Yuan, Siwei Luo, Keji Huang

Abstract: Memory-intensive applications, such as in-memory databases, caching systems and key-value stores, are increasingly demanding larger main memory to fit their working sets. Conventional swapping can enlarge the memory capacity by paging out inactive pages to disks. However, the heavy I/O stack makes traditional kernel-based swapping suffer from several critical performance issues. In this paper, we redesign the swapping system and propose LightSwap, a high-performance user-space swapping scheme that supports paging with both local SSDs and remote memories. First, to avoid kernel involvement, a novel page fault handling mechanism is proposed to handle page faults in user-space and further eliminate the heavy I/O stack with the help of user-space I/O drivers. Second, we co-design LightSwap with lightweight threads (LWT) to improve system throughput and make it transparent to user applications. Finally, we propose a try-catch framework in LightSwap to deal with paging errors, which are exacerbated by the scaling in process technology. We implement LightSwap in our production-level system and evaluate it with YCSB workloads running on memcached. Results show that LightSwap reduces the page fault handling latency by 3 to 5 times, and improves the throughput of memcached by more than 40% compared with the state-of-the-art swapping systems.

Submitted 29 July, 2021; originally announced July 2021.

arXiv:2107.08355 [pdf]

Fully Polarimetric SAR and Single-Polarization SAR Image Fusion Network

Authors: Liupeng Lin, Jie Li, Huanfeng Shen, Lingli Zhao, Qiangqiang Yuan, Xinghua Li

Abstract: Data fusion technology aims to aggregate the characteristics of different data and obtain products with the advantages of multiple data sources. To solve the problem of reduced resolution of PolSAR images due to system limitations, we propose a fusion network for fully polarimetric synthetic aperture radar (PolSAR) images and single-polarization synthetic aperture radar (SinSAR) images to generate high-resolution PolSAR (HR-PolSAR) images. To take advantage of the polarimetric information of the low-resolution PolSAR (LR-PolSAR) image and the spatial information of the high-resolution single-polarization SAR (HR-SinSAR) image, we propose a fusion framework for joint LR-PolSAR image and HR-SinSAR image and design a cross-attention mechanism to extract features from the joint input data. Besides, based on the physical imaging mechanism, we designed the PolSAR polarimetric loss function for constrained network training. The experimental results confirm the superiority of the fusion network over traditional algorithms. The average PSNR is increased by more than 3.6 dB, and the average MAE is reduced to less than 0.07. Experiments on polarimetric decomposition and polarimetric signature show that it maintains polarimetric information well.

Submitted 17 July, 2021; originally announced July 2021.

arXiv:2107.03374 [pdf, other]

Evaluating Large Language Models Trained on Code

Authors: Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, et al. (33 additional authors not shown)

Abstract: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
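
The repeated-sampling result can be read as follows: a problem counts as solved if at least one of its sampled programs passes the tests. The following is a minimal sketch of that bookkeeping, assuming per-sample pass/fail outcomes are already known; `solved_fraction` and the toy data are illustrative, and this is not the paper's evaluation code.

```python
def solved_fraction(results, k):
    """Fraction of problems with at least one passing sample among the first k.

    `results` holds one list of booleans per problem (True = sample passed).
    """
    solved = sum(1 for samples in results if any(samples[:k]))
    return solved / len(results)

# Hypothetical outcomes for four problems with four samples each.
toy_results = [
    [False, False, True, False],
    [True, False, False, False],
    [False, False, False, False],
    [False, True, True, False],
]

# Solve rate as the sample budget grows from 1 to 4.
rates = [solved_fraction(toy_results, k) for k in (1, 2, 4)]
```

On this toy data the solve rate rises as more samples are allowed per problem, which is the qualitative effect the abstract describes.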

Submitted 14 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

Comments: corrected typos, added references, added authors, added acknowledgements

arXiv:2102.07911 [pdf, other]

MITNet: GAN Enhanced Magnetic Induction Tomography Based on Complex CNN

Authors: Zuohui Chen, Qing Yuan, Xujie Song, Cheng Chen, Dan Zhang, Yun Xiang, Ruigang Liu, Qi Xuan

Abstract: Magnetic induction tomography (MIT) is an efficient solution for long-term brain disease monitoring, which focuses on reconstructing bio-impedance distribution inside the human brain using non-intrusive electromagnetic fields. However, high-quality brain image reconstruction remains challenging since reconstructing images from the measured weak signals is a highly non-linear and ill-conditioned problem. In this work, we propose a generative adversarial network (GAN) enhanced MIT technique, named MITNet, based on a complex convolutional neural network (CNN). The experimental results on the real-world dataset validate the performance of our technique, which outperforms the state-of-the-art method by 25.27%.

Submitted 15 February, 2021; originally announced February 2021.

arXiv:2101.04882 [pdf, other]

Asymmetric self-play for automatic goal discovery in robotic manipulation

Authors: OpenAI OpenAI, Matthias Plappert, Raul Sampedro, Tao Xu, Ilge Akkaya, Vineet Kosaraju, Peter Welinder, Ruben D'Sa, Arthur Petron, Henrique P. d. O. Pinto, Alex Paino, Hyeonwoo Noh, Lilian Weng, Qiming Yuan, Casey Chu, Wojciech Zaremba

Abstract: We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects. We rely on asymmetric self-play for goal discovery, where two agents, Alice and Bob, play a game. Alice is asked to propose challenging goals and Bob aims to solve them. We show that this method can discover highly diverse and complex goals without any human priors. Bob can be trained with only sparse rewards, because the interaction between Alice and Bob results in a natural curriculum and Bob can learn from Alice's trajectory when relabeled as a goal-conditioned demonstration. Finally, our method scales, resulting in a single policy that can generalize to many unseen tasks such as setting a table, stacking blocks, and solving simple puzzles. Videos of a learned policy are available at https://robotics-self-play.github.io.

Submitted 13 January, 2021; originally announced January 2021.

Comments: Videos are shown at https://robotics-self-play.github.io

arXiv:2012.15462 [pdf, other]

doi 10.1109/TCSII.2020.2968376

Modeling and Understanding Ethereum Transaction Records via a Complex Network Approach

Authors: Dan Lin, Jiajing Wu, Qi Yuan, Zibin Zheng

Abstract: As the largest public blockchain-based platform supporting smart contracts, Ethereum has accumulated a large number of user transaction records since its debut in 2014. Analysis of Ethereum transaction records, however, is still relatively unexplored till now. Modeling the transaction records as a static simple graph, existing methods are unable to accurately characterize the temporal and multiplex features of the edges. In this brief, we first model the Ethereum transaction records as a complex network by incorporating time and amount features of the transactions, and then design several flexible temporal walk strategies for random-walk based graph representation of this large-scale network. Experiments of temporal link prediction on real Ethereum data demonstrate that temporal information and multiplicity characteristic of edges are indispensable for accurate modeling and understanding of Ethereum transaction networks.

Submitted 31 December, 2020; originally announced December 2020.

Comments: 5 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:1905.08038

Journal ref: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 11, pp. 2737 - 2741, November 2020

arXiv:2012.13169 [pdf, other]

SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II

Authors: Xiangjun Wang, Junxiao Song, Penghui Qi, Peng Peng, Zhenkun Tang, Wei Zhang, Weimin Li, Xiongjun Pi, Jujie He, Chao Gao, Haitao Long, Quan Yuan

Abstract: AlphaStar, the AI that reaches GrandMaster level in StarCraft II, is a remarkable milestone demonstrating what deep reinforcement learning can achieve in complex Real-Time Strategy (RTS) games. However, the complexities of the game, algorithms and systems, and especially the tremendous amount of computation needed, are big obstacles for the community to conduct further research in this direction. We propose a deep reinforcement learning agent, StarCraft Commander (SCC). With an order of magnitude less computation, it demonstrates top human performance, defeating GrandMaster players in test matches and top professional players in a live event. Moreover, it shows strong robustness to various human strategies and discovers novel strategies unseen in human play. In this paper, we share the key insights and optimizations on efficient imitation learning and reinforcement learning for the StarCraft II full game.

Submitted 9 June, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

Comments: ICML 2021 camera ready

arXiv:2011.09701 [pdf]

Spectral Response Function Guided Deep Optimization-driven Network for Spectral Super-resolution

Authors: Jiang He, Jie Li, Qiangqiang Yuan, Huanfeng Shen, Liangpei Zhang

Abstract: Hyperspectral images are crucial for many research works. Spectral super-resolution (SSR) is a method used to obtain high spatial resolution (HR) hyperspectral images from HR multispectral images. Traditional SSR methods include model-driven algorithms and deep learning. By unfolding a variational method, this paper proposes an optimization-driven convolutional neural network (CNN) with a deep spatial-spectral prior, resulting in physically interpretable networks. Unlike the fully data-driven CNN, auxiliary spectral response function (SRF) is utilized to guide CNNs to group the bands with spectral relevance. In addition, the channel attention module (CAM) and reformulated spectral angle mapper loss function are applied to achieve an effective reconstruction model. Finally, experiments on two types of datasets, including natural and remote sensing images, demonstrate the spectral enhancement effect of the proposed method. And the classification results on the remote sensing dataset also verified the validity of the information enhanced by the proposed method.

Submitted 8 December, 2020; v1 submitted 19 November, 2020; originally announced November 2020.

arXiv:2011.08968 [pdf, other]

Contrastive Weight Regularization for Large Minibatch SGD

Authors: Qiwei Yuan, Weizhe Hua, Yi Zhou, Cunxi Yu

Abstract: The minibatch stochastic gradient descent method (SGD) is widely applied in deep learning due to its efficiency and scalability that enable training deep networks with a large volume of data. Particularly in the distributed setting, SGD is usually applied with large batch size. However, as opposed to small-batch SGD, neural network models trained with large-batch SGD can hardly generalize well, i.e., the validation accuracy is low. In this work, we introduce a novel regularization technique, namely distinctive regularization (DReg), which replicates a certain layer of the deep network and encourages the parameters of both layers to be diverse. The DReg technique introduces very little computation overhead. Moreover, we empirically show that optimizing the neural network with DReg using large-batch SGD achieves a significant boost in the convergence and improved generalization performance. We also demonstrate that DReg can boost the convergence of large-batch SGD with momentum. We believe that DReg can be used as a simple regularization trick to accelerate large-batch training in deep learning.

Submitted 17 November, 2020; originally announced November 2020.

arXiv:2010.05525 [pdf, ps, other]

Large Scale Product Graph Construction for Recommendation in E-commerce

Authors: Xiaoyong Yang, Yadong Zhu, Yi Zhang, Xiaobo Wang, Quan Yuan

Abstract: Building a recommendation system that serves billions of users on daily basis is a challenging problem, as the system needs to make astronomical number of predictions per second based on real-time user behaviors with O(1) time complexity. Such kind of large scale recommendation systems usually rely heavily on pre-built index of products to speedup the recommendation service so that online user waiting time is un-noticeable. One important indexing structure is the product-product index, where one can retrieval a list of ranked products given a seed product. The index can be viewed as a weighted product-product graph. In this paper, we present our novel technologies to efficiently build such kind of indexed product graphs. In particular, we propose the Swing algorithm to capture the substitute relationships between products, which can utilize the substructures of user-item click bi-partitive graph. Then we propose the Surprise algorithm for the modeling of complementary product relationships, which utilizes product category information and solves the sparsity problem of user co-purchasing graph via clustering technique. Base on these two approaches, we can build the basis product graph for recommendation in Taobao. The approaches are evaluated comprehensively with both offline and online experiments, and the results demonstrate the effectiveness and efficiency of the work.

Submitted 12 October, 2020; originally announced October 2020.

Showing 1–50 of 83 results for author: Yuan, Q