subscribe to arXiv mailings

arXiv:2406.03138 [pdf, other]

A Frame-based Attention Interpretation Method for Relevant Acoustic Feature Extraction in Long Speech Depression Detection

Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

Abstract: Speech-based depression detection tools could help early screening of depression. Here, we address two issues that may hinder the clinical practicality of such tools: segment-level labelling noise and a lack of model interpretability. We propose a speech-level Audio Spectrogram Transformer to avoid segment-level labelling. We observe that the proposed model significantly outperforms a segment-leve… ▽ More Speech-based depression detection tools could help early screening of depression. Here, we address two issues that may hinder the clinical practicality of such tools: segment-level labelling noise and a lack of model interpretability. We propose a speech-level Audio Spectrogram Transformer to avoid segment-level labelling. We observe that the proposed model significantly outperforms a segment-level model, providing evidence for the presence of segment-level labelling noise in audio modality and the advantage of longer-duration speech analysis for depression detection. We introduce a frame-based attention interpretation method to extract acoustic features from prediction-relevant waveform signals for interpretation by clinicians. Through interpretation, we observe that the proposed model identifies reduced loudness and F0 as relevant signals of depression, which aligns with the speech characteristics of depressed patients documented in clinical studies. △ Less

Submitted 7 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2309.13476

arXiv:2405.00344 [pdf, other]

Expert Insight-Enhanced Follow-up Chest X-Ray Summary Generation

Authors: Zhichuan Wang, Kinhei Lee, Qiao Deng, Tiffany Y. So, Wan Hang Chiu, Yeung Yu Hui, Bingjing Zhou, Edward S. Hui

Abstract: A chest X-ray radiology report describes abnormal findings not only from X-ray obtained at current examination, but also findings on disease progression or change in device placement with reference to the X-ray from previous examination. Majority of the efforts on automatic generation of radiology report pertain to reporting the former, but not the latter, type of findings. To the best of the auth… ▽ More A chest X-ray radiology report describes abnormal findings not only from X-ray obtained at current examination, but also findings on disease progression or change in device placement with reference to the X-ray from previous examination. Majority of the efforts on automatic generation of radiology report pertain to reporting the former, but not the latter, type of findings. To the best of the authors' knowledge, there is only one work dedicated to generating summary of the latter findings, i.e., follow-up summary. In this study, we therefore propose a transformer-based framework to tackle this task. Motivated by our observations on the significance of medical lexicon on the fidelity of summary generation, we introduce two mechanisms to bestow expert insight to our model, namely expert soft guidance and masked entity modeling loss. The former mechanism employs a pretrained expert disease classifier to guide the presence level of specific abnormalities, while the latter directs the model's attention toward medical lexicon. Extensive experiments were conducted to demonstrate that the performance of our model is competitive with or exceeds the state-of-the-art. △ Less

Submitted 6 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: accepted by 22nd International Conference on Artificial Intelligence in medicine (AIME2024)

ACM Class: I.2.1

arXiv:2404.18081 [pdf, other]

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

Abstract: Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and C… ▽ More Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and Chain-of-Thoughts. To further explore and enhance LLMs' potential in music composition by leveraging their reasoning ability and the large knowledge base in music history and theory, we propose ComposerX, an agent-based symbolic music generation framework. We find that applying a multi-agent approach significantly improves the music composition quality of GPT-4. The results demonstrate that ComposerX is capable of producing coherent polyphonic music compositions with captivating melodies, while adhering to user instructions. △ Less

Submitted 30 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.14750 [pdf, other]

Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray

Authors: Qiao Deng, Zhongzhen Huang, Yunqi Wang, Zhichuan Wang, Zhao Wang, Xiaofan Zhang, Qi Dou, Yeung Yu Hui, Edward S. Hui

Abstract: Medical vision-language pre-training has emerged as a promising approach for learning domain-general representations of medical image and text. Current algorithms that exploit the global and local alignment between medical image and text could however be marred by the redundant information in medical data. To address this issue, we propose a grounded knowledge-enhanced medical vision-language pre-… ▽ More Medical vision-language pre-training has emerged as a promising approach for learning domain-general representations of medical image and text. Current algorithms that exploit the global and local alignment between medical image and text could however be marred by the redundant information in medical data. To address this issue, we propose a grounded knowledge-enhanced medical vision-language pre-training (GK-MVLP) framework for chest X-ray. In this framework, medical knowledge is grounded to the appropriate anatomical regions by using a transformer-based grounded knowledge-enhanced module for fine-grained alignment between anatomical region-level visual features and the textural features of medical knowledge. The performance of GK-MVLP is competitive with or exceeds the state of the art on downstream chest X-ray disease classification, disease localization, report generation, and medical visual question-answering tasks. Our results show the advantage of incorporating grounding mechanism to remove biases and improve the alignment between chest X-ray image and radiology report. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.13983 [pdf, other]

Structure-Aware Human Body Reshaping with Adaptive Affinity-Graph Network

Authors: Qiwen Deng, Yangcen Liu, Wen Li, Guoqing Wang

Abstract: Given a source portrait, the automatic human body reshaping task aims at editing it to an aesthetic body shape. As the technology has been widely used in media, several methods have been proposed mainly focusing on generating optical flow to warp the body shape. However, those previous works only consider the local transformation of different body parts (arms, torso, and legs), ignoring the global… ▽ More Given a source portrait, the automatic human body reshaping task aims at editing it to an aesthetic body shape. As the technology has been widely used in media, several methods have been proposed mainly focusing on generating optical flow to warp the body shape. However, those previous works only consider the local transformation of different body parts (arms, torso, and legs), ignoring the global affinity, and limiting the capacity to ensure consistency and quality across the entire body. In this paper, we propose a novel Adaptive Affinity-Graph Network (AAGN), which extracts the global affinity between different body parts to enhance the quality of the generated optical flow. Specifically, our AAGN primarily introduces the following designs: (1) we propose an Adaptive Affinity-Graph (AAG) Block that leverages the characteristic of a fully connected graph. AAG represents different body parts as nodes in an adaptive fully connected graph and captures all the affinities between nodes to obtain a global affinity map. The design could better improve the consistency between body parts. (2) Besides, for high-frequency details are crucial for photo aesthetics, a Body Shape Discriminator (BSD) is designed to extract information from both high-frequency and spatial domain. Particularly, an SRM filter is utilized to extract high-frequency details, which are combined with spatial features as input to the BSD. With this design, BSD guides the Flow Generator (FG) to pay attention to various fine details rather than rigid pixel-level fitting. Extensive experiments conducted on the BR-5K dataset demonstrate that our framework significantly enhances the aesthetic appeal of reshaped photos, marginally surpassing all previous work to achieve state-of-the-art in all evaluation metrics. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: 11 pages;

arXiv:2404.10004 [pdf]

A Strategy Transfer and Decision Support Approach for Epidemic Control in Experience Shortage Scenarios

Authors: X. Xiao, P. Chen, X. Cao, K. Liu, L. Deng, D. Zhao, Z. Chen, Q. Deng, F. Yu, H. Zhang

Abstract: Epidemic outbreaks can cause critical health concerns and severe global economic crises. For countries or regions with new infectious disease outbreaks, it is essential to generate preventive strategies by learning lessons from others with similar risk profiles. A Strategy Transfer and Decision Support Approach (STDSA) is proposed based on the profile similarity evaluation. There are four steps in… ▽ More Epidemic outbreaks can cause critical health concerns and severe global economic crises. For countries or regions with new infectious disease outbreaks, it is essential to generate preventive strategies by learning lessons from others with similar risk profiles. A Strategy Transfer and Decision Support Approach (STDSA) is proposed based on the profile similarity evaluation. There are four steps in this method: (1) The similarity evaluation indicators are determined from three dimensions, i.e., the Basis of National Epidemic Prevention & Control, Social Resilience, and Infection Situation. (2) The data related to the indicators are collected and preprocessed. (3) The first round of screening on the preprocessed dataset is conducted through an improved collaborative filtering algorithm to calculate the preliminary similarity result from the perspective of the infection situation. (4) Finally, the K-Means model is used for the second round of screening to obtain the final similarity values. The approach will be applied to decision-making support in the context of COVID-19. Our results demonstrate that the recommendations generated by the STDSA model are more accurate and aligned better with the actual situation than those produced by pure K-means models. This study will provide new insights into preventing and controlling epidemics in regions that lack experience. △ Less

Submitted 9 April, 2024; originally announced April 2024.

Comments: 20 pages, 9 figures

arXiv:2404.06769 [pdf]

Solving the Food-Energy-Water Nexus Problem via Intelligent Optimization Algorithms

Authors: Qi Deng, Zheng Fan, Zhi Li, Xinna Pan, Qi Kang, MengChu Zhou

Abstract: The application of evolutionary algorithms (EAs) to multi-objective optimization problems has been widespread. However, the EA research community has not paid much attention to large-scale multi-objective optimization problems arising from real-world applications. Especially, Food-Energy-Water systems are intricately linked among food, energy and water that impact each other. They usually involve… ▽ More The application of evolutionary algorithms (EAs) to multi-objective optimization problems has been widespread. However, the EA research community has not paid much attention to large-scale multi-objective optimization problems arising from real-world applications. Especially, Food-Energy-Water systems are intricately linked among food, energy and water that impact each other. They usually involve a huge number of decision variables and many conflicting objectives to be optimized. Solving their related optimization problems is essentially important to sustain the high-quality life of human beings. Their solution space size expands exponentially with the number of decision variables. Searching in such a vast space is challenging because of such large numbers of decision variables and objective functions. In recent years, a number of large-scale many-objectives optimization evolutionary algorithms have been proposed. In this paper, we solve a Food-Energy-Water optimization problem by using the state-of-art intelligent optimization methods and compare their performance. Our results conclude that the algorithm based on an inverse model outperforms the others. This work should be highly useful for practitioners to select the most suitable method for their particular large-scale engineering optimization problems. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2402.18070 [pdf, other]

A Hierarchical Dataflow-Driven Heterogeneous Architecture for Wireless Baseband Processing

Authors: Limin Jiang, Yi Shi, Haiqin Hu, Qingyu Deng, Siyi Xu, Yintao Liu, Feng Yuan, Si Wang, Yihao Shen, Fangfang Ye, Shan Cao, Zhiyuan Jiang

Abstract: Wireless baseband processing (WBP) is a key element of wireless communications, with a series of signal processing modules to improve data throughput and counter channel fading. Conventional hardware solutions, such as digital signal processors (DSPs) and more recently, graphic processing units (GPUs), provide various degrees of parallelism, yet they both fail to take into account the cyclical and… ▽ More Wireless baseband processing (WBP) is a key element of wireless communications, with a series of signal processing modules to improve data throughput and counter channel fading. Conventional hardware solutions, such as digital signal processors (DSPs) and more recently, graphic processing units (GPUs), provide various degrees of parallelism, yet they both fail to take into account the cyclical and consecutive character of WBP. Furthermore, the large amount of data in WBPs cannot be processed quickly in symmetric multiprocessors (SMPs) due to the unpredictability of memory latency. To address this issue, we propose a hierarchical dataflow-driven architecture to accelerate WBP. A pack-and-ship approach is presented under a non-uniform memory access (NUMA) architecture to allow the subordinate tiles to operate in a bundled access and execute manner. We also propose a multi-level dataflow model and the related scheduling scheme to manage and allocate the heterogeneous hardware resources. Experiment results demonstrate that our prototype achieves $2\times$ and $2.3\times$ speedup in terms of normalized throughput and single-tile clock cycles compared with GPU and DSP counterparts in several critical WBP benchmarks. Additionally, a link-level throughput of $288$ Mbps can be achieved with a $45$-core configuration. △ Less

Submitted 28 February, 2024; originally announced February 2024.

Comments: 7 pages, 7 figures, conference

arXiv:2402.11202 [pdf, other]

Towards Scalability and Extensibility of Query Reformulation Modeling in E-commerce Search

Authors: Ziqi Zhang, Yupin Huang, Quan Deng, Jinghui Xiao, Vivek Mittal, Jingyuan Deng

Abstract: Customer behavioral data significantly impacts e-commerce search systems. However, in the case of less common queries, the associated behavioral data tends to be sparse and noisy, offering inadequate support to the search mechanism. To address this challenge, the concept of query reformulation has been introduced. It suggests that less common queries could utilize the behavior patterns of their po… ▽ More Customer behavioral data significantly impacts e-commerce search systems. However, in the case of less common queries, the associated behavioral data tends to be sparse and noisy, offering inadequate support to the search mechanism. To address this challenge, the concept of query reformulation has been introduced. It suggests that less common queries could utilize the behavior patterns of their popular counterparts with similar meanings. In Amazon product search, query reformulation has displayed its effectiveness in improving search relevance and bolstering overall revenue. Nonetheless, adapting this method for smaller or emerging businesses operating in regions with lower traffic and complex multilingual settings poses the challenge in terms of scalability and extensibility. This study focuses on overcoming this challenge by constructing a query reformulation solution capable of functioning effectively, even when faced with limited training data, in terms of quality and scale, along with relatively complex linguistic characteristics. In this paper we provide an overview of the solution implemented within Amazon product search infrastructure, which encompasses a range of elements, including refining the data mining process, redefining model training objectives, and reshaping training strategies. The effectiveness of the proposed solution is validated through online A/B testing on search ranking and Ads matching. Notably, employing the proposed solution in search ranking resulted in 0.14% and 0.29% increase in overall revenue in Japanese and Hindi cases, respectively, and a 0.08\% incremental gain in the English case compared to the legacy implementation; while in search Ads matching led to a 0.36% increase in Ads revenue in the Japanese case. △ Less

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2401.13971 [pdf, other]

Stochastic Weakly Convex Optimization Beyond Lipschitz Continuity

Authors: Wenzhi Gao, Qi Deng

Abstract: This paper considers stochastic weakly convex optimization without the standard Lipschitz continuity assumption. Based on new adaptive regularization (stepsize) strategies, we show that a wide class of stochastic algorithms, including the stochastic subgradient method, preserve the $\mathcal{O} ( 1 / \sqrt{K})$ convergence rate with constant failure rate. Our analyses rest on rather weak assumptio… ▽ More This paper considers stochastic weakly convex optimization without the standard Lipschitz continuity assumption. Based on new adaptive regularization (stepsize) strategies, we show that a wide class of stochastic algorithms, including the stochastic subgradient method, preserve the $\mathcal{O} ( 1 / \sqrt{K})$ convergence rate with constant failure rate. Our analyses rest on rather weak assumptions: the Lipschitz parameter can be either bounded by a general growth function of $\|x\|$ or locally estimated through independent random samples. △ Less

Submitted 25 January, 2024; originally announced January 2024.

arXiv:2311.11572 [pdf, other]

Cryogenic quasi-static embedded DRAM for energy-efficient compute-in-memory applications

Authors: Yuhao Shu, Hongtu Zhang, Hao Sun, Mengru Zhang, Wenfeng Zhao, Qi Deng, Zhidong Tang, Yumeng Yuan, Yongqi Hu, Yu Gu, Xufeng Kou, Yajun Ha

Abstract: Compute-in-memory (CIM) presents an attractive approach for energy-efficient computing in data-intensive applications. However, the development of suitable memory designs to achieve high-performance CIM remains a challenging task. Here, we propose a cryogenic quasi-static embedded DRAM to address the logic-memory mismatch of CIM. Guided by the re-calibrated cryogenic device model, the designed fou… ▽ More Compute-in-memory (CIM) presents an attractive approach for energy-efficient computing in data-intensive applications. However, the development of suitable memory designs to achieve high-performance CIM remains a challenging task. Here, we propose a cryogenic quasi-static embedded DRAM to address the logic-memory mismatch of CIM. Guided by the re-calibrated cryogenic device model, the designed four-transistor bit-cell achieves full-swing data storage, low power consumption, and extended retention time at cryogenic temperatures. Combined with the adoption of cryogenic write bitline biasing technique and readout circuitry optimization, our 4Kb cryogenic eDRAM chip demonstrates a 1.37$\times$10$^6$ times improvement in retention time, while achieving a 75 times improvement in retention variability, compared to room-temperature operation. Moreover, it also achieves outstanding power performance with a retention power of 112 fW and a dynamic power of 108 $μ$W at 4.2 K, which can be further decreased by 7.1% and 13.6% using the dynamic voltage scaling technique. This work reveals the great potential of cryogenic CMOS for high-density data storage and lays a solid foundation for energy-efficient CIM implementations. △ Less

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2310.11973 [pdf, other]

Decentralized Gradient-Free Methods for Stochastic Non-Smooth Non-Convex Optimization

Authors: Zhenwei Lin, Jingfan Xia, Qi Deng, Luo Luo

Abstract: We consider decentralized gradient-free optimization of minimizing Lipschitz continuous functions that satisfy neither smoothness nor convexity assumption. We propose two novel gradient-free algorithms, the Decentralized Gradient-Free Method (DGFM) and its variant, the Decentralized Gradient-Free Method$^+$ (DGFM$^{+}$). Based on the techniques of randomized smoothing and gradient tracking, DGFM r… ▽ More We consider decentralized gradient-free optimization of minimizing Lipschitz continuous functions that satisfy neither smoothness nor convexity assumption. We propose two novel gradient-free algorithms, the Decentralized Gradient-Free Method (DGFM) and its variant, the Decentralized Gradient-Free Method$^+$ (DGFM$^{+}$). Based on the techniques of randomized smoothing and gradient tracking, DGFM requires the computation of the zeroth-order oracle of a single sample in each iteration, making it less demanding in terms of computational resources for individual computing nodes. Theoretically, DGFM achieves a complexity of $\mathcal O(d^{3/2}δ^{-1}\varepsilon ^{-4})$ for obtaining an $(δ,\varepsilon)$-Goldstein stationary point. DGFM$^{+}$, an advanced version of DGFM, incorporates variance reduction to further improve the convergence behavior. It samples a mini-batch at each iteration and periodically draws a larger batch of data, which improves the complexity to $\mathcal O(d^{3/2}δ^{-1} \varepsilon^{-3})$. Moreover, experimental results underscore the empirical advantages of our proposed algorithms when applied to real-world datasets. △ Less

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2309.13476 [pdf, other]

Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection

Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

Abstract: Depression is a common mental disorder. Automatic depression detection tools using speech, enabled by machine learning, help early screening of depression. This paper addresses two limitations that may hinder the clinical implementations of such tools: noise resulting from segment-level labelling and a lack of model interpretability. We propose a bi-modal speech-level transformer to avoid segment-… ▽ More Depression is a common mental disorder. Automatic depression detection tools using speech, enabled by machine learning, help early screening of depression. This paper addresses two limitations that may hinder the clinical implementations of such tools: noise resulting from segment-level labelling and a lack of model interpretability. We propose a bi-modal speech-level transformer to avoid segment-level labelling and introduce a hierarchical interpretation approach to provide both speech-level and sentence-level interpretations, based on gradient-weighted attention maps derived from all attention layers to track interactions between input features. We show that the proposed model outperforms a model that learns at a segment level ($p$=0.854, $r$=0.947, $F1$=0.897 compared to $p$=0.732, $r$=0.808, $F1$=0.768). For model interpretation, using one true positive sample, we show which sentences within a given speech are most relevant to depression detection; and which text tokens and Mel-spectrogram regions within these sentences are most relevant to depression detection. These interpretations allow clinicians to verify the validity of predictions made by depression detection tools, promoting their clinical implementations. △ Less

Submitted 6 October, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

Comments: 5 pages, 3 figures, submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing

ACM Class: F.2.2; I.2.7

arXiv:2309.00753 [pdf, ps, other]

Jamming Suppression Via Resource Hopping in High-Mobility OTFS-SCMA Systems

Authors: Qinwen Deng, Yao Ge, Zhi Ding

Abstract: This letter studies the mechanism of uplink multiple access and jamming suppression in an OTFS system. Specifically, we propose a novel resource hopping mechanism for orthogonal time frequency space (OTFS) systems with delay or Doppler partitioned sparse code multiple access (SCMA) to mitigate the effect of jamming in controlled multiuser uplink. We analyze the non-uniform impact of classic jammin… ▽ More This letter studies the mechanism of uplink multiple access and jamming suppression in an OTFS system. Specifically, we propose a novel resource hopping mechanism for orthogonal time frequency space (OTFS) systems with delay or Doppler partitioned sparse code multiple access (SCMA) to mitigate the effect of jamming in controlled multiuser uplink. We analyze the non-uniform impact of classic jamming signals such as narrowband interference (NBI) and periodic impulse noise (PIN) in delay-Doppler (DD) domain on OTFS systems. Leveraging turbo equalization, our proposed hopping method demonstrates consistent BER performance improvement under jamming over conventional OTFS-SCMA systems compared to static resource allocation schemes. △ Less

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2308.10767 [pdf, other]

GBM-based Bregman Proximal Algorithms for Constrained Learning

Authors: Zhenwei Lin, Qi Deng

Abstract: As the complexity of learning tasks surges, modern machine learning encounters a new constrained learning paradigm characterized by more intricate and data-driven function constraints. Prominent applications include Neyman-Pearson classification (NPC) and fairness classification, which entail specific risk constraints that render standard projection-based training algorithms unsuitable. Gradient b… ▽ More As the complexity of learning tasks surges, modern machine learning encounters a new constrained learning paradigm characterized by more intricate and data-driven function constraints. Prominent applications include Neyman-Pearson classification (NPC) and fairness classification, which entail specific risk constraints that render standard projection-based training algorithms unsuitable. Gradient boosting machines (GBMs) are among the most popular algorithms for supervised learning; however, they are generally limited to unconstrained settings. In this paper, we adapt the GBM for constrained learning tasks within the framework of Bregman proximal algorithms. We introduce a new Bregman primal-dual method with a global optimality guarantee when the learning objective and constraint functions are convex. In cases of nonconvex functions, we demonstrate how our algorithm remains effective under a Bregman proximal point framework. Distinct from existing constrained learning algorithms, ours possess a unique advantage in their ability to seamlessly integrate with publicly available GBM implementations such as XGBoost (Chen and Guestrin, 2016) and LightGBM (Ke et al., 2017), exclusively relying on their public interfaces. We provide substantial experimental evidence to showcase the effectiveness of the Bregman algorithm framework. While our primary focus is on NPC and fairness ML, our framework holds significant potential for a broader range of constrained learning applications. The source code is currently freely available at https://github.com/zhenweilin/ConstrainedGBM}{https://github.com/zhenweilin/ConstrainedGBM. △ Less

Submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.10630 [pdf, other]

A Homogenization Approach for Gradient-Dominated Stochastic Optimization

Authors: Jiyuan Tan, Chenyu Xue, Chuwen Zhang, Qi Deng, Dongdong Ge, Yinyu Ye

Abstract: Gradient dominance property is a condition weaker than strong convexity, yet sufficiently ensures global convergence even in non-convex optimization. This property finds wide applications in machine learning, reinforcement learning (RL), and operations management. In this paper, we propose the stochastic homogeneous second-order descent method (SHSODM) for stochastic functions enjoying gradient do… ▽ More Gradient dominance property is a condition weaker than strong convexity, yet sufficiently ensures global convergence even in non-convex optimization. This property finds wide applications in machine learning, reinforcement learning (RL), and operations management. In this paper, we propose the stochastic homogeneous second-order descent method (SHSODM) for stochastic functions enjoying gradient dominance property based on a recently proposed homogenization approach. Theoretically, we provide its sample complexity analysis, and further present an enhanced result by incorporating variance reduction techniques. Our findings show that SHSODM matches the best-known sample complexity achieved by other second-order methods for gradient-dominated stochastic optimization but without cubic regularization. Empirically, since the homogenization approach only relies on solving extremal eigenvector problem at each iteration instead of Newton-type system, our methods gain the advantage of cheaper computational cost and robustness in ill-conditioned problems. Numerical experiments on several RL tasks demonstrate the better performance of SHSODM compared to other off-the-shelf methods. △ Less

Submitted 29 May, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: Accepted by UAI`24

arXiv:2305.13823 [pdf, other]

XRoute Environment: A Novel Reinforcement Learning Environment for Routing

Authors: Zhanwen Zhou, Hankz Hankui Zhuo, Xiaowu Zhang, Qiyuan Deng

Abstract: Routing is a crucial and time-consuming stage in modern design automation flow for advanced technology nodes. Great progress in the field of reinforcement learning makes it possible to use those approaches to improve the routing quality and efficiency. However, the scale of the routing problems solved by reinforcement learning-based methods in recent studies is too small for these methods to be us… ▽ More Routing is a crucial and time-consuming stage in modern design automation flow for advanced technology nodes. Great progress in the field of reinforcement learning makes it possible to use those approaches to improve the routing quality and efficiency. However, the scale of the routing problems solved by reinforcement learning-based methods in recent studies is too small for these methods to be used in commercial EDA tools. We introduce the XRoute Environment, a new reinforcement learning environment where agents are trained to select and route nets in an advanced, end-to-end routing framework. Novel algorithms and ideas can be quickly tested in a safe and reproducible manner in it. The resulting environment is challenging, easy to use, customize and add additional scenarios, and it is available under a permissive open-source license. In addition, it provides support for distributed deployment and multi-instance experiments. We propose two tasks for learning and build a full-chip test bed with routing benchmarks of various region sizes. We also pre-define several static routing regions with different pin density and number of nets for easier learning and testing. For net ordering task, we report baseline results for two widely used reinforcement learning algorithms (PPO and DQN) and one searching-based algorithm (TritonRoute). The XRoute Environment will be available at https://github.com/xplanlab/xroute_env. △ Less

Submitted 5 June, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: arXiv admin note: text overlap with arXiv:1907.11180 by other authors

arXiv:2305.10181 [pdf, other]

Exploring the cloud of feature interaction scores in a Rashomon set

Authors: Sichao Li, Rong Wang, Quanling Deng, Amanda Barnard

Abstract: Interactions among features are central to understanding the behavior of machine learning models. Recent research has made significant strides in detecting and quantifying feature interactions in single predictive models. However, we argue that the feature interactions extracted from a single pre-specified model may not be trustworthy since: a well-trained predictive model may not preserve the tru… ▽ More Interactions among features are central to understanding the behavior of machine learning models. Recent research has made significant strides in detecting and quantifying feature interactions in single predictive models. However, we argue that the feature interactions extracted from a single pre-specified model may not be trustworthy since: a well-trained predictive model may not preserve the true feature interactions and there exist multiple well-performing predictive models that differ in feature interaction strengths. Thus, we recommend exploring feature interaction strengths in a model class of approximately equally accurate predictive models. In this work, we introduce the feature interaction score (FIS) in the context of a Rashomon set, representing a collection of models that achieve similar accuracy on a given task. We propose a general and practical algorithm to calculate the FIS in the model class. We demonstrate the properties of the FIS via synthetic data and draw connections to other areas of statistics. Additionally, we introduce a Halo plot for visualizing the feature interaction variance in high-dimensional space and a swarm plot for analyzing FIS in a Rashomon set. Experiments with recidivism prediction and image classification illustrate how feature interactions can vary dramatically in importance for similarly accurate predictive models. Our results suggest that the proposed FIS can provide valuable insights into the nature of feature interactions in machine learning models. △ Less

Submitted 11 February, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.06802 [pdf, other]

Physics-Informed Neural Networks for Discovering Localised Eigenstates in Disordered Media

Authors: Liam Harcombe, Quanling Deng

Abstract: The Schrödinger equation with random potentials is a fundamental model for understanding the behaviour of particles in disordered systems. Disordered media are characterised by complex potentials that lead to the localisation of wavefunctions, also called Anderson localisation. These wavefunctions may have similar scales of eigenenergies which poses difficulty in their discovery. It has been a lon… ▽ More The Schrödinger equation with random potentials is a fundamental model for understanding the behaviour of particles in disordered systems. Disordered media are characterised by complex potentials that lead to the localisation of wavefunctions, also called Anderson localisation. These wavefunctions may have similar scales of eigenenergies which poses difficulty in their discovery. It has been a longstanding challenge due to the high computational cost and complexity of solving the Schrödinger equation. Recently, machine-learning tools have been adopted to tackle these challenges. In this paper, based upon recent advances in machine learning, we present a novel approach for discovering localised eigenstates in disordered media using physics-informed neural networks (PINNs). We focus on the spectral approximation of Hamiltonians in one dimension with potentials that are randomly generated according to the Bernoulli, normal, and uniform distributions. We introduce a novel feature to the loss function that exploits known physical phenomena occurring in these regions to scan across the domain and successfully discover these eigenstates, regardless of the similarity of their eigenenergies. We present various examples to demonstrate the performance of the proposed approach and compare it with isogeometric analysis. △ Less

Submitted 12 July, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

arXiv:2304.04778 [pdf, ps, other]

First-order methods for Stochastic Variational Inequality problems with Function Constraints

Authors: Digvijay Boob, Qi Deng, Mohammad Khalafi

Abstract: The monotone Variational Inequality (VI) is a general model with important applications in various engineering and scientific domains. In numerous instances, the VI problems are accompanied by function constraints that can be data-driven, making the usual projection operator challenging to compute. This paper presents novel first-order methods for the function-constrained Variational Inequality (F… ▽ More The monotone Variational Inequality (VI) is a general model with important applications in various engineering and scientific domains. In numerous instances, the VI problems are accompanied by function constraints that can be data-driven, making the usual projection operator challenging to compute. This paper presents novel first-order methods for the function-constrained Variational Inequality (FCVI) problem in smooth or nonsmooth settings with possibly stochastic operators and constraints. We introduce the AdOpEx method, which employs an operator extrapolation on the KKT operator of the FCVI in a smooth deterministic setting. Since this operator is not uniformly Lipschitz continuous in the Lagrange multipliers, we employ an adaptive two-timescale algorithm leading to bounded multipliers and achieving the optimal $O(1/T)$ convergence rate. For the nonsmooth and stochastic VIs, we introduce design changes to the AdOpEx method and propose a novel P-OpEx method that takes partial extrapolation. It converges at the rate of $O(1/\sqrt{T})$ when both the operator and constraints are stochastic or nonsmooth. This method has suboptimal dependence on the noise and Lipschitz constants of function constraints. We propose a constraint extrapolation approach leading to the OpConEx method that improves this dependence by an order of magnitude. All our algorithms easily extend to saddle point problems with function constraints that couple the primal and dual variables while maintaining the same complexity results. To the best of our knowledge, all our complexity results are new in the literature △ Less

Submitted 24 May, 2024; v1 submitted 10 April, 2023; originally announced April 2023.

arXiv:2303.15132 [pdf, other]

doi 10.1109/ICASSP49357.2023.10096820

Cross-utterance ASR Rescoring with Graph-based Label Propagation

Authors: Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran

Abstract: We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity. In contrast to conventional neural language model (LM) based ASR rescoring/reranking models, our approach focuses on acoustic information and conducts the rescoring collaboratively among utterances, instead of individually. Experiments on the VCTK da… ▽ More We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity. In contrast to conventional neural language model (LM) based ASR rescoring/reranking models, our approach focuses on acoustic information and conducts the rescoring collaboratively among utterances, instead of individually. Experiments on the VCTK dataset demonstrate that our approach consistently improves ASR performance, as well as fairness across speaker groups with different accents. Our approach provides a low-cost solution for mitigating the majoritarian bias of ASR systems, without the need to train new domain- or accent-specific models. △ Less

Submitted 27 March, 2023; originally announced March 2023.

Comments: To appear in IEEE ICASSP 2023

Journal ref: Proc. IEEE ICASSP, June 2023

arXiv:2303.03590 [pdf, other]

Research on Efficient Fuzzy Clustering Method Based on Local Fuzzy Granular balls

Authors: Jiang Xie, Qiao Deng, Shuyin Xia, Yangzhou Zhao, Guoyin Wang, Xinbo Gao

Abstract: In recent years, the problem of fuzzy clustering has been widely concerned. The membership iteration of existing methods is mostly considered globally, which has considerable problems in noisy environments, and iterative calculations for clusters with a large number of different sample sizes are not accurate and efficient. In this paper, starting from the strategy of large-scale priority, the data… ▽ More In recent years, the problem of fuzzy clustering has been widely concerned. The membership iteration of existing methods is mostly considered globally, which has considerable problems in noisy environments, and iterative calculations for clusters with a large number of different sample sizes are not accurate and efficient. In this paper, starting from the strategy of large-scale priority, the data is fuzzy iterated using granular-balls, and the membership degree of data only considers the two granular-balls where it is located, thus improving the efficiency of iteration. The formed fuzzy granular-balls set can use more processing methods in the face of different data scenarios, which enhances the practicability of fuzzy clustering calculations. △ Less

Submitted 8 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

arXiv:2302.08902 [pdf, other]

Fashion Image Retrieval with Multi-Granular Alignment

Authors: Jinkuan Zhu, Hao Huang, Qiao Deng, Xiyao Li

Abstract: Fashion image retrieval task aims to search relevant clothing items of a query image from the gallery. The previous recipes focus on designing different distance-based loss functions, pulling relevant pairs to be close and pushing irrelevant images apart. However, these methods ignore fine-grained features (e.g. neckband, cuff) of clothing images. In this paper, we propose a novel fashion image re… ▽ More Fashion image retrieval task aims to search relevant clothing items of a query image from the gallery. The previous recipes focus on designing different distance-based loss functions, pulling relevant pairs to be close and pushing irrelevant images apart. However, these methods ignore fine-grained features (e.g. neckband, cuff) of clothing images. In this paper, we propose a novel fashion image retrieval method leveraging both global and fine-grained features, dubbed Multi-Granular Alignment (MGA). Specifically, we design a Fine-Granular Aggregator(FGA) to capture and aggregate detailed patterns. Then we propose Attention-based Token Alignment (ATA) to align image features at the multi-granular level in a coarse-to-fine manner. To prove the effectiveness of our proposed method, we conduct experiments on two sub-tasks (In-Shop & Consumer2Shop) of the public fashion datasets DeepFashion. The experimental results show that our MGA outperforms the state-of-the-art methods by 1.8% and 0.6% in the two sub-tasks on the R@1 metric, respectively. △ Less

Submitted 7 March, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

arXiv:2302.08869 [pdf, other]

OTFS Signaling for SCMA With Coordinated Multi-Point Vehicle Communications

Authors: Yao Ge, Qinwen Deng, David González G., Yong Liang Guan, Zhi Ding

Abstract: This paper investigates an uplink coordinated multi-point (CoMP) coverage scenario, in which multiple mobile users are grouped for sparse code multiple access (SCMA), and served by the remote radio head (RRH) in front of them and the RRH behind them simultaneously. We apply orthogonal time frequency space (OTFS) modulation for each user to exploit the degrees of freedom arising from both the delay… ▽ More This paper investigates an uplink coordinated multi-point (CoMP) coverage scenario, in which multiple mobile users are grouped for sparse code multiple access (SCMA), and served by the remote radio head (RRH) in front of them and the RRH behind them simultaneously. We apply orthogonal time frequency space (OTFS) modulation for each user to exploit the degrees of freedom arising from both the delay and Doppler domains. As the signals received by the RRHs in front of and behind the users experience respectively positive and negative Doppler frequency shifts, our proposed OTFS-based SCMA (OBSCMA) with CoMP system can effectively harvest extra Doppler and spatial diversity for better performance. Based on maximum likelihood (ML) detector, we analyze the single-user average bit error rate (ABER) bound as the benchmark of the ABER performance for our proposed OBSCMA with CoMP system. We also develop a customized Gaussian approximation with expectation propagation (GAEP) algorithm for multi-user detection and propose efficient algorithm structures for centralized and decentralized detectors. Our proposed OBSCMA with CoMP system leads to stronger performance than the existing solutions. The proposed centralized and decentralized detectors exhibit effective reception and robustness under channel state information uncertainty. △ Less

Submitted 17 February, 2023; originally announced February 2023.

Comments: 15 pages, 12 figures, accepted by IEEE Transactions on Vehicular Technology

arXiv:2301.12713 [pdf, other]

Delayed Stochastic Algorithms for Distributed Weakly Convex Optimization

Authors: Wenzhi Gao, Qi Deng

Abstract: This paper studies delayed stochastic algorithms for weakly convex optimization in a distributed network with workers connected to a master node. Recently, Xu et al. 2022 showed that an inertial stochastic subgradient method converges at a rate of $\mathcal{O}(τ_{\text{max}}/\sqrt{K})$ which depends on the maximum information delay $τ_{\text{max}}$. In this work, we show that the delayed stochasti… ▽ More This paper studies delayed stochastic algorithms for weakly convex optimization in a distributed network with workers connected to a master node. Recently, Xu et al. 2022 showed that an inertial stochastic subgradient method converges at a rate of $\mathcal{O}(τ_{\text{max}}/\sqrt{K})$ which depends on the maximum information delay $τ_{\text{max}}$. In this work, we show that the delayed stochastic subgradient method ($\texttt{DSGD}$) obtains a tighter convergence rate which depends on the expected delay $\barτ$. Furthermore, for an important class of composition weakly convex problems, we develop a new delayed stochastic prox-linear ($\texttt{DSPL}$) method in which the delays only affect the high-order term in the complexity rate and hence, are negligible after a certain number of $\texttt{DSPL}$ iterations. In addition, we demonstrate the robustness of our proposed algorithms against arbitrary delays. By incorporating a simple safeguarding step in both methods, we achieve convergence rates that depend solely on the number of workers, eliminating the effect of the delay. Our numerical experiments further confirm the empirical superiority of our proposed methods. △ Less

Submitted 1 November, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

arXiv:2301.12174 [pdf, other]

Stochastic Dimension-reduced Second-order Methods for Policy Optimization

Authors: Jinsong Liu, Chenghan Xie, Qi Deng, Dongdong Ge, Yinyu Ye

Abstract: In this paper, we propose several new stochastic second-order algorithms for policy optimization that only require gradient and Hessian-vector product in each iteration, making them computationally efficient and comparable to policy gradient methods. Specifically, we propose a dimension-reduced second-order method (DR-SOPO) which repeatedly solves a projected two-dimensional trust region subproble… ▽ More In this paper, we propose several new stochastic second-order algorithms for policy optimization that only require gradient and Hessian-vector product in each iteration, making them computationally efficient and comparable to policy gradient methods. Specifically, we propose a dimension-reduced second-order method (DR-SOPO) which repeatedly solves a projected two-dimensional trust region subproblem. We show that DR-SOPO obtains an $\mathcal{O}(ε^{-3.5})$ complexity for reaching approximate first-order stationary condition and certain subspace second-order stationary condition. In addition, we present an enhanced algorithm (DVR-SOPO) which further improves the complexity to $\mathcal{O}(ε^{-3})$ based on the variance reduction technique. Preliminary experiments show that our proposed algorithms perform favorably compared with stochastic and variance-reduced policy gradient methods. △ Less

Submitted 28 January, 2023; originally announced January 2023.

arXiv:2212.11143 [pdf, other]

Efficient First-order Methods for Convex Optimization with Strongly Convex Function Constraints

Authors: Zhenwei Lin, Qi Deng

Abstract: In this paper, we introduce faster first-order primal-dual algorithms for minimizing a convex function subject to strongly convex function constraints. Before our work, the best complexity bound was $\mathcal{O}(1/{\varepsilon})$, and it remains unclear how to improve this result by leveraging the strong convexity assumption. We address this issue by developing novel techniques to progressively es… ▽ More In this paper, we introduce faster first-order primal-dual algorithms for minimizing a convex function subject to strongly convex function constraints. Before our work, the best complexity bound was $\mathcal{O}(1/{\varepsilon})$, and it remains unclear how to improve this result by leveraging the strong convexity assumption. We address this issue by developing novel techniques to progressively estimate the strong convexity of the Lagrangian function. Our approach yields an improved complexity of $\mathcal{O}(1/\sqrt{\varepsilon})$, matching the complexity lower bound for strongly-convex-concave saddle point optimization. We show the superior performance of our methods in sparsity-inducing constrained optimization, notably Google's personalized PageRank problem. Furthermore, we show that a restarted version of the proposed methods can effectively identify the sparsity pattern of the optimal solution within a finite number of steps, a result that appears to have independent significance. △ Less

Submitted 5 November, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

Comments: new experiments

MSC Class: 90C25; 90C30; 90C06

arXiv:2212.06643 [pdf, other]

Boosting Semi-Supervised Learning with Contrastive Complementary Labeling

Authors: Qinyi Deng, Yong Guo, Zhibang Yang, Haolin Pan, Jian Chen

Abstract: Semi-supervised learning (SSL) has achieved great success in leveraging a large amount of unlabeled data to learn a promising classifier. A popular approach is pseudo-labeling that generates pseudo labels only for those unlabeled data with high-confidence predictions. As for the low-confidence ones, existing methods often simply discard them because these unreliable pseudo labels may mislead the m… ▽ More Semi-supervised learning (SSL) has achieved great success in leveraging a large amount of unlabeled data to learn a promising classifier. A popular approach is pseudo-labeling that generates pseudo labels only for those unlabeled data with high-confidence predictions. As for the low-confidence ones, existing methods often simply discard them because these unreliable pseudo labels may mislead the model. Nevertheless, we highlight that these data with low-confidence pseudo labels can be still beneficial to the training process. Specifically, although the class with the highest probability in the prediction is unreliable, we can assume that this sample is very unlikely to belong to the classes with the lowest probabilities. In this way, these data can be also very informative if we can effectively exploit these complementary labels, i.e., the classes that a sample does not belong to. Inspired by this, we propose a novel Contrastive Complementary Labeling (CCL) method that constructs a large number of reliable negative pairs based on the complementary labels and adopts contrastive learning to make use of all the unlabeled data. Extensive experiments demonstrate that CCL significantly improves the performance on top of existing methods. More critically, our CCL is particularly effective under the label-scarce settings. For example, we yield an improvement of 2.43% over FixMatch on CIFAR-10 only with 40 labeled data. △ Less

Submitted 27 December, 2022; v1 submitted 13 December, 2022; originally announced December 2022.

Comments: typos corrected, 5 figures, 3 tables,

arXiv:2211.13116 [pdf, other]

Fed-TDA: Federated Tabular Data Augmentation on Non-IID Data

Authors: Shaoming Duan, Chuanyi Liu, Peiyi Han, Tianyu He, Yifeng Xu, Qiyuan Deng

Abstract: Non-independent and identically distributed (non-IID) data is a key challenge in federated learning (FL), which usually hampers the optimization convergence and the performance of FL. Existing data augmentation methods based on federated generative models or raw data sharing strategies for solving the non-IID problem still suffer from low performance, privacy protection concerns, and high communic… ▽ More Non-independent and identically distributed (non-IID) data is a key challenge in federated learning (FL), which usually hampers the optimization convergence and the performance of FL. Existing data augmentation methods based on federated generative models or raw data sharing strategies for solving the non-IID problem still suffer from low performance, privacy protection concerns, and high communication overhead in decentralized tabular data. To tackle these challenges, we propose a federated tabular data augmentation method, named Fed-TDA. The core idea of Fed-TDA is to synthesize tabular data for data augmentation using some simple statistics (e.g., distributions of each column and global covariance). Specifically, we propose the multimodal distribution transformation and inverse cumulative distribution mapping respectively synthesize continuous and discrete columns in tabular data from a noise according to the pre-learned statistics. Furthermore, we theoretically analyze that our Fed-TDA not only preserves data privacy but also maintains the distribution of the original data and the correlation between columns. Through extensive experiments on five real-world tabular datasets, we demonstrate the superiority of Fed-TDA over the state-of-the-art in test performance and communication efficiency. △ Less

Submitted 12 January, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

arXiv:2210.12683 [pdf, other]

GAN-based Facial Attribute Manipulation

Authors: Yunfan Liu, Qi Li, Qiyao Deng, Zhenan Sun, Ming-Hsuan Yang

Abstract: Facial Attribute Manipulation (FAM) aims to aesthetically modify a given face image to render desired attributes, which has received significant attention due to its broad practical applications ranging from digital entertainment to biometric forensics. In the last decade, with the remarkable success of Generative Adversarial Networks (GANs) in synthesizing realistic images, numerous GAN-based mod… ▽ More Facial Attribute Manipulation (FAM) aims to aesthetically modify a given face image to render desired attributes, which has received significant attention due to its broad practical applications ranging from digital entertainment to biometric forensics. In the last decade, with the remarkable success of Generative Adversarial Networks (GANs) in synthesizing realistic images, numerous GAN-based models have been proposed to solve FAM with various problem formulation approaches and guiding information representations. This paper presents a comprehensive survey of GAN-based FAM methods with a focus on summarizing their principal motivations and technical details. The main contents of this survey include: (i) an introduction to the research background and basic concepts related to FAM, (ii) a systematic review of GAN-based FAM methods in three main categories, and (iii) an in-depth discussion of important properties of FAM methods, open issues, and future research directions. This survey not only builds a good starting point for researchers new to this field but also serves as a reference for the vision community. △ Less

Submitted 23 October, 2022; originally announced October 2022.

arXiv:2208.04740 [pdf, other]

Aesthetic Language Guidance Generation of Images Using Attribute Comparison

Authors: Xin Jin, Qiang Deng, Jianwen Lv, Heng Huang, Hao Lou, Chaoen Xiao

Abstract: With the vigorous development of mobile photography technology, major mobile phone manufacturers are scrambling to improve the shooting ability of equipments and the photo beautification algorithm of software. However, the improvement of intelligent equipments and algorithms cannot replace human subjective photography technology. In this paper, we propose the aesthetic language guidance of image (… ▽ More With the vigorous development of mobile photography technology, major mobile phone manufacturers are scrambling to improve the shooting ability of equipments and the photo beautification algorithm of software. However, the improvement of intelligent equipments and algorithms cannot replace human subjective photography technology. In this paper, we propose the aesthetic language guidance of image (ALG). We divide ALG into ALG-T and ALG-I according to whether the guiding rules are based on photography templates or guidance images. Whether it is ALG-T or ALG-I, we guide photography from three attributes of color, lighting and composition of the images. The differences of the three attributes between the input images and the photography templates or the guidance images are described in natural language, which is aesthetic natural language guidance (ALG). Also, because of the differences in lighting and composition between landscape images and portrait images, we divide the input images into landscape images and portrait images. Both ALG-T and ALG-I conduct aesthetic language guidance respectively for the two types of input images (landscape images and portrait images). △ Less

Submitted 9 August, 2022; originally announced August 2022.

Comments: 13 pages, 18 figures, on going research

arXiv:2208.04517 [pdf, other]

Attribute Controllable Beautiful Caucasian Face Generation by Aesthetics Driven Reinforcement Learning

Authors: Xin Jin, Shu Zhao, Le Zhang, Xin Zhao, Qiang Deng, Chaoen Xiao

Abstract: In recent years, image generation has made great strides in improving the quality of images, producing high-fidelity ones. Also, quite recently, there are architecture designs, which enable GAN to unsupervisedly learn the semantic attributes represented in different layers. However, there is still a lack of research on generating face images more consistent with human aesthetics. Based on EigenGAN… ▽ More In recent years, image generation has made great strides in improving the quality of images, producing high-fidelity ones. Also, quite recently, there are architecture designs, which enable GAN to unsupervisedly learn the semantic attributes represented in different layers. However, there is still a lack of research on generating face images more consistent with human aesthetics. Based on EigenGAN [He et al., ICCV 2021], we build the techniques of reinforcement learning into the generator of EigenGAN. The agent tries to figure out how to alter the semantic attributes of the generated human faces towards more preferable ones. To accomplish this, we trained an aesthetics scoring model that can conduct facial beauty prediction. We also can utilize this scoring model to analyze the correlation between face attributes and aesthetics scores. Empirically, using off-the-shelf techniques from reinforcement learning would not work well. So instead, we present a new variant incorporating the ingredients emerging in the reinforcement learning communities in recent years. Compared to the original generated images, the adjusted ones show clear distinctions concerning various attributes. Experimental results using the MindSpore, show the effectiveness of the proposed method. Altered facial images are commonly more attractive, with significantly improved aesthetic levels. △ Less

Submitted 8 August, 2022; originally announced August 2022.

Comments: 13 pages, 5 figures. ACM Multimedia 2022 Technical Demos and Videos Program

arXiv:2208.00238 [pdf, other]

Improving Fine-tuning of Self-supervised Models with Contrastive Initialization

Authors: Haolin Pan, Yong Guo, Qinyi Deng, Haomin Yang, Yiqun Chen, Jian Chen

Abstract: Self-supervised learning (SSL) has achieved remarkable performance in pretraining the models that can be further used in downstream tasks via fine-tuning. However, these self-supervised models may not capture meaningful semantic information since the images belonging to the same class are always regarded as negative pairs in the contrastive loss. Consequently, the images of the same class are ofte… ▽ More Self-supervised learning (SSL) has achieved remarkable performance in pretraining the models that can be further used in downstream tasks via fine-tuning. However, these self-supervised models may not capture meaningful semantic information since the images belonging to the same class are always regarded as negative pairs in the contrastive loss. Consequently, the images of the same class are often located far away from each other in learned feature space, which would inevitably hamper the fine-tuning process. To address this issue, we seek to provide a better initialization for the self-supervised models by enhancing the semantic information. To this end, we propose a Contrastive Initialization (COIN) method that breaks the standard fine-tuning pipeline by introducing an extra initialization stage before fine-tuning. Extensive experiments show that, with the enriched semantics, our COIN significantly outperforms existing methods without introducing extra training cost and sets new state-of-the-arts on multiple downstream tasks. △ Less

Submitted 30 July, 2022; originally announced August 2022.

Comments: 22 pages, 4 figures

arXiv:2207.01806 [pdf, other]

Aesthetic Attribute Assessment of Images Numerically on Mixed Multi-attribute Datasets

Authors: Xin Jin, Xinning Li, Hao Lou, Chenyu Fan, Qiang Deng, Chaoen Xiao, Shuai Cui, Amit Kumar Singh

Abstract: With the continuous development of social software and multimedia technology, images have become a kind of important carrier for spreading information and socializing. How to evaluate an image comprehensively has become the focus of recent researches. The traditional image aesthetic assessment methods often adopt single numerical overall assessment scores, which has certain subjectivity and can no… ▽ More With the continuous development of social software and multimedia technology, images have become a kind of important carrier for spreading information and socializing. How to evaluate an image comprehensively has become the focus of recent researches. The traditional image aesthetic assessment methods often adopt single numerical overall assessment scores, which has certain subjectivity and can no longer meet the higher aesthetic requirements. In this paper, we construct an new image attribute dataset called aesthetic mixed dataset with attributes(AMD-A) and design external attribute features for fusion. Besides, we propose a efficient method for image aesthetic attribute assessment on mixed multi-attribute dataset and construct a multitasking network architecture by using the EfficientNet-B0 as the backbone network. Our model can achieve aesthetic classification, overall scoring and attribute scoring. In each sub-network, we improve the feature extraction through ECA channel attention module. As for the final overall scoring, we adopt the idea of the teacher-student network and use the classification sub-network to guide the aesthetic overall fine-grain regression. Experimental results, using the MindSpore, show that our proposed method can effectively improve the performance of the aesthetic overall and attribute assessment. △ Less

Submitted 5 July, 2022; originally announced July 2022.

Comments: 7 pages, 9figures, to appear: ACM Transactions on Multimedia Computing Communications and Applications (TOMM)

arXiv:2206.09946 [pdf]

doi 10.36190/2022.42

Short Video Uprising: How #BlackLivesMatter Content on TikTok Challenges the Protest Paradigm

Authors: Yanru Jiang, Xin Jin, Qinhao Deng

Abstract: This study uses TikTok (N = 8,173) to examine how short-form video platforms challenge the protest paradigm in the recent Black Lives Matter movement. A computer-mediated visual analysis, computer vision, is employed to identify the presence of four visual frames of protest (riot, confrontation, spectacle, and debate) in multimedia content. Results of descriptive statistics and the t-test indicate… ▽ More This study uses TikTok (N = 8,173) to examine how short-form video platforms challenge the protest paradigm in the recent Black Lives Matter movement. A computer-mediated visual analysis, computer vision, is employed to identify the presence of four visual frames of protest (riot, confrontation, spectacle, and debate) in multimedia content. Results of descriptive statistics and the t-test indicate that the three delegitimizing frames - riot, confrontation, and spectacle - are rarely found on TikTok, whereas the debate frame, that empowers marginalized communities, dominates the public sphere. However, although the three delegitimizing frames receive lower social media visibility, as measured by views, likes, shares, followers, and durations, legitimizing elements, such as the debate frame, minority identities, and unofficial sources, are not generally favored by TikTok audiences. This study concludes that while short-form video platforms could potentially challenge the protest paradigm on the content creators' side, the audiences' preference as measured by social media visibility might still be moderately associated with the protest paradigm. △ Less

Submitted 20 June, 2022; originally announced June 2022.

Comments: Workshop Proceedings of the 16th International AAAI Conference on Web and Social Media

arXiv:2204.12326 [pdf, other]

doi 10.1145/3477495.3532005

Investigating Accuracy-Novelty Performance for Graph-based Collaborative Filtering

Authors: Minghao Zhao, Le Wu, Yile Liang, Lei Chen, Jian Zhang, Qilin Deng, Kai Wang, Xudong Shen, Tangjie Lv, Runze Wu

Abstract: Recent years have witnessed the great accuracy performance of graph-based Collaborative Filtering (CF) models for recommender systems. By taking the user-item interaction behavior as a graph, these graph-based CF models borrow the success of Graph Neural Networks (GNN), and iteratively perform neighborhood aggregation to propagate the collaborative signals. While conventional CF models are known f… ▽ More Recent years have witnessed the great accuracy performance of graph-based Collaborative Filtering (CF) models for recommender systems. By taking the user-item interaction behavior as a graph, these graph-based CF models borrow the success of Graph Neural Networks (GNN), and iteratively perform neighborhood aggregation to propagate the collaborative signals. While conventional CF models are known for facing the challenges of the popularity bias that favors popular items, one may wonder "Whether the existing graph-based CF models alleviate or exacerbate popularity bias of recommender systems?" To answer this question, we first investigate the two-fold performances w.r.t. accuracy and novelty for existing graph-based CF methods. The empirical results show that symmetric neighborhood aggregation adopted by most existing graph-based CF models exacerbate the popularity bias and this phenomenon becomes more serious as the depth of graph propagation increases. Further, we theoretically analyze the cause of popularity bias for graph-based CF. Then, we propose a simple yet effective plugin, namely r-AdjNorm, to achieve an accuracy-novelty trade-off by controlling the normalization strength in the neighborhood aggregation process. Meanwhile, r-AdjNorm can be smoothly applied to the existing graph-based CF backbones without additional computation. Finally, experimental results on three benchmark datasets show that our proposed method can improve novelty without sacrificing accuracy under various graph-based CF backbones. △ Less

Submitted 27 April, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

Comments: To appear in SIGIR 2022

arXiv:2203.15334 [pdf, other]

AnyFace: Free-style Text-to-Face Synthesis and Manipulation

Authors: Jianxin Sun, Qiyao Deng, Qi Li, Muyi Sun, Min Ren, Zhenan Sun

Abstract: Existing text-to-image synthesis methods generally are only applicable to words in the training dataset. However, human faces are so variable to be described with limited words. So this paper proposes the first free-style text-to-face method namely AnyFace enabling much wider open world applications such as metaverse, social media, cosmetics, forensics, etc. AnyFace has a novel two-stream framewor… ▽ More Existing text-to-image synthesis methods generally are only applicable to words in the training dataset. However, human faces are so variable to be described with limited words. So this paper proposes the first free-style text-to-face method namely AnyFace enabling much wider open world applications such as metaverse, social media, cosmetics, forensics, etc. AnyFace has a novel two-stream framework for face image synthesis and manipulation given arbitrary descriptions of the human face. Specifically, one stream performs text-to-face generation and the other conducts face image reconstruction. Facial text and image features are extracted using the CLIP (Contrastive Language-Image Pre-training) encoders. And a collaborative Cross Modal Distillation (CMD) module is designed to align the linguistic and visual features across these two streams. Furthermore, a Diverse Triplet Loss (DT loss) is developed to model fine-grained features and improve facial diversity. Extensive experiments on Multi-modal CelebA-HQ and CelebAText-HQ demonstrate significant advantages of AnyFace over state-of-the-art methods. AnyFace can achieve high-quality, high-resolution, and high-diversity face synthesis and manipulation results without any constraints on the number and content of input captions. △ Less

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2203.12373 [pdf, ps, other]

A boundary-penalized isogeometric analysis for second-order hyperbolic equations

Authors: Quanling Deng, Pouria Behnoudfar, Victor Calo

Abstract: Explicit time-marching schemes are popular for solving time-dependent partial differential equations; one of the biggest challenges these methods suffer is increasing the critical time-marching step size that guarantees numerical stability. In general, there are two ways to increase the critical step size. One is to reduce the stiffness of the spatially discretized system, while the other is to de… ▽ More Explicit time-marching schemes are popular for solving time-dependent partial differential equations; one of the biggest challenges these methods suffer is increasing the critical time-marching step size that guarantees numerical stability. In general, there are two ways to increase the critical step size. One is to reduce the stiffness of the spatially discretized system, while the other is to design time-marching schemes with larger stability regions. In this paper, we focus on the recently proposed explicit generalized-$α$ method for second-order hyperbolic equations and increase the critical step size by reducing the stiffness of the isogeometric-discretized system. In particular, we apply boundary penalization to lessen the system's stiffness. For $p$-th order $C^{p-1}$ isogeometric elements, we show numerically that the critical step size increases by a factor of $\sqrt{\frac{p^2-3p+6}{4}}$, which indicates the advantages of using the proposed method, especially for high-order elements. Various examples in one, two, and three dimensions validate the performance of the proposed technique. △ Less

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2111.15018 [pdf, other]

Hyperspectral Image Segmentation based on Graph Processing over Multilayer Networks

Authors: Songyang Zhang, Qinwen Deng, Zhi Ding

Abstract: Hyperspectral imaging is an important sensing technology with broad applications and impact in areas including environmental science, weather, and geo/space exploration. One important task of hyperspectral image (HSI) processing is the extraction of spectral-spatial features. Leveraging on the recent-developed graph signal processing over multilayer networks (M-GSP), this work proposes several app… ▽ More Hyperspectral imaging is an important sensing technology with broad applications and impact in areas including environmental science, weather, and geo/space exploration. One important task of hyperspectral image (HSI) processing is the extraction of spectral-spatial features. Leveraging on the recent-developed graph signal processing over multilayer networks (M-GSP), this work proposes several approaches to HSI segmentation based on M-GSP feature extraction. To capture joint spectral-spatial information, we first customize a tensor-based multilayer network (MLN) model for HSI, and define a MLN singular space for feature extraction. We then develop an unsupervised HSI segmentation method by utilizing MLN spectral clustering. Regrouping HSI pixels via MLN-based clustering, we further propose a semi-supervised HSI classification based on multi-resolution fusions of superpixels. Our experimental results demonstrate the strength of M-GSP in HSI processing and spectral-spatial information extraction. △ Less

Submitted 29 November, 2021; originally announced November 2021.

arXiv:2111.00203 [pdf, other]

doi 10.1145/3474085.3475280

Imitating Arbitrary Talking Style for Realistic Audio-DrivenTalking Face Synthesis

Authors: Haozhe Wu, Jia Jia, Haoyu Wang, Yishun Dou, Chao Duan, Qingshan Deng

Abstract: People talk with diversified styles. For one piece of speech, different talking styles exhibit significant differences in the facial and head pose movements. For example, the "excited" style usually talks with the mouth wide open, while the "solemn" style is more standardized and seldomly exhibits exaggerated motions. Due to such huge differences between different styles, it is necessary to incorp… ▽ More People talk with diversified styles. For one piece of speech, different talking styles exhibit significant differences in the facial and head pose movements. For example, the "excited" style usually talks with the mouth wide open, while the "solemn" style is more standardized and seldomly exhibits exaggerated motions. Due to such huge differences between different styles, it is necessary to incorporate the talking style into audio-driven talking face synthesis framework. In this paper, we propose to inject style into the talking face synthesis framework through imitating arbitrary talking style of the particular reference video. Specifically, we systematically investigate talking styles with our collected \textit{Ted-HD} dataset and construct style codes as several statistics of 3D morphable model~(3DMM) parameters. Afterwards, we devise a latent-style-fusion~(LSF) model to synthesize stylized talking faces by imitating talking styles from the style codes. We emphasize the following novel characteristics of our framework: (1) It doesn't require any annotation of the style, the talking style is learned in an unsupervised manner from talking videos in the wild. (2) It can imitate arbitrary styles from arbitrary videos, and the style codes can also be interpolated to generate new styles. Extensive experiments demonstrate that the proposed framework has the ability to synthesize more natural and expressive talking styles compared with baseline methods. △ Less

Submitted 30 October, 2021; originally announced November 2021.

Comments: Accepted by MM2021, code available at https://github.com/wuhaozhe/style_avatar

ACM Class: I.1.4

arXiv:2110.11073 [pdf, other]

RL4RS: A Real-World Dataset for Reinforcement Learning based Recommender System

Authors: Kai Wang, Zhene Zou, Minghao Zhao, Qilin Deng, Yue Shang, Yile Liang, Runze Wu, Xudong Shen, Tangjie Lyu, Changjie Fan

Abstract: Reinforcement learning based recommender systems (RL-based RS) aim at learning a good policy from a batch of collected data, by casting recommendations to multi-step decision-making tasks. However, current RL-based RS research commonly has a large reality gap. In this paper, we introduce the first open-source real-world dataset, RL4RS, hoping to replace the artificial datasets and semi-simulated R… ▽ More Reinforcement learning based recommender systems (RL-based RS) aim at learning a good policy from a batch of collected data, by casting recommendations to multi-step decision-making tasks. However, current RL-based RS research commonly has a large reality gap. In this paper, we introduce the first open-source real-world dataset, RL4RS, hoping to replace the artificial datasets and semi-simulated RS datasets previous studies used due to the resource limitation of the RL-based RS domain. Unlike academic RL research, RL-based RS suffers from the difficulties of being well-validated before deployment. We attempt to propose a new systematic evaluation framework, including evaluation of environment simulation, evaluation on environments, counterfactual policy evaluation, and evaluation on environments built from test set. In summary, the RL4RS (Reinforcement Learning for Recommender Systems), a new resource with special concerns on the reality gaps, contains two real-world datasets, data understanding tools, tuned simulation environments, related advanced RL baselines, batch RL baselines, and counterfactual policy evaluation algorithms. The RL4RS suite can be found at https://github.com/fuxiAIlab/RL4RS. In addition to the RL-based recommender systems, we expect the resource to contribute to research in applied reinforcement learning. △ Less

Submitted 17 April, 2023; v1 submitted 18 October, 2021; originally announced October 2021.

Comments: 4-th version, SIGIR2023

arXiv:2108.03562 [pdf, other]

Master Graduation Thesis: A Lightweight and Distributed Container-based Framework

Authors: Qifan Deng, Rajkumar Buyya

Abstract: Edge/Fog computing is a novel computing paradigm that provides resource-limited Internet of Things (IoT) devices with scalable computing and storage resources. Compared to cloud computing, edge/fog servers have fewer resources, but they can be accessed with higher bandwidth and less communication latency. Thus, integrating edge/fog and cloud infrastructures can support the execution of diverse lat… ▽ More Edge/Fog computing is a novel computing paradigm that provides resource-limited Internet of Things (IoT) devices with scalable computing and storage resources. Compared to cloud computing, edge/fog servers have fewer resources, but they can be accessed with higher bandwidth and less communication latency. Thus, integrating edge/fog and cloud infrastructures can support the execution of diverse latency-sensitive and computation-intensive IoT applications. Although some frameworks attempt to provide such integration, there are still several challenges to be addressed, such as dynamic scheduling of different IoT applications, scalability mechanisms, multi-platform support, and supporting different interaction models. To overcome these challenges, we propose a lightweight and distributed container-based framework, called FogBus2. It provides a mechanism for scheduling heterogeneous IoT applications and implements several scheduling policies. Also, it proposes an optimized genetic algorithm to obtain fast convergence to well-suited solutions. Besides, it offers a scalability mechanism to ensure efficient responsiveness when either the number of IoT devices increases or the resources become overburdened. Also, the dynamic resource discovery mechanism of FogBus2 assists new entities to quickly join the system. We have also developed two IoT applications, called Conway's Game of Life and Video Optical Character Recognition to demonstrate the effectiveness of FogBus2 for handling real-time and non-real-time IoT applications. Experimental results show FogBus2's scheduling policy improves the response time of IoT applications by 53\% compared to other policies. Also, the scalability mechanism can reduce up to 48\% of the queuing waiting time compared to frameworks that do not support scalability. △ Less

Submitted 7 August, 2021; originally announced August 2021.

Comments: https://github.com/cloudslab/fogbus2

arXiv:2108.00591 [pdf, other]

Resource Management in Edge and Fog Computing using FogBus2 Framework

Authors: Mohammad Goudarzi, Qifan Deng, Rajkumar Buyya

Abstract: Edge/Fog computing is a novel computing paradigm that provides resource-limited Internet of Things (IoT) devices with scalable computing and storage resources. Compared to cloud computing, edge/fog servers have fewer resources, but they can be accessed with higher bandwidth and less communication latency. Thus, integrating edge/fog and cloud infrastructures can support the execution of diverse lat… ▽ More Edge/Fog computing is a novel computing paradigm that provides resource-limited Internet of Things (IoT) devices with scalable computing and storage resources. Compared to cloud computing, edge/fog servers have fewer resources, but they can be accessed with higher bandwidth and less communication latency. Thus, integrating edge/fog and cloud infrastructures can support the execution of diverse latency-sensitive and computation-intensive IoT applications. Although some frameworks attempt to provide such integration, there are still several challenges to be addressed, such as dynamic scheduling of different IoT applications, scalability mechanisms, multi-platform support, and supporting different interaction models. FogBus2, as a new python-based framework, offers a lightweight and distributed container-based framework to overcome these challenges. In this chapter, we highlight key features of the FogBus2 framework alongside describing its main components. Besides, we provide a step-by-step guideline to set up an integrated computing environment, containing multiple cloud service providers (Hybrid-cloud) and edge devices, which is a prerequisite for any IoT application scenario. To obtain this, a low-overhead communication network among all computing resources is initiated by the provided scripts and configuration files. Next, we provide instructions and corresponding code snippets to install and run the main framework and its integrated applications. Finally, we demonstrate how to implement and integrate several new IoT applications and custom scheduling and scalability policies with the FogBus2 framework. △ Less

Submitted 1 August, 2021; originally announced August 2021.

Comments: Software Availability: The source code of the FogBus2 framework and newly implemented IoT applications and scheduling policies are accessible from the CLOUDS Laboratory GitHub webpage: https://github.com/Cloudslab/FogBus2

arXiv:2106.03034 [pdf, other]

Minibatch and Momentum Model-based Methods for Stochastic Weakly Convex Optimization

Authors: Qi Deng, Wenzhi Gao

Abstract: Stochastic model-based methods have received increasing attention lately due to their appealing robustness to the stepsize selection and provable efficiency guarantee. We make two important extensions for improving model-based methods on stochastic weakly convex optimization. First, we propose new minibatch model-based methods by involving a set of samples to approximate the model function in each… ▽ More Stochastic model-based methods have received increasing attention lately due to their appealing robustness to the stepsize selection and provable efficiency guarantee. We make two important extensions for improving model-based methods on stochastic weakly convex optimization. First, we propose new minibatch model-based methods by involving a set of samples to approximate the model function in each iteration. For the first time, we show that stochastic algorithms achieve linear speedup over the batch size even for non-smooth and non-convex (particularly, weakly convex) problems. To this end, we develop a novel sensitivity analysis of the proximal mapping involved in each algorithm iteration. Our analysis appears to be of independent interests in more general settings. Second, motivated by the success of momentum stochastic gradient descent, we propose a new stochastic extrapolated model-based method, greatly extending the classic Polyak momentum technique to a wider class of stochastic algorithms for weakly convex optimization. The rate of convergence to some natural stationarity condition is established over a fairly flexible range of extrapolation terms. While mainly focusing on weakly convex optimization, we also extend our work to convex optimization. We apply the minibatch and extrapolated model-based methods to stochastic convex optimization, for which we provide a new complexity bound and promising linear speedup in batch size. Moreover, an accelerated model-based method based on Nesterov's momentum is presented, for which we establish an optimal complexity bound for reaching optimality. △ Less

Submitted 12 November, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

Comments: 39 pages, 9 figures

arXiv:2104.05307 [pdf, other]

Personalized Bundle Recommendation in Online Games

Authors: Qilin Deng, Kai Wang, Minghao Zhao, Zhene Zou, Runze Wu, Jianrong Tao, Changjie Fan, Liang Chen

Abstract: In business domains, \textit{bundling} is one of the most important marketing strategies to conduct product promotions, which is commonly used in online e-commerce and offline retailers. Existing recommender systems mostly focus on recommending individual items that users may be interested in. In this paper, we target at a practical but less explored recommendation problem named bundle recommendat… ▽ More In business domains, \textit{bundling} is one of the most important marketing strategies to conduct product promotions, which is commonly used in online e-commerce and offline retailers. Existing recommender systems mostly focus on recommending individual items that users may be interested in. In this paper, we target at a practical but less explored recommendation problem named bundle recommendation, which aims to offer a combination of items to users. To tackle this specific recommendation problem in the context of the \emph{virtual mall} in online games, we formalize it as a link prediction problem on a user-item-bundle tripartite graph constructed from the historical interactions, and solve it with a neural network model that can learn directly on the graph-structure data. Extensive experiments on three public datasets and one industrial game dataset demonstrate the effectiveness of the proposed method. Further, the bundle recommendation model has been deployed in production for more than one year in a popular online game developed by Netease Games, and the launch of the model yields more than 60\% improvement on conversion rate of bundles, and a relative improvement of more than 15\% on gross merchandise volume (GMV). △ Less

Submitted 12 April, 2021; originally announced April 2021.

Comments: 8 pages, 10 figures, accepted paper on CIKM 2020

arXiv:2104.02981 [pdf, other]

Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation

Authors: Kai Wang, Zhene Zou, Qilin Deng, Runze Wu, Jianrong Tao, Changjie Fan, Liang Chen, Peng Cui

Abstract: In recent years, there are great interests as well as challenges in applying reinforcement learning (RL) to recommendation systems (RS). In this paper, we summarize three key practical challenges of large-scale RL-based recommender systems: massive state and action spaces, high-variance environment, and the unspecific reward setting in recommendation. All these problems remain largely unexplored i… ▽ More In recent years, there are great interests as well as challenges in applying reinforcement learning (RL) to recommendation systems (RS). In this paper, we summarize three key practical challenges of large-scale RL-based recommender systems: massive state and action spaces, high-variance environment, and the unspecific reward setting in recommendation. All these problems remain largely unexplored in the existing literature and make the application of RL challenging. We develop a model-based reinforcement learning framework, called GoalRec. Inspired by the ideas of world model (model-based), value function estimation (model-free), and goal-based RL, a novel disentangled universal value function designed for item recommendation is proposed. It can generalize to various goals that the recommender may have, and disentangle the stochastic environmental dynamics and high-variance reward signals accordingly. As a part of the value function, free from the sparse and high-variance reward signals, a high-capacity reward-independent world model is trained to simulate complex environmental dynamics under a certain goal. Based on the predicted environmental dynamics, the disentangled universal value function is related to the user's future trajectory instead of a monolithic state and a scalar reward. We demonstrate the superiority of GoalRec over previous approaches in terms of the above three practical challenges in a series of simulations and a real application. △ Less

Submitted 11 April, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: 9 pages, 4 figures, to be published in Proceedings of the AAAI Conference on Artificial Intelligence 2021

arXiv:2104.01975 [pdf, other]

Cascaded Robust Learning at Imperfect Labels for Chest X-ray Segmentation

Authors: Cheng Xue, Qiao Deng, Xiaomeng Li, Qi Dou, Pheng Ann Heng

Abstract: The superior performance of CNN on medical image analysis heavily depends on the annotation quality, such as the number of labeled image, the source of image, and the expert experience. The annotation requires great expertise and labour. To deal with the high inter-rater variability, the study of imperfect label has great significance in medical image segmentation tasks. In this paper, we present… ▽ More The superior performance of CNN on medical image analysis heavily depends on the annotation quality, such as the number of labeled image, the source of image, and the expert experience. The annotation requires great expertise and labour. To deal with the high inter-rater variability, the study of imperfect label has great significance in medical image segmentation tasks. In this paper, we present a novel cascaded robust learning framework for chest X-ray segmentation with imperfect annotation. Our model consists of three independent network, which can effectively learn useful information from the peer networks. The framework includes two stages. In the first stage, we select the clean annotated samples via a model committee setting, the networks are trained by minimizing a segmentation loss using the selected clean samples. In the second stage, we design a joint optimization framework with label correction to gradually correct the wrong annotation and improve the network performance. We conduct experiments on the public chest X-ray image datasets collected by Shenzhen Hospital. The results show that our methods could achieve a significant improvement on the accuracy in segmentation tasks compared to the previous methods. △ Less

Submitted 5 April, 2021; originally announced April 2021.

Comments: 9pages, 4 figures. MICCAI 2020

arXiv:2103.06999 [pdf, other]

doi 10.1109/TIP.2022.3149225

An Efficient Hypergraph Approach to Robust Point Cloud Resampling

Authors: Qinwen Deng, Songyang Zhang, Zhi Ding

Abstract: Efficient processing and feature extraction of largescale point clouds are important in related computer vision and cyber-physical systems. This work investigates point cloud resampling based on hypergraph signal processing (HGSP) to better explore the underlying relationship among different cloud points and to extract contour-enhanced features. Specifically, we design hypergraph spectral filters… ▽ More Efficient processing and feature extraction of largescale point clouds are important in related computer vision and cyber-physical systems. This work investigates point cloud resampling based on hypergraph signal processing (HGSP) to better explore the underlying relationship among different cloud points and to extract contour-enhanced features. Specifically, we design hypergraph spectral filters to capture multi-lateral interactions among the signal nodes of point clouds and to better preserve their surface outlines. Without the need and the computation to first construct the underlying hypergraph, our low complexity approach directly estimates hypergraph spectrum of point clouds by leveraging hypergraph stationary processes from the observed 3D coordinates. Evaluating the proposed resampling methods with several metrics, our test results validate the high efficacy of hypergraph characterization of point clouds and demonstrate the robustness of hypergraph-based resampling under noisy observations. △ Less

Submitted 11 March, 2021; originally announced March 2021.

arXiv:2103.00657 [pdf, other]

Achieving Competitive Play Through Bottom-Up Approach in Semantic Segmentation

Authors: E. Pryzant, Q. Deng, B. Mei, E. Shrestha

Abstract: With the renaissance of neural networks, object detection has slowly shifted from a bottom-up recognition problem to a top-down approach. Best in class algorithms enumerate a near-complete list of objects and classify each into object/not object. In this paper, we show that strong performance can still be achieved using a bottom-up approach for vision-based object recognition tasks and achieve com… ▽ More With the renaissance of neural networks, object detection has slowly shifted from a bottom-up recognition problem to a top-down approach. Best in class algorithms enumerate a near-complete list of objects and classify each into object/not object. In this paper, we show that strong performance can still be achieved using a bottom-up approach for vision-based object recognition tasks and achieve competitive video game play. We propose PuckNet, which is used to detect four extreme points (top, left, bottom, and right-most points) and one center point of objects using a fully convolutional neural network. Object detection is then a purely keypoint-based appearance estimation problem, without implicit feature learning or region classification. The method proposed herein performs on-par with the best in class region-based detection methods, with a bounding box AP of 36.4% on COCO test-dev. In addition, the extreme points estimated directly resolve into a rectangular object mask, with a COCO Mask AP of 17.6%, outperforming the Mask AP of vanilla bounding boxes. Guided segmentation of extreme points further improves this to 32.1% Mask AP. We applied the PuckNet vision system to the SuperTuxKart video game to test it's capacity to achieve competitive play in dynamic and co-operative multiplayer environments. △ Less

Submitted 28 February, 2021; originally announced March 2021.

arXiv:2102.04682 [pdf, other]

OTFS Signaling for Uplink NOMA of Heterogeneous Mobility Users

Authors: Yao Ge, Qinwen Deng, P. C. Ching, Zhi Ding

Abstract: We investigate a coded uplink non-orthogonal multiple access (NOMA) configuration in which groups of co-channel users are modulated in accordance with orthogonal time frequency space (OTFS). We take advantage of OTFS characteristics to achieve NOMA spectrum sharing in the delay-Doppler domain between stationary and mobile users. We develop an efficient iterative turbo receiver based on the princip… ▽ More We investigate a coded uplink non-orthogonal multiple access (NOMA) configuration in which groups of co-channel users are modulated in accordance with orthogonal time frequency space (OTFS). We take advantage of OTFS characteristics to achieve NOMA spectrum sharing in the delay-Doppler domain between stationary and mobile users. We develop an efficient iterative turbo receiver based on the principle of successive interference cancellation (SIC) to overcome the co-channel interference (CCI). We propose two turbo detector algorithms: orthogonal approximate message passing with linear minimum mean squared error (OAMP-LMMSE) and Gaussian approximate message passing with expectation propagation (GAMP-EP). The interactive OAMP-LMMSE detector and GAMP-EP detector are respectively assigned for the reception of the stationary and mobile users. We analyze the convergence performance of our proposed iterative SIC turbo receiver by utilizing a customized extrinsic information transfer (EXIT) chart and simplify the corresponding detector algorithms to further reduce receiver complexity. Our proposed iterative SIC turbo receiver demonstrates performance improvement over existing receivers and robustness against imperfect SIC process and channel state information uncertainty. △ Less

Submitted 9 February, 2021; originally announced February 2021.

Comments: 31 pages, 10 figures, accepted by IEEE Transactions on Communications

Showing 1–50 of 67 results for author: Deng, Q