subscribe to arXiv mailings

Quantized Prompt for Efficient Generalization of Vision-Language Models

Authors: Tianxiang Hao, Xiaohan Ding, Juexiao Feng, Yuhong Yang, Hui Chen, Guiguang Ding

Abstract: In the past few years, large-scale pre-trained vision-language models like CLIP have achieved tremendous success in various fields. Naturally, how to transfer the rich knowledge in such huge pre-trained models to downstream tasks and datasets becomes a hot topic. During downstream adaptation, the most challenging problems are overfitting and catastrophic forgetting, which can cause the model to ov… ▽ More In the past few years, large-scale pre-trained vision-language models like CLIP have achieved tremendous success in various fields. Naturally, how to transfer the rich knowledge in such huge pre-trained models to downstream tasks and datasets becomes a hot topic. During downstream adaptation, the most challenging problems are overfitting and catastrophic forgetting, which can cause the model to overly focus on the current data and lose more crucial domain-general knowledge. Existing works use classic regularization techniques to solve the problems. As solutions become increasingly complex, the ever-growing storage and inference costs are also a significant problem that urgently needs to be addressed. While in this paper, we start from an observation that proper random noise can suppress overfitting and catastrophic forgetting. Then we regard quantization error as a kind of noise, and explore quantization for regularizing vision-language model, which is quite efficiency and effective. Furthermore, to improve the model's generalization capability while maintaining its specialization capacity at minimal cost, we deeply analyze the characteristics of the weight distribution in prompts, conclude several principles for quantization module design and follow such principles to create several competitive baselines. The proposed method is significantly efficient due to its inherent lightweight nature, making it possible to adapt on extremely resource-limited devices. Our method can be fruitfully integrated into many existing approaches like MaPLe, enhancing accuracy while reducing storage overhead, making it more powerful yet versatile. Extensive experiments on 11 datasets shows great superiority of our method sufficiently. Code is available at https://github.com/beyondhtx/QPrompt. △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: 14 pages, 7 figures. Accepted by ECCV 2024

arXiv:2405.10941 [pdf, other]

Variational Quantum Algorithm Landscape Reconstruction by Low-Rank Tensor Completion

Authors: Tianyi Hao, Zichang He, Ruslan Shaydulin, Marco Pistoia, Swamit Tannu

Abstract: Variational quantum algorithms (VQAs) are a broad class of algorithms with many applications in science and industry. Applying a VQA to a problem involves optimizing a parameterized quantum circuit by maximizing or minimizing a cost function. A particular challenge associated with VQAs is understanding the properties of associated cost functions. Having the landscapes of VQA cost functions can gre… ▽ More Variational quantum algorithms (VQAs) are a broad class of algorithms with many applications in science and industry. Applying a VQA to a problem involves optimizing a parameterized quantum circuit by maximizing or minimizing a cost function. A particular challenge associated with VQAs is understanding the properties of associated cost functions. Having the landscapes of VQA cost functions can greatly assist in developing and testing new variational quantum algorithms, but they are extremely expensive to compute. Reconstructing the landscape of a VQA using existing techniques requires a large number of cost function evaluations, especially when the dimension or the resolution of the landscape is high. To address this challenge, we propose a low-rank tensor-completion-based approach for local landscape reconstruction. By leveraging compact low-rank representations of tensors, our technique can overcome the curse of dimensionality and handle high-resolution landscapes. We demonstrate the power of landscapes in VQA development by showcasing practical applications of analyzing penalty terms for constrained optimization problems and examining the probability landscapes of certain basis states. △ Less

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2404.17179 [pdf, other]

Meta-Object: Interactive and Multisensory Virtual Object Learned from the Real World for the Post-Metaverse

Authors: Dooyoung Kim, Taewook Ha, Jinseok Hong, Seonji Kim, Selin Choi, Heejeong Ko, Woontack Woo

Abstract: With the proliferation of wearable Augmented Reality/Virtual Reality (AR/VR) devices, ubiquitous virtual experiences seamlessly integrate into daily life through metaverse platforms. To support immersive metaverse experiences akin to reality, we propose a next-generation virtual object, a meta-object, a property-embedded virtual object that contains interactive and multisensory characteristics lea… ▽ More With the proliferation of wearable Augmented Reality/Virtual Reality (AR/VR) devices, ubiquitous virtual experiences seamlessly integrate into daily life through metaverse platforms. To support immersive metaverse experiences akin to reality, we propose a next-generation virtual object, a meta-object, a property-embedded virtual object that contains interactive and multisensory characteristics learned from the real world. Current virtual objects differ significantly from real-world objects due to restricted sensory feedback based on limited physical properties. To leverage meta-objects in the metaverse, three key components are needed: meta-object modeling and property embedding, interaction-adaptive multisensory feedback, and an intelligence simulation-based post-metaverse platform. Utilizing meta-objects that enable both on-site and remote users to interact as if they were engaging with real objects could contribute to the advent of the post-metaverse era through wearable AR/VR devices. △ Less

Submitted 28 April, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

Comments: 12 pages, 4 figures, under review in the IEEE CG&A magazine

arXiv:2403.09192 [pdf, other]

PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation

Authors: Yizhe Xiong, Hui Chen, Tianxiang Hao, Zijia Lin, Jungong Han, Yuesong Zhang, Guoxin Wang, Yongjun Bao, Guiguang Ding

Abstract: Recently, the scale of transformers has grown rapidly, which introduces considerable challenges in terms of training overhead and inference efficiency in the scope of task adaptation. Existing works, namely Parameter-Efficient Fine-Tuning (PEFT) and model compression, have separately investigated the challenges. However, PEFT cannot guarantee the inference efficiency of the original backbone, espe… ▽ More Recently, the scale of transformers has grown rapidly, which introduces considerable challenges in terms of training overhead and inference efficiency in the scope of task adaptation. Existing works, namely Parameter-Efficient Fine-Tuning (PEFT) and model compression, have separately investigated the challenges. However, PEFT cannot guarantee the inference efficiency of the original backbone, especially for large-scale models. Model compression requires significant training costs for structure searching and re-training. Consequently, a simple combination of them cannot guarantee accomplishing both training efficiency and inference efficiency with minimal costs. In this paper, we propose a novel Parallel Yielding Re-Activation (PYRA) method for such a challenge of training-inference efficient task adaptation. PYRA first utilizes parallel yielding adaptive weights to comprehensively perceive the data distribution in downstream tasks. A re-activation strategy for token modulation is then applied for tokens to be merged, leading to calibrated token features. Extensive experiments demonstrate that PYRA outperforms all competing methods under both low compression rate and high compression rate, demonstrating its effectiveness and superiority in maintaining both training efficiency and inference efficiency for large-scale foundation models. Our code is available at https://github.com/THU-MIG/PYRA. △ Less

Submitted 16 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: 14 pages, 4 figures, Accepted by ECCV 2024

arXiv:2403.03310 [pdf, other]

Graph Learning for Parameter Prediction of Quantum Approximate Optimization Algorithm

Authors: Zhiding Liang, Gang Liu, Zheyuan Liu, Jinglei Cheng, Tianyi Hao, Kecheng Liu, Hang Ren, Zhixin Song, Ji Liu, Fanny Ye, Yiyu Shi

Abstract: In recent years, quantum computing has emerged as a transformative force in the field of combinatorial optimization, offering novel approaches to tackling complex problems that have long challenged classical computational methods. Among these, the Quantum Approximate Optimization Algorithm (QAOA) stands out for its potential to efficiently solve the Max-Cut problem, a quintessential example of com… ▽ More In recent years, quantum computing has emerged as a transformative force in the field of combinatorial optimization, offering novel approaches to tackling complex problems that have long challenged classical computational methods. Among these, the Quantum Approximate Optimization Algorithm (QAOA) stands out for its potential to efficiently solve the Max-Cut problem, a quintessential example of combinatorial optimization. However, practical application faces challenges due to current limitations on quantum computational resource. Our work optimizes QAOA initialization, using Graph Neural Networks (GNN) as a warm-start technique. This sacrifices affordable computational resource on classical computer to reduce quantum computational resource overhead, enhancing QAOA's effectiveness. Experiments with various GNN architectures demonstrate the adaptability and stability of our framework, highlighting the synergy between quantum algorithms and machine learning. Our findings show GNN's potential in improving QAOA performance, opening new avenues for hybrid quantum-classical approaches in quantum computing and contributing to practical applications. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2312.10813 [pdf, other]

Re-parameterized Low-rank Prompt: Generalize a Vision-Language Model within 0.5K Parameters

Authors: Tianxiang Hao, Mengyao Lyu, Hui Chen, Sicheng Zhao, Jungong Han, Guiguang Ding

Abstract: With the development of large pre-trained vision-language models, how to effectively transfer the knowledge of such foundational models to downstream tasks becomes a hot topic, especially in a data-deficient scenario. Recently, prompt tuning has become a popular solution. When adapting the vision-language models, researchers freeze the parameters in the backbone and only design and tune the prompt… ▽ More With the development of large pre-trained vision-language models, how to effectively transfer the knowledge of such foundational models to downstream tasks becomes a hot topic, especially in a data-deficient scenario. Recently, prompt tuning has become a popular solution. When adapting the vision-language models, researchers freeze the parameters in the backbone and only design and tune the prompts. On the one hand, the delicate design of prompt tuning exhibits strong performance. On the other hand, complicated structures and update rules largely increase the computation and storage cost. Motivated by the observation that the evolution pattern of the generalization capability in visual-language models aligns harmoniously with the trend of rank variations in the prompt matrix during adaptation, we design a new type of prompt, Re-parameterized Low-rank Prompt (RLP), for both efficient and effective adaptation. Our method could largely reduce the number of tunable parameters and storage space, which is quite beneficial in resource-limited scenarios. Extensive experiments further demonstrate the superiority of RLP. In particular, RLP shows comparable or even stronger performance than the latest state-of-the-art methods with an extremely small number of parameters. On a series of tasks over 11 datasets, RLP significantly increases the average downstream accuracy of classic prompt tuning by up to 5.25% using merely 0.5K parameters. △ Less

Submitted 11 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

arXiv:2311.14762 [pdf, other]

The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024

Authors: Benjamin Kiefer, Lojze Žust, Matej Kristan, Janez Perš, Matija Teršek, Arnold Wiliem, Martin Messmer, Cheng-Yen Yang, Hsiang-Wei Huang, Zhongyu Jiang, Heng-Cheng Kuo, Jie Mei, Jenq-Neng Hwang, Daniel Stadler, Lars Sommer, Kaer Huang, Aiguo Zheng, Weitu Chong, Kanokphan Lertniphonphan, Jun Xie, Feng Chen, Jian Li, Zhepeng Wang, Luca Zedda, Andrea Loddo , et al. (24 additional authors not shown)

Abstract: The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenges categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obst… ▽ More The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenges categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection features three sub-challenges, including a new embedded challenge addressing efficicent inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi24. △ Less

Submitted 23 November, 2023; originally announced November 2023.

Comments: Part of 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 IEEE Xplore submission as part of WACV 2024

arXiv:2311.01743 [pdf, other]

doi 10.1109/LWC.2023.3327833

Energy Efficiency Optimization for Subterranean LoRaWAN Using A Reinforcement Learning Approach: A Direct-to-Satellite Scenario

Authors: Kaiqiang Lin, Muhammad Asad Ullah, Hirley Alves, Konstantin Mikhaylov, Tong Hao

Abstract: The integration of subterranean LoRaWAN and non-terrestrial networks (NTN) delivers substantial economic and societal benefits in remote agriculture and disaster rescue operations. The LoRa modulation leverages quasi-orthogonal spreading factors (SFs) to optimize data rates, airtime, coverage and energy consumption. However, it is still challenging to effectively assign SFs to end devices for mini… ▽ More The integration of subterranean LoRaWAN and non-terrestrial networks (NTN) delivers substantial economic and societal benefits in remote agriculture and disaster rescue operations. The LoRa modulation leverages quasi-orthogonal spreading factors (SFs) to optimize data rates, airtime, coverage and energy consumption. However, it is still challenging to effectively assign SFs to end devices for minimizing co-SF interference in massive subterranean LoRaWAN NTN. To address this, we investigate a reinforcement learning (RL)-based SFs allocation scheme to optimize the system's energy efficiency (EE). To efficiently capture the device-to-environment interactions in dense networks, we proposed an SFs allocation technique using the multi-agent dueling double deep Q-network (MAD3QN) and the multi-agent advantage actor-critic (MAA2C) algorithms based on an analytical reward mechanism. Our proposed RL-based SFs allocation approach evinces better performance compared to four benchmarks in the extreme underground direct-to-satellite scenario. Remarkably, MAD3QN shows promising potentials in surpassing MAA2C in terms of convergence rate and EE. △ Less

Submitted 3 November, 2023; originally announced November 2023.

Comments: 5 pages, 6 figures, paper accepted for publication in IEEE Wireless Communications Letters

arXiv:2310.15106 [pdf, other]

Theoretical Analysis of the Radio Map Estimation Problem

Authors: Daniel Romero, Tien Ngoc Ha, Raju Shrestha, Massimo Franceschetti

Abstract: Radio maps provide radio frequency metrics, such as the received signal strength, at every location of a geographic area. These maps, which are estimated using a set of measurements collected at multiple positions, find a wide range of applications in wireless communications, including the prediction of coverage holes, network planning, resource allocation, and path planning for mobile robots. Alt… ▽ More Radio maps provide radio frequency metrics, such as the received signal strength, at every location of a geographic area. These maps, which are estimated using a set of measurements collected at multiple positions, find a wide range of applications in wireless communications, including the prediction of coverage holes, network planning, resource allocation, and path planning for mobile robots. Although a vast number of estimators have been proposed, the theoretical understanding of the radio map estimation (RME) problem has not been addressed. The present work aims at filling this gap along two directions. First, the complexity of the set of radio map functions is quantified by means of lower and upper bounds on their spatial variability, which offers valuable insight into the required spatial distribution of measurements and the estimators that can be used. Second, the reconstruction error for power maps in free space is upper bounded for three conventional spatial interpolators. The proximity coefficient, which is a decreasing function of the distance from the transmitters to the mapped region, is proposed to quantify the complexity of the RME problem. Numerical experiments assess the tightness of the obtained bounds and the validity of the main takeaways in complex environments. △ Less

Submitted 23 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.11043 [pdf, other]

Spoofing Attack Detection in the Physical Layer with Robustness to User Movement

Authors: Daniel Romero, Tien Ngoc Ha, Peter Gerstoft

Abstract: In a spoofing attack, an attacker impersonates a legitimate user to access or modify data belonging to the latter. Typical approaches for spoofing detection in the physical layer declare an attack when a change is observed in certain channel features, such as the received signal strength (RSS) measured by spatially distributed receivers. However, since channels change over time, for example due to… ▽ More In a spoofing attack, an attacker impersonates a legitimate user to access or modify data belonging to the latter. Typical approaches for spoofing detection in the physical layer declare an attack when a change is observed in certain channel features, such as the received signal strength (RSS) measured by spatially distributed receivers. However, since channels change over time, for example due to user movement, such approaches are impractical. To sidestep this limitation, this paper proposes a scheme that combines the decisions of a position-change detector based on a deep neural network to distinguish spoofing from movement. Building upon community detection on graphs, the sequence of received frames is partitioned into subsequences to detect concurrent transmissions from distinct locations. The scheme can be easily deployed in practice since it just involves collecting a small dataset of measurements at a few tens of locations that need not even be computed or recorded. The scheme is evaluated on real data collected for this purpose. △ Less

Submitted 17 October, 2023; originally announced October 2023.

Comments: WCNC. arXiv admin note: text overlap with arXiv:2211.04269

arXiv:2310.11036 [pdf, other]

Radio Map Estimation: Empirical Validation and Analysis

Authors: Raju Shrestha, Tien Ngoc Ha, Pham Q. Viet, Daniel Romero

Abstract: Radio maps quantify magnitudes such as the received signal strength at every location of a geographical region. Although the estimation of radio maps has attracted widespread interest, the vast majority of works rely on simulated data and, therefore, cannot establish the effectiveness and relative performance of existing algorithms in practice. To fill this gap, this paper presents the first compr… ▽ More Radio maps quantify magnitudes such as the received signal strength at every location of a geographical region. Although the estimation of radio maps has attracted widespread interest, the vast majority of works rely on simulated data and, therefore, cannot establish the effectiveness and relative performance of existing algorithms in practice. To fill this gap, this paper presents the first comprehensive and rigorous study of radio map estimation (RME) in the real world. The main features of the RME problem are analyzed and the capabilities of existing estimators are compared using large measurement datasets collected in this work. By studying four performance metrics, recent theoretical findings are empirically corroborated and a large number of conclusions are drawn. Remarkably, the estimation error is seen to be reasonably small even with few measurements, which establishes the viability of RME in practice. Besides, from extensive comparisons, it is concluded that estimators based on deep neural networks necessitate large volumes of training data to exhibit a significant advantage over more traditional methods. Combining both types of schemes is seen to result in a novel estimator that features the best performance in most situations. The acquired datasets are made publicly available to enable further studies. △ Less

Submitted 22 January, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: 13 pages, Journal version, submitted to the IEEE Transactions on Wireless Communications

arXiv:2309.12668 [pdf, other]

UWA360CAM: A 360$^{\circ}$ 24/7 Real-Time Streaming Camera System for Underwater Applications

Authors: Quan-Dung Pham, Yipeng Zhu, Tan-Sang Ha, K. H. Long Nguyen, Binh-Son Hua, Sai-Kit Yeung

Abstract: Omnidirectional camera is a cost-effective and information-rich sensor highly suitable for many marine applications and the ocean scientific community, encompassing several domains such as augmented reality, mapping, motion estimation, visual surveillance, and simultaneous localization and mapping. However, designing and constructing such a high-quality 360$^{\circ}$ real-time streaming camera sys… ▽ More Omnidirectional camera is a cost-effective and information-rich sensor highly suitable for many marine applications and the ocean scientific community, encompassing several domains such as augmented reality, mapping, motion estimation, visual surveillance, and simultaneous localization and mapping. However, designing and constructing such a high-quality 360$^{\circ}$ real-time streaming camera system for underwater applications is a challenging problem due to the technical complexity in several aspects including sensor resolution, wide field of view, power supply, optical design, system calibration, and overheating management. This paper presents a novel and comprehensive system that addresses the complexities associated with the design, construction, and implementation of a fully functional 360$^{\circ}$ real-time streaming camera system specifically tailored for underwater environments. Our proposed system, UWA360CAM, can stream video in real time, operate in 24/7, and capture 360$^{\circ}$ underwater panorama images. Notably, our work is the pioneering effort in providing a detailed and replicable account of this system. The experiments provide a comprehensive analysis of our proposed system. △ Less

Submitted 30 September, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.12025 [pdf, other]

Robust Approximation Algorithms for Non-monotone $k$-Submodular Maximization under a Knapsack Constraint

Authors: Dung T. K. Ha, Canh V. Pham, Tan D. Tran, Huan X. Hoang

Abstract: The problem of non-monotone $k$-submodular maximization under a knapsack constraint ($\kSMK$) over the ground set size $n$ has been raised in many applications in machine learning, such as data summarization, information propagation, etc. However, existing algorithms for the problem are facing questioning of how to overcome the non-monotone case and how to fast return a good solution in case of th… ▽ More The problem of non-monotone $k$-submodular maximization under a knapsack constraint ($\kSMK$) over the ground set size $n$ has been raised in many applications in machine learning, such as data summarization, information propagation, etc. However, existing algorithms for the problem are facing questioning of how to overcome the non-monotone case and how to fast return a good solution in case of the big size of data. This paper introduces two deterministic approximation algorithms for the problem that competitively improve the query complexity of existing algorithms. Our first algorithm, $\LAA$, returns an approximation ratio of $1/19$ within $O(nk)$ query complexity. The second one, $\RLA$, improves the approximation ratio to $1/5-ε$ in $O(nk)$ queries, where $ε$ is an input parameter. Our algorithms are the first ones that provide constant approximation ratios within only $O(nk)$ query complexity for the non-monotone objective. They, therefore, need fewer the number of queries than state-of-the-the-art ones by a factor of $Ω(\log n)$. Besides the theoretical analysis, we have evaluated our proposed ones with several experiments in some instances: Influence Maximization and Sensor Placement for the problem. The results confirm that our algorithms ensure theoretical quality as the cutting-edge techniques and significantly reduce the number of queries. △ Less

Submitted 21 September, 2023; originally announced September 2023.

Comments: 12 pages

Report number: KSE-ID38

arXiv:2308.03213 [pdf, other]

doi 10.1145/3579371.3589044

Enabling High Performance Debugging for Variational Quantum Algorithms using Compressed Sensing

Authors: Kun Liu, Tianyi Hao, Swamit Tannu

Abstract: Variational quantum algorithms (VQAs) can potentially solve practical problems using contemporary Noisy Intermediate Scale Quantum (NISQ) computers. VQAs find near-optimal solutions in the presence of qubit errors by classically optimizing a loss function computed by parameterized quantum circuits. However, developing and testing VQAs is challenging due to the limited availability of quantum hardw… ▽ More Variational quantum algorithms (VQAs) can potentially solve practical problems using contemporary Noisy Intermediate Scale Quantum (NISQ) computers. VQAs find near-optimal solutions in the presence of qubit errors by classically optimizing a loss function computed by parameterized quantum circuits. However, developing and testing VQAs is challenging due to the limited availability of quantum hardware, their high error rates, and the significant overhead of classical simulations. Furthermore, VQA researchers must pick the right initialization for circuit parameters, utilize suitable classical optimizer configurations, and deploy appropriate error mitigation methods. Unfortunately, these tasks are done in an ad-hoc manner today, as there are no software tools to configure and tune the VQA hyperparameters. In this paper, we present OSCAR (cOmpressed Sensing based Cost lAndscape Reconstruction) to help configure: 1) correct initialization, 2) noise mitigation techniques, and 3) classical optimizers to maximize the quality of the solution on NISQ hardware. OSCAR enables efficient debugging and performance tuning by providing users with the loss function landscape without running thousands of quantum circuits as required by the grid search. Using OSCAR, we can accurately reconstruct the complete cost landscape with up to 100X speedup. Furthermore, OSCAR can compute an optimizer function query in an instant by interpolating a computed landscape, thus enabling the trial run of a VQA configuration with considerably reduced overhead. △ Less

Submitted 6 August, 2023; originally announced August 2023.

Comments: 13 pages, 13 figures. KL and TH contributed equally to this work

Journal ref: In Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA '23), June 17-21, 2023, Orlando, FL, USA. Association for Computing Machinery, New York, NY, USA, Article 9, 1-13

arXiv:2307.01676 [pdf, other]

RaidEnv: Exploring New Challenges in Automated Content Balancing for Boss Raid Games

Authors: Hyeon-Chang Jeon, In-Chang Baek, Cheong-mok Bae, Taehwa Park, Wonsang You, Taegwan Ha, Hoyun Jung, Jinha Noh, Seungwon Oh, Kyung-Joong Kim

Abstract: The balance of game content significantly impacts the gaming experience. Unbalanced game content diminishes engagement or increases frustration because of repetitive failure. Although game designers intend to adjust the difficulty of game content, this is a repetitive, labor-intensive, and challenging process, especially for commercial-level games with extensive content. To address this issue, the… ▽ More The balance of game content significantly impacts the gaming experience. Unbalanced game content diminishes engagement or increases frustration because of repetitive failure. Although game designers intend to adjust the difficulty of game content, this is a repetitive, labor-intensive, and challenging process, especially for commercial-level games with extensive content. To address this issue, the game research community has explored automated game balancing using artificial intelligence (AI) techniques. However, previous studies have focused on limited game content and did not consider the importance of the generalization ability of playtesting agents when encountering content changes. In this study, we propose RaidEnv, a new game simulator that includes diverse and customizable content for the boss raid scenario in MMORPG games. Additionally, we design two benchmarks for the boss raid scenario that can aid in the practical application of game AI. These benchmarks address two open problems in automatic content balancing, and we introduce two evaluation metrics to provide guidance for AI in automatic content balancing. This novel game research platform expands the frontiers of automatic game balancing problems and offers a framework within a realistic game production pipeline. △ Less

Submitted 4 July, 2023; originally announced July 2023.

Comments: 14 pages, 6 figures, 6 tables, 2 algorithms

arXiv:2306.17181 [pdf, other]

Unsupervised Text Embedding Space Generation Using Generative Adversarial Networks for Text Synthesis

Authors: Jun-Min Lee, Tae-Bin Ha

Abstract: Generative Adversarial Networks (GAN) is a model for data synthesis, which creates plausible data through the competition of generator and discriminator. Although GAN application to image synthesis is extensively studied, it has inherent limitations to natural language generation. Because natural language is composed of discrete tokens, a generator has difficulty updating its gradient through back… ▽ More Generative Adversarial Networks (GAN) is a model for data synthesis, which creates plausible data through the competition of generator and discriminator. Although GAN application to image synthesis is extensively studied, it has inherent limitations to natural language generation. Because natural language is composed of discrete tokens, a generator has difficulty updating its gradient through backpropagation; therefore, most text-GAN studies generate sentences starting with a random token based on a reward system. Thus, the generators of previous studies are pre-trained in an autoregressive way before adversarial training, causing data memorization that synthesized sentences reproduce the training data. In this paper, we synthesize sentences using a framework similar to the original GAN. More specifically, we propose Text Embedding Space Generative Adversarial Networks (TESGAN) which generate continuous text embedding spaces instead of discrete tokens to solve the gradient backpropagation problem. Furthermore, TESGAN conducts unsupervised learning which does not directly refer to the text of the training data to overcome the data memorization issue. By adopting this novel method, TESGAN can synthesize new sentences, showing the potential of unsupervised learning for text synthesis. We expect to see extended research combining Large Language Models with a new perspective of viewing text as an continuous space. △ Less

Submitted 17 October, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: NEJLT accpeted

arXiv:2306.04593 [pdf, other]

MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding

Authors: Tan-Sang Ha, Hai Nguyen-Truong, Tuan-Anh Vu, Sai-Kit Yeung

Abstract: Building a video retrieval system that is robust and reliable, especially for the marine environment, is a challenging task due to several factors such as dealing with massive amounts of dense and repetitive data, occlusion, blurriness, low lighting conditions, and abstract queries. To address these challenges, we present MarineVRS, a novel and flexible video retrieval system designed explicitly f… ▽ More Building a video retrieval system that is robust and reliable, especially for the marine environment, is a challenging task due to several factors such as dealing with massive amounts of dense and repetitive data, occlusion, blurriness, low lighting conditions, and abstract queries. To address these challenges, we present MarineVRS, a novel and flexible video retrieval system designed explicitly for the marine domain. MarineVRS integrates state-of-the-art methods for visual and linguistic object representation to enable efficient and accurate search and analysis of vast volumes of underwater video data. In addition, unlike the conventional video retrieval system, which only permits users to index a collection of images or videos and search using a free-form natural language sentence, our retrieval system includes an additional Explainability module that outputs the segmentation masks of the objects that the input query referred to. This feature allows users to identify and isolate specific objects in the video footage, leading to more detailed analysis and understanding of their behavior and movements. Finally, with its adaptability, explainability, accuracy, and scalability, MarineVRS is a powerful tool for marine researchers and scientists to efficiently and accurately process vast amounts of data and gain deeper insights into the behavior and movements of marine species. △ Less

Submitted 7 June, 2023; originally announced June 2023.

Comments: Accepted to OCEANS 2023 Limerick. Website: https://marinevrs.hkustvgd.com/

arXiv:2305.10292 [pdf, other]

Linear Query Approximation Algorithms for Non-monotone Submodular Maximization under Knapsack Constraint

Authors: Canh V. Pham, Tan D. Tran, Dung T. K. Ha, My T. Thai

Abstract: This work, for the first time, introduces two constant factor approximation algorithms with linear query complexity for non-monotone submodular maximization over a ground set of size $n$ subject to a knapsack constraint, $\mathsf{DLA}$ and $\mathsf{RLA}$. $\mathsf{DLA}$ is a deterministic algorithm that provides an approximation factor of $6+ε$ while $\mathsf{RLA}$ is a randomized algorithm with a… ▽ More This work, for the first time, introduces two constant factor approximation algorithms with linear query complexity for non-monotone submodular maximization over a ground set of size $n$ subject to a knapsack constraint, $\mathsf{DLA}$ and $\mathsf{RLA}$. $\mathsf{DLA}$ is a deterministic algorithm that provides an approximation factor of $6+ε$ while $\mathsf{RLA}$ is a randomized algorithm with an approximation factor of $4+ε$. Both run in $O(n \log(1/ε)/ε)$ query complexity. The key idea to obtain a constant approximation ratio with linear query lies in: (1) dividing the ground set into two appropriate subsets to find the near-optimal solution over these subsets with linear queries, and (2) combining a threshold greedy with properties of two disjoint sets or a random selection process to improve solution quality. In addition to the theoretical analysis, we have evaluated our proposed solutions with three applications: Revenue Maximization, Image Summarization, and Maximum Weighted Cut, showing that our algorithms not only return comparative results to state-of-the-art algorithms but also require significantly fewer queries. △ Less

Submitted 10 July, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.09296 [pdf, other]

On CSI-Free Multi-Antenna Schemes for Massive Wireless-Powered Underground Sensor Networks

Authors: Kaiqiang Lin, Onel Luis Alcaraz López, Hirley Alves, Tong Hao

Abstract: Radio-frequency wireless energy transfer (WET) is a promising technology to realize wireless-powered underground sensor networks (WPUSNs) and enable sustainable underground monitoring. However, due to the severe attenuation in harsh underground soil and the tight energy budget of the underground sensors, traditional WPUSNs relying on the channel state information (CSI) are highly inefficient, espe… ▽ More Radio-frequency wireless energy transfer (WET) is a promising technology to realize wireless-powered underground sensor networks (WPUSNs) and enable sustainable underground monitoring. However, due to the severe attenuation in harsh underground soil and the tight energy budget of the underground sensors, traditional WPUSNs relying on the channel state information (CSI) are highly inefficient, especially in massive WET scenarios. To address this challenge, we comparatively assess the feasibility of several state-of-the-art CSI-free multi-antenna WET schemes for WPUSNs, under a given power budget. Moreover, to overcome the extremely low WET efficiency in underground channels, we propose a distributed CSI-free system, where multiple power beacons (PBs) simultaneously charge a large set of underground sensors without any CSI. We consider the position-aware K-Means and the position-agnostic equally-far-from-center (EFFC) approaches for the optimal deployment of the PBs. Our results evince that the performance of the proposed distributed CSI-free system can approach or even surpass that of a traditional full-CSI WET strategy, especially when adopting an appropriate CSI-free scheme, applying the advisable PBs deployment approach, and equipping the PBs with an appropriate number of antennas. Finally, we discuss the impact of underground parameters, i.e., the burial depth of devices and the volumetric water content of soil, on the system's performance, and identify potential challenges and research opportunities for practical distributed CSI-free WPUSNs deployment. △ Less

Submitted 16 May, 2023; originally announced May 2023.

Comments: 13 pages, 10 figures, paper accepted for publication in IEEE Internet of Things Journal

arXiv:2305.00603 [pdf, other]

Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation

Authors: Tianxiang Hao, Hui Chen, Yuchen Guo, Guiguang Ding

Abstract: Recently, transformers have shown strong ability as visual feature extractors, surpassing traditional convolution-based models in various scenarios. However, the success of vision transformers largely owes to their capacity to accommodate numerous parameters. As a result, new challenges for adapting large models to downstream tasks arise. On the one hand, classic fine-tuning tunes all parameters i… ▽ More Recently, transformers have shown strong ability as visual feature extractors, surpassing traditional convolution-based models in various scenarios. However, the success of vision transformers largely owes to their capacity to accommodate numerous parameters. As a result, new challenges for adapting large models to downstream tasks arise. On the one hand, classic fine-tuning tunes all parameters in a huge model for every task and thus easily falls into overfitting, leading to inferior performance. On the other hand, on resource-limited devices, fine-tuning stores a full copy of parameters and thus is usually impracticable for the shortage of storage space. However, few works have focused on how to efficiently and effectively transfer knowledge in a vision transformer. Existing methods did not dive into the properties of visual features, leading to inferior performance. Moreover, some of them bring heavy inference cost though benefiting storage. To tackle these problems, we propose consolidator to modify the pre-trained model with the addition of a small set of tunable parameters to temporarily store the task-specific knowledge while freezing the backbone model. Motivated by the success of group-wise convolution, we adopt grouped connections across the features extracted by fully connected layers to construct tunable parts in a consolidator. To further enhance the model's capacity to transfer knowledge under a constrained storage budget and keep inference efficient, we consolidate the parameters in two stages: 1. between adaptation and storage, and 2. between loading and inference. On a series of downstream visual tasks, our consolidator can reach up to 7.56 better accuracy than full fine-tuning with merely 0.35% parameters, and outperform state-of-the-art parameter-efficient tuning methods by a clear margin. Code is available at https://github.com/beyondhtx/Consolidator. △ Less

Submitted 30 April, 2023; originally announced May 2023.

Comments: ICLR 2023

arXiv:2301.13513 [pdf, other]

Privacy Preserving Ultra-Short-term Wind Power Prediction Based on Secure Multi Party Computation

Authors: Hang Fan, Xiaoyu Fan, Tianyi Hao, Wei Wei, Kun Chen, Guosai Wang, Xiaofeng Jia, Yidong Li, Wei Xu

Abstract: Mining the spatial and temporal correlation of wind farm output data is beneficial for enhancing the precision of ultra-short-term wind power prediction. However, if the wind farms are owned by separate entities, they may be reluctant to share their data directly due to privacy concerns as well as business management regulation policies. Although cryptographic approaches have been designed to prot… ▽ More Mining the spatial and temporal correlation of wind farm output data is beneficial for enhancing the precision of ultra-short-term wind power prediction. However, if the wind farms are owned by separate entities, they may be reluctant to share their data directly due to privacy concerns as well as business management regulation policies. Although cryptographic approaches have been designed to protect privacy in the process of data sharing, it is still a challenging problem to encrypt the original data while extracting the nonlinear relationship among multiple wind farms in the machine learning process. This paper presents pwXGBoost, a technique based on the machine learning tree model and secure multi-party computation (SMPC) that can successfully extract complicated relationships while preserving data privacy. A maximum mean discrepancy (MMD) based scheme is proposed to effectively choose adjacent candidate wind farms to participate in the collaborative model training, therefore improving the accuracy and reducing the burden of data acquisition. The proposed method was evaluated on real world data collected from a cluster of wind farms in Inner Mongolia, China, demonstrating that it is capable of achieving considerable efficiency and performance improvements while preserving privacy △ Less

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2301.05514 [pdf, other]

doi 10.57717/cgt.v3i2.49

Primal-Dual Cops and Robber

Authors: Minh Tuan Ha, Paul Jungeblut, Torsten Ueckerdt, Paweł Żyliński

Abstract: Cops and Robber is a family of two-player games played on graphs in which one player controls a number of cops and the other player controls a robber. In alternating turns, each player moves (all) their figures. The cops try to capture the robber while the latter tries to flee indefinitely. In this paper we consider a variant of the game played on a planar graph where the robber moves between adja… ▽ More Cops and Robber is a family of two-player games played on graphs in which one player controls a number of cops and the other player controls a robber. In alternating turns, each player moves (all) their figures. The cops try to capture the robber while the latter tries to flee indefinitely. In this paper we consider a variant of the game played on a planar graph where the robber moves between adjacent vertices while the cops move between adjacent faces. The cops capture the robber if they occupy all incident faces. We prove that a constant number of cops suffices to capture the robber on any planar graph of maximum degree $Δ$ if and only if $Δ\leq 4$. △ Less

Submitted 10 January, 2024; v1 submitted 13 January, 2023; originally announced January 2023.

Comments: Equal to the published version

Journal ref: Computing in Geometry and Topology, 3(2), 4:1-4:12 (2024)

arXiv:2211.07016 [pdf, other]

doi 10.1109/qcs56647.2022.00017

Exploiting In-Constraint Energy in Constrained Variational Quantum Optimization

Authors: Tianyi Hao, Ruslan Shaydulin, Marco Pistoia, Jeffrey Larson

Abstract: A central challenge of applying near-term quantum optimization algorithms to industrially relevant problems is the need to incorporate complex constraints. In general, such constraints cannot be easily encoded in the circuit, and the quantum circuit measurement outcomes are not guaranteed to respect the constraints. Therefore, the optimization must trade off the in-constraint probability and the q… ▽ More A central challenge of applying near-term quantum optimization algorithms to industrially relevant problems is the need to incorporate complex constraints. In general, such constraints cannot be easily encoded in the circuit, and the quantum circuit measurement outcomes are not guaranteed to respect the constraints. Therefore, the optimization must trade off the in-constraint probability and the quality of the in-constraint solution by adding a penalty for constraint violation into the objective. We propose a new approach for solving constrained optimization problems with unconstrained, easy-to-implement quantum ansatze. Our method leverages the in-constraint energy as the objective and adds a lower-bound constraint on the in-constraint probability to the optimizer. We demonstrate significant gains in solution quality over directly optimizing the penalized energy. We implement our method in QVoice, a Python package that interfaces with Qiskit for quick prototyping in simulators and on quantum hardware. △ Less

Submitted 13 November, 2022; originally announced November 2022.

Journal ref: In Proceedings of the Third International Workshop on Quantum Computing Software (in conjunction with SC22), 2022

arXiv:2210.15136 [pdf, ps, other]

3D Shape Knowledge Graph for Cross-domain 3D Shape Retrieval

Authors: Rihao Chang, Yongtao Ma, Tong Hao, Weizhi Nie

Abstract: The surge in 3D modeling has led to a pronounced research emphasis on the field of 3D shape retrieval. Numerous contemporary approaches have been put forth to tackle this intricate challenge. Nevertheless, effectively addressing the intricacies of cross-modal 3D shape retrieval remains a formidable undertaking, owing to inherent modality-based disparities. This study presents an innovative notion,… ▽ More The surge in 3D modeling has led to a pronounced research emphasis on the field of 3D shape retrieval. Numerous contemporary approaches have been put forth to tackle this intricate challenge. Nevertheless, effectively addressing the intricacies of cross-modal 3D shape retrieval remains a formidable undertaking, owing to inherent modality-based disparities. This study presents an innovative notion, termed "geometric words", which functions as elemental constituents for representing entities through combinations. To establish the knowledge graph, we employ geometric words as nodes, connecting them via shape categories and geometry attributes. Subsequently, we devise a unique graph embedding method for knowledge acquisition. Finally, an effective similarity measure is introduced for retrieval purposes. Importantly, each 3D or 2D entity can anchor its geometric terms within the knowledge graph, thereby serving as a link between cross-domain data. As a result, our approach facilitates multiple cross-domain 3D shape retrieval tasks. We evaluate the proposed method's performance on the ModelNet40 and ShapeNetCore55 datasets, encompassing scenarios related to 3D shape retrieval and cross-domain retrieval. Furthermore, we employ the established cross-modal dataset (MI3DOR) to assess cross-modal 3D shape retrieval. The resulting experimental outcomes, in conjunction with comparisons against state-of-the-art techniques, clearly highlight the superiority of our approach. △ Less

Submitted 21 December, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

arXiv:2209.11518 [pdf, other]

Marine Video Kit: A New Marine Video Dataset for Content-based Analysis and Retrieval

Authors: Quang-Trung Truong, Tuan-Anh Vu, Tan-Sang Ha, Lokoc Jakub, Yue Him Wong Tim, Ajay Joneja, Sai-Kit Yeung

Abstract: Effective analysis of unusual domain specific video collections represents an important practical problem, where state-of-the-art general purpose models still face limitations. Hence, it is desirable to design benchmark datasets that challenge novel powerful models for specific domains with additional constraints. It is important to remember that domain specific data may be noisier (e.g., endoscop… ▽ More Effective analysis of unusual domain specific video collections represents an important practical problem, where state-of-the-art general purpose models still face limitations. Hence, it is desirable to design benchmark datasets that challenge novel powerful models for specific domains with additional constraints. It is important to remember that domain specific data may be noisier (e.g., endoscopic or underwater videos) and often require more experienced users for effective search. In this paper, we focus on single-shot videos taken from moving cameras in underwater environments, which constitute a nontrivial challenge for research purposes. The first shard of a new Marine Video Kit dataset is presented to serve for video retrieval and other computer vision challenges. Our dataset is used in a special session during Video Browser Showdown 2023. In addition to basic meta-data statistics, we present several insights based on low-level features as well as semantic annotations of selected keyframes. The analysis also contains experiments showing limitations of respected general purpose models for retrieval. Our dataset and code are publicly available at https://hkust-vgd.github.io/marinevideokit. △ Less

Submitted 6 December, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: Camera Ready for MMM 2023, Bergen, Norway

arXiv:2208.04011 [pdf, other]

doi 10.1016/j.image.2021.116601

Information Extraction from Scanned Invoice Images using Text Analysis and Layout Features

Authors: Hien Thi Ha, Aleš Horák

Abstract: While storing invoice content as metadata to avoid paper document processing may be the future trend, almost all of daily issued invoices are still printed on paper or generated in digital formats such as PDFs. In this paper, we introduce the OCRMiner system for information extraction from scanned document images which is based on text analysis techniques in combination with layout features to ext… ▽ More While storing invoice content as metadata to avoid paper document processing may be the future trend, almost all of daily issued invoices are still printed on paper or generated in digital formats such as PDFs. In this paper, we introduce the OCRMiner system for information extraction from scanned document images which is based on text analysis techniques in combination with layout features to extract indexing metadata of (semi-)structured documents. The system is designed to process the document in a similar way a human reader uses, i.e. to employ different layout and text attributes in a coordinated decision. The system consists of a set of interconnected modules that start with (possibly erroneous) character-based output from a standard OCR system and allow to apply different techniques and to expand the extracted knowledge at each step. Using an open source OCR, the system is able to recover the invoice data in 90% for English and in 88% for the Czech set. △ Less

Submitted 8 August, 2022; originally announced August 2022.

Comments: This is an author preprint of the article published by Elsevier in Signal Processing: Image Communication at https://doi.org/10.1016/j.image.2021.116601

Journal ref: Signal Processing: Image Communication 102 (2022)

arXiv:2205.05918 [pdf, other]

Fall detection using multimodal data

Authors: Thao V. Ha, Hoang Nguyen, Son T. Huynh, Trung T. Nguyen, Binh T. Nguyen

Abstract: In recent years, the occurrence of falls has increased and has had detrimental effects on older adults. Therefore, various machine learning approaches and datasets have been introduced to construct an efficient fall detection algorithm for the social community. This paper studies the fall detection problem based on a large public dataset, namely the UP-Fall Detection Dataset. This dataset was coll… ▽ More In recent years, the occurrence of falls has increased and has had detrimental effects on older adults. Therefore, various machine learning approaches and datasets have been introduced to construct an efficient fall detection algorithm for the social community. This paper studies the fall detection problem based on a large public dataset, namely the UP-Fall Detection Dataset. This dataset was collected from a dozen of volunteers using different sensors and two cameras. We propose several techniques to obtain valuable features from these sensors and cameras and then construct suitable models for the main problem. The experimental results show that our proposed methods can bypass the state-of-the-art methods on this dataset in terms of accuracy, precision, recall, and F1 score. △ Less

Submitted 12 May, 2022; originally announced May 2022.

Comments: 12 pages, 5 figures, 6 tables

arXiv:2203.15246 [pdf, other]

doi 10.3389/fphy.2022.906590

A quantum-inspired tensor network method for constrained combinatorial optimization problems

Authors: Tianyi Hao, Xuxin Huang, Chunjing Jia, Cheng Peng

Abstract: Combinatorial optimization is of general interest for both theoretical study and real-world applications. Fast-developing quantum algorithms provide a different perspective on solving combinatorial optimization problems. In this paper, we propose a quantum-inspired tensor-network-based algorithm for general locally constrained combinatorial optimization problems. Our algorithm constructs a Hamilto… ▽ More Combinatorial optimization is of general interest for both theoretical study and real-world applications. Fast-developing quantum algorithms provide a different perspective on solving combinatorial optimization problems. In this paper, we propose a quantum-inspired tensor-network-based algorithm for general locally constrained combinatorial optimization problems. Our algorithm constructs a Hamiltonian for the problem of interest, effectively mapping it to a quantum problem, then encodes the constraints directly into a tensor network state and solves the optimal solution by evolving the system to the ground state of the Hamiltonian. We demonstrate our algorithm with the open-pit mining problem, which results in a quadratic asymptotic time complexity. Our numerical results show the effectiveness of this construction and potential applications in further studies for general combinatorial optimization problems. △ Less

Submitted 5 September, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Journal ref: Frontiers in Physics, Volume 10, Article 906590 (2022)

arXiv:2111.11707 [pdf, other]

Deps-SAN: Neural Machine Translation with Dependency-Scaled Self-Attention Network

Authors: Ru Peng, Nankai Lin, Yi Fang, Shengyi Jiang, Tianyong Hao, Boyu Chen, Junbo Zhao

Abstract: Syntax knowledge contributes its powerful strength in Neural machine translation (NMT) tasks. Early NMT works supposed that syntax details can be automatically learned from numerous texts via attention networks. However, succeeding researches pointed out that limited by the uncontrolled nature of attention computation, the NMT model requires an external syntax to capture the deep syntactic awarene… ▽ More Syntax knowledge contributes its powerful strength in Neural machine translation (NMT) tasks. Early NMT works supposed that syntax details can be automatically learned from numerous texts via attention networks. However, succeeding researches pointed out that limited by the uncontrolled nature of attention computation, the NMT model requires an external syntax to capture the deep syntactic awareness. Although existing syntax-aware NMT methods have born great fruits in combining syntax, the additional workloads they introduced render the model heavy and slow. Particularly, these efforts scarcely involve the Transformer-based NMT and modify its core self-attention network (SAN). To this end, we propose a parameter-free, Dependency-scaled Self-Attention Network (Deps-SAN) for syntax-aware Transformer-based NMT. A quantified matrix of dependency closeness between tokens is constructed to impose explicit syntactic constraints into the SAN for learning syntactic details and dispelling the dispersion of attention distributions. Two knowledge sparsing techniques are further integrated to avoid the model overfitting the dependency noises introduced by the external parser. Experiments and analyses on IWSLT14 German-to-English and WMT16 German-to-English benchmark NMT tasks verify the effectiveness of our approach. △ Less

Submitted 4 October, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

arXiv:2110.07511 [pdf, other]

doi 10.1109/TIP.2022.3216772

Contrastive Proposal Extension with LSTM Network for Weakly Supervised Object Detection

Authors: Pei Lv, Suqi Hu, Tianran Hao

Abstract: Weakly supervised object detection (WSOD) has attracted more and more attention since it only uses image-level labels and can save huge annotation costs. Most of the WSOD methods use Multiple Instance Learning (MIL) as their basic framework, which regard it as an instance classification problem. However, these methods based on MIL tends to converge only on the most discriminate regions of differen… ▽ More Weakly supervised object detection (WSOD) has attracted more and more attention since it only uses image-level labels and can save huge annotation costs. Most of the WSOD methods use Multiple Instance Learning (MIL) as their basic framework, which regard it as an instance classification problem. However, these methods based on MIL tends to converge only on the most discriminate regions of different instances, rather than their corresponding complete regions, that is, insufficient integrity. Inspired by the habit of observing things by the human, we propose a new method by comparing the initial proposals and the extension ones to optimize those initial proposals. Specifically, we propose one new strategy for WSOD by involving contrastive proposal extension (CPE), which consists of multiple directional contrastive proposal extensions (D-CPE), and each D-CPE contains encoders based on LSTM network and corresponding decoders. Firstly, the boundary of initial proposals in MIL is extended to different positions according to well-designed sequential order. Then, CPE compares the extended proposal and the initial proposal by extracting the feature semantics of them using the encoders, and calculates the integrity of the initial proposal to optimize the score of the initial proposal. These contrastive contextual semantics will guide the basic WSOD to suppress bad proposals and improve the scores of good ones. In addition, a simple two-stream network is designed as the decoder to constrain the temporal coding of LSTM and improve the performance of WSOD further. Experiments on PASCAL VOC 2007, VOC 2012 and MS-COCO datasets show that our method has achieved the state-of-the-art results. △ Less

Submitted 19 October, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: 15 pages,12 figures, accepted to IEEE Transactions on Image Processing

arXiv:2109.08863

Streaming algorithms for Budgeted $k$-Submodular Maximization problem

Authors: Canh V. Pham, Quang C. Vu, Dung K. T. Ha, Tai T. Nguyen

Abstract: Stimulated by practical applications arising from viral marketing. This paper investigates a novel Budgeted $k$-Submodular Maximization problem defined as follows: Given a finite set $V$, a budget $B$ and a $k$-submodular function $f: (k+1)^V \mapsto \mathbb{R}_+$, the problem asks to find a solution $\s=(S_1, S_2, \ldots, S_k)$, each element $e \in V$ has a cost $c_i(e)$ to be put into $i$-th set… ▽ More Stimulated by practical applications arising from viral marketing. This paper investigates a novel Budgeted $k$-Submodular Maximization problem defined as follows: Given a finite set $V$, a budget $B$ and a $k$-submodular function $f: (k+1)^V \mapsto \mathbb{R}_+$, the problem asks to find a solution $\s=(S_1, S_2, \ldots, S_k)$, each element $e \in V$ has a cost $c_i(e)$ to be put into $i$-th set $S_i$, with the total cost of $s$ does not exceed $B$ so that $f(\s)$ is maximized. To address this problem, we propose two streaming algorithms that provide approximation guarantees for the problem. In particular, in the case of each element $e$ has the same cost for all $i$-th sets, we propose a deterministic streaming algorithm which provides an approximation ratio of $\frac{1}{4}-ε$ when $f$ is monotone and $\frac{1}{5}-ε$ when $f$ is non-monotone. For the general case, we propose a random streaming algorithm that provides an approximation ratio of $\min\{\fracα{2}, \frac{(1-α)k}{(1+β)k-β} \}-ε$ when $f$ is monotone and $\min\{\fracα{2}, \frac{(1-α)k}{(1+2β)k-2β} \}-ε$ when $f$ is non-monotone in expectation, where $β=\max_{e\in V, i , j \in [k], i\neq j} \frac{c_i(e)}{c_j(e)}$ and $ε, α$ are fixed inputs. △ Less

Submitted 22 October, 2021; v1 submitted 18 September, 2021; originally announced September 2021.

Comments: There are some results of the article that need to be corrected

arXiv:2109.04004 [pdf, ps, other]

OpenClinicalAI: enabling AI to diagnose diseases in real-world clinical settings

Authors: Yunyou Huang, Nana Wang, Suqin Tang, Li Ma, Tianshu Hao, Zihan Jiang, Fan Zhang, Guoxin Kang, Xiuxia Miao, Xianglong Guan, Ruchang Zhang, Zhifei Zhang, Jianfeng Zhan

Abstract: This paper quantitatively reveals the state-of-the-art and state-of-the-practice AI systems only achieve acceptable performance on the stringent conditions that all categories of subjects are known, which we call closed clinical settings, but fail to work in real-world clinical settings. Compared to the diagnosis task in the closed setting, real-world clinical settings pose severe challenges, and… ▽ More This paper quantitatively reveals the state-of-the-art and state-of-the-practice AI systems only achieve acceptable performance on the stringent conditions that all categories of subjects are known, which we call closed clinical settings, but fail to work in real-world clinical settings. Compared to the diagnosis task in the closed setting, real-world clinical settings pose severe challenges, and we must treat them differently. We build a clinical AI benchmark named Clinical AIBench to set up real-world clinical settings to facilitate researches. We propose an open, dynamic machine learning framework and develop an AI system named OpenClinicalAI to diagnose diseases in real-world clinical settings. The first versions of Clinical AIBench and OpenClinicalAI target Alzheimer's disease. In the real-world clinical setting, OpenClinicalAI significantly outperforms the state-of-the-art AI system. In addition, OpenClinicalAI develops personalized diagnosis strategies to avoid unnecessary testing and seamlessly collaborates with clinicians. It is promising to be embedded in the current medical systems to improve medical services. △ Less

Submitted 8 September, 2021; originally announced September 2021.

arXiv:2107.14444 [pdf, other]

Manipulating Identical Filter Redundancy for Efficient Pruning on Deep and Complicated CNN

Authors: Xiaohan Ding, Tianxiang Hao, Jungong Han, Yuchen Guo, Guiguang Ding

Abstract: The existence of redundancy in Convolutional Neural Networks (CNNs) enables us to remove some filters/channels with acceptable performance drops. However, the training objective of CNNs usually tends to minimize an accuracy-related loss function without any attention paid to the redundancy, making the redundancy distribute randomly on all the filters, such that removing any of them may trigger inf… ▽ More The existence of redundancy in Convolutional Neural Networks (CNNs) enables us to remove some filters/channels with acceptable performance drops. However, the training objective of CNNs usually tends to minimize an accuracy-related loss function without any attention paid to the redundancy, making the redundancy distribute randomly on all the filters, such that removing any of them may trigger information loss and accuracy drop, necessitating a following finetuning step for recovery. In this paper, we propose to manipulate the redundancy during training to facilitate network pruning. To this end, we propose a novel Centripetal SGD (C-SGD) to make some filters identical, resulting in ideal redundancy patterns, as such filters become purely redundant due to their duplicates; hence removing them does not harm the network. As shown on CIFAR and ImageNet, C-SGD delivers better performance because the redundancy is better organized, compared to the existing methods. The efficiency also characterizes C-SGD because it is as fast as regular SGD, requires no finetuning, and can be conducted simultaneously on all the layers even in very deep CNNs. Besides, C-SGD can improve the accuracy of CNNs by first training a model with the same architecture but wider layers then squeezing it into the original width. △ Less

Submitted 30 July, 2021; originally announced July 2021.

Comments: Extension of the CVPR-2019 paper (https://openaccess.thecvf.com/content_CVPR_2019/papers/Ding_Centripetal_SGD_for_Pruning_Very_Deep_Convolutional_Networks_With_Complicated_CVPR_2019_paper.pdf). arXiv admin note: substantial text overlap with arXiv:1904.03837

arXiv:2104.04650 [pdf, other]

Towards Automated and Marker-less Parkinson Disease Assessment: Predicting UPDRS Scores using Sit-stand videos

Authors: Deval Mehta, Umar Asif, Tian Hao, Erhan Bilal, Stefan Von Cavallar, Stefan Harrer, Jeffrey Rogers

Abstract: This paper presents a novel deep learning enabled, video based analysis framework for assessing the Unified Parkinsons Disease Rating Scale (UPDRS) that can be used in the clinic or at home. We report results from comparing the performance of the framework to that of trained clinicians on a population of 32 Parkinsons disease (PD) patients. In-person clinical assessments by trained neurologists ar… ▽ More This paper presents a novel deep learning enabled, video based analysis framework for assessing the Unified Parkinsons Disease Rating Scale (UPDRS) that can be used in the clinic or at home. We report results from comparing the performance of the framework to that of trained clinicians on a population of 32 Parkinsons disease (PD) patients. In-person clinical assessments by trained neurologists are used as the ground truth for training our framework and for comparing the performance. We find that the standard sit-to-stand activity can be used to evaluate the UPDRS sub-scores of bradykinesia (BRADY) and posture instability and gait disorders (PIGD). For BRADY we find F1-scores of 0.75 using our framework compared to 0.50 for the video based rater clinicians, while for PIGD we find 0.78 for the framework and 0.45 for the video based rater clinicians. We believe our proposed framework has potential to provide clinically acceptable end points of PD in greater granularity without imposing burdens on patients and clinicians, which empowers a variety of use cases such as passive tracking of PD progression in spaces such as nursing homes, in-home self-assessment, and enhanced tele-medicine. △ Less

Submitted 9 April, 2021; originally announced April 2021.

Comments: Accepted by CVPR Workshops 2021

arXiv:2103.13080 [pdf, other]

Shift-and-Balance Attention

Authors: Chunjie Luo, Jianfeng Zhan, Tianshu Hao, Lei Wang, Wanling Gao

Abstract: Attention is an effective mechanism to improve the deep model capability. Squeeze-and-Excite (SE) introduces a light-weight attention branch to enhance the network's representational power. The attention branch is gated using the Sigmoid function and multiplied by the feature map's trunk branch. It is too sensitive to coordinate and balance the trunk and attention branches' contributions. To contr… ▽ More Attention is an effective mechanism to improve the deep model capability. Squeeze-and-Excite (SE) introduces a light-weight attention branch to enhance the network's representational power. The attention branch is gated using the Sigmoid function and multiplied by the feature map's trunk branch. It is too sensitive to coordinate and balance the trunk and attention branches' contributions. To control the attention branch's influence, we propose a new attention method, called Shift-and-Balance (SB). Different from Squeeze-and-Excite, the attention branch is regulated by the learned control factor to control the balance, then added into the feature map's trunk branch. Experiments show that Shift-and-Balance attention significantly improves the accuracy compared to Squeeze-and-Excite when applied in more layers, increasing more size and capacity of a network. Moreover, Shift-and-Balance attention achieves better or close accuracy compared to the state-of-art Dynamic Convolution. △ Less

Submitted 24 March, 2021; originally announced March 2021.

arXiv:2012.08743 [pdf, ps, other]

Improving Multilingual Neural Machine Translation For Low-Resource Languages: French,English - Vietnamese

Authors: Thi-Vinh Ngo, Phuong-Thai Nguyen, Thanh-Le Ha, Khac-Quy Dinh, Le-Minh Nguyen

Abstract: Prior works have demonstrated that a low-resource language pair can benefit from multilingual machine translation (MT) systems, which rely on many language pairs' joint training. This paper proposes two simple strategies to address the rare word issue in multilingual MT systems for two low-resource language pairs: French-Vietnamese and English-Vietnamese. The first strategy is about dynamical lear… ▽ More Prior works have demonstrated that a low-resource language pair can benefit from multilingual machine translation (MT) systems, which rely on many language pairs' joint training. This paper proposes two simple strategies to address the rare word issue in multilingual MT systems for two low-resource language pairs: French-Vietnamese and English-Vietnamese. The first strategy is about dynamical learning word similarity of tokens in the shared space among source languages while another one attempts to augment the translation ability of rare words through updating their embeddings during the training. Besides, we leverage monolingual data for multilingual MT systems to increase the amount of synthetic parallel corpora while dealing with the data sparsity problem. We have shown significant improvements of up to +1.62 and +2.54 BLEU points over the bilingual baseline systems for both language pairs and released our datasets for the research community. △ Less

Submitted 10 July, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

Comments: The 3rd Workshop on Technologies for MT of Low Resource Languages (LoResMT 2020)

arXiv:2007.03260 [pdf, other]

ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting

Authors: Xiaohan Ding, Tianxiang Hao, Jianchao Tan, Ji Liu, Jungong Han, Yuchen Guo, Guiguang Ding

Abstract: We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain th… ▽ More We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain the performance and the latter learn to prune. Via training with regular SGD on the former but a novel update rule with penalty gradients on the latter, we realize structured sparsity. Then we equivalently merge the remembering and forgetting parts into the original architecture with narrower layers. In this sense, ResRep can be viewed as a successful application of Structural Re-parameterization. Such a methodology distinguishes ResRep from the traditional learning-based pruning paradigm that applies a penalty on parameters to produce sparsity, which may suppress the parameters essential for the remembering. ResRep slims down a standard ResNet-50 with 76.15% accuracy on ImageNet to a narrower one with only 45% FLOPs and no accuracy drop, which is the first to achieve lossless pruning with such a high compression ratio. The code and models are at https://github.com/DingXiaoH/ResRep. △ Less

Submitted 14 August, 2021; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: ICCV 2021

arXiv:2006.15234 [pdf, other]

Efficient 2D Tensor Network Simulation of Quantum Systems

Authors: Yuchen Pang, Tianyi Hao, Annika Dugad, Yiqing Zhou, Edgar Solomonik

Abstract: Simulation of quantum systems is challenging due to the exponential size of the state space. Tensor networks provide a systematically improvable approximation for quantum states. 2D tensor networks such as Projected Entangled Pair States (PEPS) are well-suited for key classes of physical systems and quantum circuits. However, direct contraction of PEPS networks has exponential cost, while approxim… ▽ More Simulation of quantum systems is challenging due to the exponential size of the state space. Tensor networks provide a systematically improvable approximation for quantum states. 2D tensor networks such as Projected Entangled Pair States (PEPS) are well-suited for key classes of physical systems and quantum circuits. However, direct contraction of PEPS networks has exponential cost, while approximate algorithms require computations with large tensors. We propose new scalable algorithms and software abstractions for PEPS-based methods, accelerating the bottleneck operation of contraction and refactorization of a tensor subnetwork. We employ randomized SVD with an implicit matrix to reduce cost and memory footprint asymptotically. Further, we develop a distributed-memory PEPS library and study accuracy and efficiency of alternative algorithms for PEPS contraction and evolution on the Stampede2 supercomputer. We also simulate a popular near-term quantum algorithm, the Variational Quantum Eigensolver (VQE), and benchmark Imaginary Time Evolution (ITE), which compute ground states of Hamiltonians. △ Less

Submitted 3 September, 2020; v1 submitted 26 June, 2020; originally announced June 2020.

Comments: to be published in SC 2020

arXiv:2005.09940 [pdf, other]

Relative Positional Encoding for Speech Recognition and Direct Translation

Authors: Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stueker, Jan Niehues, Alexander Waibel

Abstract: Transformer models are powerful sequence-to-sequence architectures that are capable of directly mapping speech inputs to transcriptions or translations. However, the mechanism for modeling positions in this model was tailored for text modeling, and thus is less ideal for acoustic inputs. In this work, we adapt the relative position encoding scheme to the Speech Transformer, where the key addition… ▽ More Transformer models are powerful sequence-to-sequence architectures that are capable of directly mapping speech inputs to transcriptions or translations. However, the mechanism for modeling positions in this model was tailored for text modeling, and thus is less ideal for acoustic inputs. In this work, we adapt the relative position encoding scheme to the Speech Transformer, where the key addition is relative distance between input states in the self-attention network. As a result, the network can better adapt to the variable distributions present in speech data. Our experiments show that our resulting model achieves the best recognition result on the Switchboard benchmark in the non-augmentation condition, and the best published result in the MuST-C speech translation benchmark. We also show that this model is able to better utilize synthetic data than the Transformer, and adapts better to variable sentence segmentation quality for speech translation. △ Less

Submitted 20 May, 2020; originally announced May 2020.

Comments: Submitted to Interspeech 2020

arXiv:2004.14690 [pdf, other]

AIBench Training: Balanced Industry-Standard AI Training Benchmarking

Authors: Fei Tang, Wanling Gao, Jianfeng Zhan, Chuanxin Lan, Xu Wen, Lei Wang, Chunjie Luo, Jiahui Dai, Zheng Cao, Xingwang Xiong, Zihan Jiang, Tianshu Hao, Fanda Fan, Fan Zhang, Yunyou Huang, Jianan Chen, Mengjia Du, Rui Ren, Chen Zheng, Daoyi Zheng, Haoning Tang, Kunlin Zhan, Biao Wang, Defei Kong, Minghe Yu , et al. (8 additional authors not shown)

Abstract: Earlier-stage evaluations of a new AI architecture/system need affordable benchmarks. Only using a few AI component benchmarks like MLPerfalone in the other stages may lead to misleading conclusions. Moreover, the learning dynamics are not well understood, and the benchmarks' shelf-life is short. This paper proposes a balanced benchmarking methodology. We use real-world benchmarks to cover the fac… ▽ More Earlier-stage evaluations of a new AI architecture/system need affordable benchmarks. Only using a few AI component benchmarks like MLPerfalone in the other stages may lead to misleading conclusions. Moreover, the learning dynamics are not well understood, and the benchmarks' shelf-life is short. This paper proposes a balanced benchmarking methodology. We use real-world benchmarks to cover the factors space that impacts the learning dynamics to the most considerable extent. After performing an exhaustive survey on Internet service AI domains, we identify and implement nineteen representative AI tasks with state-of-the-art models. For repeatable performance ranking (RPR subset) and workload characterization (WC subset), we keep two subsets to a minimum for affordability. We contribute by far the most comprehensive AI training benchmark suite. The evaluations show: (1) AIBench Training (v1.1) outperforms MLPerfTraining (v0.7) in terms of diversity and representativeness of model complexity, computational cost, convergent rate, computation, and memory access patterns, and hotspot functions; (2) Against the AIBench full benchmarks, its RPR subset shortens the benchmarking cost by 64%, while maintaining the primary workload characteristics; (3) The performance ranking shows the single-purpose AI accelerator like TPU with the optimized TensorFlowframework performs better than that of GPUs while losing the latter's general support for various AI models. The specification, source code, and performance numbers are available from the AIBench homepage https://www.benchcouncil.org/aibench-training/index.html. △ Less

Submitted 10 March, 2021; v1 submitted 30 April, 2020; originally announced April 2020.

Comments: ISPASS 2021

arXiv:2003.09891 [pdf, other]

Low Latency ASR for Simultaneous Speech Translation

Authors: Thai Son Nguyen, Jan Niehues, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Muller, Matthias Sperber, Sebastian Stueker, Alex Waibel

Abstract: User studies have shown that reducing the latency of our simultaneous lecture translation system should be the most important goal. We therefore have worked on several techniques for reducing the latency for both components, the automatic speech recognition and the speech translation module. Since the commonly used commitment latency is not appropriate in our case of continuous stream decoding, we… ▽ More User studies have shown that reducing the latency of our simultaneous lecture translation system should be the most important goal. We therefore have worked on several techniques for reducing the latency for both components, the automatic speech recognition and the speech translation module. Since the commonly used commitment latency is not appropriate in our case of continuous stream decoding, we focused on word latency. We used it to analyze the performance of our current system and to identify opportunities for improvements. In order to minimize the latency we combined run-on decoding with a technique for identifying stable partial hypotheses when stream decoding and a protocol for dynamic output update that allows to revise the most recent parts of the transcription. This combination reduces the latency at word level, where the words are final and will never be updated again in the future, from 18.1s to 1.1s without sacrificing performance in terms of word error rate. △ Less

Submitted 22 March, 2020; originally announced March 2020.

arXiv:2002.07162 [pdf, other]

AIBench: An Agile Domain-specific Benchmarking Methodology and an AI Benchmark Suite

Authors: Wanling Gao, Fei Tang, Jianfeng Zhan, Chuanxin Lan, Chunjie Luo, Lei Wang, Jiahui Dai, Zheng Cao, Xiongwang Xiong, Zihan Jiang, Tianshu Hao, Fanda Fan, Xu Wen, Fan Zhang, Yunyou Huang, Jianan Chen, Mengjia Du, Rui Ren, Chen Zheng, Daoyi Zheng, Haoning Tang, Kunlin Zhan, Biao Wang, Defei Kong, Minghe Yu , et al. (9 additional authors not shown)

Abstract: Domain-specific software and hardware co-design is encouraging as it is much easier to achieve efficiency for fewer tasks. Agile domain-specific benchmarking speeds up the process as it provides not only relevant design inputs but also relevant metrics, and tools. Unfortunately, modern workloads like Big data, AI, and Internet services dwarf the traditional one in terms of code size, deployment sc… ▽ More Domain-specific software and hardware co-design is encouraging as it is much easier to achieve efficiency for fewer tasks. Agile domain-specific benchmarking speeds up the process as it provides not only relevant design inputs but also relevant metrics, and tools. Unfortunately, modern workloads like Big data, AI, and Internet services dwarf the traditional one in terms of code size, deployment scale, and execution path, and hence raise serious benchmarking challenges. This paper proposes an agile domain-specific benchmarking methodology. Together with seventeen industry partners, we identify ten important end-to-end application scenarios, among which sixteen representative AI tasks are distilled as the AI component benchmarks. We propose the permutations of essential AI and non-AI component benchmarks as end-to-end benchmarks. An end-to-end benchmark is a distillation of the essential attributes of an industry-scale application. We design and implement a highly extensible, configurable, and flexible benchmark framework, on the basis of which, we propose the guideline for building end-to-end benchmarks, and present the first end-to-end Internet service AI benchmark. The preliminary evaluation shows the value of our benchmark suite---AIBench against MLPerf and TailBench for hardware and software designers, micro-architectural researchers, and code developers. The specifications, source code, testbed, and results are publicly available from the web site \url{http://www.benchcouncil.org/AIBench/index.html}. △ Less

Submitted 17 February, 2020; originally announced February 2020.

Comments: 25 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1908.08998

arXiv:2002.03493 [pdf, other]

AI-oriented Medical Workload Allocation for Hierarchical Cloud/Edge/Device Computing

Authors: Tianshu Hao, Jianfeng Zhan, Kai Hwang, Wanling Gao, Xu Wen

Abstract: In a hierarchically-structured cloud/edge/device computing environment, workload allocation can greatly affect the overall system performance. This paper deals with AI-oriented medical workload generated in emergency rooms (ER) or intensive care units (ICU) in metropolitan areas. The goal is to optimize AI-workload allocation to cloud clusters, edge servers, and end devices so that minimum respons… ▽ More In a hierarchically-structured cloud/edge/device computing environment, workload allocation can greatly affect the overall system performance. This paper deals with AI-oriented medical workload generated in emergency rooms (ER) or intensive care units (ICU) in metropolitan areas. The goal is to optimize AI-workload allocation to cloud clusters, edge servers, and end devices so that minimum response time can be achieved in life-saving emergency applications. In particular, we developed a new workload allocation method for the AI workload in distributed cloud/edge/device computing systems. An efficient scheduling and allocation strategy is developed in order to reduce the overall response time to satisfy multi-patient demands. We apply several ICU AI workloads from a comprehensive edge computing benchmark Edge AIBench. The healthcare AI applications involved are short-of-breath alerts, patient phenotype classification, and life-death threats. Our experimental results demonstrate the high efficiency and effectiveness in real-life health-care and emergency applications. △ Less

Submitted 9 February, 2020; originally announced February 2020.

arXiv:1910.03467 [pdf, ps, other]

doi 10.18653/v1/D19-5228

Overcoming the Rare Word Problem for Low-Resource Language Pairs in Neural Machine Translation

Authors: Thi-Vinh Ngo, Thanh-Le Ha, Phuong-Thai Nguyen, Le-Minh Nguyen

Abstract: Among the six challenges of neural machine translation (NMT) coined by (Koehn and Knowles, 2017), rare-word problem is considered the most severe one, especially in translation of low-resource languages. In this paper, we propose three solutions to address the rare words in neural machine translation systems. First, we enhance source context to predict the target words by connecting directly the s… ▽ More Among the six challenges of neural machine translation (NMT) coined by (Koehn and Knowles, 2017), rare-word problem is considered the most severe one, especially in translation of low-resource languages. In this paper, we propose three solutions to address the rare words in neural machine translation systems. First, we enhance source context to predict the target words by connecting directly the source embeddings to the output of the attention component in NMT. Second, we propose an algorithm to learn morphology of unknown words for English in supervised way in order to minimize the adverse effect of rare-word problem. Finally, we exploit synonymous relation from the WordNet to overcome out-of-vocabulary (OOV) problem of NMT. We evaluate our approaches on two low-resource language pairs: English-Vietnamese and Japanese-Vietnamese. In our experiments, we have achieved significant improvements of up to roughly +1.0 BLEU points in both language pairs. △ Less

Submitted 17 October, 2019; v1 submitted 6 October, 2019; originally announced October 2019.

Journal ref: Proceedings of the 6th Workshop on Asian Translation, WAT 2019

arXiv:1910.02238 [pdf, other]

doi 10.5281/zenodo.3525490

How Transformer Revitalizes Character-based Neural Machine Translation: An Investigation on Japanese-Vietnamese Translation Systems

Authors: Thi-Vinh Ngo, Thanh-Le Ha, Phuong-Thai Nguyen, Le-Minh Nguyen

Abstract: While translating between East Asian languages, many works have discovered clear advantages of using characters as the translation unit. Unfortunately, traditional recurrent neural machine translation systems hinder the practical usage of those character-based systems due to their architectural limitations. They are unfavorable in handling extremely long sequences as well as highly restricted in p… ▽ More While translating between East Asian languages, many works have discovered clear advantages of using characters as the translation unit. Unfortunately, traditional recurrent neural machine translation systems hinder the practical usage of those character-based systems due to their architectural limitations. They are unfavorable in handling extremely long sequences as well as highly restricted in parallelizing the computations. In this paper, we demonstrate that the new transformer architecture can perform character-based translation better than the recurrent one. We conduct experiments on a low-resource language pair: Japanese-Vietnamese. Our models considerably outperform the state-of-the-art systems which employ word-based recurrent architectures. △ Less

Submitted 17 October, 2019; v1 submitted 5 October, 2019; originally announced October 2019.

Journal ref: 16th International Workshop on Spoken Language Translation 2019

arXiv:1908.01924 [pdf, ps, other]

Edge AIBench: Towards Comprehensive End-to-end Edge Computing Benchmarking

Authors: Tianshu Hao, Yunyou Huang, Xu Wen, Wanling Gao, Fan Zhang, Chen Zheng, Lei Wang, Hainan Ye, Kai Hwang, Zujie Ren, Jianfeng Zhan

Abstract: In edge computing scenarios, the distribution of data and collaboration of workloads on different layers are serious concerns for performance, privacy, and security issues. So for edge computing benchmarking, we must take an end-to-end view, considering all three layers: client-side devices, edge computing layer, and cloud servers. Unfortunately, the previous work ignores this most important point… ▽ More In edge computing scenarios, the distribution of data and collaboration of workloads on different layers are serious concerns for performance, privacy, and security issues. So for edge computing benchmarking, we must take an end-to-end view, considering all three layers: client-side devices, edge computing layer, and cloud servers. Unfortunately, the previous work ignores this most important point. This paper presents the BenchCouncil's coordinated e ort on edge AI benchmarks, named Edge AIBench. In total, Edge AIBench models four typical application scenarios: ICU Patient Monitor, Surveillance Camera, Smart Home, and Autonomous Vehicle with the focus on data distribution and workload collaboration on three layers. Edge AIBench is a part of the open-source AIBench project, publicly available from http://www.benchcouncil.org/AIBench/index.html. We also build an edge computing testbed with a federated learning framework to resolve performance, privacy, and security issues. △ Less

Submitted 5 August, 2019; originally announced August 2019.

arXiv:1908.00298 [pdf, other]

LoadCNN: A Low Training Cost Deep Learning Model for Day-Ahead Individual Residential Load Forecasting

Authors: Yunyou Huang, Nana Wang, Wanling Gao, Xiaoxu Guo, Cheng Huang, Tianshu Hao, Jianfeng Zhan

Abstract: Accurate day-ahead individual residential load forecasting is of great importance to various applications of smart grid on day-ahead market. Deep learning, as a powerful machine learning technology, has shown great advantages and promising application in load forecasting tasks. However, deep learning is a computationally-hungry method, and requires high costs (e.g., time, energy and CO2 emission)… ▽ More Accurate day-ahead individual residential load forecasting is of great importance to various applications of smart grid on day-ahead market. Deep learning, as a powerful machine learning technology, has shown great advantages and promising application in load forecasting tasks. However, deep learning is a computationally-hungry method, and requires high costs (e.g., time, energy and CO2 emission) to train a deep learning model, which aggravates the energy crisis and incurs a substantial burden to the environment. As a consequence, the deep learning methods are difficult to be popularized and applied in the real smart grid environment. In this paper, we propose a low training cost model based on convolutional neural network, namely LoadCNN, for next-day load forecasting of individual resident with reduced training cost. The experiments show that the training time of LoadCNN is only approximately 1/54 of the one of other state-of-the-art models, and energy consumption and CO2 emissions are only approximate 1/45 of those of other state-of-the-art models based on the same indicators. Meanwhile, the prediction accuracy of our model is equal to that of current state-of-the-art models, making LoadCNN the first load forecasting model simultaneously achieving high prediction accuracy and low training costs. LoadCNN is an efficient green model that is able to be quickly, cost-effectively and environmentally-friendly deployed in a realistic smart grid environment. △ Less

Submitted 19 December, 2019; v1 submitted 1 August, 2019; originally announced August 2019.

arXiv:1906.08584 [pdf, other]

Improving Zero-shot Translation with Language-Independent Constraints

Authors: Ngoc-Quan Pham, Jan Niehues, Thanh-Le Ha, Alex Waibel

Abstract: An important concern in training multilingual neural machine translation (NMT) is to translate between language pairs unseen during training, i.e zero-shot translation. Improving this ability kills two birds with one stone by providing an alternative to pivot translation which also allows us to better understand how the model captures information between languages. In this work, we carried out a… ▽ More An important concern in training multilingual neural machine translation (NMT) is to translate between language pairs unseen during training, i.e zero-shot translation. Improving this ability kills two birds with one stone by providing an alternative to pivot translation which also allows us to better understand how the model captures information between languages. In this work, we carried out an investigation on this capability of the multilingual NMT models. First, we intentionally create an encoder architecture which is independent with respect to the source language. Such experiments shed light on the ability of NMT encoders to learn multilingual representations, in general. Based on such proof of concept, we were able to design regularization methods into the standard Transformer model, so that the whole architecture becomes more robust in zero-shot conditions. We investigated the behaviour of such models on the standard IWSLT 2017 multilingual dataset. We achieved an average improvement of 2.23 BLEU points across 12 language pairs compared to the zero-shot performance of a state-of-the-art multilingual system. Additionally, we carry out further experiments in which the effect is confirmed even for language pairs with multiple intermediate pivots. △ Less

Submitted 20 June, 2019; originally announced June 2019.

Comments: 10 pages version accepted in WMT 2019

arXiv:1905.02940 [pdf, other]

A new direction to promote the implementation of artificial intelligence in natural clinical settings

Authors: Yunyou Huang, Zhifei Zhang, Nana Wang, Nengquan Li, Mengjia Du, Tianshu Hao, Jianfeng Zhan

Abstract: Artificial intelligence (AI) researchers claim that they have made great `achievements' in clinical realms. However, clinicians point out the so-called `achievements' have no ability to implement into natural clinical settings. The root cause for this huge gap is that many essential features of natural clinical tasks are overlooked by AI system developers without medical background. In this paper,… ▽ More Artificial intelligence (AI) researchers claim that they have made great `achievements' in clinical realms. However, clinicians point out the so-called `achievements' have no ability to implement into natural clinical settings. The root cause for this huge gap is that many essential features of natural clinical tasks are overlooked by AI system developers without medical background. In this paper, we propose that the clinical benchmark suite is a novel and promising direction to capture the essential features of the real-world clinical tasks, hence qualifies itself for guiding the development of AI systems, promoting the implementation of AI in real-world clinical practice. △ Less

Submitted 8 May, 2019; originally announced May 2019.

arXiv:1808.00684 [pdf, other]

doi 10.1016/j.jocs.2018.06.012

Synapse: Synthetic Application Profiler and Emulator

Authors: Andre Merzky, Ming Tai Ha, Matteo Turilli, Shantenu Jha

Abstract: Motivated by the need to emulate workload execution characteristics on high-performance and distributed heterogeneous resources, we introduce Synapse. Synapse is used as a proxy application (or "representative application") for real workloads, with the advantage that it can be tuned in different ways and dimensions, and also at levels of granularity that are not possible with real applications. Sy… ▽ More Motivated by the need to emulate workload execution characteristics on high-performance and distributed heterogeneous resources, we introduce Synapse. Synapse is used as a proxy application (or "representative application") for real workloads, with the advantage that it can be tuned in different ways and dimensions, and also at levels of granularity that are not possible with real applications. Synapse has a platform-independent application profiler, and has the ability to emulate profiled workloads on a variety of resources. Experiments show that the automated profiling performed using Synapse captures an application's characteristics with high fidelity. The emulation of an application using Synapse can reproduce the application's execution behavior in the original run-time environment, and can also reproduce those behaviors on different run-time environments. △ Less

Submitted 2 August, 2018; originally announced August 2018.

Comments: Large portions of this work originally appeared as arXiv:1506.00272, which was subsequently published as a workshop paper. This is an extended version published in the "Journal of Computational Science"

Report number: 01

Journal ref: Journal of Computational Science, 27C (2018) pp. 329-344

Showing 1–50 of 64 results for author: Hao, T