-
ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
Authors:
Chunyu Xuan,
Yazhe Niu,
Yuan Pu,
Shuai Hu,
Yu Liu,
Jing Yang
Abstract:
Monte Carlo Tree Search (MCTS)-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains. These algorithms employ the reanalyze process to enhance sample efficiency from stale data, albeit at the expense of significant wall-clock time consumption. To address this issue, we propose a general approach named ReZero to boost tree search operations for MCTS-based algorithms. Specifically, drawing inspiration from the one-armed bandit model, we reanalyze training samples through a backward-view reuse technique which obtains the value estimation of a certain child node in advance. To further adapt to this design, we periodically reanalyze the entire buffer instead of frequently reanalyzing the mini-batch. The synergy of these two designs can significantly reduce the search cost and meanwhile guarantee or even improve performance, simplifying both data collecting and reanalyzing. Experiments conducted on Atari environments and board games demonstrate that ReZero substantially improves training speed while maintaining high sample efficiency. The code is available as part of the LightZero benchmark at https://github.com/opendilab/LightZero.
Submitted 28 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Addressing the Scalability Bottleneck of Semantic Technologies at Bosch
Authors:
Diego Rincon-Yanez,
Mohamed H. Gad-Elrab,
Daria Stepanova,
Kien Trung Tran,
Cuong Chu Xuan,
Baifan Zhou,
Evgeny Karlamov
Abstract:
At the heart of smart manufacturing is real-time semi-automatic decision-making. Such decisions are vital for optimizing production lines, e.g., reducing resource consumption and improving the quality of discrete manufacturing operations, and for optimizing the actual products, e.g., optimizing the sampling rate for measuring product dimensions during production. Such decision-making relies on massive industrial data, thus posing a real-time processing bottleneck.
Submitted 19 September, 2023;
originally announced September 2023.
-
Constrained Proximal Policy Optimization
Authors:
Chengbin Xuan,
Feng Zhang,
Faliang Yin,
Hak-Keung Lam
Abstract:
The problem of constrained reinforcement learning (CRL) holds significant importance as it provides a framework for addressing critical safety satisfaction concerns in the field of reinforcement learning (RL). However, with the introduction of constraint satisfaction, the current CRL methods necessitate the utilization of second-order optimization or primal-dual frameworks with additional Lagrangian multipliers, resulting in increased complexity and inefficiency during implementation. To address these issues, we propose a novel first-order feasible method named Constrained Proximal Policy Optimization (CPPO). By treating the CRL problem as a probabilistic inference problem, our approach integrates the Expectation-Maximization framework to solve it through two steps: 1) calculating the optimal policy distribution within the feasible region (E-step), and 2) conducting a first-order update to adjust the current policy towards the optimal policy obtained in the E-step (M-step). We establish the relationship between the probability ratios and KL divergence to convert the E-step into a convex optimization problem. Furthermore, we develop an iterative heuristic algorithm from a geometric perspective to solve this problem. Additionally, we introduce a conservative update mechanism to overcome the constraint violation issue that occurs in the existing feasible region method. Empirical evaluations conducted in complex and uncertain environments validate the effectiveness of our proposed method, as it performs at least as well as other baselines.
Submitted 23 May, 2023;
originally announced May 2023.
-
Optimal Ternary Linear Complementary Dual Codes
Authors:
Liangdong Lu,
Ruihu Li,
Qiang Fu,
Chen Xuan,
Wenping Ma
Abstract:
Linear complementary dual (LCD) codes, introduced by Massey, are codes whose intersections with their dual codes are trivial. They can help improve the security of information processed by sensitive devices, especially against side-channel attacks (SCA) and fault invasive attacks. In this paper, by puncturing, extending, shortening, and combining codes, many good ternary LCD codes are presented. We give a table (Table 1) with the values of $d_{LCD}(n,k)$ for lengths $n \leq 20$. In addition, many of the ternary LCD codes given in this paper are optimal, saturating the lower or upper bound of Grassl's codetable in \cite{Grassl}, and some of them are nearly optimal.
Submitted 25 December, 2020; v1 submitted 3 December, 2020;
originally announced December 2020.
-
An End-to-End Encryption Solution for Enterprise Content Applications
Authors:
Chaoting Xuan
Abstract:
The content hosting services (like Dropbox, OneDrive, and Google Drive) used by enterprise customers are deployed either on premises or in the cloud. Because users may store business-sensitive data (contents) in these hosting services, they may want to protect their data from disclosure to anyone else, even IT administrators. Unfortunately, even when contents (files) are encrypted in the hosting services, they are sometimes still accessible to IT administrators today. The sensitive data could be exposed to the public if an IT administrator turns malicious (like a disgruntled employee) or the administrator's account is compromised by hackers.
We propose an end-to-end encryption (E2EE) solution to address this challenge. The user data is encrypted on the client side (mobile device) and remains encrypted in transit and at rest on the server. Specifically, we design a new method to allow master secret recovery and escrow, while protecting the secrets from being accessed by malicious administrators. In addition, we present a content (file) encryption scheme that achieves privacy and granular access control, and that can be seamlessly integrated with major content hosting services used by business users today.
Submitted 1 June, 2020;
originally announced June 2020.
-
From Text to Sound: A Preliminary Study on Retrieving Sound Effects to Radio Stories
Authors:
Songwei Ge,
Curtis Xuan,
Ruihua Song,
Chao Zou,
Wei Liu,
Jin Zhou
Abstract:
Sound effects play an essential role in producing high-quality radio stories but require enormous labor cost to add. In this paper, we address the problem of automatically adding sound effects to radio stories with a retrieval-based model. However, directly implementing a tag-based retrieval model leads to high false positives due to the ambiguity of story contents. To solve this problem, we introduce a retrieval-based framework hybridized with a semantic inference model which helps to achieve robust retrieval results. Our model relies on fine-designed features extracted from the context of candidate triggers. We collect two story dubbing datasets through crowdsourcing to analyze the setting of adding sound effects and to train and test our proposed methods. We further discuss the importance of each feature and introduce several heuristic rules for the trade-off between precision and recall. Together with the text-to-speech technology, our results reveal a promising automatic pipeline on producing high-quality radio stories.
Submitted 20 August, 2019;
originally announced August 2019.
-
Vision-based Robotic Arm Imitation by Human Gesture
Authors:
Cheng Xuan,
Zhiqiang Tang,
Jinxin Xu
Abstract:
One of the most efficient ways for a learning-based robotic arm to learn to perform complex tasks as humans do is to learn directly by observing how humans complete those tasks, and then to imitate them. Our idea builds on the success of the Deep Q-Learning (DQN) algorithm in reinforcement learning, and extends to the Deep Deterministic Policy Gradient (DDPG) algorithm. We developed a learning-based method combining a modified DDPG and a visual imitation network. Our approach acquires frames only from a monocular camera, with no need to either construct a 3D environment or generate actual points. The result we expected during training was that the robot would be able to move in almost the same way as human hands did.
Submitted 4 October, 2018; v1 submitted 14 March, 2017;
originally announced March 2017.