Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis

Authors: Zeyi Zhang, Tenglong Ao, Yuyao Zhang, Qingzhe Gao, Chuan Lin, Baoquan Chen, Libin Liu

Abstract: In this work, we present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence. Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion. The sparsity of these movements makes it challenging for deep learning-based systems, trained on moderately sized datasets, to capture the relationship between the movements and the corresponding speech semantics. To address this challenge, we develop a generative retrieval framework based on a large language model. This framework efficiently retrieves suitable semantic gesture candidates from a motion library in response to the input speech. To construct this motion library, we summarize a comprehensive list of commonly used semantic gestures based on findings in linguistics, and we collect a high-quality motion dataset encompassing both body and hand movements. We also design a novel GPT-based model with strong generalization capabilities to audio, capable of generating high-quality gestures that match the rhythm of speech. Furthermore, we propose a semantic alignment mechanism to efficiently align the retrieved semantic gestures with the GPT's output, ensuring the naturalness of the final animation. Our system demonstrates robustness in generating gestures that are rhythmically coherent and semantically explicit, as evidenced by a comprehensive collection of examples. User studies confirm the quality and human-likeness of our results, and show that our system outperforms state-of-the-art systems in terms of semantic appropriateness by a clear margin.

Submitted 16 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

Comments: SIGGRAPH 2024 (Journal Track); Project page: https://pku-mocca.github.io/Semantic-Gesticulator-Page

arXiv:2403.11303 [pdf]

A Brief Study of Computer Network Security Technologies

Authors: Tulasi Udupa A, Sushma Jayaram, Shreya Ganesh Hegde

Abstract: The rapid development of computer network systems brings both great convenience and new security threats for users. Network security problems generally include network system security and data security. Specifically, this refers to the reliability of the network system and the confidentiality, integrity and availability of data in the system. This paper introduces the significance of network security systems and highlights related technologies, mainly authentication, data encryption, firewall and antivirus technology. Network security problems can be faced by any network user; therefore we must prioritize network security, try to prevent hostile attacks and ensure the overall security of the network system.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2310.10198 [pdf, other]

MoConVQ: Unified Physics-Based Motion Control via Scalable Discrete Representations

Authors: Heyuan Yao, Zhenhua Song, Yuyang Zhou, Tenglong Ao, Baoquan Chen, Libin Liu

Abstract: In this work, we present MoConVQ, a novel unified framework for physics-based motion control leveraging scalable discrete representations. Building upon vector quantized variational autoencoders (VQ-VAE) and model-based reinforcement learning, our approach effectively learns motion embeddings from a large, unstructured dataset spanning tens of hours of motion examples. The resultant motion representation not only captures diverse motion skills but also offers a robust and intuitive interface for various applications. We demonstrate the versatility of MoConVQ through several applications: universal tracking control from various motion sources, interactive character control with latent motion representations using supervised learning, physics-based motion generation from natural language descriptions using the GPT framework, and, most interestingly, seamless integration with large language models (LLMs) with in-context learning to tackle complex and abstract tasks.

Submitted 19 December, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: Project page: MoConVQ.github.io

arXiv:2304.06952 [pdf, other]

PPG Signals for Hypertension Diagnosis: A Novel Method using Deep Learning Models

Authors: Graham Frederick, Yaswant T, Brintha Therese A

Abstract: Hypertension is a medical condition characterized by high blood pressure, and classifying it into its various stages is crucial to managing the disease. In this project, a novel method is proposed for classifying stages of hypertension using Photoplethysmography (PPG) signals and deep learning models, namely AvgPool_VGG-16. The PPG signal is a non-invasive method of measuring blood pressure through the use of light sensors that measure the changes in blood volume in the microvasculature of tissues. PPG images from the publicly available blood pressure classification dataset were used to train the model. Multiclass classification for various PPG stages was done. The results show the proposed method achieves high accuracy in classifying hypertension stages, demonstrating the potential of PPG signals and deep learning models in hypertension diagnosis and management.

Submitted 14 April, 2023; originally announced April 2023.

Comments: 10 pages, 6 figures, 2 tables

arXiv:2303.16856 [pdf, other]

Robust Dancer: Long-term 3D Dance Synthesis Using Unpaired Data

Authors: Bin Feng, Tenglong Ao, Zequn Liu, Wei Ju, Libin Liu, Ming Zhang

Abstract: How to automatically synthesize natural-looking dance movements based on a piece of music is an increasingly popular yet challenging task. Most existing data-driven approaches require hard-to-get paired training data and fail to generate long sequences of motion due to error accumulation of autoregressive structure. We present a novel 3D dance synthesis system that only needs unpaired data for training and can generate realistic long-term motions at the same time. For the unpaired data training, we explore the disentanglement of beat and style, and propose a Transformer-based model free of reliance upon paired data. For the synthesis of long-term motions, we devise a new long-history attention strategy. It first queries the long-history embedding through an attention computation and then explicitly fuses this embedding into the generation pipeline via multimodal adaptation gate (MAG). Objective and subjective evaluations show that our results are comparable to strong baseline methods, despite not requiring paired training data, and are robust when inferring long-term music. To the best of our knowledge, we are the first to achieve unpaired data training - an ability that effectively alleviates data limitations. Our code is released on https://github.com/BFeng14/RobustDancer

Submitted 29 March, 2023; originally announced March 2023.

Comments: Preliminary video demo: https://youtu.be/gJbxG9QlcUU

arXiv:2303.14613 [pdf, other]

GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents

Authors: Tenglong Ao, Zeyi Zhang, Libin Liu

Abstract: The automatic generation of stylized co-speech gestures has recently received increasing attention. Previous systems typically allow style control via predefined text labels or example motion clips, which are often not flexible enough to convey user intent accurately. In this work, we present GestureDiffuCLIP, a neural network framework for synthesizing realistic, stylized co-speech gestures with flexible style control. We leverage the power of the large-scale Contrastive-Language-Image-Pre-training (CLIP) model and present a novel CLIP-guided mechanism that extracts efficient style representations from multiple input modalities, such as a piece of text, an example motion clip, or a video. Our system learns a latent diffusion model to generate high-quality gestures and infuses the CLIP representations of style into the generator via an adaptive instance normalization (AdaIN) layer. We further devise a gesture-transcript alignment mechanism that ensures a semantically correct gesture generation based on contrastive learning. Our system can also be extended to allow fine-grained style control of individual body parts. We demonstrate an extensive set of examples showing the flexibility and generalizability of our model to a variety of style descriptions. In a user study, we show that our system outperforms the state-of-the-art approaches regarding human likeness, appropriateness, and style correctness.

Submitted 16 October, 2023; v1 submitted 25 March, 2023; originally announced March 2023.

Comments: SIGGRAPH 2023 (Journal Track); Project Page: https://pku-mocca.github.io/GestureDiffuCLIP-Page/

arXiv:2210.10386 [pdf, other]

Virtual Screening on FPGA: Performance and Energy versus Effort

Authors: Tom Vander Aa, Tom Haber, Thomas J. Ashby, Roel Wuyts, Wilfried Verachtert

Abstract: With their widespread availability, FPGA-based accelerator cards have become an alternative to GPUs and CPUs to accelerate computing in applications with certain requirements (like energy efficiency) or properties (like fixed-point computations). In this paper we show results and experiences from mapping an industrial application used for drug discovery on several types of accelerators. We especially highlight the effort versus benefit of FPGAs compared to CPUs and GPUs in terms of performance and energy efficiency. For this application, even with extensive use of FPGA-specific features, and performing different optimizations, results on GPUs are still better, both in terms of energy and performance.

Submitted 19 October, 2022; originally announced October 2022.

Comments: To be published at H2RC 2022 - https://h2rc.cse.sc.edu/

arXiv:2210.01448 [pdf, other]

doi 10.1145/3550454.3555435

Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings

Authors: Tenglong Ao, Qingzhe Gao, Yuke Lou, Baoquan Chen, Libin Liu

Abstract: Automatic synthesis of realistic co-speech gestures is an increasingly important yet challenging task in artificial embodied agent creation. Previous systems mainly focus on generating gestures in an end-to-end manner, which leads to difficulties in mining the clear rhythm and semantics due to the complex yet subtle harmony between speech and gestures. We present a novel co-speech gesture synthesis method that achieves convincing results both on the rhythm and semantics. For the rhythm, our system contains a robust rhythm-based segmentation pipeline to ensure the temporal coherence between the vocalization and gestures explicitly. For the gesture semantics, we devise a mechanism to effectively disentangle both low- and high-level neural embeddings of speech and motion based on linguistic theory. The high-level embedding corresponds to semantics, while the low-level embedding relates to subtle variations. Lastly, we build correspondence between the hierarchical embeddings of the speech and the motion, resulting in rhythm- and semantics-aware gesture synthesis. Evaluations with existing objective metrics, a newly proposed rhythmic metric, and human feedback show that our method outperforms state-of-the-art systems by a clear margin.

Submitted 4 May, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: SIGGRAPH Asia 2022 (Journal Track); Project Page: https://pku-mocca.github.io/Rhythmic-Gesticulator-Page/

arXiv:2004.02561 [pdf, other]

A High-Performance Implementation of Bayesian Matrix Factorization with Limited Communication

Authors: Tom Vander Aa, Xiangju Qin, Paul Blomstedt, Roel Wuyts, Wilfried Verachtert, Samuel Kaski

Abstract: Matrix factorization is a very common machine learning technique in recommender systems. Bayesian Matrix Factorization (BMF) algorithms would be attractive because of their ability to quantify uncertainty in their predictions and avoid over-fitting, combined with high prediction accuracy. However, they have not been widely used on large-scale data because of their prohibitive computational cost. In recent work, efforts have been made to reduce the cost, both by improving the scalability of the BMF algorithm as well as its implementation, but so far mainly separately. In this paper we show that the state-of-the-art of both approaches to scalability can be combined. We combine the recent highly-scalable Posterior Propagation algorithm for BMF, which parallelizes computation of blocks of the matrix, with a distributed BMF implementation that uses asynchronous communication within each block. We show that the combination of the two methods gives substantial improvements in the scalability of BMF on web-scale datasets, when the goal is to reduce the wall-clock time.

Submitted 14 April, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: European Commission Project: EPEEC - European joint Effort toward a Highly Productive Programming Environment for Heterogeneous Exascale Computing (EC-H2020-80151)

arXiv:2001.03000 [pdf, other]

doi 10.3233/IDA-184287

Guidelines for enhancing data locality in selected machine learning algorithms

Authors: Imen Chakroun, Tom Vander Aa, Thomas J. Ashby

Abstract: To deal with the complexity of the new bigger and more complex generation of data, machine learning (ML) techniques are probably the first and foremost used. For ML algorithms to produce results in a reasonable amount of time, they need to be implemented efficiently. In this paper, we analyze one of the means to increase the performance of machine learning algorithms, namely exploiting data locality. Data locality and access patterns are often at the heart of performance issues in computing systems due to the use of certain hardware techniques to improve performance. Altering the access patterns to increase locality can dramatically increase performance of a given algorithm. Besides, repeated data access can be seen as redundancy in data movement. Similarly, there can also be redundancy in the repetition of calculations. This work also identifies some of the opportunities for avoiding these redundancies by directly reusing computation results. We start by motivating why and how a more efficient implementation can be achieved by exploiting reuse in the memory hierarchy of modern instruction set processors. Next we document the possibilities of such reuse in some selected machine learning algorithms.

Submitted 9 January, 2020; originally announced January 2020.

Comments: European Commission Project: EPEEC - European joint Effort toward a Highly Productive Programming Environment for Heterogeneous Exascale Computing (EC-H2020-80151). This is an extended version of arXiv:1904.11203

Journal ref: Intelligent Data Analysis, vol. 23, no. 5, pp. 1003-1020, 2019

arXiv:1904.11203 [pdf]

Reviewing Data Access Patterns and Computational Redundancy for Machine Learning Algorithms

Authors: Imen Chakroun, Tom Vander Aa, Tom Ashby

Abstract: Machine learning (ML) is probably the first and foremost used technique to deal with the size and complexity of the new generation of data. In this paper, we analyze one of the means to increase the performance of ML algorithms, namely exploiting data locality. Data locality and access patterns are often at the heart of performance issues in computing systems due to the use of certain hardware techniques to improve performance. Altering the access patterns to increase locality can dramatically increase performance of a given algorithm. Besides, repeated data access can be seen as redundancy in data movement. Similarly, there can also be redundancy in the repetition of calculations. This work also identifies some of the opportunities for avoiding these redundancies by directly reusing computation results. We document the possibilities of such reuse in some selected machine learning algorithms and give initial indicative results from our first experiments on data access improvement and algorithm redesign.

Submitted 9 January, 2020; v1 submitted 25 April, 2019; originally announced April 2019.

Comments: European Commission Project: EPEEC - European joint Effort toward a Highly Productive Programming Environment for Heterogeneous Exascale Computing (EC-H2020-80151). An extended version of this paper titled "Guidelines for enhancing data locality in selected machine learning algorithms" has been published in the journal "Intelligent Data Analysis"

arXiv:1904.02514 [pdf, other]

SMURFF: a High-Performance Framework for Matrix Factorization

Authors: Tom Vander Aa, Imen Chakroun, Thomas J. Ashby, Jaak Simm, Adam Arany, Yves Moreau, Thanh Le Van, José Felipe Golib Dzib, Jörg Wegner, Vladimir Chupakhin, Hugo Ceulemans, Roel Wuyts, Wilfried Verachtert

Abstract: Bayesian Matrix Factorization (BMF) is a powerful technique for recommender systems because it produces good results and is relatively robust against overfitting. Yet BMF is more computationally intensive and thus more challenging to implement for large datasets. In this work we present SMURFF, a high-performance, feature-rich framework to compose and construct different Bayesian matrix-factorization methods. The framework has been successfully used for large-scale runs of compound-activity prediction. SMURFF is available as open-source and can be used both on a supercomputer and on a desktop or laptop machine. Documentation and several examples are provided as Jupyter notebooks using SMURFF's high-level Python API.

Submitted 29 July, 2019; v1 submitted 4 April, 2019; originally announced April 2019.

Comments: European Commission Project: EPEEC - European joint Effort toward a Highly Productive Programming Environment for Heterogeneous Exascale Computing (EC-H2020-80151)

arXiv:1802.01159 [pdf, other]

doi 10.1145/2818052.2869130

Mining Twitter Conversations around E-commerce Promotional Events

Authors: Binny Mathew, Unnikrishnan T A, Tanmoy Chakraborty, Niloy Ganguly, Samik Datta

Abstract: With Social Media platforms establishing themselves as the de facto destinations for their customers' views and opinions, brands around the World are investing heavily in invigorating their customer connects by utilizing such platforms to their fullest. In this paper, we develop a novel technique for mining conversations in Twitter by weaving together all conversations around an event into one unified graph (Conversation Graph, henceforth). The structure of the Conversation Graph emerges as a variant of the BOWTIE structure (dubbed ASKEWBOWTIE henceforth) as a result of the complex communication patterns amongst these players. Finally, we investigate the structural properties of the ASKEWBOWTIE structure to understand the configuration of the components and their temporal evolution.

Submitted 4 February, 2018; originally announced February 2018.

Comments: 4 pages, 5 tables, 3 figures

arXiv:1705.10633 [pdf, other]

doi 10.1016/j.procs.2017.05.009

Distributed Matrix Factorization using Asynchrounous Communication

Authors: Tom Vander Aa, Imen Chakroun, Tom Haber

Abstract: Using the matrix factorization technique in machine learning is very common, mainly in areas like recommender systems. Despite its high prediction accuracy and its ability to avoid over-fitting of the data, the Bayesian Probabilistic Matrix Factorization algorithm (BPMF) has not been widely used on large scale data because of the prohibitive cost. In this paper, we propose a distributed high-performance parallel implementation of the BPMF using Gibbs sampling on shared and distributed architectures. We show that, by using efficient load balancing with work stealing on a single node and asynchronous communication in the distributed version, we beat state-of-the-art implementations.

Submitted 29 May, 2017; originally announced May 2017.

Comments: arXiv admin note: substantial text overlap with arXiv:1705.04159

arXiv:1705.04159 [pdf, other]

doi 10.1109/CLUSTER.2016.13

Distributed Bayesian Probabilistic Matrix Factorization

Authors: Tom Vander Aa, Imen Chakroun, Tom Haber

Abstract: Matrix factorization is a common machine learning technique for recommender systems. Despite its high prediction accuracy, the Bayesian Probabilistic Matrix Factorization algorithm (BPMF) has not been widely used on large scale data because of its high computational cost. In this paper we propose a distributed high-performance parallel implementation of BPMF on shared memory and distributed architectures. We show that, by using efficient load balancing with work stealing on a single node and asynchronous communication in the distributed version, we beat state-of-the-art implementations.

Submitted 11 May, 2017; originally announced May 2017.

Showing 1–15 of 15 results for author: Ao, T