MIMO Capacity Maximization with Beyond-Diagonal RIS

Authors: Ignacio Santamaria, Mohammad Soleymani, Eduard Jorswieck, Jesús Gutiérrez

Abstract: This paper addresses the problem of maximizing the capacity of a multiple-input multiple-output (MIMO) link assisted by a beyond-diagonal reconfigurable intelligent surface (BD-RIS). We maximize the capacity by alternately optimizing the transmit covariance matrix and the BD-RIS scattering matrix, which, according to network theory, should be unitary and symmetric. These constraints make the optimization of BD-RIS more challenging than that of diagonal RIS. To find a stationary point of the capacity, we maximize a sequence of quadratic problems in the manifold of unitary matrices. This leads to an efficient algorithm that always improves the capacity obtained by a diagonal RIS. Through simulation examples, we study the capacity improvement provided by a passive BD-RIS architecture over the conventional RIS model in which the phase shift matrix is diagonal.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 5 pages, 4 figures

arXiv:2405.14017 [pdf, other]

MagicPose4D: Crafting Articulated Models with Appearance and Motion Control

Authors: Hao Zhang, Di Chang, Fang Li, Mohammad Soleymani, Narendra Ahuja

Abstract: With the success of 2D and 3D visual generative models, there is growing interest in generating 4D content. Existing methods primarily rely on text prompts to produce 4D content, but they often fall short of accurately defining complex or rare motions. To address this limitation, we propose MagicPose4D, a novel framework for refined control over both appearance and motion in 4D generation. Unlike traditional methods, MagicPose4D accepts monocular videos as motion prompts, enabling precise and customizable motion generation. MagicPose4D comprises two key modules: i) a Dual-Phase 4D Reconstruction Module, which operates in two phases. The first phase focuses on capturing the model's shape using accurate 2D supervision and less accurate but geometrically informative 3D pseudo-supervision without imposing skeleton constraints. The second phase refines the model using more accurate pseudo-3D supervision, obtained in the first phase, and introduces kinematic chain-based skeleton constraints to ensure physical plausibility. Additionally, we propose a Global-local Chamfer loss that aligns the overall distribution of predicted mesh vertices with the supervision while maintaining part-level alignment without extra annotations. ii) a Cross-category Motion Transfer Module, which leverages the predictions from the 4D reconstruction module and uses a kinematic-chain-based skeleton to achieve cross-category motion transfer. It ensures smooth transitions between frames through dynamic rigidity, facilitating robust generalization without additional training. Through extensive experiments, we demonstrate that MagicPose4D significantly improves the accuracy and consistency of 4D content generation, outperforming existing methods in various benchmarks.

Submitted 22 May, 2024; originally announced May 2024.

Comments: Project Page: https://boese0601.github.io/magicpose4d

arXiv:2403.10737 [pdf, other]

Leveraging Synthetic Data for Generalizable and Fair Facial Action Unit Detection

Authors: Liupei Lu, Yufeng Yin, Yuming Gu, Yizhen Wu, Pratusha Prasad, Yajie Zhao, Mohammad Soleymani

Abstract: Facial action unit (AU) detection is a fundamental block for objective facial expression analysis. Supervised learning approaches require a large amount of manual labeling, which is costly. The limited labeled data are also not diverse in terms of gender, which can affect model fairness. In this paper, we propose to use synthetically generated data and multi-source domain adaptation (MSDA) to address the problems of the scarcity of labeled data and the diversity of subjects. Specifically, we propose to generate a diverse dataset through synthetic facial expression re-targeting by transferring the expressions from real faces to synthetic avatars. Then, we use MSDA to transfer the AU detection knowledge from a real dataset and the synthetic dataset to a target dataset. Instead of aligning the overall distributions of different domains, we propose Paired Moment Matching (PM2) to align the features of the paired real and synthetic data with the same facial expression. To further improve gender fairness, PM2 matches the features of the real data with a female and a male synthetic image. Our results indicate that synthetic data and the proposed model improve both AU detection performance and fairness across genders, demonstrating its potential to solve AU detection in-the-wild.

Submitted 15 March, 2024; originally announced March 2024.

Comments: The work was done in 2021

arXiv:2403.09069 [pdf, other]

Dyadic Interaction Modeling for Social Behavior Generation

Authors: Minh Tran, Di Chang, Maksim Siniukov, Mohammad Soleymani

Abstract: Human-human communication is like a delicate dance where listeners and speakers concurrently interact to maintain conversational dynamics. Hence, an effective model for generating listener nonverbal behaviors requires understanding the dyadic context and interaction. In this paper, we present an effective framework for creating 3D facial motions in dyadic interactions. Existing work considers a listener as a reactive agent with reflexive behaviors to the speaker's voice and facial motions. The heart of our framework is Dyadic Interaction Modeling (DIM), a pre-training approach that jointly models speakers' and listeners' motions through masking and contrastive learning to learn representations that capture the dyadic context. To enable the generation of non-deterministic behaviors, we encode both listener and speaker motions into discrete latent representations through VQ-VAE. The pre-trained model is further fine-tuned for motion generation. Extensive experiments demonstrate the superiority of our framework in generating listener motions, establishing a new state-of-the-art according to the quantitative measures capturing the diversity and realism of generated motions. Qualitative results demonstrate the superior capabilities of the proposed approach in generating diverse and realistic expressions, eye blinks and head gestures.

Submitted 26 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2402.16434 [pdf, other]

Optimization of the Downlink Spectral- and Energy-Efficiency of RIS-aided Multi-user URLLC MIMO Systems

Authors: Mohammad Soleymani, Ignacio Santamaria, Eduard Jorswieck, Robert Schober, Lajos Hanzo

Abstract: Modern wireless communication systems are expected to provide improved latency and reliability. To meet these expectations, a short packet length is needed, which makes the first-order Shannon rate an inaccurate performance metric for such communication systems. A more accurate approximation of the achievable rates of finite-block-length (FBL) coding regimes is known as the normal approximation (NA). It is therefore of substantial interest to study the optimization of the FBL rate in multi-user multiple-input multiple-output (MIMO) systems, in which each user may transmit and/or receive multiple data streams. Hence, we formulate a general optimization problem for improving the spectral and energy efficiency of multi-user MIMO-aided ultra-reliable low-latency communication (URLLC) systems, which are assisted by reconfigurable intelligent surfaces (RISs). We show that a RIS is capable of substantially improving the performance of multi-user MIMO-aided URLLC systems. Moreover, the benefits of RIS increase as the packet length and/or the tolerable bit error rate are reduced. This reveals that RISs can be even more beneficial in URLLC systems for improving the FBL rates than in conventional systems approaching Shannon rates.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.15513 [pdf, other]

doi 10.1109/BIBM58861.2023.10385292

Investigating the Generalizability of Physiological Characteristics of Anxiety

Authors: Emily Zhou, Mohammad Soleymani, Maja J. Matarić

Abstract: Recent works have demonstrated the effectiveness of machine learning (ML) techniques in detecting anxiety and stress using physiological signals, but it is unclear whether ML models are learning physiological features specific to stress. To address this ambiguity, we evaluated the generalizability of physiological features that have been shown to be correlated with anxiety and stress to high-arousal emotions. Specifically, we examine features extracted from electrocardiogram (ECG) and electrodermal (EDA) signals from the following three datasets: Anxiety Phases Dataset (APD), Wearable Stress and Affect Detection (WESAD), and the Continuously Annotated Signals of Emotion (CASE) dataset. We aim to understand whether these features are specific to anxiety or general to other high-arousal emotions through a statistical regression analysis, in addition to a within-corpus, cross-corpus, and leave-one-corpus-out cross-validation across instances of stress and arousal. We used the following classifiers: Support Vector Machines, LightGBM, Random Forest, XGBoost, and an ensemble of the aforementioned models. We found that models trained on an arousal dataset perform relatively well on a previously unseen stress dataset, and vice versa. Our experimental results suggest that the evaluated models may be identifying emotional arousal instead of stress. This work is the first cross-corpus evaluation across stress and arousal from ECG and EDA signals, contributing new findings about the generalizability of stress detection.

Submitted 23 January, 2024; originally announced February 2024.

Journal ref: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2023, pp. 4848-4855

arXiv:2402.01647 [pdf, other]

Build Your Own Robot Friend: An Open-Source Learning Module for Accessible and Engaging AI Education

Authors: Zhonghao Shi, Allison O'Connell, Zongjian Li, Siqi Liu, Jennifer Ayissi, Guy Hoffman, Mohammad Soleymani, Maja J. Matarić

Abstract: As artificial intelligence (AI) is playing an increasingly important role in our society and global economy, AI education and literacy have become necessary components in college and K-12 education to prepare students for an AI-powered society. However, current AI curricula have not yet been made accessible and engaging enough for students and schools from all socio-economic backgrounds with different educational goals. In this work, we developed an open-source learning module for college and high school students, which allows students to build their own robot companion from the ground up. This open platform can be used to provide hands-on experience and introductory knowledge about various aspects of AI, including robotics, machine learning (ML), software engineering, and mechanical engineering. Because of the social and personal nature of a socially assistive robot companion, this module also puts a special emphasis on human-centered AI, enabling students to develop a better understanding of human-AI interaction and AI ethics through hands-on learning activities. With open-source documentation, assembly manuals and affordable materials, students from different socio-economic backgrounds can personalize their learning experience based on their individual educational goals. To evaluate the student-perceived quality of our module, we conducted a usability testing workshop with 15 college students recruited from a minority-serving institution. Our results indicate that our AI module is effective, easy-to-follow, and engaging, and it increases student interest in studying AI/ML and robotics in the future. We hope that this work will contribute toward accessible and engaging AI education in human-AI interaction for college and high school students.

Submitted 6 January, 2024; originally announced February 2024.

Comments: Accepted to the Proceedings of the AAAI Conference on Artificial Intelligence (2024)

arXiv:2401.11921 [pdf, other]

Maximizing Spectral and Energy Efficiency in Multi-user MIMO OFDM Systems with RIS and Hardware Impairment

Authors: Mohammad Soleymani, Ignacio Santamaria, Aydin Sezgin, Eduard Jorswieck

Abstract: An emerging technology to enhance the spectral efficiency (SE) and energy efficiency (EE) of wireless communication systems is reconfigurable intelligent surface (RIS), which is shown to be very powerful in single-carrier systems. However, in multi-user orthogonal frequency division multiplexing (OFDM) systems, RIS may not be as promising as in single-carrier systems since an independent optimization of RIS elements at each sub-carrier is impossible in multi-carrier systems. Thus, this paper investigates the performance of various RIS technologies like regular (reflective and passive), simultaneously transmit and reflect (STAR), and multi-sector beyond diagonal (BD) RIS in multi-user multiple-input multiple-output (MIMO) OFDM broadcast channels (BC). This requires formulating and solving a joint MIMO precoding and RIS optimization problem. The obtained solution reveals that RIS can significantly improve the system performance even when the number of RIS elements is relatively low. Moreover, we develop resource allocation schemes for STAR-RIS and multi-sector BD-RIS in MIMO OFDM BCs, and show that these RIS technologies can outperform a regular RIS, especially when the regular RIS cannot assist the communications for all the users.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2311.12052 [pdf, other]

MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion

Authors: Di Chang, Yichun Shi, Quankai Gao, Jessica Fu, Hongyi Xu, Guoxian Song, Qing Yan, Yizhe Zhu, Xiao Yang, Mohammad Soleymani

Abstract: In this work, we propose MagicPose, a diffusion-based model for 2D human pose and facial expression retargeting. Specifically, given a reference image, we aim to generate a person's new images by controlling the poses and facial expressions while keeping the identity unchanged. To this end, we propose a two-stage training strategy to disentangle human motions and appearance (e.g., facial expressions, skin tone and dressing), consisting of (1) the pre-training of an appearance-control block and (2) learning appearance-disentangled pose control. Our novel design enables robust appearance control over generated human images, including body, facial attributes, and even background. By leveraging the prior knowledge of image diffusion models, MagicPose generalizes well to unseen human identities and complex poses without the need for additional fine-tuning. Moreover, the proposed model is easy to use and can be considered as a plug-in module/extension to Stable Diffusion. The code is available at: https://github.com/Boese0601/MagicDance

Submitted 5 May, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

Comments: Accepted by ICML 2024. MagicPose and MagicDance are the same project. Website: https://boese0601.github.io/magicdance/ Code: https://github.com/Boese0601/MagicDance

arXiv:2310.08289 [pdf, other]

Maximization of minimum rate in MIMO OFDM RIS-assisted Broadcast Channels

Authors: Mohammad Soleymani, Ignacio Santamaria, Aydin Sezgin, Eduard Jorswieck

Abstract: Reconfigurable intelligent surface (RIS) is a promising technology to enhance the spectral efficiency of wireless communication systems. By optimizing the RIS elements, the performance of the overall system can be improved. Yet, in contrast to single-carrier systems, in multi-carrier systems, it is not possible to independently optimize RIS elements at each sub-carrier, which may reduce the benefits of RIS in multi-user orthogonal frequency division multiplexing (OFDM) systems. To this end, we investigate the effectiveness of RIS in multiple-input, multiple-output (MIMO) OFDM broadcast channels (BC). We formulate and solve a joint precoding and RIS optimization problem. We show that RIS can significantly improve the system performance even when the number of RIS elements per sub-band is very low.

Submitted 12 October, 2023; originally announced October 2023.

Comments: Accepted at IEEE CAMSAP 2023

arXiv:2309.02418 [pdf, other]

Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition

Authors: Minh Tran, Yufeng Yin, Mohammad Soleymani

Abstract: There are individual differences in expressive behaviors driven by cultural norms and personality. This between-person variation can result in reduced emotion recognition performance. Therefore, personalization is an important step in improving the generalization and robustness of speech emotion recognition. In this paper, to achieve unsupervised personalized emotion recognition, we first pre-train an encoder with learnable speaker embeddings in a self-supervised manner to learn robust speech representations conditioned on speakers. Second, we propose an unsupervised method to compensate for the label distribution shifts by finding similar speakers and leveraging their label distributions from the training set. Extensive experimental results on the MSP-Podcast corpus indicate that our method consistently outperforms strong personalization baselines and achieves state-of-the-art performance for valence estimation.

Submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted by INTERSPEECH 2023

arXiv:2308.12544 [pdf, other]

Analog Multi-Party Computing: Locally Differential Private Protocols for Collaborative Computations

Authors: Hsuan-Po Liu, Mahdi Soleymani, Hessam Mahdavifar

Abstract: We consider a fully-decentralized scenario in which no central trusted entity exists and all clients are honest-but-curious. The state-of-the-art approaches to this problem often rely on cryptographic protocols, such as multiparty computation (MPC), that require mapping real-valued data to a discrete alphabet, specifically a finite field. These approaches, however, can result in substantial accuracy losses due to computation overflows. To address this issue, we propose A-MPC, a private analog MPC protocol that performs all computations in the analog domain. We characterize the privacy of individual datasets in terms of $(ε,δ)$-local differential privacy, where the privacy of a single record in each client's dataset is guaranteed against other participants. In particular, we characterize the required noise variance in the Gaussian mechanism in terms of the required $(ε,δ)$-local differential privacy parameters by solving an optimization problem. Furthermore, compared with existing decentralized protocols, A-MPC keeps the privacy of individual datasets against the collusion of all other participants, thereby increasing the maximum number of colluding clients tolerated in the protocol by a factor of three compared with the state-of-the-art collaborative learning protocols. Our experiments illustrate that the accuracy of the proposed $(ε,δ)$-locally differential private logistic regression and linear regression models trained in a fully-decentralized fashion using A-MPC closely follows that of a centralized one performed by a single trusted entity.

Submitted 18 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.12380 [pdf, other]

FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features

Authors: Yufeng Yin, Di Chang, Guoxian Song, Shen Sang, Tiancheng Zhi, Jing Liu, Linjie Luo, Mohammad Soleymani

Abstract: Automatic detection of facial Action Units (AUs) allows for objective facial expression analysis. Due to the high cost of AU labeling and the limited size of existing benchmarks, previous AU detection methods tend to overfit the dataset, resulting in a significant performance loss when evaluated across corpora. To address this problem, we propose FG-Net for generalizable facial action unit detection. Specifically, FG-Net extracts feature maps from a StyleGAN2 model pre-trained on a large and diverse face image dataset. Then, these features are used to detect AUs with a Pyramid CNN Interpreter, making the training efficient and capturing essential local features. The proposed FG-Net achieves a strong generalization ability for heatmap-based AU detection thanks to the generalizable and semantic-rich features extracted from the pre-trained generative model. Extensive experiments are conducted to evaluate within- and cross-corpus AU detection with the widely-used DISFA and BP4D datasets. Compared with the state-of-the-art, the proposed method achieves superior cross-domain performance while maintaining competitive within-domain performance. In addition, FG-Net is data-efficient and achieves competitive performance even when trained on 1000 samples. Our code will be released at https://github.com/ihp-lab/FG-Net

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.11078 [pdf, other]

Matrix Completion over Finite Fields: Bounds and Belief Propagation Algorithms

Authors: Mahdi Soleymani, Qiang Liu, Hessam Mahdavifar, Laura Balzano

Abstract: We consider the low rank matrix completion problem over finite fields. This problem has been extensively studied in the domain of real/complex numbers; however, to the best of the authors' knowledge, there exists merely one efficient algorithm to tackle the problem in the binary field, due to Saunderson et al. [1]. In this paper, we improve upon the theoretical guarantees for the algorithm provided in [1]. Furthermore, we formulate a new graphical model for the matrix completion problem over the finite field of size $q$, $\mathbb{F}_q$, and present a message passing (MP) based approach to solve this problem. The proposed algorithm is the first one for the considered matrix completion problem over finite fields of arbitrary size. Our proposed method has a significantly lower computational complexity, reducing it from $O(n^{2r+3})$ in [1] down to $O(n^2)$ (where the underlying matrix has dimension $n \times n$ and $r$ denotes its rank), while also improving the performance.

Submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.10713 [pdf, other]

LibreFace: An Open-Source Toolkit for Deep Facial Expression Analysis

Authors: Di Chang, Yufeng Yin, Zongjian Li, Minh Tran, Mohammad Soleymani

Abstract: Facial expression analysis is an important tool for human-computer interaction. In this paper, we introduce LibreFace, an open-source toolkit for facial expression analysis. This open-source toolbox offers real-time and offline analysis of facial behavior through deep learning models, including facial action unit (AU) detection, AU intensity estimation, and facial expression recognition. To accomplish this, we employ several techniques, including the utilization of a large-scale pre-trained network, feature-wise knowledge distillation, and task-specific fine-tuning. These approaches are designed to effectively and accurately analyze facial expressions by leveraging visual information, thereby facilitating the implementation of real-time interactive applications. In terms of Action Unit (AU) intensity estimation, we achieve a Pearson Correlation Coefficient (PCC) of 0.63 on DISFA, which is 7% higher than the performance of OpenFace 2.0 while maintaining highly-efficient inference that runs two times faster than OpenFace 2.0. Despite being compact, our model also demonstrates competitive performance to state-of-the-art facial expression analysis methods on AffectNet, FFHQ, and RAF-DB. Our code will be released at https://github.com/ihp-lab/LibreFace

Submitted 23 August, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

Comments: 10 pages, 5 figures. Accepted by WACV 2024 Round 1. (Application Track) Project Page: https://boese0601.github.io/libreface/

arXiv:2308.02696 [pdf, other]

NOMA-based Improper Signaling for MIMO STAR-RIS-assisted Broadcast Channels with Hardware Impairments

Authors: Mohammad Soleymani, Ignacio Santamaria, Eduard Jorswieck

Abstract: This paper proposes schemes to improve the spectral efficiency of a multiple-input multiple-output (MIMO) broadcast channel (BC) with I/Q imbalance (IQI) at transceivers by employing a combination of improper Gaussian signaling (IGS), non-orthogonal multiple access (NOMA) and simultaneously transmit and reflect (STAR) reconfigurable intelligent surface (RIS). When there exists IQI, the output RF signal is a widely linear transformation of the input signal, which may make the output signal improper. To compensate for IQI, we employ IGS, thus generating an improper transmit signal. We show that IGS together with NOMA can greatly increase the minimum rate of the users. Moreover, we propose schemes for different operational modes of STAR-RIS and show that STAR-RIS can significantly improve the system performance. Additionally, we show that IQI can severely degrade the performance, especially if it is overlooked in the design.

Submitted 4 August, 2023; originally announced August 2023.

Comments: IEEE GLOBECOM 2023

arXiv:2307.10707 [pdf, other]

doi 10.1109/LSP.2023.3296902

SNR Maximization in Beyond Diagonal RIS-assisted Single and Multiple Antenna Links

Authors: Ignacio Santamaria, Mohammad Soleymani, Eduard Jorswieck, Jesus Gutierrez

Abstract: Reconfigurable intelligent surface (RIS) architectures not limited to diagonal phase shift matrices have recently been considered to increase their flexibility in shaping the wireless channel. One of these beyond-diagonal RIS or BD-RIS architectures leads to a unitary and symmetric RIS matrix. In this letter, we consider the problem of maximizing the signal-to-noise ratio (SNR) in single and multiple antenna links assisted by a BD-RIS. The Max-SNR problem admits a closed-form solution based on the Takagi factorization of a certain complex and symmetric matrix. This allows us to solve the max-SNR problem for SISO, SIMO, and MISO channels.

Submitted 20 July, 2023; originally announced July 2023.

Comments: 5 pages, 2 figures

Journal ref: IEEE Signal Processing Letters, 2023

arXiv:2307.05295 [pdf, other]

doi 10.1109/TWC.2023.3324190

Optimization of Rate-Splitting Multiple Access in Beyond Diagonal RIS-assisted URLLC Systems

Authors: Mohammad Soleymani, Ignacio Santamaria, Eduard Jorswieck, Bruno Clerckx

Abstract: This paper proposes a general optimization framework for rate splitting multiple access (RSMA) in beyond diagonal (BD) reconfigurable intelligent surface (RIS) assisted ultra-reliable low-latency communications (URLLC) systems. This framework can provide a suboptimal solution for a large family of optimization problems in which the objective and/or constraints are linear functions of the rates and/or energy efficiency (EE) of users. Using this framework, we show that RSMA and RIS can be mutually beneficial tools when the system is overloaded, i.e., when the number of users per cell is higher than the number of base station (BS) antennas. Additionally, we show that the benefits of RSMA increase when the packets are shorter and/or the reliability constraint is more stringent. Furthermore, we show that the RSMA benefits increase with the number of users per cell and decrease with the number of BS antennas. Finally, we show that RIS (either diagonal or BD) can substantially improve the system performance, and BD-RIS outperforms regular RIS.

Submitted 13 October, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

Comments: Accepted at IEEE Transactions on Wireless Communications

arXiv:2306.01309 [pdf, other]

Energy-efficient Rate Splitting for MIMO STAR-RIS-assisted Broadcast Channels with I/Q Imbalance

Authors: Mohammad Soleymani, Ignacio Santamaria, Eduard Jorswieck

Abstract: This paper proposes an energy-efficient scheme for multicell multiple-input, multiple-output (MIMO) simultaneous transmit and reflect (STAR) reconfigurable intelligent surfaces (RIS)-assisted broadcast channels by employing rate splitting (RS) and improper Gaussian signaling (IGS). Regular RISs can only reflect signals. Thus, a regular RIS can assist only when the transmitter and receiver are in the reflection space of the RIS. However, a STAR-RIS can simultaneously transmit and reflect, thus providing 360-degree coverage. In this paper, we assume that transceivers may suffer from I/Q imbalance (IQI). To compensate for IQI, we employ IGS. Moreover, we employ RS to manage intracell interference. We show that RIS can significantly improve the energy efficiency (EE) of the system when RIS components are carefully optimized. Additionally, we show that STAR-RIS can significantly outperform a regular RIS when the regular RIS cannot cover all the users. We also show that RS can substantially increase the EE compared to treating interference as noise.

Submitted 2 June, 2023; originally announced June 2023.

Comments: Accepted at the 31st European Signal Processing Conference (EUSIPCO 2023)

arXiv:2303.10590 [pdf, other]

Multi-modal Facial Action Unit Detection with Large Pre-trained Models for the 5th Competition on Affective Behavior Analysis in-the-wild

Authors: Yufeng Yin, Minh Tran, Di Chang, Xinrui Wang, Mohammad Soleymani

Abstract: Facial action unit detection has emerged as an important task within facial expression analysis, aimed at detecting specific pre-defined, objective facial expressions, such as lip tightening and cheek raising. This paper presents our submission to the Affective Behavior Analysis in-the-wild (ABAW) 2023 Competition for AU detection. We propose a multi-modal method for facial action unit detection with visual, acoustic, and lexical features extracted from large pre-trained models. To provide high-quality details for visual feature extraction, we apply super-resolution and face alignment to the training data and show potential performance gain. Our approach achieves an F1 score of 52.3% on the official validation set of the 5th ABAW Challenge.

Submitted 17 April, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

Comments: 8 pages, 7 figures, 5 tables

arXiv:2303.03014 [pdf, other]

Interference Leakage Minimization in RIS-assisted MIMO Interference Channels

Authors: Ignacio Santamaria, Mohammad Soleymani, Eduard Jorswieck, Jesus Gutierrez

Abstract: We address the problem of interference leakage (IL) minimization in the $K$-user multiple-input multiple-output (MIMO) interference channel (IC) assisted by a reconfigurable intelligent surface (RIS). We describe an iterative algorithm based on block coordinate descent to minimize the IL cost function. A reformulation of the problem provides a geometric interpretation and shows interesting connections with envelope precoding and phase-only zero-forcing beamforming problems. As a result of this analysis, we derive a set of necessary (but not sufficient) conditions for a phase-optimized RIS to be able to perfectly cancel the interference on the $K$-user MIMO IC.

Submitted 6 March, 2023; originally announced March 2023.

Comments: Accepted at ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2301.00594 [pdf, other]

Rate Region of MIMO RIS-assisted Broadcast Channels with Rate Splitting and Improper Signaling

Authors: Mohammad Soleymani, Ignacio Santamaria, Eduard Jorswieck

Abstract: In this paper, we study the achievable rate region of 1-layer rate splitting (RS) in the presence of hardware impairment (HWI) and improper Gaussian signaling (IGS) for a single-cell reconfigurable intelligent surface (RIS) assisted broadcast channel (BC). We assume that the transceivers may suffer from an imbalance in in-phase and quadrature signals, which is known as I/Q imbalance (IQI). The received signal and noise can be improper when there exists IQI. Therefore, we employ IGS to compensate for IQI as well as to manage interference. Our results show that RS and RIS can significantly enlarge the rate region, where the role of RS is to manage interference while RIS mainly improves the coverage.

Submitted 2 January, 2023; originally announced January 2023.

arXiv:2301.00583 [pdf, other]

doi 10.1109/ACCESS.2023.3294092

Spectral and Energy Efficiency Maximization of MISO STAR-RIS-assisted URLLC Systems

Authors: Mohammad Soleymani, Ignacio Santamaria, Eduard Jorswieck

Abstract: This paper proposes a general optimization framework to improve the spectral and energy efficiency (EE) of ultra-reliable low-latency communication (URLLC) simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS)-assisted interference-limited systems with finite block length (FBL). This framework can solve a large variety of optimization problems in which the objective and/or constraints are linear functions of the rates and/or EE of users. Additionally, the framework can be applied to any interference-limited system in which receivers treat interference as noise when decoding. We consider a multi-cell broadcast channel as an example and show how this framework can be specialized to optimize the minimum-weighted rate, weighted sum rate, global EE and weighted EE of the system. We make realistic assumptions regarding the (STAR-)RIS by considering three different feasibility sets for the components of either regular RIS or STAR-RIS. Our results show that RIS can substantially increase the spectral efficiency and EE of URLLC systems if the reflecting coefficients are properly optimized. Moreover, we consider three different transmission strategies for STAR-RIS: energy splitting (ES), mode switching (MS), and time switching (TS). We show that STAR-RIS can outperform a regular RIS when the regular RIS cannot cover all the users. Furthermore, it is shown that the ES scheme outperforms the MS and TS schemes.

Submitted 10 July, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

Comments: Accepted at IEEE Access

arXiv:2210.15760 [pdf]

Towards Improving Workers' Safety and Progress Monitoring of Construction Sites Through Construction Site Understanding

Authors: Mahdi Bonyani, Maryam Soleymani

Abstract: An important component of computer vision research is object detection. In recent years, there has been tremendous progress in the study of construction site images. However, there are obvious problems in construction object detection, including complex backgrounds, varying-sized objects, and poor imaging quality. In the state-of-the-art approaches, elaborate attention mechanisms are developed to handle space-time features, but rarely address the importance of channel-wise feature adjustments. We propose a lightweight Optimized Positioning (OP) module to improve channel relation based on global feature affinity association, which can be used to determine the optimized weights adaptively for each channel. OP first computes the intermediate optimized position by comparing each channel with the remaining channels for a given set of feature maps. A weighted aggregation of all the channels will then be used to represent each channel. The OP-Net module is a general deep neural network module that can be plugged into any deep neural network. Algorithms that utilize deep learning have demonstrated their ability to identify a wide range of objects from images nearly in real time. Machine intelligence can potentially benefit the construction industry by automatically analyzing productivity and monitoring safety using algorithms that are linked to construction images. The benefits of on-site automatic monitoring are immense when it comes to hazard prevention. Construction monitoring tasks can also be automated once construction objects have been correctly recognized. We experiment extensively with object detection on construction site images to demonstrate the efficacy of the proposed module. A benchmark test using SODA demonstrated that our OP-Net was capable of achieving new state-of-the-art performance in accuracy while maintaining a reasonable computational overhead.

Submitted 27 October, 2022; originally announced October 2022.

arXiv:2208.08753 [pdf, ps, other]

doi 10.1109/TVT.2022.3222633

Rate Splitting in MIMO RIS-assisted Systems with Hardware Impairments and Improper Signaling

Authors: Mohammad Soleymani, Ignacio Santamaria, Eduard Jorswieck

Abstract: In this paper, we propose an optimization framework for rate splitting (RS) techniques in multiple-input multiple-output (MIMO) reconfigurable intelligent surface (RIS)-assisted systems, possibly with I/Q imbalance (IQI). This framework can be applied to any optimization problem in which the objective and/or constraints are linear functions of the rates and/or transmit covariance matrices. Such problems include minimum-weighted and weighted-sum rate maximization, total power minimization for a target rate, minimum-weighted energy efficiency (EE) and global EE maximization. The framework may be applied to any interference-limited system with hardware impairments. For the sake of illustration, we consider a multicell MIMO RIS-assisted broadcast channel (BC) in which the base stations (BSs) and/or the users may suffer from IQI. Since IQI generates improper noise, we consider improper Gaussian signaling (IGS) as an interference-management technique that can additionally compensate for IQI. We show that RS when combined with IGS can substantially improve the spectral and energy efficiency of overloaded networks (i.e., when the number of users per cell is larger than the number of transmit/receive antennas).

Submitted 15 November, 2022; v1 submitted 18 August, 2022; originally announced August 2022.

Comments: Accepted at IEEE Transactions on Vehicular Technology

arXiv:2208.08087 [pdf]

Autonomous Resource Management in Construction Companies Using Deep Reinforcement Learning Based on IoT

Authors: Maryam Soleymani, Mahdi Bonyani, Meghdad Attarzadeh

Abstract: Resource allocation is one of the most critical issues in planning construction projects, due to its direct impact on cost, time, and quality. There are usually specific allocation methods for autonomous resource management according to the project's objectives. However, integrated planning and optimization of utilizing resources in an entire construction organization are scarce. The purpose of this study is to present an automatic resource allocation structure for construction companies based on Deep Reinforcement Learning (DRL), which can be used in various situations. In this structure, Data Harvesting (DH) gathers resource information from the distributed Internet of Things (IoT) sensor devices all over the company's projects to be employed in the autonomous resource management approach. Then, Coverage Resources Allocation (CRA) is compared to the information obtained from DH in which the Autonomous Resource Management (ARM) determines the project of interest. Likewise, Double Deep Q-Networks (DDQNs) with similar models are trained on two distinct assignment situations based on structured resource information of the company to balance objectives with resource constraints. The suggested technique in this paper can efficiently adjust to large resource management systems by combining portfolio information with adopted individual project information. Also, the effects of important information processing parameters on resource allocation performance are analyzed in detail. Moreover, the results of the generalizability of management approaches are presented, indicating no need for additional training when the variables of situations change.

Submitted 6 September, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

arXiv:2206.03795 [pdf, other]

doi 10.1109/TSP.2023.3259145

NOMA-based Improper Signaling for Multicell MISO RIS-assisted Broadcast Channels

Authors: Mohammad Soleymani, Ignacio Santamaria, Eduard Jorswieck, Sepehr Rezvani

Abstract: In this paper, we study the performance of reconfigurable intelligent surfaces (RISs) in a multicell broadcast channel (BC) that employs improper Gaussian signaling (IGS) jointly with non-orthogonal multiple access (NOMA) to optimize either the minimum-weighted rate or the energy efficiency (EE) of the network. We show that although the RIS can significantly improve the system performance, it cannot mitigate interference completely, so we have to employ other interference-management techniques to further improve performance. We show that the proposed NOMA-based IGS scheme can substantially outperform proper Gaussian signaling (PGS) and IGS schemes that treat interference as noise (TIN), in particular when the number of users per cell is larger than the number of base station (BS) antennas (referred to as overloaded networks). In other words, IGS and NOMA complement each other as interference management techniques in multicell RIS-assisted BCs. Furthermore, we consider three different feasibility sets for the RIS components, showing that even a RIS with a small number of elements provides considerable gains for all the feasibility sets.

Submitted 15 March, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

Comments: Accepted at IEEE Transactions on Signal Processing

arXiv:2203.14171 [pdf, other]

A Speech Representation Anonymization Framework via Selective Noise Perturbation

Authors: Minh Tran, Mohammad Soleymani

Abstract: Privacy and security are major concerns when communicating speech signals to cloud services such as automatic speech recognition (ASR) and speech emotion recognition (SER). Existing solutions for speech anonymization mainly focus on voice conversion or voice modification to convert a raw utterance into another one with similar content but different, or no, identity-related information. However, an alternative approach, sharing speech data in the form of privacy-preserving representations, has been largely under-explored. In this paper, we propose a speech anonymization framework that achieves privacy via noise perturbation to a selected subset of the high-utility representations extracted using a pre-trained speech encoder. The subset is chosen with a Transformer-based privacy-risk saliency estimator. We validate our framework on four tasks, namely, Automatic Speaker Verification (ASV), ASR, SER and Intent Classification (IC) for privacy and utility assessment. Experimental results show that our approach is able to achieve a competitive, or even better, utility compared to the speech anonymization baselines from the VoicePrivacy2022 Challenges, providing the same level of privacy. Moreover, the easily-controlled amount of perturbation allows our framework to have a flexible range of privacy-utility trade-offs without re-training any component.

Submitted 27 October, 2022; v1 submitted 26 March, 2022; originally announced March 2022.

arXiv:2202.09914 [pdf, other]

SOInter: A Novel Deep Energy Based Interpretation Method for Explaining Structured Output Models

Authors: S. Fatemeh Seyyedsalehi, Mahdieh Soleymani, Hamid R. Rabiee

Abstract: We propose a novel interpretation technique to explain the behavior of structured output models, which learn mappings from an input vector to a set of output variables simultaneously. Because of the complex relationship between the computational path of output variables in structured models, a feature can affect the value of output through other ones. We focus on one of the outputs as the target and try to find the most important features utilized by the structured model to decide on the target in each locality of the input space. In this paper, we assume an arbitrary structured output model is available as a black box and argue how considering the correlations between output variables can improve the explanation performance. The goal is to train a function as an interpreter for the target output variable over the input space. We introduce an energy-based training process for the interpreter function, which effectively considers the structural information incorporated into the model to be explained. The effectiveness of the proposed method is confirmed using a variety of simulated and real data sets.

Submitted 20 February, 2022; originally announced February 2022.

arXiv:2201.09165 [pdf, other]

A Pre-trained Audio-Visual Transformer for Emotion Recognition

Authors: Minh Tran, Mohammad Soleymani

Abstract: In this paper, we introduce a pretrained audio-visual Transformer trained on more than 500k utterances from nearly 4000 celebrities from the VoxCeleb2 dataset for human behavior understanding. The model aims to capture and extract useful information from the interactions between human facial and auditory behaviors, with application in emotion recognition. We evaluate the model performance on two datasets, namely CREMA-D (emotion classification) and MSP-IMPROV (continuous emotion regression). Experimental results show that fine-tuning the pre-trained model helps improve emotion classification accuracy by 5-7% and Concordance Correlation Coefficients (CCC) in continuous emotion recognition by 0.03-0.09 compared to the same model trained from scratch. We also demonstrate the robustness of fine-tuning the pre-trained model in a low-resource setting. With only 10% of the original training set provided, fine-tuning the pre-trained model can lead to at least 10% better emotion recognition accuracy and a CCC score improvement by at least 0.1 for continuous emotion recognition.

Submitted 22 January, 2022; originally announced January 2022.

Comments: Accepted by IEEE ICASSP 2022

arXiv:2109.09868 [pdf, other]

ApproxIFER: A Model-Agnostic Approach to Resilient and Robust Prediction Serving Systems

Authors: Mahdi Soleymani, Ramy E. Ali, Hessam Mahdavifar, A. Salman Avestimehr

Abstract: Due to the surge of cloud-assisted AI services, the problem of designing resilient prediction serving systems that can effectively cope with stragglers/failures and minimize response delays has attracted much interest. The common approach for tackling this problem is replication, which assigns the same prediction task to multiple workers. This approach, however, is very inefficient and incurs significant resource overheads. Hence, a learning-based approach known as parity model (ParM) has been recently proposed which learns models that can generate parities for a group of predictions in order to reconstruct the predictions of the slow/failed workers. While this learning-based approach is more resource-efficient than replication, it is tailored to the specific model hosted by the cloud and is particularly suitable for a small number of queries (typically fewer than four) and for tolerating very few stragglers (mostly one). Moreover, ParM does not handle Byzantine adversarial workers. We propose a different approach, named Approximate Coded Inference (ApproxIFER), that does not require training of any parity models, hence it is agnostic to the model hosted by the cloud and can be readily applied to different data domains and model architectures. Compared with earlier works, ApproxIFER can handle a general number of stragglers and scales significantly better with the number of queries. Furthermore, ApproxIFER is robust against Byzantine workers. Our extensive experiments on a large number of datasets and model architectures also show significant accuracy improvement by up to 58% over the parity model approaches.

Submitted 20 September, 2021; originally announced September 2021.

arXiv:2109.05056 [pdf, other]

Speaker Turn Modeling for Dialogue Act Classification

Authors: Zihao He, Leili Tavabi, Kristina Lerman, Mohammad Soleymani

Abstract: Dialogue Act (DA) classification is the task of classifying utterances with respect to the function they serve in a dialogue. Existing approaches to DA classification model utterances without incorporating the turn changes among speakers throughout the dialogue, therefore treating dialogue no differently from non-interactive written text. In this paper, we propose to integrate the turn changes in conversations among speakers when modeling DAs. Specifically, we learn conversation-invariant speaker turn embeddings to represent the speaker turns in a conversation; the learned speaker turn embeddings are then merged with the utterance embeddings for the downstream task of DA classification. With this simple yet effective mechanism, our model is able to capture the semantics from the dialogue content while accounting for different speaker turns in a conversation. Validation on three benchmark public datasets demonstrates superior performance of our model.

Submitted 10 September, 2021; originally announced September 2021.

arXiv:2108.09934 [pdf, other]

Modeling Dynamics of Facial Behavior for Mental Health Assessment

Authors: Minh Tran, Ellen Bradley, Michelle Matvey, Joshua Woolley, Mohammad Soleymani

Abstract: Facial action unit (FAU) intensities are popular descriptors for the analysis of facial behavior. However, FAUs are sparsely represented when only a few are activated at a time. In this study, we explore the possibility of representing the dynamics of facial expressions by adopting algorithms used for word representation in natural language processing. Specifically, we perform clustering on a large dataset of temporal facial expressions with 5.3M frames before applying the Global Vector representation (GloVe) algorithm to learn the embeddings of the facial clusters. We evaluate the usefulness of our learned representations on two downstream tasks: schizophrenia symptom estimation and depression severity regression. These experimental results show the potential effectiveness of our approach for improving the assessment of mental health symptoms over baseline models that use FAU intensities alone.

Submitted 23 August, 2021; originally announced August 2021.

Comments: Accepted to FG 2021

arXiv:2108.09527 [pdf]

Construction material classification on imbalanced datasets using Vision Transformer (ViT) architecture

Authors: Maryam Soleymani, Mahdi Bonyani, Hadi Mahami, Farnad Nasirzadeh

Abstract: This research proposes a reliable model for identifying different construction materials with the highest accuracy, which is exploited as an advantageous tool for a wide range of construction applications such as automated progress monitoring. In this study, a novel deep learning architecture called Vision Transformer (ViT) is used for detecting and classifying construction materials. The robustness of the employed method is assessed by utilizing different image datasets. For this purpose, the model is trained and tested on two large imbalanced datasets, namely Construction Material Library (CML) and Building Material Dataset (BMD). A third dataset is also generated by combining CML and BMD to create a more imbalanced dataset and assess the capabilities of the utilized method. The achieved results reveal a score of 100 percent on evaluation metrics such as accuracy, precision, recall rate, and f1-score for each material category of the three datasets. It is believed that the suggested model provides a robust tool for detecting and classifying different material types. To date, a number of studies have attempted to automatically classify a variety of building materials, but their results still contain some errors. This research addresses that shortcoming and proposes a model to detect the material type with higher accuracy. The employed model is also capable of being generalized to different datasets.

Submitted 6 September, 2022; v1 submitted 21 August, 2021; originally announced August 2021.

Comments: 18 pages, 11 figures, 7 tables

arXiv:2104.11135 [pdf, other]

doi 10.1109/MCOMSTD.001.2000070

V2X in 3GPP Standardization: NR Sidelink in Rel-16 and Beyond

Authors: Mehdi Harounabadi, Dariush Mohammad Soleymani, Shubhangi Bhadauria, Martin Leyh, Elke Roth-Mandutz

Abstract: The 5G mobile network brings several new features that can be applied to existing and new applications. High reliability, low latency, and high data rate are some of the features which fulfill the requirements of vehicular networks. Vehicular networks aim to provide safety for road users and several additional advantages such as enhanced traffic efficiency and in-vehicle infotainment services. This paper summarizes the most important aspects of NR-V2X, which is standardized by 3GPP, focusing on sidelink communication. The main part of this work belongs to the 3GPP Rel-16, which is the first 3GPP release for NR-V2X, and the work/study items of the future Rel-17.

Submitted 22 April, 2021; originally announced April 2021.

arXiv:2103.01503 [pdf, other]

Coded Computing via Binary Linear Codes: Designs and Performance Limits

Authors: Mahdi Soleymani, Mohammad Vahid Jamali, Hessam Mahdavifar

Abstract: We consider the problem of coded distributed computing where a large linear computational job, such as a matrix multiplication, is divided into $k$ smaller tasks, encoded using an $(n,k)$ linear code, and performed over $n$ distributed nodes. The goal is to reduce the average execution time of the computational job. We provide a connection between the problem of characterizing the average execution time of a coded distributed computing system and the problem of analyzing the error probability of codes of length $n$ used over erasure channels. Accordingly, we present closed-form expressions for the execution time using binary random linear codes and the best execution time any linear-coded distributed computing system can achieve. It is also shown that there exist good binary linear codes that not only attain (asymptotically) the best performance that any linear code (not necessarily binary) can achieve but also are numerically stable against the inevitable rounding errors in practice. We then develop a low-complexity algorithm for decoding Reed-Muller (RM) codes over erasure channels. Our decoder only involves additions, subtractions, and inversion of relatively small matrices of dimensions at most $\log n+1$, and enables coded computation over real-valued data. Extensive numerical analysis of the fundamental results as well as RM- and polar-coded computing schemes demonstrate the excellence of the RM-coded computation in achieving close-to-optimal performance while having a low-complexity decoding and explicit construction. The proposed framework in this paper enables efficient designs of distributed computing systems given the rich literature in the channel coding theory.

Submitted 4 October, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

Comments: Accepted for publication in IEEE Journal on Selected Areas in Information Theory. arXiv admin note: substantial text overlap with arXiv:1906.10105

arXiv:2101.11653 [pdf, other]

List-Decodable Coded Computing: Breaking the Adversarial Toleration Barrier

Authors: Mahdi Soleymani, Ramy E. Ali, Hessam Mahdavifar, A. Salman Avestimehr

Abstract: We consider the problem of coded computing, where a computational task is performed in a distributed fashion in the presence of adversarial workers. We propose techniques to break the adversarial toleration threshold barrier previously known in coded computing. More specifically, we leverage list-decoding techniques for folded Reed-Solomon codes and propose novel algorithms to recover the correct codeword using side information. In the coded computing setting, we show how the master node can perform certain carefully designed extra computations to obtain the side information. The workload of computing this side information is negligible compared to the computations done by each worker. This side information is then utilized to prune the output of the list decoder and uniquely recover the true outcome. We further propose folded Lagrange coded computing (FLCC) to incorporate the developed techniques into a specific coded computing setting. Our results show that FLCC outperforms LCC by breaking the barrier on the number of adversaries that can be tolerated. In particular, the corresponding threshold in FLCC is improved by a factor of two asymptotically compared to that of LCC.

Submitted 19 August, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

arXiv:2011.00403 [pdf, other]

Towards A Friendly Online Community: An Unsupervised Style Transfer Framework for Profanity Redaction

Authors: Minh Tran, Yipeng Zhang, Mohammad Soleymani

Abstract: Offensive and abusive language is a pressing problem on social media platforms. In this work, we propose a method for transforming offensive comments, statements containing profanity or offensive language, into non-offensive ones. We design a RETRIEVE, GENERATE and EDIT unsupervised style transfer pipeline to redact the offensive comments in a word-restricted manner while maintaining a high level of fluency and preserving the content of the original text. We extensively evaluate our method's performance and compare it to previous style transfer models using both automatic metrics and human evaluations. Experimental results show that our method outperforms other models on human evaluations and is the only approach that consistently performs well on all automatic evaluation metrics.

Submitted 31 October, 2020; originally announced November 2020.

Comments: COLING 2020

arXiv:2008.08565 [pdf, other]

Analog Lagrange Coded Computing

Authors: Mahdi Soleymani, Hessam Mahdavifar, A. Salman Avestimehr

Abstract: A distributed computing scenario is considered, where the computational power of a set of worker nodes is used to perform a certain computation task over a dataset that is dispersed among the workers. Lagrange coded computing (LCC), proposed by Yu et al., leverages the well-known Lagrange polynomial to perform polynomial evaluation of the dataset in such a scenario in an efficient parallel fashion while keeping the privacy of data amidst possible collusion of workers. This solution relies on quantizing the data into a finite field, so that Shamir's secret sharing, as one of its main building blocks, can be employed. Such a solution, however, is not properly scalable with the size of dataset, mainly due to computation overflows. To address such a critical issue, we propose a novel extension of LCC to the analog domain, referred to as analog LCC (ALCC). All the operations in the proposed ALCC protocol are done over the infinite fields of R/C but for practical implementations floating-point numbers are used. We characterize the privacy of data in ALCC, against any subset of colluding workers up to a certain size, in terms of the distinguishing security (DS) and the mutual information security (MIS) metrics. Also, the accuracy of outcome is characterized in a practical setting assuming operations are performed using floating-point numbers. Consequently, a fundamental trade-off between the accuracy of the outcome of ALCC and its privacy level is observed and is numerically evaluated. Moreover, we implement the proposed scheme to perform matrix-matrix multiplication over a batch of matrices. It is observed that ALCC is superior compared to the state-of-the-art LCC, implemented using fixed-point numbers, assuming both schemes use an equal number of bits to represent data symbols.

Submitted 29 January, 2021; v1 submitted 19 August, 2020; originally announced August 2020.

arXiv:2007.08803 [pdf, other]

Privacy-Preserving Distributed Learning in the Analog Domain

Authors: Mahdi Soleymani, Hessam Mahdavifar, A. Salman Avestimehr

Abstract: We consider the critical problem of distributed learning over data while keeping it private from the computational servers. The state-of-the-art approaches to this problem rely on quantizing the data into a finite field, so that the cryptographic approaches for secure multiparty computing can then be employed. These approaches, however, can result in substantial accuracy losses due to fixed-point representation of the data and computation overflows. To address these critical issues, we propose a novel algorithm to solve the problem when data is in the analog domain, e.g., the field of real/complex numbers. We characterize the privacy of the data from both information-theoretic and cryptographic perspectives, while establishing a connection between the two notions in the analog domain. More specifically, the well-known connection between the distinguishing security (DS) and the mutual information security (MIS) metrics is extended from the discrete domain to the continuous domain. This is then utilized to bound the amount of information about the data leaked to the servers in our protocol, in terms of the DS metric, using well-known results on the capacity of single-input multiple-output (SIMO) channel with correlated noise. It is shown how the proposed framework can be adopted to do computation tasks when data is represented using floating-point numbers. We then show that this leads to a fundamental trade-off between the privacy level of data and accuracy of the result. As an application, we also show how to train a machine learning model while keeping the data as well as the trained model private. Then numerical results are shown for experiments on the MNIST dataset. Furthermore, experimental advantages are shown compared to fixed-point implementations over finite fields.

Submitted 17 July, 2020; originally announced July 2020.

arXiv:2001.10403 [pdf, ps, other]

Improper Gaussian Signaling for the $K$-user MIMO Interference Channels with Hardware Impairments

Authors: Mohammad Soleymani, Ignacio Santamaria, Peter J. Schreier

Abstract: This paper investigates the performance of improper Gaussian signaling (IGS) for the $K$-user multiple-input, multiple-output (MIMO) interference channel (IC) with hardware impairments (HWI). HWI may arise due to imperfections in the devices like I/Q imbalance, phase noise, etc. With I/Q imbalance, the received signal is a widely linear transformation of the transmitted signal and noise. Thus, the effective noise at the receivers becomes improper, which means that its real and imaginary parts are correlated and/or have unequal powers. IGS can improve system performance with improper noise and/or improper interference. In this paper, we study the benefits of IGS for this scenario in terms of two performance metrics: achievable rate and energy efficiency (EE). We consider the rate region, the sum-rate, the EE region and the global EE optimization problems to fully evaluate the IGS performance. To solve these non-convex problems, we employ an optimization framework based on majorization-minimization algorithms, which allow us to obtain a stationary point of any optimization problem in which either the objective function and/or constraints are linear functions of rates. Our numerical results show that IGS can significantly improve the performance of the $K$-user MIMO IC with HWI and I/Q imbalance, where its benefits increase with the number of users, $K$, and the imbalance level, and decrease with the number of antennas.

Submitted 6 August, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

Comments: accepted

Journal ref: Transaction on Vehicular Technology 2020

arXiv:1911.05609 [pdf, other]

doi 10.1145/3363560

Affective Computing for Large-Scale Heterogeneous Multimedia Data: A Survey

Authors: Sicheng Zhao, Shangfei Wang, Mohammad Soleymani, Dhiraj Joshi, Qiang Ji

Abstract: The wide popularity of digital photography and social networks has generated a rapidly growing volume of multimedia data (i.e., image, music, and video), resulting in a great demand for managing, retrieving, and understanding these data. Affective computing (AC) of these data can help to understand human behaviors and enable wide applications. In this article, we survey the state-of-the-art AC technologies comprehensively for large-scale heterogeneous multimedia data. We begin this survey by introducing the typical emotion representation models from psychology that are widely employed in AC. We briefly describe the available datasets for evaluating AC algorithms. We then summarize and compare the representative methods on AC of different multimedia types, i.e., images, music, videos, and multimodal data, with the focus on both handcrafted features-based methods and deep learning methods. Finally, we discuss some challenges and future directions for multimedia affective computing.

Submitted 3 October, 2019; originally announced November 2019.

Comments: Accepted by ACM TOMM

arXiv:1909.07533 [pdf, other]

Analog Subspace Coding: A New Approach to Coding for Non-Coherent Wireless Networks

Authors: Mahdi Soleymani, Hessam Mahdavifar

Abstract: We provide a novel framework to study subspace codes for non-coherent communications in wireless networks. To this end, an analog operator channel is defined with inputs and outputs being subspaces of $\mathbb{C}^n$. Then a certain distance is defined to capture the performance of subspace codes in terms of their capability to recover from interference and rank-deficiency of the network. We also study the robustness of the proposed model with respect to an additive noise. Furthermore, we propose a new approach to construct subspace codes in the analog domain, also regarded as Grassmann codes, by leveraging polynomial evaluations over finite fields together with characters associated to finite fields that map their elements to the unit circle in the complex plane. The constructed codes, referred to as character-polynomial (CP) codes, are shown to perform better compared to other existing constructions of Grassmann codes in terms of the trade-off between the rate and the normalized minimum distance, for a wide range of values for $n$.

Submitted 28 January, 2022; v1 submitted 16 September, 2019; originally announced September 2019.

arXiv:1907.11510 [pdf, ps, other]

AVEC 2019 Workshop and Challenge: State-of-Mind, Detecting Depression with AI, and Cross-Cultural Affect Recognition

Authors: Fabien Ringeval, Björn Schuller, Michel Valstar, Nicholas Cummins, Roddy Cowie, Leili Tavabi, Maximilian Schmitt, Sina Alisamir, Shahin Amiriparian, Eva-Maria Messner, Siyang Song, Shuo Liu, Ziping Zhao, Adria Mallol-Ragolta, Zhao Ren, Mohammad Soleymani, Maja Pantic

Abstract: The Audio/Visual Emotion Challenge and Workshop (AVEC 2019) "State-of-Mind, Detecting Depression with AI, and Cross-cultural Affect Recognition" is the ninth competition event aimed at the comparison of multimedia processing and machine learning methods for automatic audiovisual health and emotion analysis, with all participants competing strictly under the same conditions. The goal of the Challenge is to provide a common benchmark test set for multimodal information processing and to bring together the health and emotion recognition communities, as well as the audiovisual processing communities, to compare the relative merits of various approaches to health and emotion recognition from real-life data. This paper presents the major novelties introduced this year, the challenge guidelines, the data used, and the performance of the baseline systems on the three proposed tasks: state-of-mind recognition, depression assessment with AI, and cross-cultural affect sensing, respectively.

Submitted 10 July, 2019; originally announced July 2019.

arXiv:1906.10105 [pdf, ps, other]

Coded Distributed Computing: Performance Limits and Code Designs

Authors: Mohammad Vahid Jamali, Mahdi Soleymani, Hessam Mahdavifar

Abstract: We consider the problem of coded distributed computing where a large linear computational job, such as a matrix multiplication, is divided into $k$ smaller tasks, encoded using an $(n,k)$ linear code, and performed over $n$ distributed nodes. The goal is to reduce the average execution time of the computational job. We provide a connection between the problem of characterizing the average execution time of a coded distributed computing system and the problem of analyzing the error probability of codes of length $n$ used over erasure channels. Accordingly, we present closed-form expressions for the execution time using binary random linear codes and the best execution time any linear-coded distributed computing system can achieve. It is also shown that there exist good binary linear codes that attain, asymptotically, the best performance any linear code, not necessarily binary, can achieve. We also investigate the performance of coded distributed computing systems using polar and Reed-Muller (RM) codes that can benefit from low-complexity decoding, and superior performance, respectively, as well as explicit constructions. The proposed framework in this paper can enable efficient designs of distributed computing systems given the rich literature in the channel coding theory.

Submitted 24 June, 2019; originally announced June 2019.

arXiv:1906.04402 [pdf, other]

Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval

Authors: Yale Song, Mohammad Soleymani

Abstract: Visual-semantic embedding aims to find a shared latent space where related visual and textual instances are close to each other. Most current methods learn injective embedding functions that map an instance to a single point in the shared space. Unfortunately, injective embedding cannot effectively handle polysemous instances with multiple possible meanings; at best, it would find an average representation of different meanings. This hinders its use in real-world scenarios where individual instances and their cross-modal associations are often ambiguous. In this work, we introduce Polysemous Instance Embedding Networks (PIE-Nets) that compute multiple and diverse representations of an instance by combining global context with locally-guided features via multi-head self-attention and residual learning. To learn visual-semantic embedding, we tie-up two PIE-Nets and optimize them jointly in the multiple instance learning framework. Most existing work on cross-modal retrieval focuses on image-text data. Here, we also tackle a more challenging case of video-text retrieval. To facilitate further research in video-text retrieval, we release a new dataset of 50K video-sentence pairs collected from social media, dubbed MRW (my reaction when). We demonstrate our approach on both image-text and video-text retrieval scenarios using MS-COCO, TGIF, and our new MRW dataset.

Submitted 17 July, 2019; v1 submitted 11 June, 2019; originally announced June 2019.

Comments: CVPR 2019. Includes supplementary material. Have updated results on TGIF and MRW

arXiv:1806.04903 [pdf, other]

A data-driven approach to mid-level perceptual musical feature modeling

Authors: Anna Aljanaki, Mohammad Soleymani

Abstract: Musical features and descriptors could be coarsely divided into three levels of complexity. The bottom level contains the basic building blocks of music, e.g., chords, beats and timbre. The middle level contains concepts that emerge from combining the basic blocks: tonal and rhythmic stability, harmonic and rhythmic complexity, etc. High-level descriptors (genre, mood, expressive style) are usually modeled using the lower level ones. The features belonging to the middle level can both improve automatic recognition of high-level descriptors, and provide new music retrieval possibilities. Mid-level features are subjective and usually lack clear definitions. However, they are very important for human perception of music, and on some of them people can reach high agreement, even though defining them and therefore, designing a hand-crafted feature extractor for them can be difficult. In this paper, we derive the mid-level descriptors from data. We collect and release a dataset (https://osf.io/5aupt/) of 5000 songs annotated by musicians with seven mid-level descriptors, namely, melodiousness, tonal and rhythmic stability, modality, rhythmic complexity, dissonance and articulation. We then compare several approaches to predicting these descriptors from spectrograms using deep-learning. We also demonstrate the usefulness of these mid-level features using music emotion recognition as an application.

Submitted 13 June, 2018; originally announced June 2018.

Comments: 7 pages, ISMIR conference paper

arXiv:1804.04318 [pdf, other]

Cross-Modal Retrieval with Implicit Concept Association

Authors: Yale Song, Mohammad Soleymani

Abstract: Traditional cross-modal retrieval assumes explicit association of concepts across modalities, where there is no ambiguity in how the concepts are linked to each other, e.g., when we do the image search with a query "dogs", we expect to see dog images. In this paper, we consider a different setting for cross-modal retrieval where data from different modalities are implicitly linked via concepts that must be inferred by high-level reasoning; we call this setting implicit concept association. To foster future research in this setting, we present a new dataset containing 47K pairs of animated GIFs and sentences crawled from the web, in which the GIFs depict physical or emotional reactions to the scenarios described in the text (called "reaction GIFs"). We report on a user study showing that, despite the presence of implicit concept association, humans are able to identify video-sentence pairs with matching concepts, suggesting the feasibility of our task. Furthermore, we propose a novel visual-semantic embedding network based on multiple instance learning. Unlike traditional approaches, we compute multiple embeddings from each modality, each representing different concepts, and measure their similarity by considering all possible combinations of visual-semantic embeddings in the framework of multiple instance learning. We evaluate our approach on two video-sentence datasets with explicit and implicit concept association and report competitive results compared to existing approaches on cross-modal retrieval.

Submitted 25 April, 2018; v1 submitted 12 April, 2018; originally announced April 2018.

arXiv:1801.04384 [pdf, other]

Distributed Multi-User Secret Sharing

Authors: Mahdi Soleymani, Hessam Mahdavifar

Abstract: We consider a distributed secret sharing system that consists of a dealer, $n$ storage nodes, and $m$ users. Each user is given access to a certain subset of storage nodes, where it can download the stored data. The dealer wants to securely convey a specific secret $s_j$ to user $j$ via storage nodes, for $j=1,2,...,m$. More specifically, two secrecy conditions are considered in this multi-user context. The weak secrecy condition is that each user does not get any information about the individual secrets of other users, while the perfect secrecy condition implies that a user does not get any information about the collection of all other users' secrets. In this system, the dealer encodes secrets into several secret shares and loads them into the storage nodes. Given a certain number of storage nodes we find the maximum number of users that can be served in such a system and construct schemes that achieve this with perfect secrecy. Lower bounds on the minimum communication complexity and the storage overhead are characterized given any $n$ and $m$. We construct distributed secret sharing protocols, under certain conditions on the system parameters, that attain the lower bound on the communication complexity while providing perfect secrecy. Furthermore, we construct protocols, again under certain conditions, that simultaneously attain the lower bounds on the communication complexity and the storage overhead while providing weak secrecy, thereby demonstrating schemes that are optimal in terms of both the parameters. It is shown how to modify the proposed protocols in order to construct schemes with balanced storage load and communication complexity.

Submitted 29 September, 2020; v1 submitted 13 January, 2018; originally announced January 2018.

arXiv:1609.09761 [pdf, other]

Detecting Cognitive Appraisals from Facial Expressions for Interest Recognition

Authors: Mohammad Soleymani

Abstract: Interest makes one hold her attention on the object of interest. Automatic recognition of interest has numerous applications in human-computer interaction. In this paper, we study the facial expressions associated with interest and its underlying and closely related components, namely, curiosity, coping potential, novelty and complexity. To this end, we conducted an experiment in which participants watched images and micro-videos while a front-facing camera recorded their expressions. After watching each item they self-reported their level of interest, curiosity, coping potential and perceived novelty and complexity. Using an automated method, we tracked facial action units (AU) and studied the relationship between the presence of facial movements with interest and its related components. We then tracked the facial landmarks, e.g., corners of lips, and extracted features from each response. We trained random forests regression models to detect the level of interest, curiosity, and appraisals. We found a large difference between the way people report and react to interesting visual content. The expressions in response to images and micro-videos were not always pronounced depending on the participants. This makes the direct detection of interest from facial expressions a challenging problem. With this work, for the first time, we demonstrate the feasibility of detecting cognitive appraisals from facial expressions which will open the door for appraisal-driven emotion recognition methods.

Submitted 10 October, 2016; v1 submitted 30 September, 2016; originally announced September 2016.

Comments: 6 pages, discussions and analysis were added. More results are also added and discussed

Showing 1–50 of 55 results for author: Soleymani, M