
Characterizing Continual Learning Scenarios and Strategies for Audio Analysis

Authors: Ruchi Bhatt, Pratibha Kumari, Dwarikanath Mahapatra, Abdulmotaleb El Saddik, Mukesh Saini

Abstract: Audio analysis is useful in many application scenarios. State-of-the-art audio analysis approaches assume that the data distribution at training and deployment time will be the same. However, due to various real-life environmental factors, the data distribution may drift, or new classes may appear in the future. Thus, a one-time trained model might not perform adequately. In this paper, we characterize continual learning (CL) approaches in audio analysis, which are intended to tackle the catastrophic forgetting that arises from such drifts. As there is no CL dataset for audio analysis, we use the DCASE 2020 to 2023 datasets to create various CL scenarios for audio-based monitoring tasks. We investigated the following CL and non-CL approaches: EWC, LwF, SI, GEM, A-GEM, GDumb, Replay, Naive, cumulative, and joint training. The study is intended to benefit researchers and practitioners developing adaptive models for audio analysis. We observed that Replay achieved better results than the other methods on the DCASE challenge data, with an accuracy of 70.12% in the domain-incremental scenario and 96.98% in the class-incremental scenario.
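Replay, the best-performing strategy in this comparison, keeps a small memory of earlier samples and mixes it into training on each new experience. The abstract does not say how that memory is filled, so the Python sketch below assumes reservoir sampling, a common choice; the class name `ReservoirReplayBuffer`, the capacity, and the replay batch size are hypothetical, not the paper's settings.

```python
import random
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ReservoirReplayBuffer(Generic[T]):
    """Hypothetical fixed-size replay memory filled by reservoir sampling:
    after `seen` samples, each one is stored with probability capacity / seen."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.memory: List[T] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample: T) -> None:
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(sample)
        else:
            # Overwrite a random slot with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.memory[j] = sample

    def mixed_batch(self, new_batch: List[T], n_replay: int) -> List[T]:
        """Current-experience batch followed by up to n_replay stored samples."""
        k = min(n_replay, len(self.memory))
        return list(new_batch) + self.rng.sample(self.memory, k)
```

Reservoir sampling gives every sample seen so far the same probability of being in memory, so earlier experiences are not crowded out by the most recent one.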

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2405.07759

doi 10.1109/JIOT.2024.3398548

MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction

Authors: Haopeng Wang, Zijian Long, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: Over the last few years, 360° video traffic on the network has grown significantly. A key challenge of 360° video playback is ensuring a high quality of experience (QoE) with limited network bandwidth. Currently, most studies focus on tile-based adaptive bitrate (ABR) streaming based on single-viewport prediction to reduce bandwidth consumption. However, the performance of single-viewpoint prediction models is severely limited by the inherent uncertainty in head movement, and they cannot cope well with sudden user movements. This paper first presents a multimodal spatial-temporal attention transformer that, given a historical trajectory, generates multiple viewpoint trajectories with their probabilities. The proposed method models viewpoint prediction as a classification problem and uses attention mechanisms to capture the spatial and temporal characteristics of input video frames and viewpoint trajectories for multi-viewpoint prediction. Next, a multi-agent deep reinforcement learning (MADRL)-based ABR algorithm that uses multi-viewpoint prediction for 360° video streaming is proposed to maximize different QoE objectives under various network conditions. We formulate the ABR problem as a decentralized partially observable Markov decision process (Dec-POMDP) and present a MAPPO algorithm based on the centralized training and decentralized execution (CTDE) framework to solve it. The experimental results show that our proposed method improves the defined QoE metric by up to 85.5% compared with existing ABR methods.
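The abstract reports QoE gains without defining the metric. As a hedged illustration only, the sketch below uses a linear QoE form common in the ABR literature (total chunk quality minus a rebuffering penalty minus a quality-switch penalty) and shows how multi-viewpoint prediction can enter it: each predicted viewpoint's tile quality is weighted by its probability. The function names, tile indexing, and penalty weights are assumptions, not the paper's definitions.

```python
from typing import Dict, List, Sequence


def expected_viewport_quality(
    tile_quality: Dict[int, float],
    viewpoints: Sequence[Sequence[int]],
    probs: Sequence[float],
) -> float:
    """Mean quality of the tiles covered by each predicted viewport,
    weighted by the probability assigned to that viewpoint (assumed form)."""
    total = 0.0
    for tiles, p in zip(viewpoints, probs):
        total += p * sum(tile_quality[t] for t in tiles) / len(tiles)
    return total


def linear_qoe(
    chunk_quality: List[float],
    rebuffer_s: List[float],
    rebuffer_penalty: float = 4.3,
    smooth_penalty: float = 1.0,
) -> float:
    """QoE = sum(q_n) - mu * sum(T_n) - lambda * sum(|q_{n+1} - q_n|).
    The default penalty weights are illustrative only."""
    quality = sum(chunk_quality)
    stall = rebuffer_penalty * sum(rebuffer_s)
    switches = smooth_penalty * sum(
        abs(b - a) for a, b in zip(chunk_quality, chunk_quality[1:])
    )
    return quality - stall - switches
```

In a multi-agent setting, each agent's reward could be built from terms like these, but how the paper splits the objective across agents is not stated in the abstract.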

Submitted 17 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: Accepted by IEEE Internet of Things Journal

arXiv:2404.14573

doi 10.1109/MIS.2024.3385313

Tile-Weighted Rate-Distortion Optimized Packet Scheduling for 360° VR Video Streaming

Authors: Haopeng Wang, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: A key challenge of 360° VR video streaming is ensuring high quality with limited network bandwidth. Currently, most studies focus on tile-based adaptive bitrate streaming to reduce bandwidth consumption, in which the resources of network nodes are not fully utilized. This article proposes a tile-weighted rate-distortion (TWRD) packet scheduling optimization system to reduce data volume and improve video quality. A multimodal spatial-temporal attention transformer is proposed to predict viewpoints with probabilities, which are used to dynamically weight tiles and their corresponding packets. The packet scheduling problem of determining which packets should be dropped is formulated as an optimization problem and solved with dynamic programming. Experimental results demonstrate that the proposed method outperforms existing methods under various conditions.
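The abstract says only that packet dropping is solved by dynamic programming. One plausible reading, assumed here, is a 0/1 knapsack: each packet has a size and a value equal to its tile weight (for example, the predicted viewpoint probability of its tile) times the distortion reduction it provides, and the scheduler keeps the most valuable subset that fits a byte budget. The function and parameter names are hypothetical.

```python
from typing import List, Tuple


def schedule_packets(
    sizes: List[int],
    tile_weights: List[float],
    distortion_gain: List[float],
    budget: int,
) -> Tuple[float, List[int]]:
    """Hypothetical 0/1 knapsack scheduler: keep packets within `budget` bytes,
    maximizing sum(tile_weight_i * distortion_gain_i).
    Returns (best value, sorted indices of kept packets); the rest are dropped."""
    n = len(sizes)
    values = [w * g for w, g in zip(tile_weights, distortion_gain)]
    # best[i][b]: best value using only the first i packets with b bytes available.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s, v = sizes[i - 1], values[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: drop packet i-1
            if s <= b and best[i - 1][b - s] + v > best[i][b]:
                best[i][b] = best[i - 1][b - s] + v  # option 2: send it
    # Backtrack: a changed cell means packet i-1 was sent.
    kept, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            kept.append(i - 1)
            b -= sizes[i - 1]
    return best[n][budget], sorted(kept)
```

Runtime is O(n·B) for n packets and a budget of B bytes, so a practical scheduler would likely quantize packet sizes into coarser units.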

Submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted by IEEE Intelligent Systems

arXiv:2403.18323

doi 10.1109/TMM.2024.3366399

How to Cache Important Contents for Multi-modal Service in Dynamic Networks: A DRL-based Caching Scheme

Authors: Zhe Zhang, Marc St-Hilaire, Xin Wei, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: With the continuous evolution of networking technologies, multi-modal services involving video, audio, and haptic content are expected to become the dominant multimedia service in the near future. Edge caching is a key technology that can significantly reduce network load and content transmission latency, which is critical for delivering multi-modal content. However, existing caching approaches rely on only a limited number of factors, e.g., popularity, to evaluate content importance for caching, which is inefficient for multi-modal content, especially in dynamic network environments. To overcome this issue, we propose a content importance-based caching scheme that consists of a content importance evaluation model and a caching model. By leveraging a dueling double deep Q-network (D3QN), the content importance evaluation model can adaptively evaluate the importance of content in dynamic networks. Based on the evaluated importance, the caching model can cache and evict the appropriate content to improve caching efficiency. Simulation results show that the proposed scheme outperforms existing caching schemes in terms of cache hit ratio (at least 15% higher), network load (up to 22% lower), average number of hops (up to 27% lower), and unsatisfied request ratio (more than 47% lower).
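D3QN combines two standard DQN refinements: a dueling head that splits Q-values into a state value and per-action advantages, and the double-DQN target that decouples action selection from action evaluation. The minimal Python sketch below shows just those two formulas on plain lists; how the paper encodes content features as states and caching decisions as actions is not described in the abstract, so it is left out, and the function names are hypothetical.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Double DQN target: the online network picks the next action and the
    target network evaluates it, reducing vanilla DQN's overestimation bias."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

Subtracting the mean advantage makes the V/A split identifiable: adding a constant to V and removing it from every advantage would otherwise give the same Q-values.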

Submitted 27 March, 2024; originally announced March 2024.

Journal ref: IEEE Transactions on Multimedia (Early Access), 2024

arXiv:2403.18293

Efficient Test-Time Adaptation of Vision-Language Models

Authors: Adilbek Karmanov, Dayan Guan, Shijian Lu, Abdulmotaleb El Saddik, Eric Xing

Abstract: Test-time adaptation with pre-trained vision-language models has attracted increasing attention for tackling distribution shifts at test time. Although prior studies have achieved very promising performance, they involve intensive computation, which is poorly suited to test-time adaptation. We design TDA, a training-free dynamic adapter that enables effective and efficient test-time adaptation with vision-language models. TDA works with a lightweight key-value cache that maintains a dynamic queue with few-shot pseudo labels as values and the corresponding test-sample features as keys. Leveraging the key-value cache, TDA adapts to test data gradually via progressive pseudo-label refinement, which is highly efficient and requires no backpropagation. In addition, we introduce negative pseudo labeling, which alleviates the adverse impact of pseudo-label noise by assigning pseudo labels to certain negative classes when the model is uncertain about its pseudo-label predictions. Extensive experiments over two benchmarks demonstrate TDA's superior effectiveness and efficiency compared with the state of the art. The code is available at https://kdiaaa.github.io/tda/.
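To make the key-value cache concrete, here is a minimal sketch of a positive cache in the spirit of TDA: keys are normalized test features, values are pseudo labels, each class keeps at most a few entries, and a new sample replaces the least confident (highest-entropy) entry of its pseudo class. The affinity `exp(-beta * (1 - cos))`, the values of `beta`, `alpha`, and `shots`, and all names are assumptions borrowed from Tip-Adapter-style caches rather than details confirmed by the abstract; negative pseudo labeling is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def entropy(probs: Sequence[float]) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0)


class PositiveCache:
    """Hypothetical per-class queue of (normalized feature, entropy) pairs."""

    def __init__(self, num_classes: int, shots: int = 3) -> None:
        self.num_classes = num_classes
        self.shots = shots
        self.store: Dict[int, List[Tuple[List[float], float]]] = {}

    def update(self, feature: Sequence[float], probs: Sequence[float]) -> None:
        label = max(range(len(probs)), key=lambda c: probs[c])  # pseudo label
        h = entropy(probs)
        queue = self.store.setdefault(label, [])
        item = (_normalize(feature), h)
        if len(queue) < self.shots:
            queue.append(item)
        else:
            worst = max(range(len(queue)), key=lambda i: queue[i][1])
            if h < queue[worst][1]:
                queue[worst] = item  # replace the least confident entry

    def logits(self, feature: Sequence[float], beta: float = 5.0) -> List[float]:
        """Per-class sum of exp(-beta * (1 - cosine similarity)) over stored keys."""
        f = _normalize(feature)
        out = [0.0] * self.num_classes
        for label, queue in self.store.items():
            for key, _ in queue:
                cos = sum(a * b for a, b in zip(f, key))
                out[label] += math.exp(-beta * (1.0 - cos))
        return out


def adapted_logits(
    zero_shot: Sequence[float], cache: Sequence[float], alpha: float = 1.0
) -> List[float]:
    """Zero-shot logits plus alpha times cache logits; no gradient step involved."""
    return [z + alpha * c for z, c in zip(zero_shot, cache)]
```

Every operation is a lookup, a comparison, or a dot product, which is why a cache of this kind adapts without backpropagation.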

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024. The code has been released at https://kdiaaa.github.io/tda/

arXiv:2403.15256

doi 10.1109/MCE.2024.3364118

Experimental Studies of Metaverse Streaming

Authors: Haopeng Wang, Roberto Martinez-Velazquez, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: The Metaverse aims to construct a large, unified, immersive, and shared digital realm by combining various technologies, including XR (extended reality), blockchain, and digital twins. This article explores the Metaverse from the perspective of multimedia communication by conducting and analyzing real-world experiments on four different Metaverse platforms: VR (virtual reality) Vircadia, VR Mozilla Hubs, VRChat, and MR (mixed reality) Virtual City. We first investigate the traffic patterns and network performance of the three VR platforms. After outlining the challenges of Metaverse streaming and investigating potential methods to enhance Metaverse performance, we propose a remote rendering architecture and verify its advantages over local rendering through a prototype involving the campus network and MR multimodal interaction.

Submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted by IEEE Consumer Electronics Magazine

arXiv:2403.14449

doi 10.1109/MCE.2024.3381573

Bringing Robots Home: The Rise of AI Robots in Consumer Electronics

Authors: Haiwei Dong, Yang Liu, Ted Chu, Abdulmotaleb El Saddik

Abstract: On March 18, 2024, NVIDIA unveiled Project GR00T, a general-purpose multimodal generative AI model designed specifically for training humanoid robots. Preceding this event, Tesla's unveiling of the Optimus Gen 2 humanoid robot on December 12, 2023, underscored the profound impact robotics is poised to have on reshaping various facets of our daily lives. While robots have long dominated industrial settings, their presence within our homes is a burgeoning phenomenon. This can be attributed, in part, to the complexities of domestic environments and the challenges of creating robots that can seamlessly integrate into our daily routines.

Submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted by IEEE Consumer Electronics Magazine

arXiv:2401.06957

EVOKE: Emotion Enabled Virtual Avatar Mapping Using Optimized Knowledge Distillation

Authors: Maryam Nadeem, Raza Imam, Rouqaiah Al-Refai, Meriem Chkir, Mohamad Hoda, Abdulmotaleb El Saddik

Abstract: As virtual environments continue to advance, the demand for immersive and emotionally engaging experiences has grown. Addressing this demand, we introduce Emotion enabled Virtual avatar mapping using Optimized KnowledgE distillation (EVOKE), a lightweight emotion recognition framework designed for the seamless integration of emotion recognition into 3D avatars within virtual environments. Our approach leverages knowledge distillation with multi-label classification on the publicly available DEAP dataset, which covers valence, arousal, and dominance as the primary emotional classes. Our distilled model, a CNN with only two convolutional layers and 18 times fewer parameters than the teacher model, achieves competitive results, with an accuracy of 87% while requiring far fewer computational resources. This balance between performance and deployability makes our framework well suited to virtual environment systems. Furthermore, the multi-label classification outcomes are used to map emotions onto custom-designed 3D avatars.
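The abstract does not give the distillation loss. For a multi-label setup such as valence, arousal, and dominance, one common option, assumed in the sketch below, applies a sigmoid per label and mixes a hard-label binary cross-entropy with a temperature-softened binary cross-entropy against the teacher. The temperature, the mixing weight `alpha`, and the function names are illustrative assumptions.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce(target: float, pred: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy with clipping to avoid log(0)."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def multilabel_kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    labels: Sequence[int],
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    """alpha * hard-label BCE + (1 - alpha) * T^2 * BCE(teacher_T, student_T),
    averaged over labels. This is an assumed loss, not EVOKE's published one."""
    hard = soft = 0.0
    for s, t, y in zip(student_logits, teacher_logits, labels):
        hard += bce(y, sigmoid(s))
        # Soft target: teacher probability after temperature scaling.
        soft += bce(sigmoid(t / temperature), sigmoid(s / temperature))
    n = len(labels)
    return alpha * hard / n + (1.0 - alpha) * temperature ** 2 * soft / n
```

The T² factor keeps the soft term's gradient magnitude comparable to the hard term as the temperature grows, a convention carried over from softmax-based distillation.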

Submitted 12 January, 2024; originally announced January 2024.

Comments: Presented at IEEE 42nd International Conference on Consumer Electronics (ICCE) 2024

arXiv:2401.00393

Generative Model-Driven Synthetic Training Image Generation: An Approach to Cognition in Rail Defect Detection

Authors: Rahatara Ferdousi, Chunsheng Yang, M. Anwar Hossain, Fedwa Laamarti, M. Shamim Hossain, Abdulmotaleb El Saddik

Abstract: Recent advancements in cognitive computing, together with the integration of deep learning techniques, have facilitated the development of intelligent cognitive systems (ICS). This is particularly beneficial for rail defect detection, where an ICS would emulate human-like analysis of image data for defect patterns. Despite the success of Convolutional Neural Networks (CNNs) in visual defect classification, the scarcity of large datasets for rail defect detection remains a challenge, because the accident events that would produce defective parts and images are infrequent. Researchers have addressed this data scarcity by exploring rule-based and generative data augmentation models. Among these, Variational Autoencoder (VAE) models can generate realistic data without extensive baseline datasets for noise modeling. This study proposes a VAE-based synthetic image generation technique for rail defects that incorporates weight decay regularization and an image reconstruction loss to prevent overfitting. The proposed method is applied to create a synthetic dataset for the Canadian Pacific Railway (CPR) from just 50 real samples across five classes. In total, 500 synthetic samples are generated with a reconstruction loss of only 0.021. A Vision Transformer (ViT) model fine-tuned on this synthetic CPR dataset achieved high accuracy (98%-99%) in classifying the five defect classes.
This research offers a promising solution to the data scarcity challenge in rail defect detection, showcasing the potential for robust ICS development in this domain.
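For reference, the standard VAE objective combines a reconstruction term with a KL divergence that pulls the encoder's diagonal Gaussian toward N(0, I), and it samples latents with the reparameterization trick. The plain-Python sketch below adds an L2 penalty to stand in for weight decay; using mean squared error for reconstruction and the decay coefficient are assumptions, since the abstract does not specify them.

```python
import math
import random
from typing import List, Sequence


def reparameterize(
    mu: Sequence[float], logvar: Sequence[float], rng: random.Random
) -> List[float]:
    """z = mu + sigma * eps with eps ~ N(0, 1), so sampling stays differentiable."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) = -0.5 * sum(1 + logvar - mu^2 - exp(logvar))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def vae_loss(
    x: Sequence[float],
    x_recon: Sequence[float],
    mu: Sequence[float],
    logvar: Sequence[float],
    weights: Sequence[float],
    weight_decay: float = 1e-4,
) -> float:
    """Assumed objective: MSE reconstruction + KL term + L2 weight-decay penalty."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon)) / len(x)
    l2 = weight_decay * sum(w * w for w in weights)
    return recon + kl_to_standard_normal(mu, logvar) + l2
```

New synthetic images would then come from decoding latents drawn from N(0, I) or from perturbing the encodings of the real samples.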

Submitted 30 December, 2023; originally announced January 2024.

Comments: 26 pages, 13 figures, Springer Journal

MSC Class: 68T05; 94A08; 90B25

ACM Class: I.2.6; I.2.10; I.5.4; I.4.10

arXiv:2312.15313

doi 10.1109/JIOT.2023.3283335

Human-Centric Resource Allocation for the Metaverse With Multiaccess Edge Computing

Authors: Zijian Long, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: Multi-access edge computing (MEC) is a promising solution for the computation-intensive, low-latency rendering tasks of the metaverse. However, optimally allocating limited communication and computation resources at the edge to a large number of metaverse users is challenging. In this paper, we propose an adaptive edge resource allocation method based on multi-agent soft actor-critic with graph convolutional networks (SAC-GCN). Specifically, SAC-GCN models the multi-user metaverse environment as a graph in which each agent is a node. Each agent learns the interplay between agents through graph convolutional networks with a self-attention mechanism and uses it to determine the resource usage of one metaverse user. The effectiveness of SAC-GCN is demonstrated by analyzing user experience, balance of resource allocation, and resource utilization rate, using a virtual city park metaverse as an example. Experimental results indicate that SAC-GCN outperforms other resource allocation methods in improving overall user experience, balancing resource allocation, and increasing resource utilization rate by at least 27%, 11%, and 8%, respectively.
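The abstract does not specify the graph convolution used by SAC-GCN. As an illustration only, the sketch below implements the widely used symmetric-normalized propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), where each node is one agent (user) and A encodes which agents interact; the self-attention and soft actor-critic parts are omitted, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of an (n x k) and a (k x m) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).
    Row i of `h` is agent i's feature vector; `w` is a learnable weight matrix."""
    n = len(adj)
    # Add self-loops so each agent keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [
        [a_hat[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5) for j in range(n)]
        for i in range(n)
    ]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, x) for x in row] for row in out]
```

After one layer, each agent's embedding mixes its own state with those of the agents it is connected to, which is the "interplay between agents" a per-agent policy can then condition on.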

Submitted 23 December, 2023; originally announced December 2023.

Journal ref: IEEE Internet of Things Journal, vol. 10, no. 22, pp. 19993-20005, 2023

arXiv:2312.06926

Content-Localization based Neural Machine Translation for Informal Dialectal Arabic: Spanish/French to Levantine/Gulf Arabic

Authors: Fatimah Alzamzami, Abdulmotaleb El Saddik

Abstract: Resources in high-resource languages have not been efficiently exploited in low-resource languages to solve language-dependent research problems. Spanish and French are considered high-resource languages with an adequate level of data resources for modeling informal online social behavior. However, no machine translation system exists that can access those data resources and transfer their context and tone to a low-resource language such as dialectal Arabic. In response, we propose a framework that uses AI to localize content from high-resource languages into low-resource languages and dialects. To the best of our knowledge, ours is the first work to provide a parallel translation dataset between informal Spanish and French and informal Arabic dialects, in both directions. Using this dataset, we aim to enrich under-resourced dialectal Arabic and fast-track research on diverse online social behaviors within and across smart cities in different geo-regions. The experimental results illustrate the capability of our proposed solution in exploiting resources between high- and low-resource languages and dialects. They also show that ignoring dialects within the same language can lead to misleading analysis of online social behavior.

Submitted 11 December, 2023; originally announced December 2023.

Comments: arXiv admin note: text overlap with arXiv:2312.03727

arXiv:2312.03727

Content-Localization based System for Analyzing Sentiment and Hate Behaviors in Low-Resource Dialectal Arabic: English to Levantine and Gulf

Authors: Fatimah Alzamzami, Abdulmotaleb El Saddik

Abstract: Even though online social movements can quickly go viral on social media, language can be a barrier to timely monitoring and analysis of the underlying online social behaviors (OSB). This is especially true for languages that are under-resourced on social media, such as dialectal Arabic, the primary language used by Arabs on social media. Therefore, it is crucial to provide solutions that efficiently exploit resources from high-resourced languages to solve language-dependent OSB analysis in under-resourced languages. This paper proposes to localize the content of resources in high-resourced languages into under-resourced Arabic dialects. Content localization goes beyond content translation, which converts text from one language to another; content localization adapts culture, language nuances, and regional preferences from one language to a specific language or dialect. Automatically understanding the natural and familiar day-to-day expressions of different regions is key to achieving a wider analysis of OSB, especially for smart cities. In this paper, we use content-localization based neural machine translation to develop sentiment and hate classifiers for two low-resourced Arabic dialects: Levantine and Gulf. We also leverage unsupervised learning to facilitate the analysis of sentiment and hate predictions by inferring hidden topics from the corresponding data and providing coherent interpretations of those topics in their native language or dialect.
The experimental evaluations and a proof-of-concept COVID-19 case study on real data validate the effectiveness of our proposed system in precisely distinguishing sentiments and accurately identifying hate content in both Levantine and Gulf Arabic dialects. Our findings highlight the importance of considering the unique nature of dialects within the same language and show that ignoring the dialectal aspect leads to misleading analysis.

Submitted 27 November, 2023; originally announced December 2023.

arXiv:2311.17629

Efficient Decoder for End-to-End Oriented Object Detection in Remote Sensing Images

Authors: Jiaqi Zhao, Zeyu Ding, Yong Zhou, Hancheng Zhu, Wenliang Du, Rui Yao, Abdulmotaleb El Saddik

Abstract: Object instances in remote sensing images often appear in multiple orientations, at varying scales, and in dense distributions. These issues pose challenges for end-to-end oriented object detectors, including multi-scale feature alignment and a large number of queries. To address these limitations, we propose an end-to-end oriented detector equipped with an efficient decoder that incorporates two techniques: Rotated RoI attention (RRoI attention) and Selective Distinct Queries (SDQ). Specifically, RRoI attention effectively focuses on oriented regions of interest through a cross-attention mechanism and aligns multi-scale features. SDQ collects queries from intermediate decoder layers and then filters out similar queries to obtain distinct queries. The proposed SDQ facilitates the optimization of one-to-one label assignment without introducing redundant initial queries or extra auxiliary branches. Extensive experiments on five datasets demonstrate the effectiveness of our method. Notably, our method achieves state-of-the-art performance on DIOR-R (67.31% mAP), DOTA-v1.5 (67.43% mAP), and DOTA-v2.0 (53.28% mAP) with the ResNet50 backbone.
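The abstract states that SDQ pools queries from intermediate decoder layers and filters similar ones, without giving the similarity criterion. The sketch below assumes an NMS-like greedy pass: queries are sorted by classification score, and a query is dropped if its embedding's cosine similarity to an already kept query exceeds a threshold. The threshold, the cap on kept queries, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Query = Tuple[List[float], float]  # (embedding, classification score)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_distinct_queries(
    layer_queries: List[List[Query]],
    sim_threshold: float = 0.9,
    max_queries: int = 300,
) -> List[Query]:
    """Pool queries from several decoder layers, then greedily keep the
    highest-scoring ones that are not near-duplicates of a kept query."""
    pooled = [q for layer in layer_queries for q in layer]
    pooled.sort(key=lambda q: q[1], reverse=True)
    kept: List[Query] = []
    for emb, score in pooled:
        if all(cosine(emb, k[0]) < sim_threshold for k in kept):
            kept.append((emb, score))
            if len(kept) == max_queries:
                break
    return kept
```

Removing near-duplicates before matching means fewer queries compete for the same ground-truth object, which is the kind of help to one-to-one label assignment the abstract describes.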

Submitted 1 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: 11 pages, 7 figures, 13 tables

arXiv:2311.14824

A Reusable AI-Enabled Defect Detection System for Railway Using Ensembled CNN

Authors: Rahatara Ferdousi, Fedwa Laamarti, Chunsheng Yang, Abdulmotaleb El Saddik

Abstract: Accurate defect detection is crucial for ensuring the trustworthiness of intelligent railway systems. Current approaches rely on single deep-learning models, such as CNNs, which require large amounts of data to capture underlying patterns. Training a new defect classifier with limited samples often leads to overfitting and poor performance on unseen images. To address this, researchers have advocated transfer learning and fine-tuning of pre-trained models. However, using a single backbone network in transfer learning may still cause bottleneck issues and inconsistent performance if the backbone is not suited to the specific problem domain. To overcome these challenges, we propose a reusable AI-enabled defect detection approach. By combining ensemble learning with transfer learning models (VGG-19, MobileNetV3, and ResNet-50), we improved classification accuracy and achieved consistent performance at a certain phase of training. Our empirical analysis demonstrates better and more consistent performance than other state-of-the-art approaches. This consistency substantiates the reusability of the defect detection system for newly emerging defective rail parts. We therefore anticipate that these findings will benefit further research and development of reusable AI-enabled solutions for railway systems.
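The abstract combines VGG-19, MobileNetV3, and ResNet-50 through ensemble learning but does not say how their outputs are fused. Soft voting, which averages per-class probabilities, is one common choice and is assumed in the sketch below; the dictionary keys are just labels, and the optional per-model weights are a hypothetical extension.

```python
from typing import Dict, List, Optional, Sequence


def soft_vote(
    model_probs: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of per-class probabilities from several backbones
    (equal weights when none are given)."""
    names = list(model_probs)
    weights = weights or {name: 1.0 for name in names}
    total_w = sum(weights[n] for n in names)
    num_classes = len(model_probs[names[0]])
    return [
        sum(weights[n] * model_probs[n][c] for n in names) / total_w
        for c in range(num_classes)
    ]


def predict(model_probs: Dict[str, Sequence[float]]) -> int:
    """Index of the class with the highest averaged probability."""
    avg = soft_vote(model_probs)
    return max(range(len(avg)), key=lambda c: avg[c])
```

Averaging probabilities lets a confident backbone outvote two uncertain ones, which is one way an ensemble can smooth over a single backbone that is poorly suited to a given defect type.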

Submitted 24 November, 2023; originally announced November 2023.

Comments: 28 pages, 13 figures, Applied Intelligence Journal, Springer Nature

MSC Class: 68T45; 68T05

ACM Class: I.2.10; I.5.2

arXiv:2311.10256

Exploring User Perceptions of Virtual Reality Scene Design in Metaverse Learning Environments

Authors: Rahatara Ferdousi, Mohammed Faisal, Fedwa Laamarti, Chunsheng Yang, Abdulmotaleb El Saddik

Abstract: Because of their interconnected design, Metaverse learning environments allow for a more seamless and intuitive transition between activities than Virtual Reality (VR) learning environments. The design of VR scenes is important for creating effective learning experiences in the Metaverse. However, there is limited research on the impact of different design elements on users' learning experiences in VR scenes. To address this, a study was conducted with 16 participants who interacted with two VR scenes, each with varying design elements such as style, color, texture, object, and background, while watching a short tutorial. Participants rated the scenes for learning on a seven-point Likert scale, and the Mann-Whitney U test was used to validate differences in preference between the scenes. The results showed a significant difference in preference between the scenes. The NASA TLX questionnaire was then used to examine the impact of this difference on cognitive load, and participant feedback was also considered. The study emphasizes the importance of careful VR scene design for improving users' learning experience.
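Likert ratings contain many ties, so the Mann-Whitney U statistic is computed from average ranks. The sketch below shows that calculation; it returns the U values only, and for p-values (exact, or a normal approximation with tie correction) a statistics package such as SciPy's `scipy.stats.mannwhitneyu` is the safer route. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y); the usual test statistic is min(U_x, U_y)."""
    ranks = average_ranks(list(x) + list(y))
    r_x = sum(ranks[: len(x)])  # rank sum of the first group
    u_x = r_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y
```

With no overlap between two groups of three ratings, as in `mann_whitney_u([1, 2, 3], [4, 5, 6])`, the smaller U is 0, the most extreme value possible for those group sizes.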

Submitted 21 November, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: 6 pages, 3 figures, accepted for presentation at IEEE 42nd International Conference on Consumer Electronics

ACM Class: K.3; J.7

arXiv:2309.12137

OSN-MDAD: Machine Translation Dataset for Arabic Multi-Dialectal Conversations on Online Social Media

Authors: Fatimah Alzamzami, Abdulmotaleb El Saddik

Abstract: While resources for the English language are fairly sufficient for understanding content on social media, similar resources in Arabic are still immature. The main reason that Arabic resources are insufficient is that Arabic has many dialects in addition to the standard version (MSA). Arabs do not use MSA in their daily communications; rather, they use dialectal versions. Social media users carry this practice over to social media platforms, which in turn has raised an urgent need to build suitable AI models for language-dependent applications. Existing machine translation (MT) systems designed for MSA fail to work well with Arabic dialects. In light of this, it is necessary to adapt to the informal nature of communication on social networks by developing MT systems that can effectively handle the various dialects of Arabic. Unlike MSA, for which MT systems have advanced considerably, little effort has been devoted to Arabic dialects in MT systems. While a few attempts have been made to build translation datasets for dialectal Arabic, they are domain-dependent and not adapted to the cultural and linguistic style of online social networks (OSNs). In this work, we attempt to alleviate these limitations by proposing an online social network-based multidialect Arabic dataset, crafted by contextually translating English tweets into four Arabic dialects: Gulf, Yemeni, Iraqi, and Levantine.
To perform the translation, we followed our proposed guideline framework for content translation, which could be universally applicable for translation between foreign languages and local dialects. We validated the authenticity of our proposed dataset by developing neural MT (NMT) models for the four Arabic dialects. Our results show superior performance of the NMT models trained on our dataset. We believe that our dataset can reliably serve as an Arabic multidialectal translation dataset for informal MT tasks.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2308.02039 [pdf, other]

Harnessing Web3 on Carbon Offset Market for Sustainability: Framework and A Case Study

Authors: Chenyu Zhou, Hongzhou Chen, Shiman Wang, Xinyao Sun, Abdulmotaleb El Saddik, Wei Cai

Abstract: Blockchain, pivotal in shaping the metaverse and Web3, often draws criticism for its high energy consumption and carbon emissions. The rise of sustainability-focused blockchains, especially when intersecting with innovative wireless technologies, is reshaping this predicament. To understand blockchain's role in sustainability, we propose a three-layer structure encapsulating four green utilities: Recording and Tracking, Wide Verification, Value Trading, and Concept Disseminating. Nori, a decentralized voluntary carbon offset project, serves as our case, illuminating these utilities. Our research unveils unique insights into the on-chain carbon market participants, the factors affecting the market, the value propositions of NFT-based carbon credits, and the role of social media in spreading the concept of carbon offsetting. We argue that blockchain's contribution to sustainability is significant, with carbon offsetting potentially evolving as a new standard within the blockchain sector.

Submitted 25 July, 2023; originally announced August 2023.

arXiv:2305.14093 [pdf, other]

Weakly Supervised 3D Open-vocabulary Segmentation

Authors: Kunhao Liu, Fangneng Zhan, Jiahui Zhang, Muyu Xu, Yingchen Yu, Abdulmotaleb El Saddik, Christian Theobalt, Eric Xing, Shijian Lu

Abstract: Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research. However, this task is heavily impeded by the lack of large-scale and diverse 3D open-vocabulary segmentation datasets for training robust and generalizable models. Distilling knowledge from pre-trained 2D open-vocabulary segmentation models helps, but it compromises the open-vocabulary feature because the 2D models are mostly fine-tuned with closed-vocabulary datasets. We tackle the challenges in 3D open-vocabulary segmentation by exploiting the pre-trained foundation models CLIP and DINO in a weakly supervised manner. Specifically, given only the open-vocabulary text descriptions of the objects in a scene, we distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF), which effectively lifts 2D features into view-consistent 3D segmentation. A notable aspect of our approach is that it does not require any manual segmentation annotations for either the foundation models or the distillation process. Extensive experiments show that our method even outperforms fully supervised models trained with segmentation annotations in certain scenes, suggesting that 3D open-vocabulary segmentation can be effectively learned from 2D images and text-image pairs. Code is available at https://github.com/Kunhao-Liu/3D-OVS.

Submitted 9 January, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2304.11445 [pdf, other]

Improving Stain Invariance of CNNs for Segmentation by Fusing Channel Attention and Domain-Adversarial Training

Authors: Kudaibergen Abutalip, Numan Saeed, Mustaqeem Khan, Abdulmotaleb El Saddik

Abstract: Variability in staining protocols, such as different slide preparation techniques, chemicals, and scanner configurations, can result in a diverse set of whole slide images (WSIs). This distribution shift can negatively impact the performance of deep learning models on unseen samples, presenting a significant challenge for developing new computational pathology applications. In this study, we propose a method for improving the generalizability of convolutional neural networks (CNNs) to stain changes in a single-source setting for semantic segmentation. Recent studies indicate that style features mainly exist as covariances in earlier network layers. Based on these findings, we design a channel attention mechanism that detects stain-specific features, and we modify the previously proposed stain-invariant training scheme: we reweight the outputs of earlier layers and pass them to the stain-adversarial training branch. We evaluate our method on multi-center, multi-stain datasets and demonstrate its effectiveness through interpretability analysis. Our approach achieves substantial improvements over baselines and competitive performance compared to other methods, as measured by various evaluation metrics. We also show that combining our method with stain augmentation leads to mutually beneficial results and outperforms other techniques. Overall, our study makes significant contributions to the field of computational pathology.

Submitted 22 April, 2023; originally announced April 2023.
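
The remark that style features live mainly in the covariances of early-layer activations can be made concrete with a small sketch. The helper below, `channel_covariance` (a hypothetical name, not code from the paper), computes the channel-by-channel covariance of a feature map flattened to C channels by N spatial positions, using the population normalization 1/N. It only illustrates the statistic; the paper's channel attention module and stain-adversarial branch are not reproduced here.

```python
from typing import List, Sequence


def channel_covariance(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Population covariance between the channels of a C x N feature map.

    features[c][i] is the activation of channel c at spatial position i (N = H * W).
    """
    num_channels = len(features)
    n = len(features[0])
    means = [sum(ch) / n for ch in features]
    # Subtract each channel's spatial mean before taking inner products.
    centered = [[v - m for v in ch] for ch, m in zip(features, means)]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / n
             for j in range(num_channels)]
            for i in range(num_channels)]


# Two perfectly correlated channels over three spatial positions.
cov = channel_covariance([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
```

The off-diagonal entries capture how channels co-vary across positions, which is the kind of second-order statistic that style-oriented methods tend to target.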

arXiv:2304.00690 [pdf, other]

3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds

Authors: Aoran Xiao, Jiaxing Huang, Weihao Xuan, Ruijie Ren, Kangcheng Liu, Dayan Guan, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing

Abstract: Robust point cloud parsing under all-weather conditions is crucial to level-5 autonomy in autonomous driving. However, how to learn a universal 3D semantic segmentation (3DSS) model is largely neglected, as most existing benchmarks are dominated by point clouds captured under normal weather. We introduce SemanticSTF, an adverse-weather point cloud dataset that provides dense point-level annotations and allows the study of 3DSS under various adverse weather conditions. We study all-weather 3DSS modeling under two setups: 1) domain adaptive 3DSS that adapts from normal-weather data to adverse-weather data; 2) domain generalizable 3DSS that learns all-weather 3DSS models from normal-weather data. Our studies reveal the challenges existing 3DSS methods face when they encounter adverse-weather data, showing the great value of SemanticSTF in steering future endeavors along this meaningful research direction. In addition, we design a domain randomization technique that alternately randomizes the geometry styles of point clouds and aggregates their embeddings, ultimately leading to a generalizable model that effectively improves 3DSS under various adverse weather conditions. SemanticSTF and the related code are available at https://github.com/xiaoaoran/SemanticSTF.

Submitted 2 April, 2023; originally announced April 2023.

Comments: CVPR2023
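
As a rough illustration of what randomizing the geometry style of a point cloud can mean, the sketch below applies a random rotation about the vertical axis, random per-axis scaling, and Gaussian jitter. The function name `randomize_geometry` and all parameter ranges are assumptions chosen for illustration; this is a generic augmentation, not the SemanticSTF randomization technique, and it omits the embedding aggregation step the abstract mentions.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]


def randomize_geometry(points: List[Point], rng: random.Random,
                       max_angle: float = math.pi,
                       scale_range: Tuple[float, float] = (0.9, 1.1),
                       jitter_std: float = 0.01) -> List[Point]:
    """Apply one random geometric 'style' to a point cloud (hypothetical parameters)."""
    theta = rng.uniform(-max_angle, max_angle)   # rotation about the z (vertical) axis
    c, s = math.cos(theta), math.sin(theta)
    sx, sy, sz = (rng.uniform(*scale_range) for _ in range(3))  # anisotropic scaling
    out = []
    for x, y, z in points:
        xr, yr = c * x - s * y, s * x + c * y
        out.append((sx * xr + rng.gauss(0.0, jitter_std),
                    sy * yr + rng.gauss(0.0, jitter_std),
                    sz * z + rng.gauss(0.0, jitter_std)))
    return out


cloud = [(1.0, 0.0, 0.5), (0.0, 2.0, 1.0), (-1.5, -0.5, 0.2)]
view_a = randomize_geometry(cloud, random.Random(0))
view_b = randomize_geometry(cloud, random.Random(1))
```

Calling it with different random states yields differently distorted copies of the same scan, the kind of pair a randomization-based method could then learn to map to similar embeddings.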

arXiv:2303.10598 [pdf, other]

StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields

Authors: Kunhao Liu, Fangneng Zhan, Yiwen Chen, Jiahui Zhang, Yingchen Yu, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing

Abstract: 3D style transfer aims to render stylized novel views of a 3D scene with multi-view consistency. However, most existing work suffers from a three-way dilemma over accurate geometry reconstruction, high-quality stylization, and being generalizable to arbitrary new styles. We propose StyleRF (Style Radiance Fields), an innovative 3D style transfer technique that resolves the three-way dilemma by performing style transformation within the feature space of a radiance field. StyleRF employs an explicit grid of high-level features to represent 3D scenes, with which high-fidelity geometry can be reliably restored via volume rendering. In addition, it transforms the grid features according to the reference style, which directly leads to high-quality zero-shot style transfer. StyleRF consists of two innovative designs. The first is sampling-invariant content transformation, which makes the transformation invariant to the holistic statistics of the sampled 3D points and accordingly ensures multi-view consistency. The second is deferred style transformation of 2D feature maps, which is equivalent to the transformation of 3D points but greatly reduces memory footprint without degrading multi-view consistency. Extensive experiments show that StyleRF achieves superior 3D stylization quality with precise geometry reconstruction and that it can generalize to various new styles in a zero-shot manner.

Submitted 24 March, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR 2023. Project website: https://kunhao-liu.github.io/StyleRF/

arXiv:2301.03413 [pdf]

doi 10.1109/LES.2015.2440761

A Framework of Reconfigurable Transducer Nodes for Smart Home Environments

Authors: Basim Hafidh, Hussein Al Osman, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: This letter presents a transducer network framework that supports the amalgamation of multiple transducers into single wireless nodes. This approach aims to decrease energy consumption by reducing the number of wireless transceivers involved in such networks. To make wireless nodes easily reconfigurable, a plug-and-play mechanism is applied to enable the clustering of any number of transducers. Furthermore, an algorithm is proposed to dynamically detect transducers added to and removed from a node. Lastly, an XML-based protocol is devised to allow nodes to communicate a description of their layout, measured data, and control information. To verify the proposed framework, multiple reconfigurable wireless nodes are used to monitor the dynamic conditions of multiple rooms over a period of 24 hours to emulate a smart home scenario.

Submitted 25 December, 2022; originally announced January 2023.

Journal ref: IEEE Embedded Systems Letters, vol. 7, no. 3, pp. 81-84, 2015
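
Two node-side mechanisms described in the abstract, detecting added or removed transducers and describing a node in XML, can be sketched with Python's standard library. The element and attribute names (`node`, `layout`, `transducer`, `data`, `reading`) and the helper functions are hypothetical, and control information is left out; the letter's actual XML schema is not given in the abstract.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set, Tuple


def diff_transducers(previous: Set[str], current: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) transducer IDs between two scans of a node's ports."""
    return current - previous, previous - current


def build_message(node_id: str, readings: Dict[str, float]) -> str:
    """Serialize a node's layout and latest readings into a hypothetical XML message."""
    root = ET.Element("node", id=node_id)
    layout = ET.SubElement(root, "layout")
    data = ET.SubElement(root, "data")
    for tid, value in sorted(readings.items()):
        ET.SubElement(layout, "transducer", id=tid)
        ET.SubElement(data, "reading", transducer=tid).text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_message(xml_text: str) -> Dict[str, float]:
    """Recover transducer readings from a message produced by build_message."""
    root = ET.fromstring(xml_text)
    return {r.get("transducer"): float(r.text) for r in root.iter("reading")}


added, removed = diff_transducers({"temp", "light"}, {"temp", "humidity"})
print(added, removed)  # {'humidity'} {'light'}
msg = build_message("room-1", {"temp": 21.5, "humidity": 40.0})
print(parse_message(msg))  # {'humidity': 40.0, 'temp': 21.5}
```

Comparing successive port scans as sets keeps detection independent of how many transducers a node carries, which matches the plug-and-play goal described above.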

arXiv:2301.00726 [pdf, other]

doi 10.1109/JSYST.2016.2553518

3-D Markerless Tracking of Human Gait by Geometric Trilateration of Multiple Kinects

Authors: Lin Yang, Bowen Yang, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: In this paper, we develop an integrated markerless gait tracking system with three Kinect v2 sensors. A geometric principle-based trilateration method is proposed for optimizing the accuracy of the measured gait data. To tackle the data synchronization problem among the Kinect clients and the server, a synchronization mechanism based on NTP (Network Time Protocol) is designed for synchronizing the server and Kinect clients' clocks. Furthermore, a time schedule is designed for timing each Kinect client's data transmission. In the experiment, participants are asked to perform a 60-s walk while the proposed tracking system obtains the participant's gait data. Six joints (left hip, right hip, left knee, right knee, left ankle, and right ankle) of the participants are tracked, and the obtained gait data are described as 6000 movements of joint positions (1000 movements for each joint). The results show that trilateration tracking with the three Kinect sensors is much more accurate than tracking with a single Kinect sensor. Within a randomly sampled time period (67.726 s in the experiment), 98.37% of the frames generated by the gait tracking system have timing errors of less than 1 ms, which is much better than the default NTP service embedded in the Windows 8.1 operating system. The accuracy of the proposed system is quantitatively evaluated and verified by comparison with a commercial medical system (Delsys Trigno Smart Sensor System).

Submitted 25 December, 2022; originally announced January 2023.

Journal ref: IEEE Systems Journal, vol. 12, no. 2, pp. 1393-1403, 2018
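
The NTP-based synchronization mentioned in the abstract rests on the standard on-wire calculation from RFC 5905: from four timestamps, a client estimates its clock offset relative to the server and the round-trip delay. The sketch below implements that calculation and, as NTP's clock filter does in a more elaborate way, keeps the offset from the sample with the smallest delay. It is a generic illustration with hypothetical helper names; the paper's own synchronization mechanism and transmission schedule are not reproduced here.

```python
from typing import Iterable, Tuple

Sample = Tuple[float, float, float, float]


def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """On-wire NTP estimate (RFC 5905).

    t1: request sent (client clock)     t2: request received (server clock)
    t3: reply sent (server clock)       t4: reply received (client clock)
    Returns (offset, delay); a positive offset means the server clock is ahead.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def best_offset(samples: Iterable[Sample]) -> float:
    """Offset from the exchange with the smallest delay (its error is bounded by delay / 2)."""
    offset, _ = min((ntp_offset_delay(*s) for s in samples), key=lambda od: od[1])
    return offset


# Client clock 5 ms behind the server, 10 ms each way, 1 ms server turnaround.
offset, delay = ntp_offset_delay(100.000, 100.015, 100.016, 100.021)
print(round(offset, 6), round(delay, 6))  # 0.005 0.02
```

The formula assumes symmetric one-way delays; any asymmetry shows up directly as offset error, which is why low-delay samples are preferred.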

arXiv:2212.14773 [pdf, other]

doi 10.1007/s11042-016-3949-2

Development of an automatic 3D human head scanning-printing system

Authors: Longyu Zhang, Bote Han, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: Three-dimensional (3D) technologies have been developing rapidly in recent years and have influenced industrial, medical, cultural, and many other fields. In this paper, we introduce an automatic 3D human head scanning-printing system, which provides a complete pipeline to scan, reconstruct, select, and finally print out physical 3D human heads. To enhance the accuracy of our system, we developed a consumer-grade composite sensor (including a gyroscope, an accelerometer, a digital compass, and a Kinect v2 depth sensor) as our sensing device. This sensing device is mounted on a robot, which automatically rotates around the human subject at a radius of approximately 1 m to capture full-view information. The data streams are further processed and fused into a 3D model of the subject using a tablet located on the robot. In addition, an automatic selection method, based on our specific system configuration, is proposed to select the head portion. We evaluated the accuracy of the proposed system by comparing our generated 3D head models, from both a standard human head model and real human subjects, with the ones reconstructed by the FastSCAN and Cyberware commercial laser scanning systems, computing and visualizing Hausdorff distances. Computational cost is also reported to further assess our proposed system.

Submitted 25 December, 2022; originally announced December 2022.

Journal ref: Multimedia Tools and Applications, vol. 76, no. 3, pp. 4381-4403, 2017
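
The evaluation compares reconstructed heads with laser-scan references through Hausdorff distances. A brute-force sketch of the symmetric Hausdorff distance between two point sets is given below; the helper names are illustrative, and the quadratic nearest-neighbour search is only suitable for small point sets (full scans would normally use a spatial index or a dedicated mesh-distance tool).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point of a to its nearest neighbour in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


scan = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(hausdorff(scan, reference))  # 2.0
```

The example shows why both directions matter: every scanned point lies on the reference, yet the reference point the scan missed still produces a distance of 2.0.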

arXiv:2212.14772 [pdf, other]

doi 10.1145/2629673

A Combined Approach Toward Consistent Reconstructions of Indoor Spaces Based on 6D RGB-D Odometry and KinectFusion

Authors: Nadia Figueroa, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: We propose a 6D RGB-D odometry approach that finds the relative camera pose between consecutive RGB-D frames by keypoint extraction and feature matching on both the RGB and depth image planes. Furthermore, we feed the estimated pose to the highly accurate KinectFusion algorithm, which uses a fast ICP (Iterative Closest Point) to fine-tune the frame-to-frame relative pose and fuse the depth data into a global implicit surface. We evaluate our method on a publicly available RGB-D SLAM benchmark dataset by Sturm et al. The experimental results show that our proposed reconstruction method, based solely on visual odometry and KinectFusion, outperforms the state-of-the-art RGB-D SLAM system in accuracy. Moreover, our algorithm outputs a ready-to-use polygon mesh (highly suitable for creating 3D virtual worlds) without any postprocessing steps.

Submitted 25 December, 2022; originally announced December 2022.

Journal ref: ACM Trans. Intell. Syst., vol. 6, no. 2, pp. 14:1-10, 2015
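
To make the ICP refinement step more concrete, here is a deliberately simplified 2D point-to-point ICP: it alternates nearest-neighbour matching with the closed-form least-squares rigid transform. This is a stand-in under stated assumptions, not KinectFusion's ICP, which works in 3D on depth maps with projective data association and a point-to-plane error; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def best_rigid_2d(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[float, float, float]:
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    cross = dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        cross += px * qy - py * qx
        dot += px * qx + py * qy
    theta = math.atan2(cross, dot)          # angle maximizing the sum of q . R p
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def icp_2d(src: Sequence[Point], dst: Sequence[Point],
           iterations: int = 20) -> Tuple[float, float, float]:
    """Point-to-point ICP; returns the accumulated (theta, tx, ty) aligning src to dst."""
    theta_t, tx_t, ty_t = 0.0, 0.0, 0.0
    cur: List[Point] = list(src)
    for _ in range(iterations):
        # Data association: pair every current point with its nearest point in dst.
        matched = [min(dst, key=lambda q: math.dist(p, q)) for p in cur]
        theta, tx, ty = best_rigid_2d(cur, matched)
        c, s = math.cos(theta), math.sin(theta)
        cur = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in cur]
        # Compose the incremental step with the transform accumulated so far.
        theta_t += theta
        tx_t, ty_t = c * tx_t - s * ty_t + tx, s * tx_t + c * ty_t + ty
    return theta_t, tx_t, ty_t
```

Like any ICP, this only converges to the right pose when the initial guess is close enough for nearest neighbours to be mostly correct, which is why the abstract's pipeline seeds ICP with a feature-based odometry estimate.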

arXiv:2212.14771 [pdf, other]

doi 10.1109/JSEN.2017.2671420

Development of a Self-Calibrated Motion Capture System by Nonlinear Trilateration of Multiple Kinects v2

Authors: Bowen Yang, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: In this paper, a Kinect-based distributed and real-time motion capture system is developed. A trigonometric method is applied to calculate the relative positions of Kinect v2 sensors with a calibration wand and register the sensors' positions automatically. By combining results from multiple sensors with a nonlinear least-squares method, the accuracy of the motion capture is optimized. Moreover, to exclude inaccurate results from sensors, computational geometry is applied in the occlusion approach, which discovers occluded joint data. The synchronization approach is based on NTP, which dynamically synchronizes the clocks of the server and clients, ensuring that the proposed system operates in real time. Experiments for validating the proposed system are conducted from the perspectives of calibration, occlusion, accuracy, and efficiency. Furthermore, to demonstrate the practical performance of our system, it and previously developed motion capture systems (the linear trilateration approach and the geometric trilateration approach) are compared against the benchmark OptiTrack system, showing that the accuracy of our proposed system is 38.3% and 24.1% better than the two aforementioned trilateration systems, respectively.

Submitted 25 December, 2022; originally announced December 2022.

Journal ref: IEEE Sensors Journal, vol. 17, no. 8, pp. 2481-2491, 2017
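
One common way to realize nonlinear least-squares trilateration is Gauss-Newton on range residuals: given known sensor positions and measured distances to a target, iteratively refine the target position to minimize the squared differences between predicted and measured distances. The sketch below does this in 3D using only the standard library. It assumes a distance to the target is available from each sensor, which may differ from how the paper combines Kinect joint estimates; `solve3` and `trilaterate` are hypothetical helper names.

```python
import math
from typing import List, Sequence

Vec = List[float]


def solve3(a: Sequence[Sequence[float]], b: Sequence[float]) -> Vec:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def trilaterate(anchors: Sequence[Sequence[float]], ranges: Sequence[float],
                x0: Sequence[float], iterations: int = 20) -> Vec:
    """Gauss-Newton estimate of the point whose distances to anchors best match ranges."""
    x = list(x0)
    for _ in range(iterations):
        jtj = [[0.0] * 3 for _ in range(3)]
        jtf = [0.0] * 3
        for a, r in zip(anchors, ranges):
            diff = [x[k] - a[k] for k in range(3)]
            d = math.sqrt(sum(v * v for v in diff))
            j = [v / d for v in diff]          # gradient of |x - a| with respect to x
            f = d - r                          # range residual
            for p in range(3):
                jtf[p] += j[p] * f
                for q in range(3):
                    jtj[p][q] += j[p] * j[q]
        step = solve3(jtj, [-v for v in jtf])  # normal equations: (J^T J) step = -J^T f
        x = [x[k] + step[k] for k in range(3)]
        if math.sqrt(sum(s * s for s in step)) < 1e-12:
            break
    return x


anchors = [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
target = [1.0, 2.0, 1.5]
ranges = [math.dist(a, target) for a in anchors]
estimate = trilaterate(anchors, ranges, x0=[1.0, 1.0, 1.0])
```

With four non-coplanar sensors the solution is unique; with noisy ranges the same code returns the least-squares compromise instead of an exact intersection.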

arXiv:2212.13844 [pdf, other]

doi 10.1109/JSEN.2015.2416651

Evaluating and Improving the Depth Accuracy of Kinect for Windows v2

Authors: Lin Yang, Longyu Zhang, Haiwei Dong, Abdulhameed Alelaiwi, Abdulmotaleb El Saddik

Abstract: The Microsoft Kinect sensor has been widely used in many applications since the launch of its first version. Recently, Microsoft released a new version of the Kinect sensor with improved hardware. However, the accuracy of the new sensor has not yet been assessed. In this paper, we measure the depth accuracy of the newly released Kinect v2 depth sensor and obtain a cone model to illustrate its accuracy distribution. We then evaluate the variance of the captured depth values by depth entropy. In addition, we propose a trilateration method to improve the depth accuracy with multiple Kinects simultaneously. Experimental results are provided to validate the proposed model and method.

Submitted 25 December, 2022; originally announced December 2022.

Journal ref: IEEE Sensors Journal, vol. 15, no. 8, pp. 4275-4285, 2015
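
The abstract does not define depth entropy. One plausible reading, offered here purely as an assumption, is the Shannon entropy of repeated depth readings of the same point after quantizing them into fixed-width bins: a perfectly stable reading scores 0 bits, and more scattered readings score higher. The helper name `depth_entropy` and the 1 mm default bin width are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def depth_entropy(depths_mm: Iterable[float], bin_mm: float = 1.0) -> float:
    """Shannon entropy (bits) of repeated depth readings quantized into bins of bin_mm."""
    counts = Counter(math.floor(d / bin_mm) for d in depths_mm)
    n = sum(counts.values())
    # Sum of p * log2(1 / p) over occupied bins.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(depth_entropy([1500.0] * 10))                        # 0.0
print(depth_entropy([1000.2, 1000.7, 1001.4, 1001.9]))     # 1.0
```

Unlike a plain variance, this measure depends only on how readings spread across bins, so the chosen bin width effectively sets the noise level that is treated as significant.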

arXiv:2212.13843 [pdf, other]

doi 10.1109/TMM.2018.2883866

EVM-CNN: Real-Time Contactless Heart Rate Estimation from Facial Video

Authors: Ying Qiu, Yang Liu, Juan Arteaga-Falconi, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: With the increase in health consciousness, noninvasive body monitoring has aroused interest among researchers. Heart rate (HR) is one of the most important pieces of physiological information, and in recent years researchers have estimated it remotely from facial videos. Although progress has been made over the past few years, some limitations remain, such as processing time increasing with accuracy and the lack of comprehensive and challenging datasets for use and comparison. Recently, it was shown that HR information can be extracted from facial videos by spatial decomposition and temporal filtering. Inspired by this, a new framework is introduced in this paper to remotely estimate the HR under realistic conditions by combining spatial and temporal filtering with a convolutional neural network. Our proposed approach performs better than the benchmark on the MMSE-HR dataset in terms of both average HR estimation and short-time HR estimation. High consistency in short-time HR estimation is observed between our method and the ground truth.

Submitted 25 December, 2022; originally announced December 2022.

Journal ref: IEEE Transactions on Multimedia, vol. 21, no. 7, pp. 1778-1787, 2019
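
For intuition about the temporal-filtering idea, the sketch below shows a classic remote-photoplethysmography baseline rather than EVM-CNN itself: it takes a per-frame trace of mean skin-pixel intensity, removes the mean, and reports the dominant frequency inside an assumed heart-rate band of 0.7 to 4 Hz (42 to 240 bpm) as beats per minute. The band limits, the function name `estimate_hr_bpm`, and the synthetic trace are assumptions; the paper instead feeds spatially and temporally filtered features to a CNN.

```python
import math
from typing import Sequence


def estimate_hr_bpm(signal: Sequence[float], fps: float,
                    lo_hz: float = 0.7, hi_hz: float = 4.0) -> float:
    """Heart rate (bpm) from the strongest DFT bin inside [lo_hz, hi_hz].

    Ignoring bins outside the band acts as an ideal band-pass filter.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]              # remove the DC component
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        if not lo_hz <= k * fps / n <= hi_hz:
            continue
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    if best_k == 0:
        raise ValueError("signal too short to resolve the heart-rate band")
    return best_k * fps / n * 60.0


# Synthetic 10 s trace at 30 fps: a 1.2 Hz pulse plus a stronger 0.3 Hz drift.
fps = 30.0
trace = [0.5 + 0.01 * math.sin(2 * math.pi * 1.2 * t / fps)
         + 0.05 * math.sin(2 * math.pi * 0.3 * t / fps) for t in range(300)]
print(round(estimate_hr_bpm(trace, fps), 1))  # 72.0
```

The frequency resolution is fps / n, here 0.1 Hz or 6 bpm, which illustrates the accuracy versus window-length trade-off behind the abstract's point about short-time HR estimation.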

arXiv:2212.13842 [pdf]

doi 10.1109/MMUL.2018.2873843

Towards a QoE Model to Evaluate Holographic Augmented Reality Devices

Authors: Longyu Zhang, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: Augmented reality (AR) technology is developing fast and provides users with new ways to interact with the real-world surrounding environment. Although the performance of holographic AR multimedia devices can be measured with traditional quality-of-service parameters, a quality-of-experience (QoE) model can better evaluate the device from the perspective of users. As there are currently no well-recognized models for measuring the QoE of a holographic AR multimedia device, we present a QoE framework and model it with a fuzzy inference system to quantitatively evaluate the device.

Submitted 25 December, 2022; originally announced December 2022.

Journal ref: IEEE Multimedia, vol. 26, no. 2, pp. 21-32, 2018
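
As a toy illustration of fuzzy inference for QoE, the sketch below uses triangular membership functions and a zero-order Sugeno rule base (rule strength by minimum, output by weighted average). The two inputs (latency in ms and a 0 to 10 clarity rating), their membership ranges, the four rules, and the 1 to 5 output scale are all hypothetical; the article's actual QoE factors and rule base are not given in the abstract.

```python
from typing import List, Tuple


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 at a and c, 1 at the peak b (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def qoe_score(latency_ms: float, clarity: float) -> float:
    """Zero-order Sugeno inference with hypothetical inputs, fuzzy sets, and rules."""
    lat_low = tri(latency_ms, -1.0, 0.0, 150.0)
    lat_high = tri(latency_ms, 50.0, 200.0, 201.0)
    clar_poor = tri(clarity, -1.0, 0.0, 6.0)
    clar_good = tri(clarity, 4.0, 10.0, 11.0)
    # Each rule: (firing strength via min for AND, crisp consequent on a 1-5 scale).
    rules: List[Tuple[float, float]] = [
        (min(lat_low, clar_good), 5.0),
        (min(lat_low, clar_poor), 3.0),
        (min(lat_high, clar_good), 2.5),
        (min(lat_high, clar_poor), 1.0),
    ]
    total = sum(w for w, _ in rules)
    if total == 0.0:
        raise ValueError("inputs fall outside every fuzzy set")
    return sum(w * z for w, z in rules) / total


print(qoe_score(0.0, 10.0))    # 5.0
print(qoe_score(200.0, 0.0))   # 1.0
```

Overlapping membership functions make the score change gradually between these extremes, which is the main appeal of fuzzy inference when QoE factors are only described qualitatively.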

arXiv:2212.12910 [pdf, other]

doi 10.1109/JSEN.2020.2999849

Learning to Estimate 3D Human Pose from Point Cloud

Authors: Yufan Zhou, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: 3D pose estimation is a challenging problem in computer vision. Most existing neural-network-based approaches process color or depth images through convolutional neural networks (CNNs). In this paper, we study the task of 3D human pose estimation from depth images. Unlike existing CNN-based human pose estimation methods, we propose a deep human pose network for 3D pose estimation that takes point cloud data as input to model the surface of complex human structures. We first convert the 2D depth images into 3D point clouds and then directly predict the 3D joint positions. Our experiments on two public datasets show that our approach achieves higher accuracy than previous state-of-the-art methods. The reported results on both the ITOP and EVAL datasets demonstrate the effectiveness of our method on the targeted tasks.

Submitted 25 December, 2022; originally announced December 2022.

Journal ref: IEEE Sensors Journal, vol. 20, no. 20, pp. 12334-12342, 2020
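
Converting a depth image into a point cloud, the first step the abstract describes, is typically a pinhole back-projection: a pixel (u, v) with depth z maps to ((u - cx) z / fx, (v - cy) z / fy, z). The sketch below assumes known intrinsics fx, fy, cx, cy and depth in meters; the function name and the unit intrinsics in the example are placeholders, not values from the paper or from a specific sensor.

```python
from typing import List, Sequence, Tuple


def depth_to_point_cloud(depth: Sequence[Sequence[float]], fx: float, fy: float,
                         cx: float, cy: float) -> List[Tuple[float, float, float]]:
    """Back-project a depth image into camera-frame 3D points with a pinhole model.

    Pixels with depth <= 0 (no measurement) are skipped.
    """
    points = []
    for v, row in enumerate(depth):       # v: image row (y axis)
        for u, z in enumerate(row):       # u: image column (x axis)
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


pts = depth_to_point_cloud([[2.0, 0.0], [1.0, 1.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(pts)  # [(0.0, 0.0, 2.0), (0.0, 1.0, 1.0), (1.0, 1.0, 1.0)]
```

Working on the resulting points rather than on the image grid makes the input independent of pixel layout, which is the motivation the abstract gives for modeling body surfaces as point clouds.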

arXiv:2212.12908 [pdf, other]

doi 10.1109/JSEN.2020.3016611

Sitting Posture Recognition Using a Spiking Neural Network

Authors: Jianquan Wang, Basim Hafidh, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: To increase the quality of citizens' lives, we designed a personalized smart chair system to recognize sitting behaviors. The system receives surface pressure data from the designed sensor and provides feedback for guiding the user toward proper sitting postures. We used a liquid state machine and a logistic regression classifier to construct a spiking neural network (SNN) for classifying 15 sitting postures. To allow this system to read our pressure data into the spiking neurons, we designed an algorithm to encode map-like data into cosine-rank sparsity data. Experimental results covering 15 sitting postures from 19 participants show that the prediction precision of our SNN is 88.52%.

Submitted 25 December, 2022; originally announced December 2022.

Journal ref: IEEE Sensors Journal, vol. 21, no. 2, pp. 1779-1786, 2021

arXiv:2212.12907 [pdf]

doi 10.1109/MMUL.2018.2873473

Technical Evaluation of HoloLens for Multimedia: A First Look

Authors: Yang Liu, Haiwei Dong, Longyu Zhang, Abdulmotaleb El Saddik

Abstract: A recently released cutting-edge AR device, Microsoft HoloLens, has attracted considerable attention with its advanced capabilities. In this article, we report the design and execution of a series of experiments to quantitatively evaluate HoloLens' performance in head localization, real environment reconstruction, spatial mapping, hologram visualization, and speech recognition.

Submitted 25 December, 2022; originally announced December 2022.

Journal ref: IEEE Multimedia, vol. 25, no. 4, pp. 8-18, 2018

arXiv:2212.10295 [pdf, other]

doi 10.1109/MCE.2022.3165961

Interacting with New York City Data by HoloLens through Remote Rendering

Authors: Zijian Long, Haiwei Dong, Abdulmotaleb El Saddik

Abstract: In the digital era, Extended Reality (XR) is considered the next frontier. However, XR systems are computationally intensive, and they must be implemented within strict latency constraints. Thus, XR devices with finite computing resources are limited in terms of the quality of experience (QoE) they can offer, particularly in cases of big 3D data. This problem can be effectively addressed by offloading the highly intensive rendering tasks to a remote server. Therefore, we propose a remote-rendering-enabled XR system that presents the 3D city model of New York City on the Microsoft HoloLens. Experimental results indicate that remote rendering outperforms local rendering for the New York City model, improving average QoE by at least 21%. Additionally, we characterize the network traffic pattern of the proposed XR system developed under the OpenXR standard.

Submitted 20 December, 2022; originally announced December 2022.

Journal ref: IEEE Consumer Electronics Magazine, vol. 11, no. 5, pp. 64-72, 2022

arXiv:2210.04606 [pdf, ps, other]

Integrating Digital Twin and Advanced Intelligent Technologies to Realize the Metaverse

Authors: Moayad Aloqaily, Ouns Bouachir, Fakhri Karray, Ismaeel Al Ridhawi, Abdulmotaleb El Saddik

Abstract: Advances in Artificial Intelligence (AI) have led to technological progress in a plethora of domains. Healthcare, education, and smart city services are now enriched with AI capabilities. These advancements would not have been realized without fast, secure, and fault-tolerant communication media. Traditional processing, communication, and storage technologies cannot maintain high levels of scalability and user experience for immersive services. The metaverse is an immersive three-dimensional (3D) virtual world that integrates fantasy and reality into a virtual environment using advanced virtual reality (VR) and augmented reality (AR) devices. Such an environment is still being developed and requires extensive research to be realized to its full potential. In this article, we discuss some of the key issues that must be addressed to realize metaverse services. We propose a framework that integrates digital twin (DT) with other advanced technologies, such as sixth-generation (6G) communication networks, blockchain, and AI, to maintain continuous end-to-end metaverse services. This article also outlines requirements for an integrated, DT-enabled metaverse framework and provides a look ahead into this evolving topic.

Submitted 3 October, 2022; originally announced October 2022.

Comments: 7 pages, 2 figures, Accepted for publication, IEEE Consumer Electronics Magazine

arXiv:2207.12850 [pdf, other]

SSIVD-Net: A Novel Salient Super Image Classification & Detection Technique for Weaponized Violence

Authors: Toluwani Aremu, Li Zhiyuan, Reem Alameeri, Mustaqeem Khan, Abdulmotaleb El Saddik

Abstract: Detection of violence and weaponized violence in closed-circuit television (CCTV) footage requires a comprehensive approach. In this work, we introduce the Smart-City CCTV Violence Detection (SCVD) dataset, specifically designed to facilitate the learning of weapon distribution in surveillance videos. To tackle the complexities of analyzing 3D surveillance video for violence recognition tasks, we propose a novel technique called SSIVD-Net (Salient-Super-Image for Violence Detection). Our method reduces 3D video data complexity, dimensionality, and information loss while improving inference, performance, and explainability through salient super-image representations. Considering the scalability and sustainability requirements of future smart cities, the authors introduce the Salient-Classifier, a novel architecture combining a kernelized approach with a residual learning strategy. We evaluate variations of SSIVD-Net and Salient-Classifier on our SCVD dataset and benchmark against state-of-the-art (SOTA) models commonly employed in violence detection. Our approach exhibits significant improvements in detecting both weaponized and non-weaponized violence instances. By advancing the SOTA in violence detection, our work offers a practical and scalable solution suitable for real-world applications. The proposed methodology not only addresses the challenges of violence detection in CCTV footage but also contributes to the understanding of weapon distribution in smart surveillance. Ultimately, our research findings should enable smarter and more secure cities, as well as enhance public safety measures.

Submitted 7 November, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

Comments: Contains 5 tables and 3 figures. Accepted at the 2024 SAI Computing Conference

arXiv:2207.07913 [pdf, other]

Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation

Authors: Chaofan Zheng, Lianli Gao, Xinyu Lyu, Pengpeng Zeng, Abdulmotaleb El Saddik, Heng Tao Shen

Abstract: The current studies of Scene Graph Generation (SGG) focus on solving the long-tailed problem for generating unbiased scene graphs. However, most de-biasing methods overemphasize the tail predicates and underestimate head ones throughout training, thereby wrecking the representation ability of head predicate features. Furthermore, these impaired features from head predicates harm the learning of tail predicates. In fact, the inference of tail predicates heavily depends on the general patterns learned from head ones, e.g., "standing on" depends on "on". Thus, these de-biasing SGG methods can neither achieve excellent performance on tail predicates nor satisfying behaviors on head ones. To address this issue, we propose a Dual-branch Hybrid Learning network (DHL) to take care of both head predicates and tail ones for SGG, including a Coarse-grained Learning Branch (CLB) and a Fine-grained Learning Branch (FLB). Specifically, the CLB is responsible for learning expertise and robust features of head predicates, while the FLB is expected to predict informative tail predicates. Furthermore, DHL is equipped with a Branch Curriculum Schedule (BCS) to make the two branches work well together. Experiments show that our approach achieves a new state-of-the-art performance on VG and GQA datasets and makes a trade-off between the performance of tail predicates and head ones. Moreover, extensive experiments on two downstream tasks (i.e., Image Captioning and Sentence-to-Graph Retrieval) further verify the generalization and practicability of our method.

Submitted 16 July, 2022; originally announced July 2022.

arXiv:1909.10164 [pdf, other]

sZoom: A Framework for Automatic Zoom into High Resolution Surveillance Videos

Authors: Mukesh Saini, Benjamin Guthier, Hao Kuang, Dwarikanath Mahapatra, Abdulmotaleb El Saddik

Abstract: Current cameras are capable of recording high resolution video. While viewing on a mobile device, a user can manually zoom into this high resolution video to get a more detailed view of objects and activities. However, manual zooming is not suitable for surveillance and monitoring. It is tiring to continuously keep zooming into various regions of the video. Also, while viewing one region, the operator may miss activities in other regions. In this paper, we propose sZoom, a framework to automatically zoom into a high resolution surveillance video. The proposed framework selectively zooms into the sensitive regions of the video to present details of the scene, while still preserving the overall context required for situation assessment. A multi-variate Gaussian penalty is introduced to ensure full coverage of the scene. The method achieves near real-time performance through a number of timing optimizations. An extensive user study shows that, while watching a full HD video on a mobile device, the system enhances the security operator's efficiency in understanding the details of the scene by 99% on average compared to a scaled version of the original high resolution video. The produced video achieved 46% higher ratings for usefulness in a surveillance task.

Submitted 23 September, 2019; originally announced September 2019.

Showing 1–37 of 37 results for author: Saddik, A E