-
Leveraging Hybrid Intelligence Towards Sustainable and Energy-Efficient Machine Learning
Authors:
Daniel Geissler,
Paul Lukowicz
Abstract:
Hybrid intelligence aims to enhance decision-making, problem-solving, and overall system performance by combining the strengths of both, human cognitive abilities and artificial intelligence. With the rise of Large Language Models (LLM), progressively participating as smart agents to accelerate machine learning development, Hybrid Intelligence is becoming an increasingly important topic for effect…
▽ More
Hybrid intelligence aims to enhance decision-making, problem-solving, and overall system performance by combining the strengths of both, human cognitive abilities and artificial intelligence. With the rise of Large Language Models (LLM), progressively participating as smart agents to accelerate machine learning development, Hybrid Intelligence is becoming an increasingly important topic for effective interaction between humans and machines. This paper presents an approach to leverage Hybrid Intelligence towards sustainable and energy-aware machine learning. When developing machine learning models, final model performance commonly rules the optimization process while the efficiency of the process itself is often neglected. Moreover, in recent times, energy efficiency has become equally crucial due to the significant environmental impact of complex and large-scale computational processes. The contribution of this work covers the interactive inclusion of secondary knowledge sources through Human-in-the-loop (HITL) and LLM agents to stress out and further resolve inefficiencies in the machine learning development process.
△ Less
Submitted 15 July, 2024;
originally announced July 2024.
-
Sensor Data Augmentation from Skeleton Pose Sequences for Improving Human Activity Recognition
Authors:
Parham Zolfaghari,
Vitor Fortes Rey,
Lala Ray,
Hyun Kim,
Sungho Suh,
Paul Lukowicz
Abstract:
The proliferation of deep learning has significantly advanced various fields, yet Human Activity Recognition (HAR) has not fully capitalized on these developments, primarily due to the scarcity of labeled datasets. Despite the integration of advanced Inertial Measurement Units (IMUs) in ubiquitous wearable devices like smartwatches and fitness trackers, which offer self-labeled activity data from…
▽ More
The proliferation of deep learning has significantly advanced various fields, yet Human Activity Recognition (HAR) has not fully capitalized on these developments, primarily due to the scarcity of labeled datasets. Despite the integration of advanced Inertial Measurement Units (IMUs) in ubiquitous wearable devices like smartwatches and fitness trackers, which offer self-labeled activity data from users, the volume of labeled data remains insufficient compared to domains where deep learning has achieved remarkable success. Addressing this gap, in this paper, we propose a novel approach to improve wearable sensor-based HAR by introducing a pose-to-sensor network model that generates sensor data directly from 3D skeleton pose sequences. our method simultaneously trains the pose-to-sensor network and a human activity classifier, optimizing both data reconstruction and activity recognition. Our contributions include the integration of simultaneous training, direct pose-to-sensor generation, and a comprehensive evaluation on the MM-Fit dataset. Experimental results demonstrate the superiority of our framework with significant performance improvements over baseline methods.
△ Less
Submitted 25 April, 2024;
originally announced June 2024.
-
Initial Investigation of Kolmogorov-Arnold Networks (KANs) as Feature Extractors for IMU Based Human Activity Recognition
Authors:
Mengxi Liu,
Daniel Geißler,
Dominique Nshimyimana,
Sizhen Bian,
Bo Zhou,
Paul Lukowicz
Abstract:
In this work, we explore the use of a novel neural network architecture, the Kolmogorov-Arnold Networks (KANs) as feature extractors for sensor-based (specifically IMU) Human Activity Recognition (HAR). Where conventional networks perform a parameterized weighted sum of the inputs at each node and then feed the result into a statically defined nonlinearity, KANs perform non-linear computations rep…
▽ More
In this work, we explore the use of a novel neural network architecture, the Kolmogorov-Arnold Networks (KANs) as feature extractors for sensor-based (specifically IMU) Human Activity Recognition (HAR). Where conventional networks perform a parameterized weighted sum of the inputs at each node and then feed the result into a statically defined nonlinearity, KANs perform non-linear computations represented by B-SPLINES on the edges leading to each node and then just sum up the inputs at the node. Instead of learning weights, the system learns the spline parameters. In the original work, such networks have been shown to be able to more efficiently and exactly learn sophisticated real valued functions e.g. in regression or PDE solution. We hypothesize that such an ability is also advantageous for computing low-level features for IMU-based HAR. To this end, we have implemented KAN as the feature extraction architecture for IMU-based human activity recognition tasks, including four architecture variations. We present an initial performance investigation of the KAN feature extractor on four public HAR datasets. It shows that the KAN-based feature extractor outperforms CNN-based extractors on all datasets while being more parameter efficient.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
MuJo: Multimodal Joint Feature Space Learning for Human Activity Recognition
Authors:
Stefan Gerd Fritsch,
Cennet Oguz,
Vitor Fortes Rey,
Lala Ray,
Maximilian Kiefer-Emmanouilidis,
Paul Lukowicz
Abstract:
Human Activity Recognition is a longstanding problem in AI with applications in a broad range of areas: from healthcare, sports and fitness, security, and human computer interaction to robotics. The performance of HAR in real-world settings is strongly dependent on the type and quality of the input signal that can be acquired. Given an unobstructed, high-quality camera view of a scene, computer vi…
▽ More
Human Activity Recognition is a longstanding problem in AI with applications in a broad range of areas: from healthcare, sports and fitness, security, and human computer interaction to robotics. The performance of HAR in real-world settings is strongly dependent on the type and quality of the input signal that can be acquired. Given an unobstructed, high-quality camera view of a scene, computer vision systems, in particular in conjunction with foundational models (e.g., CLIP), can today fairly reliably distinguish complex activities. On the other hand, recognition using modalities such as wearable sensors (which are often more broadly available, e.g, in mobile phones and smartwatches) is a more difficult problem, as the signals often contain less information and labeled training data is more difficult to acquire. In this work, we show how we can improve HAR performance across different modalities using multimodal contrastive pretraining. Our approach MuJo (Multimodal Joint Feature Space Learning), learns a multimodal joint feature space with video, language, pose, and IMU sensor data. The proposed approach combines contrastive and multitask learning methods and analyzes different multitasking strategies for learning a compact shared representation. A large dataset with parallel video, language, pose, and sensor data points is also introduced to support the research, along with an analysis of the robustness of the multimodal joint space for modal-incomplete and low-resource data. On the MM-Fit dataset, our model achieves an impressive Macro F1-Score of up to 0.992 with only 2% of the train data and 0.999 when using all available training data for classification tasks. Moreover, in the scenario where the MM-Fit dataset is unseen, we demonstrate a generalization performance of up to 0.638.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
iKAN: Global Incremental Learning with KAN for Human Activity Recognition Across Heterogeneous Datasets
Authors:
Mengxi Liu,
Sizhen Bian,
Bo Zhou,
Paul Lukowicz
Abstract:
This work proposes an incremental learning (IL) framework for wearable sensor human activity recognition (HAR) that tackles two challenges simultaneously: catastrophic forgetting and non-uniform inputs. The scalable framework, iKAN, pioneers IL with Kolmogorov-Arnold Networks (KAN) to replace multi-layer perceptrons as the classifier that leverages the local plasticity and global stability of spli…
▽ More
This work proposes an incremental learning (IL) framework for wearable sensor human activity recognition (HAR) that tackles two challenges simultaneously: catastrophic forgetting and non-uniform inputs. The scalable framework, iKAN, pioneers IL with Kolmogorov-Arnold Networks (KAN) to replace multi-layer perceptrons as the classifier that leverages the local plasticity and global stability of splines. To adapt KAN for HAR, iKAN uses task-specific feature branches and a feature redistribution layer. Unlike existing IL methods that primarily adjust the output dimension or the number of classifier nodes to adapt to new tasks, iKAN focuses on expanding the feature extraction branches to accommodate new inputs from different sensor modalities while maintaining consistent dimensions and the number of classifier outputs. Continual learning across six public HAR datasets demonstrated the iKAN framework's incremental learning performance, with a last performance of 84.9\% (weighted F1 score) and an average incremental performance of 81.34\%, which significantly outperforms the two existing incremental learning methods, such as EWC (51.42\%) and experience replay (59.92\%).
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs
Authors:
Vitor Fortes Rey,
Lala Shakti Swarup Ray,
Xia Qingxin,
Kaishun Wu,
Paul Lukowicz
Abstract:
Due to the scarcity of labeled sensor data in HAR, prior research has turned to video data to synthesize Inertial Measurement Units (IMU) data, capitalizing on its rich activity annotations. However, generating IMU data from videos presents challenges for HAR in real-world settings, attributed to the poor quality of synthetic IMU data and its limited efficacy in subtle, fine-grained motions. In th…
▽ More
Due to the scarcity of labeled sensor data in HAR, prior research has turned to video data to synthesize Inertial Measurement Units (IMU) data, capitalizing on its rich activity annotations. However, generating IMU data from videos presents challenges for HAR in real-world settings, attributed to the poor quality of synthetic IMU data and its limited efficacy in subtle, fine-grained motions. In this paper, we propose Multi$^3$Net, our novel multi-modal, multitask, and contrastive-based framework approach to address the issue of limited data. Our pretraining procedure uses videos from online repositories, aiming to learn joint representations of text, pose, and IMU simultaneously. By employing video data and contrastive learning, our method seeks to enhance wearable HAR performance, especially in recognizing subtle activities.Our experimental findings validate the effectiveness of our approach in improving HAR performance with IMU data. We demonstrate that models trained with synthetic IMU data generated from videos using our method surpass existing approaches in recognizing fine-grained activities.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
BeSound: Bluetooth-Based Position Estimation Enhancing with Cross-Modality Distillation
Authors:
Hymalai Bello,
Sungho Suh,
Bo Zhou,
Paul Lukowicz
Abstract:
Smart factories leverage advanced technologies to optimize manufacturing processes and enhance efficiency. Implementing worker tracking systems, primarily through camera-based methods, ensures accurate monitoring. However, concerns about worker privacy and technology protection make it necessary to explore alternative approaches. We propose a non-visual, scalable solution using Bluetooth Low Energ…
▽ More
Smart factories leverage advanced technologies to optimize manufacturing processes and enhance efficiency. Implementing worker tracking systems, primarily through camera-based methods, ensures accurate monitoring. However, concerns about worker privacy and technology protection make it necessary to explore alternative approaches. We propose a non-visual, scalable solution using Bluetooth Low Energy (BLE) and ultrasound coordinates. BLE position estimation offers a very low-power and cost-effective solution, as the technology is available on smartphones and is scalable due to the large number of smartphone users, facilitating worker localization and safety protocol transmission. Ultrasound signals provide faster response times and higher accuracy but require custom hardware, increasing costs. To combine the benefits of both modalities, we employ knowledge distillation (KD) from ultrasound signals to BLE RSSI data. Once the student model is trained, the model only takes as inputs the BLE-RSSI data for inference, retaining the advantages of ubiquity and low cost of BLE RSSI. We tested our approach using data from an experiment with twelve participants in a smart factory test bed environment. We obtained an increase of 11.79% in the F1-score compared to the baseline (target model without KD and trained with BLE-RSSI data only).
△ Less
Submitted 24 April, 2024;
originally announced April 2024.
-
iFace: Hand-Over-Face Gesture Recognition Leveraging Impedance Sensing
Authors:
Mengxi Liu,
Hymalai Bello,
Bo Zhou,
Paul Lukowicz,
Jakob Karolus
Abstract:
Hand-over-face gestures can provide important implicit interactions during conversations, such as frustration or excitement. However, in situations where interlocutors are not visible, such as phone calls or textual communication, the potential meaning contained in the hand-over-face gestures is lost. In this work, we present iFace, an unobtrusive, wearable impedance-sensing solution for recognizi…
▽ More
Hand-over-face gestures can provide important implicit interactions during conversations, such as frustration or excitement. However, in situations where interlocutors are not visible, such as phone calls or textual communication, the potential meaning contained in the hand-over-face gestures is lost. In this work, we present iFace, an unobtrusive, wearable impedance-sensing solution for recognizing different hand-over-face gestures. In contrast to most existing works, iFace does not require the placement of sensors on the user's face or hands. Instead, we proposed a novel sensing configuration, the shoulders, which remains invisible to both the user and outside observers. The system can monitor the shoulder-to-shoulder impedance variation caused by gestures through electrodes attached to each shoulder. We evaluated iFace in a user study with eight participants, collecting six kinds of hand-over-face gestures with different meanings. Using a convolutional neural network and a user-dependent classification, iFace reaches 82.58 \% macro F1 score. We discuss potential application scenarios of iFace as an implicit interaction interface.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
Text me the data: Generating Ground Pressure Sequence from Textual Descriptions for HAR
Authors:
Lala Shakti Swarup Ray,
Bo Zhou,
Sungho Suh,
Lars Krupp,
Vitor Fortes Rey,
Paul Lukowicz
Abstract:
In human activity recognition (HAR), the availability of substantial ground truth is necessary for training efficient models. However, acquiring ground pressure data through physical sensors itself can be cost-prohibitive, time-consuming. To address this critical need, we introduce Text-to-Pressure (T2P), a framework designed to generate extensive ground pressure sequences from textual description…
▽ More
In human activity recognition (HAR), the availability of substantial ground truth is necessary for training efficient models. However, acquiring ground pressure data through physical sensors itself can be cost-prohibitive, time-consuming. To address this critical need, we introduce Text-to-Pressure (T2P), a framework designed to generate extensive ground pressure sequences from textual descriptions of human activities using deep learning techniques. We show that the combination of vector quantization of sensor data along with simple text conditioned auto regressive strategy allows us to obtain high-quality generated pressure sequences from textual descriptions with the help of discrete latent correlation between text and pressure maps. We achieved comparable performance on the consistency between text and generated motion with an R squared value of 0.722, Masked R squared value of 0.892, and FID score of 1.83. Additionally, we trained a HAR model with the the synthesized data and evaluated it on pressure dynamics collected by a real pressure sensor which is on par with a model trained on only real data. Combining both real and synthesized training data increases the overall macro F1 score by 5.9 percent.
△ Less
Submitted 22 February, 2024;
originally announced February 2024.
-
iMove: Exploring Bio-impedance Sensing for Fitness Activity Recognition
Authors:
Mengxi Liu,
Vitor Fortes Rey,
Yu Zhang,
Lala Shakti Swarup Ray,
Bo Zhou,
Paul Lukowicz
Abstract:
Automatic and precise fitness activity recognition can be beneficial in aspects from promoting a healthy lifestyle to personalized preventative healthcare. While IMUs are currently the prominent fitness tracking modality, through iMove, we show bio-impedence can help improve IMU-based fitness tracking through sensor fusion and contrastive learning.To evaluate our methods, we conducted an experimen…
▽ More
Automatic and precise fitness activity recognition can be beneficial in aspects from promoting a healthy lifestyle to personalized preventative healthcare. While IMUs are currently the prominent fitness tracking modality, through iMove, we show bio-impedence can help improve IMU-based fitness tracking through sensor fusion and contrastive learning.To evaluate our methods, we conducted an experiment including six upper body fitness activities performed by ten subjects over five days to collect synchronized data from bio-impedance across two wrists and IMU on the left wrist.The contrastive learning framework uses the two modalities to train a better IMU-only classification model, where bio-impedance is only required at the training phase, by which the average Macro F1 score with the input of a single IMU was improved by 3.22 \% reaching 84.71 \% compared to the 81.49 \% of the IMU baseline model. We have also shown how bio-impedance can improve human activity recognition (HAR) directly through sensor fusion, reaching an average Macro F1 score of 89.57 \% (two modalities required for both training and inference) even if Bio-impedance alone has an average macro F1 score of 75.36 \%, which is outperformed by IMU alone. In addition, similar results were obtained in an extended study on lower body fitness activity classification, demonstrating the generalisability of our approach.Our findings underscore the potential of sensor fusion and contrastive learning as valuable tools for advancing fitness activity recognition, with bio-impedance playing a pivotal role in augmenting the capabilities of IMU-based systems.
△ Less
Submitted 3 June, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Lessons Learned from Designing an Open-Source Automated Feedback System for STEM Education
Authors:
Steffen Steinert,
Lars Krupp,
Karina E. Avila,
Anke S. Janssen,
Verena Ruf,
David Dzsotjan,
Christian De Schryver,
Jakob Karolus,
Stefan Ruzika,
Karen Joisten,
Paul Lukowicz,
Jochen Kuhn,
Norbert Wehn,
Stefan Küchemann
Abstract:
As distance learning becomes increasingly important and artificial intelligence tools continue to advance, automated systems for individual learning have attracted significant attention. However, the scarcity of open-source online tools that are capable of providing personalized feedback has restricted the widespread implementation of research-based feedback systems. In this work, we present RATsA…
▽ More
As distance learning becomes increasingly important and artificial intelligence tools continue to advance, automated systems for individual learning have attracted significant attention. However, the scarcity of open-source online tools that are capable of providing personalized feedback has restricted the widespread implementation of research-based feedback systems. In this work, we present RATsApp, an open-source automated feedback system (AFS) that incorporates research-based features such as formative feedback. The system focuses on core STEM competencies such as mathematical competence, representational competence, and data literacy. It also allows lecturers to monitor students' progress. We conducted a survey based on the technology acceptance model (TAM2) among a set of students (N=64). Our findings confirm the applicability of the TAM2 framework, revealing that factors such as the relevance of the studies, output quality, and ease of use significantly influence the perceived usefulness. We also found a linear relation between the perceived usefulness and the intention to use, which in turn is a significant predictor of the frequency of use. Moreover, the formative feedback feature of RATsApp received positive feedback, indicating its potential as an educational tool. Furthermore, as an open-source platform, RATsApp encourages public contributions to its ongoing development, fostering a collaborative approach to improve educational tools.
△ Less
Submitted 19 January, 2024;
originally announced January 2024.
-
Body-Area Capacitive or Electric Field Sensing for Human Activity Recognition and Human-Computer Interaction: A Comprehensive Survey
Authors:
Sizhen Bian,
Mengxi Liu,
Bo Zhou,
Paul Lukowicz,
Michele Magno
Abstract:
Due to the fact that roughly sixty percent of the human body is essentially composed of water, the human body is inherently a conductive object, being able to, firstly, form an inherent electric field from the body to the surroundings and secondly, deform the distribution of an existing electric field near the body. Body-area capacitive sensing, also called body-area electric field sensing, is bec…
▽ More
Due to the fact that roughly sixty percent of the human body is essentially composed of water, the human body is inherently a conductive object, being able to, firstly, form an inherent electric field from the body to the surroundings and secondly, deform the distribution of an existing electric field near the body. Body-area capacitive sensing, also called body-area electric field sensing, is becoming a promising alternative for wearable devices to accomplish certain tasks in human activity recognition and human-computer interaction. Over the last decade, researchers have explored plentiful novel sensing systems backed by the body-area electric field. On the other hand, despite the pervasive exploration of the body-area electric field, a comprehensive survey does not exist for an enlightening guideline. Moreover, the various hardware implementations, applied algorithms, and targeted applications result in a challenging task to achieve a systematic overview of the subject. This paper aims to fill in the gap by comprehensively summarizing the existing works on body-area capacitive sensing so that researchers can have a better view of the current exploration status. To this end, we first sorted the explorations into three domains according to the involved body forms: body-part electric field, whole-body electric field, and body-to-body electric field, and enumerated the state-of-art works in the domains with a detailed survey of the backed sensing tricks and targeted applications. We then summarized the three types of sensing frontends in circuit design, which is the most critical part in body-area capacitive sensing, and analyzed the data processing pipeline categorized into three kinds of approaches. Finally, we described the challenges and outlooks of body-area electric sensing.
△ Less
Submitted 11 January, 2024;
originally announced January 2024.
-
CoSS: Co-optimizing Sensor and Sampling Rate for Data-Efficient AI in Human Activity Recognition
Authors:
Mengxi Liu,
Zimin Zhao,
Daniel Geißler,
Bo Zhou,
Sungho Suh,
Paul Lukowicz
Abstract:
Recent advancements in Artificial Neural Networks have significantly improved human activity recognition using multiple time-series sensors. While employing numerous sensors with high-frequency sampling rates usually improves the results, it often leads to data inefficiency and unnecessary expansion of the ANN, posing a challenge for their practical deployment on edge devices. Addressing these iss…
▽ More
Recent advancements in Artificial Neural Networks have significantly improved human activity recognition using multiple time-series sensors. While employing numerous sensors with high-frequency sampling rates usually improves the results, it often leads to data inefficiency and unnecessary expansion of the ANN, posing a challenge for their practical deployment on edge devices. Addressing these issues, our work introduces a pragmatic framework for data-efficient utilization in HAR tasks, considering the optimization of both sensor modalities and sampling rate simultaneously. Central to our approach are the designed trainable parameters, termed 'Weight Scores,' which assess the significance of each sensor modality and sampling rate during the training phase. These scores guide the sensor modalities and sampling rate selection. The pruning method allows users to make a trade-off between computational budgets and performance by selecting the sensor modalities and sampling rates according to the weight score ranking. We tested our framework's effectiveness in optimizing sensor modality and sampling rate selection using three public HAR benchmark datasets. The results show that the sensor and sampling rate combination selected via CoSS achieves similar classification performance to configurations using the highest sampling rate with all sensors but at a reduced hardware cost.
△ Less
Submitted 3 January, 2024;
originally announced January 2024.
-
The Power of Training: How Different Neural Network Setups Influence the Energy Demand
Authors:
Daniel Geißler,
Bo Zhou,
Mengxi Liu,
Sungho Suh,
Paul Lukowicz
Abstract:
This work offers a heuristic evaluation of the effects of variations in machine learning training regimes and learning paradigms on the energy consumption of computing, especially HPC hardware with a life-cycle aware perspective. While increasing data availability and innovation in high-performance hardware fuels the training of sophisticated models, it also fosters the fading perception of energy…
▽ More
This work offers a heuristic evaluation of the effects of variations in machine learning training regimes and learning paradigms on the energy consumption of computing, especially HPC hardware with a life-cycle aware perspective. While increasing data availability and innovation in high-performance hardware fuels the training of sophisticated models, it also fosters the fading perception of energy consumption and carbon emission. Therefore, the goal of this work is to raise awareness about the energy impact of general training parameters and processes, from learning rate over batch size to knowledge transfer. Multiple setups with different hyperparameter configurations are evaluated on three different hardware systems. Among many results, we have found out that even with the same model and hardware to reach the same accuracy, improperly set training hyperparameters consume up to 5 times the energy of the optimal setup. We also extensively examined the energy-saving benefits of learning paradigms including recycling knowledge through pretraining and sharing knowledge through multitask training.
△ Less
Submitted 8 May, 2024; v1 submitted 3 January, 2024;
originally announced January 2024.
-
Challenges and Opportunities of Moderating Usage of Large Language Models in Education
Authors:
Lars Krupp,
Steffen Steinert,
Maximilian Kiefer-Emmanouilidis,
Karina E. Avila,
Paul Lukowicz,
Jochen Kuhn,
Stefan Küchemann,
Jakob Karolus
Abstract:
The increased presence of large language models (LLMs) in educational settings has ignited debates concerning negative repercussions, including overreliance and inadequate task reflection. Our work advocates moderated usage of such models, designed in a way that supports students and encourages critical thinking. We developed two moderated interaction methods with ChatGPT: hint-based assistance an…
▽ More
The increased presence of large language models (LLMs) in educational settings has ignited debates concerning negative repercussions, including overreliance and inadequate task reflection. Our work advocates moderated usage of such models, designed in a way that supports students and encourages critical thinking. We developed two moderated interaction methods with ChatGPT: hint-based assistance and presenting multiple answer choices. In a study with students (N=40) answering physics questions, we compared the effects of our moderated models against two baseline settings: unmoderated ChatGPT access and internet searches. We analyzed the interaction strategies and found that the moderated versions exhibited less unreflected usage (e.g., copy \& paste) compared to the unmoderated condition. However, neither ChatGPT-supported condition could match the ratio of reflected usage present in internet searches. Our research highlights the potential benefits of moderating language models, showing a research direction toward designing effective AI-supported educational strategies.
△ Less
Submitted 19 December, 2023;
originally announced December 2023.
-
Remaining useful life prediction of Lithium-ion batteries using spatio-temporal multimodal attention networks
Authors:
Sungho Suh,
Dhruv Aditya Mittal,
Hymalai Bello,
Bo Zhou,
Mayank Shekhar Jha,
Paul Lukowicz
Abstract:
Lithium-ion batteries are widely used in various applications, including electric vehicles and renewable energy storage. The prediction of the remaining useful life (RUL) of batteries is crucial for ensuring reliable and efficient operation, as well as reducing maintenance costs. However, determining the life cycle of batteries in real-world scenarios is challenging, and existing methods have limi…
▽ More
Lithium-ion batteries are widely used in various applications, including electric vehicles and renewable energy storage. The prediction of the remaining useful life (RUL) of batteries is crucial for ensuring reliable and efficient operation, as well as reducing maintenance costs. However, determining the life cycle of batteries in real-world scenarios is challenging, and existing methods have limitations in predicting the number of cycles iteratively. In addition, existing works often oversimplify the datasets, neglecting important features of the batteries such as temperature, internal resistance, and material type. To address these limitations, this paper proposes a two-stage RUL prediction scheme for Lithium-ion batteries using a spatio-temporal multimodal attention network (ST-MAN). The proposed ST-MAN is to capture the complex spatio-temporal dependencies in the battery data, including the features that are often neglected in existing works. Despite operating without prior knowledge of end-of-life (EOL) events, our method consistently achieves lower error rates, boasting mean absolute error (MAE) and mean square error (MSE) of 0.0275 and 0.0014, respectively, compared to existing convolutional neural networks (CNN) and long short-term memory (LSTM)-based methods. The proposed method has the potential to improve the reliability and efficiency of battery operations and is applicable in various industries.
△ Less
Submitted 6 June, 2024; v1 submitted 29 October, 2023;
originally announced October 2023.
-
A Novel Local-Global Feature Fusion Framework for Body-weight Exercise Recognition with Pressure Mapping Sensors
Authors:
Davinder Pal Singh,
Lala Shakti Swarup Ray,
Bo Zhou,
Sungho Suh,
Paul Lukowicz
Abstract:
We present a novel local-global feature fusion framework for body-weight exercise recognition with floor-based dynamic pressure maps. One step further from the existing studies using deep neural networks mainly focusing on global feature extraction, the proposed framework aims to combine local and global features using image processing techniques and the YOLO object detection to localize pressure…
▽ More
We present a novel local-global feature fusion framework for body-weight exercise recognition with floor-based dynamic pressure maps. One step further from the existing studies using deep neural networks mainly focusing on global feature extraction, the proposed framework aims to combine local and global features using image processing techniques and the YOLO object detection to localize pressure profiles from different body parts and consider physical constraints. The proposed local feature extraction method generates two sets of high-level local features consisting of cropped pressure mapping and numerical features such as angular orientation, location on the mat, and pressure area. In addition, we adopt a knowledge distillation for regularization to preserve the knowledge of the global feature extraction and improve the performance of the exercise recognition. Our experimental results demonstrate a notable 11 percent improvement in F1 score for exercise recognition while preserving label-specific features.
△ Less
Submitted 14 September, 2023;
originally announced September 2023.
-
Unreflected Acceptance -- Investigating the Negative Consequences of ChatGPT-Assisted Problem Solving in Physics Education
Authors:
Lars Krupp,
Steffen Steinert,
Maximilian Kiefer-Emmanouilidis,
Karina E. Avila,
Paul Lukowicz,
Jochen Kuhn,
Stefan Küchemann,
Jakob Karolus
Abstract:
Large language models (LLMs) have recently gained popularity. However, the impact of their general availability through ChatGPT on sensitive areas of everyday life, such as education, remains unclear. Nevertheless, the societal impact on established educational methods is already being experienced by both students and educators. Our work focuses on higher physics education and examines problem sol…
▽ More
Large language models (LLMs) have recently gained popularity. However, the impact of their general availability through ChatGPT on sensitive areas of everyday life, such as education, remains unclear. Nevertheless, the societal impact on established educational methods is already being experienced by both students and educators. Our work focuses on higher physics education and examines problem solving strategies. In a study, students with a background in physics were assigned to solve physics exercises, with one group having access to an internet search engine (N=12) and the other group being allowed to use ChatGPT (N=27). We evaluated their performance, strategies, and interaction with the provided tools. Our results showed that nearly half of the solutions provided with the support of ChatGPT were mistakenly assumed to be correct by the students, indicating that they overly trusted ChatGPT even in their field of expertise. Likewise, in 42% of cases, students used copy & paste to query ChatGPT -- an approach only used in 4% of search engine queries -- highlighting the stark differences in interaction behavior between the groups and indicating limited reflection when using ChatGPT. In our work, we demonstrated a need to (1) guide students on how to interact with LLMs and (2) create awareness of potential shortcomings for users.
△ Less
Submitted 21 August, 2023;
originally announced September 2023.
-
Two-stage Early Prediction Framework of Remaining Useful Life for Lithium-ion Batteries
Authors:
Dhruv Mittal,
Hymalai Bello,
Bo Zhou,
Mayank Shekhar Jha,
Sungho Suh,
Paul Lukowicz
Abstract:
Early prediction of remaining useful life (RUL) is crucial for effective battery management across various industries, ranging from household appliances to large-scale applications. Accurate RUL prediction improves the reliability and maintainability of battery technology. However, existing methods have limitations, including assumptions of data from the same sensors or distribution, foreknowledge…
▽ More
Early prediction of remaining useful life (RUL) is crucial for effective battery management across various industries, ranging from household appliances to large-scale applications. Accurate RUL prediction improves the reliability and maintainability of battery technology. However, existing methods have limitations, including assumptions of data from the same sensors or distribution, foreknowledge of the end of life (EOL), and neglect to determine the first prediction cycle (FPC) to identify the start of the unhealthy stage. This paper proposes a novel method for RUL prediction of Lithium-ion batteries. The proposed framework comprises two stages: determining the FPC using a neural network-based model to divide the degradation data into distinct health states and predicting the degradation pattern after the FPC to estimate the remaining useful life as a percentage. Experimental results demonstrate that the proposed method outperforms conventional approaches in terms of RUL prediction. Furthermore, the proposed method shows promise for real-world scenarios, providing improved accuracy and applicability for battery management.
△ Less
Submitted 7 August, 2023;
originally announced August 2023.
-
Worker Activity Recognition in Manufacturing Line Using Near-body Electric Field
Authors:
Sungho Suh,
Vitor Fortes Rey,
Sizhen Bian,
Yu-Chi Huang,
Jože M. Rožanec,
Hooman Tavakoli Ghinani,
Bo Zhou,
Paul Lukowicz
Abstract:
Manufacturing industries strive to improve production efficiency and product quality by deploying advanced sensing and control systems. Wearable sensors are emerging as a promising solution for achieving this goal, as they can provide continuous and unobtrusive monitoring of workers' activities in the manufacturing line. This paper presents a novel wearable sensing prototype that combines IMU and…
▽ More
Manufacturing industries strive to improve production efficiency and product quality by deploying advanced sensing and control systems. Wearable sensors are emerging as a promising solution for achieving this goal, as they can provide continuous and unobtrusive monitoring of workers' activities in the manufacturing line. This paper presents a novel wearable sensing prototype that combines IMU and body capacitance sensing modules to recognize worker activities in the manufacturing line. To handle these multimodal sensor data, we propose and compare early, and late sensor data fusion approaches for multi-channel time-series convolutional neural networks and deep convolutional LSTM. We evaluate the proposed hardware and neural network model by collecting and annotating sensor data using the proposed sensing prototype and Apple Watches in the testbed of the manufacturing line. Experimental results demonstrate that our proposed methods achieve superior performance compared to the baseline methods, indicating the potential of the proposed approach for real-world applications in manufacturing industries. Furthermore, the proposed sensing prototype with a body capacitive sensor and feature fusion method improves by 6.35%, yielding a 9.38% higher macro F1 score than the proposed sensing prototype without a body capacitive sensor and Apple Watch data, respectively.
△ Less
Submitted 7 August, 2023;
originally announced August 2023.
-
PressureTransferNet: Human Attribute Guided Dynamic Ground Pressure Profile Transfer using 3D simulated Pressure Maps
Authors:
Lala Shakti Swarup Ray,
Vitor Fortes Rey,
Bo Zhou,
Sungho Suh,
Paul Lukowicz
Abstract:
We propose PressureTransferNet, a novel method for Human Activity Recognition (HAR) using ground pressure information. Our approach generates body-specific dynamic ground pressure profiles for specific activities by leveraging existing pressure data from different individuals. PressureTransferNet is an encoder-decoder model taking a source pressure map and a target human attribute vector as inputs…
▽ More
We propose PressureTransferNet, a novel method for Human Activity Recognition (HAR) using ground pressure information. Our approach generates body-specific dynamic ground pressure profiles for specific activities by leveraging existing pressure data from different individuals. PressureTransferNet is an encoder-decoder model taking a source pressure map and a target human attribute vector as inputs, producing a new pressure map reflecting the target attribute. To train the model, we use a sensor simulation to create a diverse dataset with various human attributes and pressure profiles. Evaluation on a real-world dataset shows its effectiveness in accurately transferring human attributes to ground pressure profiles across different scenarios. We visually confirm the fidelity of the synthesized pressure shapes using a physics-based deep learning model and achieve a binary R-square value of 0.79 on areas with ground contact. Validation through classification with F1 score (0.911$\pm$0.015) on physical pressure mat data demonstrates the correctness of the synthesized pressure maps, making our method valuable for data augmentation, denoising, sensor simulation, and anomaly detection. Applications span sports science, rehabilitation, and bio-mechanics, contributing to the development of HAR systems.
△ Less
Submitted 1 August, 2023;
originally announced August 2023.
-
Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods
Authors:
Lala Shakti Swarup Ray,
Bo Zhou,
Sungho Suh,
Paul Lukowicz
Abstract:
To help smart wearable researchers choose the optimal ground truth methods for motion capturing (MoCap) for all types of loose garments, we present a benchmark, DrapeMoCapBench (DMCB), specifically designed to evaluate the performance of optical marker-based and marker-less MoCap. High-cost marker-based MoCap systems are well-known as precise golden standards. However, a less well-known caveat is…
▽ More
To help smart wearable researchers choose the optimal ground truth methods for motion capturing (MoCap) for all types of loose garments, we present a benchmark, DrapeMoCapBench (DMCB), specifically designed to evaluate the performance of optical marker-based and marker-less MoCap. High-cost marker-based MoCap systems are well-known as precise golden standards. However, a less well-known caveat is that they require skin-tight fitting markers on bony areas to ensure the specified precision, making them questionable for loose garments. On the other hand, marker-less MoCap methods powered by computer vision models have matured over the years, which have meager costs as smartphone cameras would suffice. To this end, DMCB uses large real-world recorded MoCap datasets to perform parallel 3D physics simulations with a wide range of diversities: six levels of drape from skin-tight to extremely draped garments, three levels of motions and six body type - gender combinations to benchmark state-of-the-art optical marker-based and marker-less MoCap methods to identify the best-performing method in different scenarios. In assessing the performance of marker-based and low-cost marker-less MoCap for casual loose garments both approaches exhibit significant performance loss (>10cm), but for everyday activities involving basic and fast motions, marker-less MoCap slightly outperforms marker-based MoCap, making it a favorable and cost-effective choice for wearable studies.
△ Less
Submitted 25 July, 2023; v1 submitted 21 July, 2023;
originally announced July 2023.
-
Origami Single-end Capacitive Sensing for Continuous Shape Estimation of Morphing Structures
Authors:
Lala Shakti Swarup Ray,
Daniel Geißler,
Bo Zhou,
Paul Lukowicz,
Berit Greinke
Abstract:
In this work, we propose a novel single-end morphing capacitive sensing method for shape tracking, FxC, by combining Folding origami structures and Capacitive sensing to detect the morphing structural motions using state-of-the-art sensing circuits and deep learning. It was observed through embedding areas of origami structures with conductive materials as single-end capacitive sensing patches, th…
▽ More
In this work, we propose a novel single-end morphing capacitive sensing method for shape tracking, FxC, by combining Folding origami structures and Capacitive sensing to detect the morphing structural motions using state-of-the-art sensing circuits and deep learning. It was observed through embedding areas of origami structures with conductive materials as single-end capacitive sensing patches, that the sensor signals change coherently with the motion of the structure. Different from other origami capacitors where the origami structures are used in adjusting the thickness of the dielectric layer of double-plate capacitors, FxC uses only a single conductive plate per channel, and the origami structure directly changes the geometry of the conductive plate. We examined the operation principle of morphing single-end capacitors through 3D geometry simulation combined with physics theoretical deduction, which deduced similar behaviour as observed in experimentation. Then a software pipeline was developed to use the sensor signals to reconstruct the dynamic structural geometry through data-driven deep neural network regression of geometric primitives extracted from vision tracking. We created multiple folding patterns to validate our approach, based on folding patterns including Accordion, Chevron, Sunray and V-Fold patterns with different layouts of capacitive sensors using paper-based and textile-based materials. Experimentation results show that the geometry primitives predicted from the capacitive signals have a strong correlation with the visual ground truth with R-squared value of up to 95% and tracking error of 6.5 mm for patches. The simulation and machine learning constitute two-way information exchange between the sensing signals and structural geometry.
△ Less
Submitted 28 April, 2024; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Don't freeze: Finetune encoders for better Self-Supervised HAR
Authors:
Vitor Fortes Rey,
Dominique Nshimyimana,
Paul Lukowicz
Abstract:
Recently self-supervised learning has been proposed in the field of human activity recognition as a solution to the labelled data availability problem. The idea being that by using pretext tasks such as reconstruction or contrastive predictive coding, useful representations can be learned that then can be used for classification. Those approaches follow the pretrain, freeze and fine-tune procedure…
▽ More
Recently self-supervised learning has been proposed in the field of human activity recognition as a solution to the labelled data availability problem. The idea being that by using pretext tasks such as reconstruction or contrastive predictive coding, useful representations can be learned that then can be used for classification. Those approaches follow the pretrain, freeze and fine-tune procedure. In this paper we will show how a simple change - not freezing the representation - leads to substantial performance gains across pretext tasks. The improvement was found in all four investigated datasets and across all four pretext tasks and is inversely proportional to amount of labelled data. Moreover the effect is present whether the pretext task is carried on the Capture24 dataset or directly in unlabelled data of the target dataset.
△ Less
Submitted 3 July, 2023;
originally announced July 2023.
-
SynthCal: A Synthetic Benchmarking Pipeline to Compare Camera Calibration Algorithms
Authors:
Lala Shakti Swarup Ray,
Bo Zhou,
Lars Krupp,
Sungho Suh,
Paul Lukowicz
Abstract:
Accurate camera calibration is crucial for various computer vision applications. However, measuring camera parameters in the real world is challenging and arduous, and there needs to be a dataset with ground truth to evaluate calibration algorithms' accuracy. In this paper, we present SynthCal, a synthetic camera calibration benchmarking pipeline that generates images of calibration patterns to me…
▽ More
Accurate camera calibration is crucial for various computer vision applications. However, measuring camera parameters in the real world is challenging and arduous, and there needs to be a dataset with ground truth to evaluate calibration algorithms' accuracy. In this paper, we present SynthCal, a synthetic camera calibration benchmarking pipeline that generates images of calibration patterns to measure and enable accurate quantification of calibration algorithm performance in camera parameter estimation. We present a SynthCal-generated calibration dataset with four common patterns, two camera types, and two environments with varying view, distortion, lighting, and noise levels. The dataset evaluates single-view calibration algorithms by measuring reprojection and root-mean-square errors for identical patterns and camera settings. Additionally, we analyze the significance of different patterns using Zhang's method, which estimates intrinsic and extrinsic camera parameters with known correspondences between 3D points and their 2D projections in different configurations and environments. The experimental results demonstrate the effectiveness of SynthCal in evaluating various calibration algorithms and patterns.
△ Less
Submitted 3 July, 2023;
originally announced July 2023.
-
ClothFit: Cloth-Human-Attribute Guided Virtual Try-On Network Using 3D Simulated Dataset
Authors:
Yunmin Cho,
Lala Shakti Swarup Ray,
Kundan Sai Prabhu Thota,
Sungho Suh,
Paul Lukowicz
Abstract:
Online clothing shopping has become increasingly popular, but the high rate of returns due to size and fit issues has remained a major challenge. To address this problem, virtual try-on systems have been developed to provide customers with a more realistic and personalized way to try on clothing. In this paper, we propose a novel virtual try-on method called ClothFit, which can predict the draping…
▽ More
Online clothing shopping has become increasingly popular, but the high rate of returns due to size and fit issues has remained a major challenge. To address this problem, virtual try-on systems have been developed to provide customers with a more realistic and personalized way to try on clothing. In this paper, we propose a novel virtual try-on method called ClothFit, which can predict the draping shape of a garment on a target body based on the actual size of the garment and human attributes. Unlike existing try-on models, ClothFit considers the actual body proportions of the person and available cloth sizes for clothing virtualization, making it more appropriate for current online apparel outlets. The proposed method utilizes a U-Net-based network architecture that incorporates cloth and human attributes to guide the realistic virtual try-on synthesis. Specifically, we extract features from a cloth image using an auto-encoder and combine them with features from the user's height, weight, and cloth size. The features are concatenated with the features from the U-Net encoder, and the U-Net decoder synthesizes the final virtual try-on image. Our experimental results demonstrate that ClothFit can significantly improve the existing state-of-the-art methods in terms of photo-realistic virtual try-on results.
△ Less
Submitted 24 June, 2023;
originally announced June 2023.
-
Human-AI Coevolution
Authors:
Dino Pedreschi,
Luca Pappalardo,
Emanuele Ferragina,
Ricardo Baeza-Yates,
Albert-Laszlo Barabasi,
Frank Dignum,
Virginia Dignum,
Tina Eliassi-Rad,
Fosca Giannotti,
Janos Kertesz,
Alistair Knott,
Yannis Ioannidis,
Paul Lukowicz,
Andrea Passarella,
Alex Sandy Pentland,
John Shawe-Taylor,
Alessandro Vespignani
Abstract:
Human-AI coevolution, defined as a process in which humans and AI algorithms continuously influence each other, increasingly characterises our society, but is understudied in artificial intelligence and complexity science literature. Recommender systems and assistants play a prominent role in human-AI coevolution, as they permeate many facets of daily life and influence human choices on online pla…
▽ More
Human-AI coevolution, defined as a process in which humans and AI algorithms continuously influence each other, increasingly characterises our society, but is understudied in artificial intelligence and complexity science literature. Recommender systems and assistants play a prominent role in human-AI coevolution, as they permeate many facets of daily life and influence human choices on online platforms. The interaction between users and AI results in a potentially endless feedback loop, wherein users' choices generate data to train AI models, which, in turn, shape subsequent user preferences. This human-AI feedback loop has peculiar characteristics compared to traditional human-machine interaction and gives rise to complex and often ``unintended'' social outcomes. This paper introduces Coevolution AI as the cornerstone for a new field of study at the intersection between AI and complexity science focused on the theoretical, empirical, and mathematical investigation of the human-AI feedback loop. In doing so, we: (i) outline the pros and cons of existing methodologies and highlight shortcomings and potential ways for capturing feedback loop mechanisms; (ii) propose a reflection at the intersection between complexity science, AI and society; (iii) provide real-world examples for different human-AI ecosystems; and (iv) illustrate challenges to the creation of such a field of study, conceptualising them at increasing levels of abstraction, i.e., technical, epistemological, legal and socio-political.
△ Less
Submitted 3 May, 2024; v1 submitted 23 June, 2023;
originally announced June 2023.
-
MeciFace: Mechanomyography and Inertial Fusion-based Glasses for Edge Real-Time Recognition of Facial and Eating Activities
Authors:
Hymalai Bello,
Sungho Suh,
Bo Zhou,
Paul Lukowicz
Abstract:
The increasing prevalence of stress-related eating behaviors and their impact on overall health highlights the importance of effective and ubiquitous monitoring systems. In this paper, we present MeciFace, an innovative wearable technology designed to monitor facial expressions and eating activities in real-time on-the-edge (RTE). MeciFace aims to provide a low-power, privacy-conscious, and highly…
▽ More
The increasing prevalence of stress-related eating behaviors and their impact on overall health highlights the importance of effective and ubiquitous monitoring systems. In this paper, we present MeciFace, an innovative wearable technology designed to monitor facial expressions and eating activities in real-time on-the-edge (RTE). MeciFace aims to provide a low-power, privacy-conscious, and highly accurate tool for promoting healthy eating behaviors and stress management. We employ lightweight convolutional neural networks as backbone models for facial expression and eating monitoring scenarios. The MeciFace system ensures efficient data processing with a tiny memory footprint, ranging from 11KB to 19 KB. During RTE evaluation, the system achieves an F1-score of < 86% for facial expression recognition and 94% for eating/drinking monitoring, for the RTE of unseen users (user-independent case).
△ Less
Submitted 3 April, 2024; v1 submitted 19 June, 2023;
originally announced June 2023.
-
Unsupervised Statistical Feature-Guided Diffusion Model for Sensor-based Human Activity Recognition
Authors:
Si Zuo,
Vitor Fortes Rey,
Sungho Suh,
Stephan Sigg,
Paul Lukowicz
Abstract:
Human activity recognition (HAR) from on-body sensors is a core functionality in many AI applications: from personal health, through sports and wellness to Industry 4.0. A key problem holding up progress in wearable sensor-based HAR, compared to other ML areas, such as computer vision, is the unavailability of diverse and labeled training data. Particularly, while there are innumerable annotated i…
▽ More
Human activity recognition (HAR) from on-body sensors is a core functionality in many AI applications: from personal health, through sports and wellness to Industry 4.0. A key problem holding up progress in wearable sensor-based HAR, compared to other ML areas, such as computer vision, is the unavailability of diverse and labeled training data. Particularly, while there are innumerable annotated images available in online repositories, freely available sensor data is sparse and mostly unlabeled. We propose an unsupervised statistical feature-guided diffusion model specifically optimized for wearable sensor-based human activity recognition with devices such as inertial measurement unit (IMU) sensors. The method generates synthetic labeled time-series sensor data without relying on annotated training data. Thereby, it addresses the scarcity and annotation difficulties associated with real-world sensor data. By conditioning the diffusion model on statistical information such as mean, standard deviation, Z-score, and skewness, we generate diverse and representative synthetic sensor data. We conducted experiments on public human activity recognition datasets and compared the method to conventional oversampling and state-of-the-art generative adversarial network methods. Experimental results demonstrate that this can improve the performance of human activity recognition and outperform existing techniques.
△ Less
Submitted 19 May, 2024; v1 submitted 30 May, 2023;
originally announced June 2023.
-
CaptAinGlove: Capacitive and Inertial Fusion-Based Glove for Real-Time on Edge Hand Gesture Recognition for Drone Control
Authors:
Hymalai Bello,
Sungho Suh,
Daniel Geißler,
Lala Ray,
Bo Zhou,
Paul Lukowicz
Abstract:
We present CaptAinGlove, a textile-based, low-power (1.15Watts), privacy-conscious, real-time on-the-edge (RTE) glove-based solution with a tiny memory footprint (2MB), designed to recognize hand gestures used for drone control. We employ lightweight convolutional neural networks as the backbone models and a hierarchical multimodal fusion to reduce power consumption and improve accuracy. The syste…
▽ More
We present CaptAinGlove, a textile-based, low-power (1.15Watts), privacy-conscious, real-time on-the-edge (RTE) glove-based solution with a tiny memory footprint (2MB), designed to recognize hand gestures used for drone control. We employ lightweight convolutional neural networks as the backbone models and a hierarchical multimodal fusion to reduce power consumption and improve accuracy. The system yields an F1-score of 80% for the offline evaluation of nine classes; eight hand gesture commands and null activity. For the RTE, we obtained an F1-score of 67% (one user).
△ Less
Submitted 7 June, 2023;
originally announced June 2023.
-
FieldHAR: A Fully Integrated End-to-end RTL Framework for Human Activity Recognition with Neural Networks from Heterogeneous Sensors
Authors:
Mengxi Liu,
Bo Zhou,
Zimin Zhao,
Hyeonseok Hong,
Hyun Kim,
Sungho Suh,
Vitor Fortes Rey,
Paul Lukowicz
Abstract:
In this work, we propose an open-source scalable end-to-end RTL framework FieldHAR, for complex human activity recognition (HAR) from heterogeneous sensors using artificial neural networks (ANN) optimized for FPGA or ASIC integration. FieldHAR aims to address the lack of apparatus to transform complex HAR methodologies often limited to offline evaluation to efficient run-time edge applications. Th…
▽ More
In this work, we propose an open-source scalable end-to-end RTL framework FieldHAR, for complex human activity recognition (HAR) from heterogeneous sensors using artificial neural networks (ANN) optimized for FPGA or ASIC integration. FieldHAR aims to address the lack of apparatus to transform complex HAR methodologies often limited to offline evaluation to efficient run-time edge applications. The framework uses parallel sensor interfaces and integer-based multi-branch convolutional neural networks (CNNs) to support flexible modality extensions with synchronous sampling at the maximum rate of each sensor. To validate the framework, we used a sensor-rich kitchen scenario HAR application which was demonstrated in a previous offline study. Through resource-aware optimizations, with FieldHAR the entire RTL solution was created from data acquisition to ANN inference taking as low as 25\% logic elements and 2\% memory bits of a low-end Cyclone IV FPGA and less than 1\% accuracy loss from the original FP32 precision offline study. The RTL implementation also shows advantages over MCU-based solutions, including superior data acquisition performance and virtually eliminating ANN inference bottleneck.
△ Less
Submitted 22 May, 2023;
originally announced May 2023.
-
A Knowledge Distillation framework for Multi-Organ Segmentation of Medaka Fish in Tomographic Image
Authors:
Jwalin Bhatt,
Yaroslav Zharov,
Sungho Suh,
Tilo Baumbach,
Vincent Heuveline,
Paul Lukowicz
Abstract:
Morphological atlases are an important tool in organismal studies, and modern high-throughput Computed Tomography (CT) facilities can produce hundreds of full-body high-resolution volumetric images of organisms. However, creating an atlas from these volumes requires accurate organ segmentation. In the last decade, machine learning approaches have achieved incredible results in image segmentation t…
▽ More
Morphological atlases are an important tool in organismal studies, and modern high-throughput Computed Tomography (CT) facilities can produce hundreds of full-body high-resolution volumetric images of organisms. However, creating an atlas from these volumes requires accurate organ segmentation. In the last decade, machine learning approaches have achieved incredible results in image segmentation tasks, but they require large amounts of annotated data for training. In this paper, we propose a self-training framework for multi-organ segmentation in tomographic images of Medaka fish. We utilize the pseudo-labeled data from a pretrained Teacher model and adopt a Quality Classifier to refine the pseudo-labeled data. Then, we introduce a pixel-wise knowledge distillation method to prevent overfitting to the pseudo-labeled data and improve the segmentation performance. The experimental results demonstrate that our method improves mean Intersection over Union (IoU) by 5.9% on the full dataset and enables keeping the quality while using three times less markup.
△ Less
Submitted 24 February, 2023;
originally announced February 2023.
-
InMyFace: Inertial and Mechanomyography-Based Sensor Fusion for Wearable Facial Activity Recognition
Authors:
Hymalai Bello,
Luis Alfredo Sanchez Marin,
Sungho Suh,
Bo Zhou,
Paul Lukowicz
Abstract:
Recognizing facial activity is a well-understood (but non-trivial) computer vision problem. However, reliable solutions require a camera with a good view of the face, which is often unavailable in wearable settings. Furthermore, in wearable applications, where systems accompany users throughout their daily activities, a permanently running camera can be problematic for privacy (and legal) reasons.…
▽ More
Recognizing facial activity is a well-understood (but non-trivial) computer vision problem. However, reliable solutions require a camera with a good view of the face, which is often unavailable in wearable settings. Furthermore, in wearable applications, where systems accompany users throughout their daily activities, a permanently running camera can be problematic for privacy (and legal) reasons. This work presents an alternative solution based on the fusion of wearable inertial sensors, planar pressure sensors, and acoustic mechanomyography (muscle sounds). The sensors were placed unobtrusively in a sports cap to monitor facial muscle activities related to facial expressions. We present our integrated wearable sensor system, describe data fusion and analysis methods, and evaluate the system in an experiment with thirteen subjects from different cultural backgrounds (eight countries) and both sexes (six women and seven men). In a one-model-per-user scheme and using a late fusion approach, the system yielded an average F1 score of 85.00% for the case where all sensing modalities are combined. With a cross-user validation and a one-model-for-all-user scheme, an F1 score of 79.00% was obtained for thirteen participants (six females and seven males). Moreover, in a hybrid fusion (cross-user) approach and six classes, an average F1 score of 82.00% was obtained for eight users. The results are competitive with state-of-the-art non-camera-based solutions for a cross-user study. In addition, our unique set of participants demonstrates the inclusiveness and generalizability of the approach.
△ Less
Submitted 8 February, 2023;
originally announced February 2023.
-
PresSim: An End-to-end Framework for Dynamic Ground Pressure Profile Generation from Monocular Videos Using Physics-based 3D Simulation
Authors:
Lala Shakti Swarup Ray,
Bo Zhou,
Sungho Suh,
Paul Lukowicz
Abstract:
Ground pressure exerted by the human body is a valuable source of information for human activity recognition (HAR) in unobtrusive pervasive sensing. While data collection from pressure sensors to develop HAR solutions requires significant resources and effort, we present a novel end-to-end framework, PresSim, to synthesize sensor data from videos of human activities to reduce such effort significa…
▽ More
Ground pressure exerted by the human body is a valuable source of information for human activity recognition (HAR) in unobtrusive pervasive sensing. While data collection from pressure sensors to develop HAR solutions requires significant resources and effort, we present a novel end-to-end framework, PresSim, to synthesize sensor data from videos of human activities to reduce such effort significantly. PresSim adopts a 3-stage process: first, extract the 3D activity information from videos with computer vision architectures; then simulate the floor mesh deformation profiles based on the 3D activity information and gravity-included physics simulation; lastly, generate the simulated pressure sensor data with deep learning models. We explored two approaches for the 3D activity information: inverse kinematics with mesh re-targeting, and volumetric pose and shape estimation. We validated PresSim with an experimental setup with a monocular camera to provide input and a pressure-sensing fitness mat (80x28 spatial resolution) to provide the sensor ground truth, where nine participants performed a set of predefined yoga sequences.
△ Less
Submitted 1 February, 2023;
originally announced February 2023.
-
The Contribution of Human Body Capacitance/Body-Area Electric Field To Individual and Collaborative Activity Recognition
Authors:
Sizhen Bian,
Vitor Fortes Rey,
Siyu Yuan,
Paul Lukowicz
Abstract:
The current dominated wearable body motion sensor is IMU. This work presented an alternative wearable motion-sensing approach: human body capacitance (HBC, also commonly defined as body-area electric field). While being less robust in tracking the posture and trajectory, HBC has two properties that make it an attractive. First, the deployment of the sensing node on the being tracked body part is n…
▽ More
The current dominated wearable body motion sensor is IMU. This work presented an alternative wearable motion-sensing approach: human body capacitance (HBC, also commonly defined as body-area electric field). While being less robust in tracking the posture and trajectory, HBC has two properties that make it an attractive. First, the deployment of the sensing node on the being tracked body part is not a requirement for HBC sensing approach. Second, HBC is sensitive to the body's interaction with its surroundings, including both touching and being in the immediate proximity of people and objects. We first described the sensing principle for HBC, sensor architecture and implementation, and methods for evaluation. We then presented two case studies demonstrating the usefulness of HBC as a complement/alternative to IMUs. First, we explored the exercise recognition and repetition counting of seven machine-free leg-only exercises and eleven general gym workouts with the signal source of HBC and IMU. The HBC sensing shows significant advantages over the IMU signals in classification(0.89 vs 0.78 in F-score) and counting(0.982 vs 0.938 in accuracy) of the leg-only exercises. For the general gym workouts, HBC only shows recognition improvement for certain workouts like adductor where legs alone complete the movement. And it also supplies better results over the IMU for workouts counting(0.800 vs. 0.756 when wearing the sensors on the wrist). In the second case, we tried to recognize actions related to manipulating objects and physical collaboration between users by using a wrist-worn HBC sensing unit. We detected collaboration between the users with 0.69 F-score when receiving data from a single user and 0.78 when receiving data from both users. The capacitive sensor can improve the recognition of collaborative activities with an F-score over a single wrist accelerometer approach by 16\%.
△ Less
Submitted 26 October, 2022;
originally announced October 2022.
-
Smart Cup: An impedance sensing based fluid intake monitoring system for beverages classification and freshness detection
Authors:
Mengxi Liu,
Sizhen Bian,
Bo Zhou,
Agnes Grünerbl,
Paul Lukowicz
Abstract:
This paper presents a novel beverage intake monitoring system that can accurately recognize beverage kinds and freshness. By mounting carbon electrodes on the commercial cup, the system measures the electrochemical impedance spectrum of the fluid in the cup. We studied the frequency sensitivity of the electrochemical impedance spectrum regarding distinct beverages and the importance of features li…
▽ More
This paper presents a novel beverage intake monitoring system that can accurately recognize beverage kinds and freshness. By mounting carbon electrodes on the commercial cup, the system measures the electrochemical impedance spectrum of the fluid in the cup. We studied the frequency sensitivity of the electrochemical impedance spectrum regarding distinct beverages and the importance of features like amplitude, phase, and real and imaginary components for beverage classification. The results show that features from a low-frequency domain (100 Hz to 1000 Hz) provide more meaningful information for beverage classification than the higher frequency domain. Twenty beverages, including carbonated drinks and juices, were classified with nearly perfect accuracy using a supervised machine learning approach. The same performance was also observed in the freshness recognition, where four different kinds of milk and fruit juice were studied.
△ Less
Submitted 8 October, 2022;
originally announced October 2022.
-
Learning from the Best: Contrastive Representations Learning Across Sensor Locations for Wearable Activity Recognition
Authors:
Vitor Fortes Rey,
Sungho Suh,
Paul Lukowicz
Abstract:
We address the well-known wearable activity recognition problem of having to work with sensors that are non-optimal in terms of information they provide but have to be used due to wearability/usability concerns (e.g. the need to work with wrist-worn IMUs because they are embedded in most smart watches). To mitigate this problem we propose a method that facilitates the use of information from senso…
▽ More
We address the well-known wearable activity recognition problem of having to work with sensors that are non-optimal in terms of information they provide but have to be used due to wearability/usability concerns (e.g. the need to work with wrist-worn IMUs because they are embedded in most smart watches). To mitigate this problem we propose a method that facilitates the use of information from sensors that are only present during the training process and are unavailable during the later use of the system. The method transfers information from the source sensors to the latent representation of the target sensor data through contrastive loss that is combined with the classification loss during joint training. We evaluate the method on the well-known PAMAP2 and Opportunity benchmarks for different combinations of source and target sensors showing average (over all activities) F1 score improvements of between 5% and 13% with the improvement on individual activities, particularly well suited to benefit from the additional information going up to between 20% and 40%.
△ Less
Submitted 4 October, 2022;
originally announced October 2022.
-
Smart-Badge: A wearable badge with multi-modal sensors for kitchen activity recognition
Authors:
Mengxi Liu,
Sungho Suh,
Bo Zhou,
Agnes Gruenerbl,
Paul Lukowicz
Abstract:
Human health is closely associated with their daily behavior and environment. However, keeping a healthy lifestyle is still challenging for most people as it is difficult to recognize their living behaviors and identify their surrounding situations to take appropriate action. Human activity recognition is a promising approach to building a behavior model of users, by which users can get feedback a…
▽ More
Human health is closely associated with their daily behavior and environment. However, keeping a healthy lifestyle is still challenging for most people as it is difficult to recognize their living behaviors and identify their surrounding situations to take appropriate action. Human activity recognition is a promising approach to building a behavior model of users, by which users can get feedback about their habits and be encouraged to develop a healthier lifestyle. In this paper, we present a smart light wearable badge with six kinds of sensors, including an infrared array sensor MLX90640 offering privacy-preserving, low-cost, and non-invasive features, to recognize daily activities in a realistic unmodified kitchen environment. A multi-channel convolutional neural network (MC-CNN) based on data and feature fusion methods is applied to classify 14 human activities associated with potentially unhealthy habits. Meanwhile, we evaluate the impact of the infrared array sensor on the recognition accuracy of these activities. We demonstrate the performance of the proposed work to detect the 14 activities performed by ten volunteers with an average accuracy of 92.44 % and an F1 score of 88.27 %.
△ Less
Submitted 12 January, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
TASKED: Transformer-based Adversarial learning for human activity recognition using wearable sensors via Self-KnowledgE Distillation
Authors:
Sungho Suh,
Vitor Fortes Rey,
Paul Lukowicz
Abstract:
Wearable sensor-based human activity recognition (HAR) has emerged as a principal research area and is utilized in a variety of applications. Recently, deep learning-based methods have achieved significant improvement in the HAR field with the development of human-computer interaction applications. However, they are limited to operating in a local neighborhood in the process of a standard convolut…
▽ More
Wearable sensor-based human activity recognition (HAR) has emerged as a principal research area and is utilized in a variety of applications. Recently, deep learning-based methods have achieved significant improvement in the HAR field with the development of human-computer interaction applications. However, they are limited to operating in a local neighborhood in the process of a standard convolution neural network, and correlations between different sensors on body positions are ignored. In addition, they still face significant challenging problems with performance degradation due to large gaps in the distribution of training and test data, and behavioral differences between subjects. In this work, we propose a novel Transformer-based Adversarial learning framework for human activity recognition using wearable sensors via Self-KnowledgE Distillation (TASKED), that accounts for individual sensor orientations and spatial and temporal features. The proposed method is capable of learning cross-domain embedding feature representations from multiple subjects datasets using adversarial learning and the maximum mean discrepancy (MMD) regularization to align the data distribution over multiple domains. In the proposed method, we adopt the teacher-free self-knowledge distillation to improve the stability of the training procedure and the performance of human activity recognition. Experimental results show that TASKED not only outperforms state-of-the-art methods on the four real-world public HAR datasets (alone or combined) but also improves the subject generalization effectively.
△ Less
Submitted 8 December, 2022; v1 submitted 14 September, 2022;
originally announced September 2022.
-
Magnetic Field Based Hand Tracking
Authors:
Sizhen Bian,
Kexuan Guo,
Mengxi Liu,
Bo Zhou,
Paul Lukowicz
Abstract:
Sensor-based 3D hand tracking is still challenging despite the massive exploration of different sensing modalities in the past decades. This work describes the design, implementation, and evaluation of a novel induced magnetic field-based 3D hand tracking system, aiming to address the shortcomings of existing approaches and supply an alternative solution. This system is composed of a set of transm…
▽ More
Sensor-based 3D hand tracking is still challenging despite the massive exploration of different sensing modalities in the past decades. This work describes the design, implementation, and evaluation of a novel induced magnetic field-based 3D hand tracking system, aiming to address the shortcomings of existing approaches and supply an alternative solution. This system is composed of a set of transmitters for the magnetic field generation, a receiver for field strength sensing, and the Zigbee units for synchronization. In more detail, the transmitters generate the oscillating magnetic fields with a registered sequence, the receiver senses the strength of the induced magnetic field by a customized three axes coil, which is configured as the LC oscillator with the same oscillating frequency so that an induced current shows up when the receiver is located in the field of the generated magnetic field. Five scenarios are explored to evaluate the performance of the proposed system in hand tracking regarding the transmitters deployment: "in front of a whiteboard", "above a table", "in front of and in a shelf", "in front of the waist and chest", and "around the waist". The true-range multilateration method is used to calculate the coordinates of the hand in 3D space. Compared with the ground truth collected by a commercial ultrasound positioning system, the presented magnetic field-based system shows a robust accuracy of around ten centimeters with the transmitters deployed both off-body and on-body(in front of waist and chest), which indicates the feasibility of the proposed sensing modality in 3D hand tracking.
△ Less
Submitted 18 July, 2022;
originally announced July 2022.
-
Estimation of 3D Body Shape and Clothing Measurements from Frontal- and Side-view Images
Authors:
Kundan Sai Prabhu Thota,
Sungho Suh,
Bo Zhou,
Paul Lukowicz
Abstract:
The estimation of 3D human body shape and clothing measurements is crucial for virtual try-on and size recommendation problems in the fashion industry but has always been a challenging problem due to several conditions, such as lack of publicly available realistic datasets, ambiguity in multiple camera resolutions, and the undefinable human shape space. Existing works proposed various solutions to…
▽ More
The estimation of 3D human body shape and clothing measurements is crucial for virtual try-on and size recommendation problems in the fashion industry but has always been a challenging problem due to several conditions, such as lack of publicly available realistic datasets, ambiguity in multiple camera resolutions, and the undefinable human shape space. Existing works proposed various solutions to these problems but could not succeed in the industry adaptation because of complexity and restrictions. To solve the complexity and challenges, in this paper, we propose a simple yet effective architecture to estimate both shape and measures from frontal- and side-view images. We utilize silhouette segmentation from the two multi-view images and implement an auto-encoder network to learn low-dimensional features from segmented silhouettes. Then, we adopt a kernel-based regularized regression module to estimate the body shape and measurements. The experimental results show that the proposed method provides competitive results on the synthetic dataset, NOMO-3d-400-scans Dataset, and RGB Images of humans captured in different cameras.
△ Less
Submitted 28 May, 2022;
originally announced May 2022.
-
Adversarial Deep Feature Extraction Network for User Independent Human Activity Recognition
Authors:
Sungho Suh,
Vitor Fortes Rey,
Paul Lukowicz
Abstract:
User dependence remains one of the most difficult general problems in Human Activity Recognition (HAR), in particular when using wearable sensors. This is due to the huge variability of the way different people execute even the simplest actions. In addition, detailed sensor fixtures and placement will be different for different people or even at different times for the same users. In theory, the p…
▽ More
User dependence remains one of the most difficult general problems in Human Activity Recognition (HAR), in particular when using wearable sensors. This is due to the huge variability of the way different people execute even the simplest actions. In addition, detailed sensor fixtures and placement will be different for different people or even at different times for the same users. In theory, the problem can be solved by a large enough data set. However, recording data sets that capture the entire diversity of complex activity sets is seldom practicable. Instead, models are needed that focus on features that are invariant across users. To this end, we present an adversarial subject-independent feature extraction method with the maximum mean discrepancy (MMD) regularization for human activity recognition. The proposed model is capable of learning a subject-independent embedding feature representation from multiple subjects datasets and generalizing it to unseen target subjects. The proposed network is based on the adversarial encoder-decoder structure with the MMD realign the data distribution over multiple subjects. Experimental results show that the proposed method not only outperforms state-of-the-art methods over the four real-world datasets but also improves the subject generalization effectively. We evaluate the method on well-known public data sets showing that it significantly improves user-independent performance and reduces variance in results.
△ Less
Submitted 23 October, 2021;
originally announced October 2021.
-
Generalized multiscale feature extraction for remaining useful life prediction of bearings with generative adversarial networks
Authors:
Sungho Suh,
Paul Lukowicz,
Yong Oh Lee
Abstract:
Bearing is a key component in industrial machinery and its failure may lead to unwanted downtime and economic loss. Hence, it is necessary to predict the remaining useful life (RUL) of bearings. Conventional data-driven approaches of RUL prediction require expert domain knowledge for manual feature extraction and may suffer from data distribution discrepancy between training and test data. In this…
▽ More
Bearing is a key component in industrial machinery and its failure may lead to unwanted downtime and economic loss. Hence, it is necessary to predict the remaining useful life (RUL) of bearings. Conventional data-driven approaches of RUL prediction require expert domain knowledge for manual feature extraction and may suffer from data distribution discrepancy between training and test data. In this study, we propose a novel generalized multiscale feature extraction method with generative adversarial networks. The adversarial training learns the distribution of training data from different bearings and is introduced for health stage division and RUL prediction. To capture the sequence feature from a one-dimensional vibration signal, we adapt a U-Net architecture that reconstructs features to process them with multiscale layers in the generator of the adversarial network. To validate the proposed method, comprehensive experiments on two rotating machinery datasets have been conducted to predict the RUL. The experimental results show that the proposed feature extraction method can effectively predict the RUL and outperforms the conventional RUL prediction approaches based on deep neural networks. The implementation code is available at https://github.com/opensuh/GMFE.
△ Less
Submitted 26 September, 2021;
originally announced September 2021.
-
Detecting Video Game Player Burnout with the Use of Sensor Data and Machine Learning
Authors:
Anton Smerdov,
Andrey Somov,
Evgeny Burnaev,
Bo Zhou,
Paul Lukowicz
Abstract:
Current research in eSports lacks the tools for proper game practising and performance analytics. The majority of prior work relied only on in-game data for advising the players on how to perform better. However, in-game mechanics and trends are frequently changed by new patches limiting the lifespan of the models trained exclusively on the in-game logs. In this article, we propose the methods bas…
▽ More
Current research in eSports lacks the tools for proper game practising and performance analytics. The majority of prior work relied only on in-game data for advising the players on how to perform better. However, in-game mechanics and trends are frequently changed by new patches limiting the lifespan of the models trained exclusively on the in-game logs. In this article, we propose the methods based on the sensor data analysis for predicting whether a player will win the future encounter. The sensor data were collected from 10 participants in 22 matches in League of Legends video game. We have trained machine learning models including Transformer and Gated Recurrent Unit to predict whether the player wins the encounter taking place after some fixed time in the future. For 10 seconds forecasting horizon Transformer neural network architecture achieves ROC AUC score 0.706. This model is further developed into the detector capable of predicting that a player will lose the encounter occurring in 10 seconds in 88.3% of cases with 73.5% accuracy. This might be used as a players' burnout or fatigue detector, advising players to retreat. We have also investigated which physiological features affect the chance to win or lose the next in-game encounter.
△ Less
Submitted 22 August, 2021; v1 submitted 29 November, 2020;
originally announced December 2020.
-
Yet it moves: Learning from Generic Motions to Generate IMU data from YouTube videos
Authors:
Vitor Fortes Rey,
Kamalveer Kaur Garewal,
Paul Lukowicz
Abstract:
Human activity recognition (HAR) using wearable sensors has benefited much less from recent advances in Machine Learning than fields such as computer vision and natural language processing. This is to a large extent due to the lack of large scale repositories of labeled training data. In our research we aim to facilitate the use of online videos, which exists in ample quantity for most activities…
▽ More
Human activity recognition (HAR) using wearable sensors has benefited much less from recent advances in Machine Learning than fields such as computer vision and natural language processing. This is to a large extent due to the lack of large scale repositories of labeled training data. In our research we aim to facilitate the use of online videos, which exists in ample quantity for most activities and are much easier to label than sensor data, to simulate labeled wearable motion sensor data. In previous work we already demonstrate some preliminary results in this direction focusing on very simple, activity specific simulation models and a single sensor modality (acceleration norm)\cite{10.1145/3341162.3345590}. In this paper we show how we can train a regression model on generic motions for both accelerometer and gyro signals and then apply it to videos of the target activities to generate synthetic IMU data (acceleration and gyro norms) that can be used to train and/or improve HAR models. We demonstrate that systems trained on simulated data generated by our regression model can come to within around 10% of the mean F1 score of a system trained on real sensor data. Furthermore we show that by either including a small amount of real sensor data for model calibration or simply leveraging the fact that (in general) we can easily generate much more simulated data from video than we can collect in terms of real sensor data the advantage of real sensor data can be eventually equalized.
△ Less
Submitted 23 November, 2020;
originally announced November 2020.
-
Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset
Authors:
Anton Smerdov,
Bo Zhou,
Paul Lukowicz,
Andrey Somov
Abstract:
Proper training and analytics in eSports require accurately collected and annotated data. Most eSports research focuses exclusively on in-game data analysis, and there is a lack of prior work involving eSports athletes' psychophysiological data. In this paper, we present a dataset collected from professional and amateur teams in 22 matches in League of Legends video game with more than 40 hours of…
▽ More
Proper training and analytics in eSports require accurately collected and annotated data. Most eSports research focuses exclusively on in-game data analysis, and there is a lack of prior work involving eSports athletes' psychophysiological data. In this paper, we present a dataset collected from professional and amateur teams in 22 matches in League of Legends video game with more than 40 hours of recordings. Recorded data include the players' physiological activity, e.g. movements, pulse, saccades, obtained from various sensors, self-reported aftermatch survey, and in-game data. An important feature of the dataset is simultaneous data collection from five players, which facilitates the analysis of sensor data on a team level. Upon the collection of dataset we carried out its validation. In particular, we demonstrate that stress and concentration levels for professional players are less correlated, meaning more independent playstyle. Also, we show that the absence of team communication does not affect the professional players as much as amateur ones. To investigate other possible use cases of the dataset, we have trained classical machine learning algorithms for skill prediction and player re-identification using 3-minute sessions of sensor data. Best models achieved 0.856 and 0.521 (0.10 for a chance level) accuracy scores on a validation set for skill prediction and player re-id problems, respectively. The dataset is available at https://github.com/smerdov/eSports Sensors Dataset.
△ Less
Submitted 23 August, 2021; v1 submitted 2 November, 2020;
originally announced November 2020.
-
Discriminative feature generation for classification of imbalanced data
Authors:
Sungho Suh,
Paul Lukowicz,
Yong Oh Lee
Abstract:
The data imbalance problem is a frequent bottleneck in the classification performance of neural networks. In this paper, we propose a novel supervised discriminative feature generation (DFG) method for a minority class dataset. DFG is based on the modified structure of a generative adversarial network consisting of four independent networks: generator, discriminator, feature extractor, and classif…
▽ More
The data imbalance problem is a frequent bottleneck in the classification performance of neural networks. In this paper, we propose a novel supervised discriminative feature generation (DFG) method for a minority class dataset. DFG is based on the modified structure of a generative adversarial network consisting of four independent networks: generator, discriminator, feature extractor, and classifier. To augment the selected discriminative features of the minority class data by adopting an attention mechanism, the generator for the class-imbalanced target task is trained, and the feature extractor and classifier are regularized using the pre-trained features from a large source data. The experimental results show that the DFG generator enhances the augmentation of the label-preserved and diverse features, and the classification results are significantly improved on the target task. The feature generation model can contribute greatly to the development of data augmentation methods through discriminative feature generation and supervised attention methods.
△ Less
Submitted 24 October, 2020;
originally announced October 2020.
-
Two-stage generative adversarial networks for document image binarization with color noise and background removal
Authors:
Sungho Suh,
Jihun Kim,
Paul Lukowicz,
Yong Oh Lee
Abstract:
Document image enhancement and binarization methods are often used to improve the accuracy and efficiency of document image analysis tasks such as text recognition. Traditional non-machine-learning methods are constructed on low-level features in an unsupervised manner but have difficulty with binarization on documents with severely degraded backgrounds. Convolutional neural network-based methods…
▽ More
Document image enhancement and binarization methods are often used to improve the accuracy and efficiency of document image analysis tasks such as text recognition. Traditional non-machine-learning methods are constructed on low-level features in an unsupervised manner but have difficulty with binarization on documents with severely degraded backgrounds. Convolutional neural network-based methods focus only on grayscale images and on local textual features. In this paper, we propose a two-stage color document image enhancement and binarization method using generative adversarial neural networks. In the first stage, four color-independent adversarial networks are trained to extract color foreground information from an input image for document image enhancement. In the second stage, two independent adversarial networks with global and local features are trained for image binarization of documents of variable size. For the adversarial neural networks, we formulate loss functions between a discriminator and generators having an encoder-decoder structure. Experimental results show that the proposed method achieves better performance than many classical and state-of-the-art algorithms over the Document Image Binarization Contest (DIBCO) datasets, the LRDE Document Binarization Dataset (LRDE DBD), and our shipping label image dataset. We plan to release the shipping label dataset as well as our implementation code at github.com/opensuh/DocumentBinarization/.
△ Less
Submitted 27 April, 2021; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Fusion of Global-Local Features for Image Quality Inspection of Shipping Label
Authors:
Sungho Suh,
Paul Lukowicz,
Yong Oh Lee
Abstract:
The demands of automated shipping address recognition and verification have increased to handle a large number of packages and to save costs associated with misdelivery. A previous study proposed a deep learning system where the shipping address is recognized and verified based on a camera image capturing the shipping address and barcode area. Because the system performance depends on the input im…
▽ More
The demands of automated shipping address recognition and verification have increased to handle a large number of packages and to save costs associated with misdelivery. A previous study proposed a deep learning system where the shipping address is recognized and verified based on a camera image capturing the shipping address and barcode area. Because the system performance depends on the input image quality, inspection of input image quality is necessary for image preprocessing. In this paper, we propose an input image quality verification method combining global and local features. Object detection and scale-invariant feature transform in different feature spaces are developed to extract global and local features from several independent convolutional neural networks. The conditions of shipping label images are classified by fully connected fusion layers with concatenated global and local features. The experimental results regarding real captured and generated images show that the proposed method achieves better performance than other methods. These results are expected to improve the shipping address recognition and verification system by applying different image preprocessing steps based on the classified conditions.
△ Less
Submitted 26 August, 2020;
originally announced August 2020.
-
Give more data, awareness and control to individual citizens, and they will help COVID-19 containment
Authors:
Mirco Nanni,
Gennady Andrienko,
Albert-László Barabási,
Chiara Boldrini,
Francesco Bonchi,
Ciro Cattuto,
Francesca Chiaromonte,
Giovanni Comandé,
Marco Conti,
Mark Coté,
Frank Dignum,
Virginia Dignum,
Josep Domingo-Ferrer,
Paolo Ferragina,
Fosca Giannotti,
Riccardo Guidotti,
Dirk Helbing,
Kimmo Kaski,
Janos Kertesz,
Sune Lehmann,
Bruno Lepri,
Paul Lukowicz,
Stan Matwin,
David Megías Jiménez,
Anna Monreale
, et al. (14 additional authors not shown)
Abstract:
The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countri…
▽ More
The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoiding location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively, voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn this enables both contact tracing, and, the early detection of outbreak hotspots on more finely-granulated geographic scale. Our recommendation is two-fold. First to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates - if and when they want, for specific aims - with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.
△ Less
Submitted 16 April, 2020; v1 submitted 10 April, 2020;
originally announced April 2020.