
Maintenance Required: Updating and Extending Bootstrapped Human Activity Recognition Systems for Smart Homes

Authors: Shruthi K. Hiremath, Thomas Ploetz

Abstract: Developing human activity recognition (HAR) systems for smart homes is not straightforward due to varied layouts of the homes and their personalized settings, as well as idiosyncratic behaviors of residents. As such, off-the-shelf HAR systems are effective in limited capacity for an individual home, and HAR systems often need to be derived "from scratch", which comes with substantial effort and is often burdensome to the resident. Previous work has successfully targeted the initial phase. At the end of this initial phase, we identify seed points. We build on bootstrapped HAR systems and introduce an effective updating and extension procedure for continuous improvement of HAR systems with the aim of keeping up with ever-changing life circumstances. Our method makes use of the seed points identified at the end of the initial bootstrapping phase. A contrastive learning framework is trained using these seed points and labels obtained for the same. This model is then used to improve the segmentation accuracy of the identified prominent activities. Improvements in the activity recognition system through this procedure help model the majority of the routine activities in the smart home. We demonstrate the effectiveness of our procedure through experiments on the CASAS datasets that show the practical value of our approach.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 12 pages, 5 figures, accepted at The 6th International Conference on Activity and Behavior Computing, under print at IEEE Explore

arXiv:2406.13777 [pdf, other]

Game of LLMs: Discovering Structural Constructs in Activities using Large Language Models

Authors: Shruthi K. Hiremath, Thomas Ploetz

Abstract: Human Activity Recognition is a time-series analysis problem. A popular analysis procedure used by the community assumes an optimal window length to design recognition pipelines. However, in the scenario of smart homes, where activities are of varying duration and frequency, the assumption of a constant-sized window does not hold. Additionally, previous works have shown these activities to be made up of building blocks. We focus on identifying these underlying building blocks, i.e., structural constructs, with the use of large language models. Identifying these constructs can be especially beneficial in recognizing short-duration and infrequent activities. We also propose the development of an activity recognition procedure that uses these building blocks to model activities, thus helping the downstream task of activity monitoring in smart homes.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 6 pages, 2 figures
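
The fixed-window assumption questioned in the abstract above can be illustrated with a minimal sketch. It is not taken from the paper: the event format (timestamp in seconds, sensor id) and the function name `fixed_windows` are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical event format: (timestamp in seconds, sensor id).
Event = Tuple[float, str]


def fixed_windows(events: Sequence[Event], window_s: float) -> List[List[Event]]:
    """Group time-ordered events into consecutive, non-overlapping windows
    that each span window_s seconds, starting at the first event."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    windows: List[List[Event]] = []
    if not events:
        return windows
    start = events[0][0]
    current: List[Event] = []
    for ts, sensor in events:
        # Close windows (possibly empty ones) until ts falls inside the current one.
        while ts >= start + window_s:
            windows.append(current)
            current = []
            start += window_s
        current.append((ts, sensor))
    windows.append(current)
    return windows
```

With sparse smart-home events, a short window yields many empty windows and a long one merges unrelated activities, which is the mismatch the abstract points to.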

arXiv:2406.05900 [pdf, other]

Large Language Models Memorize Sensor Datasets! Implications on Human Activity Recognition Research

Authors: Harish Haresamudram, Hrudhai Rajasekhar, Nikhil Murlidhar Shanbhogue, Thomas Ploetz

Abstract: The astonishing success of Large Language Models (LLMs) in Natural Language Processing (NLP) has spurred their use in many application domains beyond text analysis, including wearable sensor-based Human Activity Recognition (HAR). In such scenarios, often sensor data are directly fed into an LLM along with text instructions for the model to perform activity classification. Seemingly remarkable results have been reported for such LLM-based HAR systems when they are evaluated on standard benchmarks from the field. Yet, we argue, care has to be taken when evaluating LLM-based HAR systems in such a traditional way. Most contemporary LLMs are trained on virtually the entire (accessible) internet -- potentially including standard HAR datasets. With that, it is not unlikely that LLMs actually had access to the test data used in such benchmark experiments. The resulting contamination of training data would render these experimental evaluations meaningless. In this paper we investigate whether LLMs indeed have had access to standard HAR datasets during training. We apply memorization tests to LLMs, which involve instructing the models to extend given snippets of data. When comparing the LLM-generated output to the original data we found a non-negligible number of matches, which suggests that the LLM under investigation seems to indeed have seen wearable sensor data from the benchmark datasets during training. For the Daphnet dataset in particular, GPT-4 is able to reproduce blocks of sensor readings. We report on our investigations and discuss potential implications on HAR research, especially with regard to reporting results on experimental evaluation.

Submitted 9 June, 2024; originally announced June 2024.
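
The comparison step of a memorization test, as described in the abstract above, could look roughly like the following sketch. It assumes the model's continuation has already been obtained elsewhere (no LLM is called here) and uses exact, position-wise row matching. The paper's actual matching criterion is not stated in the abstract, so `match_fraction` is a hypothetical helper.

```python
from typing import List


def match_fraction(generated: List[str], reference: List[str]) -> float:
    """Fraction of reference rows that the generated continuation
    reproduces exactly at the same position (whitespace-trimmed)."""
    if not reference:
        return 0.0
    hits = sum(1 for g, r in zip(generated, reference) if g.strip() == r.strip())
    return hits / len(reference)
```

A value well above what chance agreement on numeric rows would give is a hint, not proof, that the data appeared in the training corpus.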

arXiv:2405.12368 [pdf, other]

Layout Agnostic Human Activity Recognition in Smart Homes through Textual Descriptions Of Sensor Triggers (TDOST)

Authors: Megha Thukral, Sourish Gunesh Dhekane, Shruthi K. Hiremath, Harish Haresamudram, Thomas Ploetz

Abstract: Human activity recognition (HAR) using ambient sensors in smart homes has numerous applications for human healthcare and wellness. However, building general-purpose HAR models that can be deployed to new smart home environments requires a significant amount of annotated sensor data and training overhead. Most smart homes vary significantly in their layouts, i.e., floor plans and the specifics of sensors embedded, resulting in low generalizability of HAR models trained for specific homes. We address this limitation by introducing a novel, layout-agnostic modeling approach for HAR systems in smart homes that utilizes the transferable representational capacity of natural language descriptions of raw sensor data. To this end, we generate Textual Descriptions Of Sensor Triggers (TDOST) that encapsulate the surrounding trigger conditions and provide cues for underlying activities to the activity recognition models. Leveraging textual embeddings, rather than raw sensor data, we create activity recognition systems that predict standard activities across homes without either (re-)training or adaptation on target homes. Through an extensive evaluation, we demonstrate the effectiveness of TDOST-based models in unseen smart homes through experiments on benchmarked CASAS datasets. Furthermore, we conduct a detailed analysis of how the individual components of our approach affect downstream activity recognition performance.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2402.01049 [pdf, other]

IMUGPT 2.0: Language-Based Cross Modality Transfer for Sensor-Based Human Activity Recognition

Authors: Zikang Leng, Amitrajit Bhattacharjee, Hrudhai Rajasekhar, Lizhe Zhang, Elizabeth Bruda, Hyeokhyen Kwon, Thomas Plötz

Abstract: One of the primary challenges in the field of human activity recognition (HAR) is the lack of large labeled datasets. This hinders the development of robust and generalizable models. Recently, cross modality transfer approaches have been explored that can alleviate the problem of data scarcity. These approaches convert existing datasets from a source modality, such as video, to a target modality (IMU). With the emergence of generative AI models such as large language models (LLMs) and text-driven motion synthesis models, language has become a promising source data modality as well, as shown in proofs of concept such as IMUGPT. In this work, we conduct a large-scale evaluation of language-based cross modality transfer to determine its effectiveness for HAR. Based on this study, we introduce two new extensions for IMUGPT that enhance its use for practical HAR application scenarios: a motion filter capable of filtering out irrelevant motion sequences to ensure the relevance of the generated virtual IMU data, and a set of metrics that measure the diversity of the generated data, facilitating the determination of when to stop generating virtual IMU data for both effective and efficient processing. We demonstrate that our diversity metrics can reduce the effort needed for the generation of virtual IMU data by at least 50%, which opens up IMUGPT for practical use cases beyond a mere proof of concept.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.10185 [pdf, other]

Transfer Learning in Human Activity Recognition: A Survey

Authors: Sourish Gunesh Dhekane, Thomas Ploetz

Abstract: Sensor-based human activity recognition (HAR) has been an active research area, owing to its applications in smart environments, assisted living, fitness, healthcare, etc. Recently, deep learning based end-to-end training has resulted in state-of-the-art performance in domains such as computer vision and natural language, where large amounts of annotated data are available. However, large quantities of annotated data are not available for sensor-based HAR. Moreover, the real-world settings on which the HAR is performed differ in terms of sensor modalities, classification tasks, and target users. To address this problem, transfer learning has been employed extensively. In this survey, we focus on these transfer learning methods in the application domains of smart home and wearables-based HAR. In particular, we provide a problem-solution perspective by categorizing and presenting the works in terms of their contributions and the challenges they address. We also present an updated view of the state-of-the-art for both application domains. Based on our analysis of 205 papers, we highlight the gaps in the literature and provide a roadmap for addressing them. This survey provides a reference to the HAR community, by summarizing the existing works and providing a promising research agenda.

Submitted 18 January, 2024; originally announced January 2024.

Comments: 40 pages, 5 figures, 7 tables

arXiv:2311.09514 [pdf, other]

Know Thy Neighbors: A Graph Based Approach for Effective Sensor-Based Human Activity Recognition in Smart Homes

Authors: Srivatsa P, Thomas Plötz

Abstract: There has been a resurgence of applications focused on Human Activity Recognition (HAR) in smart homes, especially in the field of ambient intelligence and assisted living technologies. However, such applications present numerous significant challenges to any automated analysis system operating in the real world, such as variability, sparsity, and noise in sensor measurements. Although state-of-the-art HAR systems have made considerable strides in addressing some of these challenges, they especially suffer from a practical limitation: they require successful pre-segmentation of continuous sensor data streams before automated recognition, i.e., they assume that an oracle is present during deployment, which is capable of identifying time windows of interest across discrete sensor events. To overcome this limitation, we propose a novel graph-guided neural network approach that performs activity recognition by learning explicit co-firing relationships between sensors. We accomplish this by learning a more expressive graph structure representing the sensor network in a smart home, in a data-driven manner. Our approach maps discrete input sensor measurements to a feature space through the application of attention mechanisms and hierarchical pooling of node embeddings. We demonstrate the effectiveness of our proposed approach by conducting several experiments on CASAS datasets, showing that the resulting graph-guided neural network outperforms the state-of-the-art method for HAR in smart homes across multiple datasets and by large margins. These results are promising because they push HAR for smart homes closer to real-world applications.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.14390 [pdf, other]

Cross-Domain HAR: Few Shot Transfer Learning for Human Activity Recognition

Authors: Megha Thukral, Harish Haresamudram, Thomas Ploetz

Abstract: The ubiquitous availability of smartphones and smartwatches with integrated inertial measurement units (IMUs) enables straightforward capturing of human activities. For specific applications of sensor based human activity recognition (HAR), however, logistical challenges and burgeoning costs render especially the ground truth annotation of such data a difficult endeavor, resulting in limited scale and diversity of datasets. Transfer learning, i.e., leveraging publicly available labeled datasets to first learn useful representations that can then be fine-tuned using limited amounts of labeled data from a target domain, can alleviate some of the performance issues of contemporary HAR systems. Yet they can fail when the differences between source and target conditions are too large and/or only a few samples from a target application domain are available, both of which are typical challenges in real-world human activity recognition scenarios. In this paper, we present an approach for economic use of publicly available labeled HAR datasets for effective transfer learning. We introduce a novel transfer learning framework, Cross-Domain HAR, which follows the teacher-student self-training paradigm to more effectively recognize activities with very limited label information. It bridges conceptual gaps between source and target domains, including sensor locations and type of activities. Through our extensive experimental evaluation on a range of benchmark datasets, we demonstrate the effectiveness of our approach for practically relevant few shot activity recognition scenarios. We also present a detailed analysis into how the individual components of our framework affect downstream performance.

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.12085 [pdf, other]

On the Benefit of Generative Foundation Models for Human Activity Recognition

Authors: Zikang Leng, Hyeokhyen Kwon, Thomas Plötz

Abstract: In human activity recognition (HAR), the limited availability of annotated data presents a significant challenge. Drawing inspiration from the latest advancements in generative AI, including Large Language Models (LLMs) and motion synthesis models, we believe that generative AI can address this data scarcity by autonomously generating virtual IMU data from text descriptions. Beyond this, we spotlight several promising research pathways that could benefit from generative AI for the community, including the generation of benchmark datasets, the development of foundational models specific to HAR, the exploration of hierarchical structures within HAR, breaking down complex activities, and applications in health sensing and activity summarization.

Submitted 18 October, 2023; originally announced October 2023.

Comments: Generative AI for Pervasive Computing (GenAI4PC) Symposium within UbiComp/ISWC 2023

arXiv:2306.01108 [pdf, other]

Towards Learning Discrete Representations via Self-Supervision for Wearables-Based Human Activity Recognition

Authors: Harish Haresamudram, Irfan Essa, Thomas Ploetz

Abstract: Human activity recognition (HAR) in wearable computing is typically based on direct processing of sensor data. Sensor readings are translated into representations, either derived through dedicated preprocessing, or integrated into end-to-end learning. Independent of their origin, for the vast majority of contemporary HAR, those representations are typically continuous in nature. That has not always been the case. In the early days of HAR, discretization approaches were explored, primarily motivated by the desire to minimize computational requirements, but also with a view on applications beyond mere recognition, such as activity discovery, fingerprinting, or large-scale search. Those traditional discretization approaches, however, suffer from substantial loss in precision and resolution in the resulting representations with detrimental effects on downstream tasks. Times have changed and in this paper we propose a return to discretized representations. We adopt and apply recent advancements in Vector Quantization (VQ) to wearables applications, which enables us to directly learn a mapping between short spans of sensor data and a codebook of vectors, resulting in recognition performance that is generally on par with their contemporary, continuous counterparts - sometimes surpassing them. Therefore, this work presents a proof-of-concept for demonstrating how effective discrete representations can be derived, enabling applications beyond mere activity classification but also opening up the field to advanced tools for the analysis of symbolic sequences, as they are known, for example, from domains such as natural language processing. Based on an extensive experimental evaluation on a suite of wearables-based benchmark HAR tasks, we demonstrate the potential of our learned discretization scheme and discuss how discretized sensor data analysis can lead to substantial changes in HAR.

Submitted 1 June, 2023; originally announced June 2023.
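
The lookup at the heart of vector quantization, mapping a vector to the index of its nearest codebook entry, can be sketched as follows. This is only an illustration of the general technique: in the paper the codebook and the encoder producing the vectors are learned, whereas here the codebook is a fixed, made-up list and the names `nearest_code` and `quantize` are hypothetical.

```python
from typing import List, Sequence


def nearest_code(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook vector closest to `vector`
    under squared Euclidean distance (first index wins on ties)."""
    best_index, best_dist = 0, float("inf")
    for i, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def quantize(spans: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Turn a sequence of continuous vectors into a sequence of discrete symbols."""
    return [nearest_code(span, codebook) for span in spans]
```

The resulting list of integers is the kind of symbolic sequence to which string and language-modeling tools can then be applied.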

arXiv:2305.13541 [pdf, ps, other]

doi 10.1145/3596234

ConvBoost: Boosting ConvNets for Sensor-based Activity Recognition

Authors: Shuai Shao, Yu Guan, Bing Zhai, Paolo Missier, Thomas Ploetz

Abstract: Human activity recognition (HAR) is one of the core research themes in ubiquitous and wearable computing. With the shift to deep learning (DL) based analysis approaches, it has become possible to extract high-level features and perform classification in an end-to-end manner. Despite their promising overall capabilities, DL-based HAR may suffer from overfitting due to the notoriously small, often inadequate, amounts of labeled sample data that are available for typical HAR applications. In response to such challenges, we propose ConvBoost -- a novel, three-layer, structured model architecture and boosting framework for convolutional network based HAR. Our framework generates additional training data from three different perspectives for improved HAR, aiming to alleviate the shortage of labeled training data in the field. Specifically, with the introduction of three conceptual layers -- Sampling Layer, Data Augmentation Layer, and Resilient Layer -- we develop three "boosters" -- R-Frame, Mix-up, and C-Drop -- to enrich the per-epoch training data by dense-sampling, synthesizing, and simulating, respectively. These new conceptual layers and boosters, which are universally applicable to any kind of convolutional network, have been designed based on the characteristics of the sensor data and the concept of frame-wise HAR. In our experimental evaluation on three standard benchmarks (Opportunity, PAMAP2, GOTOV) we demonstrate the effectiveness of our ConvBoost framework for HAR applications based on variants of convolutional networks: vanilla CNN, ConvLSTM, and Attention Models. We achieved substantial performance gains for all of them, which suggests that the proposed approach is generic and can serve as a practical solution for boosting the performance of existing ConvNet-based HAR models. This is an open-source project, and the code can be found at https://github.com/sshao2013/ConvBoost

Submitted 22 May, 2023; originally announced May 2023.

Comments: 21 pages

Journal ref: Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 7, 2, Article 75 (June 2023)
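
As a rough illustration of the "synthesizing" idea behind a mix-up style booster, the generic mixup augmentation blends two samples and their one-hot labels with a weight drawn from a Beta distribution. The sketch below shows that generic formulation only. Whether ConvBoost's Mix-up booster uses exactly this form is not stated in the abstract, and the function name and the default `alpha` are assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mixup(x_a: Sequence[float], x_b: Sequence[float],
          y_a: Sequence[float], y_b: Sequence[float],
          alpha: float = 0.2,
          rng: Optional[random.Random] = None) -> Tuple[List[float], List[float], float]:
    """Blend two samples and their one-hot labels.

    lam ~ Beta(alpha, alpha); the output is lam * a + (1 - lam) * b,
    applied element-wise to both the inputs and the labels.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam
```

Passing a seeded `random.Random` makes the augmentation reproducible, which helps when comparing training runs.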

arXiv:2305.03187 [pdf, other]

Generating Virtual On-body Accelerometer Data from Virtual Textual Descriptions for Human Activity Recognition

Authors: Zikang Leng, Hyeokhyen Kwon, Thomas Plötz

Abstract: The development of robust, generalized models in human activity recognition (HAR) has been hindered by the scarcity of large-scale, labeled data sets. Recent work has shown that virtual IMU data extracted from videos using computer vision techniques can lead to substantial performance improvements when training HAR models combined with small portions of real IMU data. Inspired by recent advances in motion synthesis from textual descriptions and connecting Large Language Models (LLMs) to various AI models, we introduce an automated pipeline that first uses ChatGPT to generate diverse textual descriptions of activities. These textual descriptions are then used to generate 3D human motion sequences via a motion synthesis model, T2M-GPT, and later converted to streams of virtual IMU data. We benchmarked our approach on three HAR datasets (RealWorld, PAMAP2, and USC-HAD) and demonstrate that the use of virtual IMU training data generated using our new approach leads to significantly improved HAR model performance compared to only using real IMU data. Our approach contributes to the growing field of cross-modality transfer methods and illustrates how HAR models can be improved through the generation of virtual training data that do not require any manual effort.

Submitted 4 May, 2023; originally announced May 2023.

arXiv:2212.13918 [pdf, other]

Simple Yet Surprisingly Effective Training Strategies for LSTMs in Sensor-Based Human Activity Recognition

Authors: Shuai Shao, Yu Guan, Xin Guan, Paolo Missier, Thomas Ploetz

Abstract: Human Activity Recognition (HAR) is one of the core research areas in mobile and wearable computing. With the application of deep learning (DL) techniques such as CNN, recognizing periodic or static activities (e.g., walking, lying, cycling, etc.) has become a well-studied problem. What remains a major challenge though is the sporadic activity recognition (SAR) problem, where activities of interest tend to be non-periodic, and occur less frequently when compared with the often large amount of irrelevant background activities. Recent works suggested that sequential DL models (such as LSTMs) have great potential for modeling non-periodic behaviours, and in this paper we studied some LSTM training strategies for SAR. Specifically, we proposed two simple yet effective LSTM variants, namely delay model and inverse model, for two SAR scenarios (with and without time critical requirement). For time critical SAR, the delay model can effectively exploit predefined delay intervals (within tolerance) in the form of contextual information for improved performance. For the regular SAR task, the second proposed model, the inverse model, can learn patterns from the time series in an inverse manner, which can be complementary to the forward model (i.e., LSTM), and combining both can boost the performance. These two LSTM variants are very practical, and they can be deemed as training strategies without alteration of the LSTM fundamentals. We also studied some additional LSTM training strategies, which can further improve the accuracy. We evaluated our models on two SAR datasets and one non-SAR dataset, and the promising results demonstrated the effectiveness of our approaches in HAR applications.

Submitted 23 December, 2022; originally announced December 2022.

Comments: 11 pages

arXiv:2211.06173 [pdf, other]

Investigating Enhancements to Contrastive Predictive Coding for Human Activity Recognition

Authors: Harish Haresamudram, Irfan Essa, Thomas Ploetz

Abstract: The dichotomy between the challenging nature of obtaining annotations for activities, and the more straightforward nature of data collection from wearables, has resulted in significant interest in the development of techniques that utilize large quantities of unlabeled data for learning representations. Contrastive Predictive Coding (CPC) is one such method, learning effective representations by leveraging properties of time-series data to set up a contrastive future timestep prediction task. In this work, we propose enhancements to CPC, by systematically investigating the encoder architecture, the aggregator network, and the future timestep prediction, resulting in a fully convolutional architecture, thereby improving parallelizability. Across sensor positions and activities, our method shows substantial improvements on four of six target datasets, demonstrating its ability to empower a wide range of application scenarios. Further, in the presence of very limited labeled data, our technique significantly outperforms both supervised and self-supervised baselines, positively impacting situations where collecting only a few seconds of labeled data may be possible. This is promising, as CPC does not require specialized data transformations or reconstructions for learning effective representations.

Submitted 11 November, 2022; originally announced November 2022.
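
The contrastive future timestep prediction task mentioned in the abstract above is commonly trained with the InfoNCE objective: given similarity scores between a predicted future representation and several candidates, the loss is the negative log-softmax of the true future's score. The sketch below assumes the scores have already been computed by some encoder and predictor. It shows the generic loss, not the paper's specific architecture, and `info_nce` is a hypothetical name.

```python
import math
from typing import Sequence


def info_nce(scores: Sequence[float], positive_index: int = 0) -> float:
    """InfoNCE loss for one prediction.

    scores[positive_index] is the score of the true future sample,
    all other entries are scores of negative (distractor) samples.
    Returns -log softmax(scores)[positive_index], computed stably.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[positive_index]
```

The loss falls as the true future is scored higher than the distractors, which is what pushes the encoder toward representations that are predictive over time.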

arXiv:2211.01342 [pdf, other]

Fine-grained Human Activity Recognition Using Virtual On-body Acceleration Data

Authors: Zikang Leng, Yash Jain, Hyeokhyen Kwon, Thomas Plötz

Abstract: Previous work has demonstrated that virtual accelerometry data, extracted from videos using cross-modality transfer approaches like IMUTube, is beneficial for training complex and effective human activity recognition (HAR) models. Systems like IMUTube were originally designed to cover activities that are based on substantial body (part) movements. Yet, life is complex, and a range of activities of daily living is based on only rather subtle movements, which raises the question to what extent systems like IMUTube are of value also for fine-grained HAR, i.e., when does IMUTube break? In this work we first introduce a measure to quantitatively assess the subtlety of human movements that are underlying activities of interest -- the motion subtlety index (MSI) -- which captures local pixel movements and pose changes in the vicinity of target virtual sensor locations, and correlate it to the eventual activity recognition accuracy. We then perform a "stress-test" on IMUTube and explore for which activities with underlying subtle movements a cross-modality transfer approach works, and for which not. As such, the work presented in this paper allows us to map out the landscape for IMUTube applications in practical scenarios.

Submitted 2 November, 2022; originally announced November 2022.

arXiv:2210.07354 [pdf, other]

Finding Islands of Predictability in Action Forecasting

Authors: Daniel Scarafoni, Irfan Essa, Thomas Ploetz

Abstract: We address dense action forecasting: the problem of predicting future action sequences over long durations based on partial observation. Our key insight is that future action sequences are more accurately modeled with variable, rather than one, levels of abstraction, and that the optimal level of abstraction can be dynamically selected during the prediction process. Our experiments show that most parts of future action sequences can be predicted confidently in fine detail only in small segments of future frames, which are effectively "islands" of high model prediction confidence in a "sea" of uncertainty. We propose a combined Bayesian neural network and hierarchical convolutional segmentation model to both accurately predict future actions and optimally select abstraction levels. We evaluate this approach on standard datasets against existing state-of-the-art systems and demonstrate that our "islands of predictability" approach maintains fine-grained action predictions while also making accurate abstract predictions where systems were previously unable to do so, and thus results in substantial, monotonic increases in accuracy.

Submitted 13 October, 2022; originally announced October 2022.

MSC Class: I.2

arXiv:2202.12938 [pdf, other]

Assessing the State of Self-Supervised Human Activity Recognition using Wearables

Authors: Harish Haresamudram, Irfan Essa, Thomas Plötz

Abstract: The emergence of self-supervised learning in the field of wearables-based human activity recognition (HAR) has opened up opportunities to tackle the most pressing challenges in the field, namely to exploit unlabeled data to derive reliable recognition systems for scenarios where only small amounts of labeled training samples can be collected. As such, self-supervision, i.e., the paradigm of 'pretrain-then-finetune', has the potential to become a strong alternative to the predominant end-to-end training approaches, let alone hand-crafted features for the classic activity recognition chain. Recently a number of contributions have been made that introduced self-supervised learning into the field of HAR, including Multi-task self-supervision, Masked Reconstruction, CPC, and SimCLR, to name but a few. With the initial success of these methods, the time has come for a systematic inventory and analysis of the potential self-supervised learning has for the field. This paper provides exactly that. We assess the progress of self-supervised HAR research by introducing a framework that performs a multi-faceted exploration of model performance. We organize the framework into three dimensions, each containing three constituent criteria, such that each dimension captures specific aspects of performance, including the robustness to differing source and target conditions, the influence of dataset characteristics, and the feature space characteristics. We utilize this framework to assess seven state-of-the-art self-supervised methods for HAR, leading to the formulation of insights into the properties of these techniques and establishing their value towards learning representations for diverse scenarios.

Submitted 19 November, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

Comments: updated

arXiv:2111.10245 [pdf, other]

doi 10.1145/3494961

Ubi-SleepNet: Advanced Multimodal Fusion Techniques for Three-stage Sleep Classification Using Ubiquitous Sensing

Authors: Bing Zhai, Yu Guan, Michael Catt, Thomas Ploetz

Abstract: Sleep is a fundamental physiological process that is essential for sustaining a healthy body and mind. The gold standard for clinical sleep monitoring is polysomnography (PSG), based on which sleep can be categorized into five stages, including wake/rapid eye movement sleep (REM sleep)/Non-REM sleep 1 (N1)/Non-REM sleep 2 (N2)/Non-REM sleep 3 (N3). However, PSG is expensive, burdensome, and not suitable for daily use. For long-term sleep monitoring, ubiquitous sensing may be a solution. Most recently, cardiac and movement sensing has become popular in classifying three-stage sleep, since both modalities can be easily acquired from research-grade or consumer-grade devices (e.g., Apple Watch). However, how best to fuse the data for the greatest accuracy remains an open question. In this work, we comprehensively studied deep learning (DL)-based advanced fusion techniques consisting of three fusion strategies alongside three fusion methods for three-stage sleep classification based on two publicly available datasets. Experimental results demonstrate important evidence that three-stage sleep can be reliably classified by fusing cardiac/movement sensing modalities, which may potentially become a practical tool to conduct large-scale sleep stage assessment studies or long-term self-tracking on sleep. To accelerate the progression of sleep research in the ubiquitous/wearable computing community, we made this project open source, and the code can be found at: https://github.com/bzhai/Ubi-SleepNet.

Submitted 19 November, 2021; originally announced November 2021.

Comments: Accepted in IMWUT for 2021 Dec issue

arXiv:2105.09787 [pdf, other]

doi 10.1145/3561533

Explainable Activity Recognition for Smart Home Systems

Authors: Devleena Das, Yasutaka Nishimura, Rajan P. Vivek, Naoto Takeda, Sean T. Fish, Thomas Ploetz, Sonia Chernova

Abstract: Smart home environments are designed to provide services that help improve the quality of life for the occupant via a variety of sensors and actuators installed throughout the space. Many automated actions taken by a smart home are governed by the output of an underlying activity recognition system. However, activity recognition systems may not be perfectly accurate and therefore inconsistencies in smart home operations can lead users reliant on smart home predictions to wonder "why did the smart home do that?" In this work, we build on insights from Explainable Artificial Intelligence (XAI) techniques and introduce an explainable activity recognition framework in which we leverage leading XAI methods to generate natural language explanations that explain what about an activity led to the given classification. Within the context of remote caregiver monitoring, we perform a two-step evaluation: (a) utilize ML experts to assess the sensibility of explanations, and (b) recruit non-experts in two user remote caregiver monitoring scenarios, synchronous and asynchronous, to assess the effectiveness of explanations generated via our framework. Our results show that the XAI approach, SHAP, has a 92% success rate in generating sensible explanations. Moreover, in 83% of sampled scenarios users preferred natural language explanations over a simple activity label, underscoring the need for explainable activity recognition systems. Finally, we show that explanations generated by some XAI methods can lead users to lose confidence in the accuracy of the underlying activity recognition model. We make a recommendation regarding which existing XAI method leads to the best performance in the domain of smart home automation, and discuss a range of topics for future work to further improve explainable activity recognition.

Submitted 26 May, 2023; v1 submitted 20 May, 2021; originally announced May 2021.

Comments: In ACM Transactions on Interactive Intelligent Systems

arXiv:2103.15987 [pdf, other]

PLAN-B: Predicting Likely Alternative Next Best Sequences for Action Prediction

Authors: Dan Scarafoni, Irfan Essa, Thomas Ploetz

Abstract: Action prediction focuses on anticipating actions before they happen. Recent works leverage probabilistic approaches to describe future uncertainties and sample future actions. However, these methods cannot easily find all alternative predictions, which are essential given the inherent unpredictability of the future, and current evaluation protocols do not measure a system's ability to find such alternatives. We re-examine action prediction in terms of its ability to predict not only the top predictions, but also top alternatives with the accuracy@k metric. In addition, we propose Choice F1: a metric inspired by F1 score which evaluates a prediction system's ability to find all plausible futures while keeping only the most probable ones. To evaluate this problem, we present a novel method, Predicting the Likely Alternative Next Best, or PLAN-B, for action prediction which automatically finds the set of most likely alternative futures. PLAN-B consists of two novel components: (i) a Choice Table which ensures that all possible futures are found, and (ii) a "Collaborative" RNN system which combines both action sequence and feature information. We demonstrate that our system outperforms state-of-the-art results on benchmark datasets.

Submitted 29 March, 2021; originally announced March 2021.

arXiv:2012.05333 [pdf, other]

Contrastive Predictive Coding for Human Activity Recognition

Authors: Harish Haresamudram, Irfan Essa, Thomas Ploetz

Abstract: Feature extraction is crucial for human activity recognition (HAR) using body-worn movement sensors. Recently, learned representations have been used successfully, offering promising alternatives to manually engineered features. Our work focuses on effective use of small amounts of labeled data and the opportunistic exploitation of unlabeled data that are straightforward to collect in mobile and ubiquitous computing scenarios. We hypothesize and demonstrate that explicitly considering the temporality of sensor data at representation level plays an important role for effective HAR in challenging scenarios. We introduce the Contrastive Predictive Coding (CPC) framework to human activity recognition, which captures the long-term temporal structure of sensor data streams. Through a range of experimental evaluations on real-life recognition tasks, we demonstrate its effectiveness for improved HAR. CPC-based pre-training is self-supervised, and the resulting learned representations can be integrated into standard activity chains. It leads to significantly improved recognition performance when only small amounts of labeled training data are available, thereby demonstrating the practical value of our approach.

Submitted 9 December, 2020; originally announced December 2020.

arXiv:2007.06062 [pdf, other]

Transfer Learning for Activity Recognition in Mobile Health

Authors: Yuchao Ma, Andrew T. Campbell, Diane J. Cook, John Lach, Shwetak N. Patel, Thomas Ploetz, Majid Sarrafzadeh, Donna Spruijt-Metz, Hassan Ghasemzadeh

Abstract: While activity recognition from inertial sensors holds potential for mobile health, differences in sensing platforms and user movement patterns cause performance degradation. Aiming to address these challenges, we propose a transfer learning framework, TransFall, for sensor-based activity recognition. TransFall's design contains a two-tier data transformation, a label estimation layer, and a model generation layer to recognize activities for the new scenario. We validate TransFall analytically and empirically.

Submitted 12 July, 2020; originally announced July 2020.

arXiv:2006.05675 [pdf, other]

IMUTube: Automatic Extraction of Virtual on-body Accelerometry from Video for Human Activity Recognition

Authors: Hyeokhyen Kwon, Catherine Tong, Harish Haresamudram, Yan Gao, Gregory D. Abowd, Nicholas D. Lane, Thomas Ploetz

Abstract: The lack of large-scale, labeled data sets impedes progress in developing robust and generalized predictive models for on-body sensor-based human activity recognition (HAR). Labeled data in human activity recognition is scarce and hard to come by, as sensor data collection is expensive, and the annotation is time-consuming and error-prone. To address this problem, we introduce IMUTube, an automated processing pipeline that integrates existing computer vision and signal processing techniques to convert videos of human activity into virtual streams of IMU data. These virtual IMU streams represent accelerometry at a wide variety of locations on the human body. We show how the virtually-generated IMU data improves the performance of a variety of models on known HAR datasets. Our initial results are very promising, but the greater promise of this work lies in a collective approach by the computer vision, signal processing, and activity recognition communities to extend this work in ways that we outline. This should lead to on-body, sensor-based HAR becoming yet another success story in large-dataset breakthroughs in recognition.

Submitted 4 August, 2020; v1 submitted 29 May, 2020; originally announced June 2020.

arXiv:2005.11228 [pdf, other]

Leveraging WiFi Network Logs to Infer Student Collocation and its Relationship with Academic Performance

Authors: V. Das Swain, H. Kwon, S. Sargolzaei, B. Saket, M. Bin Morshed, K. Tran, D. Patel, Y. Tian, J. Philipose, Y. Cui, T. Plötz, M. De Choudhury, G. D. Abowd

Abstract: A comprehensive understanding of collocation can help understand performance outcomes. For university cohorts, this needs data that describes large groups over a long period. Harnessing user devices to infer this, while tempting, is challenged by privacy concerns, power consumption, and maintenance issues. Alternatively, embedding new sensors in the environment is limited by the expense of covering the entire campus. We investigate the feasibility of leveraging WiFi association logs for this purpose. While these provide coarse approximations of location, these are easily obtainable and depict multiple users on campus over a semester. We explore how these coarse collocations are related to individual performance. Specifically, we inspect the association between individual performance and the collocation behaviors of project group members. We study 163 students (in 54 project groups) over 14 weeks. After describing how we determine collocation with the WiFi logs, we present a study to analyze how collocation within groups relates to a student's final score. We find collocation behaviors show a significant correlation (Pearson's r = 0.24) with performance -- better than either peer feedback or individual behaviors like attendance. Finally, we discuss how repurposing WiFi logs can facilitate applications for domains like mental wellbeing and physical health.

Submitted 5 May, 2021; v1 submitted 22 May, 2020; originally announced May 2020.

Comments: 25 pages, 10 figures, 5 tables

ACM Class: J.4

arXiv:1902.08068 [pdf, other]

Towards Reliable, Automated General Movement Assessment for Perinatal Stroke Screening in Infants Using Wearable Accelerometers

Authors: Yan Gao, Yang Long, Yu Guan, Anna Basu, Jessica Baggaley, Thomas Ploetz

Abstract: Perinatal stroke (PS) is a serious condition that, if undetected and thus untreated, often leads to life-long disability, in particular Cerebral Palsy (CP). In clinical settings, Prechtl's General Movement Assessment (GMA) can be used to classify infant movements using a Gestalt approach, identifying infants at high risk of developing PS. Training and maintenance of assessment skills are essential and expensive for the correct use of GMA, yet many practitioners lack these skills, preventing larger-scale screening and leading to significant risks of missing opportunities for early detection and intervention for affected infants. We present an automated approach to GMA, based on body-worn accelerometers and a novel sensor data analysis method, Discriminative Pattern Discovery (DPD), that is designed to cope with scenarios where only coarse annotations of data are available for model training. We demonstrate the effectiveness of our approach in a study with 34 newborns (21 typically developing infants and 13 PS infants with abnormal movements). Our method is able to correctly recognise the trials with abnormal movements with at least the accuracy that is required by newly trained human annotators (75%), which is encouraging towards our ultimate goal of an automated PS screening system that can be used population-wide.

Submitted 21 February, 2019; originally announced February 2019.

Comments: Gao and Long share equal contributions. This work has been accepted for publication in ACM IMWUT (Ubicomp) 2019.

arXiv:1811.10493 [pdf, other]

Robust Cross-View Gait Recognition with Evidence: A Discriminant Gait GAN (DiGGAN) Approach

Authors: BingZhang Hu, Yu Guan, Yan Gao, Yang Long, Nicholas Lane, Thomas Ploetz

Abstract: Gait as a biometric trait has attracted much attention in many security and privacy applications, such as identity recognition and authentication, during the last few decades. Because of its nature as a long-distance biometric trait, gait can be easily collected and used to identify individuals non-intrusively through CCTV cameras. However, it is very difficult to develop robust automated gait recognition systems, since gait may be affected by many covariate factors such as clothing, walking speed, camera view angle, etc. Of these, large view angle changes have been deemed the most challenging factor, as they can alter the overall gait appearance substantially. Existing works on gait recognition are far from providing satisfactory performance under such view changes. Furthermore, very few works have considered evidence -- the demonstrable information revealing the reliability of decisions, which is regarded as an important demand in machine learning-based recognition/authentication applications. To address these issues, in this paper we propose a Discriminant Gait Generative Adversarial Network, namely DiGGAN, which can effectively extract view-invariant features for cross-view gait recognition and, more importantly, transfer gait images to different views -- serving as evidence and showing how the decisions have been made. Quantitative experiments have been conducted on the two most popular cross-view gait datasets, the OU-MVLP and CASIA-B, where the proposed DiGGAN has outperformed state-of-the-art methods. Qualitative analysis has also been provided and demonstrates the proposed DiGGAN's capability in providing evidence.

Submitted 17 September, 2020; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: Submitted to ACM Transactions on Intelligent Systems and Technology

arXiv:1810.12575 [pdf, other]

Neural Nearest Neighbors Networks

Authors: Tobias Plötz, Stefan Roth

Abstract: Non-local methods exploiting the self-similarity of natural signals have been well studied, for example in image analysis and restoration. Existing approaches, however, rely on k-nearest neighbors (KNN) matching in a fixed feature space. The main hurdle in optimizing this feature space w.r.t. application performance is the non-differentiability of the KNN selection rule. To overcome this, we propose a continuous deterministic relaxation of KNN selection that maintains differentiability w.r.t. pairwise distances, but retains the original KNN as the limit of a temperature parameter approaching zero. To exploit our relaxation, we propose the neural nearest neighbors block (N3 block), a novel non-local processing layer that leverages the principle of self-similarity and can be used as a building block in modern neural network architectures. We show its effectiveness for the set reasoning task of correspondence classification as well as for image restoration, including image denoising and single image super-resolution, where we outperform strong convolutional neural network (CNN) baselines and recent non-local models that rely on KNN selection in hand-chosen feature spaces.

Submitted 30 October, 2018; originally announced October 2018.

Comments: to appear at NIPS*2018, code available at https://github.com/visinf/n3net/

arXiv:1809.09912 [pdf]

Geographical veracity of indicators derived from mobile phone data

Authors: Maarten Vanhoof, Thomas Ploetz, Zbigniew Smoreda

Abstract: In this contribution we summarize insights on the geographical veracity of using mobile phone data to create (statistical) indicators. We focus on problems that persist with spatial allocation, spatial delineation and spatial aggregation of information obtained from mobile phone data. For each of the cases, we offer insights from our works on a French CDR dataset and propose both short- and long-term solutions. As such, we aim at offering a list of challenges, and a roadmap for future work on the topic.

Submitted 26 September, 2018; originally announced September 2018.

Comments: 4 pages, 3 figures, 2 tables. Short paper contributed to the Netmob 2017 conference in Milan

arXiv:1809.07567 [pdf]

doi 10.2478/jos-2018-0046

Assessing the quality of home detection from mobile phone data for official statistics

Authors: Maarten Vanhoof, Fernando Reis, Thomas Ploetz, Zbigniew Smoreda

Abstract: Mobile phone data are an interesting new data source for official statistics. However, multiple problems and uncertainties need to be solved before these data can inform, support or even become an integral part of statistical production processes. In this paper, we focus on arguably the most important problem hindering the application of mobile phone data in official statistics: detecting home locations. We argue that current efforts to detect home locations suffer from a blind deployment of criteria to define a place of residence and from limited validation possibilities. We support our argument by analysing the performance of five home detection algorithms (HDAs) that have been applied to a large, French, Call Detail Record (CDR) dataset (~18 million users, 5 months). Our results show that criteria choice in HDAs influences the detection of home locations for up to about 40% of users, that HDAs perform poorly when compared with a validation dataset (the 35°-gap), and that their performance is sensitive to the time period and the duration of observation. Based on our findings and experiences, we offer several recommendations for official statistics. If adopted, our recommendations would help in ensuring a more reliable use of mobile phone data vis-à-vis official statistics.

Submitted 20 September, 2018; originally announced September 2018.

Comments: 30 pages, 3 figures, 1 table, presented at NTTS 2017, draft for a paper to appear in the Journal of Official Statistics

arXiv:1808.06398 [pdf]

Detecting home locations from CDR data: introducing spatial uncertainty to the state-of-the-art

Authors: Maarten Vanhoof, Fernando Reis, Zbigniew Smoreda, Thomas Ploetz

Abstract: Non-continuous location traces inferred from Call Detail Records (CDR) at population scale are increasingly becoming available for research and show great potential for automated detection of meaningful places. Yet, a majority of Home Detection Algorithms (HDAs) suffer from "blind" deployment of criteria to define homes and from limited possibilities for validation. In this paper, we investigate the performance and capabilities of five popular criteria for home detection based on a very large mobile phone dataset from France (~18 million users, 6 months). Furthermore, we construct a data-driven framework to assess the spatial uncertainty related to the application of HDAs. Our findings appropriate spatial uncertainty in HDA and, by extension, for detection of meaningful places. We show how spatial uncertainties on the individuals' level can be assessed in absence of ground truth annotation, how they relate to traditional, high-level validation practices and how they can be used to improve results for, e.g., nation-wide population estimation.

Submitted 20 August, 2018; originally announced August 2018.

Comments: 13 pages, 7 figures, contributed to the Mobile Tartu 2016 Conference

arXiv:1805.08367 [pdf, other]

Adaptive App Design by Detecting Handedness

Authors: Kriti Nelavelli, Thomas Ploetz

Abstract: Taller and sleeker smartphone devices are becoming the new norm. More screen space and very responsive touchscreens have made for enjoyable experiences available to us at all times. However, after years of interacting with smaller, portable devices, we still try to use these large smartphones on the go, and do not want to change how, where, and when we interact with them. The older devices were easier to use with one hand, when mobile. Now, with bigger devices, users have trouble accessing all parts of the screen with one hand. We need to recognize the limitations in usability due to these large screens. We must start designing user interfaces that are more conducive to one-handed usage, which is the preferred way of interacting with the phone. This paper introduces Adaptive App Design, a design methodology that promotes dynamic and adaptive interfaces for one-handed usage. We present a novel method of recognizing which hand the user is interacting with and suggest how to design friendlier interfaces for them by presenting a set of design guidelines for this methodology.

Submitted 21 May, 2018; originally announced May 2018.

Comments: 10 pages, 5 figures

ACM Class: H.5.2

arXiv:1805.07648 [pdf, other]

On Attention Models for Human Activity Recognition

Authors: Vishvak S Murahari, Thomas Ploetz

Abstract: Most approaches that model time-series data in human activity recognition based on body-worn sensing (HAR) use a fixed size temporal context to represent different activities. This might, however, not be apt for sets of activities with individually varying durations. We introduce attention models into HAR research as a data driven approach for exploring relevant temporal context. Attention models learn a set of weights over input data, which we leverage to weight the temporal context being considered to model each sensor reading. We construct attention models for HAR by adding attention layers to a state-of-the-art deep learning HAR model (DeepConvLSTM) and evaluate our approach on benchmark datasets achieving significant increase in performance. Finally, we visualize the learned weights to better understand what constitutes relevant temporal context.

Submitted 19 May, 2018; originally announced May 2018.

arXiv:1803.10586 [pdf, other]

Stochastic Variational Inference with Gradient Linearization

Authors: Tobias Plötz, Anne S. Wannenwetsch, Stefan Roth

Abstract: Variational inference has experienced a recent surge in popularity owing to stochastic approaches, which have yielded practical tools for a wide range of model classes. A key benefit is that stochastic variational inference obviates the tedious process of deriving analytical expressions for closed-form variable updates. Instead, one simply needs to derive the gradient of the log-posterior, which is often much easier. Yet for certain model classes, the log-posterior itself is difficult to optimize using standard gradient techniques. One such example are random field models, where optimization based on gradient linearization has proven popular, since it speeds up convergence significantly and can avoid poor local optima. In this paper we propose stochastic variational inference with gradient linearization (SVIGL). It is similarly convenient as standard stochastic variational inference - all that is required is a local linearization of the energy gradient. Its benefit over stochastic variational inference with conventional gradient methods is a clear improvement in convergence speed, while yielding comparable or even better variational approximations in terms of KL divergence. We demonstrate the benefits of SVIGL in three applications: Optical flow estimation, Poisson-Gaussian denoising, and 3D surface reconstruction.

Submitted 28 March, 2018; originally announced March 2018.

Comments: To appear at CVPR 2018

arXiv:1707.01317 [pdf, other]

Robust Multi-Image HDR Reconstruction for the Modulo Camera

Authors: Florian Lang, Tobias Plötz, Stefan Roth

Abstract: Photographing scenes with high dynamic range (HDR) poses great challenges to consumer cameras with their limited sensor bit depth. To address this, Zhao et al. recently proposed a novel sensor concept - the modulo camera - which captures the least significant bits of the recorded scene instead of going into saturation. Similar to conventional pipelines, HDR images can be reconstructed from multiple exposures, but significantly fewer images are needed than with a typical saturating sensor. While the concept is appealing, we show that the original reconstruction approach assumes noise-free measurements and quickly breaks down otherwise. To address this, we propose a novel reconstruction algorithm that is robust to image noise and produces significantly fewer artifacts. We theoretically analyze correctness as well as limitations, and show that our approach significantly outperforms the baseline on real data.

Submitted 5 July, 2017; originally announced July 2017.

Comments: to appear at the 39th German Conference on Pattern Recognition (GCPR) 2017

arXiv:1707.01313 [pdf, other]

Benchmarking Denoising Algorithms with Real Photographs

Authors: Tobias Plötz, Stefan Roth

Abstract: Lacking realistic ground truth data, image denoising techniques are traditionally evaluated on images corrupted by synthesized i.i.d. Gaussian noise. We aim to obviate this unrealistic setting by developing a methodology for benchmarking denoising techniques on real photographs. We capture pairs of images with different ISO values and appropriately adjusted exposure times, where the nearly noise-free low-ISO image serves as reference. To derive the ground truth, careful post-processing is needed. We correct spatial misalignment, cope with inaccuracies in the exposure parameters through a linear intensity transform based on a novel heteroscedastic Tobit regression model, and remove residual low-frequency bias that stems, e.g., from minor illumination changes. We then capture a novel benchmark dataset, the Darmstadt Noise Dataset (DND), with consumer cameras of differing sensor sizes. One interesting finding is that various recent techniques that perform well on synthetic noise are clearly outperformed by BM3D on photographs with real noise. Our benchmark delineates realistic evaluation scenarios that deviate strongly from those commonly used in the scientific literature.

Submitted 5 July, 2017; originally announced July 2017.

Comments: To appear at CVPR17. See our website (www.visinf.tu-darmstadt.de) for a version with high-resolution images

arXiv:1703.09370 [pdf, other]

doi 10.1145/3090076

Ensembles of Deep LSTM Learners for Activity Recognition using Wearables

Authors: Yu Guan, Thomas Ploetz

Abstract: Recently, deep learning (DL) methods have been introduced very successfully into human activity recognition (HAR) scenarios in ubiquitous and wearable computing. Especially the prospect of overcoming the need for manual feature design combined with superior classification capabilities render deep neural networks very attractive for real-life HAR application. Even though DL-based approaches now outperform the state-of-the-art in a number of recognition tasks of the field, substantial challenges remain. Most prominently, issues with real-life datasets, typically including imbalanced datasets and problematic data quality, still limit the effectiveness of activity recognition using wearables. In this paper we tackle such challenges through Ensembles of deep Long Short Term Memory (LSTM) networks. We have developed modified training procedures for LSTM networks and combine sets of diverse LSTM learners into classifier collectives. We demonstrate, both formally and empirically, that Ensembles of deep LSTM learners outperform the individual LSTM networks. Through an extensive experimental evaluation on three standard benchmarks (Opportunity, PAMAP2, Skoda) we demonstrate the excellent recognition capabilities of our approach and its potential for real-life applications of human activity recognition.

Submitted 27 March, 2017; originally announced March 2017.

Comments: accepted for publication in ACM IMWUT (Ubicomp) 2017

arXiv:1604.08880 [pdf, other]

Deep, Convolutional, and Recurrent Models for Human Activity Recognition using Wearables

Authors: Nils Y. Hammerla, Shane Halloran, Thomas Ploetz

Abstract: Human activity recognition (HAR) in ubiquitous computing is beginning to adopt deep learning to substitute for well-established analysis techniques that rely on hand-crafted feature extraction and classification techniques. From these isolated applications of custom deep architectures it is, however, difficult to gain an overview of their suitability for problems ranging from the recognition of manipulative gestures to the segmentation and identification of physical activities like running or ascending stairs. In this paper we rigorously explore deep, convolutional, and recurrent approaches across three representative datasets that contain movement data captured with wearable sensors. We describe how to train recurrent approaches in this setting, introduce a novel regularisation approach, and illustrate how they outperform the state-of-the-art on a large benchmark dataset. Across thousands of recognition experiments with randomly sampled model configurations we investigate the suitability of each model for different tasks in HAR, explore the impact of hyperparameters using the fANOVA framework, and provide guidelines for the practitioner who wants to apply deep learning in their problem setting.

Submitted 29 April, 2016; originally announced April 2016.

Comments: Extended version has been accepted for publication at International Joint Conference on Artificial Intelligence (IJCAI)

arXiv:1510.02071 [pdf, other]

doi 10.1109/CVPR.2013.338

Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

Authors: Vinay Bettadapura, Grant Schindler, Thomaz Plotz, Irfan Essa

Abstract: We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori. Our approach specifically addresses the limitations of standard BoW approaches, which fail to represent the underlying temporal and causal information that is inherent in activity streams. In addition, we also propose the use of randomly sampled regular expressions to discover and encode patterns in activities. We demonstrate the effectiveness of our approach in experimental evaluations where we successfully recognize activities and detect anomalies in four complex datasets.

Submitted 7 October, 2015; originally announced October 2015.

Comments: 8 pages

Journal ref: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013) -- Pages 2619 - 2626

arXiv:1312.6995 [pdf, other]

doi 10.1016/j.pmcj.2014.05.006

Towards Using Unlabeled Data in a Sparse-coding Framework for Human Activity Recognition

Authors: Sourav Bhattacharya, Petteri Nurmi, Nils Hammerla, Thomas Plötz

Abstract: We propose a sparse-coding framework for activity recognition in ubiquitous and mobile computing that alleviates two fundamental problems of current supervised learning approaches. (i) It automatically derives a compact, sparse and meaningful feature representation of sensor data that does not rely on prior expert knowledge and generalizes extremely well across domain boundaries. (ii) It exploits unlabeled sample data for bootstrapping effective activity recognizers, i.e., substantially reduces the amount of ground truth annotation required for model estimation. Such unlabeled data is trivial to obtain, e.g., through contemporary smartphones carried by users as they go about their everyday activities. Based on the self-taught learning paradigm we automatically derive an over-complete set of basis vectors from unlabeled data that captures inherent patterns present within activity data. Through projecting raw sensor data onto the feature space defined by such over-complete sets of basis vectors effective feature extraction is pursued. Given these learned feature representations, classification backends are then trained using small amounts of labeled training data. We study the new approach in detail using two datasets which differ in terms of the recognition tasks and sensor modalities. Primarily we focus on transportation mode analysis task, a popular task in mobile-phone based sensing. The sparse-coding framework significantly outperforms the state-of-the-art in supervised learning approaches. Furthermore, we demonstrate the great practical potential of the new approach by successfully evaluating its generalization capabilities across both domain and sensor modalities by considering the popular Opportunity dataset. Our feature learning approach outperforms state-of-the-art approaches to analyzing activities in daily living.

Submitted 23 July, 2014; v1 submitted 25 December, 2013; originally announced December 2013.

Comments: 18 pages, 12 figures, Pervasive and Mobile Computing, 2014

Showing 1–39 of 39 results for author: Plötz, T