Photoplethysmography (PPG) is a non-invasive wearable sensing method used in millions of devices for heart rate monitoring. However, PPG signals are highly susceptible to a variety of noise sources, including motion artifacts, sensor noise, and biological factors, especially in real-world wearable settings. These noise sources make it challenging to design generalizable models that accurately interpret cardiac activity. This paper proposes a shift in focus from learning directly on noisy signals to exploiting the characteristics of a mathematically modelled PPG waveform in an adversarial setting to increase the signal-to-noise ratio. The results show that the proposed approach is robust to noisy data. We evaluated the model in a user study (N=22), testing it on unseen PPG data collected from a new sensor and from new users under three different activity levels. The results demonstrate that the approach generalizes better than the state of the art and maintains consistent performance improvements across diverse user activities. We also implemented our model on a commonly used Android mobile device, confirming that it provides fast inference in a resource-constrained setting.
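The abstract does not specify the waveform model; one common choice in the PPG literature, assumed here purely for illustration, represents each cardiac cycle as a sum of two Gaussian waves (systolic peak and diastolic wave). A minimal NumPy sketch with illustrative parameters:

```python
import numpy as np

def synthetic_ppg(duration_s=10.0, fs=64, hr_bpm=70,
                  amps=(1.0, 0.45), widths=(0.11, 0.18), delays=(0.13, 0.40)):
    """Toy PPG template: each beat is modelled as two Gaussian waves
    (systolic peak + diastolic wave). All parameters are illustrative,
    not taken from the paper."""
    t = np.arange(0, duration_s, 1.0 / fs)
    period = 60.0 / hr_bpm
    signal = np.zeros_like(t)
    for beat_start in np.arange(0, duration_s, period):
        for a, w, d in zip(amps, widths, delays):
            signal += a * np.exp(-((t - beat_start - d) ** 2) / (2 * w ** 2))
    return t, signal

t, clean = synthetic_ppg()
noisy = clean + 0.2 * np.random.randn(len(clean))  # simulated sensor noise for illustration
```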
Electroencephalogram (EEG)-based Brain-Computer Interfaces (BCIs) have garnered significant interest across various domains, including rehabilitation and robotics. Despite advancements in neural network-based EEG decoding, maintaining performance across diverse user populations remains challenging due to feature distribution drift. This paper presents an effective approach to address this challenge by implementing a lightweight and efficient on-device learning engine for wearable motor imagery recognition. The proposed approach, applied to the well-established EEGNet architecture, enables real-time and accurate adaptation to EEG signals from unregistered users. Leveraging the newly released low-power parallel RISC-V-based processor GAP9 from GreenWaves and the Physionet EEG Motor Imagery dataset, we demonstrate a remarkable accuracy gain of up to 7.31% over the baseline with a memory footprint of 15.6 KB. Furthermore, by optimizing the input stream, we achieve enhanced real-time performance without compromising inference accuracy. Our tailored approach achieves an inference time of 14.9 ms at 0.76 mJ per inference, and 20 µs at 0.83 µJ per single update during online training. These findings highlight the feasibility of our method for edge EEG devices as well as other battery-powered wearable AI systems suffering from subject-dependent feature distribution drift.
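As a simplified illustration of on-device adaptation to a new user, the sketch below freezes a pretrained encoder and updates only the final classification layer on a few calibration trials. The stand-in architecture, dimensions, and hyperparameters are hypothetical; the paper's engine adapts the actual EEGNet model and runs quantized on the GAP9 processor.

```python
import torch
import torch.nn as nn

class TinyEEGNet(nn.Module):
    """Stand-in for an EEGNet-style encoder; the paper uses EEGNet itself."""
    def __init__(self, n_channels=64, n_samples=480, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, (1, 64), padding=(0, 32), bias=False),   # temporal conv
            nn.BatchNorm2d(8),
            nn.Conv2d(8, 16, (n_channels, 1), groups=8, bias=False), # spatial (depthwise) conv
            nn.BatchNorm2d(16), nn.ELU(), nn.AvgPool2d((1, 8)),
            nn.Flatten(),
        )
        with torch.no_grad():
            feat_dim = self.features(torch.zeros(1, 1, n_channels, n_samples)).shape[1]
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x))

def adapt_to_new_user(model, calib_x, calib_y, lr=1e-3, steps=20):
    """Online adaptation sketch: freeze the encoder, update only the classifier."""
    model.features.eval()                       # keep BatchNorm statistics fixed
    for p in model.features.parameters():
        p.requires_grad = False
    opt = torch.optim.SGD(model.classifier.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(calib_x), calib_y)
        loss.backward()
        opt.step()
    return model

# e.g., a handful of labelled calibration trials from the new user (random placeholders)
model = adapt_to_new_user(TinyEEGNet(), torch.randn(16, 1, 64, 480), torch.randint(0, 4, (16,)))
```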
One of the often overlooked features of wearable physiological sensing devices is the possibility for institutions to observe and manage the health and well-being of a group of people. For example, a school could use wearables to monitor for increased levels of stress or sickness within the student body. The accuracy and utility of current sensing devices make this possible. However, in real-world settings, many users do not use devices consistently or correctly. Moreover, institutional monitoring of individuals raises privacy concerns. In this study, we use real-world data to identify periods of cyclical stress in a cohort of 103 Japanese university students while preserving each student's privacy. We observe heightened waking heart rate (HR) and maximum waking HR, alongside corroborating evidence from sleep HR, sleep heart rate variability (HRV), activity patterns, and sleep phases, during periods coinciding with significant academic and societal events, suggesting clear associations with stress. We show the feasibility of detecting collective changes in stress biomarkers within a cohort using consumer wearables.
Due to the scarcity of labeled sensor data in Human Activity Recognition (HAR), prior research has turned to video data to synthesize Inertial Measurement Unit (IMU) data, capitalizing on its rich activity annotations. However, generating IMU data from videos presents challenges for HAR in real-world settings, owing to the poor quality of synthetic IMU data and its limited efficacy in capturing subtle, fine-grained motions. In this paper, we propose Multi³Net, a novel multi-modal, multitask, contrastive-based framework that addresses the issue of limited data. Our pretraining procedure uses videos from online repositories to learn joint representations of text, pose, and IMU data simultaneously. By employing video data and contrastive learning, our method enhances wearable HAR performance, especially in recognizing subtle activities. Our experimental findings validate the effectiveness of our approach in improving HAR performance with IMU data. We demonstrate that models trained with synthetic IMU data generated from videos using our method surpass existing approaches in recognizing fine-grained activities.
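As a hedged sketch of the kind of contrastive objective such a pretraining procedure might use (the paper's exact loss and encoders may differ), here is a symmetric InfoNCE loss aligning IMU and pose embeddings within a batch; text embeddings could be aligned analogously:

```python
import torch
import torch.nn.functional as F

def info_nce(z_imu, z_pose, temperature=0.07):
    """Symmetric CLIP-style contrastive loss: matched (IMU, pose) clips are
    positives, all other clips in the batch serve as negatives."""
    z_imu = F.normalize(z_imu, dim=-1)
    z_pose = F.normalize(z_pose, dim=-1)
    logits = z_imu @ z_pose.t() / temperature               # (B, B) similarity matrix
    targets = torch.arange(z_imu.size(0), device=z_imu.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# usage with placeholder embeddings from modality-specific encoders (hypothetical shapes)
loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
```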
This paper introduces three lip-product-based biosensors as novel form factors for health monitoring that display pH levels through color variation. Leveraging the unique properties of lip products, such as their varied colors, easy application and reapplication on the lips, and interaction with saliva, we aim to provide always-available, non-invasive access to information typically obtained through lab analysis. This paper presents our skin-safe fabrication processes and technical evaluations of a lipstick, a lip tint, and a lip gloss. We created a mobile app with a Convolutional Neural Network (CNN) model to detect pH levels. Our dataset involved six users, eight lighting conditions, three cameras, and seven pH levels. The results showed improved detection of pH variations compared to traditional methods. A user study with 11 participants was conducted to evaluate usability. This approach offers a convenient and unexplored form factor for monitoring biochemical information, blending self-expression with health awareness.
Self-recording one's eating behaviors is a step towards a healthy lifestyle recommended by many health professionals. However, the current practice of manually recording eating activities using paper records or smartphone apps is often unsustainable and inaccurate. Smart glasses have emerged as a promising wearable form factor for tracking eating behaviors, but existing systems primarily identify when eating occurs without capturing details of the eating activities (e.g., what is being eaten). In this paper, we present EchoGuide, an application and system pipeline that leverages low-power active acoustic sensing to guide head-mounted cameras to capture egocentric videos, enabling efficient and detailed analysis of eating activities. By combining active acoustic sensing for eating detection with video captioning models and large language models for retrieval augmentation, EchoGuide intelligently clips and analyzes videos to create concise, relevant records of eating activities. We evaluated EchoGuide with 9 participants in naturalistic settings involving eating activities, demonstrating high-quality summarization and a significant reduction in the video data needed, paving the way for practical, scalable eating activity tracking.
Tactile feedback mechanisms enhance the user experience of modern wearables by stimulating the sense of touch and enabling intuitive interactions. Electro-tactile stimulation-based tactile interfaces stand out due to their compact form factor and ability to deliver localized tactile sensations. Integrating force sensing with electro-tactile stimulation creates more responsive bidirectional systems that are beneficial in applications requiring precise control and feedback. However, current research often relies on separate sensors for force sensing, increasing system complexity and raising challenges in system scalability. We propose a novel approach that utilizes 3D-printed modified surfaces as the electro-tactile electrode interface to sense applied force and deliver feedback simultaneously without additional sensors. This method simplifies the system, maintains flexibility, and leverages the rapid prototyping capabilities of 3D printing. The functionality of this approach is validated through a user study (N=10), and two practical applications are proposed, both incorporating simultaneous sensing and tactile feedback.
As wearable-based data annotation remains, to date, a tedious and time-consuming task requiring researchers to dedicate substantial time, benchmark datasets within the field of Human Activity Recognition (HAR) lack richness and size compared to datasets available in related fields. Recently, vision foundation models such as CLIP have gained significant attention, helping the vision community advance in finding robust, generalizable feature representations. Since the majority of researchers in the wearable community rely on vision modalities to overcome the limited expressiveness of wearable data and to accurately label their to-be-released benchmark datasets offline, we propose a novel clustering-based annotation pipeline that significantly reduces the amount of data a human annotator needs to label. We show that, using our approach, annotating only centroid clips suffices to achieve average labelling accuracies close to 90% across three publicly available HAR benchmark datasets. Using the weakly annotated datasets, we further demonstrate that we can match the accuracy scores of fully-supervised deep learning classifiers across all three benchmark datasets. Code as well as supplementary figures and results are publicly available at github.com/mariusbock/weak_har.
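A minimal sketch of the general idea, assuming clip embeddings (e.g., from a vision foundation model such as CLIP) are already available: cluster the embeddings, have a human annotate only the clip nearest each centroid, and propagate that label to every clip in the cluster. The clustering algorithm and number of clusters below are illustrative, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def weak_labels_from_centroids(embeddings, n_clusters, annotate_fn):
    """Cluster clip embeddings, ask a human to label only the centroid clips,
    and propagate each centroid label to every clip in its cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(embeddings)
    labels = np.empty(len(embeddings), dtype=object)
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        # the clip closest to the centroid is the one shown to the annotator
        centroid_idx = members[np.argmin(
            np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1))]
        labels[members] = annotate_fn(centroid_idx)   # human-provided activity label
    return labels

# usage with random embeddings and a dummy annotator (both hypothetical)
emb = np.random.rand(500, 512)
weak = weak_labels_from_centroids(emb, n_clusters=20, annotate_fn=lambda i: f"label_of_clip_{i}")
```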
Smartwatches have firmly established themselves as a popular wearable form factor. The potential expansion of their interaction space to nearby surfaces offers a promising avenue for enhancing input accuracy and usability beyond the confines of a small screen. However, a key challenge lies in detecting continuous contact states with the surface to inform the start and end of stateful interactions. In this paper, we introduce SoundScroll, which enables rapid and precise determination of the contact state and fingertip speed of a sliding finger. We leverage the vibrations generated by friction between a moving finger and a surface. Our proof-of-concept wristband captures a dual-channel vibration signal for robust sensing, considering both on-skin and in-air components. Our software predicts the finger sliding state within as little as 20 ms with an accuracy of 93.3%. Complementing prior approaches that detect tap events, SoundScroll offers a robust, low-latency, and precise contact and motion sensing technique.
Making a body-worn device wearable is a deceptively difficult challenge: it is not enough to build a functional device and simply put it on the body; the device must also work with the wearer to create a positive body-product relationship. For a body-mounted device to achieve true wearability, proper attention must be paid to a multitude of additional questions, including device size, mass, rigidity, comfort, aesthetics, and other critical design criteria above and beyond raw functionality. This paper reviews papers from the International Symposium on Wearable Computers (ISWC) and the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) to understand the extent to which these professional communities acknowledge and incorporate fundamental questions of wearability into their work. The results show that while wearability is relatively present in the literature (84.44% of ISWC papers, 74.54% of IMWUT papers), many papers fail to address wearability with appropriate scientific rigor, if they address it at all. These results demonstrate the need for the wearables community to recognize the functional importance of wearability and its necessity to the design process.
Transportation mode detection (TMD) for wheelchair users is essential for applications that enhance accessibility and quality of life. Yet the lack of extensive datasets from disabled individuals hinders the development of tailored TMD systems. Our study assesses two data collection methods for TMD in disability research: using non-wheelchair users to simulate wheelchair activities (Simulation Real IMU) and generating synthetic sensor data from videos (Virtual IMU). Results show that, when using a larger dataset and multiple sensor modalities, models trained on Simulation Real IMU perform better. However, models trained on Simulation Real IMU and on Virtual IMU exhibit similar performance when the sensors are restricted to an accelerometer and gyroscope only. This finding guides future researchers toward using Simulation Real IMU for comprehensive, multimodal sensor studies, provided they have sufficient budget and time. However, the more cost- and time-efficient Virtual IMU can be a viable alternative in scenarios using only basic sensors.
This work proposes an incremental learning (IL) framework for wearable-sensor human activity recognition (HAR) that tackles two challenges simultaneously: catastrophic forgetting and non-uniform inputs. The scalable framework, iKAN, pioneers IL with Kolmogorov-Arnold Networks (KAN), replacing multi-layer perceptrons as the classifier and leveraging the local plasticity and global stability of splines. To adapt KAN for HAR, iKAN uses task-specific feature branches and a feature redistribution layer. Unlike existing IL methods that primarily adjust the output dimension or the number of classifier nodes to adapt to new tasks, iKAN focuses on expanding the feature extraction branches to accommodate new inputs from different sensor modalities while keeping the dimension and number of classifier outputs consistent. Continual learning across six public HAR datasets demonstrated the iKAN framework's IL performance, with a last performance of 84.9% (weighted F1 score) and an average incremental performance of 81.34%, significantly outperforming two existing IL methods, Elastic Weight Consolidation (EWC, 51.42%) and experience replay (59.92%).
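The structural idea of expanding feature branches while keeping the classifier fixed can be sketched as follows; note that a plain linear head stands in for the KAN classifier here, and all dimensions are illustrative rather than taken from the paper:

```python
import torch
import torch.nn as nn

class IncrementalHAR(nn.Module):
    """Structural sketch: per-task feature branches feed a fixed-size
    redistribution layer and classifier, so new sensor modalities can be
    added without changing the classifier's input or output dimensions."""
    def __init__(self, feat_dim=64, n_classes=10):
        super().__init__()
        self.branches = nn.ModuleDict()
        self.redistribute = nn.Linear(feat_dim, feat_dim)   # feature redistribution layer
        self.classifier = nn.Linear(feat_dim, n_classes)    # a KAN in the actual framework

    def add_branch(self, task_name, in_dim, feat_dim=64):
        """Add a task-specific feature branch for a new sensor modality."""
        self.branches[task_name] = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())

    def forward(self, x, task_name):
        h = self.branches[task_name](x)
        return self.classifier(self.redistribute(h))

model = IncrementalHAR()
model.add_branch("task0_accelerometer", in_dim=3 * 128)   # hypothetical flattened window size
model.add_branch("task1_gyroscope", in_dim=3 * 128)       # new task; classifier stays unchanged
logits = model(torch.randn(8, 3 * 128), "task1_gyroscope")
```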
We introduce MunchSonic, an AI-powered active acoustic sensing system integrated into eyeglasses to track fine-grained dietary actions. MunchSonic emits inaudible ultrasonic waves from the eyeglass frame; the reflected signals capture the detailed positions and movements of the body parts involved in eating, including the mouth, jaw, arms, and hands. These signals are processed by a deep learning pipeline to classify six actions: hand-to-mouth movements for food intake, chewing, drinking, talking, face-hand touching, and other activities (null class). In an unconstrained study with 12 participants, MunchSonic achieved a 93.5% macro F1-score in a user-independent evaluation at a 2-second resolution in tracking these actions, and also demonstrated its effectiveness in tracking eating episodes and the food intake frequency within those episodes.
Student engagement plays a vital role in academic success, with high engagement often linked to positive educational outcomes. Traditionally, student engagement is measured through self-reports, which are labour-intensive and not available in real time. An emerging alternative is monitoring physiological signals, such as Electrodermal Activity (EDA) and Inter-Beat Interval (IBI), which reflect students' emotional and cognitive states. In this research, we analyzed these signals from 23 students wearing Empatica E4 devices in real-world scenarios. Diverging from previous studies focused on lab settings or specific subjects, we examined physiological synchrony at the intra-student level across various courses. We also assessed how different courses influence physiological responses and identified consistent temporal patterns. Our findings reveal unique physiological response patterns among students, enhancing our understanding of student engagement dynamics. This opens up possibilities for tailoring educational strategies based on unobtrusive sensing data to optimize learning outcomes.
Wearing multiple wearable devices is becoming part of daily life. However, the use of wearable devices may narrow the user's useful field of view (UFOV). Not only the information presented by wearable devices but also the act of wearing the devices themselves may increase cognitive load, which in turn narrows the UFOV. It is important to understand how wearable devices affect the UFOV, because a narrowed UFOV could lead to overlooking surrounding objects. In this paper, we focused on an optical AR head-mounted display (HMD) that partially occupies the visual field and investigated the effects of wearing the device and of presenting information on the UFOV. The experimental results confirmed that wearing the optical AR HMD significantly narrows the UFOV, and that presenting information narrows it significantly further. Additionally, reacting to the presented information significantly narrows the UFOV. Based on these results, we present a guideline for using optical AR HMDs.
Occupation information can be utilized by digital assistants to provide occupation-specific personalized task support, including interruption management, task planning, and recommendations. Prior research in the digital workplace assistant domain requires users to input their occupation information for effective support. However, as many individuals switch between multiple occupations daily, current solutions falter without continuous user input. To address this, this study introduces WorkR, a framework that leverages passive sensing to capture pervasive signals from various task activities, addressing three challenges: the lack of a passive sensing architecture, the personalization of occupation characteristics, and the discovery of latent relationships among occupation variables. We argue that signals from application usage, movements, social interactions, and the environment can inform a user's occupation. WorkR uses a Variational Autoencoder (VAE) to derive latent features for training models to infer occupations. Our experiments with an anonymized, context-rich activity and task log dataset demonstrate that our models can infer occupations with more than 91% accuracy across six ISO occupation categories.
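A minimal VAE sketch of the kind of latent-feature extractor described, assuming a generic fixed-length activity/task-log feature vector; the dimensions, preprocessing, and downstream occupation classifier are hypothetical:

```python
import torch
import torch.nn as nn

class ActivityVAE(nn.Module):
    """Minimal VAE: the latent mean can serve as the feature vector for a
    downstream occupation classifier."""
    def __init__(self, in_dim=128, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        recon = self.dec(z)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, mu, kl

vae = ActivityVAE()
x = torch.randn(32, 128)                     # placeholder task-log feature vectors
recon, latent, kl = vae(x)
loss = nn.functional.mse_loss(recon, x) + kl # reconstruction + KL regularization
```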
We present RetailOpt, a novel opt-in, easy-to-deploy system for offline tracking of customer movements in indoor retail environments. The system uses readily accessible information from customer smartphones and retail apps, including motion data, store maps, and purchase records. This eliminates the need for additional hardware installation and maintenance and gives customers full control over their data. Specifically, RetailOpt first uses inertial navigation to recover relative trajectories from smartphone motion data. The store map and purchase records are then cross-referenced to identify the list of visited shelves, providing anchors that localize the relative trajectories within the store through continuous and discrete optimization. We demonstrate the effectiveness of our system in five diverse environments. The system, if successful, would produce accurate customer movement data, essential for a broad range of retail applications including customer behavior analysis and in-store navigation.
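The paper combines continuous and discrete optimization; as an illustration of the continuous part only, the sketch below fits a 2D similarity transform (least-squares, Umeyama-style) that maps relative-trajectory points onto already-matched shelf anchors. The anchor matching itself, and the example data, are hypothetical:

```python
import numpy as np

def fit_similarity_transform(traj_pts, anchor_pts):
    """Least-squares 2D similarity transform (scale, rotation, translation)
    mapping relative-trajectory points onto matched shelf anchor positions."""
    mu_t, mu_a = traj_pts.mean(0), anchor_pts.mean(0)
    T, A = traj_pts - mu_t, anchor_pts - mu_a
    U, S, Vt = np.linalg.svd(T.T @ A)
    D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])   # guard against reflections
    R = (U @ D @ Vt).T
    s = np.trace(np.diag(S) @ D) / np.sum(T ** 2)
    t = mu_a - s * R @ mu_t
    return s, R, t

# usage: trajectory samples matched to visited-shelf anchors (hypothetical coordinates)
traj = np.array([[0, 0], [1, 0], [2, 1], [3, 1]], dtype=float)
anchors = np.array([[10, 5], [12, 5], [14, 7], [16, 7]], dtype=float)
s, R, t = fit_similarity_transform(traj, anchors)
aligned = (s * (R @ traj.T)).T + t            # trajectory expressed in store coordinates
```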
Neural network models have demonstrated exceptional performance in wearable human activity recognition (HAR) tasks. However, the increasing size and complexity of HAR models significantly hampers their deployment on wearable devices with limited computational power. In this study, we introduce a novel HAR model architecture named Multi-Layer Perceptron-HAR (MLP-HAR), which consists solely of fully connected layers. The model is specifically designed to address the characteristics of HAR tasks, such as multi-modality interaction and global temporal information. MLP-HAR employs fully connected layers that alternately operate along the modality and temporal dimensions, enabling repeated fusion of information across both dimensions. Our proposed model demonstrates performance comparable to other state-of-the-art HAR models on six open-source datasets while using significantly fewer learnable parameters and exhibiting lower model complexity. Specifically, the complexity of our model is at least ten times smaller than that of the TinyHAR model and several hundred times smaller than that of the benchmark model DeepConvLSTM. Additionally, owing to its purely fully connected architecture, MLP-HAR is easy to deploy. To substantiate these claims, we report the inference time of MLP-HAR on the Samsung Galaxy Watch 5 PRO and the Arduino Portenta H7 LITE, comparing it against other state-of-the-art HAR models.
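A mixer-style sketch of the alternating scheme described (a linear layer over the temporal dimension followed by one over the modality/channel dimension); the block structure and sizes are illustrative and not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One sketch block: a linear layer mixes information across time steps,
    then another mixes across sensor modalities/channels."""
    def __init__(self, n_timesteps, n_channels):
        super().__init__()
        self.time_mix = nn.Sequential(nn.Linear(n_timesteps, n_timesteps), nn.ReLU())
        self.channel_mix = nn.Sequential(nn.Linear(n_channels, n_channels), nn.ReLU())

    def forward(self, x):                                            # x: (batch, time, channels)
        x = x + self.time_mix(x.transpose(1, 2)).transpose(1, 2)     # mix along the temporal dim
        x = x + self.channel_mix(x)                                  # mix along the modality dim
        return x

class MLPOnlyHAR(nn.Module):
    def __init__(self, n_timesteps=128, n_channels=9, n_classes=6, n_blocks=2):
        super().__init__()
        self.blocks = nn.Sequential(*[MixerBlock(n_timesteps, n_channels)
                                      for _ in range(n_blocks)])
        self.head = nn.Linear(n_timesteps * n_channels, n_classes)

    def forward(self, x):
        return self.head(self.blocks(x).flatten(1))

logits = MLPOnlyHAR()(torch.randn(4, 128, 9))   # (batch, time window, sensor channels)
```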
The lack of haptic sensations beyond very simple vibration feedback diminishes the feeling of presence in Virtual Reality. Research has suggested various approaches to deliver haptic sensations to the user's palm. However, these approaches are typically limited in the number of actuation directions and focus only on enhancing the system's output, ignoring haptic input. We present Embracer, a wrist-mounted encountered-type haptic controller that addresses these gaps by rendering forces along three axes through a sphere-shaped end effector within the user's palm. Using modified servo motors, we sense user-performed manipulations of the end effector as an input modality. In this paper, we contribute the design and implementation of Embracer together with a preliminary technical evaluation. By providing a more comprehensive haptic feedback system, Embracer enhances the realism and immersion of haptic feedback and user control.
In recent years, research on sensory ear-worn devices ("earables") has explored measuring body temperature at different locations in and around the ears. While the tympanic membrane (eardrum) is a well-established medical gold-standard site, earables face challenges integrating body temperature sensors because of the limited space available, the diversity of earphone designs, and the obstruction of the eardrum by the audio hardware components. To understand the trade-offs in accuracy between different sensor positions in and around the ears, we therefore contribute the first systematic comparison of measurement locations around the ear. Based on related work, existing earable form factors, and a pre-study with thermal ear images from four subjects, we selected five positions to compare against tympanic temperature: the concha, the ear canal, and three positions spread behind the ear. We then developed a custom earable with six optical temperature sensors that measures all positions simultaneously. In a study with 12 participants at room temperature under settled isothermal conditions, we find that, compared to the tympanic membrane, the mean temperature was 0.30 °C lower at the concha (sufficient according to the American Society for Testing and Materials) and at the ear canal, and 0.6 °C lower at the positions behind the ear. Exposing participants to varying environmental conditions and physical movements resulted in unreliable measurements that could not be calibrated.
Smart glasses, such as the Epson Moverio and Tooz DevKit, are often designed so that the display is near the ear and an optical combiner is positioned to reflect the display into the user's eye. When the display is off, this combiner can still be seen as an out-of-focus edge or discoloration in the lens. Therefore, positioning the combiner so that it is unobtrusive when the display is off is essential to the user's comfort. We simulated monocular, right-eye displays with the optical combiner's centermost edge offset by different angles from the primary position of gaze (PPOG) to evaluate participants' perceived comfort. Results suggest that, to improve user comfort, the edge of the optical combiner should be beyond 20.2° towards the nose or 8.7° towards the ear from the PPOG horizontally.
Emerging wearable construction toolkits offer new avenues for hands-on learning through an accessible and creative making process. This paper uses an on-skin wearable prototyping toolkit in hands-on workshops with a total of 45 middle-school students aged between 11 and 15. Besides investigating the effectiveness of utilizing the on-skin toolkit to foster creativity, we iteratively designed and optimized the workshop format, which consists of a hands-on tutorial, a group-making process, and a presentation of project prototypes. Our findings suggest positive engagement and interest in the making process from the middle-school students who participated in the on-skin wearable workshop.