
UbiComp / ISWC 2020: Program At-a-glance



Keynotes on Emotion and Expression
Keynote on Conversational Systems
Keynote on Pandemics


Conference Opening
Awards, Gadget Show, Town Hall
Gather Town


Monday, September 14, 2020 18:00-19:30 EDT
Tuesday, September 15, 2020 10:00-11:30 EDT
Tuesday, September 15, 2020 17:00-18:30 EDT
Wednesday, September 16, 2020 09:00-10:30 EDT
Wednesday, September 16, 2020 18:00-19:30 EDT

Papers by Session

Session 1A:
‘Security, Privacy, and Acceptance’
Monday, September 14, 2020 18:00-19:30 EDT

View Session 1A Presentations

Security and privacy pose a serious barrier to the use of mobile technology by older adults. While support from family and friends is known to be an effective enabler in older adults’ technology adoption, we know very little about the family members’ motivations for providing help, the context, and the process in which they provide it. To bridge this gap, we have conducted a mixed-methods study, qualitatively analyzing the helpers’ assistance stories and quantitatively estimating the factors that affect helpers’ willingness to assist older relatives with mobile security and privacy problems. Our findings point to the potential of helping older relatives: people are more willing to help and guide them than other social groups. Furthermore, we show that familiarity with an older relative’s preferences is essential in providing meaningful support. We discuss our findings in the context of developing a theory of collective efficacy for security and privacy and new collaborative technologies that can reduce the barriers to social help.


With the rising popularity of wearable devices and sensors, shielding Body Area Networks (BANs) from eavesdroppers has become an urgent problem to solve. Since conventional key distribution systems are too onerous for resource-constrained wearable sensors, researchers are pursuing a new lightweight key generation approach that enables two wearable devices attached at different locations on the user’s body to generate an identical key simultaneously, simply from their independent observations of user gait. A key challenge for such gait-based key generation lies in matching the bits of the keys generated by independent devices despite noisy sensor measurements, especially when the devices are located far apart on the body and are affected by different sources of noise. To address this challenge, we propose a novel machine learning framework, called Auto-Key, that uses an autoencoder to help one device predict the gait observations at another, distant device attached to the same body and generate the key using the predicted sensor data. We prototype the proposed method and evaluate it using a public acceleration dataset collected from 15 real subjects wearing accelerometers attached to seven different locations on the body. Our results show that, on average, Auto-Key increases the matching rate of independently generated bits from two sensors attached at two different locations by 16.5%, which speeds up the successful generation of fully-matching symmetric keys at independent wearable sensors by a factor of 1.9. In the proposed framework, a subject-specific model can be trained with 50% less data and 88% less time by retraining a pre-trained general model, compared to training a new model from scratch. The reduced training complexity makes Auto-Key more practical for edge computing, which provides better privacy protection for biometric and behavioral data compared to cloud-based training.
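To illustrate the kind of gait-based bit agreement that Auto-Key targets, here is a minimal, hypothetical sketch: two devices quantize their own noisy view of the same gait signal into bits and we measure how often the bits agree. The threshold quantizer, signal model, and all parameters are our assumptions for illustration, not the paper’s actual pipeline, and the autoencoder prediction step is omitted.

```python
import random

def gait_to_bits(signal, threshold=None):
    """Quantize a gait signal into key bits: 1 if a sample exceeds
    the signal's own mean, else 0 (a common lightweight scheme)."""
    if threshold is None:
        threshold = sum(signal) / len(signal)
    return [1 if s > threshold else 0 for s in signal]

def bit_match_rate(bits_a, bits_b):
    """Fraction of positions where two independently generated bit
    strings agree; both devices need this near 1.0 to derive an
    identical symmetric key without exchanging the bits."""
    matches = sum(a == b for a, b in zip(bits_a, bits_b))
    return matches / len(bits_a)

random.seed(0)
# Hypothetical gait acceleration magnitude observed at the wrist...
wrist = [random.gauss(1.0, 0.3) for _ in range(128)]
# ...and a noisier view of the same gait observed at the ankle.
ankle = [s + random.gauss(0.0, 0.1) for s in wrist]

rate = bit_match_rate(gait_to_bits(wrist), gait_to_bits(ankle))
print(f"bit match rate: {rate:.2f}")
```

In this toy setting the two bit strings agree at most positions but not all, which is exactly the mismatch that Auto-Key’s cross-device prediction aims to reduce.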


With increasing social acceptance and openness, more and more sexual-minority men (SMM) have succeeded in creating and sustaining steady relationships in recent years. Maintaining a steady relationship benefits the wellbeing of SMM both mentally and physically. However, maintaining such relationships is also challenging for SMM, who receive far less support than heterosexual couples, so it is important to identify SMM in steady relationships and provide corresponding personalized assistance. Furthermore, knowing SMM’s relationship status and its correlations with other visible features is also beneficial for optimizing social applications’ functionality in terms of privacy preservation and friend recommendation. With the prevalence of SMM-oriented social apps (SMMSA for short), this paper investigates the relationship status of SMM from a new perspective, namely through the online digital footprints SMM leave on SMMSA (e.g., presented profiles, social interactions, expressions, sentiment, and mobility trajectories). Specifically, using a filtered dataset containing 2,359 active SMMSA users with their self-reported relationship status and publicly available app usage data, we explore the correlations between SMM’s relationship status and their online digital footprints on SMMSA and present a set of interesting findings. Moreover, we demonstrate that such correlations can be used to construct machine-learning-based models for relationship status inference. Finally, we elaborate on the implications of our findings for better understanding the SMM community and improving their social welfare.


Recently, the ubiquity of mobile devices has led to increasing demand for public network services, e.g., WiFi hotspots.
As a part of this trend, modern transportation systems are equipped with public WiFi devices to provide Internet access for passengers as people spend a large amount of time on public transportation in their daily life.
However, one of the key issues in public WiFi spots is the privacy concern due to its open access nature.
Existing works either studied location privacy risk in human traces or privacy leakage in private networks such as cellular networks based on the data from cellular carriers.
To the best of our knowledge, none of these works has focused on bus WiFi privacy based on large-scale real-world data.
In this paper, to explore the privacy risk in bus WiFi systems, we focus on two key questions: how likely bus WiFi users are to be uniquely re-identified if partial usage information is leaked, and how we can protect users from such leakage.
To understand the above questions, we conduct a case study in a large-scale bus WiFi system, which contains 20 million connection records and 78 million location records from 770 thousand bus WiFi users during a two-month period.
Technically, we design two models for our uniqueness analyses and protection: a PB-FIND model to quantify the probability that a user can be uniquely re-identified from leaked information, and a PB-HIDE model to protect users from potentially leaked information.
Specifically, we systematically measure the user uniqueness on users’ finger traces (i.e., connection URL and domain), foot traces (i.e., locations), and hybrid traces (i.e., both finger and foot traces).
Our measurement results reveal that (i) 97.8% of users can be uniquely re-identified by 4 random domain records of their finger traces, and 96.2% of users can be uniquely re-identified by 5 random locations on buses; (ii) 98.1% of users can be uniquely re-identified by only 2 random records if both their connection records and locations are leaked to attackers. Moreover, the evaluation results show that our PB-HIDE algorithm protects more than 95% of users from the potentially leaked information by inserting only 1.5% synthetic records into the original dataset, while preserving data utility.
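To illustrate the idea behind a uniqueness analysis of this kind, the toy sketch below estimates how often a handful of randomly sampled records single a user out among all traces. The dataset, sampling scheme, and function name are hypothetical illustrations in the spirit of PB-FIND, not the paper’s actual model.

```python
import random

def uniqueness(traces, k, trials=500, seed=1):
    """Estimate the probability that k randomly chosen records from a
    user's trace match no other user's trace, i.e., the user is
    uniquely re-identifiable from those k leaked records."""
    rng = random.Random(seed)
    users = list(traces)
    unique_hits = 0
    for _ in range(trials):
        u = rng.choice(users)
        sample = set(rng.sample(sorted(traces[u]), k))
        # The user is unique if no other trace contains all k records.
        if not any(sample <= traces[v] for v in users if v != u):
            unique_hits += 1
    return unique_hits / trials

# Toy "foot traces": each user's set of visited bus-stop IDs.
traces = {
    "u1": {1, 2, 3, 4, 9},
    "u2": {1, 2, 5, 6, 7},
    "u3": {2, 3, 5, 8, 9},
}
p = uniqueness(traces, k=2)
print(f"P(unique | 2 leaked records) = {p:.2f}")
```

Even in this three-user toy world, a majority of two-record samples already pin down a single user; with realistic traces over many stops and domains, uniqueness rises quickly, matching the intuition behind the measurement results above.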


The rapid adoption of smartphone devices has caused increasing security and privacy risks and breaches. The challenge of catching up with ever-evolving smartphone technology leads older adults (aged 50+) to reduce or abandon their use of mobile technology. To tackle this problem, we present AppMoD, a community-based approach that allows delegation of security and privacy decisions to a trusted social connection, such as a family member or a close friend. The trusted social connection can assist in the appropriate decision or make it on behalf of the user. We implement the approach as an Android app and describe the results of three user studies (n=50 altogether), in which pairs of older adults and family members used the app in a controlled experiment. Using app anomalies as an ongoing case study, we show how delegation improves the accuracy of decisions made by older adults. We also show how combining decision delegation with crowdsourcing can enhance the advice given and improve the decision-making process. Our results suggest that a community-based approach can improve the state of mobile security and privacy.


The remarkable success of machine learning has fostered a growing number of cloud-based intelligent services for mobile users. Such a service requires a user to send data, e.g., images, voice, and video, to the provider, which presents a serious challenge to user privacy. To address this, prior works either obfuscate the data, e.g., by adding noise and removing identity information, or send representations extracted from the data, e.g., anonymized features. They struggle to balance service utility and data privacy because obfuscated data reduces utility, while extracted representations may still reveal sensitive information.

This work departs from prior works in methodology: we leverage adversarial learning to strike a better balance between privacy and utility. We design a representation encoder that generates feature representations optimized against the privacy disclosure risk of sensitive information (a measure of privacy), as judged by privacy adversaries, and concurrently optimized for task inference accuracy (a measure of utility), as judged by a utility discriminator. The result is the privacy adversarial network (PAN), a novel deep model with a new training algorithm that can automatically learn representations from raw data. The trained encoder can be deployed on the user side to generate representations that satisfy the task-defined utility requirements and user-specified/agnostic privacy budgets.

Intuitively, PAN adversarially forces the extracted representations to only convey the information required by the target task. Surprisingly, this constitutes an implicit regularization that actually improves task accuracy. As a result, PAN achieves better utility and better privacy at the same time! We report extensive experiments on six popular datasets and demonstrate the superiority of PAN compared with alternative methods reported in prior work.


CAPTCHAs are used to distinguish between human- and computer-generated (i.e., bot) online traffic. As there is an ever-increasing amount of online traffic from mobile devices, it is necessary to design CAPTCHAs that work well on mobile devices. In this paper, we present SenCAPTCHA, a mobile-first CAPTCHA that leverages the device’s orientation sensors. SenCAPTCHA works by showing users an image of an animal and asking them to tilt their device to guide a red ball into the center of that animal’s eye. SenCAPTCHA is especially useful for devices with small screen sizes (e.g., smartphones, smartwatches). In this paper, we describe the design of SenCAPTCHA and demonstrate that it is resilient to various machine learning based attacks. We also report on two usability studies of SenCAPTCHA involving a total of 472 participants; our results show that SenCAPTCHA is viewed as an “enjoyable” CAPTCHA and that it is preferred by over half of the participants to other existing CAPTCHA systems.
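The tilt-to-target interaction behind a CAPTCHA of this kind can be sketched as a tiny simulation: device orientation drives a ball toward a target region, and the challenge is solved once the ball rests inside it. The physics model, gain, and all values below are our assumptions for illustration, not SenCAPTCHA’s actual implementation.

```python
import math

def step(ball, tilt, dt=0.05, gain=50.0):
    """Move the ball according to the device's (pitch, roll) tilt in
    radians, as in a tilt-controlled challenge."""
    x, y = ball
    pitch, roll = tilt
    return (x + gain * math.sin(roll) * dt,
            y + gain * math.sin(pitch) * dt)

def solved(ball, target, radius=5.0):
    """The challenge passes once the ball is inside the target circle
    (e.g., the animal's eye)."""
    return math.dist(ball, target) <= radius

ball, target = (0.0, 0.0), (30.0, 40.0)
# Simulate a user tilting steadily toward the target.
for _ in range(200):
    if solved(ball, target):
        break
    dx, dy = target[0] - ball[0], target[1] - ball[1]
    norm = math.hypot(dx, dy)
    tilt = (0.2 * dy / norm, 0.2 * dx / norm)  # (pitch, roll)
    ball = step(ball, tilt)
print("solved:", solved(ball, target))
```

A bot without access to genuine orientation-sensor dynamics would have to synthesize a plausible tilt trajectory, which is part of what makes sensor-based challenges resilient to automated attacks.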


Recent years have witnessed a surge of biometric-based user authentication for mobile devices due to its promising security and convenience. As a natural and ubiquitous behavior, human speech has been exploited for user authentication. Existing voice-based user authentication exploits unique characteristics of either the voiceprint or mouth movements, both of which are vulnerable to replay attacks and mimicry attacks. During speech, the vocal tract, in both its static shape and dynamic movements, also exhibits individual uniqueness that is hard for adversaries to eavesdrop on or imitate. Hence, our work aims to employ the individual uniqueness of the vocal tract to realize user authentication on mobile devices. Moreover, most voice-based user authentication is passphrase-dependent, which significantly degrades the user experience. Such user authentication therefore needs to work in a passphrase-independent manner while being able to resist various attacks. In this paper, we propose VocalLock, a user authentication system that senses the whole vocal tract during speech to identify different individuals in a passphrase-independent manner on smartphones, leveraging acoustic signals. VocalLock first applies FMCW to acoustic signals to characterize both the static shape and dynamic movements of the vocal tract during speech, and then constructs a passphrase-independent user authentication model based on the unique characteristics of the vocal tract through GMM-UBM. VocalLock can resist various spoofing attacks while achieving a satisfactory user experience. Extensive experiments in real environments demonstrate that VocalLock can accurately authenticate users in a passphrase-independent manner and successfully resist various attacks.
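The FMCW ranging principle that underlies acoustic sensing of this kind is compact enough to show directly: a chirp of bandwidth B swept over duration T produces, for a reflector at range R, a beat frequency f_b = 2·R·B/(c·T), so range follows from the measured beat frequency. The chirp parameters below are assumptions chosen to resemble smartphone acoustic sensing, not VocalLock’s actual values.

```python
# Speed of sound in air (m/s); chirp parameters are illustrative.
C = 343.0
B = 4000.0   # swept bandwidth in Hz (e.g., a 16-20 kHz chirp)
T = 0.04     # chirp duration in seconds

def fmcw_range(beat_freq_hz):
    """Distance to a reflector from the FMCW beat frequency:
    R = c * f_b * T / (2 * B)."""
    return C * beat_freq_hz * T / (2 * B)

def beat_freq(distance_m):
    """Inverse relation: beat frequency for a reflector at distance_m."""
    return 2 * B * distance_m / (C * T)

# A reflector ~10 cm away (roughly mouth-to-phone distance):
f_b = beat_freq(0.10)
print(f"beat frequency: {f_b:.1f} Hz -> range: {fmcw_range(f_b)*100:.1f} cm")
```

Tracking how such beat frequencies shift over time is what lets an FMCW system characterize not just a static reflector position but the dynamic movements of articulators in the vocal tract.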


Advertising is an unavoidable, albeit frustrating, part of mobile interactions. Due to the limited form factor, mobile advertisements often resort to intrusive strategies that temporarily block the user’s view in an attempt to increase effectiveness by forcing the user’s attention. While such strategies contribute to advertising awareness and effectiveness, they do so at the cost of degrading the user’s overall experience and can lead to frustration and annoyance. In this paper, we contribute by developing Perceptive Ads, an intelligent advertisement placement strategy that minimizes disruptions caused by ads while preserving their effectiveness. Our work is the first to simultaneously consider the needs of users, app developers, and advertisers. Ensuring the needs of all stakeholders are taken into account is essential for the adoption of advertising strategies, as users (and indirectly developers) would reject strategies that are disruptive but effective, while advertisers would reject strategies that are non-disruptive but ineffective. We demonstrate the effectiveness of our technique through a user study with N = 16 participants and two representative examples of mobile apps that commonly integrate advertisements: a game and a news app. Results from the study demonstrate that our approach improves perception of advertisements by 43.75% without affecting application interactivity, while at the same time increasing advertisement effectiveness by 37.5% compared to a state-of-the-art baseline.


Deep neural networks (DNNs) continue to demonstrate superior generalization performance in an increasing range of applications, including speech recognition and image understanding. Recent innovations in compression algorithms, design of efficient architectures and hardware accelerators have prompted a rapid growth in deploying DNNs on mobile and IoT devices to redefine user experiences. Relying on the superior inference quality of DNNs, various voice-enabled devices have started to pervade our everyday lives and are increasingly used for, e.g., opening and closing doors, starting or stopping washing machines, ordering products online, and authenticating monetary transactions. As the popularity of these voice-enabled services increases, so does their risk of being attacked. Recently, DNNs have been shown to be extremely brittle under adversarial attacks and people with malicious intentions can potentially exploit this vulnerability to compromise DNN-based voice-enabled systems. Although some existing work already highlights the vulnerability of audio models, very little is known of the behaviour of compressed on-device audio models under adversarial attacks. This paper bridges this gap by investigating thoroughly the vulnerabilities of compressed audio DNNs and makes a stride towards making compressed models robust. In particular, we propose a stochastic compression technique that generates compressed models with greater robustness to adversarial attacks. We present an extensive set of evaluations on adversarial vulnerability and robustness of DNNs in two diverse audio recognition tasks, while considering two popular attack algorithms: FGSM and PGD. We found that error rates of conventionally trained audio DNNs under attack can be as high as 100%. Under both white- and black-box attacks, our proposed approach is found to decrease the error rate of DNNs under attack by a large margin.


Session 1B:
‘Displays, Tactile, and New Interaction’
Monday, September 14, 2020 18:00-19:30 EDT

View Session 1B Presentations

Public-speaking situations such as classroom lectures, seminars, and meetings, where speakers must actively engage the audience, require considerable effort from the speaker in gathering verbal and non-verbal feedback from the audience. Garnering feedback can be made easier by technologies such as augmented reality (AR) capable of displaying information in the 3D space surrounding the speaker. We present an AR-enabled presentation display system to provide real-time feedback from the audience to the speaker during a presentation. The feedback includes names and affective states of the audience, icons requesting a change in volume and rate of speech, and annotated questions and comments. The design of the feedback system was informed by findings from an exploratory study with academic professionals experienced in delivering presentations. In a between-subjects study, we evaluated presentations displaying feedback information spatially overlaid above the heads of the audience members in one condition and in the periphery of the presenter’s view in another condition, as compared with a no-AR control condition. Results showed that presenters in the overlay condition called upon the audience by name significantly more often than in the peripheral and control conditions, rated the overlay condition as more reliable and helpful than the peripheral condition, and expressed a desire to use it for future presentations. Overall, AR feedback was considered useful by both the presenters and the audience and did not negatively impact the presenter’s speaker confidence and state anxiety.


Tactile pavings are public works for visually impaired people, designed to indicate a particular path to follow by providing haptic cues underfoot. However, they face many limitations such as installation errors, obstructions, degradation, and limited coverage. To address these issues, we propose Virtual Paving, which aims to assist independent navigation by rendering a smooth path to visually impaired people through multi-modal feedback. This work assumes that a path has been planned to avoid obstacles and focuses on the feedback design to guide users along the path safely, smoothly, and efficiently. Firstly, we extracted the design guidelines of Virtual Paving based on an investigation into visually impaired people’s current practices and issues with tactile pavings. Next, we developed a multi-modal solution through co-design and evaluation with visually impaired users. This solution included (1) vibrotactile feedback on the shoulders and waist to give readily-perceivable directional cues and (2) audio feedback to describe road conditions ahead of the user. Finally, we evaluated the proposed solution through user tests. Guided by the designed feedback, 16 visually impaired participants successfully completed 127 out of 128 trials with 2.1m-wide basic paths, including straight and curved paths. Subjective feedback indicated that our solution to render Virtual Paving was easy for users to learn, and it also enabled them to walk smoothly. The feasibility and potential limitations for Virtual Paving to support independent navigation in real environments are discussed.


A parent’s capacity to understand the mental states of both themselves and their child is considered to play a significant role in various aspects of the parent-child relationship, e.g., lowering parental stress and supporting the child’s cognitive development. We propose Dyadic Mirror, a wearable smart mirror designed to foster this parental capacity in everyday parent-child interaction. Its key feature is to provide a parent with a second-person live view from the child, i.e., the parent’s own face as seen by the child, during their face-to-face interaction. Dyadic Mirror serves as a straightforward cue that helps the parent be aware of (1) his/her emotional state, and (2) how he/she is currently being seen by the child, thereby facilitating the parent’s inference of the child’s mental state. To evaluate Dyadic Mirror under unconstrained parent-child interactions in real life, we implemented a working prototype and deployed it to 6 families over 4 weeks. The participating parents reported rich experiences with Dyadic Mirror, indicating that it helped them become aware of their recurring but unconscious behaviors, understand their children’s feelings, reason about their children’s behaviors, and find self-driven momentum to improve their attitudes and expressions towards their children.


Digital displays are a ubiquitous feature of public spaces; London recently deployed a whole network of new displays in its Underground stations, and the screens on One Times Square (New York) allow for the presentation of over 16,000 square feet of digital media. However, despite decades of research into pervasive displays, the problem of scheduling content is under-served, and there is little forward momentum in addressing the challenges brought by large-scale and open display networks. This paper presents the first comprehensive architectural model for scheduling in current and anticipated pervasive display systems. In contrast to prior work, our three-stage model separates the process of high-level goal setting from content filtering and selection. Our architecture is motivated by an extensive review of the literature and a detailed consideration of requirements. The architecture is realised in an implementation designed to serve the world’s largest and longest-running research testbed of pervasive displays. A mixed-methods evaluation confirms the viability of the architecture from three angles: demonstrated capability to meet the articulated requirements, performance that comfortably fits within the demands of typical display deployments, and evidence of its ability to serve as the day-to-day scheduling platform for the aforementioned research testbed. Based on our evaluation and a reflection on the paper as a whole, we identify ten implications that will shape future research and development in pervasive display scheduling.
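The separation of goal setting, content filtering, and selection described above can be sketched as a small pipeline. Everything here (the item fields, the policy, the priority rule) is a hypothetical illustration of a three-stage separation of concerns, not the paper’s actual architecture.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    duration_s: int
    audience: str
    priority: int

# Stage 1: high-level goal setting (the display owner's policy).
goals = {"audience": "students", "max_duration_s": 30}

# Stage 2: content filtering against the goals.
def filter_items(items, goals):
    return [i for i in items
            if i.audience == goals["audience"]
            and i.duration_s <= goals["max_duration_s"]]

# Stage 3: selection of what to show next (here: highest priority wins).
def select(items):
    return max(items, key=lambda i: i.priority)

catalogue = [
    Item("exam-timetable", 20, "students", 3),
    Item("staff-meeting", 15, "staff", 5),
    Item("campus-tour-video", 120, "students", 4),
]
shown = select(filter_items(catalogue, goals))
print("now showing:", shown.name)
```

Keeping the stages separate means a display owner can change policy (stage 1) without touching the filtering or selection logic, which is the kind of flexibility an open display network requires.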


One of the main measures to evaluate a head-mounted display (HMD) based experience is the state of feeling present in virtual reality (VR).
The detection of disturbances of such an experience that occur over time, namely breaks in presence (BIPs), enables the evaluation and improvement of these experiences.
Existing methods either do not detect BIPs (e.g., questionnaires) or are complex in their application and evaluation (e.g., physiological and behavioral measures).
We propose a handy post-experience method in which users reflect on their experienced state of presence by drawing a line in a paper-based drawing template.
The amplitude of the drawn line represents the state of presence over the temporal course of the experience.
We propose a descriptive model that describes temporal variations in the drawings by the definition of relevant points over time, e.g., putting on the HMD, phases of the experience, transition into VR, and parameters, e.g., the transition time.
The descriptive model enables us to objectively evaluate user drawings and represent the course of the drawings by a defined set of parameters.
Our exploratory user study (N=30) showed that the drawings are very consistent between participants and that the method reliably detects a variety of BIPs.
Moreover, the results indicate that the method might be used in the future to evaluate the strength of BIPs and to reflect the temporal course of a presence experience in detail.
Additional application examples and a detailed discussion pave the way for others to use our method.
Further, they serve as a motivation to continue working on the method and the general understanding of temporal fluctuations of the presence experience.


We propose HeadCross, a head-based interaction method to select targets on VR and AR head-mounted displays (HMDs). Using HeadCross, users control the pointer with head movements; to select a target, users move the pointer into the target and then back across the target boundary. In this way, users can select targets without using their hands, which is helpful when users’ hands are occupied by other tasks, e.g., while holding handrails. However, a major challenge for head-based methods is the false-positive problem: unintentional head movements may be incorrectly recognized as HeadCross gestures and trigger selections. To address this issue, we first conduct a user study (Study 1) to observe user behavior while performing HeadCross and identify the behavioral differences between HeadCross and other types of head movements. Based on the results, we discuss design implications, extract useful features, and develop the recognition algorithm for HeadCross. To evaluate HeadCross, we conduct two further user studies. In Study 2, we compared HeadCross to the dwell-based selection method, the button-press method, and the mid-air gesture-based method. Two typical target selection tasks (text entry and menu selection) are tested on both VR and AR interfaces. Results showed that, compared to the dwell-based method, HeadCross improved the sense of control; and compared to the two hand-based methods, HeadCross improved interaction efficiency and reduced fatigue. In Study 3, we compared HeadCross to three alternative designs of head-only selection methods. Results show that HeadCross was perceived to be significantly faster than the alternatives. We conclude with a discussion of the interaction potential and limitations of HeadCross.


The desire to stay connected to one another over large distances has guided decades of telepresence research. Most of this research focuses on stationary solutions that can deliver high-fidelity telepresence experiences, but these are usually impractical for the wider population, who cannot afford the necessary proprietary equipment or are unwilling to regress to non-mobile communication. In this paper we present Mobileportation, a nomadic telepresence prototype that takes advantage of recent developments in mobile technology to provide immersive experiences wherever the user desires, by allowing seamless transitions between ego- and exocentric views within a mutually shared three-dimensional environment. We also discuss the results of a user study that show Mobileportation’s ability to induce a sense of presence within this environment and with the remote communication partner, as well as the potential of this platform for future telepresence research.


On-skin displays have emerged as a seamless form factor for visualizing information. However, the non-traditional form factor of these on-skin displays and how they present notifications on the skin may raise concerns for public wear. These perceptions will impact whether a device is eventually adopted or rejected by society. Therefore, researchers must consider the societal facets of device design. In this paper, we study social perceptions towards interacting with a color-changing on-skin display. We examined third-person perspectives through a 254-person online survey. The study was conducted in the United States and Taiwan to distill cross-cultural attitudes. This structured study sheds light on designing on-skin displays reflective of cultural considerations.


The COVID-19 pandemic made wearing face masks during public interactions the new norm across much of the globe. As masks naturally occlude part of the wearer’s face, the part of communication that occurs through facial expressions is lost, which could reduce acceptance of mask wear. To address this issue, we created 2 face mask prototypes incorporating simple expressive display elements and evaluated them in a user study. Aiming to explore the potential of low-cost solutions suitable for large-scale deployment, our concepts utilized bi-state electrochromic displays. One concept, the Mouthy Mask, aimed to reproduce the image of the wearer’s mouth, whilst the Smiley Mask was symbolic in nature. The smart face masks were considered useful in public contexts to support short, socially expected rituals. Generally, a visualization directly representing the wearer’s mouth was preferred to an emoji-style visualization. As a contribution, our work presents a stepping stone towards productizable solutions for smart face masks that could increase the acceptability of face mask wear in public.


People who are deaf and hard of hearing often have difficulty realizing when someone is attempting to get their attention, especially when mobile. Speech recognition coupled with a head-worn display (HWD) may aid in awareness of when someone calls the user’s name. As our intended users are often oversubscribed with experiments, we chose to test non-deaf and hard of hearing subjects while refining our procedures. Preliminary findings from three hearing participants wearing sound masking headphones and performing a mobile task suggest that a HWD display may be faster than, and preferred to, a smartphone for displaying captions for attending to one’s name being called.


Session 1C: ‘Sensing I (Behaviour and Emotions)’
Monday, September 14, 2020 18:00-19:30 EDT

View Session 1C Presentations

There has been an increasing interest in the problem of inferring emotional states of individuals using sensor and user-generated information as diverse as GPS traces, social media data and smartphone interaction patterns. One aspect that has received little attention is the use of visual context information extracted from the surroundings of individuals and how they relate to it. In this paper, we present an observational study of the relationships between the emotional states of individuals and objects present in their visual environment automatically extracted from smartphone images using deep learning techniques.

We developed MyMood, a smartphone application that allows users to periodically log their emotional state together with pictures from their everyday lives, while passively gathering sensor measurements. We conducted an in-the-wild study with 22 participants and collected 3,305 mood reports with photos. Our findings show context-dependent associations between objects surrounding individuals and self-reported emotional state intensities. The applications of this work are potentially many, from the design of interior and outdoor spaces to the development of intelligent applications for positive behavioral intervention, and more generally for supporting computational psychology studies.


Brain circuit functioning and connectivity between specific regions allow us to learn, remember, recognize, and think as humans. In this paper, we ask whether mobile sensing from phones can predict brain functional connectivity. We study the brain’s resting-state functional connectivity (RSFC) between the ventromedial prefrontal cortex (vmPFC) and the amygdala, which neuroscientists have shown to be associated with mental illnesses such as anxiety and depression. We discuss initial results and insights from the NeuroCence study, an exploratory study of 105 first-year college students using neuroimaging and mobile sensing across one semester. We observe correlations between several behavioral features from students’ mobile phones and connectivity between the vmPFC and amygdala, including conversation duration (r=0.365, p<0.001), sleep onset time (r=0.299, p<0.001), and the number of phone unlocks (r=0.253, p=0.029). We use a support vector classifier and 10-fold cross-validation and show that we can classify whether students have higher (i.e., stronger) or lower (i.e., weaker) vmPFC-amygdala RSFC purely based on mobile sensing data, with an F1 score of 0.793. To the best of our knowledge, this is the first paper to report that resting-state brain functional connectivity can be predicted using passive sensing data from mobile phones.


The study of student engagement has attracted growing interest as a way to address problems such as low academic performance, disaffection, and high dropout rates. Existing approaches to measuring student engagement typically rely on survey-based instruments. While effective, those approaches are time-consuming and labour-intensive, and both the response rate and the quality of responses are usually poor. As an alternative, in this paper, we investigate whether we can infer and predict engagement along multiple dimensions using sensors alone. We hypothesize that multidimensional student engagement is reflected in physiological responses and activity changes during class, and is also affected by environmental changes. We therefore explore the following questions: Can we measure the multiple dimensions of high school students’ learning engagement, including emotional, behavioural and cognitive engagement, with sensing data in the wild? Can we derive the activity, physiological, and environmental factors contributing to the different dimensions of student learning engagement? If yes, which sensors are the most useful in differentiating each dimension of learning engagement? To answer these questions, we conduct an in-situ study in a high school with 23 students and 6 teachers in 144 classes over 11 courses for 4 weeks. We present n-Gage, a student engagement sensing system that uses a combination of wearable and environmental sensors to automatically detect students’ in-class multidimensional learning engagement. Extensive experimental results show that n-Gage can accurately predict multidimensional student engagement in real-world scenarios with an average mean absolute error (MAE) of 0.788 and root mean square error (RMSE) of 0.975 using all the sensors. We also present a set of interesting findings on how different factors (e.g., combinations of sensors, school subjects, CO2 level) affect each dimension of student learning engagement.
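The MAE and RMSE figures reported above follow the standard definitions; a generic sketch on toy numbers (not the n-Gage pipeline or its data):

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of prediction errors.
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    # Root mean square error: penalizes large errors more heavily.
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Toy engagement scores on an arbitrary scale (illustrative only).
true = [3.0, 4.5, 2.0, 5.0]
pred = [2.5, 4.0, 2.5, 4.0]
print(mae(true, pred))   # 0.625
print(rmse(true, pred))  # ~0.661
```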


This work presents HeartQuake, a low cost, accurate, non-intrusive, geophone-based sensing system for extracting accurate electrocardiogram (ECG) patterns using heartbeat vibrations that penetrate through a bed mattress. In HeartQuake, cardiac activity-originated vibration patterns are captured on a geophone and sent to a server, where the data is filtered to remove the sensor’s internal noise and passed on to a bidirectional long short term memory (Bi-LSTM) deep learning model for ECG waveform estimation. To the best of our knowledge, this is the first solution that can non-intrusively provide accurate ECG waveform characteristics instead of more basic abstract features such as the heart rate. Our extensive experimental results with a baseline dataset collected from 21 study participants and a longitudinal dataset from 15 study participants suggest that HeartQuake, even when using a general non-personalized model, can detect all five ECG peaks (i.e., P, Q, R, S, T) with an average error as low as 13 msec when participants are stationary on the bed. Furthermore, clinically used ECG metrics such as the RR interval and QRS segment width can be estimated with errors as low as 3 msec and 10 msec, respectively. When additional noise factors are present (e.g., external vibration and various sleeping habits), the estimation error increases, but can be mitigated by using a personalized model. Finally, a qualitative study with 11 physicians on the clinically perceived quality of HeartQuake-generated ECG signals suggests that HeartQuake can effectively serve as a screening tool for detecting and diagnosing abnormal cardiovascular conditions. In addition, HeartQuake’s low-cost and non-intrusive nature allows it to be deployed at larger scales than current ECG monitoring solutions.
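As an illustration of one of the clinical metrics mentioned above, RR intervals (and the heart rate they imply) follow directly from detected R-peak timestamps; the peak times below are synthetic, not HeartQuake output:

```python
# Detected R-peak timestamps in seconds (synthetic example).
r_peaks = [0.0, 0.82, 1.63, 2.45, 3.26]

# RR intervals are the gaps between consecutive R peaks.
rr = [b - a for a, b in zip(r_peaks, r_peaks[1:])]
mean_rr = sum(rr) / len(rr)     # mean RR interval in seconds
bpm = 60.0 / mean_rr            # corresponding heart rate

print(round(mean_rr, 3), round(bpm, 1))
```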


Recent wearable devices enable continuous and unobtrusive monitoring of a person’s physiological parameters, such as electrodermal activity and heart rate, over long periods of time in everyday life settings. Continuous monitoring of these parameters enables systems that predict affective states and stress, with the goal of providing feedback to improve them. Deploying such systems in everyday life settings is still complex and error-prone because the collected data are often degraded by artifacts. In this paper we present an automatic approach to detect artifacts in electrodermal activity (EDA) signals collected in-the-wild over long periods of time. To this end, we first perform a systematic literature review and compile a set of guidelines for human annotators to label artifacts manually, and we use these labels as ground truth to test our automatic approach. To evaluate our approach, we collect physiological data from 13 participants in-the-wild, and two human annotators label 107.56 hours of this data set. We make the data set publicly available to other researchers upon request. Our model achieves a recall of 98% for clean and shape-artifact classification on data collected in-the-wild using leave-one-subject-out cross-validation, which is 42 percentage points higher than the baseline. We show that state-of-the-art approaches do not generalize well when tested on completely in-the-wild data and identify only 17% of the artifacts present in our data set, even after manual adaptation. We further test the robustness of our approach over time using leave-one-day-out cross-validation and achieve very similar performance. We then introduce a new metric to evaluate the quality of EDA segments that considers the impact not only of artifacts in the shape of the EDA signal but also of artifacts generated by environmental temperature changes or the user’s high-intensity movement. Our results imply that we can eliminate the need for human annotators or significantly reduce the time they need to label data. Moreover, our approach can be used in an online manner to automatically detect artifacts in EDA signals.
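The leave-one-subject-out evaluation described above can be sketched with scikit-learn's `LeaveOneGroupOut`; the features, labels, and the Random Forest classifier below are illustrative stand-ins, not the authors' model or data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(1)
# Synthetic EDA feature windows: 13 subjects, 20 windows each.
X = rng.normal(size=(260, 4))
y = rng.integers(0, 2, size=260)        # clean (0) vs. artifact (1), synthetic
groups = np.repeat(np.arange(13), 20)   # subject id for each window

recalls = []
for train, test in LeaveOneGroupOut().split(X, y, groups):
    # Train on 12 subjects, test on the held-out subject.
    clf = RandomForestClassifier(n_estimators=20, random_state=0)
    clf.fit(X[train], y[train])
    pred = clf.predict(X[test])
    tp = np.sum((pred == 1) & (y[test] == 1))
    fn = np.sum((pred == 0) & (y[test] == 1))
    recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)

print(len(recalls))  # one recall value per held-out subject
```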


Recent advances in Automated Dietary Monitoring (ADM) with wearables have shown promising results in eating detection in naturalistic environments. However, determining what an individual is consuming remains a significant challenge. In this paper, we present results of a food type classification study based on a sub-centimeter scale wireless intraoral sensor that continuously measures temperature and jawbone movement. We explored the feasibility of classifying nine different types of foods into five classes based on their water-content and typical serving temperature in a controlled environment (n=4). We demonstrated that the system can classify foods into five classes with a weighted accuracy of 77.5% using temperature-derived features only and with a weighted accuracy of 85.0% using both temperature- and acceleration-derived features. Despite the limitations of our study, these results are encouraging and suggest that intraoral computing might be a viable direction for ADM in the future.


Theatre provides a unique environment in which to obtain detailed data on social interactions in a controlled and repeatable manner. This work introduces a method for capturing and characterising the underlying emotional intent of performers in a scripted scene using in-ear accelerometers. Each scene is acted with different underlying emotional intentions using the theatrical technique of Actioning. The goal of the work is to uncover characteristics in the joint movement patterns that reveal information on the positive or negative valence of these intentions. Preliminary findings over 3×12 (COVID-19 restricted) non-actor trials suggest people are more energetic and more in-sync when using positive versus negative intentions.


This paper investigates the possibility of using soft smart textiles over the hair regions to detect chewing activities during episodes of snacking in a simulated scenario with everyday activities. The planar pressure textile sensors, worn in the form of a cap, perform mechanomyography of the temporalis muscles. 10 participants contributed 30 recording sessions with time periods between 30 and 60 minutes. A frequency analysis method is developed to detect snacking events with continuous sliding windows at 1-second granularity. Our approach achieves a baseline accuracy of 80%, over 85% after outlier removal, and above 90% for some of the participants.


We present the GastroDigitalShirt, a smart T-Shirt for capturing abdominal sounds produced during digestion. The garment prototype embeds an array of eight miniaturised microphones connected to a low-power wearable computer and is designed for long-term recording. We present the microphone integration and shirt wiring layout. With the GastroDigitalShirt we monitored the different digestion phases over six hours in four healthy participants with no prior gastro-intestinal diseases. The collected data were annotated by two independent raters to mark Bowel Sounds (BS) instances. The interrater agreement was substantial, with Cohen’s Kappa of 0.7, confirming a consistent labeling approach. Overall 3046 BS instances were individually annotated. The extracted BS were structured by Hierarchical Agglomerative Clustering. The analysis highlighted the presence of 4 BS types. The results show that our prototype can capture the main BS types reported in literature.


Session 2A: ‘Low-Power and Energy Harvesting’
Tuesday, September 15, 2020 10:00-11:30 EDT

View Session 2A Presentations

For kinetic-powered body area networks, we explore the feasibility of continuously converting energy-harvesting patterns into device authentication and symmetric secret keys. The intuition is that at any given time, multiple wearable devices harvest kinetic energy from the same user activity, such as walking, which allows them to independently observe a common secret energy-harvesting pattern not accessible to outside devices. Such continuous KEH-based authentication and key generation is expected to be highly power-efficient, as it obviates the need for any extra sensors, such as an accelerometer, to precisely track walking patterns. Unfortunately, the lack of precise activity tracking introduces bit mismatches between the independently generated keys, which makes KEH-based authentication and symmetric key generation a challenging problem. We propose KEHKey, a KEH-based authentication and key generation system that employs a compressive sensing-based information reconciliation protocol for wearable devices to effectively correct any mismatches in generated keys. We implement KEHKey using off-the-shelf piezoelectric energy harvesting products and evaluate its performance with data collected from 24 subjects wearing the devices on different body locations including the head, torso and hands. Our results show that KEHKey is able to generate the same key for two KEH-embedded devices at a speed of 12.57 bps while reducing energy consumption by 59% compared to accelerometer-based methods, which makes it suitable for continuous operation. Finally, we demonstrate that KEHKey can successfully defend against typical adversarial attacks. In particular, KEHKey is found to be more resilient to video side channel attacks than its accelerometer-based counterparts.
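The core difficulty described above, two devices quantising noisy observations of the same signal into key bits that do not match perfectly, can be illustrated with a simple median-threshold scheme (a generic sketch, not KEHKey's actual quantiser or reconciliation protocol):

```python
import random

random.seed(0)
# One underlying harvesting signal, observed independently by two
# devices, each with its own small measurement noise.
signal = [random.gauss(0, 1) for _ in range(128)]
obs_a = [s + random.gauss(0, 0.1) for s in signal]
obs_b = [s + random.gauss(0, 0.1) for s in signal]

def to_bits(x):
    # Quantise each sample to a bit by thresholding at the median.
    med = sorted(x)[len(x) // 2]
    return [1 if v > med else 0 for v in x]

bits_a, bits_b = to_bits(obs_a), to_bits(obs_b)
mismatch = sum(a != b for a, b in zip(bits_a, bits_b)) / len(bits_a)
print(mismatch)  # nonzero: the keys disagree and need reconciliation
```

Samples near the threshold flip between the two devices, which is exactly why a reconciliation step is needed before the keys can be used symmetrically.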


This paper introduces intermittent learning, the goal of which is to enable energy-harvesting computing platforms to execute certain classes of machine learning tasks effectively and efficiently. We identify unique challenges to intermittent learning relating to the data and application semantics of machine learning tasks. To address these challenges, we 1) devise an algorithm that determines a sequence of actions to achieve the desired learning objective under tight energy constraints, and 2) propose three heuristics that help an intermittent learner decide whether to learn from or discard training examples at run-time, which increases the energy efficiency of the system. We implement and evaluate three intermittent learning applications that learn 1) air quality, 2) human presence, and 3) vibration using solar, RF, and kinetic energy harvesters, respectively. We demonstrate that the proposed framework improves the energy efficiency of a learner by up to 100% and cuts down the number of learning examples by up to 50% when compared to state-of-the-art intermittent computing systems that do not implement the proposed intermittent learning framework.


Ubiquitous computing requires robust and sustainable sensing techniques to detect users for explicit and implicit inputs. Existing solutions with cameras can be privacy-invasive. Battery-powered sensors require user maintenance, preventing practical ubiquitous sensor deployment. We present OptoSense, a general-purpose self-powered sensing system which senses ambient light at the surface level of everyday objects as a high-fidelity signal to infer user activities and interactions. To situate the novelty of OptoSense among prior work and highlight the generalizability of the approach, we propose a design framework of ambient light sensing surfaces, enabling implicit activity sensing and explicit interactions in a wide range of use cases with varying sensing dimensions (0D, 1D, 2D), fields of view (wide, narrow), and perspectives (egocentric, allocentric). OptoSense supports this framework through example applications ranging from object use and indoor traffic detection, to liquid sensing and multitouch input. Additionally, the system can achieve high detection accuracy while being self-powered by ambient light. Ongoing improvements that replace OptoSense’s silicon-based sensors with organic semiconductors (OSCs) enable devices that are ultra-thin, flexible, and cost-effective to scale.


We present UbiquiTouch, an ultra low power wireless touch interface. With an average power consumption of 30.91μW, UbiquiTouch can run on energy harvested from ambient light. It achieves this performance through low power touch sensing and passive communication to a nearby smartphone using ambient FM backscatter. This approach allows UbiquiTouch to be deployed in mobile situations both in indoor and outdoor locations, without the need for any additional infrastructure for operation. To demonstrate the potential of this technology, we evaluate it in several different and realistic scenarios. Finally, we address the future application space for this technology.


Electric vehicles (EVs) have seen sensational growth in the past few years, owing to their potential for mitigating global warming and energy scarcity. However, the high manufacturing cost of battery packs and limited battery lifetime hinder further development. In particular, electric buses, one of the most important means of public transportation, suffer from long daily operation times and peak-hour passenger overload, which aggravate battery degradation. To address this issue, we propose a novel data-driven battery-lifetime-aware bus scheduling system. Leveraging practical bus GPS and transaction datasets, we conduct a detailed analysis of passenger behaviors and design a reliable prediction model for the passenger arrival rate at each station. By treating the passenger waiting queue at each bus station as analogous to a data buffer in network systems, we apply Lyapunov optimization and obtain a bus scheduling strategy with reliable performance guarantees on both battery degradation rate and passenger service quality. To verify the effectiveness of the system, we evaluate our design on a 12-month electric bus operation dataset from the city of Shenzhen. The experimental results show that, compared with two baseline methods, our system reduces the battery degradation rate by 14.3% and 21.7% under the same passenger arrival rate, while preserving good passenger service quality.


We present ENGAGE, the first battery-free, personal mobile gaming device powered by energy harvested from the gamer actions and sunlight. Our design implements a power failure resilient Nintendo Game Boy emulator that can run off-the-shelf classic Game Boy games like Tetris or Super Mario Land. This emulator is capable of intermittent operation by tracking memory usage, avoiding the need for always checkpointing all volatile memory, and decouples the game loop from user interface mechanics allowing for restoration after power failure. We build custom hardware that harvests energy from gamer button presses and sunlight, and leverages a mixed volatility memory architecture for efficient intermittent emulation of game binaries. Beyond a fun toy, our design represents the first battery-free system design for continuous user attention despite frequent power failures caused by intermittent energy harvesting. We tackle key challenges in intermittent computing for interaction including seamless displays and dynamic incentive-based gameplay for energy harvesting. This work provides a reference implementation and framework for a future of battery-free mobile gaming in a more sustainable Internet of Things.


We propose Zygarde, an energy- and accuracy-aware soft real-time task scheduling framework for batteryless systems that flexibly execute deep learning tasks suitable for running on microcontrollers. The sporadic nature of harvested energy, the resource constraints of the embedded platform, and the computational demand of deep neural networks (DNNs) pose a unique and challenging real-time scheduling problem for which no solutions have been proposed in the literature. We empirically study the problem and model the energy harvesting pattern as well as the trade-off between the accuracy and execution of a DNN. We develop an imprecise-computing-based scheduling algorithm that improves the timeliness of DNN tasks on intermittently powered systems. We evaluate Zygarde using four standard datasets as well as by deploying it in six real-life applications involving audio and camera sensor systems. Results show that Zygarde decreases the execution time by up to 26% and schedules 9-34% more tasks with up to 21% higher inference accuracy, compared to traditional schedulers such as earliest deadline first (EDF).
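The EDF baseline that Zygarde is compared against can be sketched as a simple non-preemptive earliest-deadline-first scheduler (a generic illustration of EDF, not Zygarde's algorithm):

```python
import heapq

def edf(tasks):
    """Non-preemptive EDF on a unit-speed CPU.
    tasks: list of (release_time, deadline, duration) tuples.
    Returns the finish time of each task, in input order."""
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][0])
    ready, finish, t, k = [], [0.0] * len(tasks), 0.0, 0
    while k < len(order) or ready:
        # Move every task released by time t into the ready queue,
        # keyed by deadline (the heap pops the earliest deadline).
        while k < len(order) and tasks[order[k]][0] <= t:
            i = order[k]
            heapq.heappush(ready, (tasks[i][1], i))
            k += 1
        if not ready:               # idle until the next release
            t = tasks[order[k]][0]
            continue
        _, i = heapq.heappop(ready)
        t += tasks[i][2]            # run the task to completion
        finish[i] = t
    return finish

# Three toy tasks: (release, deadline, duration).
print(edf([(0, 10, 3), (0, 4, 2), (1, 20, 1)]))  # [5.0, 2.0, 6.0]
```

The task with deadline 4 runs first even though it was submitted alongside a longer task; that deadline-driven ordering is the behaviour Zygarde extends with energy and accuracy awareness.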


Glasses are a suitable platform for embedding sensors and displays around our heads to support our daily lives; aesthetic appeal, durability, and portability are likewise essential properties of glasses. However, designing such smart glasses is challenging, because connecting the frame components both mechanically and electrically results in bulky hinges. To overcome this challenge, we propose a new design that embeds inductively coupled coil pairs adjacent to the hinges to deliver power and data wirelessly across the frames. Positioning the coils next to the hinges creates sufficient area for large transmission and reception coils while maintaining the utility of the glasses. Consequently, we achieve over 85% power efficiency and a communication rate of 50 Mbps between coils that are small enough to be embedded inside the frame of conventional, commercially available glasses.


Session 2B: ‘Health and Wellbeing I’
Tuesday, September 15, 2020 10:00-11:30 EDT

View Session 2B Presentations

Personal informatics systems for the work environment can help improve workers’ well-being and productivity. Using both self-reported data logged manually by the users and information automatically inferred from sensor measurements, such systems may track users’ activities at work and help them reflect on their work habits through insightful data visualizations. They can further support interventions such as blocking distractions during work activities or suggesting the user take a break. The ability to automatically recognize when the user is engaged in a work activity or taking a break is thus a fundamental primitive such systems need to implement. In this paper, we explore the use of data collected from personal devices – smartwatches, laptops, and smartphones – to automatically recognize when users are working or taking breaks. We collect a data set of continuous streams of sensor data captured from personal devices along with labels indicating whether a user is working or taking a break. We use multiple instruments to facilitate the collection of users’ self-reported labels and discuss our experience with this approach. We analyse the available data – 449 labelled activities of nine knowledge workers collected during a typical work week – using machine learning techniques and show that user-independent models can achieve an F1 score of 94% for the identification of work activities and of 69% for breaks, outperforming baseline methods by 5-10 and 12-54 percentage points, respectively.


Post-traumatic stress disorder (PTSD) negatively influences a person’s ability to cope and increases psychiatric morbidity. The existing diagnostic tools of PTSD are often difficult to administer within marginalized communities due to language and cultural barriers, lack of skilled clinicians, and stigma around disclosing traumatic experiences. Here, we present an initial proof of concept for a novel, low-cost, and creative method to screen the potential cases of PTSD based on free-hand sketches within three different communities in Bangladesh: Rohingya refugees (n=44), slum-dwellers (n=35), and engineering students (n=85). Due to the low overhead and nonverbal nature of sketching, our proposed method potentially overcomes communication and resource barriers. Using corner and edge detection algorithms, we extracted three features (number of corners, number and average length of strokes) from the images of free-hand sketches. We used these features along with sketch themes, participants’ gender and group to train multiple logistic regression models for potentially screening PTSD (accuracy: 82.9-87.9%). We improved the accuracy (99.29%) by integrating EEG data with sketch features in a Random Forest model for the refugee population. Our proposed initial assessment method of PTSD based on sketches could potentially be integrated with phones and EEG headsets, making it widely accessible to the underrepresented communities.


Exploiting the capabilities of smartphones for monitoring social anxiety shows promise for advancing our ability to both identify indicators of and treat social anxiety in natural settings. Smart devices allow researchers to collect passive data unobtrusively through built-in sensors and active data using subjective, self-report measures with Ecological Momentary Assessment (EMA) studies. Prior work has established the potential to predict subjective measures from passive data. However, the majority of past work on social anxiety has focused on a limited subset of self-reported measures. Furthermore, the data collected in real-world studies often contain numerous missing values in one or more data streams, which ultimately reduces the usable data for analysis and limits the potential of machine learning algorithms. We explore several approaches for addressing these problems in a smartphone-based monitoring and intervention study of eighty socially anxious participants over a five-week period. Our work complements and extends prior work in two directions: (i) we show the predictability of seven different self-reported dimensions of social anxiety, and (ii) we explore four imputation methods to handle missing data and evaluate their effectiveness in the prediction of subjective measures from the passive data. Our evaluation shows imputation of missing data reduces prediction error by as much as 22%. We discuss the implications of these results for future research.
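Two of the simplest imputation strategies for missing sensor values, mean imputation and forward fill, can be sketched as follows; the abstract does not name the four methods it compares, so these are generic examples rather than the study's techniques:

```python
def mean_impute(xs):
    # Replace each missing value (None) with the mean of observed values.
    obs = [v for v in xs if v is not None]
    m = sum(obs) / len(obs)
    return [m if v is None else v for v in xs]

def forward_fill(xs):
    # Carry the last observed value forward over gaps.
    out, last = [], None
    for v in xs:
        last = last if v is None else v
        out.append(last)
    return out

stream = [3.0, None, 4.0, None, None, 6.0]  # toy sensor stream with gaps
print(mean_impute(stream))
print(forward_fill(stream))
```

Mean imputation preserves the stream's average but flattens temporal structure; forward fill preserves local continuity but can propagate stale readings over long gaps, which is why studies typically compare several methods.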


Context plays a key role in impulsive adverse behaviors such as fights, suicide attempts, binge-drinking, and smoking lapse. Several contexts dissuade such behaviors, but some may trigger adverse impulsive behaviors. We define these latter contexts as ‘opportunity’ contexts, as their passive detection from sensors can be used to deliver context-sensitive interventions.

In this paper, we define the general concept of ‘opportunity’ contexts and apply it to the case of smoking cessation. We operationalize the smoking ‘opportunity’ context, using self-reported smoking allowance and cigarette availability. We show its clinical utility by establishing its association with smoking occurrences using Granger causality. Next, we mine several informative features from GPS traces, including the novel location context of smoking spots, to develop the SmokingOpp model for automatically detecting the smoking ‘opportunity’ context. Finally, we train and evaluate the SmokingOpp model using 15 million GPS points and 3,432 self-reports from 90 newly abstinent smokers in a smoking cessation study.
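The Granger-causality idea used above, checking whether past values of one series improve prediction of another, can be sketched as a lag-1 comparison of nested least-squares models (a minimal illustration on synthetic series, not the study's analysis):

```python
import numpy as np

def granger_lag1(x, y):
    """Compare the residual sum of squares of y_t ~ y_{t-1} (restricted)
    against y_t ~ y_{t-1} + x_{t-1} (full). If the full model fits
    substantially better, x helps predict y (lag-1 sketch only)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    yt, y1, x1 = y[1:], y[:-1], x[:-1]
    A_r = np.column_stack([np.ones_like(y1), y1])        # restricted
    A_f = np.column_stack([np.ones_like(y1), y1, x1])    # full
    rss = lambda A: float(np.sum(
        (yt - A @ np.linalg.lstsq(A, yt, rcond=None)[0]) ** 2))
    return rss(A_r), rss(A_f)

# Synthetic series where x genuinely drives y with one step of lag.
rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

r_restricted, r_full = granger_lag1(x, y)
print(r_full < r_restricted)  # True: past x improves prediction of y
```

A full analysis would test more lags and apply an F-test to the RSS drop; this sketch shows only the nested-model comparison at the heart of the method.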


Traditional methods for screening and diagnosis of alcohol dependence are typically administered by trained clinicians in medical settings and often rely on interview responses. These self-reports can be unintentionally or deliberately false, and misleading answers can, in turn, lead to inaccurate assessment and diagnosis. In this study, we examine the use of user-game interaction patterns on mobile games to develop an automated diagnostic and screening tool for alcohol-dependent patients. Our approach relies on the capture of interaction patterns during gameplay, while potential patients engage with popular mobile games on smartphones. The captured signals include gameplay performance, touch gestures, and device motion, with the intention of identifying patients with alcohol dependence. We evaluate the classification performance of various supervised learning algorithms on data collected from 40 patients and 40 age-matched healthy adults. The results show that patients with alcohol dependence can be automatically identified accurately using the ensemble of touch, device motion, and gameplay performance features on 3-minute samples (accuracy=0.95, sensitivity=0.95, and specificity=0.95). The present findings provide strong evidence suggesting the potential use of user-game interaction metrics on existing mobile games as discriminant features for developing an implicit measure to identify alcohol dependence conditions. In addition to supporting healthcare professionals in clinical decision-making, the game-based self-screening method could be used as a novel strategy to promote alcohol dependence screening, especially outside of clinical settings.


Monitoring sleep posture is important for avoiding bedsores after surgery, reducing apnea events, tracking the progression of Parkinson’s disease, and even alerting epilepsy patients to potentially fatal sleep postures. Today, there is no easy way to track sleep postures. Past work has proposed installing cameras in the bedroom, mounting accelerometers on the subject’s chest, or embedding pressure sensors in their bedsheets. Unfortunately, such solutions jeopardize either the privacy of the user or their sleep comfort.

In this paper, we introduce BodyCompass, the first RF-based system that provides accurate sleep posture monitoring overnight in the user’s own home. BodyCompass works by studying the RF reflections in the environment. It disentangles RF signals that bounced off the subject’s body from other multipath signals. It then analyzes those signals via a custom machine learning algorithm to infer the subject’s sleep posture. BodyCompass is easily transferable and can apply to new homes and users with minimal effort. We empirically evaluate BodyCompass using over 200 nights of sleep data from 26 subjects in their own homes. Our results show that, given one week, one night, or 16 minutes of labeled data from the subject, BodyCompass’s corresponding accuracy is 94%, 87%, and 84%, respectively.


Continuous wearable sensor data in high resolution contain physiological and behavioral information that can be utilized to predict human health and wellbeing, establishing the foundation for developing early warning systems to eventually improve human health and wellbeing. We propose a deep neural network framework, the Locally Connected Long Short-Term Memory Denoising AutoEncoder (LC-LSTM-DAE), to automatically extract features from passively collected raw sensor data and perform personalized prediction of self-reported mood, health, and stress scores with high precision. We enabled personalized learning of features by finetuning the general representation model with participant-specific data. The framework was evaluated using wearable sensor data and wellbeing labels collected from college students (6,391 days in total from N=239). Sensor data include skin temperature, skin conductance, and acceleration; wellbeing labels include self-reported mood, health and stress scored 0-100. Compared to the prediction performance based on hand-crafted features, the proposed framework achieved higher precision with a smaller number of features. We also provide statistical interpretation and visual explanation of the automatically learned features and the prediction models. Our results show the possibility of predicting self-reported mood, health, and stress accurately using an interpretable deep learning framework, ultimately for developing real-time health and wellbeing monitoring and intervention systems that can benefit various populations.


Traditionally, sleep monitoring has been performed in hospital or clinic environments, requiring complex and expensive equipment set-up and expert scoring. Wearable devices increasingly provide a viable alternative for sleep monitoring, able to collect movement and heart rate (HR) data. In this work, we present a set of algorithms for sleep-wake and sleep-stage classification based upon actigraphy and cardiac sensing amongst 1,743 participants. We devise movement and cardiac features that could be extracted from research-grade wearable sensors, derive models, and evaluate their performance in the largest open-access dataset for human sleep science. Our results demonstrate that neural network models outperform traditional machine learning methods and heuristic models for both sleep-wake and sleep-stage classification. Convolutional neural networks (CNNs) and long short-term memory (LSTM) networks were the best performers for sleep-wake and sleep-stage classification, respectively. Using SHAP (SHapley Additive exPlanation) with Random Forest, we identified that frequency features from cardiac sensors are critical to sleep-stage classification. Finally, we introduced an ensemble-based approach to sleep-stage classification, which outperformed all other baselines, achieving an accuracy of 78.2% and F1 score of 69.8% on the classification task for three sleep stages. Together, this work represents the first systematic multimodal evaluation of sleep-wake and sleep-stage classification in a large, diverse population. Alongside the presentation of an accurate sleep-stage classification approach, the results highlight multimodal wearable sensing approaches as scalable methods for accurate sleep classification, providing guidance on optimal algorithm deployment for automated sleep assessment. The code used in this study can be found online at: https://github.com/bzhai/multimodal_sleep_stage_benchmark.git
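A minimal form of the ensemble idea mentioned above is per-epoch majority voting over base classifiers' sleep-stage labels (an illustrative sketch, not the paper's actual ensemble):

```python
from collections import Counter

def majority_vote(predictions):
    # predictions: one list of per-epoch stage labels per base model.
    # For each epoch, take the most common label across models.
    out = []
    for votes in zip(*predictions):
        out.append(Counter(votes).most_common(1)[0][0])
    return out

# Three toy base models labelling five 30-second epochs.
m1 = ["wake", "nrem", "nrem", "rem", "wake"]
m2 = ["wake", "nrem", "rem", "rem", "nrem"]
m3 = ["nrem", "nrem", "nrem", "wake", "wake"]
print(majority_vote([m1, m2, m3]))  # ['wake', 'nrem', 'nrem', 'rem', 'wake']
```

Stronger ensembles weight models by validation performance or stack a meta-learner on top, but majority voting already shows how disagreements between base models get resolved.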


Fatigue is one of the key factors in the loss of work efficiency and health-related quality of life, yet most fatigue assessment methods are based on self-reporting, which may suffer from issues such as recall bias. To address this, we developed an automated system using wearable sensing and machine learning techniques for objective fatigue assessment. ECG/actigraphy data were collected from subjects in free-living environments. Preprocessing and feature engineering methods were applied before an interpretable solution and a deep learning solution were introduced. Specifically, for the interpretable solution, we propose a feature selection approach that selects less correlated, highly informative features, making the system’s decision-making process easier to understand. For the deep learning solution, we use a state-of-the-art self-attention model, on top of which we further propose a consistency self-attention (CSA) mechanism for fatigue assessment. Extensive experiments were conducted, and very promising results were achieved.


We present a wearable, oscillating magnetic field-based proximity sensing system to monitor social distancing as suggested to prevent COVID-19 spread (staying between 1.5 and 2.0 m apart). We evaluate the system both in controlled lab experiments and in a real-life setting in a large hardware store. We demonstrate that, due to the physical properties of the magnetic field, the system is much more robust than current Bluetooth-based sensing, in particular being nearly 100% correct when it comes to distinguishing between distances above and below the 2.0 m threshold.


Session 2C: ‘Sensing II (Context and Environment)’
Tuesday, September 15, 2020 10:00-11:30 EDT

View Session 2C Presentations

Besides passive sensing, ecological momentary assessments (EMAs) are one of the primary methods to collect in-the-moment data in ubiquitous computing and mobile health. While EMAs have the advantage of low recall bias, a disadvantage is that they frequently interrupt the user, and thus long-term adherence is generally poor. In this paper, we propose a less-disruptive self-reporting method, “assisted recall,” in which in the evening individuals are asked to answer questions concerning a moment from earlier in the day, assisted by contextual information such as location, physical activity, and ambient sounds collected around the moment to be recalled. Such contextual information is automatically collected from phone sensor data, so that self-reporting does not require devices other than a smartphone. We hypothesized that providing assistance based on such automatically collected contextual information would increase recall accuracy (i.e., whether recall responses for a moment match the EMA responses at the same moment) as compared to no assistance, and that the overall completion rate of evening recalls (assisted or not) would be higher than for in-the-moment EMAs. We conducted a two-week study (N=54) where participants completed recalls and EMAs each day. We found that providing assistance via contextual information increased recall accuracy by 5.6% (p=0.032) and the overall recall completion rate was on average 27.8% (p<0.001) higher than that of EMAs.


In this study, we investigate the effects of social context, personal factors, and mobile phone usage on the inference of work engagement/challenge levels of knowledge workers and their responsiveness to well-being related notifications. Our results show that mobile application usage is associated with the responsiveness and work engagement/challenge levels of knowledge workers. We also developed multi-level (within- and between-subjects) models for the inference of attentional states and engagement/challenge levels, with mobile application usage indicators as inputs, such as the number of applications used prior to notifications, the number of switches between applications, and application category usage. Our analysis shows that the following features are effective for the inference of attentional states and engagement/challenge levels: the number of switches between mobile applications in the last 45 minutes, and the duration of application usage in the last 5 minutes before users' response to ESM messages.


Several psychologists posit that performance is not only a function of personality but also of situational contexts, such as day-level activities. Yet in practice, since only personality assessments are used to infer job performance, they provide a limited perspective by ignoring activity. However, multi-modal sensing has the potential to characterize these daily activities. This paper illustrates how empirically measured activity data complement traditional effects of personality to explain a worker’s performance. We leverage sensors in commodity devices to quantify the activity context of 603 information workers. By applying classical clustering methods on this multi-sensor data, we take a person-centered approach to describe workers in terms of both personality and activity. We encapsulate both these facets into an analytical framework that we call organizational personas. On interpreting these organizational personas, we find empirical evidence to support that, independent of a worker’s personality, their activity is associated with job performance. While the effects of personality are consistent with the literature, we find that activity is equally effective in explaining organizational citizenship behavior, and less but still significantly effective for task proficiency and deviant behaviors. Specifically, personas that exhibit a daily-activity pattern with fewer location visits, batched phone-use, shorter desk-sessions and longer sleep duration tend to perform better on all three performance metrics. Organizational personas are a descriptive framework to identify the testable hypotheses that can disentangle the role of malleable aspects like activity in determining the performance of a worker population.


Learning a new language is difficult and time-consuming. Apart from dedicated classroom study, second language (L2) learners often lack opportunities to switch their attention to vocabulary learning over other daily routines. In this work, we propose a method that enables L2 learners to study new vocabulary items during their dead time, such as when commuting to school or work. We developed a smartphone application, VocaBura, which combines audio learning with location-relevant L1-L2 word pairs to allow users to discover new vocabulary items while walking past buildings, shops and other locations. Our evaluation results indicated that Japanese beginner level English learners were able to retain more vocabulary items with the proposed method compared to traditional audio-based study despite being less aware of L2 vocabulary acquisition having occurred. In our second study, we report on the level of English vocabulary coverage for L2 learning achievable with our proposed method. We discuss several design implications for educational technologies supporting second language learning.


With the rapid development of Internet services and mobile devices, users can now connect to online services anytime and anywhere. Naturally, users' online activity behavior is coupled with time and location contexts and highly influenced by them. Therefore, personalized context-aware online activity modelling and prediction is meaningful and necessary, but also very challenging, due to the complicated relationships between users, activities, and spatial and temporal contexts, as well as data sparsity issues. To tackle these challenges, we introduce offline check-in data as auxiliary data and build a user-location-time-activity 4D tensor and a location-time-POI 3D tensor, aiming to model the relationships between the different entities and transfer semantic features of time and location contexts among them. Accordingly, in this paper we propose a transfer learning-based collaborative tensor factorization method to achieve personalized context-aware online activity prediction. Based on real-world datasets, we compare the performance of our method with several state-of-the-art methods and demonstrate that ours provides more effective prediction results in high-sparsity scenarios. With only 30% of observed time and location contexts, our solution achieves a 40% improvement in predicting users' top-5 activity behavior in new time and location scenarios. Our study is a first step toward transferring knowledge learned from offline check-in behavior to online activity prediction, to provide better personalized context-aware recommendation services for mobile users.


Forecasting fire risk is of great importance to fire-prevention deployment in a city, which can reduce losses and even deaths caused by fires. However, forecasting is very challenging because fires are influenced by many complex factors, including spatial correlations, temporal dependencies, mixtures of the two, and external factors. First, the fire risk of a region is influenced by the temporal effects of internal factors (e.g., historical fire risk records) and of external factors (e.g., weather). Second, a region's fire risk is not only influenced by its inherent geospatial attributes (e.g., POIs) but also spatially dependent on other regions.
To address these challenges, we propose a machine learning approach to forecast fire risk, entitled NeuroFire. NeuroFire represents internal and external temporal effects and then combines the temporal representations with spatial dependencies through a spatial-temporal loss function.
Experimental evaluations on real-world datasets show that NeuroFire outperforms nine baselines; several visualizations further illustrate the behavior of our approach. Moreover, we implement a citywide fire forecasting system named CityGuard to display the analysis and forecasting results, which can assist the fire rescue department in deploying fire-prevention resources.


Air pollution is a global health threat. Beyond static official air quality stations, mobile sensing systems are deployed for urban air pollution monitoring to achieve larger sensing coverage and finer sampling granularity. However, the resulting data sparsity and irregularity bring great challenges for pollution map recovery. To address these problems, we propose an inference algorithm based on a deep autoencoder framework. Under this framework, a partially observed pollution map formed by the irregular samples is input to the model, and an encoder and a decoder work together to recover the entire pollution map. Inside the decoder, we adopt a convolutional long short-term memory (ConvLSTM) model, revealing its physical interpretation with an atmospheric dispersion model, and further present a weather-related ConvLSTM to enable quasi-real-time applications.
To evaluate our algorithm, a half-year data collection campaign was deployed with a real-world system in a coastal area including the Sino-Singapore Tianjin Eco-city in north China. At a resolution of 500m×500m×1h, our offline method proves highly robust against low sampling coverage and accidental sensor errors, obtaining a 14.9% performance improvement over existing methods. Our quasi-real-time model better captures the spatiotemporal dependencies in the pollution map with unevenly distributed samples than other real-time approaches, obtaining a 4.2% error reduction.


Encounters with casual acquaintances are common in our daily lives. In such situations, people are sometimes unable to find an appropriate topic for conversation, and an awkward silence follows. However, we believe that such an awkward encounter can be an opportunity to build a good relationship with the acquaintance through a brief conversation, if an appropriate topic is discovered. In this study, we examined a method to enrich casual conversations during unintended encounters by following three strategies: (1) an online questionnaire survey involving 10,750 participants to determine how they experience awkward encounters; (2) the design and implementation of a smartwatch-based topic suggestion system that relies on finding a commonality in the users' video-viewing histories; and (3) demos and semi-structured interviews involving 15 participants to evaluate this approach. This investigation demonstrates that this novel approach can help users overcome the awkwardness of conversations with casual acquaintances.
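The commonality-finding step can be pictured with a minimal sketch, assuming (our simplification, not the paper's implementation) that viewing histories are plain lists of titles ordered by recency:

```python
# Illustrative sketch: suggest conversation topics from videos both users have
# watched, preferring items that are recent in both histories.

def suggest_topics(history_a, history_b, top_k=3):
    """Each history: list of video titles, most recent first.
    Returns up to top_k shared titles, most mutually recent first."""
    rank_b = {title: i for i, title in enumerate(history_b)}
    shared = [t for t in history_a if t in rank_b]
    # Lower combined index = more recent in both histories.
    shared.sort(key=lambda t: history_a.index(t) + rank_b[t])
    return shared[:top_k]

a = ["Drama S2E4", "Cooking Show", "Travel Vlog", "Quiz Night"]
b = ["Quiz Night", "Travel Vlog", "News Recap"]
print(suggest_topics(a, b))  # ['Travel Vlog', 'Quiz Night']
```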


Sound can provide important information about the environment, human activity, and situational cues but can be inaccessible to deaf or hard of hearing (DHH) people. In this paper, we explore a wearable tactile technology to provide sound feedback to DHH people. After implementing a wrist-worn tactile prototype, we performed a four-week field study with 12 DHH people. Participants reported that our device increased awareness of sounds by conveying actionable cues (e.g., appliance alerts) and ‘experiential’ sound information (e.g., bird chirp patterns).


Session 3A: ‘Location and Human Mobility’
Tuesday, September 15, 2020 17:00-18:30 EDT

View Session 3A Presentations

With the popularity of mobile devices and location-based social networks, understanding and modelling human mobility has become an important topic in the field of ubiquitous computing. As models have evolved from personal models trained on an individual's own data to joint models trained on population data, prediction performance has steadily improved. Meanwhile, the privacy issues of these models have come into the view of the community and the public: private data are collected and uploaded to centralized servers without sufficient regulation. In this paper, we propose PMF, a privacy-preserving mobility prediction framework via federated learning, to solve this problem without significantly sacrificing prediction performance. In our framework, based on a deep learning mobility model, no private data is uploaded to the centralized server; the only things uploaded are the updated model parameters, which are difficult to crack and thus more secure. Furthermore, we design a group optimization method for training on local devices to achieve a better trade-off between performance and privacy. Finally, we propose a fine-tuned personal adaptor for personal modelling to further improve prediction performance. We conduct extensive experiments on three real-life mobility datasets to demonstrate the superiority and effectiveness of our methods in privacy-protection settings.
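The federated-learning loop that frameworks like PMF build on can be sketched in a few lines; this is standard federated averaging under toy assumptions (a scalar "training" step toward a local mean), not the authors' group-optimization method:

```python
# Hedged FedAvg sketch: devices train locally and upload only parameter
# updates; raw mobility traces never leave the device.

def local_update(params, local_data, lr=0.1):
    # Placeholder "training": one gradient-like step toward the local mean.
    target = sum(local_data) / len(local_data)
    return [p + lr * (target - p) for p in params]

def federated_round(global_params, all_client_data):
    updates = [local_update(list(global_params), d) for d in all_client_data]
    n = len(updates)
    # The server only averages parameters; data stays on each device.
    return [sum(u[i] for u in updates) / n for i in range(len(global_params))]

params = [0.0, 0.0]
for _ in range(5):
    params = federated_round(params, [[1.0, 3.0], [5.0]])  # two clients
print(params)
```

After each round the global parameters drift toward the average of the clients' local optima, without the server ever seeing the values 1.0, 3.0, or 5.0 themselves.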


Knowing accurate indoor locations of pedestrians has great social and commercial values, such as pedestrian heatmapping and targeted advertising. Location estimation with sequential inputs (e.g., geomagnetic sequences) has received much attention lately, mainly because they enhance the localization accuracy with temporal correlations. Nevertheless, it is challenging to realize accurate localization with geomagnetic sequences due to environmental factors, such as non-uniform ferromagnetic disturbances. To address this, we propose MAIL, a multi-scale attention-guided indoor localization network, which turns these challenges into favorable advantages. Our key contributions are as follows. First, instead of extracting a single holistic feature from an input sequence directly, we design a scale-based feature extraction unit that takes variational anomalies at different scales into consideration. Second, we propose an attention generation scheme that identifies attention values for different scales. Rather than setting fixed numbers, MAIL learns them adaptively with the input sequence, thus increasing its adaptability and generality. Third, guided by attention values, we fuse multi-scale features by paying more attention to prominent ones and estimate current location with the fused feature. We evaluate the performance of MAIL in three different trial sites. Evaluation results show that MAIL reduces the mean localization error by more than 36% compared with the state-of-the-art competing schemes.
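The attention-guided fusion step described above can be illustrated with a small sketch; in MAIL the attention values are learned from the input sequence, whereas here they are simply given (our illustration, not the paper's network):

```python
# Illustrative sketch: softmax-normalize per-scale attention scores and take
# an attention-weighted sum of the per-scale feature vectors.
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(scale_features, attention_scores):
    """scale_features: list of equal-length feature vectors, one per scale."""
    weights = softmax(attention_scores)
    dim = len(scale_features[0])
    return [sum(w * f[i] for w, f in zip(weights, scale_features))
            for i in range(dim)]

feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three scales, 2-D features
print(fuse(feats, [2.0, 0.0, 0.0]))            # dominated by the first scale
```

Because the weights come from a softmax, prominent scales dominate the fused feature while the others still contribute, rather than being cut off by a hard threshold.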


We developed Tourgether, an app that enables tourists’ mutual sharing of their experiences via check-ins in real time, to enhance their awareness and exploration of various points of interest (POIs) in a tourism region. We conducted formative studies and a between-subjects field experiment to assess how tourists used Tourgether in their travels, and the influence of real-time experience-sharing on unplanned POI visits, respectively. The results of the formative studies indicated that seeing shared real-time experiences encouraged tourists to explore and make unplanned visits to less well-known POIs, and that their decisions to make unplanned POI visits were dependent on familiarity, worthiness, and convenience. The app also created a feeling of co-presence among tourists, boosting their desire to interact with others. Two strong motivators for tourists to check in on the app were identified: contributing to other tourists, and recording journeys. Our experimental results further showed that seeing shared real-time experiences prompted the participants to make more unplanned visits than would have been the case if they had not seen them. This influence was more prominent among tourists who planned more POI visits. Other differences in the usage and influence of Tourgether across these two groups will also be discussed.


Human mobility modeling has many applications in location-based services, mobile networking, city management, and epidemiology. Previous sensing approaches for human mobility are mainly categorized into two types: stationary sensing systems (e.g., surveillance cameras and toll booths) and mobile sensing systems (e.g., smartphone apps and vehicle tracking devices). However, stationary sensing systems only provide human mobility information within limited coverage (e.g., camera-equipped roads), and mobile sensing systems only capture a limited number of people (e.g., people using a particular smartphone app). In this work, we design a novel system, Mohen, to model human mobility with a heterogeneous sensing system. The key novelty of Mohen is to fundamentally extend the sensing coverage of a large-scale stationary sensing system with a small-scale mobile sensing system. Based on the evaluation on data from real-world urban sensing systems, our system outperforms these baselines by 35% and achieves a competitive result against an Oracle method.


With increasingly frequent population movement between cities, such as users' travel or business trips, recommending personalized cross-city Points-of-Interest (POIs) has become an important scenario for POI recommendation tasks. However, traditional models degrade significantly due to sparsity problems, because travelers have only limited visiting behaviors. Through a detailed analysis of real-world check-in data, we observe that 1) travelers' interest drift and interest transfer co-exist between the hometown and the current city; and 2) popular POIs differ between locals and travelers. Motivated by this, we propose a POI Recommendation framework with User Interest Drift and Transfer (PR-UIDT), which jointly considers the above two factors when designing the user and POI latent vectors. In this framework, the user vector is divided into a city-independent part and a city-dependent part, and each POI is represented as two independent vectors, for locals and travelers respectively. To evaluate the proposed framework, we implement it with a square-error-based matrix factorization model and a ranking-error-based matrix factorization model, and conduct extensive experiments on three real-world datasets. The experimental results demonstrate the superiority of the PR-UIDT framework, with a relative improvement of 0.4% ∼ 20.5% over several state-of-the-art baselines, as well as the practicality of applying this framework to real-world applications and multi-city scenarios. Further qualitative analysis confirms both the plausibility and validity of combining user interest transfer and drift in cross-city POI recommendation.


Cellular data usage prediction is an important topic in research on cellular networks. Accurately predicting future data usage can benefit both cellular operators and users, and can further enable a wide range of applications. Different from previous work focusing on statistical approaches, in this paper we propose a scheme called CellPred to predict cellular data usage from an individual user perspective, considering user behavior patterns. Specifically, we utilize explicit user behavioral tags collected from subscription data as an external aid to enhance the prediction of each user's mobility and usage. We then aggregate individual user data usage to the cell tower level to obtain the final prediction results.
To our knowledge, this is the first work studying cellular data usage prediction from an individual user behavior-aware perspective based on large-scale cellular signaling and behavior tags from the subscription data. The results show that our method improves the data usage prediction accuracy compared to the state-of-the-art methods; we also comprehensively demonstrate the impact of contextual factors on CellPred performance.
Our work can shed light on broader cellular-network research related to human mobility and data usage. Finally, we discuss limitations, applications of our approach, and insights from our work.


Human mobility prediction is essential to a variety of human-centered computing applications achieved through upgrading location-based services (LBS) to future-location-based services (FLBS). Previous studies on human mobility prediction have mainly focused on centralized prediction, where user mobility data are collected, trained on, and predicted at the cloud server side. However, such a centralized approach carries a high risk of privacy issues, and a real-time centralized system for processing such a large volume of distributed data is extremely difficult to realize. Moreover, a large and dynamic set of users makes the predictive model extremely challenging to personalize. In this paper, we propose a novel decentralized attention-based human mobility predictor in which 1) no additional training procedure is required for personalized prediction, 2) no additional training procedure is required for incremental learning, and 3) the predictor can be trained and run in a decentralized way. We tested our method on large-scale real-world mobile phone GPS data and on Android devices, and achieved low power consumption and good prediction accuracy without collecting user data on the server or applying additional training on the user side.


Due to the recent proliferation of location-based services indoors, the need for an accurate floor estimation technique that is easy to deploy in any typical multi-story building is higher than ever. Current approaches that attempt to solve the floor localization problem include sensor-based systems and 3D fingerprinting. Nevertheless, these systems incur high deployment and maintenance overhead, suffer from sensor drift and calibration issues, and/or are not available to all users.

In this paper, we propose StoryTeller, a deep learning-based technique for floor prediction in multi-story buildings. StoryTeller leverages the ubiquitous WiFi signals to generate images that are input to a Convolutional Neural Network (CNN), which is trained to predict floors based on detected patterns in visible WiFi scans. Input images are created such that they capture the current WiFi scan in an AP-independent manner. In addition, a novel virtual-building concept is used to normalize the information and make it building-independent. This allows StoryTeller to reuse a trained network for a completely new building, significantly reducing the deployment overhead.
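One plausible minimal version of the AP-independence idea, offered purely as our own guess at a simplified encoding (not StoryTeller's actual image construction), is to drop AP identities and keep only the sorted signal-strength profile:

```python
# Hedged sketch: discard BSSIDs, sort RSSI values strongest-first, and place
# them into a fixed-width "image" row normalized to [0, 1].

def scan_to_row(scan, width=8, rssi_min=-100, rssi_max=-30):
    """scan: {bssid: rssi_dbm}. Returns a fixed-width list of values in [0, 1]."""
    rssi = sorted(scan.values(), reverse=True)[:width]     # strongest first
    norm = [(r - rssi_min) / (rssi_max - rssi_min) for r in rssi]
    norm = [min(max(v, 0.0), 1.0) for v in norm]
    return norm + [0.0] * (width - len(norm))              # pad missing APs

scan = {"aa:bb": -40, "cc:dd": -65, "ee:ff": -90}
print(scan_to_row(scan, width=4))
```

Because the row depends only on the shape of the RSSI distribution, the same encoder can be applied in a building whose access points were never seen during training.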

We have implemented and evaluated StoryTeller in three different buildings, with a side-by-side comparison against state-of-the-art floor estimation techniques. The results show that StoryTeller can estimate the user's floor to within one floor of the actual ground-truth floor at least 98.3% of the time. This accuracy is consistent across the different testbeds and for scenarios where the models used were trained in a completely different building than the tested building. This highlights StoryTeller's ability to generalize to new buildings and its promise as a scalable, low-overhead, high-accuracy floor localization system.


The objective of public resource allocation, e.g., the deployment of billboards, surveillance cameras, base stations, trash bins, is to serve more people. However, due to the dynamics of human mobility patterns, people are distributed unevenly on the spatial and temporal domains. As a result, in many cases, redundant resources have to be deployed to meet the crowd coverage requirements, which leads to high deployment costs and low usage. Fortunately, with the development of unmanned vehicles, the dynamic allocation of those public resources becomes possible.
To this end, we provide the first attempt to design an effective and efficient scheduling algorithm for dynamic public resource allocation. We formulate the problem as a novel multi-agent long-term maximal coverage scheduling (MALMCS) problem, which considers crowd coverage and energy limitations over a whole day. Two main components are employed in the system: 1) multi-step crowd flow prediction, which forecasts future crowd flows given the current crowd flows and external factors; and 2) energy adaptive scheduling, which employs a two-step heuristic algorithm, energy adaptive scheduling (EADS), to generate a scheduling plan that maximizes crowd coverage within the service time for agents. Extensive experiments based on real crowd flow data from Happy Valley (a popular theme park in Beijing) demonstrate the effectiveness and efficiency of our approach.
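The flavor of such coverage scheduling can be shown with a one-step greedy sketch on a 1-D strip of cells; this is a deliberate simplification under our own assumptions, not the two-step EADS heuristic from the paper:

```python
# Illustrative greedy step: each agent moves to the adjacent cell with the
# highest predicted crowd, if it has battery for the move and the cell is free.

def schedule_step(positions, energy, predicted_crowd, move_cost=1):
    """positions: agents' cell indices on a 1-D strip; energy: per-agent
    battery; predicted_crowd: predicted crowd count per cell."""
    n_cells = len(predicted_crowd)
    claimed = set(positions)
    new_positions = []
    for agent, pos in enumerate(positions):
        options = [pos] + ([pos - 1] if pos > 0 else []) \
                        + ([pos + 1] if pos + 1 < n_cells else [])
        # Avoid stacking two agents on the same cell.
        options = [c for c in options if c == pos or c not in claimed]
        best = max(options, key=lambda c: predicted_crowd[c])
        if best != pos and energy[agent] >= move_cost:
            energy[agent] -= move_cost
            claimed.discard(pos)
            claimed.add(best)
            pos = best
        new_positions.append(pos)
    return new_positions

crowd = [5, 1, 9, 2]
# Agent 0 moves to the busy cell 2; agent 1 stays (cell 2 taken, no energy).
print(schedule_step([1, 3], [2, 0], crowd))  # [2, 3]
```

A full planner would chain many such steps against multi-step crowd forecasts while reserving enough energy to finish the service day, which is exactly the long-term aspect MALMCS formalizes.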


Session 3B: ‘Touch, Gestures, and Posture’
Tuesday, September 15, 2020 17:00-18:30 EDT

View Session 3B Presentations

A multi-touch interactive tabletop is designed to embody the benefits of a digital computer within the familiar surface of a physical tabletop. However, because current multi-touch tabletops detect and react to all forms of touch, including unintentional touches, users cannot act naturally on them. In our research, we leverage gaze direction, head orientation, and screen contact data to identify and filter out unintentional touches, so that users can take full advantage of the physical properties of an interactive tabletop, e.g., resting hands or leaning on the tabletop during the interaction. To achieve this, we first conducted a user study to identify behavioral pattern differences (gaze, head, and touch) between completing usual tasks on digital versus physical tabletops. We then compiled our findings into five types of spatiotemporal features and trained a machine learning model to recognize unintentional touches, reaching an F1 score of 91.3% and outperforming the state-of-the-art model by 4.3%. Finally, we evaluated our algorithm in a real-time filtering system. A user study shows that our algorithm is stable, and that the improved tabletop effectively screens out unintentional touches and provides a more relaxing and natural user experience. By linking users' gaze and head behavior to their touch behavior, our work sheds light on the possibility of future tabletop technology better understanding users' input intention.


For mobile or wearable devices with a small touchscreen, handwriting input (instead of typing on the touchscreen) is highly desirable for efficient human-computer interaction. Previous passive acoustic-based handwriting solutions mainly focus on print-style capital input, which is inconsistent with people's daily habits and thus causes inconvenience. In this paper, we propose WritingRecorder, a novel universal text entry system that enables free-style lowercase handwriting recognition. WritingRecorder leverages the built-in microphone of the smartphone to record the handwriting sound, and then applies an adaptive segmentation method to detect letter fragments in real-time from the recorded sound. We design a neural network named Inception-LSTM to extract the hidden and unique acoustic pattern associated with the writing trajectory of each letter and thus classify each letter. Moreover, we adopt a language-model-based word selection method to recognize legitimate words from all possible letter combinations. We implement WritingRecorder as an app on mobile phones and conduct an extensive experimental evaluation. The results demonstrate that WritingRecorder works in real-time and achieves 93.2% accuracy under a series of practical scenarios, even for new users, without collecting and training on their handwriting samples.


In this paper, we present FingerTrak, a minimally obtrusive wristband that enables continuous 3D finger tracking and hand pose estimation with four miniature thermal cameras mounted closely on a form-fitting wristband. FingerTrak explores the feasibility of continuously reconstructing the entire hand posture (20 finger joint positions) without the need to see all fingers. We demonstrate that our system is able to estimate the entire hand posture by observing only the outline of the hand, i.e., hand silhouettes from the wrist, using low-resolution (32×24) thermal cameras. A customized deep neural network is developed to learn to "stitch" these multi-view images and estimate the 20 joint positions in 3D space. Our user study with 11 participants shows that the system achieves an average angular error of 6.46° when tested against the same background, and 8.06° when tested against a different background. FingerTrak also shows encouraging results after re-mounting of the device and has the potential to reconstruct some complicated poses. We conclude this paper with further discussion of the opportunities and challenges of this technology.


A finger held in the air exhibits microvibrations, which are reduced when it touches a static object. When a finger moves along a surface, the friction between them produces vibrations that cannot be produced by a free-moving finger in the air. With an inertial measurement unit (IMU) capturing such motion characteristics, we demonstrate the feasibility of detecting contact between the finger and static objects. We call our technique ActualTouch. Studies show that a single nail-mounted IMU on the index finger provides sufficient data to train a binary touch status classifier (i.e., touch vs. no-touch) with an accuracy above 95%, generalised across users. This model, trained on a rigid tabletop surface, was found to retain an average accuracy of 96% on 7 other types of everyday surfaces with varying rigidity, and in walking and sitting scenarios where no touch occurred. ActualTouch can be combined with other interaction techniques, such as a uni-stroke gesture recogniser for arbitrary surfaces, where the touch status from ActualTouch is used to delimit the motion gesture data fed into the recogniser. We demonstrate the potential of ActualTouch in a range of scenarios, such as interaction for augmented reality applications, and leveraging daily surfaces and objects for ad-hoc interactions.
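The core signal cue behind this technique can be sketched with a simple variance threshold; the paper trains a classifier on richer IMU features, so the threshold and sample values below are our own illustrative stand-ins:

```python
# Hedged sketch: a free finger microvibrates, so acceleration variance over a
# short window drops when the finger rests on a static surface.

def is_touching(accel_window, var_threshold=0.0004):
    """accel_window: acceleration samples (in g) from a short IMU window."""
    n = len(accel_window)
    mean = sum(accel_window) / n
    variance = sum((a - mean) ** 2 for a in accel_window) / n
    return variance < var_threshold

hovering = [0.02, -0.03, 0.01, -0.02, 0.03, -0.01]   # microvibrations in g
resting  = [0.001, -0.001, 0.0, 0.001, -0.001, 0.0]  # damped by contact
print(is_touching(hovering), is_touching(resting))   # False True
```

A learned classifier generalizes this idea across surfaces of varying rigidity, where a single fixed threshold would not.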


Previous studies have shown that visually impaired users face a unique set of pain points in smartphone interaction including locating and removing the phone from a pocket, two-handed interaction while holding a cane, and keeping personal data private in a public setting. In this paper, we present a ring-based input interaction that enables in-pocket smartphone operation. By wearing a ring with an Inertial Measurement Unit on the index finger, users can perform gestures on any surface (e.g., tables, thighs) using subtle, one-handed gestures and receive auditory feedback via earphones. We conducted participatory studies to obtain a set of versatile commands and corresponding gestures. We subsequently trained an SVM model to recognize these gestures and achieved a mean accuracy of 95.5% on 15 classifications. Evaluation results showed that our ring interaction is more efficient than some baseline phone interactions and is easy, private, and fun to use.


Flying drones have become common objects in our daily lives, serving a multitude of purposes. Many of these purposes involve outdoor scenarios in which the user combines drone control with another activity. Traditional interaction methods rely on physical or virtual joysticks that occupy both hands, thus restricting drone usability. In this paper, we investigate one-handed human-to-drone interaction by leveraging three modalities: force, touch, and IMU. After prototyping three different combinations of these modalities on a smartphone, we evaluate them against the current commercial standard through two user experiments. These experiments help us find the combination of modalities that strikes a compromise between user performance, perceived task load, wrist rotation, and interaction area size. Accordingly, we select the method that achieves 16.54% faster task completion times than the two-handed commercial baseline while requiring only subtle user behaviours, and implement it within a small ring-form device. A final experiment involving 12 participants shows that, thanks to its small size and weight, the ring device outperforms the same method implemented on a mobile phone. Furthermore, users unanimously found the device useful for controlling a drone in mobile scenarios (AVG = 3.92/5), easy to use (AVG = 3.58/5), and easy to learn (AVG = 3.58/5). Our findings give significant design clues in the search for subtle and effective interaction through finger-augmentation devices for drone control. With our prototype multi-modal on-finger device, users can control a drone with subtle wrist rotation (pitch gestures: 43.24° amplitude; roll gestures: 46.35° amplitude) and unnoticeable thumb presses within a miniature area of 1.08 × 0.61 cm².


Hand gestures provide a natural and easy-to-use way to input commands. However, few works have studied the design space of bimanual hand gestures or attempted to infer gestures that involve devices on both hands. We explore the design space of hand-to-hand gestures, a group of gestures that are performed by touching one hand with the other hand. Hand-to-hand gestures are easy to perform and provide haptic feedback on both hands. Moreover, hand-to-hand gestures generate simultaneous vibration on the two hands that can be sensed by dual off-the-shelf wrist-worn devices. In this work, we derive a hand-to-hand gesture vocabulary with subjective ratings from users and select gesture sets for real-life scenarios. We also take advantage of devices on both wrists to demonstrate their gesture-sensing capability. Our results show that the recognition accuracy for fourteen gestures is 94.6% when the user is stationary, and the accuracy for five gestures is 98.4% and 96.3% when the user is walking or running, respectively. This is significantly more accurate than a single device worn on either wrist. Our further evaluation also validates that users can easily remember hand-to-hand gestures and use our technique to invoke commands in real-life contexts.


Inputting a pattern or PIN code on the touch screen is a popular method to prevent unauthorized access to mobile devices. However, these sensitive tokens are highly susceptible to being inferred by various types of side-channel attacks, which can compromise the security of the private data stored in the device. This paper presents a second-factor authentication method, TouchPrint, which relies on the user's hand posture shape traits (dependent on the individual's posture type and unique hand geometry) when the user inputs a PIN or pattern. It is robust against the behavioral variability of inputting a passcode and places no restrictions on input manner (e.g., the number of fingers touching the screen, moving speed, or pressure). To capture the spatial characteristics of the user's hand posture shape during input, TouchPrint performs active acoustic sensing to scan the hand posture while the finger remains static at some reference positions on the screen (e.g., the turning points for a pattern and the number buttons for a PIN code), and extracts multipath effect features from the echo signals reflected by the hand. TouchPrint then fuses the multipath feature-based identification results generated from the multiple reference positions to build a reliable and secure MFA system. We build a prototype on a smartphone and evaluate the performance of TouchPrint comprehensively in a variety of scenarios. The experimental results demonstrate that TouchPrint can effectively defend against replay and imitation attacks. Moreover, TouchPrint can achieve an authentication accuracy of about 92% with only ten training samples.


Obtaining a signal useful for continuous pointing input is still an open problem for wearables. While magnetic field sensing is one promising approach, it has significant limitations. Our key contribution in this work is a simulation of a system that tracks a magnet in 3D while also accounting for the ambient magnetic field. The simulated sensor data are processed, and the position and rotation are determined using magnetic field equations, a particle filter, and a kinematic model of the hand.
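A minimal 1D version of this idea can be simulated: model the sensed field as a dipole term plus a constant ambient offset, and track the magnet's distance with a bootstrap particle filter. All constants, noise levels, and the reduction to one dimension are illustrative assumptions; the described system works in 3D with rotation and a hand kinematic model.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 1e-7            # dipole strength constant (illustrative units)
AMBIENT = 25e-6     # constant ambient field magnitude, treated as known here

def field(r):
    # Dipole field magnitude falls off as 1/r^3; ambient field adds an offset.
    return K / r**3 + AMBIENT

# True magnet distance drifts slowly; sensor readings are noisy.
true_r = 0.10 + 0.01 * np.sin(np.linspace(0, 2 * np.pi, 50))
readings = field(true_r) + rng.normal(0, 1e-6, size=true_r.size)

# Bootstrap particle filter over the magnet's distance.
N = 2000
particles = rng.uniform(0.05, 0.2, N)
estimates = []
for z in readings:
    particles += rng.normal(0, 0.002, N)          # random-walk motion model
    particles = np.clip(particles, 0.02, 0.5)
    w = np.exp(-0.5 * ((z - field(particles)) / 1e-6) ** 2)  # likelihood
    w /= w.sum()
    particles = particles[rng.choice(N, N, p=w)]  # resample
    estimates.append(particles.mean())

print(np.abs(np.array(estimates) - true_r).mean())
```

The full 3D variant replaces the scalar `field` with the vector dipole equations and augments the state with orientation.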


Temporal synchronous target selection is an association-free selection technique: users select a target by generating signals (e.g., finger taps and hand claps) in sync with its unique temporal pattern. However, the classical pattern set designs and input recognition algorithms of such techniques did not leverage users' behavioral information, which limits their robustness to imprecise inputs. In this paper, we improve these two key components by modeling users' interaction behavior. In the first user study, we asked users to tap a finger in sync with blinking patterns of various periods and delays, and modeled their finger-tapping ability using a Gaussian distribution. Based on the results, we generated pattern sets for up to 22 targets that minimize the possibility of confusion due to imprecise inputs. In the second user study, we validated that the optimized pattern sets reduce the error rate from 23% to 7% for the classical Correlation recognizer. We also tested a novel Bayesian recognizer, which achieved higher selection accuracy than the Correlation recognizer when the input sequence is short. Informal evaluation results show that the selection technique can be effectively scaled to different modalities and sensing techniques.
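A toy version of the correlation-style recognizer can be sketched as follows: smooth the tap sequence and each target's blink pattern with a Gaussian kernel (reflecting the Gaussian tap-timing model), then select the target whose pattern correlates best with the taps. The patterns, jitter model, and kernel width below are illustrative assumptions, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(2)

# Each target blinks with a unique period/delay; patterns are binary
# time series over 10 ms bins (values are illustrative).
def pattern(period, delay, length=300):
    p = np.zeros(length)
    p[delay::period] = 1.0
    return p

targets = {name: pattern(per, dly)
           for name, per, dly in [("A", 40, 0), ("B", 40, 20), ("C", 55, 10)]}

def simulate_taps(p, sigma=2):
    # The user taps in sync with a target's blinks, with Gaussian timing
    # jitter (the behavior the paper models with a Gaussian distribution).
    taps = np.zeros_like(p)
    for t in np.flatnonzero(p):
        j = int(round(t + rng.normal(0, sigma)))
        if 0 <= j < taps.size:
            taps[j] = 1.0
    return taps

def correlation_recognizer(taps, targets, sigma=2):
    # Smooth both signals so small timing errors still correlate.
    k = np.exp(-0.5 * (np.arange(-8, 9) / sigma) ** 2)
    s = np.convolve(taps, k, mode="same")
    scores = {n: np.corrcoef(s, np.convolve(p, k, mode="same"))[0, 1]
              for n, p in targets.items()}
    return max(scores, key=scores.get)

taps = simulate_taps(targets["B"])
print(correlation_recognizer(taps, targets))  # → B
```

Widening the kernel trades robustness to jitter against confusability between nearby patterns, which is exactly the trade-off the optimized pattern sets address.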


Session 3C: ‘Sensing III (RF and other sensing modes)’
Tuesday, September 15, 2020 17:00-18:30 EDT

View Session 3C Presentations

Home assistant devices such as Amazon Echo and Google Home have become tremendously popular in the last couple of years. However, due to their voice-controlled functionality, these devices are not accessible to Deaf and Hard-of-Hearing (DHH) people. Given that over half a million people in the United States communicate using American Sign Language (ASL), there is a need for a home assistant system that can recognize ASL. The objective of this work is to design a home assistant system for DHH users (referred to as mmASL) that can perform ASL recognition using 60 GHz millimeter-wave wireless signals. mmASL has two important components. First, it can perform reliable wake-word detection using spatial spectrograms. Second, using a scalable and extensible multi-task deep learning model, mmASL can learn the phonological properties of ASL signs and use them to accurately recognize the signs. We implement mmASL on a 60 GHz software radio platform with a phased array, and evaluate it using a large-scale data collection from 15 signers, 50 ASL signs, and over 12K sign instances. We show that mmASL is tolerant to the presence of other interfering users and their activities, changes of environment, and different user positions. We compare mmASL with well-studied Kinect- and RGB-camera-based ASL recognition systems, and find that it achieves comparable performance (87% average sign recognition accuracy), validating the feasibility of using a 60 GHz mmWave system for ASL sign recognition.


Near-Infrared Spectroscopy (NIRS) is a non-invasive sensing technique that can be used to acquire information on an object's chemical composition. Although NIRS is conventionally used in dedicated laboratories, the recent introduction of miniaturized NIRS scanners has greatly expanded the use cases of this technology. Previous work from the UbiComp community shows that miniaturized NIRS can be successfully adapted to identify medical pills and alcohol concentration. In this paper, we further extend this technology to identify sugar (sucrose) content in everyday drinks. We developed a standalone mobile device which includes, inter alia, a NIRS scanner and a 3D-printed clamp. The clamp can be attached to a straw-like tube to sense a liquid's sucrose content. Through a series of studies, we show that our technique can accurately measure sucrose levels in both lab-made samples and commercially available drinks, as well as classify commercial drinks. Furthermore, we show that our method is robust to variations in ambient temperature and lighting conditions. Overall, our system can estimate the concentration of sugar with ±0.29 g/100ml error in lab-made samples and < 2.0 g/100ml error in 18 commercial drinks, and can identify everyday drinks with > 99% accuracy. Furthermore, in our analysis, we are able to discern three characteristic wavelengths in the near-infrared region (1055 nm, 1235 nm, and 1545 nm) with acute responses to sugar (sucrose). Our proposed protocol contributes to the development of everyday “food scanners” for consumers.
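The final calibration step, mapping absorbance at the three identified wavelengths (1055/1235/1545 nm) to sucrose concentration, can be sketched as a least-squares regression. The linear response model and all numbers below are synthetic stand-ins, not the paper's measured spectra.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic absorbance at the three sugar-responsive wavelengths; the
# per-wavelength response coefficients are hypothetical.
true_coef = np.array([0.8, 1.2, 2.0])
conc = rng.uniform(0, 12, 30)                            # g/100ml samples
absorb = np.outer(conc, true_coef) + rng.normal(0, 0.05, (30, 3))

# Least-squares calibration: absorbance -> sucrose concentration.
X = np.c_[absorb, np.ones(30)]                           # add intercept term
coef, *_ = np.linalg.lstsq(X, conc, rcond=None)
pred = X @ coef
print(np.abs(pred - conc).mean())                        # mean abs error, g/100ml
```

A real calibration would use the full spectrum (e.g., via partial least squares) rather than three hand-picked bands, but the fitting step has the same shape.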


In recent years, we have seen efforts to simultaneously monitor the respiration of multiple persons based on the channel state information (CSI) retrieved from commodity WiFi devices. Existing approaches mainly rely on spectral analysis of the CSI amplitude to obtain respiration rate information, leading to multiple limitations: (1) spectral analysis works when multiple persons exhibit dramatically different respiration rates, but fails to resolve similar rates; (2) spectral analysis can only obtain the average respiration rate over a period of time, and is unable to capture the detailed rate change over time; and (3) it fails to sense respiration when a target is located at a “blind spot”, even when the target is close to the sensing devices.

To overcome these limitations, we propose MultiSense, the first WiFi-based system that can robustly and continuously sense the detailed respiration patterns of multiple persons, even when they have very similar respiration rates and are physically closely located. The key insight of our solution is that commodity WiFi hardware nowadays is usually equipped with multiple antennas, so each individual antenna receives a differently mixed copy of the signals reflected from multiple persons. We prove that the reflected signals are linearly mixed at each antenna and propose to model multi-person respiration sensing as a blind source separation (BSS) problem. We then solve it using independent component analysis (ICA) to separate the mixed signals and obtain the respiration information of each person. Extensive experiments show that with only one pair of transceivers, each equipped with three antennas, MultiSense is able to accurately monitor respiration even in the presence of four persons, with a mean absolute respiration rate error of 0.73 bpm (breaths per minute).
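The core BSS step can be reproduced in a few lines with off-the-shelf ICA: mix two closely spaced respiration-like sinusoids at three "antennas" and recover them. The signal model and mixing matrix below are illustrative; the real system operates on WiFi CSI.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
t = np.linspace(0, 60, 3000)  # 60 s at 50 Hz

# Two people breathing at 0.25 Hz (15 bpm) and 0.30 Hz (18 bpm):
# rates too close for spectral analysis to resolve over a short window.
s1 = np.sin(2 * np.pi * 0.25 * t)
s2 = np.sin(2 * np.pi * 0.30 * t + 1.0)
S = np.c_[s1, s2]

# Each antenna sees a different linear mix of the reflected signals.
A = np.array([[1.0, 0.6], [0.4, 1.0], [0.8, 0.9]])  # 3 antennas x 2 sources
X = S @ A.T + 0.01 * rng.normal(size=(t.size, 3))

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(X)  # separated per-person respiration signals

# ICA recovers sources up to permutation/sign/scale, so match by |correlation|.
corr = np.abs(np.corrcoef(S.T, recovered.T)[:2, 2:])
print(corr.max(axis=1))
```

Because ICA only determines sources up to permutation and scale, a deployed system still needs a step that assigns each separated waveform to a person and extracts the rate from it.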


Eyelid stickers are thin strips that temporarily create a crease when attached to the eyelid. The direct contact with the crease, which increases and decreases the pressure on the eyelid sticker, provides a novel opportunity for sensing blinking. We present Eslucent, an on-skin wearable capacitive sensing device that affords blink detection, building on the form factor of eyelid stickers. It consists of an art layer, conductive thread, and fiber eyelid stickers coated with conductive liquid; the device is applied onto the eyelid crease with adhesive temporary tattoo paper. In a user study of 14 participants, Eslucent detected blinks during intentional blinking and four involuntary activities using a falling-edge detection algorithm. The average precision was 82% and recall was 70%, while both exceeded 90% for intentional blinking. By embedding interactive technology into a daily beauty product, Eslucent explores a novel wearable form factor for blink detection.
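A falling-edge detector of this kind can be sketched as a threshold on the first difference of the capacitance signal, with a refractory period to avoid double counting. The thresholds and the synthetic trace below are illustrative, not Eslucent's tuned parameters.

```python
import numpy as np

def detect_blinks(signal, drop=0.5, refractory=10):
    """Flag a blink whenever the signal falls by more than `drop` within
    one sample, then suppress re-triggers for `refractory` samples."""
    diffs = np.diff(signal)
    blinks, last = [], -refractory
    for i, d in enumerate(diffs):
        if d < -drop and i - last >= refractory:
            blinks.append(i + 1)
            last = i
    return blinks

# Synthetic capacitance trace: flat baseline with three sharp dips (blinks).
sig = np.ones(200)
for start in (40, 100, 160):
    sig[start:start + 5] -= 2.0

print(detect_blinks(sig))  # → [40, 100, 160]
```

Only the falling edge triggers a detection; the rising edge at the end of each dip is ignored, which is what makes the method robust to slow baseline drift.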


Recent research has shown great potential of exploiting Channel State Information (CSI) retrieved from commodity Wi-Fi devices for contactless human sensing in smart homes. Despite much work on Wi-Fi based indoor localization and motion/intrusion detection, no prior solution is capable of detecting a person entering a room with a precise sensing boundary, making room-based services infeasible in the real world. In this paper, we present WiBorder, an innovative technique for accurate determination of Wi-Fi sensing boundaries. The key idea is to harness antenna diversity to effectively eliminate random phase shifts while amplifying through-wall amplitude attenuation. By designing a novel sensing metric and correlating it with humans' through-wall discrimination, WiBorder is able to precisely determine Wi-Fi sensing boundaries by leveraging the walls in our daily environments. To demonstrate the effectiveness of WiBorder, we have developed an intrusion detection system and an area detection system. Extensive results in real-life scenarios show that our intrusion detection system achieves a high detection rate of 99.4% and a low false alarm rate of 0.68%, and that the area detection system's accuracy can be as high as 97.03%. To the best of our knowledge, WiBorder is the first work that enables precise sensing boundary determination via through-wall discrimination, which can immediately benefit other Wi-Fi based applications.


This paper presents an RF-based assistive technology for voice impairments (i.e., dysphonia), which occurs in an estimated 1% of the global population. We specifically focus on acquired voice disorders where users continue to be able to make facial and lip gestures associated with speech. Despite the rich literature on assistive technologies in this space, there remains a gap for a solution that requires no external infrastructure in the environment, no battery-powered sensors on the skin, and no body-worn manual input devices.

We present RFTattoo, which is, to our knowledge, the first wireless speech recognition system for voice impairments, using batteryless and flexible RFID tattoos. We design specialized wafer-thin tattoos that attach around the user's face and are easily hidden by makeup. We build models that process signal variations from these tattoos, captured by a portable RFID reader, to recognize various facial gestures corresponding to distinct classes of sounds. We then develop natural language processing models that infer meaningful words and sentences from the observed series of gestures. A detailed user study with 10 users reveals 86% accuracy in reconstructing the top-100 words in the English language, even without the users making any sounds.


Wireless signals have been extensively utilized for contactless sensing in the past few years. Due to the intrinsic nature of employing the weak target-reflected signal for sensing, the sensing range is limited. For instance, WiFi and RFID can achieve a 3-6 meter sensing range, while acoustic-based sensing is limited to less than one meter. In this work, we identify exciting sensing opportunities with LoRa, the new long-range communication technology designed for IoT communication. We explore the sensing capability of LoRa, both theoretically and experimentally. We develop a sensing model to characterize the relationship between target movement and signal variation, and propose novel techniques to increase the LoRa sensing range to over 25 meters for human respiration sensing. We further build a prototype system capable of sensing both coarse-grained and fine-grained human activities. Experimental results show that (1) human respiration can still be sensed when the target is 25 meters away from the LoRa devices, and 15 meters away with a wall in between; and (2) human walking (both displacement and direction) can be tracked accurately even when the target is 30 meters away from the LoRa transceiver pair.


This paper explores the possibility of tracking finger drawings in the air by leveraging WiFi signals from commodity devices.
Prior solutions typically require the user to hold a wireless transmitter, or need proprietary wireless hardware, and can only recognize a small set of pre-defined hand gestures. This paper introduces FingerDraw, the first sub-wavelength-level finger motion tracking system using commodity WiFi devices, without attaching any sensor to the finger.
FingerDraw can reconstruct finger drawing trajectories, such as digits, alphabets, and symbols, using one WiFi transmitter and two WiFi receivers. It uses a two-antenna receiver to sense the sub-wavelength-scale displacement of finger motion in each direction. The theoretical underpinning of FingerDraw is our proposed CSI-quotient model, which uses the channel quotient between the two antennas of a receiver to cancel out the noise in CSI amplitude and the random offsets in CSI phase, and quantifies the correlation between CSI dynamics and object displacement. This channel quotient is sensitive to finger movement, enabling us to detect the small changes it induces in the in-phase and quadrature components of the channel state information.
Our experimental results show that the overall median tracking accuracy is 1.27 cm, and the recognition of drawing ten digits in the air achieves an average accuracy of over 93.0%.
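The intuition behind the CSI-quotient model, that per-packet random phase offsets are common to both antennas of one receiver and therefore cancel in their ratio, can be checked with a small simulation. The channel model below (one static path plus one small moving path per antenna) is a simplification with illustrative constants, not FingerDraw's full model.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

# Per-packet random phase offset (CFO/SFO) is identical on both antennas
# of the same receiver; static and moving-path terms are illustrative.
offset = np.exp(1j * rng.uniform(0, 2 * np.pi, n))
static1, static2 = 1.0 + 0.2j, 0.8 - 0.1j
phase = 2 * np.pi * np.linspace(0, 3, n)      # finger displacement term
moving1 = 0.05 * np.exp(1j * phase)
moving2 = 0.05 * np.exp(1j * (phase + 0.7))   # same motion, different geometry

h1 = offset * (static1 + moving1)
h2 = offset * (static2 + moving2)

raw_phase = np.unwrap(np.angle(h1))           # dominated by random offsets
quotient = h1 / h2                            # offsets cancel antenna-to-antenna
q_phase = np.unwrap(np.angle(quotient))

print(np.std(np.diff(raw_phase)), np.std(np.diff(q_phase)))
```

The raw per-antenna phase jumps randomly packet to packet, while the quotient's phase varies smoothly with the simulated finger displacement, which is what makes sub-wavelength tracking possible.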


Given the significant amount of time people spend in vehicles, health issues under driving conditions have become a major concern. Such issues may vary from fatigue, asthma, and stroke to even heart attack, yet they can be adequately indicated by vital signs and abnormal activities. Therefore, in-vehicle vital sign monitoring can help us predict and hence prevent these issues. Whereas existing sensor-based (including camera-based) methods could be used to detect these indicators, privacy concerns and system complexity both call for a convenient yet effective and robust alternative. This paper aims to develop V^2iFi, an intelligent system performing monitoring tasks using a COTS impulse radio mounted on the windshield. V^2iFi is capable of reliably detecting the driver's vital signs under driving conditions and in the presence of passengers, thus allowing for potentially inferring corresponding health issues. Compared with prior work based on Wi-Fi CSI, V^2iFi is able to distinguish reflected signals from multiple users, and hence provides finer-grained measurements under more realistic settings. We evaluate V^2iFi both in lab environments and during real-life road tests; the results demonstrate that respiratory rate, heart rate, and heart rate variability can all be estimated accurately. Based on these estimation results, we further discuss how machine learning models can be applied on top of V^2iFi to improve both physiological and psychological wellbeing in driving environments.


Session 4A: ‘IoT and Software Tools’
Wednesday, September 16, 2020 09:00-10:30 EDT

View Session 4A Presentations

The proliferation of smart home devices has created new opportunities for empirical research in ubiquitous computing, ranging from security and privacy to personal health. Yet, data from smart home deployments are hard to come by, and existing empirical studies of smart home devices typically involve only a small number of devices in lab settings. To contribute to data-driven smart home research, we crowdsource the largest known dataset of labeled network traffic from smart home devices within real-world home networks. To do so, we developed and released IoT Inspector, an open-source tool that allows users to observe the traffic from smart home devices on their own home networks. Between April 10, 2019 and January 21, 2020, 5,404 users installed IoT Inspector, allowing us to collect labeled network traffic from 54,094 smart home devices. At the time of publication, IoT Inspector is still gaining users and collecting data from more devices. We demonstrate how this data enables new research into smart homes through two case studies focused on security and privacy. First, we find that many device vendors, including Amazon and Google, use outdated TLS versions and send unencrypted traffic, sometimes to advertising and tracking services. Second, we discover that smart TVs from at least 10 vendors communicated with advertising and tracking services. Finally, we find widespread cross-border communications, sometimes unencrypted, between devices and Internet services that are located in countries with potentially poor privacy practices. To facilitate future reproducible research in smart homes, we will release the IoT Inspector data to the public.


As the Internet of Things (IoT) proliferates, the potential for its opportunistic interaction with traditional mobile apps becomes apparent. We argue that to fully take advantage of this potential, mobile apps must become things themselves and interact in a smart space like their hardware counterparts. We present an extension to our Atlas thing architecture on smartphones, allowing mobile apps to behave as things and provide powerful services and functionalities. To this end, we also consider the role of the mobile app developer, and introduce actionable keywords (AKWs)—a dynamically programmable description—to enable potential thing-to-thing interactions. The AKWs empower the mobile app to dynamically react to services provided by other things, without these being known a priori by the original app developer. In this paper, we present the mobile-apps-as-things (MAAT) concept along with its AKW concept and programming construct. For MAAT to be adopted by developers, changes to existing integrated development environments (IDEs) should remain minimal to stay acceptable and practically usable; we therefore also propose an IDE plugin to simplify the addition of this dynamic behavior. We present details of MAAT, along with the implementation of the IDE plugin, and give a detailed benchmarking evaluation to assess the responsiveness of our implementation to impromptu interactions and dynamic app behavioral changes. We also conduct a second study, targeting Android developers, that evaluates the acceptability and usability of the MAAT IDE plugin.


Existing Internet of Things (IoT) solutions require expensive infrastructure for sending and receiving data. Emerging technologies such as ambient backscatter help fill this gap by enabling uplink communication for IoT devices. However, there is still no efficient solution to enable low-cost and low-power downlink communication for ambient backscatter systems. In this paper we present Glaze, a system that overlays data on existing wireless signals to create a new channel of downlink communication for IoT backscatter devices. In particular, Glaze uses a new technique that introduces small perturbations to existing signals to convey data. We evaluate the performance of Glaze and show how it can be used across wireless standards such as FM, TV, or Wi-Fi to communicate with devices with minimal impact on existing data transmissions.


The ability to sense ambient temperature pervasively, albeit crucial for many applications, is not yet available, causing problems such as degraded indoor thermal comfort and unexpected/premature shutoffs of mobile devices. To enable pervasive sensing of ambient temperature, we propose using mobile device batteries as thermometers, based on (i) the fact that people always carry their battery-powered smartphones, and (ii) our empirical finding that the temperature of a mobile device's battery is highly correlated with that of its operating environment. Specifically, we design and implement Batteries-as-Thermometers (BaT), a temperature sensing service based on the information of mobile device batteries, expanding the ability to sense the device's ambient temperature without requiring additional sensors or taking up the limited on-device space. We have evaluated BaT on 6 Android smartphones using 19 laboratory experiments and 36 real-life field tests, showing an average error of 1.25°C in sensing the ambient temperature.
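The battery-to-ambient mapping at the heart of this idea can be sketched as a simple per-device linear calibration, fitted on labeled data and then inverted at run time. The self-heating model and all coefficients below are synthetic assumptions, not BaT's actual model.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic calibration data: battery temperature tracks ambient
# temperature, shifted by device self-heating plus noise (illustrative).
ambient = rng.uniform(10, 35, 40)                       # ground-truth °C
battery = 0.9 * ambient + 6.0 + rng.normal(0, 0.5, 40)  # battery sensor °C

# Fit the calibration by least squares, then invert it for estimation.
slope, intercept = np.polyfit(ambient, battery, 1)

def estimate_ambient(batt_temp):
    return (batt_temp - intercept) / slope

err = np.abs(estimate_ambient(battery) - ambient)
print(err.mean())  # mean absolute error in °C
```

A deployed service would additionally have to compensate for workload-dependent self-heating (CPU load, charging), which a static linear model ignores.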


We present the design and implementation of DIO, a novel digital-physical construction toolkit to enable constructionist learning for children aged 8-12 years. The toolkit comprises dome-shaped (D) tangible modules with various attachments that allow suspension on the bodies of multiple children and/or in the environment to support a variety of sensing/input (I) and actuation/output (O) functionalities. The modules are enabled for wireless communication and can be linked together using an Augmented Reality based programming interface running on a smartphone. The smartphone recognizes our hemispherical modules omnidirectionally through novel computer-vision-based 3D patterns, custom-made to provide logical as well as semantic encoding. In this paper, we show how, owing to its unique form factor, the toolkit enables multi-user constructions and offers children a shared learning experience. We further reflect on our learning from a one-year-long iterative design process and contribute a social-scaffolding-based procedure to engage K-12 children with such constructionist toolkits effectively.


Professional programmers are significantly outnumbered by end-users of software, making it problematic to predict the diverse, dynamic needs of these users in advance. An end-user development (EUD) approach, supporting the creation and modification of software independent of professional developers, is one potential solution.
EUD activities are applicable to the work practices of psychology researchers and clinicians, who increasingly rely on software for assessment of participants and patients, but must also depend on developers to realise their requirements.
In practice, however, the adoption of EUD technology by these two end-user groups is contingent on various contextual factors that are not well understood.
In this paper, we therefore establish recommendations for the design of EUD tools allowing non-programmers to develop apps to collect data from participants in their everyday lives, known as “experience sampling” apps.
We first present interviews conducted with psychology researchers and practising clinicians on their current working practices and motivation to adopt EUD tools. We then describe our observation of a chronic disease management clinic. Finally, we describe three case studies of psychology researchers using our EUD tool Jeeves to undertake experience sampling studies, and synthesise recommendations and requirements for tools allowing the EUD of experience sampling apps.


Two common approaches for automating IoT smart spaces are having users write rules using trigger-action programming (TAP) or training machine learning models based on observed actions. In this paper, we unite these approaches. We introduce and evaluate Trace2TAP, a novel method for automatically synthesizing TAP rules from traces (time-stamped logs of sensor readings and manual actuations of devices). We present a novel algorithm that uses symbolic reasoning and SAT-solving to synthesize TAP rules from traces. Compared to prior approaches, our algorithm synthesizes generalizable rules more comprehensively and fully handles nuances like out-of-order events. Trace2TAP also iteratively proposes modified TAP rules when users manually revert automations. We implemented our approach on Samsung SmartThings. Through formative deployments in ten offices, we developed a clustering/ranking system and visualization interface to intelligibly present the synthesized rules to users. We evaluated Trace2TAP through a field study in seven additional offices. Participants frequently selected rules ranked highly by our clustering/ranking system. Participants varied in their automation priorities, and they sometimes chose rules that would seem less desirable by traditional metrics like precision and recall. Trace2TAP supports these differing priorities by comprehensively synthesizing TAP rules and bringing humans into the loop during automation.


We present Rataplan, a robust and resilient pixel-based approach for linking multi-modal proxies to automated sequences of actions in graphical user interfaces (GUIs). With Rataplan, users demonstrate a sequence of actions and answer human-readable follow-up questions to clarify their desire for automation. After demonstrating a sequence, the user can link a proxy input control to the action which can then be used as a shortcut for automating a sequence. Alternatively, output proxies use a notification model in which content is pushed when it becomes available. As an example use case, Rataplan uses keyboard shortcuts and tangible user interfaces (TUIs) as input proxies, and TUIs as output proxies. Instead of relying on available APIs, Rataplan automates GUIs using pixel-based reverse engineering. This ensures our approach can be used with all applications that offer a GUI, including web applications. We implemented a set of important strategies to support robust automation of modern interfaces that have a flat and minimal style, have frequent data and state changes, and have dynamic viewports.


Earable computing is gaining attention within research and becoming ubiquitous in society. However, there is an emerging need for prototyping devices as critical drivers of innovation. In our work, we reviewed the features of existing earable platforms and, based on 24 publications, characterized the design space of earable prototyping. We used the open eSense platform (6-axis IMU, auditory I/O) to evaluate its usability for problem-based learning by non-experts, collecting data from 79 undergraduate students who developed 39 projects. Our questionnaire-based results suggest that the platform creates interest in the subject matter and supports self-directed learning. The projects align with the research space, indicating ease of use, but lack contributions on more challenging topics. Additionally, many projects included games, which are not present in current research. The average SUS score of the platform was 67.0. The majority of problems were technical issues (e.g., connecting, playing music).


Session 4B: ‘Health and Wellbeing II’
Wednesday, September 16, 2020 09:00-10:30 EDT

View Session 4B Presentations

With the recent proliferation of mobile health technologies, health scientists are increasingly interested in developing just-in-time adaptive interventions (JITAIs), typically delivered via notifications on mobile devices and designed to help users prevent negative health outcomes and adopt and maintain healthy behaviors. A JITAI involves a sequence of decision rules (i.e., treatment policies) that take the user's current context as input and specify whether and what type of intervention should be provided at the moment. In this work, we describe a reinforcement learning (RL) algorithm that continuously learns and improves the treatment policy embedded in the JITAI as data is collected from the user. This work is motivated by our collaboration on designing an RL algorithm for HeartSteps V2 based on data collected in HeartSteps V1. HeartSteps is a physical activity mobile health application. The RL algorithm developed in this work is being used in HeartSteps V2 to decide, five times per day, whether to deliver a context-tailored activity suggestion.
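The general pattern of such an RL algorithm, a posterior-sampling (Thompson sampling) contextual bandit over whether to intervene, can be sketched on synthetic data. The context features, reward model, priors, and all constants below are illustrative assumptions, not the actual HeartSteps V2 algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 3  # context features (e.g., recent steps, time of day; hypothetical)

# Bayesian linear model of reward (e.g., step-count increase) for each
# action: 0 = do nothing, 1 = send activity suggestion.
mu = {a: np.zeros(d) for a in (0, 1)}
prec = {a: np.eye(d) for a in (0, 1)}   # posterior precision (Sigma^-1)
xty = {a: np.zeros(d) for a in (0, 1)}
true_w = {0: np.zeros(d), 1: np.array([1.0, -0.5, 0.3])}  # simulated truth

sent = 0
for t in range(500):
    x = rng.normal(size=d)                       # current context
    # Thompson sampling: draw weights from each posterior, act greedily.
    draws = {a: rng.multivariate_normal(mu[a], np.linalg.inv(prec[a]))
             for a in (0, 1)}
    a = max((0, 1), key=lambda k: draws[k] @ x)
    reward = true_w[a] @ x + rng.normal(0, 0.1)  # simulated user response
    # Conjugate Bayesian linear-regression update for the chosen action.
    prec[a] += np.outer(x, x)
    xty[a] += reward * x
    mu[a] = np.linalg.solve(prec[a], xty[a])
    sent += a

print(sent, mu[1])
```

Over time the posterior for the "send" action concentrates around the true reward weights, so suggestions are increasingly delivered only in contexts where they are expected to help.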


Recent advancements in sensing techniques for mHealth applications have led to the successful development and deployment of several mHealth intervention designs, including Just-In-Time Adaptive Interventions (JITAIs). JITAIs show great potential because they aim to provide the right type and amount of support, at the right time. Timing the delivery of a JITAI such that the user is receptive and available to engage with the intervention is crucial for a JITAI to succeed. Although previous research has extensively explored the role of context in users' responsiveness towards generic phone notifications, it has not been thoroughly explored for actual mHealth interventions. In this work, we explore the factors affecting users' receptivity towards JITAIs. To this end, we conducted a study with 189 participants over a period of 6 weeks, in which participants received interventions to improve their physical activity levels. The interventions were delivered by a chatbot-based digital coach — Ally — available on the Android and iOS platforms.

We defined several metrics to gauge receptivity towards the interventions, and found that (1) several participant-specific characteristics (age, personality, and device type) show significant associations with participants' overall receptivity over the course of the study, and (2) several contextual factors (day/time, phone battery, phone interaction, physical activity, and location) show significant associations with participants' in-the-moment receptivity. Further, we explored the relationship between the effectiveness of the interventions and receptivity towards them; based on our analyses, we speculate that being receptive to interventions helped participants achieve their physical activity goals, which in turn motivated them to be more receptive to future interventions. Finally, we built machine-learning models to detect receptivity, with up to a 77% increase in F1 score over a biased random classifier.


Maximizing motor practice in stroke survivors’ living environments may significantly improve the functional recovery of their stroke-affected upper limb. A wearable system that can continuously monitor upper-limb performance has been considered an effective clinical solution for its potential to provide patient-centered, data-driven feedback that improves the motor dosage. Towards that end, we investigate a system leveraging a pair of finger-worn, ring-type accelerometers capable of monitoring both gross-arm and fine-hand movements that are clinically relevant to the performance of daily activities. In this work, we conduct a mixed-methods study to (1) quantitatively evaluate the efficacy of finger-worn accelerometers in measuring clinically relevant information regarding stroke survivors’ upper-limb performance, and (2) qualitatively investigate design requirements for the self-monitoring system, based on data collected from 25 stroke survivors and seven occupational therapists. Our quantitative findings demonstrate strong face and convergent validity of the finger-worn accelerometers and their responsiveness to changes in motor behavior. Our qualitative findings provide a detailed account of the current rehabilitation process while highlighting several challenges that therapists and stroke survivors face. This study offers promising directions for the design of a self-monitoring system that encourages use of the affected limb during stroke survivors’ daily living.


Estimating the category and quality of interpersonal relationships from ubiquitous phone sensor data matters for studying mental well-being and social support. Prior work focused on using communication volume to estimate broad relationship categories, often with small samples. Here we contextualize communications by combining phone logs with demographic and location data to predict interpersonal relationship roles in a varied sample population using automated machine learning methods, producing better performance (F1=0.68) than using communication features alone (F1=0.62). We also explore the effect of age variation in the underlying training sample on interpersonal relationship prediction and find that models trained on younger subgroups, a common practice in the field owing to student participation and recruitment, generalize poorly to the wider population. Our results not only illustrate the value of using data across demographics, communication patterns, and semantic locations for relationship prediction, but also underscore the importance of considering population heterogeneity in phone-based personal sensing studies.


Although self-tracking offers potential for a more complete, accurate, and longer-term understanding of personal health, many people struggle with or fail to achieve their goals for health-related self-tracking. This paper investigates how to address challenges that result from current self-tracking tools leaving a person’s goals for their data unstated and lacking explicit support. We examine supporting people and health providers in expressing and pursuing their tracking-related goals via goal-directed self-tracking, a novel method to represent relationships between tracking goals and underlying data. Informed by a reanalysis of data from a prior study of migraine tracking goals, we created a paper prototype to explore whether and how goal-directed self-tracking could address current disconnects between the goals people have for data in their chronic condition management and the tools they use to support such goals. We examined this prototype in interviews with 14 people with migraine and 5 health providers. Our findings indicate the potential for scaffolding goal-directed self-tracking to: 1) elicit different types and hierarchies of management and tracking goals; 2) help people prepare for all stages of self-tracking towards a specific goal; and 3) contribute additional expertise in patient-provider collaboration. Based on our findings, we present implications for the design of tools that explicitly represent and support an individual’s specific self-tracking goals.


Natural disasters cause long-lasting mental health problems such as PTSD in children. Following the 2011 Earthquake and Tsunami in Japan, we witnessed a shift in toy-block play behavior in young children who suffered from stress after the disaster. The behavior reflected their emotional responses to the traumatic event. In this paper, we explore the feasibility of using data captured from block play to assess children’s stress after a major natural disaster. We prototyped sets of sensor-embedded toy blocks, AssessBlocks, that automate quantitative play-data acquisition. Over a three-year period, the blocks were dispatched to fifty-two post-disaster children. Within a free play session, we captured block features, a child’s playing behavior, and stress evaluated by several methods. The results of our analysis reveal correlations between block-play features and stress measurements and show initial promise for using AssessBlocks to assess children’s stress after a disaster. We provide detailed insights into the potential as well as the challenges of our approach under these unique conditions. From these insights we summarize guidelines for future research on automated play-assessment systems that support children’s mental health.


Mobile food logging is important, but people find it tedious and difficult to do. Our work tackles the challenging aspect of searching a large food database on a small mobile screen. We describe the design of the EaT (Eat and Track) app with its Search-Accelerator to support searches over more than 6,000 foods. We designed a study to harness data from a large nutrition study to provide insights about the use and user experience of EaT. We report the results of our evaluation: a 12-participant lab study and a public health research field study in which 1,027 participants entered their nutrition intake for 3 days, logging 30,715 food items. We also analysed 1,163 user-created food entries from 670 participants to gain insights into the causes of failures in the food search. Our core contributions are: 1) the design and evaluation of EaT’s support for accurate and detailed food logging; 2) our study design that harnesses a nutrition research study to provide insights about timeliness of logging and the strengths and weaknesses of the search; 3) new performance benchmarks for mobile food logging.
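As a hedged illustration of how a search accelerator over a large food database might narrow results as the user types, here is a token-prefix index sketch; the data structure, names, and food entries are assumptions, not the actual EaT implementation:

```python
from collections import defaultdict

def build_index(foods, prefix_len=3):
    """Map the first prefix_len characters of each name token to food entries."""
    index = defaultdict(set)
    for food in foods:
        for token in food.lower().split():
            index[token[:prefix_len]].add(food)
    return index

def search(index, query, prefix_len=3):
    """Return foods where every query token is a prefix of some name token."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    results = None
    for token in tokens:
        candidates = index.get(token[:prefix_len], set())
        # keep only candidates that actually contain a token with this prefix
        matched = {f for f in candidates
                   if any(t.startswith(token) for t in f.lower().split())}
        results = matched if results is None else results & matched
    return results
```

With such an index, each keystroke narrows the candidate set instead of rescanning thousands of entries, which is the kind of behavior a small-screen search accelerator needs.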


We developed a contactless syndromic surveillance platform, FluSense, that aims to expand the current paradigm of influenza-like illness (ILI) surveillance by capturing crowd-level bio-clinical signals directly related to physical symptoms of ILI from hospital waiting areas in an unobtrusive and privacy-sensitive manner. FluSense comprises a novel edge-computing sensor system, models, and data-processing pipelines that track crowd behaviors and influenza-related indicators, such as coughs, to predict daily ILI and laboratory-confirmed influenza caseloads. FluSense uses a microphone array and a thermal camera along with a neural computing engine to passively and continuously characterize speech and cough activities, along with changes in crowd density, on the edge in real time. We conducted an IRB-approved, seven-month-long study from December 10, 2018 to July 12, 2019, in which we deployed FluSense in four public waiting rooms within the hospital of a large university. During this period, the FluSense platform collected and analyzed more than 350,000 waiting room thermal images and 21 million non-speech audio samples from the hospital waiting areas. FluSense can accurately predict daily patient counts with a Pearson correlation coefficient of 0.95. We compared signals from FluSense with the gold-standard laboratory-confirmed influenza case data obtained in the same facility and found that our sensor-based features are strongly correlated with laboratory-confirmed influenza trends.
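The 0.95 figure above is a Pearson correlation between sensor-derived predictions and observed daily counts. For reference, that statistic can be computed as below; this is a generic sketch, not FluSense code, and the sample data is made up:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value near +1 means the predicted and laboratory-confirmed daily series rise and fall together, which is what the abstract reports.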


More than one million people in the US suffer from hemianopia, which blinds one half of the visual field in both eyes. Hemianopic patients are often not aware of what they cannot see and frequently bump into walls, trip over objects, or walk into people on the side where their peripheral vision is diminished. We present an augmented-reality-based assistive technology that expands the peripheral vision of hemianopic patients at all distances. In a pilot trial, we evaluate the utility of this assistive technology for ten hemianopic patients. We measure and compare outcomes related to target identification and visual search in the participants. Improvements in target identification are noted in all participants, ranging from 18% to 72%. Similarly, all the participants benefit from the assistive technology in performing a visual search task, with an average increase of 24% in the number of successful searches compared to unaided trials. The proposed technology is the first instance of an electronic vision enhancement tool for hemianopic patients and is expected to maximize the residual vision and quality of life in this growing, yet largely overlooked population.


We present the design, implementation, and evaluation of a multi-sensor, low-power necklace, NeckSense, for automatically and unobtrusively capturing fine-grained information about an individual’s eating activity and eating episodes across an entire waking day in a naturalistic setting. NeckSense fuses and classifies the proximity of the necklace to the chin, the ambient light, the Lean Forward Angle, and energy signals to determine chewing sequences, a building block of eating activity. It then clusters the identified chewing sequences to determine eating episodes. We tested NeckSense on 11 participants with and 9 participants without obesity, across two studies in which we collected more than 470 hours of data in a naturalistic setting. Our results demonstrate that NeckSense enables reliable eating detection for individuals with diverse body mass index (BMI) profiles, across an entire waking day, even in free-living environments. Overall, our system achieves an F1-score of 81.6% in detecting eating episodes in an exploratory study. Moreover, our system achieves an F1-score of 77.1% for episodes even in an all-day free-living setting. With more than 15.8 hours of battery life, NeckSense will allow researchers and dietitians to better understand natural chewing and eating behaviors. In the future, researchers and dietitians can use NeckSense to provide appropriate real-time interventions when an eating episode is detected or when problematic eating is identified.
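One plausible reading of the step that clusters chewing sequences into eating episodes is gap-based grouping: chewing sequences separated by less than some maximum gap belong to the same episode. The sketch below is a simplification under that assumption; the 10-minute threshold is invented, not a parameter from the paper:

```python
def cluster_episodes(chew_times, max_gap_s=600):
    """Group chewing-sequence start times (seconds, any order) into episodes.

    Two consecutive chewing sequences belong to the same eating episode when
    they are at most max_gap_s apart; otherwise a new episode starts.
    Returns a list of episodes, each a list of start times.
    """
    episodes = []
    for t in sorted(chew_times):
        if episodes and t - episodes[-1][-1] <= max_gap_s:
            episodes[-1].append(t)
        else:
            episodes.append([t])
    return episodes
```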


Session 4C: ‘Human Activity Recognition I’
Wednesday, September 16, 2020 09:00-10:30 EDT

View Session 4C Presentations

Activity recognition (AR) and user recognition (UR) using wearable sensors are two key tasks in ubiquitous and mobile computing, and both still face challenging problems. For one thing, due to variations in how users perform activities, the performance of a well-trained AR model typically drops on new users. For another, existing UR models cannot cope with activity changes, as there are significant differences between the sensor data in different activity scenarios. To address these problems, we propose METIER (deep multi-task learning based activity and user recognition), a model that solves the AR and UR tasks jointly and transfers knowledge across them. User-related knowledge from the UR task helps the AR task model user characteristics, and activity-related knowledge from the AR task guides the UR task in handling activity changes. METIER softly shares parameters between the AR and UR networks and optimizes the two networks jointly. The commonalities and differences across tasks are exploited to promote both tasks simultaneously. Furthermore, a mutual attention mechanism is introduced to enable the AR and UR tasks to exploit their knowledge to highlight important features for each other. Experiments are conducted on three public datasets, and the results show that our model achieves competitive performance on both tasks.


Wearable sensors are increasingly becoming the primary interface for monitoring human activities. However, in order to scale human activity recognition (HAR) using wearable sensors to millions of users and devices, it is imperative that HAR computational models are robust against real-world heterogeneity in inertial sensor data. In this paper, we study the problem of wearing diversity, which pertains to the placement of the wearable sensor on the human body, and demonstrate that even state-of-the-art deep learning models are not robust against this factor. The core contribution of the paper lies in presenting a first-of-its-kind in-depth study of unsupervised domain adaptation (UDA) algorithms in the context of wearing diversity — we develop and evaluate three adaptation techniques on four HAR datasets to assess their relative performance in addressing the issue of wearing diversity. More importantly, we also carefully analyze the downsides of each UDA algorithm and uncover several implicit data-related assumptions without which these algorithms suffer a major degradation in accuracy. Taken together, our experimental findings caution against using UDA as a silver bullet for adapting HAR models to new domains; they serve as practical guidelines for HAR practitioners and pave the way for future research on domain adaptation in HAR.


Human activity recognition (HAR) aims at recognizing activities by training models on large quantities of sensor data. Since it is time-consuming and expensive to acquire abundant labeled data, transfer learning becomes necessary for HAR, transferring knowledge from existing domains. However, two challenges exist in cross-dataset activity recognition. The first is source domain selection: given a target task and several available source domains, it is difficult to determine how to select the source domain most similar to the target domain such that negative transfer can be avoided. The second is accurate activity transfer: after source domain selection, how to achieve accurate knowledge transfer between the selected source and the target domain remains a challenge. In this paper, we propose an Adaptive Spatial-Temporal Transfer Learning (ASTTL) approach to tackle both of the above challenges in cross-dataset HAR. ASTTL learns spatial features in transfer learning by adaptively evaluating the relative importance of the marginal and conditional probability distributions. Besides, it captures temporal features via incremental manifold learning. Therefore, ASTTL can learn adaptive spatial-temporal features for cross-dataset HAR and can be used for both source domain selection and accurate activity transfer. We evaluate the performance of ASTTL through extensive experiments on 4 public HAR datasets, which demonstrate its effectiveness. Furthermore, based on ASTTL, we design and implement an adaptive cross-dataset HAR system called the Client-Cloud Collaborative Adaptive Activity Recognition System (3C2ARS) to perform HAR in real environments. By collecting activities on the smartphone and transferring knowledge in the cloud, ASTTL can significantly improve the performance of source domain selection and accurate activity transfer.


Recently, significant efforts have been made to explore device-free human activity recognition techniques that utilize the information collected by existing indoor wireless infrastructures without requiring the monitored subject to carry a dedicated device. Most of the existing work, however, focuses on the analysis of the signal received by a single device. In practice, there are usually multiple devices “observing” the same subject. Each of these devices can be regarded as an information source and provides a unique “view” of the observed subject. Intuitively, if we can combine the complementary information carried by the multiple views, we will be able to improve the activity recognition accuracy. Towards this end, we propose DeepMV, a unified multi-view deep learning framework, to learn informative representations of heterogeneous device-free data. DeepMV can combine different views’ information weighted by the quality of their data and extract the commonness shared across different environments to improve recognition performance. To evaluate the proposed DeepMV model, we set up a testbed using commercial WiFi and acoustic devices. Experimental results show that DeepMV can effectively recognize activities and outperforms state-of-the-art human activity recognition methods.


The locomotor-respiratory coupling (LRC) ratio of a person doing exercise is an important parameter reflecting exercise safety and effectiveness. Existing approaches that can measure LRC either rely on specialized and costly devices or use heavy sensors, causing much inconvenience during exercise. To overcome these limitations, we propose ER-Rhythm, which uses low-cost, lightweight RFID tags attached to the human body to simultaneously extract and couple the exercise and respiration rhythms for LRC estimation. ER-Rhythm captures the exercise locomotion rhythm from the signals of the tags on the limbs. However, extracting the respiration rhythm from the signals of the tags on the chest during exercise is challenging because the minute respiration movement can be overwhelmed by the large torso movement. To address this challenge, we first leverage the unique characteristics of the human respiratory mechanism to measure chest movement while breathing, and then perform dedicated signal fusion of multiple tags interrogated by a pair of antennas to remove the effect of torso movement. In addition, we take advantage of the multi-path effect of RF signals to reduce the number of antennas needed for respiration pattern extraction, saving system cost. To couple the exercise and respiration rhythms, we adopt a correlation-based approach to facilitate LRC estimation. The experimental results show that LRC can be estimated accurately up to 92%-95% of the time.
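Conceptually, the LRC ratio is the number of strides per breath, so once both rhythms are extracted it can be estimated from their rates. The peak-counting estimator below is a deliberately crude stand-in for ER-Rhythm's correlation-based coupling, using invented signals:

```python
def count_peaks(signal):
    """Count local maxima above the signal mean (a crude rate estimate)."""
    mean = sum(signal) / len(signal)
    return sum(1 for i in range(1, len(signal) - 1)
               if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]
               and signal[i] > mean)

def lrc_ratio(step_signal, breath_signal):
    """Estimate strides per breath over the same time window."""
    steps, breaths = count_peaks(step_signal), count_peaks(breath_signal)
    return steps / breaths if breaths else float("nan")
```

For example, three step peaks against one breath peak over the same window yields an LRC ratio of 3:1.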


Wearable computing platforms, such as smartwatches and head-mounted mixed reality displays, demand new input devices for high-fidelity interaction. We present AuraRing, a wearable magnetic tracking system designed for tracking fine-grained finger movement. The hardware consists of a ring with an embedded electromagnetic transmitter coil and a wristband with multiple sensor coils. By measuring the magnetic fields at different points around the wrist, AuraRing estimates the five degree-of-freedom pose of the ring. We develop two different approaches to pose reconstruction—a first-principles iterative approach and a closed-form neural network approach. Notably, AuraRing requires no runtime supervised training, ensuring user and session independence. AuraRing has a resolution of 0.1 mm and a dynamic accuracy of 4.4 mm, as measured through a user evaluation with optical ground truth. The ring is completely self-contained and consumes just 2.3 mW of power.


Human activity recognition (HAR) plays an irreplaceable role in various applications and has been a prosperous research topic for years. Recent studies show significant progress in feature extraction (i.e., data representation) using deep learning techniques. However, they face significant challenges in capturing multi-modal spatial-temporal patterns from the sensory data, and they commonly overlook variations between subjects. We propose a Discriminative Adversarial MUlti-view Network (DAMUN) to address the above issues in sensor-based HAR. We first design a multi-view feature extractor to obtain representations of sensory data streams from temporal, spatial, and spatio-temporal views using convolutional networks. Then, we fuse the multi-view representations into a robust joint representation through a trainable Hadamard fusion module, and finally employ a Siamese adversarial network architecture to reduce the differences between the representations of different subjects. We have conducted extensive experiments under an iterative leave-one-subject-out setting on three real-world datasets and demonstrated both the effectiveness and the robustness of our approach.


Sensor data streams from wearable devices and smart environments are widely studied in areas like human activity recognition (HAR), person identification, and health monitoring. However, most previous work in activity and sensor stream analysis has focused on one aspect of the data, e.g., only recognizing the type of the activity or only identifying the person who performed it. We instead propose an approach that uses a weakly supervised multi-output siamese network that learns to map the data into multiple representation spaces, where each representation space focuses on one aspect of the data. The representation vectors of the data samples are positioned in the space such that data with the same semantic meaning in that aspect are located close to each other. Therefore, as demonstrated with a set of experiments, the trained model can provide metrics for clustering data based on multiple aspects, allowing it to address multiple tasks simultaneously and even to outperform single-task supervised methods in many situations. Additional experiments analyze in more detail the effect of the architecture and of using multiple tasks within this framework, investigate the scalability of the model to additional tasks, and demonstrate the framework’s ability to combine data for which only partial relationship information with respect to the target tasks is available.


Eyelid stickers are thin strips that temporarily create a crease when attached to the eyelid. The direct contact with the crease, which increases and decreases the pressure on the eyelid sticker, provides a novel opportunity for sensing blinking. We present Eslucent, an on-skin wearable capacitive sensing device that affords blink detection, building on the form factor of eyelid stickers. It consists of an art layer, conductive thread, and fiber eyelid stickers coated with conductive liquid; the device is applied onto the eyelid crease with adhesive temporary tattoo paper. In a user study with 14 participants, Eslucent detected blinks during intentional blinking and four involuntary activities using a falling-edge detection algorithm. The average precision was 82% and recall was 70%, with precision and recall exceeding 90% in intentional blinking. By embedding interactive technology into a daily beauty product, Eslucent explores a novel wearable form factor for blink detection.
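The detector keys on falling edges in the capacitive signal as the eyelid closes over the sticker. A minimal falling-edge sketch is below; the drop threshold and refractory window are illustrative assumptions, not the paper's parameters:

```python
def detect_blinks(samples, drop_threshold=5.0, refractory=3):
    """Return indices where the signal falls by more than drop_threshold
    between consecutive samples, suppressing re-triggers within `refractory`
    samples of the previous detection."""
    blinks, last = [], -refractory
    for i in range(1, len(samples)):
        if samples[i - 1] - samples[i] > drop_threshold and i - last >= refractory:
            blinks.append(i)
            last = i
    return blinks
```

In a real deployment the threshold would be calibrated per wearer, since sticker placement and skin contact change the capacitance baseline.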


The ubiquitous availability of wearable sensing devices has rendered large-scale collection of movement data a straightforward endeavor. Yet, annotation of these data remains a challenge, and as such, publicly available datasets for human activity recognition (HAR) are typically limited in size as well as in variability, which constrains HAR model training and effectiveness. We introduce masked reconstruction as a viable self-supervised pre-training objective for human activity recognition and explore its effectiveness in comparison to state-of-the-art unsupervised learning techniques. In scenarios with small labeled datasets, the pre-training results in improvements over end-to-end learning on two of the four benchmark datasets. This is promising because the pre-training objective can be integrated “as is” into state-of-the-art recognition pipelines to effectively facilitate improved model robustness, and thus, ultimately, lead to better recognition performance.
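The masked-reconstruction objective can be stated compactly: zero out random timesteps of a sensor window, ask the model to reconstruct them, and compute the loss only on the hidden positions. The sketch below illustrates the objective with a NumPy toy setup, not the paper's implementation; a real pipeline would reconstruct with a neural network rather than score a given array:

```python
import numpy as np

def mask_window(window, mask_frac=0.15, rng=None):
    """Zero out a random fraction of timesteps in a sensor window.

    Returns the corrupted window and the boolean per-timestep mask, so the
    loss can later be restricted to the masked positions.
    """
    rng = rng or np.random.default_rng(0)
    mask = rng.random(window.shape[0]) < mask_frac
    corrupted = window.copy()
    corrupted[mask] = 0.0
    return corrupted, mask

def masked_mse(original, reconstruction, mask):
    """Mean squared error computed only where the input was masked out."""
    if not mask.any():
        return 0.0
    diff = original[mask] - reconstruction[mask]
    return float(np.mean(diff ** 2))
```

Because the loss ignores visible positions, the model cannot satisfy the objective by copying its input; it has to infer the hidden samples from temporal context, which is what makes the pre-trained features transferable.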


Session 5A: ‘Driving and Transportation’
Wednesday, September 16, 2020 18:00-19:30 EDT

View Session 5A Presentations

Commutes provide an opportune time and space for interventions that mitigate stress, particularly stress accumulated during the workday. In this study, we test the efficacy and safety of haptic-guided slow-breathing interventions of short duration while driving. We also present design and experimental implications for evolving these interventions from prior simulator scenarios to moving-vehicle scenarios. We ran a controlled study (N=24) testing a haptic guided breathing system on a closed circuit under normal and stressful driving conditions. Results show the intervention to be successful in both user adoption and system effectiveness, with an 82% rate of engagement and a clear reduction of breathing rate and physiological arousal, with no effect on driving safety and minimal effect on performance. The haptic intervention received positive acceptance from the participants: all indicated a willingness to engage with the intervention in the future, and all rated the intervention as safe for traffic applications. The results of this study encourage further investigations exploring the use of the intervention on public roads and monitoring for longitudinal health benefits.


As a countermeasure to visual-manual distractions, auditory-verbal (voice) interfaces are becoming increasingly popular for in-vehicle systems. This opens up new opportunities for drivers to receive proactive personalized services from various service domains. However, prior studies warned that such interactions can cause cognitive distraction due to the nature of concurrent multitasking with a limited amount of cognitive resources. In this study, we examined (1) how the varying demands of proactive voice tasks under diverse driving situations impact driver interruptibility, and (2) how drivers adapt their concurrent multitasking of driving and proactive voice tasks, and how the adaptive behaviors are related to driver interruptibility. Our quantitative and qualitative analyses showed that in addition to the driving-task demand, the voice-task demand and adaptive behaviors are also significantly related to driver interruptibility. Additionally, we discuss how our findings can be used to design and realize three types of flow-control mechanisms for voice interactions that can improve driver interruptibility.


With the trend of vehicles becoming increasingly connected and potentially autonomous, vehicles are being equipped with rich sensing and communication devices. Various vehicular services based on shared real-time sensor data from fleet vehicles have been proposed to improve urban efficiency, e.g., HD live maps and traffic accident recovery. However, due to the high cost of data uploading (e.g., monthly fees for a cellular network), it would be impractical to have all well-equipped vehicles upload real-time sensor data constantly. To better utilize these limited uploading resources and achieve optimal road-segment sensing coverage, we present a real-time sensing task scheduling framework, i.e., RISC, for resource-constrained urban sensing, which schedules the sensing tasks of sensor-equipped commercial vehicles based on the predictability of the vehicles’ mobility patterns. In particular, we utilize commercial vehicles, including taxicabs, buses, and logistics trucks, as mobile sensors to sense urban phenomena, e.g., traffic, by using the equipped vehicular sensors, e.g., dash-cam, lidar, automotive radar, etc.
We implement RISC in the Chinese city of Shenzhen with one month of real-world data from (i) a taxi fleet with 14 thousand vehicles; (ii) a bus fleet with 13 thousand vehicles; and (iii) a truck fleet with 4 thousand vehicles. Further, we design an application, i.e., tracking suspect vehicles (e.g., hit-and-run vehicles), to evaluate the performance of RISC on the urban sensing aspect based on the data from a regular vehicle (i.e., personal car) fleet with 11 thousand vehicles. The evaluation results show that, compared to state-of-the-art solutions, we improved sensing coverage (i.e., the number of road segments covered by sensing vehicles) by 10% on average.


Road safety is a major public health issue across the globe, and over two-thirds of road accidents occur at nighttime under low-light conditions or darkness. The state of the driver and her/his actions are the key factors impacting road safety. How can we monitor these in a cost-effective manner and in low-light conditions? RGB cameras present in smartphones perform poorly in low-light conditions due to the lack of information captured. Hence, existing monitoring solutions rely upon specialized hardware such as infrared or thermal cameras in low-light conditions, but are limited to high-end vehicles owing to the cost of the hardware. We present InSight, a windshield-mounted smartphone-based system that can be retrofitted to the vehicle to monitor the state of the driver, specifically driver fatigue (based on frequent yawning and eye closure) and driver distraction (based on the direction of gaze). Challenges arise from designing an accurate, yet low-cost and non-intrusive system to continuously monitor the state of the driver.

In this paper, we present two novel and practical approaches for continuous driver monitoring in low-light conditions: (i) Image synthesis: enabling monitoring in low-light conditions using just the smartphone RGB camera by synthesizing a thermal image from RGB with a Generative Adversarial Network, and (ii) Near-IR LED: using a low-cost near-IR (NIR) LED attachment to the smartphone, where the NIR LED acts as a light source to illuminate the driver’s face; this light is not visible to the human eye but can be captured by standard smartphone cameras without any specialized hardware. We show that the proposed techniques can capture the driver’s face accurately in low-light conditions to monitor the driver’s state. Further, since NIR and thermal imagery are significantly different from RGB images, we present a systematic approach to generate labelled data, which is used to train existing computer vision models. We present an extensive evaluation of both approaches with data collected from 15 drivers in a controlled basement area and on real roads in low-light conditions. The proposed NIR LED setup has an accuracy (F1-score) of 85% and 93.8% in detecting driver fatigue and distraction, respectively, in low light.


Ultra-Wideband (UWB) is a popular technology for high-accuracy localization, asset tracking, and access-control applications. Due to its accurate ranging and robustness to relay attacks, car manufacturers are upgrading keyless-entry infrastructure to UWB. As car occupancy monitoring is an essential step in supporting regulatory requirements and providing a customized user experience, we build CarOSense to explore the possibility of reusing the UWB keyless infrastructure as an orthogonal sensing modality to detect per-seat car occupancy. CarOSense uses a novel deep learning model, MaskMIMO, to learn spatial/temporal features with 2D convolutions and per-seat attention via a multi-task mask. We collect UWB data from 10 car locations with up to 16 occupancy states in each location. We implement CarOSense as a cross-platform demo and evaluate it in 15 different scenarios, including a leave-one-out test on unknown car locations and a stress test on unseen scenarios. Results show that the average accuracy is 94.4% for the leave-one-out test and 87.0% for the stress test. CarOSense is robust across a large set of untrained scenarios with a model trained on a small amount of training data. We also benchmark the computation cost and demonstrate that CarOSense is lightweight and runs smoothly in real time on embedded devices.


Origin-destination (OD) travel time estimation is of paramount importance for applications such as intelligent transportation. In this work, we propose a new solution for OD travel time estimation using road surveillance camera data. The surveillance information provides accurate and reliable observations at camera-equipped intersections, but is associated with missing and incomplete records at camera-free intersections. To overcome this, we propose a modified version of multi-layer graph convolutional networks (GCNs). The camera surveillance data is used to extract the traffic flow of each intersection; the extracted information serves as the input of the multi-layer GCN-based model, from which the real-time traffic status can be predicted. To enhance the estimation accuracy, we model the effects of various features on travel time estimation with encoder-decoder networks and embedding techniques. We further improve the generalization of our model by using multi-task learning. Extensive experiments on real datasets verify the effectiveness of our proposals.
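For reference, the propagation rule of a single graph-convolution layer of the kind stacked in such a model (the widely used Kipf-and-Welling form) can be sketched as below; the two-intersection road graph and all values are invented for illustration:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN layer: H = ReLU(D^-1/2 (A + I) D^-1/2 X W).

    A: adjacency matrix of the intersection graph,
    X: per-intersection features (e.g., extracted traffic flow),
    W: learnable weight matrix.
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)                   # degree of each node
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalization
    H = D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W
    return np.maximum(H, 0.0)               # ReLU
```

Each layer mixes every intersection's features with those of its neighbors, which is what lets camera-equipped intersections propagate information to camera-free ones.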


Recent studies have proposed to use the Channel State Information (CSI) of the WiFi wireless channel for human gesture recognition. As an important application, CSI-based driver activity recognition in passenger vehicles has received increasing research attention. However, a serious limitation of almost all existing WiFi-based recognition solutions is that they can only recognize the activity of a single person at a time, because the activities of other people (if performed at the same time) interfere with the WiFi signals. In sharp contrast, there are often one or more passengers in a vehicle.
In this paper, we propose CARIN, CSI-based driver Activity Recognition under the INterference of passengers. CARIN features a combination-based solution that profiles all the possible activity combinations of the driver and (one or more) passengers in offline training and then performs recognition online. To address the possible combinatorial explosion, we first leverage in-car pressure sensors to significantly reduce the number of combinations, because there are only limited seating options in a passenger vehicle. We then formulate a distance minimization problem for fast runtime recognition. In addition, a period analysis methodology is designed based on the kNN classifier to recognize activities that consist of a sequence of body movements, like continuous head nodding due to driver fatigue. Our results in a real car with 3,000 real-world traces show that CARIN achieves an overall F1 score of 90.9% and outperforms the three state-of-the-art solutions by 32.2%.
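CARIN's runtime recognition step reduces to a nearest-profile search over the combinations that remain after pressure-sensor pruning. The sketch below is illustrative only: the function and profile names are hypothetical, and a plain Euclidean distance stands in for the paper's actual distance formulation.

```python
import numpy as np

def recognize(csi_feature, profiles, occupied_seats):
    """Pick the activity combination whose stored profile is closest to the
    observed CSI feature vector (illustrative sketch of distance minimization).

    profiles:       dict mapping (combination_label, seats) -> mean feature vector
    occupied_seats: seat occupancy reported by in-car pressure sensors, used to
                    prune the candidate combinations before matching
    """
    # keep only combinations consistent with the observed seat occupancy
    candidates = {label: vec for (label, seats), vec in profiles.items()
                  if seats == occupied_seats}
    # distance minimization over the pruned candidate set
    return min(candidates,
               key=lambda label: np.linalg.norm(csi_feature - candidates[label]))
```

Pruning by occupancy first keeps the online search small even when the full offline profile set covers every driver/passenger combination.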


Our society is witnessing a rapid taxi electrification process. Compared to conventional gas taxis, a key drawback of electric taxis is their prolonged charging time, which potentially reduces drivers’ daily operation time and income. In addition, insufficient charging stations, intensive charging peaks, and drivers’ heuristic-based choice of charging stations also significantly decrease the charging efficiency of electric taxi charging networks. To improve the charging efficiency (e.g., reduce queuing time in stations) of electric taxi charging networks, in this paper we design a fairness-aware Pareto-efficient charging recommendation system called FairCharge, which aims to minimize the total charging idle time (traveling time + queuing time) in a fleet-oriented fashion combined with fairness constraints. Different from existing works, FairCharge considers fairness as a constraint to potentially achieve long-term social benefits. In addition, FairCharge considers not only current charging requests, but also possible charging requests of other nearby electric taxis in the near future.
More importantly, we simulate and evaluate FairCharge with real-world streaming data from the Chinese city Shenzhen, including GPS data and transaction data from more than 16,400 electric taxis, coupled with the data of 117 charging stations, which constitute, to our knowledge, the largest electric taxi network in the world.
The extensive experimental results show that our fairness-aware FairCharge effectively reduces the queuing time and idle time of the Shenzhen electric taxi fleet by 80.2% and 67.7%, respectively.


Session 5B: ‘Speech interaction + Fabrication’
Wednesday, September 16, 2020 18:00-19:30 EDT

View Session 5B Presentations

A recent topic of considerable interest in the “smart building” community involves building interactive devices using sensors and rapidly creating these objects using new fabrication methods. However, much of this work has been done at what might be called hand scale, with less attention paid to larger objects and structures (at furniture or room scales) despite the fact that we are very often literally surrounded by such objects. In this work, we present a new set of techniques for creating interactive objects at these scales. We demonstrate the fabrication of both input sensors and displays directly into cast materials — those formed from a liquid or paste which solidifies in a mold; including, for example: concrete, plaster, polymer resins, and composites.

Through our novel set of sensing and fabrication techniques, we enable human activity recognition at room scale and across a variety of materials. Our techniques create objects that appear the same as typical passive objects, but contain internal fiber optics for both input sensing and simple displays. We use a new fabrication device to inject optical fibers into CNC-milled molds. Fiber Bragg Grating optical sensors configured as very sensitive vibration sensors are embedded in these objects. These require no internal power, can be placed at multiple locations along a single fiber, and can be interrogated from the end of the fiber. We evaluate the performance of our system by creating two full-scale application prototypes: an interactive wall and an interactive table. With these prototypes, we demonstrate the ability of our system to sense a variety of human activities across eight different users. Our tests show that with suitable materials these sensors can detect and classify both direct interactions (such as tapping) and more subtle vibrations caused by activities such as walking across the floor nearby.


We propose Silver Tape, a simple yet novel fabrication technique to transfer inkjet-printed silver traces from paper onto versatile substrates, without time- and space-consuming processes such as screen printing or heat sintering. This allows users to quickly implement silver traces with a variety of properties by exploiting a wide range of substrates. For instance, high flexibility can be achieved with Scotch tape, high transparency with polydimethylsiloxane (PDMS), heat durability with Kapton polyimide tape, water solubility with 3M water-soluble tape, and beyond. Many of these properties are not achievable with the conventional substrates used for inkjet-printing conductive traces. Specifically, our technique leverages the commonly undesired low-adhesion property of inkjet printing films and repurposes these films as temporary transfer media. We describe our fabrication methods with a library of materials we can utilize, evaluate the mechanical and electrical properties of the transferred traces, and conclude with several demonstrative applications. We believe Silver Tape enriches novel interactions for the ubiquitous computing domain by enabling digital fabrication of electronics on versatile materials, surfaces, and shapes.


Rapid prototyping and fast manufacturing processes are critical drivers for implementing wearable devices. This paper shows an exemplary method for building flexible, fully elastomeric, vibrotactile electromagnetic actuators based on the Lorentz force law.
It also introduces the design parameters required for well-functioning actuators and studies their properties. The crucial element of the actuator is a helical planar coil manufactured from “capillary” silver TPU (thermoplastic polyurethane), an ultra-stretchable conductor. This paper leverages this novel material to manufacture soft vibration actuators in fewer and simpler steps than previous approaches. Best practices and procedures for building a wearable actuator are reported. We show that the dimensions of the actuators are easily configurable and that they can be printed in batch-size-one using 3D printing. The actuators can be attached directly to the skin, as all the components of FLECTILE are made from biocompatible polymers. Tests of the driving properties confirm that the actuator can reach a broad frequency range of up to 200 Hz while requiring only a small voltage (5 V). A user study showed that the actuator’s vibrations are well perceivable by six study participants under observing, hovering, and resting conditions.


Dental braces are a semi-permanent dental treatment in direct contact with our metabolism (saliva), the food and liquids we ingest, and our environment while we smile or talk. This paper introduces braceIO, biochemical ligatures on dental braces that change color depending on saliva concentration levels (pH, nitric oxide, and uric acid) and can be read by an external device. This work presents our fabrication process for the ligatures and the external device, and a technical evaluation of the absorption time, colorimetric measurement tests, and the mapping from color to biosensor level in the app. This project aims to maintain the shape, wearability, and aesthetics of traditional ligatures while embedding biosensors. We propose a novel device that senses metabolic changes, with a different biosensor ligature worn on each tooth to access multiple biodata streams and create seamless interactive devices.


Over the past few years, the technological vision of the HCI and UbiComp communities regarding conversational devices has become manifest in the form of smart speakers such as Google Home and Amazon Echo. Even though millions of households have adopted and integrated these devices into their daily lives, we lack a deep understanding of how different members of a household use such devices. To this end, we conducted interviews with 18 families and collected their Google Home Activity logs to understand the usage patterns of adults and children. Our findings reveal that there are substantial differences in the ways smart speakers are used by adults and children in families over an extended period of time. We report on how parents influence children’s use and how different users perceive the devices. Finally, we discuss the implications of our findings and provide guidelines for improving the design of future smart speakers and conversational agents.


With the rapid growth of artificial intelligence and mobile computing, intelligent speech interfaces have recently become a prevalent trend and have already shown huge potential to the public. To address the privacy leakage issue during speech interaction or to accommodate special demands, silent speech interfaces have been proposed to enable people to communicate without vocalizing sound (e.g., lip reading, tongue tracking). However, most existing silent speech mechanisms require either background illumination or additional wearable devices. In this study, we propose EchoWhisper as a novel, user-friendly, smartphone-based silent speech interface. The proposed technique takes advantage of the micro-Doppler effect of the acoustic wave resulting from mouth and tongue movements and assesses the acoustic features of beamformed reflected echoes captured by the dual microphones of the smartphone. Using human subjects performing a daily conversation task with over 45 different words, our system achieves a word error rate (WER) of 8.33%, which shows the effectiveness of inferring silent speech content. Moreover, EchoWhisper demonstrates reliability and robustness to a variety of configuration settings and environmental factors, such as smartphone orientation and distance, ambient noise, and body motion.


Smart speakers, which wait for voice commands and complete tasks for users, are becoming part of common households. While voice commands offered only basic functionality in the early days, as the market grew, commands with critical functionality were developed, e.g., accessing banking services, sending money, or opening the front door. Such voice commands can cause serious consequences once smart speakers are attacked. Recent research shows that smart speakers are vulnerable to malicious voice commands sent from other speakers (e.g., TV, baby monitor, radio) in the same area. In this work, we propose SPEAKER-SONAR, a sonar-based liveness detection system for smart speakers. Our approach aims to protect smart speakers from remote attackers that leverage network-connected speakers to send malicious commands. The key idea of our approach is to make sure that the voice command is indeed coming from the user. For this purpose, SPEAKER-SONAR emits an inaudible sound and tracks the user’s direction to compare it with the direction of the received voice command. SPEAKER-SONAR does not require additional action from the user and works through an automatic consistency check. We built SPEAKER-SONAR on a Raspberry Pi 3B, a circular microphone array, and a commodity speaker, imitating the Amazon Echo. Our evaluation shows that SPEAKER-SONAR can reject remote voice attacks with an average accuracy of 95.5% within 2 meters, which significantly raises the bar for remote attackers. To the best of our knowledge, our defense is able to defend against known remote voice attack techniques.
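The automatic consistency check at the core of this defense can be illustrated with a minimal sketch: a command is accepted only when the direction of arrival of the voice agrees with the sonar-tracked user direction. The angle representation and the tolerance value below are assumptions for illustration, not the paper's parameters.

```python
def consistent(voice_doa_deg, sonar_user_deg, tolerance_deg=15.0):
    """Accept a voice command only if its direction of arrival (DOA) matches
    the user direction tracked by the inaudible sonar.
    Angles are compared on a circle, so 359 deg and 1 deg are 2 deg apart."""
    diff = abs(voice_doa_deg - sonar_user_deg) % 360.0
    diff = min(diff, 360.0 - diff)  # shortest angular distance
    return diff <= tolerance_deg
```

A command played by a TV or radio located away from the user would arrive from a direction that fails this check, so it is rejected without any extra action from the user.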


Many consumers now rely on different forms of voice assistants, both stand-alone devices and those built into smartphones. Currently, these systems react to specific wake-words, such as “Alexa,” “Siri,” or “Ok Google.” However, with advancements in natural language processing, the next generation of voice assistants could instead always listen to the acoustic environment and proactively provide services and recommendations based on conversations without being explicitly invoked. We refer to such devices as “always listening voice assistants” and explore expectations around their potential use. In this paper, we report on a 178-participant survey investigating the potential services people anticipate from such a device and how they feel about sharing their data for these purposes. Our findings reveal that participants can anticipate a wide range of services pertaining to a conversation; however, most of these services are very similar to those that existing voice assistants already provide with explicit commands. Participants are more likely to consent to sharing a conversation when they do not find it sensitive, when they are comfortable with the service and find it beneficial, and when they already own a stand-alone voice assistant. Based on our findings, we discuss the privacy challenges in designing an always-listening voice assistant.


Users of voice assistants often report that they fall into patterns of using their device for a limited set of interactions, like checking the weather and setting alarms. However, it’s not clear if limited use is, in part, due to lack of learning about the device’s functionality. We recruited 10 diverse families to participate in a one-month deployment study of the Echo Dot, enabling us to investigate: 1) which features families are aware of and engage with, and 2) how families explore, discover, and learn to use the Echo Dot. Through audio recordings of families’ interactions with the device and pre- and post-deployment interviews, we find that families’ breadth of use decreases steadily over time and that families learn about functionality through trial and error, asking the Echo Dot about itself, and through outside influencers such as friends and family. Formal outside learning influencers, such as manufacturer emails, are less influential. Drawing from diffusion of innovation theory, we describe how a home-based voice interface might be positioned as a near-peer to the user, and that by describing its own functionality using just-in-time learning, the home-based voice interface becomes a trustworthy learning influencer from which users can discover new functionalities.


The popularity of conversational agents (CAs) in the form of AI speakers that support ubiquitous smart homes has increased because of their seamless interaction. However, recent studies have revealed that the use of AI speakers decreases over time, which shows that current agents do not fully support smart homes. Because of this problem, the possibility of unobtrusive, invisible intelligence without a physical device has been suggested. To explore CA design direction that enhances the user experience in smart homes, we aimed to understand each feature by comparing an invisible agent with visible ones embedded in stand-alone AI speakers. We conducted a drawing study to examine users’ mental models formed through communicating with two different physical entities (i.e., visible and invisible CAs). From the drawings, interviews, and surveys, we identified how users’ mental models and interactions differed depending on the presence of a physical entity. We found that a physical entity affected users’ perceptions, expectations, and interactions toward the agent.


Session 5C: ‘Human Activity Recognition II’
Wednesday, September 16, 2020 18:00-19:30 EDT

View Session 5C Presentations

The paper enhances deep-neural-network-based inference in sensing applications by introducing a lightweight attention mechanism called the global attention module for multi-sensor information fusion. This mechanism is capable of utilizing information collected from higher layers of the neural network to selectively amplify the influence of informative features and suppress unrelated noise at the fusion layer. We successfully integrate this mechanism into a new end-to-end learning framework, called GlobalFusion, where two global attention modules are deployed for spatial fusion and sensing modality fusion, respectively. Through an extensive evaluation on four public human activity recognition (HAR) datasets, we successfully demonstrate the effectiveness of GlobalFusion at improving information fusion quality. The new approach outperforms the state-of-the-art algorithms on all four datasets with a clear margin. We also show that the learned attention weights agree well with human intuition. We then validate the efficiency of GlobalFusion by testing its inference time and energy consumption on commodity IoT devices. Only a negligible overhead is induced by the global attention modules.
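The global attention idea can be sketched as a gating step at the fusion layer: a context vector drawn from higher network layers scores each sensor's local features, and a softmax over those scores amplifies informative sensors before summation. The shapes and the bilinear scoring function below are assumptions for illustration, not GlobalFusion's exact design.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def global_attention_fusion(sensor_feats, global_ctx, W):
    """Fuse per-sensor features weighted by attention against a global context.

    sensor_feats: (num_sensors, feat_dim) local features, one row per sensor
    global_ctx:   (ctx_dim,) summary vector from higher layers of the network
    W:            (feat_dim, ctx_dim) learned bilinear scoring matrix
    """
    scores = sensor_feats @ W @ global_ctx  # one relevance scalar per sensor
    weights = softmax(scores)               # amplify informative sensors,
                                            # suppress unrelated noise
    return weights @ sensor_feats           # attention-weighted sum over sensors
```

Because the weights come from higher-layer information rather than the raw inputs, the fusion layer can down-weight a sensor whose current readings are uninformative for the activity being inferred.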


Wearable-based human-computer interaction is a promising technology to enable various applications. This paper aims to track the 3D posture of the entire limb, both wrist/ankle and elbow/knee, of a user wearing a smart device. This limb tracking technology can trace the geometric motion of the limb without introducing any training stage, which is usually required in gesture recognition approaches. Moreover, the tracked limb motion can also be used as a generic input for gesture-based applications. The 3D posture of a limb is defined by the relative positions among the main joints, e.g., shoulder, elbow, and wrist for an arm. When a smartwatch is worn on the wrist of a user, its position is affected by both elbow and shoulder motions. It is challenging to infer the entire 3D posture when only given a single point of sensor data from the smartwatch. In this paper, we propose LimbMotion, an accurate and real-time limb tracking system. The performance gain of LimbMotion comes from multiple key technologies, including an accurate attitude estimator based on a novel two-step filter, fast acoustic ranging, and point-cloud-based positioning. We implemented LimbMotion and evaluated its performance using extensive experiments, including different gestures, moving speeds, users, and limbs. Results show that LimbMotion achieves real-time tracking with a median error of 7.5cm to 8.9cm, which outperforms the state-of-the-art approach by about 32%.


Smart homes of the future are envisioned to have the ability to recognize many types of home activities such as running a washing machine, flushing the toilet, and using a microwave. In this paper, we present a new sensing technology, VibroSense, which is able to recognize 18 different types of activities throughout a house by observing structural vibration patterns on a wall or ceiling using a laser Doppler vibrometer. The received vibration data is processed and sent to a deep neural network which is trained to distinguish between 18 activities. We conducted a system evaluation, where we collected data of 18 home activities in 5 different houses for 2 days in each house. The results demonstrated that our system can recognize 18 home activities with an average accuracy of up to 96.6%. After re-setup of the device on the second day, the average recognition accuracy decreased to 89.4%. We also conducted follow-up experiments, where we evaluated VibroSense under various scenarios to simulate real-world conditions. These included simulating online recognition, differentiating between specific stages of a device’s activity, and testing the effects of shifting the laser’s position during re-setup. Based on these results, we discuss the opportunities and challenges of applying VibroSense in real-world applications.


The development and validation of computational models to detect daily human behaviors (e.g., eating, smoking, brushing) using wearable devices requires labeled data collected from the natural field environment, with tight time synchronization of the micro-behaviors (e.g., start/end times of hand-to-mouth gestures during a smoking puff or an eating gesture) and the associated labels. Video data is increasingly being used for such label collection. Unfortunately, wearable devices and video cameras with independent (and drifting) clocks make tight time synchronization challenging. To address this issue, we present the Window Induced Shift Estimation method for Synchronization (SyncWISE). We demonstrate the feasibility and effectiveness of our method by synchronizing the timestamps of a wearable camera and wearable accelerometer from 163 videos representing 45.2 hours of data from 21 participants enrolled in a real-world smoking cessation study. Our approach shows significant improvement over the state-of-the-art, even in the presence of high data loss, achieving 91% synchronization accuracy given a synchronization tolerance of 700 milliseconds. Our method also achieves state-of-the-art synchronization performance on the CMU-MMAC dataset.
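At its core, this style of synchronization searches for the clock offset that maximizes the cross-correlation between a signal derived from the video (e.g., motion magnitude) and the accelerometer stream. The following single-window sketch illustrates that underlying idea only; SyncWISE itself samples many windows and aggregates their shift estimates, which is what gives it robustness to data loss.

```python
import numpy as np

def estimate_offset(a, b, max_shift):
    """Estimate the shift (in samples) that best aligns signal b to signal a
    by maximizing the correlation over a range of candidate shifts."""
    # z-normalize so the dot product behaves like a correlation score
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    n = len(a) - max_shift  # overlap length common to all candidate shifts

    def score(s):
        if s >= 0:
            return float(np.dot(a[:n], b[s:s + n]))
        return float(np.dot(a[-s:-s + n], b[:n]))

    return max(range(-max_shift, max_shift + 1), key=score)
```

With both streams resampled to a common rate, the returned sample offset converts directly into the millisecond correction applied to the wearable's timestamps.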


Extracting informative and meaningful temporal segments from high-dimensional wearable sensor data, smart devices, or IoT data is a vital preprocessing step in applications such as Human Activity Recognition (HAR), trajectory prediction, gesture recognition, and lifelogging. In this paper, we propose ESPRESSO (Entropy and ShaPe awaRe timE-Series SegmentatiOn), a hybrid segmentation model for multi-dimensional time-series that is formulated to exploit the entropy and temporal shape properties of time-series. ESPRESSO differs from existing methods that focus exclusively upon particular statistical or temporal properties of time-series. As part of model development, a novel temporal representation of time-series, WCAC, was introduced along with a greedy search approach that estimates segments based upon the entropy metric. ESPRESSO was shown to offer superior performance to four state-of-the-art methods across seven public datasets of wearable and wear-free sensing. In addition, we undertake a deeper investigation of these datasets to understand how ESPRESSO and its constituent methods perform with respect to different dataset characteristics. Finally, we provide two interesting case studies showing how ESPRESSO can assist in inferring daily activity routines and the emotional state of humans.


This paper presents a robust unsupervised method for recognizing factory work using sensor data from body-worn acceleration sensors.
In line-production systems, each factory worker repetitively performs a predefined work process with each process consisting of a sequence of operations.
Because of the difficulty in collecting labeled sensor data from each factory worker, unsupervised factory activity recognition has been attracting attention in the ubicomp community.
However, prior unsupervised factory activity recognition methods can be adversely affected by any outlier activities performed by the workers.
In this study, we propose a robust factory activity recognition method that tracks frequent sensor data motifs that appear in each iteration of the work process and can correspond to particular actions performed by the workers.
Specifically, this study proposes tracking two types of motifs: period motifs and action motifs, during the unsupervised recognition process.
A period motif is a unique data segment that occurs only once in each work period (one iteration of an overall work process).
An action motif is a data segment that occurs several times in each work period, corresponding to an action that is performed several times in each period.
Tracking multiple period motifs enables us to roughly capture the temporal structure and duration of the work period even when outlier activities occur.
Action motifs, which are spread throughout the work period, permit us to precisely detect the start time of each operation.
We evaluated the proposed method using sensor data collected from workers in actual factories and achieved state-of-the-art performance.


The lack of large-scale, labeled data sets impedes progress in developing robust and generalized predictive models for on-body sensor-based human activity recognition (HAR). Labeled data in human activity recognition is scarce and hard to come by, as sensor data collection is expensive, and the annotation is time-consuming and error-prone. To address this problem, we introduce IMUTube, an automated processing pipeline that integrates existing computer vision and signal processing techniques to convert videos of human activity into virtual streams of IMU data. These virtual IMU streams represent accelerometry at a wide variety of locations on the human body. We show how the virtually-generated IMU data improves the performance of a variety of models on known HAR datasets. Our initial results are very promising, but the greater promise of this work lies in a collective approach by the computer vision, signal processing, and activity recognition communities to extend this work in ways that we outline. This should lead to on-body, sensor-based HAR becoming yet another success story in large-dataset breakthroughs in recognition.


This paper presents WiPolar, an approach that simultaneously tracks multiple people using commodity WiFi devices. While two recent papers have also demonstrated multi-person tracking using commodity devices, they either require the people to continuously keep moving without stopping, and/or require the number of people to be input manually, and/or prevent the WiFi devices from performing their primary function of data communication. Motivated by the increasing availability of polarized antennas on modern WiFi devices, WiPolar leverages signal polarization to perform accurate multi-person tracking using commodity devices while addressing the three limitations of prior work mentioned above. The key insight that WiPolar is based on is that different people expose different instantaneous horizontal and vertical radar cross-sections to WiFi transmitters due to differences in their physiques and orientations with respect to the transmitter. This enables WiPolar to accurately separate the multipaths reflected from different people, which, in turn, allows it to track them simultaneously. To the best of our knowledge, this is the first work that leverages polarization of WiFi signals to localize and track people. We implemented WiPolar using commodity WiFi devices and extensively evaluated it for tracking up to five people in three different environments. Our results show that WiPolar achieved a median tracking error of just 56cm across all experiments. It also accurately tracked people even when they were not moving, achieving a median tracking error of 74cm for people that were either stationary or just taking a small pause.


Our ability to exploit low-cost wearable sensing modalities for critical human behaviour and activity monitoring applications in health and wellness is reliant on supervised learning regimes; here, deep learning paradigms have proven extremely successful in learning activity representations from annotated data. However, the costly work of gathering and annotating sensory activity datasets is labor intensive, time consuming and not scalable to large volumes of data. While existing unsupervised remedies of deep clustering leverage network architectures and optimization objectives that are tailored for static image datasets, deep architectures to uncover cluster structures from raw sequence data captured by on-body sensors remains largely unexplored. In this paper, we develop an unsupervised end-to-end learning strategy for the fundamental problem of human activity recognition (HAR) from wearables. Through extensive experiments, including comparisons with existing methods, we show the effectiveness of our approach to jointly learn unsupervised representations for sensory data and generate cluster assignments with strong semantic correspondence to distinct human activities.


Transfer learning is becoming increasingly important to the human activity recognition community, as it enables algorithms to reuse what has already been learned by other models. It promises shortened training times and improved classification results for new datasets and activity classes. However, the question of what exactly is transferred is not dealt with in detail in many recent publications, and it is furthermore often difficult to reproduce the presented results. With this paper, we would therefore like to contribute to the understanding of transfer learning for sensor-based human activity recognition.
In our experiments, we use weight transfer to transfer models between two datasets, as well as between sensors from the same dataset. PAMAP2 and Skoda Mini Checkpoint are used as source and target datasets. The network architecture is based on a DeepConvLSTM.
Our investigation shows that transfer learning has to be considered in a very differentiated way, since the desired positive effects of applying the method depend strongly on both the data and the architecture used.
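Weight transfer of the kind described above can be illustrated with a minimal sketch in which models are represented simply as name-to-array dictionaries (hypothetical names, not the paper's code): parameters are copied where names and shapes match, while the classifier head, whose size depends on the target's activity classes, keeps its fresh initialization.

```python
import numpy as np

def transfer_weights(source, target, skip=("classifier",)):
    """Copy parameters from a source model into a target model when the
    layer name matches and the shapes agree; layers whose names start with
    a prefix in `skip` keep their fresh initialization.
    Returns the names of the transferred layers."""
    transferred = []
    for name, w in source.items():
        if any(name.startswith(prefix) for prefix in skip):
            continue  # e.g. classifier head is re-learned for the new classes
        if name in target and target[name].shape == w.shape:
            target[name] = w.copy()
            transferred.append(name)
    return transferred
```

Deep learning frameworks offer equivalent mechanisms (e.g., partial state-dict loading), but the shape-and-name matching logic is the essence of what gets transferred.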



Virtual Conference: September 12-17, 2020

Paper Sessions: September 14-17, 2020


Past Conferences

The ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp) is the result of a merger of the two most renowned conferences in the field: Pervasive and UbiComp. While it retains the name of the latter in recognition of the visionary work of Mark Weiser, its long name reflects the dual history of the new event. A complete list of both UbiComp and Pervasive past conferences is provided below.

UbiComp 2019, London, England

UbiComp 2018, Singapore

UbiComp 2017, Maui, USA

UbiComp 2016, Heidelberg, Germany

UbiComp 2015, Osaka, Japan

UbiComp 2014, Seattle, USA

UbiComp 2013, Zurich, Switzerland

UbiComp 2012, Pittsburgh (PA), USA

Pervasive 2012, Newcastle, England

UbiComp 2011, Beijing, China
