ISWC ’23: Proceedings of the 2023 International Symposium on Wearable Computers


SESSION: Notes

Input Interface with Touch and Non-touch Interactions using Atmospheric Pressure for Hearable Devices

  • Koki Iguma
  • Kazuya Murao
  • Hiroki Watanabe

A hearable device is a wearable computer worn on the ear. In addition to offering conventional music-listening functions when used as earphones, a hearable device can be linked to a smartphone and use various onboard sensors to recognize user actions and support voice assistants. Although some devices recognize command operations when the earpiece is touched directly with a hand, there are limitations related to the shape of the earpiece and the problem of false recognition that occurs when the earpiece is touched unintentionally. Hands-free input methods utilizing voice assistants and acceleration sensors that measure head movement are available, but these run into problems such as low command recognition accuracy due to noise in public spaces and low social acceptability. In this study, we implement a device that measures the atmospheric pressure in the ear canal and around the ear by installing an atmospheric pressure sensor inside canal-type earphones. We propose a method that recognizes 12 types of gestures based on the pattern of pressure change caused by pressing and releasing the earphone with a finger (touch interaction) and by covering the auricle with a hand and applying pressure to it (non-touch interaction). In our method, six types of gestures are performed with each of the two interaction methods. We evaluated the recognition accuracy of each gesture by having five participants perform each gesture 50 times. Our findings showed that the “quick press and quick release” gestures were recognized with 0.99 accuracy for touch interaction and 0.82 for non-touch interaction.
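
As a rough illustration of the sensing principle, the sketch below (in Python) classifies a single gesture window by the speed of its pressure rise and fall; the sampling rate and slope threshold are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: label a press/release gesture from in-ear barometer
# samples by thresholding the pressure derivative. FS and QUICK_SLOPE
# are assumed values for illustration only.
import numpy as np

FS = 100           # assumed sampling rate (Hz)
QUICK_SLOPE = 50   # assumed slope threshold (Pa/s) separating quick/slow

def classify_press_release(pressure: np.ndarray) -> str:
    slope = np.gradient(pressure) * FS        # Pa per second
    press = "quick" if slope.max() > QUICK_SLOPE else "slow"      # rise
    release = "quick" if -slope.min() > QUICK_SLOPE else "slow"   # fall
    return f"{press} press, {release} release"

# Synthetic window: fast pressure rise, slow decay
sig = np.concatenate([np.linspace(0, 30, 10), np.linspace(30, 0, 80)])
print(classify_press_release(sig))  # -> "quick press, slow release"
```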

C-Auth: Exploring the Feasibility of Using Egocentric View of Face Contour for User Authentication on Glasses

  • Hyunchul Lim
  • Guilin Hu
  • Richard Jin
  • Hao Chen
  • Ryan Mao
  • Ruidong Zhang
  • Cheng Zhang

C-Auth is a novel authentication method for smart glasses that explores the feasibility of authenticating users using the facial contour lines from the nose and cheeks captured by a down-facing camera in the middle of the glasses. To evaluate the system, we conducted a user study with 20 participants in three sessions on different days. Our system correctly authenticates the target participant versus the other 19 participants (attackers) with a true positive rate of 98.0% (SD: 2.96%) and a false positive rate of 4.97% (SD: 2.88%) across all three days. We conclude by discussing current limitations, challenges, and potential future applications for C-Auth.

“Hello I am here”: Proximal Nonverbal Cues’ Role in Initiating Social Interactions in VR

  • Amal Yassien
  • Slim Abdennadher

Virtual Reality (VR) has revolutionized social interactions, but limited field of view (FoV) remains a significant obstacle. Users often fail to notice others within the virtual environment, hindering social engagement. To facilitate initiating social interactions, we developed a novel social signaling technique that utilizes proximal nonverbal cues to indicate users’ location, name, and interests within a social distance. In a 2 × 2 mixed user study, we found that this technique greatly enhanced social presence and interaction quality among users with prior social ties. Our signaling technique has tremendous potential to facilitate social interactions across various social virtual events, such as staff meetings and reunions.

User Authentication Method for Wearable Ring Devices using Active Acoustic Sensing

  • Shunsuke Iwakiri
  • Kazuya Murao

Ring-type devices currently on the market are typically not equipped with user authentication. This paper proposes an automatic and continuous user authentication method using active acoustic sensing with a speaker and microphone mounted on a ring device. The proposed method authenticates the wearer by extracting the frequency response of the acoustic signal acquired by the microphone and calculating the similarity between the frequency response of the current wearer and that of a pre-registered individual. It takes advantage of the fact that the ring device is in constant contact with the finger and that the shape and composition of each user’s finger have unique acoustic characteristics. The ring device was created by fixing a flexible piezoelectric element to a 3D-printed ring and was evaluated with seven participants in two hand states: relaxed and gripping. The average EER was 0.034 for the relaxed hand and 0.027 for the gripping hand.
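
A brief sketch of how such an authentication pipeline is typically scored: compare frequency-response vectors with a similarity measure (cosine similarity is our assumption here, not necessarily the paper’s metric) and sweep a decision threshold to find the equal error rate (EER).

```python
# Hedged sketch of similarity-based verification scoring. The template
# dimensionality and noise levels are synthetic illustrations.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def eer(genuine, impostor):
    """Sweep thresholds over all scores; return the rate where FAR ~= FRR."""
    best_gap, best_rate = 1.0, 0.5
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine wearers wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return best_rate

rng = np.random.default_rng(0)
template = rng.normal(size=64)  # enrolled frequency response (synthetic)
genuine = np.array([cosine_similarity(template, template + rng.normal(0, 0.3, 64))
                    for _ in range(100)])
impostor = np.array([cosine_similarity(template, rng.normal(size=64))
                     for _ in range(100)])
print(f"EER ~ {eer(genuine, impostor):.3f}")
```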

EchoNose: Sensing Mouth, Breathing and Tongue Gestures inside Oral Cavity using a Non-contact Nose Interface

  • Rujia Sun
  • Xiaohe Zhou
  • Benjamin Steeper
  • Ruidong Zhang
  • Sicheng Yin
  • Ke Li
  • Shengzhang Wu
  • Sam Tilsen
  • Francois Guimbretiere
  • Cheng Zhang

Sensing movements and gestures inside the oral cavity has been a long-standing challenge for the wearable research community. This paper introduces EchoNose, a novel nose interface that explores a unique sensing approach to recognize gestures related to mouth, breathing, and tongue by analyzing the acoustic signal reflections inside the nasal and oral cavities. The interface incorporates a speaker and a microphone placed at the nostrils, emitting inaudible acoustic signals and capturing the corresponding reflections. These received signals were processed using a customized data processing and machine learning pipeline, enabling the distinction of 16 gestures involving speech, tongue, and breathing. A user study with 10 participants demonstrates that EchoNose achieves an average accuracy of 93.7% in recognizing these 16 gestures. Based on these promising results, we discuss the potential opportunities and challenges associated with applying this innovative nose interface in various future applications.
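
The generic active acoustic sensing step behind interfaces like this one can be sketched as follows: emit an inaudible sweep and cross-correlate the microphone input against it to obtain an echo profile whose peaks shift as the cavity geometry changes. The 17–20 kHz band and frame length are our assumptions; the paper’s actual signal design may differ.

```python
# Hedged sketch of echo-profile computation for active acoustic sensing.
import numpy as np
from scipy.signal import chirp, correlate

FS = 48_000
T = 0.01                                       # 10 ms transmit frame (assumed)
t = np.arange(int(FS * T)) / FS
tx = chirp(t, f0=17_000, f1=20_000, t1=T)      # inaudible sweep (assumed band)

def echo_profile(mic_frame: np.ndarray) -> np.ndarray:
    """Cross-correlation of received audio with the emitted sweep;
    peaks correspond to reflection paths of different lengths."""
    return np.abs(correlate(mic_frame, tx, mode="full"))

# A gesture classifier would consume sequences of echo profiles,
# e.g. frame-to-frame differences fed to a CNN.
rx = 0.3 * np.roll(tx, 40) + 0.05 * np.random.randn(tx.size)  # fake echo
print(echo_profile(rx).argmax())
```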

Selecting the Motion Ground Truth for Loose-fitting Wearables: Benchmarking Optical MoCap Methods

  • Lala Shakti Swarup Ray
  • Bo Zhou
  • Sungho Suh
  • Paul Lukowicz

To aid smart wearable researchers in selecting optimal ground truth methods for motion capture (MoCap) across all loose garment types, we introduce a benchmark: DrapeMoCapBench (DMCB). This benchmark is tailored to assess optical marker-based and marker-less MoCap performance. While high-cost marker-based systems are recognized as precise standards, they demand skin-tight markers on bony areas for accuracy, which is problematic with loose garments. Conversely, marker-less MoCap methods driven by computer vision models have evolved, requiring only smartphone cameras and being cost-effective. DMCB employs real-world MoCap datasets, conducting 3D physics simulations with diverse variables: six drape levels, three motion intensities, and six body-type and gender combinations. The benchmark evaluates advanced marker-based and marker-less MoCap techniques, identifying the superior approach for distinct scenarios. When evaluating casual loose garments, both methods exhibit notable performance degradation (>10 cm). However, for everyday activities involving basic and swift motions, marker-less MoCap slightly surpasses marker-based alternatives, rendering it an advantageous and economical choice for wearable studies.
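
As a hedged illustration of the kind of error such a benchmark can report, the snippet below computes mean per-joint position error (MPJPE) between reference and estimated skeletons; the array shapes and 17-joint layout are assumptions, not DMCB’s actual metric definition.

```python
# Minimal MPJPE sketch for comparing MoCap outputs against ground truth.
import numpy as np

def mpjpe(ref: np.ndarray, est: np.ndarray) -> float:
    """ref, est: (frames, joints, 3) joint positions in metres.
    Returns the mean Euclidean error over all frames and joints."""
    return float(np.linalg.norm(ref - est, axis=-1).mean())

ref = np.zeros((100, 17, 3))                                   # synthetic
est = ref + np.random.default_rng(1).normal(0, 0.05, ref.shape)
print(f"MPJPE: {mpjpe(ref, est) * 100:.1f} cm")  # >10 cm signals degradation
```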

Challenges in Using Skin Conductance Responses for Assessments of Information Worker Productivity

  • Anam Ahmad
  • Thomas Ploetz

Breaks as discretionary self-interruptions can have beneficial effects on information worker productivity and well-being. This has design implications for potential productivity tools that can assess opportune moments to suggest these breaks. Electrodermal Activity (EDA) is a good psychophysiological metric for capturing changes in autonomic activity resulting from affective states that necessitate breaks. Wrist-worn sensing platforms have been heralded as effective means for EDA-based affective state assessments in real-life scenarios. However, our study finds no correlation between EDA responses and productivity, even in a controlled setting with a constrained operational definition of productivity and well-researched EDA measurement and processing techniques. We reflect on our rationale against prior successes reported in laboratory and ambulatory assessments of EDA.

Generating Virtual On-body Accelerometer Data from Virtual Textual Descriptions for Human Activity Recognition

  • Zikang Leng
  • Hyeokhyen Kwon
  • Thomas Ploetz

The development of robust, generalized models for human activity recognition (HAR) has been hindered by the scarcity of large-scale, labeled data sets. Recent work has shown that virtual IMU data extracted from videos using computer vision techniques can lead to substantial performance improvements when training HAR models combined with small portions of real IMU data. Inspired by recent advances in motion synthesis from textual descriptions and in connecting Large Language Models (LLMs) to various AI models, we introduce an automated pipeline that first uses ChatGPT to generate diverse textual descriptions of activities. These textual descriptions are then used to generate 3D human motion sequences via a motion synthesis model, T2M-GPT, which are later converted into streams of virtual IMU data. We benchmarked our approach on three HAR datasets (RealWorld, PAMAP2, and USC-HAD) and demonstrate that using virtual IMU training data generated with our new approach leads to significantly improved HAR model performance compared to using real IMU data alone. Our approach contributes to the growing field of cross-modality transfer methods and illustrates how HAR models can be improved through the generation of virtual training data that requires no manual effort.
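
A minimal sketch of the final conversion stage, under our own assumptions (a 20 Hz virtual sampling rate and world-frame output without rotation into the local sensor frame), double-differentiates a synthesized joint trajectory to obtain virtual accelerometer readings:

```python
# Hedged sketch: 3D joint positions -> virtual accelerometer stream.
import numpy as np

FS = 20                                  # assumed virtual IMU rate (Hz)
GRAVITY = np.array([0.0, -9.81, 0.0])    # m/s^2, world frame

def virtual_accel(joint_xyz: np.ndarray) -> np.ndarray:
    """joint_xyz: (frames, 3) positions in metres for one body location.
    Returns (frames-2, 3) linear acceleration plus gravity, in m/s^2."""
    vel = np.diff(joint_xyz, axis=0) * FS
    acc = np.diff(vel, axis=0) * FS
    return acc + GRAVITY   # a full pipeline would also rotate this into
                           # the local sensor frame using joint orientations

traj = np.cumsum(np.random.default_rng(2).normal(0, 0.01, (100, 3)), axis=0)
print(virtual_accel(traj).shape)  # (98, 3)
```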

Acoustic+Pose: Adding Input Modality to Smartphones with Near-Surface Hand-Pose Recognition using Acoustic Surface

  • Kunihiro Kato
  • Kaori Ikematsu

To achieve mid-air interactions on smartphones, acoustic sensing, a technique that uses the smartphone’s built-in speaker and microphone, is promising. However, detecting hand poses near the surface of touchscreens remains challenging due to the arrangement of the built-in speaker and microphone. To address this, we present Acoustic+Pose, a novel approach that combines conventional touch interactions with near-surface hand-pose estimation to enable a wide range of interactions. We focused on smartphones incorporating Acoustic Surface, a technology that vibrates the entire smartphone screen to emit sound over a wide area, and used it to extend the input space to the near surface of touchscreens. We trained machine-learning models to recognize hand poses in the near-surface area and demonstrated interaction techniques that use the recognized poses as a new modality of smartphone input. Through an evaluation, we confirmed that the trained models recognized 10 hand poses with 90.2% accuracy.

Towards a Haptic Taxonomy of Emotions: Exploring Vibrotactile Stimulation in the Dorsal Region

  • Steeven Villa
  • Thuy Duong Nguyen
  • Benjamin Tag
  • Tonja-Katrin Machulla
  • Albrecht Schmidt
  • Jasmin Niess

The implicit communication of emotional states between persons is a key use case for novel assistive and augmentation technologies. It can serve to expand individuals’ perceptual capabilities and assist neurodivergent individuals. Notably, vibrotactile rendering is a promising method for delivering emotional information with minimal interference with visual or auditory perception. To date, the subjective individual association between vibrotactile properties and emotional states remains unclear. Previous approaches relied on analogies or arbitrary variations, limiting generalization. To address this, we conducted a study with 40 participants, analyzing associations between attributes of self-generated vibrotactile patterns (amplitude, frequency, spatial location of stimulation) and four emotional states (Anger, Happiness, Neutral, Sadness). We find a preference for symmetrically arranged patterns, as well as distinct amplitude and frequency profiles for different emotions.

On the Utility of Virtual On-body Acceleration Data for Fine-grained Human Activity Recognition

  • Zikang Leng
  • Yash Jain
  • Hyeokhyen Kwon
  • Thomas Ploetz

Previous work has demonstrated that virtual accelerometry data, extracted from videos using cross-modality transfer approaches like IMUTube, is beneficial for training complex and effective human activity recognition (HAR) models. Systems like IMUTube were originally designed to cover activities that are based on substantial body (part) movements. Yet, everyday life is complex, and a range of activities of daily living involves only rather subtle movements, which raises the question of to what extent systems like IMUTube are also of value for fine-grained HAR. In this work, we first introduce a measure to quantitatively assess the subtlety of the human movements underlying activities of interest, the motion subtlety index (MSI), which captures local pixel movements and pose changes in the vicinity of target virtual sensor locations, and correlate it with the eventual activity recognition accuracy. We explore for which activities with underlying subtle movements a cross-modality transfer approach works, and for which it does not. As such, the work presented in this paper allows us to map out the landscape for IMUTube-like system applications in practical scenarios.
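
The intuition behind such a subtlety measure can be sketched with simple frame differencing around the virtual sensor location; this stand-in ignores the pose-change component of the actual MSI, and the patch size is an assumption.

```python
# Hedged sketch: mean pixel change near a virtual sensor location as a
# crude proxy for movement subtlety (low values => subtle movement).
import numpy as np

def local_motion(video: np.ndarray, cx: int, cy: int, r: int = 16) -> float:
    """video: (frames, H, W) grayscale. Returns mean absolute frame-to-frame
    change in the (2r x 2r) patch centred on the sensor pixel (cx, cy)."""
    patch = video[:, cy - r:cy + r, cx - r:cx + r].astype(np.float32)
    return float(np.abs(np.diff(patch, axis=0)).mean())

vid = np.random.default_rng(3).integers(0, 255, (30, 240, 320))  # synthetic
print(local_motion(vid, cx=160, cy=120))
```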

HPSpeech: Silent Speech Interface for Commodity Headphones

  • Ruidong Zhang
  • Hao Chen
  • Devansh Agarwal
  • Richard Jin
  • Ke Li
  • François Guimbretière
  • Cheng Zhang

We present HPSpeech, a silent speech interface for commodity headphones. HPSpeech utilizes the existing speakers of the headphones to emit inaudible acoustic signals. The movements of the temporomandibular joint (TMJ) during speech modify the reflection pattern of these signals, which are captured by a microphone positioned inside the headphones. To evaluate the performance of HPSpeech, we tested it on two headphones with a total of 18 participants. The results demonstrated that HPSpeech successfully recognized 8 popular silent speech commands for controlling a music player with an accuracy of over 90%. While our tests use modified commodity hardware (both with and without active noise cancellation), our results show that sensing the movement of the TMJ could be as simple as a firmware update for ANC headsets, which already include a microphone inside the ear cup. This leads us to believe that this technique has great potential for rapid deployment in the near future. We further discuss the challenges that need to be addressed before deploying HPSpeech at scale.

How Much Unlabeled Data is Really Needed for Effective Self-Supervised Human Activity Recognition?

  • Sourish Gunesh Dhekane
  • Harish Haresamudram
  • Megha Thukral
  • Thomas Plötz

The prospect of learning effective representations from unlabeled data alone has led to a boost in developing self-supervised learning (SSL) methods for sensor-based Human Activity Recognition (HAR). Typically, (large-scale) unlabeled data are used for pre-training, with the learned weights being used as feature extractors for recognizing activities. While prior works have focused on the impact of increased data scale on performance, we instead aim to discover the pre-training data efficiency of self-supervised methods. We empirically determine the minimal quantities of unlabeled data required for obtaining performance comparable to using all available data. We investigate three established SSL methods for HAR on three target datasets. Of these three methods, we discover that Contrastive Predictive Coding (CPC) is the most efficient in terms of pre-training data requirements: just 15 minutes of sensor data across participants is sufficient to obtain competitive activity recognition performance. Further, around 5 minutes of source data is enough when sufficient amounts of target application data are available. These findings can serve as a starting point for more efficient data collection practices.

WeightMorphy: A Dynamic Weight-Shifting Method to Enhance the Virtual Experience with Body Deformation

  • Naoki Okamoto
  • Masaharu Hirose
  • Sohei Wakisaka
  • Hiroto Saito
  • Atsushi Izumihara
  • Masahiko Inami

We propose WeightMorphy, a hand-mounted system designed to improve the operability and immersive experience of teleoperated manipulation by changing the moment of inertia. The system reduces the discrepancy between the shape of the virtual hand and its corresponding moment of inertia, enabling instantaneous control by the user while maintaining accuracy. We provide a detailed description of the design and concept of our system and conduct experiments examining the effect of shifting the center of gravity on the operability of the deformable virtual hand using WeightMorphy.

Modeling and Evaluation of Soft Force Sensors using Recurrent and Feed-Forward Neural Networks and Exponential Methods to Compensate for Force Measurement Error in Curved Conditions

  • Alireza Golgouneh
  • Heidi Woelfle
  • Brad Holschuh
  • Lucy E. Dunne

A wide variety of applications of wearable technology require information about forces and pressures exerted on the body, either by a device (e.g., to sense active wearing periods or to provide feedback to a force-sensitive therapy like compression) or by other objects or body parts (e.g., for bedsore prevention or force-based gait monitoring). However, typical force sensing mechanisms are often difficult to translate to the wearable environment because the geometry and mechanics of body tissues introduce error into the sensor response. Previous studies have shown that soft force sensors are significantly affected by deformation, leading to erroneous force measurements, but no effort has yet been made to rectify force data estimated by soft textile-based sensors under deformed conditions. In this study, we model the responses of three low-cost textile-based sensors and one off-the-shelf force-sensitive resistor using an exponential model, a Recurrent Neural Network (RNN), and a Multi-Layer Perceptron (MLP) network. Results show that the RNN performs best at modeling hysteresis, with an RMSE of 4.2%. Further, we evaluate sensor performance in human-like curvatures and refine the models by fusing in the degree of curvature. Results show that the refined models can reduce measurement error from 46% to 6.8% and, in some cases, from 26.33% to 1.06%.
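
A minimal PyTorch sketch of the recurrent modeling idea, with the degree of curvature fused as an extra per-time-step input, is shown below; the layer sizes and fusion scheme are our assumptions rather than the paper’s architecture.

```python
# Hedged sketch: a small GRU maps a sensor-voltage sequence plus a fused
# curvature feature to a per-time-step force estimate, letting the
# recurrence capture hysteresis.
import torch
import torch.nn as nn

class ForceRNN(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.rnn = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, voltage, curvature):
        # voltage: (batch, time, 1); curvature: (batch,) degree of curvature
        curv = curvature[:, None, None].expand_as(voltage)
        h, _ = self.rnn(torch.cat([voltage, curv], dim=-1))
        return self.head(h).squeeze(-1)   # force estimate per time step

model = ForceRNN()
v = torch.randn(4, 100, 1)                       # synthetic voltage windows
c = torch.tensor([0.0, 0.3, 0.6, 0.9])           # synthetic curvatures
print(model(v, c).shape)                         # torch.Size([4, 100])
```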

Evaluating Spiking Neural Network on Neuromorphic Platform For Human Activity Recognition

  • Sizhen Bian
  • Michele Magno

Energy efficiency and low latency are crucial requirements for designing wearable AI-empowered human activity recognition systems, due to the hard constraints of battery operation and closed-loop feedback. While neural network models have been extensively compressed to match the stringent edge requirements, spiking neural networks and event-based sensing are recently emerging as promising solutions to further improve performance due to their inherent energy efficiency and capacity to process spatiotemporal data with very low latency. This work evaluates the effectiveness of spiking neural networks on neuromorphic processors in human activity recognition for wearable applications, using workout recognition with wrist-worn motion sensors as a case study. A multi-threshold delta modulation approach is utilized to encode the input sensor data into spike trains and move the pipeline into the event-based domain. The spike trains are then fed to a spiking neural network with direct-event training, and the trained model is deployed on Loihi, Intel’s research neuromorphic platform, to evaluate energy and latency efficiency. Test results show that the spike-based workout recognition system achieves an accuracy (87.5%) comparable to that of a traditional neural network (88.1%) on GAP8, a popular milliwatt-scale RISC-V-based multi-core processor, while achieving a two times better energy-delay product (0.66 vs. 1.32).
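
A hedged sketch of multi-threshold delta modulation: each new sample is compared against a reference value, and crossings of one of several thresholds emit events on dedicated up/down channels. The threshold values and reference-update rule here are illustrative assumptions, not the paper’s encoder parameters.

```python
# Hedged sketch: encode a 1-D sensor signal into multi-threshold
# up/down spike trains via delta modulation.
import numpy as np

def delta_modulate(x: np.ndarray, thresholds=(0.1, 0.3, 0.9)) -> np.ndarray:
    """x: (T,) signal. Returns (T, 2*len(thresholds)) binary spike trains:
    columns 0..k-1 = upward crossings, k..2k-1 = downward crossings."""
    k = len(thresholds)
    spikes = np.zeros((x.size, 2 * k), dtype=np.uint8)
    ref = x[0]
    for t in range(1, x.size):
        delta = x[t] - ref
        for i, th in enumerate(thresholds):
            if delta >= th:
                spikes[t, i] = 1
            elif delta <= -th:
                spikes[t, k + i] = 1
        if abs(delta) >= thresholds[0]:
            ref = x[t]        # update reference after any emitted event
    return spikes

sig = np.sin(np.linspace(0, 4 * np.pi, 200))
print(delta_modulate(sig).sum(axis=0))  # spike counts per channel
```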

Estimating Attention Allocation by Electrodermal Activity

  • Arinobu Niijima

Electrodermal activity (EDA) represents changes in the electrical activity of the palmar skin and serves as an indicator of sympathetic nervous system activity. This paper presents a novel method for estimating attention allocation under divided attention conditions using only EDA data. Our approach uses the low-frequency power spectrum derived from the phasic component of EDA associated with attentional focus, combined with a machine learning classification model. We conducted three user studies aimed at estimating participants’ attention allocation during the performance of simple tasks under both visual and auditory stimuli whose frequencies were different, identical, or ambiguous. The goal was to estimate whether participants focused on the visual or the auditory stimuli. The results showed that our method could estimate attention allocation with accuracies of 96% and 73% when the frequencies of the two stimuli were different and ambiguous, respectively, but could not estimate it when the frequencies were identical.
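
One plausible reading of the feature extraction, sketched under our own assumptions (4 Hz sampling, a simple high-pass split into tonic/phasic components, and illustrative band edges), computes the low-frequency band power of the phasic component as a per-window classifier feature:

```python
# Hedged sketch: phasic EDA low-frequency band power as a feature.
# Cutoffs and band edges are assumptions, not the paper's exact values.
import numpy as np
from scipy.signal import butter, filtfilt, welch

FS = 4.0  # typical wrist EDA sampling rate (Hz), assumed

def phasic_band_power(eda: np.ndarray, band=(0.045, 0.25)) -> float:
    b, a = butter(2, 0.05 / (FS / 2), btype="high")   # remove tonic drift
    phasic = filtfilt(b, a, eda)
    f, pxx = welch(phasic, fs=FS, nperseg=min(256, eda.size))
    mask = (f >= band[0]) & (f <= band[1])
    return float(np.trapz(pxx[mask], f[mask]))

eda = np.cumsum(np.random.default_rng(4).normal(0, 0.01, 1024))  # synthetic
print(phasic_band_power(eda))  # one feature per window for the classifier
```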

TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors

  • Riku Kitamura
  • Takumi Yamamoto
  • Yuta Sugiura

Fingertip input allows for interactions that are natural, easy to perform, and socially acceptable. It also has advantages in terms of low physical demand, confidentiality, and haptic feedback. In this study, we propose TouchLog, a fingernail-type device that uses skin deformation of the fingertip to identify finger micro gestures written with the thumb on the index finger. TouchLog is attached to the index fingernail and allows for one-handed fingertip input without compromising the haptic feedback on the finger. To evaluate the accuracy of 11 types of finger micro gesture recognition, we conducted a user study (N = 10) and obtained an average identification accuracy of 91.5% (SD = 3.1%). A continuous input method using skin deformation and contact pressure was also examined, and its usefulness as a wearable device was discussed.

Exploring Recognition Accuracy of Vibrotactile Stimuli in Sternoclavicular Area

  • Mikołaj P. Woźniak
  • Anna Walczak
  • Adam Jan Sałata
  • Magdalena Wróbel-Lachowska
  • Krzysztof Grudzień
  • Heiko Müller
  • Susanne Boll
  • Andrzej Romanowski

The growing popularity of wearable haptic devices has encouraged researchers to implement on-body interfaces that appropriate different form factors and interaction techniques. Among vibrotactile wearable interfaces, neck-worn devices have gathered limited attention in HCI. While the “necklace area” offers wide opportunities for subtle haptic interaction, we lack knowledge of its tactile acuity to design interactive systems effectively. In this work, we present a prototype of HaptiNecklace, a vibrotactile necklace designed to study the tactile acuity of the sternoclavicular area. In an experimental study with N=19 participants, we compared recognition accuracy and cognitive load between different numbers of vibrotactile motors attached to the prototype in two scenarios: static and mobile. The results show that directional patterns ensure better recognition than single-point vibrations in both mobile and static contexts. Moreover, introducing a mobile scenario does not influence recognition accuracy but substantially increases cognitive load. In this work, we provide practical hints for designing vibrotactile necklaces.

Learning Effects and Retention of Electrical Muscle Stimulation in Piano Playing

  • Arinobu Niijima

Electrical muscle stimulation (EMS)-based systems have been proposed to assist in the learning of motor skills for piano playing. However, learning effects and retention have not been thoroughly evaluated. To address this, we conducted two user studies to investigate the learning effects and retention of EMS for piano playing. Twenty-four novice participants practiced the technique of tremolo, a rapid change between two notes, with both hands under three conditions: without EMS, with EMS on one hand, and with EMS on both hands. The results showed that practicing with EMS on both hands significantly improved tempo accuracy compared to practicing without EMS. A follow-up study of 15 participants confirmed that the improved performance achieved with EMS on both hands was maintained after one week and was not significantly different from practicing without EMS.

Robo-Coverstitch: Coverstitched Auxetic Shape Memory Actuators For Soft Wearable Devices

  • Robert Pettys-Baker
  • Heidi Woelfle
  • Brad Holschuh

As wearable technologies proliferate, there is a growing need for compact, easy-to-produce textile-based mechanical actuators. This paper presents a novel coverstitched auxetic actuator using shape memory alloys (SMAs) – which we call the Robo-coverstitch – that can be easily sewn onto fabrics using an industrial coverstitch machine. Multiple Robo-coverstitch samples were manufactured and tested to characterize their force vs. strain properties. The samples were actuated in both the length- and width-wise directions between 0 and 15% strain at constant electrical current (0.4 A). The actuators produced increased force in both length- and width-wise directions when actuated (i.e., auxetic actuation behavior), and these active forces increased with actuator strain. The Robo-coverstitch offers an unobtrusive mechanical actuation solution that can be readily deployed into commercial garments.

Going Blank Comfortably: Positioning Monocular Head-Worn Displays When They are Inactive

  • Yukun Song
  • Parth Arora
  • Rajandeep Singh
  • Srikanth T. Varadharajan
  • Malcolm Haynes
  • Thad Starner

Head-worn displays like Tooz and North’s Focals are designed to be worn all day as smart eyeglasses. When the display is not lit (often to save battery life), the optical combiners may remain visible to the user as an out-of-focus seam or discoloration in the lens. We emulate seven shapes and positions of optical combiners, which 30 participants ranked for comfort. Based on these results, we ran a second user study with 12 participants comparing the comfort of a combiner at various offset distances from the user’s primary position of gaze (PPOG) towards the nose. Results suggest that a combiner’s nearest edge should be more than 15° from the PPOG.

User Authentication Method for Hearables Using Sound Leakage Signals

  • Takashi Amesaka
  • Hiroki Watanabe
  • Masanori Sugimoto
  • Yuta Sugiura
  • Buntarou Shizuki

We propose a novel biometric authentication method that leverages sound leakage signals from hearables that are captured by an external microphone. A sweep signal is played from hearables, and sound leakage is recorded using an external microphone. This sound leakage signal represents the acoustic characteristics of the ear canal, auricle, or hand. Then, our system analyzes the echoes and authenticates the user. The proposed method is highly adaptable to hearables because it leverages widely available sensors, such as speakers and external microphones. In addition, the proposed method has the potential to be used in combination with existing methods. In this study, we investigate the characteristics of sound leakage signals using an experimental model and measure the authentication performance of our method using acoustic data from 16 people. The results show that the balanced accuracy (BAC) scores were in the range of 87.0%–96.7% in several scenarios.
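
The measurement step can be sketched as follows: play a sine sweep, record the leakage with an external microphone, and estimate the transfer function by spectral division; the sweep band and duration are assumptions, not the paper’s parameters.

```python
# Hedged sketch: estimating a leakage frequency response from a sweep.
import numpy as np
from scipy.signal import chirp

FS = 48_000
t = np.arange(FS) / FS                        # 1 s sweep (assumed duration)
sweep = chirp(t, f0=100, f1=20_000, t1=1.0)   # assumed band

def frequency_response(recorded: np.ndarray) -> np.ndarray:
    """Magnitude of H(f) = Rx(f) / Tx(f); its shape reflects the wearer's
    ear-canal/auricle acoustics and serves as the biometric template."""
    tx_f = np.fft.rfft(sweep)
    rx_f = np.fft.rfft(recorded, n=sweep.size)
    return np.abs(rx_f / (tx_f + 1e-12))

rx = 0.5 * np.roll(sweep, 200)                # fake leakage recording
template = frequency_response(rx)
print(template.shape)
```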

Wireless Sensor Collar for Automatic Recognition of Canine Agility Activities

  • Charles Ramey
  • Arianna Mastali
  • Adithya Muralikrishna
  • Thad Starner
  • Melody Moore Jackson

Canine agility is a rapidly growing sport where dogs and their handlers navigate obstacle courses. Recent studies show that over 40% of all agility dogs suffer an injury while training or competing. By collecting and analyzing sensor measurements from a wearable computer while dogs participate in the sport, we hope to better inform dog handlers and trainers and improve the performance and overall health of their dogs. As a first step towards this long-term project goal, we present the initial validation of a bespoke collar-worn activity tracker and machine learning classifier for recognizing agility activities. The ability to classify agility activities from collar-worn sensors will provide the groundwork for further analysis of relative activity exertion levels, activity variance with repetitions, and gait regularity. To validate our system, we conducted a pilot study of six dogs performing a short agility course including three different agility obstacles. Our wireless sensor collar was able to provide data in real-time via WiFi while dogs navigated the obstacles. Our MINIROCKET-based machine learning classifier achieved 85% accuracy across the pilot study data.
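
A plausible reconstruction of such a MINIROCKET-based classifier, assuming a recent sktime and illustrative window dimensions, pairs the random-convolution transform with a ridge classifier:

```python
# Hedged sketch of a MiniRocket time-series classification pipeline.
# Window length, channel count, and labels are synthetic assumptions.
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sktime.transformations.panel.rocket import MiniRocketMultivariate

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 6, 200))   # windows x IMU channels x samples
y = rng.integers(0, 3, 120)          # e.g. jump / tunnel / weave poles

trf = MiniRocketMultivariate()       # random convolutional kernels
Xf = trf.fit_transform(X)            # fixed feature representation
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))
clf.fit(Xf, y)
print(clf.score(Xf, y))              # training accuracy on synthetic data
```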

SESSION: Briefs

Detecting Thumb-Posture for One-handed Interactions with Smartphone using Acoustic Sensing

  • Kunihiro Kato
  • Kaori Ikematsu

This paper presents a novel approach for expanding one-handed interactions using the thumb positioned above the smartphone screen. Our approach is based on acoustic sensing, a technique that leverages the built-in speaker and microphone of the smartphone without requiring additional sensors or attachments. We explored the feasibility of our approach on smartphones with a conventional speaker and microphone arrangement and investigated the enhancement of recognition accuracy when using smartphones equipped with Acoustic Surface, a technology that vibrates the entire screen to emit sound over a wider area and is installed in several commercial smartphones such as the LG G8 ThinQ and Huawei P30 Pro. We focused on classifying 12 different thumb postures and developed models that achieve prediction accuracies of 78.6% (conventional smartphone) and 87.0% (Acoustic Surface).

UltrasonicWhisper: Ultrasound Can Generate Audible Sound in Your Hearable

  • Hiroki Watanabe
  • Tsutomu Terada

Recent studies have shown that ultrasound can be used for voice input to microphone-equipped devices such as smart speakers by taking advantage of the nonlinearity of their microphones. A similar attack on the hearing of a user wearing a hearable with an outside microphone is also possible. Specifically, information modulated onto ultrasound by an attacker is demodulated into audible sound inside the hearable, and audio information can be presented to the wearer via its inner loudspeaker. This process could result in the presentation of false information disguised as instructions from the hearable and possible interference with the user’s hearing. In light of these issues, this study experimentally evaluated the possibility of ultrasonic attacks on hearables. Evaluation results confirmed that the mean Mel-cepstral distortion (MCD) and mean opinion score (MOS) of the demodulated sound were 7.90 and 2.53, respectively. We also confirmed that participants followed 14.9% of the false instructions presented by ultrasound even when they were alerted to the ultrasonic attack.
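
The exploited effect can be sketched numerically: speech amplitude-modulated onto an ultrasonic carrier is recovered at baseband by the second-order term of a nonlinear microphone response. The carrier frequency and modulation depth below are illustrative.

```python
# Hedged sketch of AM on an ultrasonic carrier and its demodulation by
# microphone nonlinearity (the squared term of the response).
import numpy as np

FS = 192_000                          # rate high enough to represent 40 kHz
t = np.arange(FS) / FS
speech = np.sin(2 * np.pi * 400 * t)             # stand-in for a voice signal
carrier = np.sin(2 * np.pi * 40_000 * t)
tx = (1 + 0.8 * speech) * carrier                # AM signal, inaudible as emitted

rx = tx + 0.1 * tx**2                 # 2nd-order nonlinearity of the mic
# The tx**2 term contains (1 + 0.8*speech)^2 / 2 at baseband, i.e. an
# audible copy of the speech after the hearable's internal low-pass filtering.
```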