
Poster: Audio-Kinetic Model for Automatic Dietary Monitoring with Earable Devices

Chulhong Min*, Akhil Mathur*, Fahim Kawsar*†

*Nokia Bell Labs, †TU Delft
{chulhong.min, akhil.mathur, fahim.kawsar}@nokia-bell-labs.com

Figure 1: (a) Earable platform, (b) Audio-kinetic model, (c) F-1 score, (d) Spectrogram for chewing, (e) Gyroscope for drinking

CCS CONCEPTS
• Human-centered computing → Ubiquitous and mobile computing systems and tools;

1 INTRODUCTION
Monitoring dietary intake has profound implications for healthcare and well-being services in everyday life. Over the last decade, extensive research efforts have been made to enable automatic dietary monitoring by leveraging multi-modal sensory data on wearable devices [1, 2].

In this poster, we explore an audio-kinetic model of well-formed multi-sensory earable devices for dietary monitoring. We envision that earable devices are ideal for dietary monitoring by virtue of their placement. They are worn close to a user’s mouth, jaw, and throat, which makes them capable of capturing acoustic events originating in these body parts. Inertial sensors can potentially capture movements of the head and jaw that are often associated with food intake. As such, fusing the inertial and acoustic data carries a potential for accurate detection of food intake-related activities.

We showcase two primitive activities with our audio-kinetic model: chewing and drinking. These primitives are simple but provide useful contextual cues. For example, analysing food intake behaviour from chewing can provide insights into the development of obesity and eating disorders. Similarly, tracking drinking events is useful for estimating a user’s water intake over a period of time.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
MobiSys’18, June 10–15, 2018, Munich, Germany
© 2018 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-5720-3.
https://doi.org/10.1145/3210240.3210810

2 PRELIMINARY STUDY
Device: We developed a customised earbud platform with dimensions of 18 mm × 18 mm × 20 mm, instrumented with a microphone, a 6-axis inertial measurement unit, Bluetooth, and BLE, and powered by a CSR processor and a 40 mAh battery (see Figure 1(a)).

Audio-Kinetic Model: Figure 1(b) shows our model, consisting of the inertial and audio pipelines. We extract time-domain and frequency-domain features from the inertial data, and MFCC features from the audio data. The two feature sets are concatenated and used as a combined feature set with three off-the-shelf classifiers: SVM with RBF kernel, Random Forests, and Naive Bayes.
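
The following is a minimal sketch of such a fusion pipeline. The window lengths, feature choices, and classifier settings are illustrative assumptions rather than the exact configuration used in the study.

```python
# Hedged sketch of the audio-kinetic fusion pipeline: modality-specific features
# are computed per window, concatenated, and fed to an off-the-shelf classifier.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def inertial_features(window):
    """Time- and frequency-domain features from a 6-axis IMU window
    (shape: samples x 6, accelerometer xyz + gyroscope xyz)."""
    spectrum = np.abs(np.fft.rfft(window, axis=0))
    return np.concatenate([
        window.mean(axis=0), window.std(axis=0),         # time domain
        np.sqrt((window ** 2).mean(axis=0)),             # RMS per axis
        spectrum.mean(axis=0), spectrum.argmax(axis=0),  # frequency domain
    ])

def audio_features(audio, sr=16000):
    """Mean MFCC vector over an audio window."""
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13).mean(axis=1)

def fused_features(imu_window, audio_window, sr=16000):
    # Concatenate the two modality-specific feature sets into one vector.
    return np.concatenate([inertial_features(imu_window),
                           audio_features(audio_window, sr)])

# X: stacked fused feature vectors, y: labels such as 'chewing' / 'drinking'
# clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```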

Results: We recruited 8 users for data collection. They wore a pair of earbuds and were asked to eat and chew solid foods for 4 minutes. Next, they performed 10 ‘drink’ actions by drinking water from a bottle. They took breaks as needed between two consecutive chewing or drinking activities. Figure 1(c) shows the performance of the best-performing classifier, Random Forest. The results show that our audio-kinetic model achieves higher accuracies by fusing the two sensing channels and providing richer information. Each feature set carries useful cues for different activities; the acoustic features carry more importance for chewing and the inertial features are more suited for drinking. However, their combined use helps cover corner cases.
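
A per-activity F1 comparison of the audio-only, inertial-only, and fused feature sets (as in Figure 1(c)) can be sketched as below; the cross-validation setup and classifier parameters are assumptions, not the study’s exact protocol.

```python
# Hedged sketch: per-activity F1 scores for a given feature matrix, used to
# compare audio-only, inertial-only, and fused features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

def per_activity_f1(X, y, folds=5):
    clf = RandomForestClassifier(n_estimators=100)
    pred = cross_val_predict(clf, X, y, cv=folds)
    return f1_score(y, pred, average=None)  # one F1 score per activity class

# X_fused = np.hstack([X_audio, X_imu]); comparing per_activity_f1 on each of
# X_audio, X_imu, and X_fused mirrors the comparison reported in Figure 1(c).
```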

We present two interesting cases. Figure 1(d) shows the audio spectrogram of four chewing events. The intensity of the audio signal gradually decreases as the chewing event progresses. This corresponds to typical chewing behaviour, which starts with high-energy crushing sounds and ends with lower-amplitude sounds once the food is fully crushed. Figure 1(e) shows the gyroscope data captured during the drinking actions. The head movement while drinking induces a change along the pitch and roll axes.
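
These qualitative observations can be reproduced with a short analysis sketch; the STFT parameters and the sampling interval below are illustrative assumptions.

```python
# Hedged sketch: per-frame audio energy for a chewing clip and approximate
# pitch/roll traces for a drinking clip, mirroring Figures 1(d) and 1(e).
import numpy as np
import librosa

def chewing_energy_trend(audio, sr=16000):
    # Short-time spectrogram; the per-frame energy decays as the food is crushed.
    S = np.abs(librosa.stft(audio, n_fft=512, hop_length=256))
    return (S ** 2).sum(axis=0)  # energy per frame over time

def drinking_head_tilt(gyro, dt=0.01):
    # Integrate gyroscope x/y angular rates into approximate pitch and roll;
    # a drinking action appears as a transient deflection in both angles.
    pitch = np.cumsum(gyro[:, 0]) * dt
    roll = np.cumsum(gyro[:, 1]) * dt
    return pitch, roll
```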

REFERENCES
[1] Merck et al. Multimodality Sensing for Eating Recognition. In PervasiveHealth 2016.
[2] Bedri et al. EarBit: Using Wearable Sensors to Detect Eating Episodes in Unconstrained Environments. In ACM IMWUT 2017.
