Introduction

Motivation

One of the leading causes of death in America is cardiovascular disease; although there are genetic factors, heart disease is strongly linked to both obesity and lack of physical activity [1]. Beyond preventing obesity and heart disease, having an accurate depiction of one's physical activity can be pivotal for managing the symptoms of various diseases and maintaining a healthy lifestyle [2]; for example, people with type 1 diabetes who increase their physical activity (up to a point) are better able to manage their symptoms and maintain their blood sugar levels. Increasing or maintaining one's physical activity relies on an individual fully understanding that activity, usually through some form of tracking or journaling. Although tracking physical activity (both its quantity and quality) can be beneficial for many people, closely monitoring it by hand is time consuming and difficult. Prior research has shown that it is possible to distinguish different physical activities from one another using machine learning (ML). Some of these algorithms (e.g., random forests or SVMs) rely on engineered feature vectors, while others (recurrent or convolutional neural networks) use raw sensor data (e.g., accelerometer readings). Although much work remains in this area, these algorithms have proven quite effective at differentiating activities, and their classification accuracies across ever more activities are increasing yearly. Instead of manually maintaining a log of an individual's activities, it may be possible to use these ML algorithms to help identify and track them.

Human activity recognition (HAR), as well as using technology to facilitate behavior interventions, is an active research area within the personal health informatics (PHI), human-computer interaction (HCI), and ML communities. Many researchers are dedicating their time to creating the best algorithm for HAR, but this domain faces a variety of problems. Firstly, researchers are often unable to consistently compare the classification accuracies of algorithms designed for activity recognition. Because many datasets usable for HAR are not public, researchers work across a variety of datasets. These datasets comprise different activities, are made up of sensor data sampled at different rates and from different body locations, and are collected under different conditions (e.g., free-living data versus exercise-lab data). With all of these variables, there is no reliable way to compare the accuracy of different models. Another problem in this domain is how, or whether, these algorithms can function in real time. Most research in this area focuses on training and evaluating models on pre-collected data. These models are trained and evaluated on clusters at the researchers' institutions, which affords researchers the luxury of not considering the computational power required for training or the space the models require for training and prediction. Many of these models have not been (and will never be) implemented in a device that allows a user to interact with the model's predictions or use those predictions to track their activities. Real-time systems, although potentially incredibly beneficial, are few and far between; there is a dearth of research in this area.

Approach

This paper proposes one method to implement and deploy an accurate, efficient, real-time human activity recognition system on a mobile device. This system should continuously acquire data about the user's movements from a sensor within the mobile device (perhaps a gyroscope or accelerometer). As this data is collected, the system should classify the user's activities in real time and notify the user of these predictions so they can correct mistakes. For each activity instance the system misclassifies, the underlying ML model should be retrained with the corrected label. Overall, this system is responsible for predicting a user's activities as they occur and correcting classification mistakes to improve classification performance for that user.
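
To make this control flow concrete, the sketch below outlines the core loop of such a system in Python. It is only an illustration of the loop described above: the names (`sensor`, `model`, `notify_user`, `get_correction`) and the 128-sample window size are hypothetical placeholders, not components of any of the surveyed systems.

    import numpy as np

    # Hypothetical sketch of the real-time recognition loop described above.
    # `sensor`, `model`, `notify_user`, and `get_correction` are placeholders,
    # not APIs from any of the surveyed systems.

    WINDOW_SIZE = 128  # samples per prediction window (assumed)

    def recognition_loop(sensor, model, notify_user, get_correction):
        """Continuously classify windows of sensor data and learn from corrections."""
        buffer = []
        while True:
            buffer.append(sensor.read())           # one accelerometer/gyroscope sample
            if len(buffer) < WINDOW_SIZE:
                continue
            window = np.asarray(buffer)            # shape: (WINDOW_SIZE, n_axes)
            buffer = []

            predicted = model.predict(window)      # e.g., "walking"
            notify_user(predicted)                 # surface the prediction to the user

            corrected = get_correction(predicted)  # None if the user does not intervene
            if corrected is not None and corrected != predicted:
                model.update(window, corrected)    # retrain/fine-tune on the true label

In a deployed system, each of these steps would have to fit within the latency, memory, and battery budgets discussed in the remainder of this survey.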

When designing this system, we want first and foremost to ensure that it is fast: no component should interfere with the user experience, and each should complete as quickly as possible. To meet these requirements, we must address three questions. Firstly, how must we alter a physical activity recognition model to ensure it is efficient, lightweight (in size), and accurate? Secondly, how do we ensure that the user's data, which the model must use, remains private? Further, how do we maintain the security of the model (i.e., how do we ensure the user cannot access the model itself)? Lastly, does the hardware of a mobile device need to change in order to quickly predict with (or retrain) such a model? Addressing these questions is pivotal to implementing and deploying a real-time HAR system on a mobile device. In this survey we discuss one possible solution to each of these questions and explore how they can be pieced together to develop a real-time physical activity recognition system. Such a system could aid in tracking people's activities and, in the future, in delivering and maintaining real-time behavior change interventions.

Throughout this paper we investigate three systems publications. Each publication answers precisely one of the questions we have defined as necessary to designing a real-time physical activity recognition system. When addressing each question, we also consider the runtime cost of using each component in this system. We first explore how we can minimize the size of an ML model (specifically one used for HAR) while still maintaining the accuracy of a more robust model; such a model can be embedded in virtually any device. Then, we discuss one way to implement, on any preexisting multi-core device, a solution that allows any user of our system to keep their data private while preventing them from accessing the model itself, thereby protecting the intellectual property embodied in the model. Lastly, we explore how a proposed hardware change (e.g., creating a new mobile device specifically for our system) may speed up the runtime of our model. This survey describes one possible real-time system for physical activity recognition that could be implemented on a (potentially custom) mobile device. Many more solutions exist and should be explored thoroughly before such a system is deployed.

A lightweight model

Creating a real-time physical activity recognition system that can be deployed on mobile devices relies first and foremost on a good physical activity recognition algorithm. To embed such an algorithm in a mobile device, it needs to be resource efficient. In particular, the model needs to be space/memory efficient (small) and computationally efficient (e.g., not greedy with CPU/GPU time). The paper titled “Design of Novel Deep Learning Models for Real-time Human Activity Recognition with Mobile Phones” focuses precisely on these issues [3]. Considering a physical activity recognition model trained on engineered features, these researchers asked two questions: how to minimize the number of weights in a neural network, and which features most affected the accuracy of the model. Exploring these two concepts together yielded a smaller but equally accurate model that they could deploy on a mobile device.

To reduce the number of weights within an activity recognition model, these researchers found they could reduce the size of their engineered input feature vectors. Many HAR models do not operate on raw accelerometer data; they use engineered features derived from the raw data, usually through some form of signal processing. These features capture properties of the accelerometer data for a particular activity, such as the signal's amplitude or its periodicity. Using mobile phone gyroscope and accelerometer data from the UCI and UCF datasets, the authors engineered feature vectors of different lengths, from a single feature to more than 500 features, based on prior work that identified potentially relevant features for activity recognition. Using feature vectors of various lengths, the authors trained both multi-class SVMs and PCAs. They found that both models reached their peak accuracy around 100 features and, more importantly, that the frequency domain of a given signal did not impact a model's ability to determine the underlying physical activity. While this does not contradict prior work, the frequency domain of an accelerometer signal has been used in HAR models since the earliest work on this problem. Because this research reveals that so few features are needed as input, it immediately minimizes the number of weights in a model, making the model more suitable for embedding in a mobile device.
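
As a concrete illustration of this style of feature engineering (not the authors' exact feature set), the sketch below computes a few common time- and frequency-domain features, including amplitude and a crude periodicity measure, from one window of tri-axial accelerometer data; the 50 Hz sampling rate and window length are assumptions for the example.

    import numpy as np

    def extract_features(window, sampling_rate_hz=50.0):
        """Engineer simple features from one window of tri-axial accelerometer data.

        `window` has shape (n_samples, 3). These are generic examples of the kinds
        of features discussed above, not the exact feature set used in [3].
        """
        features = []
        for axis in range(window.shape[1]):
            signal = window[:, axis]
            features.append(signal.mean())                # average acceleration
            features.append(signal.std())                 # variability
            features.append(signal.max() - signal.min())  # amplitude

            # Frequency-domain feature: dominant frequency of the signal,
            # i.e., a crude measure of the activity's periodicity.
            spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
            freqs = np.fft.rfftfreq(len(signal), d=1.0 / sampling_rate_hz)
            features.append(freqs[np.argmax(spectrum)])
        return np.asarray(features)

    # Example: a 2.56-second window at 50 Hz (128 samples) yields 12 features.
    window = np.random.randn(128, 3)
    print(extract_features(window).shape)  # (12,)

Shortening or lengthening the returned vector is exactly the knob the authors turned when studying how feature-vector length affects accuracy.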

In addition to minimizing the feature length, these researchers compared the effects of hyperparameters, specifically in a fully connected layer, on the classification accuracy of a deep neural network (DNN). Fewer parameters in a model means fewer weights; models with fewer weights train and predict more quickly (are more computationally efficient) and, once trained, are smaller (more memory efficient). They found their model performed almost as well across fully connected layer sizes, but each DNN+fc[x-nodes] performed slightly worse than their DNN coupled with an SVM. These models ranged in size from 0.2 MB, for the CNN without a fully connected layer, to 5 MB, for the DNN with a 528-node fully connected layer. The authors do not report the size of the model with the SVM, although it can be assumed to be larger than the model without a fully connected layer. The reduction of model size (via hyperparameter reduction) together with the reduction in feature length yields a very small but very powerful model that can easily fit on a mobile device.
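
A back-of-the-envelope calculation makes the size argument concrete. The sketch below counts the parameters of a hypothetical feature-based classifier with and without a 528-node fully connected layer; the input and output dimensions are illustrative assumptions, not the architecture reported in [3].

    # Rough parameter/size comparison for a hypothetical feature-based classifier.
    # Layer sizes are illustrative only; they are not the architecture from [3].

    def dense_params(n_in, n_out):
        """Weights plus biases for one fully connected layer."""
        return n_in * n_out + n_out

    N_FEATURES, N_CLASSES, FC_NODES = 100, 6, 528
    BYTES_PER_WEIGHT = 4  # 32-bit floats

    without_fc = dense_params(N_FEATURES, N_CLASSES)
    with_fc = dense_params(N_FEATURES, FC_NODES) + dense_params(FC_NODES, N_CLASSES)

    for name, count in [("no hidden layer", without_fc), ("528-node hidden layer", with_fc)]:
        print(f"{name}: {count:,} parameters, ~{count * BYTES_PER_WEIGHT / 1e6:.2f} MB")

Even in this toy example the single wide fully connected layer accounts for nearly all of the model's footprint, which is why trimming or removing it shrinks the model so dramatically.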

Strengths and Weaknesses

Despite considering only one model and one task, “Design of Novel Deep Learning Models for Real-time Human Activity Recognition with Mobile Phones” provides a useful methodology for minimizing the size of a model. Minimizing the size of any model yields a model that is quicker to train, evaluate, and make predictions with. This approach is therefore pivotal to implementing our real-time HAR system, as it allows our model to run quickly, which is our main goal. In addition, the authors show that the feature vectors used for activity recognition models can be greatly reduced; they even claim that many of these features may not be improving classification accuracy at all. Of course, making strong claims about which features are necessary for this task requires more work: the authors considered only a couple of small datasets, and both the datasets and the task were simple. They also conduct only a preliminary investigation of the impact of certain layers on a model's classification. In the case of their model, they found several variants to be equally powerful even though the number of parameters (and thus model sizes) varied. It is important to note that, other than removing or adding a fully connected layer, the authors do not consider other, potentially smaller architectures. Replicating these authors' experiments with different classification algorithms should be reserved for future work. Because we wish to create a real-time physical activity recognition system, this paper provides explicit methodologies on which future work can build and establishes a baseline model size and expected accuracy.

Although the methods proposed in “Design of Novel Deep Learning Models for Real-time Human Activity Recognition with Mobile Phones” may be beneficial for minimizing the size of a model, more must be considered. Firstly, these authors only consider algorithms that rely on engineered features. Because of this reliance, any real-time mobile activity recognition system would need to include the steps necessary to preprocess the data and compute these features from the raw, collected accelerometer data. The space and compute power needed to perform these steps are not included in the authors' initial findings and remain necessary future work. Instead of engineering features, future work should consider models that operate on raw data, such as convolutional neural networks (CNNs) and recurrent neural networks. It would be relatively easy to extend this work. Firstly, as the authors did for feature vectors, one could examine the effect of downsampling raw accelerometer data on a CNN's classification accuracy; downsampling the raw data is one way to reduce the weights in a network (by shrinking the size of the input vector). Further, experimenting with hyperparameter selection, specifically targeting minimal model size, may lead to smaller but equally powerful CNNs. Depending on their size, minimal CNNs may be easier to embed in a mobile device, as the data would not require any preprocessing beyond segmentation into windows of time.
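
A minimal sketch of this direction is shown below, assuming PyTorch, a 3-axis accelerometer, and 128-sample windows; the downsampling factor and the tiny architecture are arbitrary illustrative choices, not a proposal from the surveyed paper.

    import torch
    import torch.nn as nn

    def downsample(window, factor=2):
        """Naive downsampling: keep every `factor`-th sample of a (batch, 3, time) window."""
        return window[:, :, ::factor]

    # A deliberately small 1D CNN over raw accelerometer windows (illustrative only).
    class TinyHARCNN(nn.Module):
        def __init__(self, n_classes=6):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(3, 16, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool1d(2),
                nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            self.classifier = nn.Linear(32, n_classes)

        def forward(self, x):
            return self.classifier(self.features(x).squeeze(-1))

    model = TinyHARCNN()
    x = downsample(torch.randn(1, 3, 128))             # one 128-sample window, halved to 64
    print(model(x).shape)                              # torch.Size([1, 6])
    print(sum(p.numel() for p in model.parameters()))  # a few thousand parameters

Measuring how accuracy degrades as the downsampling factor grows, and how parameter counts shrink, would mirror the feature-length experiments of [3] but for raw-data models.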

In addition to these overlooked size considerations, their models are trained on datasets that are relatively simple and poorly specified. The accelerometer and gyroscope data they use does not encompass many activities and comes from very small cohorts (fewer than 50 participants); both factors lead to a lack of diversity in their training and testing data. The UCI dataset comes from 30 participants performing just six different activities; the authors do not note how long each activity was performed or even what the activities are. The other dataset, UCF, has either six or nine participants and a maximum of nine activities. Again, they do not say what the activities are or how many instances (windows of time) of each activity exist for a given participant. Their models achieve high accuracies; however, if these datasets are simple (e.g., obtained from people of the same age group, height, and overall demographics, performing easily separable activities such as distinguishing walking from lying still on the ground, in a lab setting), there may not be much diversity in overall movement, and one should expect the model to perform well. Even though the datasets are limited, the reported accuracies are impressive, especially given how little training data there is. Without knowing more about what precisely these activities are and how many instances of each exist, it is hard to draw specific conclusions about their accuracies or to establish future work beyond testing their models on different datasets to determine whether this specific model is an option for our real-time HAR system.

A secure system

Having explored one way to create a lightweight model, we must now consider how to ensure this real-time HAR system is secure: we want to maintain the privacy of the user's data that will be fed into the model, and we want to ensure the model itself is not accessible to the user. One way to achieve this is to utilize the offline model guard (OMG) proposed in the paper “Offline Model Guard: Secure and Private ML on Mobile Devices” [4]. OMG is designed to protect both the user (as the model's provider never gains direct access to the user's data) and the model (as the user never has access to the model itself, only to its outputs/predictions via I/O). It can be implemented without any major changes to the OS or hardware of a device. The authors show that OMG protects data in the way we desire and that both the accuracy of the model and the time needed to interact with it remain the same. Because OMG can be implemented on almost any device and protects data without interfering with the model, it is a suitable solution for a real-time activity recognition system on a mobile device.

OMG is driven by the idea that, in any system using an ML model, a user's data must not be accessible to a company (via its model) without the user's consent, and the intellectual property (IP) of a model deployed in a system should not be accessible to the user. With AI and ML addressing more and more problems, such as physical activity recognition or facial ID, the data used to accurately train these models, or even the data needed to make a prediction, has become increasingly personal. Further, to ensure a model works for every individual, the model may need several (and oftentimes continuously updated) instances of an individual's data; a user should have the right to keep this data private. In the same manner, no user should be able to access the model directly, as this is a company's or researcher's private intellectual property.

Offline Model Guard provides a solution to this problem by modifying SANCTUARY and utilizing an enclave to separate a local copy of the model from the user's data. SANCTUARY is a security architecture that does not require any changes to a device's hardware and can be implemented on any multi-core device. SANCTUARY works by binding memory to a specific CPU core; once an object is bound to that memory, no external process can access it. Although no external process can read or change the memory within this core, I/O can still occur (e.g., the contents of a specific memory segment can be output by the core). This reserved memory space can be thought of as a tunnel: we know what goes in and we know what comes out, but from the outside it is unclear exactly what is happening within the tunnel. This reserved memory space in a CPU core is referred to as an enclave.

Like SANCTUARY, OMG utilizes an enclave, separating the act of running a model on a user's data into three distinct phases to ensure security. The first phase is preparation of the model: an encrypted version of the desired model is stored in the mobile device's memory, and both this encrypted model and the key needed to decrypt it are stored in a protected memory location the user cannot access. The second phase of OMG is model initialization: the encrypted model is decrypted with the key and stored locally in the enclave. Because OMG builds on SANCTUARY, the user cannot directly access anything within this enclave, so both the encrypted and decrypted models are protected from the user. Further, because the decrypted model is a local copy, the original model never has direct access to user data. Placing the decrypted model into an enclave only when it is in use ensures maximal use of the CPU cores; if a real-time system were to use this enclave constantly, it is worth considering the impact on the battery life of a mobile device. The final phase of OMG is operation, i.e., running the model on user data: user data is passed into the enclave, run through the model, and the enclave sends the predictions back out to the user. Throughout this process, the party that owns the original model never gains access to the user's data, which ensures the data's privacy. These three phases ensure the security of both the model and the user's data.
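
The three phases can be illustrated at the application level with ordinary symmetric encryption. The sketch below (using the cryptography library's Fernet primitive) is purely conceptual: the real OMG relies on a hardware-backed SANCTUARY enclave rather than Python-level code, and all names here are hypothetical.

    from cryptography.fernet import Fernet

    # Conceptual illustration of OMG's three phases using ordinary symmetric
    # encryption. The real system uses a hardware-backed SANCTUARY enclave,
    # not application-level Python; names here are hypothetical.

    # Phase 1 -- preparation: the model provider ships an encrypted model; the key
    # lives in protected storage the user cannot read.
    key = Fernet.generate_key()
    encrypted_model = Fernet(key).encrypt(b"<serialized model weights>")

    class Enclave:
        """Stand-in for the protected memory region bound to one CPU core."""

        def __init__(self, encrypted_model, key):
            # Phase 2 -- initialization: decrypt the model *inside* the enclave only.
            self._model_bytes = Fernet(key).decrypt(encrypted_model)

        def predict(self, sensor_window):
            # Phase 3 -- operation: only inputs and predictions cross the boundary;
            # neither the decrypted model nor the raw weights leave the enclave.
            return "walking"  # placeholder for running the decrypted model

    enclave = Enclave(encrypted_model, key)
    print(enclave.predict(sensor_window=[0.1, 0.2, 0.3]))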

Through a variety of experiments, the authors show that OMG is a secure solution. They show that a model deployed with OMG performs with the same accuracy as one deployed without it, and that the two perform at approximately the same speed (i.e., OMG does not slow down a model or negatively affect the user's experience). Because OMG is secure, works on any multi-core device, and preserves a pleasant user experience, it is well suited to a deployed real-time human activity recognition system on a mobile device.

Strengths and Weaknesses

We live in a world where people want to ensure their personal data remains private; because of this, ensuring the security of data in a real-time activity recognition system is pivotal. OMG offers one way to protect both the model developer's intellectual property and the user's data. Because OMG does not require any hardware changes, it is easy to deploy in many pre-existing mobile devices; it would be incredibly beneficial to the system we seek to design, as it could be implemented on any device with a multi-core processor. Further, at the heart of our system, we want speed. Because OMG has been shown not to impact the speed of using a model while maintaining protection of the necessary data, it is a strong choice for a HAR system.

The immediate drawback of implementing OMG in our system is that it relies on the mobile device having a multi-core processor. While some newly released mobile devices contain multi-core processors, this is not a guarantee; deploying a real-time activity recognition system with OMG would limit the mobile devices it could run on, potentially decreasing the user base of a HAR system. Other considerations the authors do not address are the efficiency of implementing OMG on a mobile device (specifically, its impact on battery power) and how OMG affects a device's ability to carry out its usual functions. Utilizing an enclave essentially reserves a CPU core while the model is in use; how does this affect the user's experience if they are trying to use their mobile device for other activities while the model is running?

A faster algorithm

Thus far, we have explored two of the three components necessary for creating and deploying a real-time physical activity recognition system on a mobile device. We have seen one possible way to shrink a model using the methods in “Design of Novel Deep Learning Models for Real-time Human Activity Recognition with Mobile Phones”, and one method of securing our lightweight model and the user's data with OMG from “Offline Model Guard: Secure and Private ML on Mobile Devices”. We have also ensured that both components will either increase the speed of our system or, at worst, not slow the use of our model (in the case of OMG). We now consider how to make this model as quick as possible to retrain and to predict on user data. The paper “NCPU: An Embedded Neural CPU Architecture on Resource-Constrained Low Power Devices for Real-time End-to-End Performance” outlines a hardware change that accelerates this process [5]. Although it is a hardware change, accelerating an embedded model's ability to train, retrain, and predict would serve our overall goal of making the system as quick as possible.

To understand why acceleration of models on a mobile device is necessary, it is important to note the differences in compute power between a mobile device and the traditional computers (or clusters) on which many models are currently trained and evaluated. Mobile devices are far more constrained by space than other computers. This lack of space means the available memory (for storing a model) and CPUs/GPUs (for running a model) are much more limited than on a traditional computer or cluster. In addition, many state-of-the-art ML models have hundreds of thousands or millions of parameters and are trained and evaluated on very large datasets; these models are large, and the compute power needed to predict with them efficiently can be taxing. Using a cluster affords many researchers and corporations the privilege of not worrying about the size and memory demands of their models. When deploying an activity recognition system on a mobile device, we do not have this luxury.

The naive solution to the lack of compute power on a mobile device is simply to add more of it: a developer could increase the number of CPUs in their device. This immediately carries two consequences. Firstly, additional CPUs would most likely increase the size of the mobile device, which many existing users could view negatively. Secondly, in many cases this additional compute power is not necessary, so most of the time it is a waste of space and resources. Because simply adding more computational power to a mobile device has so many drawbacks, these authors considered a different solution with the NCPU: specializing the compute power a mobile device already has.

Instead of adding CPUs to a mobile device to accommodate a complex model, the NCPU is a chip designed to accelerate neural network workloads while retaining the full functionality of a CPU; it can thus serve either role. The NCPU is inspired by prior solutions to this same problem, specifically heterogeneous processor architectures. A heterogeneous architecture is a processor that pairs a CPU with a domain-specific accelerator. Although heterogeneous architectures achieve their goals, one drawback is that the workload is typically unbalanced: the CPU sees many more instructions than the accelerator.

The NCPU is akin to a heterogeneous architecture in that it acts as both a domain-specific accelerator (for neural networks) and a CPU; however, it is better equipped to handle the unbalanced workloads of a traditional heterogeneous architecture. In addition, the NCPU is smaller than other heterogeneous processors and, when used as an accelerator, leads to a twelve percent decrease in energy consumption compared to a heterogeneous processor. When functioning as a general CPU, a two-core NCPU showed a forty-three percent increase in speed, which the authors equate to a battery-power saving of roughly seventy percent or more. Because the NCPU is space, energy, and cost efficient, it is well equipped to be utilized in a real-time activity recognition system deployed on a mobile device.

Strengths and Weaknesses

Although the benefits of an accelerated model in a real-time activity recognition system are easy to see, the NCPU is a hardware change. This means a system utilizing the NCPU would not be deployable on current mobile devices; in fact, such a system would need to be released on its own mobile device. This could be a benefit for a company that wants proprietary software and hardware and already has an established user base, but it could also be detrimental, as it requires users to purchase an entirely new mobile device. Further, the authors themselves mention that they implemented and tested only one possible design of the NCPU from the design space, attributing this to the high cost of fabricating chips. Considering only one design means potentially better versions of the NCPU exist on paper but have yet to be manufactured and tested. That said, this NCPU is the first of its kind, it outperforms other existing solutions, and the authors intend to investigate other NCPU designs as future work.

Conclusion

Deploying a real-time physical activity recognition algorithm on a mobile device would require a person (or company) to ensure that their model is lightweight, that the model's intellectual property remains secure while the user can trust that their personal data will not be shared or exploited, and that the hardware supports the task at hand (i.e., can handle the computational costs of a robust model). These three goals should be achieved while maintaining the speed of the model. A smaller, lightweight model ensures the system will be faster than one implementing a larger model, because fewer computations are needed for each prediction (and during training); however, a lighter-weight model may be less accurate. Adding security should not decrease the accuracy or the speed with which the system makes predictions. Utilizing the NCPU is expected to increase the speed of our system while maintaining its size, but it comes with the tradeoff of being a hardware change: any system that implements the NCPU would need to be its own device and could not work within a preexisting one. These three ideas have each been explored in detail by the three aforementioned papers.

Bringing all these ideas together, one could implement a system that captures all of our ideals. Even though this paper suggests one such system, there are a variety of ways to implement each component, especially with speed in mind. To create the best real-time system, one would need to evaluate each component separately as well as how the components work together; this would require analyzing many different models, security mechanisms, and mobile device architectures. If the system were implemented on pre-existing mobile devices (which is ideal), no further analysis of device architectures would be required.

Another area of future work is the cost (with regard to both speed and space) of retraining a model on someone's personal data in such a system. How does retraining impact the user of the mobile device? How can we minimize the time required to retrain the model? Minimizing the size of a pretrained model is a common area of research, but retraining a model on-device does not seem to be. If a user is relying on a model to track their physical activity, it is important that they be able to correct misclassifications; ideally, this would lead to retraining the model on the misclassified data with the true labels. Another open question is whether someone interacting with a HAR system would actually correct misclassified data. These questions span research areas within computer science but must be investigated if we are to create a real-time human activity recognition system on a mobile device.
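
One lightweight option worth evaluating here is incremental (online) updating rather than full retraining. The sketch below is an assumption-laden illustration using scikit-learn's SGDClassifier.partial_fit on a single corrected window; the feature dimensions and activity labels are made up, and this is not how any of the surveyed systems handle corrections.

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    # Illustrative only: a single corrective update via online learning, as one
    # possible alternative to fully retraining a HAR model after a misclassification.

    ACTIVITIES = ["walking", "running", "sitting", "standing", "cycling", "lying"]

    # A model trained elsewhere on engineered feature vectors (random stand-ins here).
    model = SGDClassifier()
    model.partial_fit(np.random.randn(200, 12),
                      np.random.choice(ACTIVITIES, 200),
                      classes=ACTIVITIES)

    # The user corrects one misclassified window: apply a single incremental update
    # rather than retraining on the full dataset.
    misclassified_features = np.random.randn(1, 12)
    model.partial_fit(misclassified_features, ["cycling"])

Whether such one-step updates are enough to meaningfully personalize a model, and how they interact with an enclave-protected model copy, are exactly the kinds of questions this future work would need to answer.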

References

[1] “Heart disease facts,” Centers for Disease Control and Prevention, 14-Oct-2022. [Online]. Available: https://www.cdc.gov/heartdisease/facts.htm. [Accessed: 21-Apr-2023].

[2] “Get active!,” Centers for Disease Control and Prevention, 03-Nov-2022. [Online]. Available: https://www.cdc.gov/diabetes/managing/active.html. [Accessed: 21-Apr-2023].

[3] M. Nutter, C. H. Crawford, and J. Ortiz, “Design of novel deep learning models for real-time human activity recognition with mobile phones,” 2018 International Joint Conference on Neural Networks (IJCNN), 2018.

[4] S. P. Bayerl, T. Frassetto, P. Jauernig, K. Riedhammer, A.-R. Sadeghi, T. Schneider, E. Stapf, and C. Weinert, “Offline Model Guard: Secure and private ML on mobile devices,” 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2020.

[5] T. Jia, Y. Ju, R. Joseph, and J. Gu, “NCPU: An embedded neural CPU architecture on resource-constrained low power devices for real-time end-to-end performance,” 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020.