Inertial Measurement Units (IMUs) provide a portable, cost-effective and easy-to-use solution for all kinds of motion analysis. They are small enough to be attached to any segment of the human body without restricting the range of movement in any way. This flexibility allows motion analysis to be applied in various scenarios with any subject. Motesque has shown that the combination of IMU sensor signals and knowledge about their location can deliver astonishing insights into human locomotion, allowing experts to find the optimal solution for any given problem. The Motesque Motion Analysis kit provides customers with a maximum of freedom in terms of where to use it. Until now, the only limitation has been the predefined segment for each IMU. At Motesque, we were wondering if machine learning might be able to help us remove this limitation, and thus enhance the customer experience even more. The answer is: yes!
Defining the problem
Before we started to research possible solutions, we made the following assumptions; these would give us an idea of the feasibility of what we wanted to achieve:
- The motion we want to analyze contains a cyclic movement
- We know which kind of movement is present in the recorded data
- The number and position of possible locations for any sensor is predefined
In a data-science feasibility study, it is always advisable to start with the presumably simplest solution: one that incorporates a lot of prior knowledge, or that is composed of building blocks which each solve a subproblem. Only once such a multi-step solution has proven successful is it advisable to remove assumptions, step by step, towards a final end-to-end solution. Thus, instead of trying to develop a model that could identify the location of any IMU attached to any segment for all kinds of movements with significantly different characteristics, we first aimed to solve the core problem for a very specific use case: running on a treadmill.
The first assumption above ensures that the variation in the signal of any given IMU is only the result of an individual way of executing a certain movement by a particular subject. The second assumption ensures that the algorithm we develop is customized for one particular movement, which decreases the complexity of the model. The third assumption is especially important when IMU sensors are being used. The initial alignment of the sensor will impact the signal course of an IMU during the cyclic movement. To keep in line with our usage recommendations, we assume that only the segment is arbitrary for each IMU, but the exact location (orientation) of a sensor on a specific segment is predefined.
Even with these three assumptions, we are still left with a complex problem. For example, both running speed and foot-strike technique (fore-foot strike vs. rear-foot strike) have a significant impact on the course of each IMU signal and thus produce high variance within the same class of movement.
What do we need to achieve?
Any solution developed must satisfy the following requirements:
- It must generalize well to new subjects
- The model should run on a low-power computing device within a reasonable time frame
- The model should be invariant to the length of the recorded data
How can AI help us?
Simple, generic algorithms will most likely fail to find a pattern that generalizes well to all the variations of the same movement. As a result, we had to look for more sophisticated solutions from the field of machine learning. Since calculating any biomechanical parameter already requires knowledge of the segment an IMU is attached to, we are left with the raw data stream and all of the variation it contains. The absence of the precalculated features typically used by machine-learning algorithms such as Support Vector Machines (SVM), logistic regression, clustering and many others forced us to dive into the world of deep neural networks.
Instead of biomechanical features, we could have chosen to extract statistical parameters from the raw sensor data that could be used by the above-mentioned algorithms. However, one of the strengths of neural networks is their ability to independently learn a feature representation from raw data that is best suited for the given problem… so we let the AI do its job.
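For illustration, such a hand-crafted feature extraction might look like the following sketch. The function name and the particular choice of statistics are ours and purely exemplary; they are not part of the Motesque pipeline:

```python
import numpy as np

def statistical_features(window):
    """Reduce a (t, n) sensor window to a flat vector of per-axis statistics.

    window: array of shape (t, n) -- t timeframes, n sensor axes.
    Returns a 1-D feature vector with 4 statistics per axis.
    """
    return np.concatenate([
        window.mean(axis=0),  # mean per axis
        window.std(axis=0),   # standard deviation per axis
        window.min(axis=0),   # minimum per axis
        window.max(axis=0),   # maximum per axis
    ])

# A synthetic gyroscope/accelerometer window: 200 timeframes, 6 axes.
rng = np.random.default_rng(0)
window = rng.normal(size=(200, 6))
features = statistical_features(window)
print(features.shape)  # (24,)
```

Feature vectors like this could feed an SVM or logistic regression, but every statistic must be chosen by hand, which is exactly the step a neural network learns on its own.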
Another advantage of neural networks is the existence of specialized architectures for different problems. Conventional machine-learning algorithms treat each input as an independent feature. If we wanted to use an SVM with a multivariate time-series input, we would need to stack each IMU signal (one per sensor axis) into a single flat feature vector. This way, we would lose two types of information:
- Is there a correlation between a given sensor signal and the sensor signal from a different axis at the same timestamp?
- What impact do the following and previous time frames have?
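A tiny NumPy illustration of what this stacking does to a multivariate window; the shapes are arbitrary example values:

```python
import numpy as np

# A multivariate window: 100 timeframes x 6 sensor axes.
window = np.arange(600).reshape(100, 6)

# What an SVM would see: one flat vector. Which entries belonged to the
# same timestamp, or to the same axis, is no longer encoded in the shape.
flat = window.ravel()  # shape (600,)

# What a CNN or RNN sees: the 2-D layout is preserved, so the model can
# relate axis j and axis k at the same timestamp, and timeframe t to t+1.
print(flat.shape, window.shape)  # (600,) (100, 6)
```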
Two different neural network architectures are capable of keeping this kind of information when dealing with multivariate time series data by learning temporal patterns:
- Recurrent Neural Networks (RNN)
- Convolutional Neural Networks (CNN)
RNNs and their well-known subtype, the Long Short-Term Memory network (LSTM), were specifically designed to handle temporal dependencies. The memory cell of an LSTM is trained to incorporate previous timeframes into the decision-making process. Despite their excellent performance in time-series-related tasks such as natural language processing, audio processing and human motion analysis, they have a fundamental issue: compute resources. Because of the complex structure of an LSTM memory cell, the benefits that GPU-accelerated training and inference bring to other neural network structures carry over only to a small degree to RNNs. Furthermore, frameworks for deploying models to ARM (Advanced RISC Machines)-based computing devices, such as TensorFlow Lite, either do not support RNNs well or would run them too slowly to be efficient.
CNNs are the core architecture behind all existing image-processing models as well as other applications with a high number of input features. They are very good at local feature engineering while keeping the computational cost to a bare minimum. Stacking several CNN layers transforms the local feature space into a global feature space, which simple, fully connected layers at the end of the network use to derive a decision. CNN layers are best known for their application to 3D input such as RGB images, but there are also variants for 2D input, which can be defined as t × n, where t is the number of timeframes and n the number of features (e.g. sensor axes). Moreover, recent research on dilated convolutions has shown that dilation in CNNs can result in a temporal awareness similar to that found in RNN structures. Our final model was a shallow CNN with dilated convolutions that is fed with overlapping windows of gyroscope and accelerometer signals from a single sensor. The model classifies a single sensor k times, where k is the number of overlapping windows extracted from the given motion data. A majority vote over the k classifications determines the final prediction of the model for the IMU.
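The windowing and voting stages of this pipeline can be sketched as follows. The window width, the stride, and the stand-in `predict` callable are illustrative assumptions, not the actual model configuration; in the real pipeline, `predict` is the trained dilated CNN:

```python
import numpy as np
from collections import Counter

def sliding_windows(signal, width, stride):
    """Cut a (t, n) multivariate signal into overlapping (width, n) windows."""
    return np.stack([signal[s:s + width]
                     for s in range(0, len(signal) - width + 1, stride)])

def classify_sensor(signal, predict, width=256, stride=128):
    """Classify one IMU by majority vote over its k overlapping windows.

    predict: any callable mapping one (width, n) window to a class label.
    """
    votes = [predict(w) for w in sliding_windows(signal, width, stride)]
    return Counter(votes).most_common(1)[0][0]

# Stand-in classifier: label by the sign of the mean of the first axis.
fake_predict = lambda w: "foot_left" if w[:, 0].mean() > 0 else "foot_right"
signal = np.ones((1024, 6))  # 1024 frames, 6 axes (gyro + accelerometer)
print(classify_sensor(signal, fake_predict))  # foot_left
```

Because the vote runs over however many windows the trial yields, the same procedure works for short and long recordings alike, which is what makes the model invariant to trial length.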
What kind of data are we using?
Every machine-learning project that involves training a neural network requires a large amount of labelled data. Because we were able to assume that the sensors were attached to the predefined segments in all recording sessions in our MLab, gathering data was straightforward. In general, each trial consisted of three sensors:
- Foot left
- Foot right
- Pelvis
The diversity of athletes who have used our running analysis has provided us with sufficient variability in the data. It is fair to say that our data represents all kinds of running styles and will thus lead to a model that generalizes well. The data was split into 80% training, 10% validation and 10% test data, and we ensured that each subject was only included in one of the datasets. In addition, the model was evaluated in a ten-fold cross-validation, where each subject was part of the test set exactly once.
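A subject-wise split of this kind can be sketched as follows. The function name and the trial layout are hypothetical, but the key property, that no subject crosses set boundaries, is the one described above:

```python
import numpy as np

def subject_split(subjects, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split trials into train/val/test so each subject lands in exactly one set.

    subjects: one subject id per trial. Returns three boolean trial masks.
    """
    rng = np.random.default_rng(seed)
    ids = rng.permutation(np.unique(subjects))  # shuffle subjects, not trials
    n_train = int(len(ids) * fractions[0])
    n_val = int(len(ids) * fractions[1])
    groups = (ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:])
    return [np.isin(subjects, g) for g in groups]

subjects = np.repeat(np.arange(20), 3)  # 20 subjects, 3 trials each
train, val, test = subject_split(subjects)
# No subject appears in more than one set:
assert not (set(subjects[train]) & set(subjects[val]) | set(subjects[train]) & set(subjects[test]))
```

Splitting by subject rather than by trial is what makes the reported accuracy a measure of generalization to new people, not just to new recordings of people the model has already seen.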
Is it any good?
The model achieved an impressive accuracy of 99.6%, combined across all test sets, based on the majority voting. The confusion matrix showed no errors between the foot sensors and the pelvis sensor, and the model only rarely mixed up the two foot sensors.
We have successfully created a machine-learning model that can classify the body segment to which an IMU sensor is attached for a specific use case. The general setup of our processing pipeline, with overlapping windows of fixed size and majority voting, makes the model not only invariant to the length of the motion trial but also gives us an idea of how confident the model is in its predictions. If the model assigns every overlapping window to the same class, we can be very confident that the classification is correct. If, on the other hand, the model assigns the overlapping windows of the same sensor to all kinds of target classes, we are left with two possible conclusions. First, the movement recorded by the sensor does not match the movement we expect, so the model's predictions are not trustworthy. Second, the model is presented with a very rare form of the specified movement that does not match the patterns it has learned. In the latter case the classification would still be untrustworthy, but the data remains valuable for future retraining.
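This agreement-based notion of confidence can be made concrete in a few lines. The score below, the fraction of windows agreeing with the majority label, is our own illustrative formalization, not a metric taken from the production pipeline:

```python
from collections import Counter

def vote_with_confidence(window_labels):
    """Return the majority label plus the fraction of windows that agree with it.

    A confidence near 1.0 means every overlapping window was classified
    identically; a low value flags a prediction that should not be trusted.
    """
    label, count = Counter(window_labels).most_common(1)[0]
    return label, count / len(window_labels)

print(vote_with_confidence(["pelvis"] * 9 + ["foot_left"]))  # ('pelvis', 0.9)
```

A simple threshold on this fraction could then decide whether to accept the classification, reject the trial, or queue the recording for later retraining.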
As a next step, the same methodology should be applied to a different movement type, e.g. walking or stair ascent/descent, with the addition of more sensors. As a preceding step, a human motion classification model would be useful, allowing the correct segment-classifier model to be chosen automatically. The challenge for this model would be to be invariant to the order of its inputs, because we would not know in advance which sensor is which. Together, both models will provide the freedom to attach the sensors to any segment for any cyclic movement that we know of.