Bayesian Optimized Random Forest for Movement Classification with Smartphones

Anonymous Author(s)

Abstract

As electronic devices become more powerful, they also become more prevalent in our daily lives. Smartphones today are equipped with a wide variety of sensors that can be put to many uses. This work uses smartphone accelerometer and gyroscope data to classify the movement type of the phone's owner. Feature sets are generated from the component time series signals, and a random forest is trained to classify these features as one of six movement types: walking, walking upstairs, walking downstairs, sitting, standing, and laying. Furthermore, a Bayesian optimization scheme was used to find an optimal parameter set for the forest while keeping the forest as small and shallow as possible to improve prediction speed. With this scheme, we achieved an overall prediction accuracy of ~92% for all activities combined. The prediction accuracies for walking, walking upstairs, walking downstairs, sitting, standing, and laying are 97.6%, 86.8%, 84.5%, 87.1%, 94.7%, and 99.8%, respectively.

1 Introduction

As the computational power of smartphones has increased drastically over the past several years, people have become far more reliant on their phones for scheduling, email, global positioning, and entertainment. The continued increase in the power and efficiency of mobile chips will allow people to do even more with their phones in the near future. Mobile phones contain many sensors that can be used to infer what the user is doing with the device, and potentially to improve the user experience.
Currently, mobile phones can contain sensors for imaging (cameras), acceleration (accelerometer), direction (electronic compass), pressure (pressure sensor), and orientation (gyroscope). These sensors are typically used for user interface services, such as screen rotation, and location services, such as GPS positioning for mapping applications. Using these sensors more extensively, for example by classifying the owner's movement, could improve some of these applications. Further, being able to classify dangerous phone movement such as free fall or landing in water could reduce the risk of losing important data.

However, these sensors could be used for more than improving the user experience. It is no secret that North America is experiencing an obesity epidemic. In 2011, it was estimated that 34% of adults (aged 18 and older) were overweight, and 28% were suffering from obesity [1]. Since smartphones have become so prevalent, it may be useful to classify physical behavior using the sensors present on these phones in order to track exertion. The individual could use this information to monitor exertion and the total exercise performed in a given interval, which could help overweight individuals maintain a healthy lifestyle.

The goal of this paper was to classify certain physical behaviors given gyroscopic and acceleration data from a Samsung Galaxy S II. The data was collected from 30 volunteers whose ages ranged from 19 to 48. The smartphone was attached to the volunteer's waist while they performed 6 different activities. A random forest classification scheme was used to classify the data. To fine-tune this scheme, Bayesian optimization was used to find the free parameters of the forest implementation.

2 Data

2.1 Data Overview

The dataset used in this paper was acquired from the UCI Machine Learning Repository [2]. The data consists of over 10000 records of volunteers performing 6 different physical activities: walking, walking upstairs, walking downstairs, sitting, standing, and laying down. Each record corresponds to one of these activities.

Figure 1: Filtered gyroscopic and accelerometer data for walking upstairs

In Figure 1, a sample time series of walking upstairs is shown for both the gyroscope and the accelerometer. The Samsung Galaxy S II was used to capture tri-axial angular velocity (from the gyroscope) and tri-axial acceleration (from the accelerometer) time series at 50 Hz. To produce the 10000 records, the original time series were sampled in fixed-width time windows of 2.56 seconds, with consecutive windows overlapping by 50%. Both time series were filtered using a median filter and a low-pass Butterworth filter to remove unwanted high-frequency noise.
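The windowing step described above can be sketched as follows. This is a minimal illustration, not the dataset authors' code: the function name `sliding_windows` and its argument names are hypothetical. At 50 Hz, a 2.56 second window holds 128 samples, and a 50% overlap means each window starts 64 samples after the previous one.

```python
import numpy as np

def sliding_windows(signal, rate_hz=50, window_s=2.56, overlap=0.5):
    """Split a (n_samples, n_axes) signal into fixed-width overlapping windows.

    Hypothetical helper for illustration; defaults follow the values in the text.
    """
    width = int(round(rate_hz * window_s))        # 50 Hz * 2.56 s = 128 samples
    step = int(round(width * (1.0 - overlap)))    # 50% overlap -> step of 64 samples
    n_samples = signal.shape[0]
    if n_samples < width:
        # Not enough data for a single window: return an empty stack.
        return np.empty((0, width) + signal.shape[1:])
    starts = range(0, n_samples - width + 1, step)
    return np.stack([signal[s:s + width] for s in starts])

# Example: 512 tri-axial samples yield 7 windows of 128 samples each.
signal = np.arange(512 * 3, dtype=float).reshape(512, 3)
windows = sliding_windows(signal)
```

Each row of `windows` would then become one record after filtering and feature generation.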
Further, the acceleration was separated into two parts (body acceleration and gravity) and low-pass filtered. The derivative of each signal was taken to estimate jerk signals for both acceleration and angular velocity. Lastly, some of these signals were transformed into the frequency domain via the Fast Fourier Transform. All of these signals were used to compute the features. The dataset includes a total of 561 features for each record. The dataset was randomly split into about 7000 records for training and about 3000 records for testing.

2.2 Feature Selection

Table 1 shows the resulting signal names and their descriptions obtained from the preprocessing described above.

Table 1: Signal Names and Descriptions (XYZ implies 1 signal for each component)

No.  Signal Name         Description
1    tbodyacc-xyz        Time domain acceleration due to external force
2    tgravityacc-xyz     Time domain acceleration of gravity
3    tbodyaccjerk-xyz    Time domain derivative of external force
4    tbodygyro-xyz       Time domain angular velocity
5    tbodygyrojerk-xyz   Time domain derivative of angular velocity
6    tbodyaccmag         Time domain magnitude of external force
7    tgravityaccmag      Time domain magnitude of gravity
8    tbodyaccjerkmag     Time domain magnitude of derivative of force
9    tbodygyromag        Time domain magnitude of angular velocity
10   tbodygyrojerkmag    Time domain magnitude of angular velocity derivative
11   fbodyacc-xyz        Frequency domain of signal 1
12   fbodyaccjerk-xyz    Frequency domain of signal 3
13   fbodygyro-xyz       Frequency domain of signal 4
14   fbodyaccmag         Frequency domain of signal 6
15   fbodyaccjerkmag     Frequency domain of signal 8
16   fbodygyromag        Frequency domain of signal 9
17   fbodygyrojerkmag    Frequency domain of signal 10

With these signals, 17 different operations are used to generate the feature set. The first four operations are the standard deviation, the mean, the maximum value, and the minimum value of the signal. Another operation computes the angle between a pair of signals, obtained from the dot product of the two vectors formed by averaging the XYZ components of each signal [3]. Other operations include the signal magnitude area (area under the magnitude curve), the energy of the signal (sum of the squares divided by the number of values), the cross-correlation coefficient [4], the auto-correlation coefficient [5], the index of the frequency component with the largest magnitude, and the skewness and kurtosis of the frequency domain signal [6]. The operations were applied to many of the signals to generate a 561-dimensional feature vector for each record in the dataset.
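A few of the operations listed above can be sketched in code. This is only an illustrative subset under assumed conventions: the helper names `basic_features` and `angle_between` and the feature key names are hypothetical, and the dataset's own feature definitions may differ in detail.

```python
import numpy as np

def basic_features(window):
    """Per-axis statistics for one tri-axial window of shape (n_samples, 3)."""
    feats = {}
    for i, axis in enumerate("xyz"):
        col = window[:, i]
        feats[f"mean-{axis}"] = float(col.mean())
        feats[f"std-{axis}"] = float(col.std())
        feats[f"max-{axis}"] = float(col.max())
        feats[f"min-{axis}"] = float(col.min())
        # Energy as described in the text: sum of squares divided by the count.
        feats[f"energy-{axis}"] = float(np.sum(col ** 2) / col.size)
    return feats

def angle_between(window_a, window_b):
    """Angle (radians) between the mean XYZ vectors of two windows."""
    a = window_a.mean(axis=0)
    b = window_b.mean(axis=0)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Example with constant signals so the statistics are easy to check by hand.
window = np.tile([1.0, 2.0, 3.0], (128, 1))
feats = basic_features(window)
right_angle = angle_between(np.tile([1.0, 0.0, 0.0], (128, 1)),
                            np.tile([0.0, 1.0, 0.0], (128, 1)))
```

Concatenating such values over all signals in Table 1 is how a fixed-length feature vector per record would be assembled.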
3 Classification

3.1 Random Forest

A classification random forest was used to classify the data into one of the 6 categories. This process consists of building decision trees. The trees were built by recursively splitting the data based on the information gain of a split for a given set S of training points, given by

\[ I(S) = H(S) - \sum_{i \in \{L, R\}} \frac{|S^i|}{|S|} H(S^i) \]

where S^L and S^R are the subsets of S sent to the left and right child nodes. H(S) is the entropy of the training set S, which is given by

\[ H(S) = -\sum_{c} p(c) \log p(c) \]

where p(c) is the probability distribution (histogram) of the classification labels in the training set S [7]. Split locations are tested by choosing a random subset of features to split over. Then, for each chosen feature, a split is tested between each pair of adjacent data points and the information gain is calculated. The split with the maximum information gain over all tested splits is used. Figure 2 shows an example of a split at a particular node in one of the decision trees.
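The split criterion can be illustrated with a short sketch. The function names are hypothetical, the logarithm base (2 here) is an arbitrary choice since the text does not specify one, and a full tree builder would repeat `best_split` over a random subset of features at every node.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of the label histogram."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(labels, go_left):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    left, right = labels[go_left], labels[~go_left]
    if left.size == 0 or right.size == 0:
        return 0.0  # a split that sends everything one way gains nothing
    n = labels.size
    return (entropy(labels)
            - (left.size / n) * entropy(left)
            - (right.size / n) * entropy(right))

def best_split(feature, labels):
    """Test a threshold between each pair of adjacent values of one feature."""
    values = np.unique(feature)                    # sorted unique values
    thresholds = (values[:-1] + values[1:]) / 2.0  # midpoints between neighbours
    best_t, best_gain = None, 0.0
    for t in thresholds:
        gain = information_gain(labels, feature < t)
        if gain > best_gain:
            best_t, best_gain = float(t), gain
    return best_t, best_gain

# Example: two well-separated classes are split perfectly at 0.5.
feature = np.array([0.1, 0.2, 0.8, 0.9])
labels = np.array([0, 0, 1, 1])
threshold, gain = best_split(feature, labels)
```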

Figure 2: An example of a split made at a node during the construction of one of the decision trees. The x-axis for all plots is the split dimension chosen, and the red line is the threshold. The y-axis in each of the subplots is one of the features of the N random dimensions chosen to test splits over.

Finally, once the forest has been built, the test data can be fed through the forest to see how it is classified. For a test point v, the class probability is taken from the probability distribution of the leaf that the point ends up in for each tree, and these distributions are summed over the T trees in the forest and normalized:

\[ p(c \mid v) = \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid v) \]

The class with the highest probability is the predicted classification for that test point [7]. The forest also uses bootstrapping in order to reduce variance and overfitting of the training data. Bootstrapping could also help with feature correlation caused by signals from different activities being similar. The forest builder has 4 free parameters: the number of trees, the depth of each tree, the minimum size of each leaf, and the number of dimensions to try splits over. These parameters were estimated using a Bayesian optimization scheme.

3.2 Bayesian Optimization

Bayesian optimization was implemented using Gaussian Processes (GPs). Candidate points were sampled from a 4-dimensional space to account for the 4 free parameters of the random forest, and the objective function used to add data to the GP was the forest builder. The forest builder took a candidate point and built the forest from the training data. The forest then predicted the classes of the test data, and the accuracy of the result was the output to be maximized. First, the GP prior is created using a kernel function whose parameters are sigma, the standard deviation of the noise, and l, a tuning parameter set to 0.1.
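When new data X* is added, we construct the two new matrices K* and K** from the kernel function k:

\[ K_* = k(X, X_*), \qquad K_{**} = k(X_*, X_*) \]

where X denotes the points that have already been evaluated.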

From Theorem 4.2.1 (Marginals and Conditionals of an MVN) [8], we get the following expressions for Mu* and Sigma* of the posterior distribution:

\[ \mu_* = m(X_*) + K_*^{T} K^{-1} \left( y - m(X) \right) \]

\[ \Sigma_* = K_{**} - K_*^{T} K^{-1} K_* \]

where K = k(X, X) is the kernel matrix of the points already evaluated, y is the vector of their observed accuracies, and m is the prior mean function.

The goal of Bayesian optimization is to optimize a function without knowing its true form. Because each evaluation is expensive, we want to run as few test points through the function as possible. To do this, we used the Upper Confidence Bound acquisition function:

\[ a(x) = \mu(x) + \kappa \, \sigma(x) \]

Here mu and sigma are the mean and standard deviation of the posterior, and kappa is a free parameter set to 0.1. This function balances exploitation and exploration in order to find a maximum among the given candidates x.

4 Results

4.1 Manual Forest Parameter Selection

Since the dataset and feature set were quite large, we obtained very good results even when picking forest parameters manually. The prediction accuracies of each activity for both the training and test sets are displayed in Table 2.

Table 2: Manual Parameter Selection Result

Activity             Training Accuracy   Test Accuracy
Walking              98.6%               95.3%
Walking Upstairs     97.7%               87.3%
Walking Downstairs   95.2%               78.9%
Sitting              77.5%               61.0%
Standing             97.0%               95.67%
Laying               99.7%               98.1%
TOTAL                95%                 85%

It is to be expected that the training predictions are more accurate than the test predictions, as the forest is fitted to the training data. From the test prediction accuracies, it is clear that sitting and walking downstairs are the least accurately predicted activities.

4.2 Optimized Forest Parameter Selection

Using Bayesian optimization, we made a noticeable improvement in the predictive capabilities of the random forest. We decided to keep the forest as small as possible because computation time increases dramatically as the forest grows, which is not ideal for smartphone processors.
The optimization space covered a forest size of 1 to 20 trees, a tree depth of 1 to 20, a minimum leaf size of 1 to 10, and 1 to 15 dimensions to test splits over. Figure 3 displays the results of the optimization for 100 iterations.
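The optimization loop from Section 3.2 over this search space can be sketched as below. Several pieces are assumptions rather than the authors' implementation: the squared-exponential kernel form, the noise value, the constant prior mean (the mean of the observed scores), the scaling of parameters to the unit cube, and the number of candidates per iteration. The objective `toy_accuracy` is a hypothetical stand-in for the expensive step of building a forest and measuring its test accuracy. Only l = 0.1, kappa = 0.1, the parameter ranges, and the 100 iterations come from the text.

```python
import numpy as np

# Search space from the text: trees, depth, minimum leaf size, split dimensions.
LOW = np.array([1, 1, 1, 1])
HIGH = np.array([20, 20, 10, 15])

def kernel(a, b, length=0.1):
    """Squared-exponential kernel (assumed form) between rows of a and b."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * length ** 2))

def gp_posterior(x_obs, y_obs, x_new, length=0.1, noise=0.01):
    """Posterior mean and standard deviation at x_new (MVN conditioning)."""
    k = kernel(x_obs, x_obs, length) + noise ** 2 * np.eye(len(x_obs))
    k_star = kernel(x_obs, x_new, length)
    solved = np.linalg.solve(k, k_star)              # K^{-1} K*
    prior_mean = y_obs.mean()                        # assumed constant prior mean
    mu = prior_mean + solved.T @ (y_obs - prior_mean)
    var = 1.0 - np.sum(k_star * solved, axis=0)      # diag(K**) is 1 for this kernel
    return mu, np.sqrt(np.maximum(var, 0.0))

def bayes_opt(objective, n_iter=100, n_candidates=500, kappa=0.1, seed=0):
    """Maximize objective(trees, depth, min_leaf, split_dims) with GP-UCB."""
    rng = np.random.default_rng(seed)

    def sample(n):
        # Uniform integer points inside the search space (bounds inclusive).
        return LOW + np.floor(rng.random((n, 4)) * (HIGH - LOW + 1)).astype(int)

    def scale(p):
        return (p - LOW) / (HIGH - LOW)              # map to the unit cube

    params = sample(1)
    scores = np.array([objective(*params[0])])
    for _ in range(n_iter - 1):
        candidates = sample(n_candidates)
        mu, sigma = gp_posterior(scale(params), scores, scale(candidates))
        pick = candidates[np.argmax(mu + kappa * sigma)]   # upper confidence bound
        params = np.vstack([params, pick])
        scores = np.append(scores, objective(*pick))
    best = int(np.argmax(scores))
    return params[best], float(scores[best])

def toy_accuracy(trees, depth, min_leaf, split_dims):
    """Hypothetical objective with an arbitrary peak; NOT a real forest."""
    p = np.array([trees, depth, min_leaf, split_dims], dtype=float)
    target = np.array([10.0, 10.0, 5.0, 8.0])
    return float(1.0 - np.mean(((p - target) / (HIGH - LOW)) ** 2))

best_params, best_score = bayes_opt(toy_accuracy, n_iter=15, n_candidates=100, seed=1)
```

In practice `toy_accuracy` would be replaced by a function that trains the forest with the given parameters and returns its accuracy, which is why the loop tries to keep the number of evaluations small.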

Figure 3: Results for 100 iterations of optimization

From this result, we chose the parameters that yielded the maximum accuracy in this interval and allowed for the fastest tree computation time. The optimization increased the overall accuracy on the training data by ~4 percentage points and on the test data by ~7 percentage points. The individual prediction accuracies are shown in Table 3.

Table 3: Optimized Parameter Selection Result

Activity             Training Accuracy   Test Accuracy
Walking              99.9%               97.6%
Walking Upstairs     100%                86.8%
Walking Downstairs   100%                84.5%
Sitting              99.5%               87.1%
Standing             99.6%               94.7%
Laying               99.7%               99.8%
TOTAL                99%                 92%

Clearly, the most drastic improvement gained by the optimization was in the classification of the sitting activity, whose test accuracy increased by ~26 percentage points over the manually selected parameters. Further, the test accuracy for walking downstairs increased by ~6 percentage points. Walking and laying improved by ~2 percentage points, while walking upstairs and standing changed by 1 percentage point or less (both decreased slightly). The parameter values that yielded these results were a forest of 15 trees, a tree depth of 13, a minimum leaf size of 5, and 12 dimensions to try splits over.

5 Conclusion and Discussion

To conclude, a movement type classification problem was solved using a Bayesian optimized random forest. We achieved respectable test classification accuracies that range from 84.5% to 99.8%. The weakest prediction accuracies come from sitting, walking downstairs, and walking upstairs, while the laying, walking, and standing accuracies are roughly 95% or higher. Though the classification techniques were different, our results seem to be more accurate than those of Wu et al. [9] for the walking activities, though not for sitting. Using a variety of classification techniques (including k-nearest neighbors, a decision tree, and a multilayer perceptron), they had accuracies of ~81-92% for walking, ~42-70% for walking upstairs, ~54.6-79.4% for walking downstairs, and ~99-100% for sitting [9].
The lower accuracies achieved for walking upstairs, walking downstairs, and sitting could be the result of correlation between the signals of different activities, or of the chosen feature set missing a feature that better represents the activity. Table 4 displays the analysis of the misclassified points of the 3 aforementioned activities.

Table 4: Error decomposition of results

True Activity        Incorrect Classification   Percentage of Error
Walking Upstairs     Walking                    ~84%
                     Walking Downstairs         ~16%
Walking Downstairs   Walking Upstairs           ~56%
                     Walking                    ~44%
Sitting              Standing                   ~90%
                     Laying                     ~10%

When the sitting activity is misclassified, it is classified as the standing activity 90% of the time. This suggests that there is a slight correlation between their features, which is logical since the two actions are quite similar with respect to the phone's position. These results could probably be improved by performing cross-correlation of component-wise signals across different activities and removing those signals from feature creation if their correlations are statistically significant.

Alternatively, adding another Bayesian optimizer to the hierarchy to select a subset of the current features may also improve the results and remove the effects of the correlation, though this would require the algorithm to be parallelized. Furthermore, a smaller feature set would improve real-time classification performance by reducing the number of features that need to be calculated in order to run the classification. This is imperative on smartphone platforms, where processing power and electrical power are limited. Other factors that could be tuned by additional Bayesian optimizers in the hierarchy are the filtering techniques, the size of the sample window, and the amount of signal overlap between different data points.
Though the classification may be improved by adding another optimizer to the hierarchy or by cleverly analyzing the feature set, we found that our average test accuracy of 92% (97.6% for walking, 86.8% for walking upstairs, 84.5% for walking downstairs, 87.1% for sitting, 94.7% for standing, and 99.8% for laying) is very respectable.

References

[1] Schiller JS, Lucas JW, Peregoy JA. Summary health statistics for U.S. adults: National Health Interview Survey, 2011. National Center for Health Statistics. Vital Health Stat 10(256). 2012.
[2] https://archive.ics.uci.edu/ml/index.html
[3] https://en.wikipedia.org/wiki/dot_product
[4] https://en.wikipedia.org/wiki/cross-correlation
[5] https://en.wikipedia.org/wiki/autocorrelation
[6] https://en.wikipedia.org/wiki/kurtosis
[7] A. Criminisi, J. Shotton and E. Konukoglu (2011) Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning. Microsoft Research technical report TR-2011-114.
[8] Kevin Patrick Murphy (2012) Machine Learning: A Probabilistic Perspective. MIT Press.
[9] W. Wu, S. Dasgupta, E. Ramirez, C. Peterson and G. Norman (2012) Classification Accuracies of Physical Activities Using Smartphone Motion Sensors. Journal of Medical Internet Research.