- Research
- Open access
Examining physical activity clustering using machine learning revealed a diversity of 24-hour step-counting patterns
Journal of Activity, Sedentary and Sleep Behaviors volume 3, Article number: 19 (2024)
Abstract
Background
Physical activity is a crucial contributor to public health. Although studies of temporal physical activity patterns might lead to protocols for efficient interventions or programs, a standardized procedure to determine and analyze these patterns remains to be developed. Here, we attempted to develop a procedure to cluster 24-hour patterns of physical activity, expressed as step counts measured with an accelerometer-based wearable sensor.
Methods
The 1 Hz step count data was collected by a hip-worn triaxial accelerometer from 42 healthy participants, comprising 35 males and 7 females, at the Sendai Oroshisho center in 2008. In this cross-sectional study, unsupervised machine learning, specifically the kernel k-means algorithm with the global alignment kernel, was applied to a total of 815 days of data from the 42 participants, and six activity patterns were identified. Further, the probability of each 24-hour step-counting pattern was calculated for every participant and used for spectral clustering of step-behavioral patterns.
Results
We identified six 24-hour step-counting patterns and five daily step-behavioral clusters: all-day dominant (21 participants), all-day + bi-phasic dominant (8 participants), bi-phasic dominant (6 participants), all-day + evening dominant (4 participants), and morning dominant (3 participants). When the amount of physical activity was categorized into tertile groups reflecting highly active, moderately active, and low active, each tertile group consisted of different proportions of the six 24-hour step-counting patterns.
Conclusions
Our study introduces a novel approach using an unsupervised machine learning method to categorize daily hourly activity, revealing six distinct step counting patterns and five clusters representing daily step behaviors. Our procedure would be reliable for finding and clustering physical activity patterns/behaviors and reveal diversity in the categorization by a traditional tertile procedure using total step amount.
Background
Physical activity is a crucial aspect of maintaining good health and well-being. The World Health Organization (WHO) defines physical activity as “any bodily movement produced by skeletal muscles that requires energy expenditure” and recommends at least 75–150 min of vigorous-intensity or 150–300 min of moderate-intensity aerobic physical activity throughout the week [1]. WHO also notes that physical inactivity is one of the leading risk factors for global mortality, causing an estimated 3.2 million deaths per year [2]. Many observational and intervention studies have linked regular physical activity to numerous physical and mental health benefits, including reducing the risk of chronic diseases, improving cardiovascular health, maintaining a healthy weight, improving mental health, and enhancing the overall quality of life [3,4,5,6,7]. However, a recent systematic review and meta-analysis of randomized controlled trials indicated that interventions to maintain physical activity behavior were effective in short periods (at least three months) but had only a small impact on the long-term maintenance of physical activity [5]. Therefore, it is critical to find efficient ways, integrated into daily life, to promote and maintain physical activity for the improvement of the health and well-being of the public.
A long-standing question in public health is what kind, how much, and how intense physical activity is required for health benefits [8]. One of the simplest ways to increase and maintain physical activity is to walk/move more and longer [9,10,11], as shown in the 10,000-steps campaign [12,13,14]. A recent meta-analysis reported that taking more steps per day was associated with lower mortality risk, where the risk plateaued at approximately 6,000–8,000 steps or 8,000–10,000 steps per day for older (≥ 60 years) or younger adults (< 60 years), respectively [15]. Most of the previous research on physical activity has concentrated on how the total amount of activity, based on its intensity and duration, is linked to health [16,17,18]. However, analyzing only the total amount of physical activity might give an incomplete understanding of activity behavior, since metrics of the total amount of physical activity tend to exclude or underestimate the effect of temporal variations of physical activity.
Several recent studies applying clustering approaches to big data sets have reported associations between physical activity patterns and health outcomes, such as mortality risk, blood pressure, adiposity, and depressive symptoms [19,20,21,22]. The clustering approach has the potential to become mainstream in physical activity analyses in epidemiology, since the popularization of smartphones and wearable devices with accelerometers enables the collection of vast data sets of physical activity with higher temporal resolution [23, 24]. Therefore, a standardized, robust, and reliable procedure for analyzing temporal patterns of physical activity from big data needs to be developed.
Recently, several studies focused on the 24-hour temporal physical activity patterns using a narrower criterion for physical activity, such as steps or body movements objectively measured with an accelerometer [25,26,27,28,29]. An earlier study that analyzed the relationship between activity rhythm disturbances and depression identified eight sub-clusters in older adults (≥ 65 years) [27]. Late or combined early/compressed/dampened activity rhythms may independently contribute to depression symptom development [27]. Another recent report using activity counts for seven days in US adults indicated that, although the cluster with the lowest physical activity counts showed higher obesity indices, including body mass index and waist circumference, than the clusters with higher activity counts, the clusters also expressed different temporal patterns of physical activity [25]. Moreover, a higher cardiovascular disease risk was observed in the inactive cluster than in the other clusters (evening active, moderately active, and very active) [26] in data from the Northern Finland Birth Cohort 1966 study [30]. The latest report combining activity timing/period and pattern robustness identified sub-groups with progressive depression symptoms and impaired cognitive performance in aging [29]. Therefore, understanding temporal physical activity patterns would help extract informative health outcomes.
In the previous studies above, participants were first clustered by several selected features that summarize physical activity patterns, and the 24-hour physical activity patterns of the resulting clusters were then analyzed. Because the clustering operates on extracted features rather than on the time series itself, those procedures have difficulty analyzing temporal physical activity patterns directly. In this study, we propose a novel procedure for the direct clustering of physical activity based on the temporal step-counting patterns with unsupervised machine learning, irrespective of the total volume of physical activity. Subsequently, we identified five step-behavioral clusters of participants based on the proportion of each step-counting pattern. Furthermore, we analyzed the proportion of the temporal step-counting patterns and step-behavioral clusters in groups categorized by a traditional tertile procedure. The results revealed potential diversity within the categories of the tertile procedure based on total step amount. Our approach could provide a useful procedure to analyze temporal patterns of physical activity in a direct manner.
Materials and methods
This cross-sectional study analyzed data from 42 participants, collected using a hip-worn triaxial accelerometer for up to 25 days, to investigate physical activity patterns. Five participants with less than three days of data were excluded from this study since at least three days of data are required to determine the dominant daily step behavior based on the twenty-four-hour step-counting patterns. Previous research also indicated that a minimum of three days of accelerometer data is necessary for the accurate prediction of physical activity for adults [31]. After excluding these 5 of the 47 participants, the final dataset included 42 participants with an average of 19 ± 4 days of data. The 1 Hz step count data was converted into hourly data, resulting in 815 days of hourly data for clustering. Using the kernel k-means algorithm with the global alignment kernel in the tslearn Python package, six clusters of 24-hour physical activity patterns were identified. The probability of each pattern for each participant was calculated and used for spectral clustering in MATLAB, yielding five clusters of daily step behavior. Daily step amounts were categorized into tertile groups (low, medium, and high), and the proportion of total activity within these groups was examined in relation to the 24-hour patterns and activity behaviors. The detailed method is explained in the flow chart (Fig. 1). A scale effect analysis by simulation was performed to confirm the reliability of our results from the limited sample numbers, since the data set used in the study is the secondary use of a subset of the previous cohort study [32], and detection of the activity pattern was not assumed from the physical activity measured by accelerometers.
Participants and data collection
We used the anonymized activity counting data obtained in the previous study [32], in which participants were recruited in August 2008 through an annual health examination at the Sendai Oroshisho center and gave informed consent for their data to be analyzed [32]. The data collected from forty-seven participants between August and September 2008 at the Sendai Oroshisho center was used for a series of analyses. The Institutional Review Board of the Tohoku University Graduate School of Medicine has approved the current study protocol (Approval number: 2019-1-394).
Step count data was continuously collected by a hip-worn triaxial accelerometer (Fig S1, Nipro Welsupport, model no wat3663, Japan) for up to 25 days from forty-seven participants. Participants were instructed to wear the accelerometer continuously during the recording period, removing it only for sleeping or taking a shower/bath. We did not collect specific information regarding the number of hours they did not wear the device. Data from participants with less than three days was excluded for statistical reasons [31], leaving forty-two participants (35 males and 7 females) with a complete set of data. Each participant has 19 ± 4 days of data. The data sampling rate was 1 Hz. The triaxial accelerometer, Welsupport, directly classifies physical activity into eight types of activities from acceleration data (count accuracy is within ± 5% in the catalog spec). The activity indices are recorded as follows: 0, Resting; 1, Moving while standing; 2, Going down using a vehicle; 3, Going up using a vehicle; 4, Going down using stairs; 5, Walking on the ground level; 6, Going up using stairs; 7, Running; 8, Lying down. Samples with indices 5–7 were counted as steps. The other indices were counted as zero.
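The conversion from activity indices to hourly step counts can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes one index per second starting at midnight and that each sample with index 5–7 contributes one count, and the function name `hourly_step_counts` and the toy day are hypothetical.

```python
# Indices the text counts as steps: 5 (walking on the ground level),
# 6 (going up using stairs), 7 (running). All other indices contribute zero.
STEP_INDICES = {5, 6, 7}
SECONDS_PER_HOUR = 3600


def hourly_step_counts(indices_1hz):
    """Collapse one day of 1 Hz activity indices into 24 hourly counts.

    Assumption (not stated in the text): one index per second, starting at
    midnight, and each step-type sample adds one count.
    """
    hourly = [0] * 24
    for second, index in enumerate(indices_1hz):
        hour = second // SECONDS_PER_HOUR
        if hour >= 24:
            break  # ignore samples past the end of the day
        if index in STEP_INDICES:
            hourly[hour] += 1
    return hourly


# Hypothetical toy day: resting, with 10 minutes of walking at 09:00
# and 5 minutes of running at 18:00.
day = [0] * (24 * SECONDS_PER_HOUR)
day[9 * 3600: 9 * 3600 + 600] = [5] * 600
day[18 * 3600: 18 * 3600 + 300] = [7] * 300
counts = hourly_step_counts(day)
```

Each day then becomes a 24-value vector, which is the unit that the pattern clustering operates on.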
Software for data processing and visualization
MATLAB (2021a, MathWorks, MA) and Python (3.11.0) were used for data processing and visualization.
Twenty-four-hour step-counting pattern
The activity counting data at 1 Hz sampling rate was first aggregated into hourly counts for each 24-hour day. A total of 815 days of hourly data was used to cluster the 24-hour step-counting patterns by unsupervised machine learning using tslearn, a Python package for time-series analysis [33]. The kernel k-means algorithm was used for clustering with the global alignment kernel (GAK). The optimal number of clusters was determined using the elbow method. An inflection point was observed at cluster number 6, where the inertia (sum of distances of samples to their closest cluster center, computed using the kernel trick) reached a minimum (Fig. S2). Six clusters of 24-hour physical activity patterns were obtained (Fig. 2).
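To make the "kernel trick" behind the inertia concrete, the sketch below computes distances to cluster centers from a kernel matrix alone, without ever forming the centers explicitly. It is an illustration under stated assumptions: a plain Gaussian kernel stands in for the global alignment kernel, squared distances are summed, and all function names and toy series are hypothetical rather than taken from tslearn.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Stand-in kernel (assumption): Gaussian on equal-length series, not GAK."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_matrix(series, kernel):
    n = len(series)
    return [[kernel(series[i], series[j]) for j in range(n)] for i in range(n)]


def sq_dist_to_center(K, i, members):
    """Squared feature-space distance from sample i to the mean of `members`.

    ||phi(x_i) - mu||^2 = K_ii - (2/m) * sum_j K_ij + (1/m^2) * sum_jl K_jl
    """
    m = len(members)
    cross = sum(K[i][j] for j in members) / m
    within = sum(K[j][l] for j in members for l in members) / (m * m)
    return K[i][i] - 2.0 * cross + within


def inertia(K, labels):
    """Sum of squared distances of samples to their own cluster center."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    return sum(sq_dist_to_center(K, i, clusters[lab])
               for i, lab in enumerate(labels))


# Hypothetical toy data: two tight groups of short series.
toy = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]]
K = kernel_matrix(toy, gaussian_kernel)
good = inertia(K, [0, 0, 1, 1])  # groups kept together
bad = inertia(K, [0, 1, 0, 1])   # groups mixed
```

In an elbow analysis, a quantity of this kind is recomputed for each candidate number of clusters, and the point where further clusters stop reducing it appreciably is taken as the cluster number.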
Fig. 2 Six step-counting patterns for 24 h. Temporal step-counting pattern for 24 h identified using unsupervised machine learning. (A) All traces of step-counting activity in each pattern. (B) Averaged traces (mean ± SD) of step-counting activity in each pattern. AD, all-day; BP, bi-phasic; M, morning; E, evening; IM, irregular morning; IN, irregular night
Dominant daily step behavior
Dominant daily step behavior was further identified for each participant by clustering based on the 24-hour physical activity patterns. First, the probability of each 24-hour step-counting pattern was calculated for each participant based on the formula below.
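The calculation can be sketched as follows, assuming the probability of a pattern is the fraction of a participant's recorded days that were assigned to that pattern. This relative-frequency reading is an assumption, and the function name, participant IDs, and day labels are hypothetical.

```python
from collections import Counter

# Pattern labels as abbreviated in the text.
PATTERNS = ["AD", "BP", "M", "E", "IM", "IN"]


def pattern_probabilities(day_labels):
    """Return a six-value vector: share of days assigned to each pattern.

    Assumption: probability = (days with pattern) / (all recorded days).
    """
    n_days = len(day_labels)
    if n_days == 0:
        raise ValueError("participant has no labelled days")
    counts = Counter(day_labels)
    return [counts[p] / n_days for p in PATTERNS]


# Hypothetical participants with a handful of labelled days each.
labels_by_participant = {
    "P01": ["AD", "AD", "BP", "AD", "M"],
    "P02": ["E", "AD", "E", "AD"],
}
vectors = {pid: pattern_probabilities(labels)
           for pid, labels in labels_by_participant.items()}
```

Under this reading each vector sums to one, so participants with different numbers of recorded days remain comparable.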
Vectors of the probability of each 24-hour step-counting pattern (with six values) were used to cluster the daily step behaviors. A MATLAB function, spectralcluster(), was used for the spectral clustering. spectralcluster() calculates the eigenvalues of the Laplacian matrix when data and a certain cluster number are given. The optimum cluster number was chosen based on the eigenvalues because the number of eigenvalues approximately equal to zero is a reasonable estimate of the number of clusters in the given data [34]. Five clusters of daily step behavior were obtained.
Tertile clustering of daily physical activity based on the daily step amount
The daily step amounts for the 815 days were divided into tertile groups (low, medium, and high). Each group is defined as:
Low: Daily step amount ≤ 4127.
Medium: 4127 < Daily step amount < 6946.
High: Daily step amount ≥ 6946.
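The three definitions above translate into a small lookup. This is a minimal sketch using the cut points given in the text; the function and constant names are hypothetical.

```python
LOW_MAX = 4127   # upper bound (inclusive) of the low tertile, from the text
HIGH_MIN = 6946  # lower bound (inclusive) of the high tertile, from the text


def tertile_group(daily_steps):
    """Assign a daily step amount to the low, medium, or high tertile group."""
    if daily_steps <= LOW_MAX:
        return "low"
    if daily_steps < HIGH_MIN:
        return "medium"
    return "high"


groups = [tertile_group(s) for s in (4127, 4128, 6945, 6946)]
```

Note that the label is attached to a day, not to a participant, so a single participant can contribute days to all three groups.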
Statistical analysis
The Pearson chi-squared test was performed with JMP Pro 16.0.0 for all statistical analyses, where P < 0.05 was considered significant.
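For readers unfamiliar with the test, the statistic for a contingency table can be computed as in the sketch below. It is an illustration only, not the JMP procedure: the table is hypothetical, the function name is an assumption, and the code assumes every row and column total is non-zero.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    Expected counts are (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2 x 2 table: days of two patterns split across two groups.
stat, dof = chi_squared_statistic([[10, 20], [20, 10]])

# For 1 degree of freedom, the 5% critical value is 3.841.
significant = stat > 3.841
```

With more than one degree of freedom the critical value changes, so in practice the P value is read from the chi-squared distribution for the table's own degrees of freedom.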
Scale effect analysis by simulation
A scale effect analysis by simulation was performed to confirm the reliability of our results from the limited sample numbers, since the data set used in the study is the secondary use of a subset of the previous cohort study [32], and detection of the activity pattern was not assumed from the physical activity measured by accelerometers in the previous study. In the scale effect analyses on the number of days, a given number of days (3, 6, 9, 12, 15, or 18 days) was randomly selected from the data set for each participant, and the clustering procedure was performed as described above. If a participant did not have the indicated number of days (e.g., an analysis requires 18 days, but only 6 days are available), the maximum available number of days was used. In the scale effect analyses on the number of participants, a given number of participants (10, 20, or 30 participants) was randomly selected, and the clustering procedure was performed as above.
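The two subsampling schemes can be sketched as follows. This is a minimal illustration of the sampling step only, not the authors' simulation code; the function names, participant IDs, and seed are hypothetical, and the clustering that follows each draw is omitted.

```python
import random


def sample_days(days_by_participant, n_days, rng):
    """Randomly pick n_days per participant, or all days if fewer exist."""
    sampled = {}
    for pid, days in days_by_participant.items():
        k = min(n_days, len(days))  # fall back to the maximum available
        sampled[pid] = rng.sample(days, k)
    return sampled


def sample_participants(days_by_participant, n_participants, rng):
    """Randomly pick n_participants and keep all of their days."""
    chosen = rng.sample(sorted(days_by_participant), n_participants)
    return {pid: days_by_participant[pid] for pid in chosen}


# Hypothetical data: day identifiers stand in for 24-value hourly vectors.
data = {
    "P01": list(range(20)),
    "P02": list(range(6)),
    "P03": list(range(19)),
}
rng = random.Random(0)  # fixed seed so a draw can be repeated
by_days = sample_days(data, 18, rng)
by_people = sample_participants(data, 2, rng)
```

Sampling without replacement keeps every selected day unique, so a subsample never contains more days than the participant actually recorded.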
Technical note for clustering strategy
By using combined data from all 42 participants rather than analyzing each participant’s data individually, this study reduces several potential biases. Aggregating the data helps mitigate individual variability bias, ensuring that the results reflect broader trends rather than being influenced by outliers. It also increases the overall sample size, enhancing statistical power and the robustness of the findings. This approach reduces selection bias, ensuring the results are more representative and not overly influenced by any single participant. Additionally, combining data from multiple days accounts for day-to-day variations in activity, providing a more comprehensive view of typical patterns. Lastly, this method dilutes the effect of measurement errors or anomalies in individual step counts, leading to more reliable and accurate results.
Results
This cross-sectional study analyzed data from 42 participants (35 males and 7 females from August to September 2008 in Miyagi prefecture, Japan) collected using a hip-worn triaxial accelerometer for up to 25 days, to investigate physical activity patterns. After excluding data from 5 participants out of 47 participants who had less than three days of recorded activity, the final dataset comprised 42 participants, each contributing an average of 19 ± 4 days of data. The 1 Hz step count data was converted into hourly data, resulting in 815 days of hourly data for clustering. Using the kernel k-means algorithm with the global alignment kernel in the tslearn Python package, six clusters of 24-hour physical activity patterns were identified. The probability of each pattern for each participant was calculated and used for spectral clustering in MATLAB, yielding five clusters of daily step behavior. Daily step amounts were categorized into tertile groups (low, medium, and high), and the proportion of total activity within these groups was examined in relation to the 24-hour patterns and activity behaviors. The detailed method is illustrated in the flow chart (Fig. 1). The demography of the participants is shown in Table 1.
Twenty-four-hour step-counting pattern
Step count data obtained with a triaxial accelerometer was first aggregated into hourly counts for each 24-hour day and then clustered with unsupervised machine learning (see Methods). Figure 2 shows the six 24-hour step-counting patterns identified: all-day (AD, 330 days), bi-phasic (BP, 146 days), morning (M, 129 days), evening (E, 106 days), irregular morning (IM, 64 days), and irregular night (IN, 40 days). AD showed a continuously high step count from 9:00 to 19:00. BP showed increased step counts from 8:00 to 21:00 with two peaks, at 9:00 and 19:00; step counts between 10:00 and 16:00 were slightly lower in BP. M became active around 6:00, maintained high step counts until 14:00, and gradually decreased in activity after 15:00. By contrast, E became active in the afternoon and showed peak step counts between 16:00 and 18:00. IM showed high step counts mainly in the morning, from 5:00 to 13:00, and decreased in activity after 15:00. IN showed high step counts from 19:00 to midnight. Table 2 summarizes the proportion of each step-counting pattern. Among 815 days, AD was the most frequent pattern (330/815, 40.5%), followed by the BP pattern (146/815, 17.9%). The M and E patterns accounted for 15.8% and 13.0%, respectively. IM and IN were minor patterns, each less than 10% (7.9% and 4.9%, respectively). The proportions of the patterns differed significantly from one another, except between IM and IN (Table S1). Each step-counting pattern showed a different step-count amount (Table 3). The AD pattern showed the highest step-count amount per day (7020.0 ± 3271.4 counts). The M and BP patterns were similar, and the E pattern followed. The IN and IM patterns showed lower step-count amounts. The AD, BP, and M patterns differed significantly from E, IM, and IN (Table S2). Among the E, IM, and IN patterns, a significant difference was observed between the E and IM patterns. Regarding duration, the BP pattern showed the longest duration of activity per day (14.54 ± 2.61 h), followed by the AD and M patterns.
Overall, the AD and BP patterns together accounted for 58.4% of total days and showed higher step-count amounts and longer activity durations per day. The M pattern was more active than the E pattern. The IM and IN patterns together accounted for a minor proportion of days (12.8%) and showed the lowest step-count amounts. All pairs differed significantly (p < 0.05) except BP-AD (Table S3).
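The percentages quoted above follow directly from the day counts per pattern, as this short check shows. The day counts are taken from the text; the variable names are illustrative.

```python
# Days assigned to each 24-hour step-counting pattern, as reported in the text.
pattern_days = {"AD": 330, "BP": 146, "M": 129, "E": 106, "IM": 64, "IN": 40}

total_days = sum(pattern_days.values())

# Share of all days per pattern, in percent, rounded to one decimal place.
shares = {p: round(100 * n / total_days, 1) for p, n in pattern_days.items()}

# Combined shares of the two major and the two irregular patterns.
ad_bp_share = round(100 * (pattern_days["AD"] + pattern_days["BP"]) / total_days, 1)
im_in_share = round(100 * (pattern_days["IM"] + pattern_days["IN"]) / total_days, 1)
```

The counts add up to the 815 analyzed days, and the combined shares reproduce the 58.4% and 12.8% figures.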
Clustering dominant daily step-behaviors
Using the six 24-hour step-counting patterns, we further clustered participants based on the probability of each pattern. We found five daily step-behavioral clusters: AD dominant (21 participants), AD + BP dominant (8 participants), BP dominant (6 participants), AD + E dominant (4 participants), and M dominant (3 participants) (Fig. 3). Half of the participants (21) belonged to the AD-dominant cluster (Table S4), consistent with the result that AD was the most frequent step-counting pattern. AD + BP was the next most frequent behavior. The other daily step behaviors accounted for minor proportions. There were no IM-dominant or IN-dominant step behaviors because of the minor proportions of these patterns among the 24-hour step-counting patterns. Interestingly, the mean step-count amounts for each step behavior were similar (Table S4).
Fig. 3 Five step-behaviors. Step behavior clusters were identified using unsupervised machine learning. (A) All traces of proportion of step-counting patterns in each behavior. (B) Averaged traces (mean) of the proportion of step-counting patterns in each behavior. AD, all-day; BP, bi-phasic; M, morning; E, evening; IM, irregular morning; IN, irregular night
Table 4 presents the probabilities of the six step-counting patterns in each daily step-behavioral cluster. The AD-dominant step behavior consists mostly of the AD pattern (0.5515 ± 0.1492) and also includes a notable share of the M pattern (0.1227 ± 0.0888); the other step-counting patterns are less frequent. The AD + BP step behavior is characterized by large shares of the AD and BP step-counting patterns (0.4031 ± 0.1021 and 0.2964 ± 0.0586, respectively) and also contains a notable share of the M step-counting pattern (0.1587 ± 0.0748), while the other step-counting patterns are less frequent. The BP-dominant step behavior has a large proportion of the BP step-counting pattern (0.3984 ± 0.1167) and also contains smaller shares of the IN and E step-counting patterns (0.1482 ± 0.1694 and 0.2045 ± 0.1356, respectively). The AD + E step behavior consists of similar shares of the AD and E step-counting patterns (0.4195 ± 0.0168 and 0.3815 ± 0.0771, respectively), while the other step-counting patterns are less frequent. The M-dominant behavior consists mostly of the M step-counting pattern (0.6235 ± 0.1509), with lower frequencies of the AD and IM step-counting patterns (0.1267 ± 0.1378 and 0.1105 ± 0.0301, respectively). Although the IM and IN step-counting patterns account for the fewest days and do not form any dominant step-behavioral cluster, they show some prevalence in all step-behavioral clusters, especially in the BP-dominant and M-dominant clusters.
Scale effect evaluation by simulations
We then investigated the scale effects of day and participant numbers on cluster identification (Fig. 4 and Figs S3-6). First, the number of days was randomly selected from 3 to 18 days per participant, and our procedure was applied. When 3 days of data per participant were used, only 4 step-counting patterns (Fig. 4A and Fig S3A) and 3 step behaviors (Fig. 4B and Fig S4A) were identified. As the number of days increased, the numbers of step-counting patterns and step behaviors increased and reached a plateau above 9 days of data per participant (Fig. 4A and B). The step-counting patterns identified from 3 to 18 days of data were consistent with those from the full data set (Fig. 2 and Fig S3), although step behaviors were slightly different from those of the full data set (Fig. 3 and Fig S4). When 15 days per participant (540 days from 36 participants) were selected, step behaviors consistent with those from the full data set were detected (Fig S4B).
Fig. 4 Scale effect evaluation by simulations. The effects of day and participant numbers on cluster identification were investigated. The effects of different numbers of days on identifying the numbers of step-counting patterns (A) and step behaviors (B). The effects of different numbers of participants on identifying the numbers of step-counting patterns (C) and step behaviors (D)
We then examined the scale effect of participant number on the step behavior clustering by our procedure. When 10 participants were randomly selected, 5 step-counting patterns were identified (Fig. 4C and Fig S5), while only 2 step behaviors were detected (Fig. 4D and Fig S6). When 20 participants were selected, the number of step-counting patterns increased to 6, but the number of step behaviors was 4 (Fig. 4C and D). Analysis using the 30-participant data set (572 days) showed a similar result to the full data set (Fig. 2 and Fig S6). These results suggest that at least 15 days per participant (42 participants) or a data set from 30 participants (572 days) is required to obtain robust results.
Diversity of temporal step-counting patterns is buried in a traditional tertile categorization
We identified six 24-hour step-counting patterns and five daily step behavior clusters using unsupervised machine-learning approaches. We then compared the step amounts from these step-counting patterns to those from traditional grouping approaches such as the tertile procedure.
Table 5 summarizes the composition of step-counting patterns in each tertile group. AD is the most dominant step-counting pattern in the high and mid groups (52.7% and 46.6%, respectively). However, the composition of BP, M, and E step-counting patterns differed between the high and mid groups. On the other hand, the low tertile group consists of relatively similar proportions of each step-counting pattern. The proportion of tertile groups in each step-counting pattern was consistent with the result above (Table 6). In the AD pattern, the high tertile group is dominant (44.2%), followed by the mid (37.9%) and low (17.9%) groups. In the BP pattern, the mid tertile group is the most dominant (50.7%), followed by the high (26.0%) and low (23.3%) tertile groups. In the M pattern, the low and high tertile groups accounted for similar shares of days (39.5% and 38.0%, respectively). In the E pattern, the low tertile group is dominant (42.2%), followed by mid (31.1%) and high (21.7%). The IM and IN patterns consist mainly of the low tertile group (76.6% and 67.5%, respectively). Each group showed significantly different compositions (Table S5 – S8).
Discussion
In this study, we attempted to develop a procedure to cluster 24-hour step-counting patterns, as physical activity patterns, using unsupervised machine learning. We identified six step-counting patterns and five daily step behavior clusters. When days were categorized by a traditional tertile procedure (high, medium, and low) using daily step amounts, each tertile group contained a different proportion of the six 24-hour step-counting patterns, suggesting heterogeneity within the categories produced by the traditional procedure.
The “temporal patterns of physical activity”, as a general term, have received more attention in recent decades [25, 35,36,37]. However, the situation is confusing since this general term covers different aspects of “activity”, such as simple body movements or steps [25], active/sedentary behaviors [35], social behaviors [36], or combinations of these [37]. Only a few studies have focused on 24-hour temporal physical activity patterns in the narrower sense of body movements or steps [25,26,27,28,29]. For clustering, some previous studies extracted and used several features, such as rhythm height, timing and robustness of physical activity [27], several parameters affecting phase shape and length of rest-activity rhythm [28, 29], or timing, intensity, and duration of activity [25]. Another study used 7-day-continuous data of physical activity counts for clustering and then analyzed intensity and temporal patterns of physical activity for 24 h [26]. These studies adopted a two-step strategy, that is, (1) clustering by extracted features and (2) analyzing 24-hour patterns of physical activity. By contrast, our procedure first directly identified the 24-hour patterns of physical activity from bulk data of daily step-counting records using time-series clustering analysis (Fig. 2) and then clustered participants according to the probability of each pattern (Fig. 3). Therefore, our strategy appears more straightforward, direct, and intuitive than those of previous studies.
Recently, a study on temporal physical activity patterns in big data of US adults using machine learning has been reported by Guo et al. [38]. In the study, physical activity counts for 24 h were aligned by Dynamic Time Warping (DTW) with a global constraint using the Sakoe-Chiba Band [39] and then clustered by kernel k-means [40], and kernel hierarchical agglomerative clustering with Ward’s Linkage [41]. The procedure by Guo et al. is sophisticated and similar to our approach to figuring out temporal physical activity patterns directly from time series data with high temporal resolution. However, our approach differs from the procedure by Guo et al. in three points. (1) While the procedure by Guo et al. applied the DTW to the data with a high time resolution, which requires a high computation power using a Graphics Processing Unit (GPU) and CUDA parallel computing platform to accelerate the computation of the distance matrices [38], we used a data set with every-hour data to reduce computing effort. (2) Our procedure might be more sensitive to finding minor irregular patterns, while the procedure by Guo et al. may be suitable for identifying robust patterns [38]. Even considering the different data sets between our work and the study by Guo et al., BP, IM, and IN were not observed in their study [38]. (3) Our approach using the dominant daily step behaviors allows us to attribute temporal physical activity patterns in terms of timing, enabling epidemiological study for health outcomes [22].
The scale effect analyses by simulation using different numbers of randomly selected data sets indicated the limitation of our procedure when applied to a smaller amount of data (Fig. 4 and Figs S3-6). The analyses indicated that at least 15 days per participant (540 days from 36 participants) or a data set from 30 participants (572 days) is required to obtain robust results, suggesting that a data set of over 500 days could be essential. The relationship between the number of days per participant and the number of participants remains to be investigated using a larger data set.
The conventional quantile procedure has been applied to the total amount of activity to categorize data into groups because of its simple application. Recently, however, the quantile procedure has been criticized for several potential problems: (1) reduction of detection power, (2) multiple comparison testing, (3) assuming the homogeneity of risk within the group, and (4) difficulty in comparing results across studies [42, 43]. Unfortunately, these problems remain unsolved to date, as discussed in a recent review [44]. The procedure described in this study might provide a complementary strategy to compensate for the heterogeneity in quantile-based grouping. Our study showed an association between step-counting patterns and step amounts and heterogeneity in step-counting patterns among tertile groups based on the step amount, producing a typical example of the second problem (Tables 5 and 6). Our approach may be helpful for understanding and handling the heterogeneity derived from different physical activity patterns between/within the groups, considering it as a confounding factor.
A significant limitation of the current study is that our dataset was small and collected from office workers in the Miyagi area in the Tohoku region, Japan. As daily activity depends on job type and the geographical area where participants live, it is still difficult to say how well our procedure generalizes to different populations. A more extensive and diverse dataset needs to be analyzed to confirm our procedure’s reliability, robustness, and reproducibility. Our approach of applying unsupervised machine learning to daily activity behavior and patterns may help to understand people’s prominent behaviors while considering heterogeneity. However, as shown in the scale effects evaluation (Fig. 4), even with a relatively small sample size (9 days per person, 30 participants), our procedure is sensitive enough and could lead to general applications in temporal activity analysis. Although the association between our results and health outcomes is beyond the goal of this study, our procedure might be reliable for identifying specific patterns of physical activity. Combined with other criteria, including activity intensity, sedentary behaviors, and sleep/awake cycles, it could contribute to revealing the association between particular activity behaviors and positive health outcomes.
Conclusion
In our study, we introduced a new approach using unsupervised machine learning to group 24-hour step-counting patterns. This approach revealed six distinct patterns and five clusters of daily step behaviors. We found significant heterogeneity in daily step behavior categorization when applying a traditional tertile procedure based on total step amount, emphasizing the importance of considering temporal variations in activity.
Data availability
No datasets were generated or analysed during the current study.
Abbreviations
- AD: All day
- BP: Bi-phasic
- M: Morning
- E: Evening
- IM: Irregular morning
- IN: Irregular night
- AD + BP: All day + Bi-phasic
- AD + E: All day + Evening
- AD + BP + M: All day + Bi-phasic + Morning
- AD + BP + M + IM: All day + Bi-phasic + Morning + Irregular morning
- BP + E: Bi-phasic + Evening
- AD + M + IM: All day + Morning + Irregular morning
- BP + E + IN: Bi-phasic + Evening + Irregular night
- MATLAB: Matrix Laboratory
- Tukey HSD: Tukey’s Honest Significant Difference
- SD: Standard Deviation
References
World Health Organization. Physical activity. 2022. https://www.who.int/news-room/fact-sheets/detail/physical-activity
World Health Organization. Physical inactivity. https://www.who.int/data/gho/indicator-metadata-registry/imr-details/3416
Chekroud SR, Gueorguieva R, Zheutlin AB, Paulus M, Krumholz HM, Krystal JH, Chekroud AM. Association between physical exercise and mental health in 1·2 million individuals in the USA between 2011 and 2015: a cross-sectional study. Lancet Psychiatry. 2018;5:739–46.
Kraus WE, Powell KE, Haskell WL, Janz KF, Campbell WW, Jakicic JM, Troiano RP, Sprow K, Torres A, Piercy KL. Physical activity, all-cause and Cardiovascular Mortality, and Cardiovascular Disease. Med Sci Sports Exerc. 2019;51:1270–81.
Madigan CD, Fong M, Howick J, Kettle V, Rouse P, Hamilton L, Roberts N, Gomersall SR, Daley AJ. Effectiveness of interventions to maintain physical activity behavior (device-measured): systematic review and meta-analysis of randomized controlled trials. Obes Rev. 2021. https://doi.org/10.1111/obr.13304.
Pearce M, Garcia L, Abbas A, et al. Association between Physical Activity and Risk of Depression: a systematic review and Meta-analysis. JAMA Psychiatry. 2022;79:550–9.
Piercy KL, Troiano RP, Ballard RM, Carlson SA, Fulton JE, Galuska DA, George SM, Olson RD. The physical activity guidelines for americans. JAMA. 2018;320:2020–8.
Powell KE, Paluch AE, Blair SN. Physical activity for health: what kind? How much? How intense? On top of what? Annu Rev Public Health. 2011;32:349–65.
Morris JN, Hardman AE. Walking to health. Sports Med. 1997;23:306–32.
Ogilvie D, Foster CE, Rothnie H, Cavill N, Hamilton V, Fitzsimons CF, Mutrie N. Interventions to promote walking: systematic review. BMJ. 2007;334:1204–7.
National Heart, Lung, and Blood Institute. Tips for getting active. https://www.nhlbi.nih.gov/health/educational/wecan/get-active/getting-active.htm
Hatano Y. Use of the pedometer for promoting daily walking exercise. Journal of the International Committee on Health, Physical Education and Recreation. 1993;29:4–8. https://www.scirp.org/(S(i43dyn45teexjx455qlt3d2q))/reference/ReferencesPapers.aspx?ReferenceID=454558. Accessed 1 Jun 2023.
Eyler AA, Brownson RC, Bacak SJ, Housemann RA. The epidemiology of walking for physical activity in the United States. Med Sci Sports Exerc. 2003;35:1529–36.
Tudor-Locke C, Bassett DR. How many Steps/Day are Enough? Preliminary Pedometer Indices for Public Health. Sports Med. 2004;34:1–8.
Paluch AE, Bajpai S, Bassett DR, et al. Daily steps and all-cause mortality: a meta-analysis of 15 international cohorts. Lancet Public Health. 2022;7:e219–28.
Amagasa S, Machida M, Fukushima N, Kikuchi H, Takamiya T, Odagiri Y, Inoue S. Is objectively measured light-intensity physical activity associated with health outcomes after adjustment for moderate-to-vigorous physical activity in adults? A systematic review. Int J Behav Nutr Phys Activity. 2018. https://doi.org/10.1186/s12966-018-0695-z.
Brailey G, Metcalf B, Lear R, Price L, Cumming S, Stiles V. A comparison of the associations between bone health and three different intensities of accelerometer-derived habitual physical activity in children and adolescents: a systematic review. Osteoporos Int. 2022;33:1191–222.
Lu Y, Wiltshire HD, Baker JS, Wang Q, Ying S, Li J, Lu Y. Objectively determined physical activity and adiposity measures in adult women: a systematic review and meta-analysis. Front Physiol. 2022. https://doi.org/10.3389/FPHYS.2022.935892.
Yerramalla MS, Chen M, Dugravot A, Van Hees VT, Sabia S. Association between profiles of accelerometer-measured daily movement behaviour and mortality risk: a prospective cohort study of British older adults. BMJ Open Sport Exerc Med. 2024. https://doi.org/10.1136/bmjsem-2023-001873.
Germano-Soares AH, Farah BQ, Da Silva JF, Barros MVG, Tassitano RM. Clustering of 24H movement behaviors associated with clinic blood pressure in older adults: a cross-sectional study. J Hum Hypertens. 2024;38:575–9.
Janda D, Gába A, Hron K, Arundell L, Contardo Ayala AM. Movement behaviour typologies and their associations with adiposity indicators in children and adolescents: a latent profile analysis of 24-h compositional data. BMC Public Health. 2024. https://doi.org/10.1186/s12889-024-19075-8.
Nawrin SS, Inada H, Momma H, Nagatomi R. Twenty-four-hour physical activity patterns associated with depressive symptoms: a cross-sectional study using big data-machine learning approach. BMC Public Health. 2024. https://doi.org/10.1186/s12889-024-18759-5.
Diaz C, Caillaud C, Yacef K. Mining Sensor Data to assess changes in physical activity behaviors in Health interventions: systematic review. JMIR Med Inf. 2023;11:e41153.
Jones PJ, Catt M, Davies MJ, Edwardson CL, Mirkes EM, Khunti K, Yates T, Rowlands AV. Feature selection for unsupervised machine learning of accelerometer data physical activity clusters – A systematic review. Gait Posture. 2021;90:120–8.
Aqeel M, Guo J, Lin L, Gelfand S, Delp E, Bhadra A, Richards EA, Hennessy E, Eicher-Miller HA. Temporal physical activity patterns are Associated with obesity in U.S. adults. Prev Med (Baltim). 2021;148:106538.
Niemelä M, Kangas M, Farrahi V, Kiviniemi A, Leinonen A-M, Ahola R, Puukka K, Auvinen J, Korpelainen R, Jämsä T. Intensity and temporal patterns of physical activity and cardiovascular disease risk in midlife. 2019. https://doi.org/10.1016/j.ypmed.2019.04.023
Smagula SF, Boudreau RM, Stone K, Reynolds CF, Bromberger JT, Ancoli-Israel S, Dam TT, Barrett-Connor E, Cauley JA. Latent activity rhythm disturbance sub-groups and longitudinal change in depression symptoms among older men. Chronobiol Int. 2015;32:1427–37.
Smagula SF, Krafty RT, Thayer JF, Buysse DJ, Hall MH. Rest-activity rhythm profiles associated with manic-hypomanic and depressive symptoms. J Psychiatr Res. 2018;102:238–44.
Smagula SF, Zhang G, Gujral S, Covassin N, Li J, Taylor WD, Reynolds CF, Krafty RT. Association of 24-Hour activity pattern phenotypes with depression symptoms and cognitive performance in aging. JAMA Psychiatry. 2022;79:1023–31.
Northern Finland Birth Cohorts | University of Oulu. https://www.oulu.fi/en/university/faculties-and-units/faculty-medicine/northern-finland-birth-cohorts-and-arctic-biobank
Hart TL, Swartz AM, Cashin SE, Strath SJ. How many days of monitoring predict physical activity and sedentary behaviour in older adults? Int J Behav Nutr Phys Activity. 2011. https://doi.org/10.1186/1479-5868-8-62.
Guo H, Niu K, Monma H, Kobayashi Y, Guan L, Sato M, Minamishima D, Nagatomi R. Association of Japanese dietary pattern with serum adiponectin concentration in Japanese adult men. Nutr Metab Cardiovasc Dis. 2012;22:277–84.
Tavenard R, Faouzi J, Vandewiele G, Divo F, Androz G, Holtz C, Payne M, Yurchak R, Rußwurm M. Tslearn, A Machine Learning Toolkit for Time Series Data. J Mach Learn Res. 2020;21:1–6.
von Luxburg U. A tutorial on spectral clustering. Stat Comput. 2007;17:395–416.
Creasy SA, Hibbing PR, Cotton E, Lyden K, Ostendorf DM, Willis EA, Pan Z, Melanson EL, Catenacci VA. Temporal patterns of physical activity in successful weight loss maintainers. Int J Obes (Lond). 2021;45:2074–82.
De Baere S, Lefevre J, De Martelaer K, Philippaerts R, Seghers J. Temporal patterns of physical activity and sedentary behavior in 10–14 year-old children on weekdays. BMC Public Health. 2015. https://doi.org/10.1186/S12889-015-2093-7.
Hallman DM, Mathiassen SE, Gupta N, Korshøj M, Holtermann A. Differences between work and leisure in temporal patterns of objectively measured physical activity among blue-collar workers. BMC Public Health. 2015. https://doi.org/10.1186/S12889-015-2339-4.
Guo J, Aqeel MM, Lin L, Gelfand SB, Eicher-Miller HA, Bhadra A. Physical activity patterns among US adults. medRxiv 2023.01.23.23284777.
Sakoe H, Chiba S. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust. 1978;26:43–9.
Dhillon IS, Guan Y, Kulis B. Kernel k-means: spectral clustering and normalized cuts. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, NY, USA; 2004. pp 551–556.
Jain RK, Vokes T. Physical activity as measured by accelerometer in NHANES 2005–2006 is associated with better bone density and trabecular bone score in older adults. Arch Osteoporos. 2019;14:29.
Greenland S. Avoiding power loss associated with categorization and ordinal scores in dose-response and trend analysis. Epidemiology. 1995;6:450.
Bennette C, Vickers A. Against quantiles: categorization of continuous variables in epidemiologic research, and its discontents. BMC Med Res Methodol. 2012. https://doi.org/10.1186/1471-2288-12-21.
Jones PR, Ekelund U. Physical activity in the Prevention of Weight Gain: the impact of Measurement and Interpretation of associations. Curr Obes Rep. 2019;8:66–76.
Acknowledgements
Not applicable.
Funding
This project was partially supported by a pioneering research support grant from JST SPRING (Grant number: JPMJSP2114) at Tohoku University. The funding body had no role in the study design, data analysis, or manuscript submission.
Author information
Contributions
SSN and HI analyzed the data. SSN, HI, and HM wrote the manuscript. HI and RN designed and supervised the research. All authors revised the manuscript and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
The study was approved by the Institutional Review Board of Tohoku University Graduate School of Medicine under Protocol Approval No: 2019-1-394. All methods used in this study adhered to the relevant guidelines and regulations. Written informed consent was obtained from all participants, and the data was collected anonymously.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Nawrin, S.S., Inada, H., Momma, H. et al. Examining physical activity clustering using machine learning revealed a diversity of 24-hour step-counting patterns. JASSB 3, 19 (2024). https://doi.org/10.1186/s44167-024-00059-3
DOI: https://doi.org/10.1186/s44167-024-00059-3