Background: In the current study, we aimed to investigate whether Attention Deficit Hyperactivity Disorder (ADHD) is better categorized among behavioral or neurodevelopmental disorders, based on familial and environmental factors.
Methods: We conducted a correlation analysis to identify the psychiatric disorders in the dataset that are most strongly associated with ADHD. We also used machine learning-based approaches combined with a feature selection algorithm to cluster and classify ADHD as a behavioral or neurodevelopmental disorder.
Results: Model evaluation showed that ADHD is clustered in the group of behavioral disorders with an accuracy of 78%. Furthermore, a Support Vector Machine (SVM) classified ADHD as a behavioral disorder with an accuracy of 72.66% and as a neurodevelopmental disorder with an accuracy of 60.07%.
Conclusion: Our findings support categorization systems such as HiTOP over DSM-5. However, as biological factors were not included in our analysis, this conclusion should be considered with caution and examined in future research.
Keywords: Attention deficit disorder with hyperactivity, Biological factors, Diagnostic and statistical manual of mental disorders, Neurodevelopmental disorders, Support vector machine
Attention Deficit Hyperactivity Disorder (ADHD) is a common psychological condition among children and adolescents (1). The worldwide pooled prevalence of ADHD among children and adolescents is estimated at 5.29% (2). Different diagnostic and classification systems (i.e. the Diagnostic and Statistical Manual of Mental Disorders [DSM] and the International Classification of Diseases [ICD]) agree that inattention, hyperactivity, and impulsivity are the main characteristics of ADHD (3,4). ADHD is usually considered a behavioral or externalizing disorder; however, the new edition of the DSM (i.e. DSM-5) places ADHD in the new category of neurodevelopmental disorders (1,5). In previous versions of the DSM, ADHD was included in the category of disruptive disorders, together with Conduct Disorder (CD) and Oppositional Defiant Disorder (ODD) (6,7). ICD-10 includes ADHD in the category of Hyperkinetic Disorders (F90), which are characterized by “an early onset, lack of persistence in activities that require cognitive involvement, and a tendency to move from one activity to another without completing any one, together with disorganized, ill-regulated, and excessive activity” (3).
Due to the limitations of the traditional systems, efforts have been made to provide more practical systems for the classification of psychiatric disorders (8-10). A new dimensional system for the nosology of psychiatric problems is the Hierarchical Taxonomy of Psychopathology (HiTOP), which is based on empirical patterns of co-occurrence between psychological symptoms (11,12). HiTOP has five levels, ranging from symptoms and problematic behaviors at the lowest level to super-spectra at the highest level. In HiTOP, ADHD, CD, ODD and Intermittent Explosive Disorder (IED) form a group of disorders under the sub-factor of antisocial behaviors, which belongs to the externalizing spectra (13).
An important assumption in hierarchical taxonomy is that similar mental disorders share common risk factors, whether genetic or environmental. To test this concept and investigate whether ADHD is better classified among externalizing behavior disorders or neurodevelopmental disorders, we used a machine learning approach and identified some potential familial and environmental risk factors shared between ADHD, externalizing behavior disorders, and neurodevelopmental disorders.
Materials and Methods
The data used in this work are from the Iranian Child and Adolescent Psychiatric Disorders Project (IRCAP). This cross-sectional, community-based project was conducted in the Iranian population aged 6-18 in urban and rural areas in order to investigate the epidemiology of psychiatric disorders in young people and their relationship with lifestyle, social capital, and personality disorders of parents. The K-SADS-PL was administered by trained interviewers to identify psychiatric disorders in both the screening and diagnostic stages. For more details about this project and the data gathering procedure, refer to (14-16). The prevalence of ADHD in this national survey is estimated at 4% (1175/29710). Behavioral disorders and neurodevelopmental disorders co-occurred with ADHD in 31% and 7.7%, respectively, of the subjects who received a diagnosis of ADHD (17).
In the current study, we conducted a correlation analysis to identify the psychiatric disorders in the dataset that are most strongly associated with ADHD. We also used machine learning-based approaches combined with a feature selection algorithm to cluster and classify ADHD as a behavioral or neurodevelopmental disorder.
In the correlation analysis, Pearson’s correlation coefficient was calculated to identify DSM disorders that are strongly associated with ADHD. It is a measure of the linear correlation between two given variables. Mathematically, it can be calculated as:
$$\rho_{A,B} = \frac{\operatorname{cov}(A, B)}{\sigma_A \, \sigma_B}$$
where $\rho_{A,B}$ is the correlation coefficient, $\operatorname{cov}(A, B)$ is the covariance between the variables A and B, and $\sigma_A$ and $\sigma_B$ are the standard deviations of A and B, respectively. In the first step, we considered all psychiatric disorders other than ADHD as the independent variables and ADHD as the dependent or target variable to assess the linear relationship between psychiatric disorders and ADHD. For the correlation analysis we used the Pandas toolkit, an open source library that provides easy-to-use, high-performance data structures and analysis tools for the Python programming language.
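The ranking step described above can be sketched with Pandas as follows. This is a minimal illustration, not the authors' code: the helper name `rank_by_correlation`, the column names, and the synthetic 0/1 diagnosis indicators are all assumptions made for the example and are not taken from the IRCAP dataset.

```python
import numpy as np
import pandas as pd


def rank_by_correlation(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every other column with `target`,
    ordered from the strongest to the weakest absolute association."""
    # DataFrame.corrwith uses Pearson's coefficient by default.
    corr = df.drop(columns=[target]).corrwith(df[target])
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Synthetic binary diagnosis indicators (hypothetical, NOT IRCAP data).
rng = np.random.default_rng(0)
n = 2000
adhd = rng.integers(0, 2, size=n)
toy = pd.DataFrame({
    "ADHD": adhd,
    # "ODD" agrees with "ADHD" about 80% of the time, so it is correlated.
    "ODD": np.where(rng.random(n) < 0.8, adhd, 1 - adhd),
    # "Autism" is drawn independently, so its correlation should be near zero.
    "Autism": rng.integers(0, 2, size=n),
})

ranking = rank_by_correlation(toy, "ADHD")
print(ranking)
```

For binary indicators such as these, Pearson's coefficient coincides with the phi coefficient, so the same call applies whether diagnoses are coded as 0/1 flags or as continuous scores.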
Feature selection and machine learning based clustering and classification
We did not rely only on the correlation coefficient to identify the strong predictor variables. The K-means clustering method was used to assign ADHD to the behavioral or the neurodevelopmental disorder group. Figure 1 shows the steps of our methodology. K-means is a popular unsupervised machine learning algorithm for cluster analysis in data mining. Its objective is to group similar data points and discover underlying patterns. For this purpose, K-means tries to find a specified number of clusters (K) in a dataset, where a cluster is a collection of data points that are grouped together because of certain similarities. After the number of clusters or centroids is chosen (in our case K=2) and the dataset is entered as a collection of features for each data point (in our case demographic data, life style, social capital and Millon subscales), the algorithm uses an iterative procedure to generate a final result. First, every data point is allocated to its nearest centroid based on the Euclidean distance. That is, the algorithm assigns each data point to a cluster according to:
$$\underset{a_i \in A}{\arg\min} \; \operatorname{dist}(a_i, x)^2$$
where $a_i$ is a centroid in the set of centroids $A$, $x$ is a data point and $\operatorname{dist}(\cdot)$ is the standard Euclidean distance. Then, the centroids are updated using the following formula:
$$a_i = \frac{1}{|D_i|} \sum_{x_j \in D_i} x_j$$
where $D_i$ is the set of data points assigned to the $i$-th cluster centroid. K-means iteratively conducts these computations until a stopping condition is satisfied: no data point changes cluster, a maximum number of iterations is reached, or the sum of the distances is minimized.
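The assignment and update steps described above can be written directly in NumPy. The sketch below is illustrative only: the function name `kmeans`, the choice to initialise centroids from randomly picked data points, and the two synthetic blobs are assumptions of this example, and it stops on two of the stopping conditions named in the text (no change in assignments, or a maximum number of iterations).

```python
import numpy as np


def kmeans(X, k=2, max_iter=100, seed=0):
    """Plain Lloyd-style K-means: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise centroids with k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assignment step: squared Euclidean distance to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # stopping condition: no data point changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for i in range(k):
            members = X[labels == i]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[i] = members.mean(axis=0)
    return labels, centroids


# Two well-separated synthetic blobs (hypothetical data, K=2 as in the text).
rng = np.random.default_rng(1)
X_demo = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(50, 2)),
])
labels, centroids = kmeans(X_demo, k=2)
print(centroids)
```

In practice, `sklearn.cluster.KMeans` offers the same procedure with more careful initialisation; the hand-written loop is shown only to mirror the two formulas.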
In the classification stage, we used non-ADHD cases with neurodevelopmental or behavioral disorders as the training set and ADHD cases as the test set. A Support Vector Machine (SVM) was used as the supervised learning model for classification.
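One way to read this train/test design is sketched below with scikit-learn. It is a hedged illustration rather than the authors' pipeline: the kernel, the scaling step, the label coding (1 = behavioral, 0 = neurodevelopmental), the helper name `fraction_assigned_to_behavioral`, and the synthetic feature vectors are all assumptions, since the text does not specify them.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fraction_assigned_to_behavioral(X_train, y_train, X_adhd):
    """Train on non-ADHD cases, then report the share of ADHD cases
    that the model places in the behavioral class (label 1)."""
    # Assumption: RBF kernel with default settings; the paper does not say.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    model.fit(X_train, y_train)
    pred = model.predict(X_adhd)
    return float(np.mean(pred == 1))


# Synthetic stand-ins (hypothetical): 5 features per subject.
rng = np.random.default_rng(0)
X_behavioral = rng.normal(loc=2.0, size=(200, 5))    # non-ADHD, behavioral
X_neurodev = rng.normal(loc=-2.0, size=(200, 5))     # non-ADHD, neurodevelopmental
X_train = np.vstack([X_behavioral, X_neurodev])
y_train = np.array([1] * 200 + [0] * 200)
# Held-out "ADHD" cases, deliberately drawn closer to the behavioral group.
X_adhd = rng.normal(loc=1.5, size=(100, 5))

frac = fraction_assigned_to_behavioral(X_train, y_train, X_adhd)
print(f"share of ADHD cases assigned to the behavioral class: {frac:.2f}")
```

Because the ADHD cases carry no behavioral or neurodevelopmental ground-truth label in this design, the output is the share assigned to each class, which is not necessarily how the accuracies reported in the paper were computed.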
Since there are more than 800 features for each data point in the dataset, we used a feature selection method to identify the most informative attributes for the clustering step. Irrelevant features slow down processing and reduce model interpretability and generalization. An ideal feature selection algorithm should extract relevant features, capture nonlinear feature interactions, scale linearly with the number of features, and allow known sparsity structure to be incorporated. Therefore, we used gradient boosted feature selection, which is a scalable, flexible and easy-to-use algorithm. For the machine learning analyses we used the scikit-learn toolkit, an open source library that provides various clustering, classification and regression algorithms for the Python programming language.
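A rough approximation of this step in scikit-learn is to rank features by the importances of a fitted gradient boosting model and keep the top few. Note the substitution: this is not the published gradient boosted feature selection algorithm itself, only an importance-based stand-in, and the synthetic data, the cut-off of 15 features, and all variable names are assumptions of the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data (hypothetical): 40 features, only the first 3 drive the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 40))
y = (X[:, 0] + X[:, 1] + X[:, 2] > 0).astype(int)

# Keep the 15 features with the highest boosting importances.
# threshold=-np.inf disables the importance threshold so that
# max_features alone decides how many features survive.
selector = SelectFromModel(
    GradientBoostingClassifier(n_estimators=100, random_state=0),
    max_features=15,
    threshold=-np.inf,
)
selector.fit(X, y)

selected = selector.get_support(indices=True)  # column indices that were kept
X_reduced = selector.transform(X)              # data restricted to those columns
print(selected)
```

The reduced matrix `X_reduced` could then be passed to the clustering and classification steps in place of the full feature set.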
To rank psychiatric disorders according to their relationship with ADHD, Pearson correlation analysis was used as described earlier. Figure 2 shows the results of this analysis, with psychiatric disorders ranked from the strongest to the weakest association with ADHD. Behavioral disorders, i.e. ODD and CD, have stronger relationships with ADHD than neurodevelopmental disorders, i.e. autism and types of Mental Retardation (MR).
At the feature selection stage, we also performed feature selection using lasso, random forest, decision tree and adaptive boosting algorithms, and then evaluated the clustering and classification models on the set of features selected by each algorithm. The best model accuracy, however, was obtained with the gradient boosted method. Again, ODD and CD were the most important features chosen by all feature selection algorithms, followed by the different subscales of maternal personality disorders. Moreover, Generalized Anxiety Disorder (GAD) and Separation Anxiety Disorder (SAD) were also among the top 15 features in predicting ADHD.
To cluster ADHD as a behavioral or neurodevelopmental disorder, we used demographic data, social capital, life style, and parental personality disorders data as the initial input features for the machine learning approach. The gradient boosted feature selection method then chose an optimal subset of features for the final cluster analysis. Model evaluation was performed using 10-fold cross-validation in Python. It demonstrated that ADHD is clustered in the group of behavioral disorders with an accuracy of 78%. Furthermore, SVM classified ADHD as a behavioral disorder with an accuracy of 72.66% and as a neurodevelopmental disorder with an accuracy of 60.07%.
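For readers unfamiliar with the evaluation protocol, 10-fold cross-validation can be sketched in scikit-learn as below. The text does not state which model or labels were cross-validated, so the SVM pipeline, the stratified shuffled folds, the helper name `ten_fold_accuracy`, and the synthetic data are assumptions chosen only to show the mechanics.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def ten_fold_accuracy(X, y, seed=0):
    """Return the 10 per-fold accuracies of an SVM pipeline."""
    # Stratified folds keep the class balance similar in every fold.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    model = make_pipeline(StandardScaler(), SVC())
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy")


# Synthetic two-class data (hypothetical, not IRCAP data).
rng = np.random.default_rng(0)
X_cv = np.vstack([
    rng.normal(loc=1.5, size=(150, 4)),
    rng.normal(loc=-1.5, size=(150, 4)),
])
y_cv = np.array([1] * 150 + [0] * 150)

scores = ten_fold_accuracy(X_cv, y_cv)
print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

Placing the scaler inside the pipeline means it is refitted on the training portion of each fold, which avoids leaking information from the held-out fold into preprocessing.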
As mentioned earlier, there is controversy regarding the categorization and classification of ADHD (18). DSM-5 categorizes ADHD among neurodevelopmental disorders (1), whereas other systems such as HiTOP categorize ADHD as a behavioral disorder (11). In the present study, we investigated these claims with machine learning methods. Our findings suggest that ADHD has more shared factors with behavioral problems (i.e. CD and ODD) than with neurodevelopmental disorders (i.e. intellectual disability and autism). Model evaluation showed that ADHD is clustered in the group of behavioral disorders with an accuracy of 78%. Furthermore, SVM classified ADHD as a behavioral disorder with an accuracy of 72.66% and as a neurodevelopmental disorder with an accuracy of 60.07%. A computational psychiatry approach was used in this study. Computational psychiatry attempts to establish a logical link between the psychopathological and phenomenological aspects of psychiatric disorders through statistical and computational methods, which may result in reshaping the current nosology of diseases (19). Machine learning, an important branch of artificial intelligence, is finding its place in computational psychiatry (20,21). In this study, we examined some conventional machine learning methods, but other algorithms may lead to better results. Therefore, it is strongly recommended that future studies cluster data from this national project using more sophisticated unsupervised algorithms and verify the findings of the present study.
As mentioned previously, we used different categories of variables (i.e. demographic data, social capital, life style and parental personality disorders data) in our statistical analysis. Some of these variables, i.e. demographic data, social capital and life style, can be assumed to be mostly environmental, whereas others, i.e. parental personality disorders, reflect some degree of genetic vulnerability. Hence, we may conclude that ADHD shares more risk factors with behavioral disorders than with neurodevelopmental disorders. Also, given that it has more comorbidity and shared symptoms with behavioral disorders, it may be better categorized among externalizing behavioral disorders.
In this study, we used behavioral data for our analysis. However, reaching a comprehensive and definitive conclusion also requires biological data, such as neurological and genetic data, in addition to behavioral data.
Overall, our findings provide more evidence to support classification systems such as HiTOP over DSM-5 for ADHD. However, since biological factors were not included in our analysis, this conclusion should be considered with caution and investigated in future research. Also, given the common symptoms and risk factors, ADHD may be prevented and treated with the same approaches as externalizing behavior disorders.
This study was approved by the ethics committee of the National Institute for Medical Research Development (IRB code: IR.NIMAD.REC.1395.001). We obtained assent from the subjects.
This study was supported by the Psychiatry and Psychology Research Center, Tehran University of Medical Sciences (grant no: 10452).