Living creatures tend to organize their lives within structured communities such as families, schools, hospitals, towns or countries. For instance, students of the same age living in a town could be grouped into different classes according to their grade level, family income, school district and other features of interest. Other examples related with health care workers and patients show clear hierarchical data structures as well.
Hierarchical or nested structures (usually known as HLM) are very common throughout many research areas. The starting point of this data pattern was set up in the field of social sciences. Most research studies in this area were focused on educational data, where the main interest was to examine the relationship between inputs such as students, teachers or school resources, and student outcomes (academic achievement, self-concept, career aspirations…). Under this scenario, researchers emphasized that individuals who are drawn from an institution (classroom, school, town, …) will be more homogenous than subjects randomly sampled from a larger population: students belonging to the same classroom have the experience of being part of the same environment (places, district, teachers,…) and experiences. Due to this fact, observations based on these individuals are not fully independent.
As noted, hierarchical models consider that exist dependency among observations within the same study unit. Till last decades, owing to the lack of software development, ordinary least squares regression (OLSR), classical regression, has been used to estimate the aforementioned relationships. On consequence, results obtained from OLSR show too small standard errors leading to a higher probability of rejection of a null hypothesis than if: (1) an appropriate statistical analysis was performed; (2) data included truly independent observations. It is clear that the main issue that researchers must address is the non-independence of the observations.
Hierarchical modeling is similar to OLSR. It can be seen as an extension of the classical regression, where at least two levels are defined in the predictive model. On the base level (also called the individual level, referred as level1), the analysis is similar to OLSR: an outcome variable is defined as function of a linear combination of one or more independent level 1 variables:
where is the value of the outcome variable of the th individual of group , represents the intercept of group , is the slope of variable of group . On subsequent levels, the level 1 slope and intercept become dependent variables being predicted from level 2 variables:
Though this process, it is possible to model the effects of level 1 variables and level 2 variables on the desired outcome. In the figure of this post, one can observe that there are three main levels: patients (level1) belong to hospitals (level 2) where, at the same time, hospitals are located in certain neighborhoods (level 3).
This kind of modeling is essential to account for individual – and group level variation in estimating group-level regression coefficients. However, in certain cases, the classical and HLM approaches coincide: (1) when there is very little group-level variation and (2) when the number of groups is small and consequently, there is not enough information to accurately estimate group-level variation. In this setting, HLM gain little from classical OLRS.
Now, it is your opportunity. You know where is worth the effort of applying the HLM methods instead of classical regression.