class: middle, right, title-slide # Linear mixed models in R ## - concepts - execution - inspection ### Athanasia Monika Mowinckel ### June 5th
2019 --- layout: true <div class="my-sidebar"></div> --- # Linear mixed models - why? .pull-left[ **Linear models** - Homoscedastic (equal variances) - No autocorrelation - Residuals should be normally distributed ] .pull-right[ **Linear mixed models** - Handles autocorrelation through _random_ terms - Handles heteroscedasticity through _random_ terms - Residuals need not be normally distributed - When _linear model_ assumptions are met, generally gives same results in large data sets ] --- class: middle, center # Linear mixed models - why? Mixed models have many names, like our favourite pets: - hierarchical models - linear mixed models - linear mixed effects - multi-level models ??? In statistics we are usually taught only to do linear regressions, like t-tests and ANOVA. These are good options when you have single observations per entity, and an entity can for instance be a person or a location. If you have repeated observations from the same entity, or there is some hierarchical structure to your data, i.e. your dependent observations correlate in some way, a linear mixed model might suit your needs better. When searching for information on mixed models, try all these terms; they will give you good coverage of the subject. These terms are not completely synonymous with each other, but searching on them will help you identify what model applies to your data. --- class: dark, center, middle # Get started ## Inspecting the data --- background-image: url("https://vignette.wikia.nocookie.net/gameofthrones/images/e/e0/Dragons_S8_Ep_1.jpg/revision/latest/scale-to-width-down/2000?cb=20190415031732"); background-size: cover # Getting started .pull-left[ ```r library(tidyverse) ``` ```r url <- "https://raw.githubusercontent.com/Athanasiamo/LME_introduction_workshop/master/data-raw/dragons.tsv" download.file(url, "dragons.tsv") dragons <- read_tsv("dragons.tsv") ``` ] ??? Let's dig into it. We will be using a data set used by Gabriela K Hajduk in her introductory LME [blogpost](https://gkhajduk.github.io/2017-03-09-mixed-models/). I did this workshop with the gapminder data before, but while it works nicely for running LMEs, it has some properties that make some of the highlighted advantages of using LMEs difficult to illustrate. The data is shipped with this course package. --- class: middle ### Hierarchical, long data <img src="static/hierarchy.gif" width="60%" /> ??? The background for this data is testing dragon intelligence, in some unknown way, at multiple mountain ranges and several sites. The data is already in long format, where an entity has as many rows of data as observations. As you see in the top part of the data, each mountain range has several rows of data and multiple sites. This is a great dataset for this workshop because it has a nice hierarchy: we do not only have mountain ranges, but sites nested within mountain ranges. This information is something we should use as we build our models. As always, we should get to know our data a little before we start. Let's make some plots.
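Before plotting, a minimal sketch of how one might confirm this hierarchy in the raw data (assuming the columns `mountainRange` and `site`, which are used later in the workshop):

```r
# How many observations per mountain range and site?
# Several rows per site, and several sites per mountain range,
# confirms the long, hierarchical structure described above.
dragons %>%
  count(mountainRange, site)
```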
--- ## Inspecting the data .pull-left[ ```r dragons %>% ggplot(aes(x=testScore)) + geom_density() ``` > checking the distribution of the data ] .pull-right[ <img src="lme_UseR_Oslo_2019_files/figure-html/dist1-out-1.png" width="100%" /> ] --- ## Inspecting the data .pull-left[ ```r dragons %>% ggplot(aes(x=bodyLength, y=testScore)) + geom_jitter(alpha=.2) + geom_smooth(method = "lm", colour="black") ``` ] .pull-right[ <img src="lme_UseR_Oslo_2019_files/figure-html/plot_smooths-out-1.png" width="100%" /> ] ??? - The `geom_smooth` of ggplot can give us a simple linear regression over the data. - It looks like there is an association between dragons' body lengths and their intelligence. --- class: dark, middle, center # Let's do some modelling! ## Exploring standard linear models --- ## Some R-syntax information - formula .pull-left[ Predict intelligence by a dragon's body length: `testScore ~ bodyLength` Predicting test score, with body length and mountain range as main effects: `testScore ~ bodyLength + mountainRange` ] .pull-right[ Predicting intelligence only by the interaction of body length and mountain range: `testScore ~ bodyLength:mountainRange` A 'full factorial', with all main effects + interactions: `testScore ~ bodyLength + mountainRange + bodyLength:mountainRange` or the shorthand with an asterisk (`*`): `testScore ~ bodyLength * mountainRange` ] ??? When running models in R, we use something called a `formula`. This is basically an unquoted expression of your model specification. What is on the left side of the tilde (`~`) is your dependent variable, and on the right you place your predictors. Main effects are added using `+`. Interactions are specified with `:`. --- ## Linear models - running a simple model .pull-left[ ```r library(broom.mixed) dragons_lm <- lm( testScore ~ bodyLength, data=dragons ) tidy(dragons_lm) %>% * knitr::kable(format="html", * digits=3) ``` From here on, highlighted code means code you can ignore; it is just there to make the slides prettier.
] .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> -61.318 </td> <td style="text-align:right;"> 12.067 </td> <td style="text-align:right;"> -5.081 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> bodyLength </td> <td style="text-align:right;"> 0.555 </td> <td style="text-align:right;"> 0.060 </td> <td style="text-align:right;"> 9.287 </td> <td style="text-align:right;"> 0 </td> </tr> </tbody> </table> <img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRU49V5H_nEJOSsilgH_Zvv_rowbB1yprlGoNt6fi8bC6rMh4S8" width="100%" /> ] --- ## Linear models - plot the regression .pull-left[ ```r predict_data <- tibble( bodyLength = seq(from = min(dragons$bodyLength), to = max(dragons$bodyLength))) dragons_lm_fit <- augment_columns( dragons_lm, newdata = predict_data) dragons_lm_fit %>% ggplot(aes(x=bodyLength, y=.fitted)) + geom_line() + geom_ribbon( alpha=.5, aes(ymin=.fitted-.se.fit, ymax=.fitted+.se.fit)) ``` ] .pull-right[ <img src="lme_UseR_Oslo_2019_files/figure-html/plot_lm1-out-1.png" width="100%" /> ] --- ## Linear models - checking assumption violations .pull-left[ ```r plot(dragons_lm, which=1) ``` ] .pull-right[ <img src="lme_UseR_Oslo_2019_files/figure-html/plot_lm2-out-1.png" width="100%" /> ] ??? Simple regressions in R have very easy ways to plot and check for assumption violations, just by plotting the model output. This residuals-vs-fitted plot clearly shows that the model assumptions are violated: if they held, the red line would lie flat at 0 and the dots would be scattered randomly around the plot. --- ## Linear models - checking assumption violations .pull-left[ ```r plot(dragons_lm, which=2) ``` ] .pull-right[ <img src="lme_UseR_Oslo_2019_files/figure-html/plot_lm3-out-1.png" width="100%" /> ] ??? The normal QQ-plot is another great way to check violations. The dots should fall along the line, which should run as a straight diagonal from corner to corner. This line is not corner-to-corner, and the dots don't all fall along it. This means there is something important in the data we are not accounting for. --- ## Linear models - checking assumption violations .pull-left[ ```r dragons %>% ggplot(aes(x=mountainRange, y=testScore, fill=mountainRange)) + geom_boxplot(alpha=.5) + coord_flip() ``` ] .pull-right[ <img src="lme_UseR_Oslo_2019_files/figure-html/box1-out-1.png" width="100%" /> ] ??? By making boxplots for each mountain range we can see the data varies substantially across these. We should somehow incorporate this information in our models. The previous diagnostic plots clearly indicate autocorrelation in our data, and we know why: observations from the same mountain range are correlated with each other. - If we split the data up into the different mountain ranges, we can see there is great variation in the intelligence and size of the dragons in the different mountain ranges. - Our model should account for this variability when trying to see if there is a true relationship between the length and intelligence of a dragon. --- class: dark, middle ## **TASK:** Try adding an interaction term between bodyLength and mountainRange.
See if the assumption violations are improved by this.
05:00
--- ## Linear models - adding a covariate .pull-left[ ```r dragons_lm_mountInt <- lm( testScore ~ bodyLength * mountainRange, data=dragons ) tidy(dragons_lm_mountInt) %>% * knitr::kable(format="html", * digits=3) ``` ] .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> -44.834 </td> <td style="text-align:right;"> 36.123 </td> <td style="text-align:right;"> -1.241 </td> <td style="text-align:right;"> 0.215 </td> </tr> <tr> <td style="text-align:left;"> bodyLength </td> <td style="text-align:right;"> 0.378 </td> <td style="text-align:right;"> 0.201 </td> <td style="text-align:right;"> 1.884 </td> <td style="text-align:right;"> 0.060 </td> </tr> <tr> <td style="text-align:left;"> mountainRangeCentral </td> <td style="text-align:right;"> 160.671 </td> <td style="text-align:right;"> 80.791 </td> <td style="text-align:right;"> 1.989 </td> <td style="text-align:right;"> 0.047 </td> </tr> <tr> <td style="text-align:left;"> mountainRangeEmmental </td> <td style="text-align:right;"> 6.053 </td> <td style="text-align:right;"> 56.854 </td> <td style="text-align:right;"> 0.106 </td> <td style="text-align:right;"> 0.915 </td> </tr> <tr> <td style="text-align:left;"> mountainRangeJulian </td> <td style="text-align:right;"> 180.106 </td> <td style="text-align:right;"> 58.222 </td> <td style="text-align:right;"> 3.093 </td> <td style="text-align:right;"> 0.002 </td> </tr> <tr> <td style="text-align:left;"> mountainRangeLigurian </td> <td style="text-align:right;"> 118.255 </td> <td style="text-align:right;"> 57.064 </td> <td style="text-align:right;"> 2.072 </td> <td style="text-align:right;"> 0.039 </td> </tr> <tr> <td style="text-align:left;"> mountainRangeMaritime </td> <td style="text-align:right;"> 140.017 </td> <td style="text-align:right;"> 86.033 </td> <td style="text-align:right;"> 1.627 </td> <td style="text-align:right;"> 0.104 </td> </tr> <tr> <td style="text-align:left;"> mountainRangeSarntal </td> <td style="text-align:right;"> 116.391 </td> <td style="text-align:right;"> 53.117 </td> <td style="text-align:right;"> 2.191 </td> <td style="text-align:right;"> 0.029 </td> </tr> <tr> <td style="text-align:left;"> mountainRangeSouthern </td> <td style="text-align:right;"> 90.017 </td> <td style="text-align:right;"> 52.225 </td> <td style="text-align:right;"> 1.724 </td> <td style="text-align:right;"> 0.085 </td> </tr> <tr> <td style="text-align:left;"> bodyLength:mountainRangeCentral </td> <td style="text-align:right;"> -0.644 </td> <td style="text-align:right;"> 0.399 </td> <td style="text-align:right;"> -1.614 </td> <td style="text-align:right;"> 0.107 </td> </tr> <tr> <td style="text-align:left;"> bodyLength:mountainRangeEmmental </td> <td style="text-align:right;"> -0.006 </td> <td style="text-align:right;"> 0.289 </td> <td style="text-align:right;"> -0.020 </td> <td style="text-align:right;"> 0.984 </td> </tr> <tr> <td style="text-align:left;"> bodyLength:mountainRangeJulian </td> <td style="text-align:right;"> -0.681 </td> <td style="text-align:right;"> 0.289 </td> <td style="text-align:right;"> -2.358 </td> <td style="text-align:right;"> 0.019 </td> </tr> <tr> <td style="text-align:left;"> bodyLength:mountainRangeLigurian </td> <td style="text-align:right;"> -0.530 </td> <td 
style="text-align:right;"> 0.290 </td> <td style="text-align:right;"> -1.829 </td> <td style="text-align:right;"> 0.068 </td> </tr> <tr> <td style="text-align:left;"> bodyLength:mountainRangeMaritime </td> <td style="text-align:right;"> -0.488 </td> <td style="text-align:right;"> 0.440 </td> <td style="text-align:right;"> -1.109 </td> <td style="text-align:right;"> 0.268 </td> </tr> <tr> <td style="text-align:left;"> bodyLength:mountainRangeSarntal </td> <td style="text-align:right;"> -0.409 </td> <td style="text-align:right;"> 0.279 </td> <td style="text-align:right;"> -1.465 </td> <td style="text-align:right;"> 0.143 </td> </tr> <tr> <td style="text-align:left;"> bodyLength:mountainRangeSouthern </td> <td style="text-align:right;"> -0.453 </td> <td style="text-align:right;"> 0.290 </td> <td style="text-align:right;"> -1.563 </td> <td style="text-align:right;"> 0.119 </td> </tr> </tbody> </table> ] --- ## Linear models - checking assumption violations .pull-left[ Adding an interaction is not helping our assumption violations. We need to be handling the autocorrelation somehow. An alternative is to aggregate data across mountain ranges and use the aggregated data for analysis. But that is data reduction, and you will be loosing a lot of power and cannot neatly account for the variability in the data. ] .pull-right[ ```r plot(dragons_lm_mountInt, which=2) ``` <img src="lme_UseR_Oslo_2019_files/figure-html/unnamed-chunk-6-1.png" width="100%" /> ] --- class: dark, middle, center # Let's do some modelling! ## Exploring linear mixed models --- ## Linear mixed models .pull-left[ ```r library(lme4) dragons_lme_mount <- lmer( testScore ~ bodyLength + (1|mountainRange), data = dragons ) ``` ] .pull-right[ In the package `lme4` a random effect (autocorrelation specification) is added with the formulaic expression `(1|entity)`, which will fit an independent intercept per entity. In this case, our entity is mountain range. You may also use the package `nlme` for linear mixed models, but you specify the random effect differently. 
] ```r tidy(dragons_lme_mount) %>% * knitr::kable(format="html") ``` <table> <thead> <tr> <th style="text-align:left;"> effect </th> <th style="text-align:left;"> group </th> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> fixed </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 43.709380 </td> <td style="text-align:right;"> 17.1348869 </td> <td style="text-align:right;"> 2.5508998 </td> </tr> <tr> <td style="text-align:left;"> fixed </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> bodyLength </td> <td style="text-align:right;"> 0.033165 </td> <td style="text-align:right;"> 0.0786466 </td> <td style="text-align:right;"> 0.4216961 </td> </tr> <tr> <td style="text-align:left;"> ran_pars </td> <td style="text-align:left;"> mountainRange </td> <td style="text-align:left;"> sd__(Intercept) </td> <td style="text-align:right;"> 18.430156 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:left;"> ran_pars </td> <td style="text-align:left;"> Residual </td> <td style="text-align:left;"> sd__Observation </td> <td style="text-align:right;"> 14.960422 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> </tbody> </table> --- ## Linear mixed models - inspecting residuals .pull-left[ ```r qqnorm(resid(dragons_lme_mount)) qqline(resid(dragons_lme_mount)) ``` ] .pull-right[ <img src="lme_UseR_Oslo_2019_files/figure-html/resid-1-out-1.png" width="100%" /> ] ??? With LMEs, plotting the model no longer gives us QQ plots to inspect, so we need to make one ourselves with some extra code. The dots are more nicely on the line now, with only a few stragglers. But there **are** stragglers! There is still something in the data we are not accounting for. Within each mountain range, there are several sites. There may be site-level effects that could help our model work better, and we can also utilise partial pooling, as the intercepts of sites within mountain ranges will be shrunk towards the mountain range estimate. This increases our degrees of freedom, ergo more power!
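A minimal sketch, added here for illustration, of how one could inspect the fitted random intercepts and see this shrinkage at work:

```r
# Estimated deviation of each mountain range's intercept
# from the overall (fixed) intercept
ranef(dragons_lme_mount)

# Combined (fixed + random) coefficients per mountain range
coef(dragons_lme_mount)
```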
--- ## Linear mixed models - random intercepts .pull-left[ ```r dragons <- dragons %>% mutate(siteUniq = factor( mountainRange:site )) dragons_lme_mountS <- lmer( testScore ~ bodyLength + (1|mountainRange) + (1|siteUniq), data=dragons) ``` ] .pull-right[ ```r tidy(dragons_lme_mountS) %>% * knitr::kable(format="html") ``` <table> <thead> <tr> <th style="text-align:left;"> effect </th> <th style="text-align:left;"> group </th> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> fixed </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 40.0666846 </td> <td style="text-align:right;"> 21.8637262 </td> <td style="text-align:right;"> 1.8325643 </td> </tr> <tr> <td style="text-align:left;"> fixed </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> bodyLength </td> <td style="text-align:right;"> 0.0512593 </td> <td style="text-align:right;"> 0.1036823 </td> <td style="text-align:right;"> 0.4943882 </td> </tr> <tr> <td style="text-align:left;"> ran_pars </td> <td style="text-align:left;"> siteUniq </td> <td style="text-align:left;"> sd__(Intercept) </td> <td style="text-align:right;"> 4.8052854 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:left;"> ran_pars </td> <td style="text-align:left;"> mountainRange </td> <td style="text-align:left;"> sd__(Intercept) </td> <td style="text-align:right;"> 18.0986815 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:left;"> ran_pars </td> <td style="text-align:left;"> Residual </td> <td style="text-align:left;"> sd__Observation </td> <td style="text-align:right;"> 14.4422846 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> </tbody> </table> ] ??? First we need to make sites unique, as they are meaningless without being attached to their mountain range. Let's first try _just_ adding the unique site as a random effect. This will **not** nest the sites within mountain ranges, and we will not get partial pooling. We ignore the hierarchical structure of the data, but would like to see if this makes any difference to our model. --- ## Linear mixed models - inspecting residuals .pull-left[ ```r qqnorm(resid(dragons_lme_mountS)) qqline(resid(dragons_lme_mountS)) ``` ] .pull-right[ <img src="lme_UseR_Oslo_2019_files/figure-html/resid-2-out-1.png" width="100%" /> ] ??? The QQ is even better now. Most points fall directly on or very close to the line, and it's almost directly from corner to corner.
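One caveat on the `siteUniq` construction above: the `:` operator expects factors, so if `read_tsv()` left `mountainRange` and `site` as character columns, a minimal alternative sketch (hypothetical, equivalent in effect) would be:

```r
# interaction() coerces its arguments to factors itself,
# so this works for both character and factor columns
dragons <- dragons %>%
  mutate(siteUniq = interaction(mountainRange, site, drop = TRUE))
```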
--- ## Linear mixed models - random intercepts ```r dragons_lme_mountSnest <- lmer(testScore ~ bodyLength + (1|mountainRange) + (1|mountainRange:site), data=dragons) broom::tidy(dragons_lme_mountSnest) %>% * knitr::kable(format="html") ``` <table> <thead> <tr> <th style="text-align:left;"> effect </th> <th style="text-align:left;"> group </th> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> fixed </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 40.0666846 </td> <td style="text-align:right;"> 21.8637262 </td> <td style="text-align:right;"> 1.8325643 </td> </tr> <tr> <td style="text-align:left;"> fixed </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> bodyLength </td> <td style="text-align:right;"> 0.0512593 </td> <td style="text-align:right;"> 0.1036823 </td> <td style="text-align:right;"> 0.4943882 </td> </tr> <tr> <td style="text-align:left;"> ran_pars </td> <td style="text-align:left;"> mountainRange:site </td> <td style="text-align:left;"> sd__(Intercept) </td> <td style="text-align:right;"> 4.8052854 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:left;"> ran_pars </td> <td style="text-align:left;"> mountainRange </td> <td style="text-align:left;"> sd__(Intercept) </td> <td style="text-align:right;"> 18.0986815 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:left;"> ran_pars </td> <td style="text-align:left;"> Residual </td> <td style="text-align:left;"> sd__Observation </td> <td style="text-align:right;"> 14.4422846 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> </tbody> </table> --- ## Linear mixed models - inspecting residuals .pull-left[ ```r qqnorm(resid(dragons_lme_mountSnest)) qqline(resid(dragons_lme_mountSnest)) ``` ] .pull-right[ <img src="lme_UseR_Oslo_2019_files/figure-html/resid-3-out-1.png" width="100%" /> ] ??? The last two models are actually identical, due to the way they have been set up. This took me quite some time to understand, and is not so intuitive to me. But because `siteUniq` is unique for each site, and we have `(1|mountainRange)` in the formula too, these actually become nested. Some prefer having this type of syntax, while for me it makes more intuitive sense to state the nesting formulaically with `(1|mountainRange) + (1|mountainRange:site)`. 
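For completeness, `lme4` also offers a shorthand for this nesting; a minimal sketch (the object name `dragons_lme_nested` is only for illustration, the fit is the same as the model above):

```r
# (1|mountainRange/site) expands to
# (1|mountainRange) + (1|mountainRange:site)
dragons_lme_nested <- lmer(
  testScore ~ bodyLength + (1 | mountainRange/site),
  data = dragons
)
```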
--- ## Linear mixed models - Comparing models ```r anova(dragons_lme_mount, dragons_lme_mountS, dragons_lme_mountSnest) %>% tidy() %>% * knitr::kable(format="html") ``` <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> npar </th> <th style="text-align:right;"> AIC </th> <th style="text-align:right;"> BIC </th> <th style="text-align:right;"> logLik </th> <th style="text-align:right;"> deviance </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> dragons_lme_mount </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 4001.478 </td> <td style="text-align:right;"> 4018.173 </td> <td style="text-align:right;"> -1996.739 </td> <td style="text-align:right;"> 3993.478 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:left;"> dragons_lme_mountS </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 3988.767 </td> <td style="text-align:right;"> 4009.636 </td> <td style="text-align:right;"> -1989.384 </td> <td style="text-align:right;"> 3978.767 </td> <td style="text-align:right;"> 14.71065 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.0001253 </td> </tr> <tr> <td style="text-align:left;"> dragons_lme_mountSnest </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 3988.767 </td> <td style="text-align:right;"> 4009.636 </td> <td style="text-align:right;"> -1989.384 </td> <td style="text-align:right;"> 3978.767 </td> <td style="text-align:right;"> 0.00000 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1.0000000 </td> </tr> </tbody> </table> ??? Comparing models is where mixed models, and the likelihood machinery they use, really shine. We want to find the model with the lowest AIC and/or BIC, while keeping in mind which model best represents the known structure of the data. In this case, all models are performing more or less equally, and so we will stick with the nested model, which preserves more of the data structure. R tells us that the models are refitted with ML instead of REML. This is necessary for model comparison, and it's very handy that this is done for us, so we don't have to keep running models in parallel for comparisons. You can now also see that the two models with the different ways of nesting have identical values, and give the same output. So now we know that. The next question is: how do we know that any of these models is better than the null model, i.e. a model assuming no relationship between body length and intelligence? We should run a "null" model and add it to the model comparison table. --- class: dark, middle ## Run the "null" model, check the QQ-plot, and add it to the model comparisons. **HINT:** The formula for an intercept-only (null) model is `y ~ 1 + (1|entity)`
05:00
--- ## Linear mixed models - Comparing models ```r dragons_lme_null <- lmer(testScore ~ 1 + (1|mountainRange) + (1|mountainRange:site), data=dragons) broom::tidy(dragons_lme_null) %>% * knitr::kable(format="html") ``` <table> <thead> <tr> <th style="text-align:left;"> effect </th> <th style="text-align:left;"> group </th> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> fixed </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 50.386032 </td> <td style="text-align:right;"> 6.64072 </td> <td style="text-align:right;"> 7.587435 </td> </tr> <tr> <td style="text-align:left;"> ran_pars </td> <td style="text-align:left;"> mountainRange:site </td> <td style="text-align:left;"> sd__(Intercept) </td> <td style="text-align:right;"> 4.708035 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:left;"> ran_pars </td> <td style="text-align:left;"> mountainRange </td> <td style="text-align:left;"> sd__(Intercept) </td> <td style="text-align:right;"> 18.491431 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:left;"> ran_pars </td> <td style="text-align:left;"> Residual </td> <td style="text-align:left;"> sd__Observation </td> <td style="text-align:right;"> 14.432813 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> </tbody> </table> --- ## Linear mixed models - Comparing models .pull-left[ ```r qqnorm(resid(dragons_lme_null)) qqline(resid(dragons_lme_null)) ``` ] .pull-right[ <img src="lme_UseR_Oslo_2019_files/figure-html/resid-4-out-1.png" width="100%" /> ]
--- ## Linear mixed models - Comparing models ```r anova(dragons_lme_null, dragons_lme_mountSnest) %>% tidy() %>% * knitr::kable(format="html") ``` <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> npar </th> <th style="text-align:right;"> AIC </th> <th style="text-align:right;"> BIC </th> <th style="text-align:right;"> logLik </th> <th style="text-align:right;"> deviance </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> dragons_lme_null </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 3987.054 </td> <td style="text-align:right;"> 4003.749 </td> <td style="text-align:right;"> -1989.527 </td> <td style="text-align:right;"> 3979.054 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:left;"> dragons_lme_mountSnest </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 3988.767 </td> <td style="text-align:right;"> 4009.636 </td> <td style="text-align:right;"> -1989.384 </td> <td style="text-align:right;"> 3978.767 </td> <td style="text-align:right;"> 0.2867604 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.5923041 </td> </tr> </tbody> </table> --- ## Linear mixed models - Inspecting the fit .pull-left[ ```r predict_data$fit = predict( dragons_lme_mountSnest, newdata=predict_data, re.form=NA) ggplot(dragons, aes(x=bodyLength, y=testScore)) + geom_jitter(alpha=.2) + geom_smooth(method="lm") + geom_line(data=predict_data, colour="forestgreen", aes(y=fit)) ``` ] .pull-right[ <img src="lme_UseR_Oslo_2019_files/figure-html/pred1-out-1.png" width="100%" /> ] ??? Let's have a look at our fit. Make a data.frame with the variable of interest, `bodyLength`, spanning the range we want to predict over, and run a prediction to get the fits (`re.form=NA` gives predictions from the fixed effects only, ignoring the random effects). We also add a standard linear smooth and see if that is different. In this case, we can see that running an LME has very clearly altered our inference. The simple linear regression clearly indicated that there was a relationship between body length and intelligence, but violated model assumptions. When running an LME that accounted for the autocorrelation in the data, the indicated relationship between body length and intelligence went up in flames. --- background-image: url("https://usercontent1.hubstatic.com/14112116_f520.jpg") background-size: cover class: center, middle # Playing with other data <div style="background-color: #fff; opacity: .5"> Slides using the gapminder data <a href="https://athanasiamo.github.io/RLadies-London-LMM-2019/#1">here</a> </div> ??? If we have time, we can have a look at applying some of this to the `gapminder` data. I have used this data for this workshop before, and it is nice for understanding the specification of random effects, and one can even try random _slopes_, which we have not covered here. --- # More resources on LMMs ### - Julia Pilowsky's [practical guide](https://www.juliapilowsky.com/2018/10/19/a-practical-guide-to-mixed-models-in-r/) ### - Bodo Winter's [tutorials and data](http://www.bodowinter.com/tutorials.html) ### - Jared Knowles' [getting started](https://www.jaredknowles.com/journal/2013/11/25/getting-started-with-mixed-effect-models-in-r) ### - B.
Bolker's [mixed models in R](https://rpubs.com/bbolker/3336) ### - Gabriela Hajduk's [introduction](https://gkhajduk.github.io/2017-03-09-mixed-models/) - whose data I have borrowed for this workshop --- background-image: url("lme_rladies_london_files/Screenshot-2019-03-26-at-16.41.58.png") background-size: cover class: center # [https://drmowinckels.io](https://drmowinckels.io)