For this assessment, choose from the following two scenarios. More information about both can be found on the Discovering Statistics Canvas site and the Developmental Psychology Canvas site.
Building on the work of Henderson et al. (2021), you will investigate the following research question: do vocabulary and immediate recall of novel words predict recognition of those words the following day?
Building on the work of Bariola et al. (2012), you will investigate the following research question: do the mother’s and father’s tendency towards emotional suppression predict their child’s?
Calculate each of the following values using the dataset:
# The total number of children who participated
nrow(data) %>%
knitr::kable(
.,
caption = "**The total number of children who participated**",
col.names = "*N*",
align = 'c'
) %>%
kable_styling()
N |
---|
126 |
# The number and percentage of male and female children
data %>%
dplyr::group_by(child_gender) %>%
dplyr::summarise(
n = dplyr::n(),
Percentage = round((n/nrow(data))*100, 2)
) %>%
knitr::kable(
.,
caption = "**The number and percentage of male and female children**",
col.names = c("Gender", "*N*", "Percentage")
) %>%
kable_styling()
Gender | N | Percentage |
---|---|---|
female | 73 | 57.94 |
male | 53 | 42.06 |
# The range, mean and *SD* of the ages of the children, mothers and fathers
task1.1_tib <- dplyr::bind_rows(
data %>%
dplyr::summarise(
Mean = round(mean(child_age), 2),
SD = round(sd(child_age), 3),
Min = min(child_age),
Max = max(child_age)
),
data %>%
dplyr::summarise(
Mean = mean(mother_age),
SD = sd(mother_age),
Min = min(mother_age),
Max = max(mother_age)
),
data %>%
dplyr::summarise(
Mean = mean(father_age),
SD = sd(father_age),
Min = min(father_age),
Max = max(father_age)
)
) %>%
round(., 2)
rownames(task1.1_tib) <- c("Children", "Mothers", "Fathers")
knitr::kable(
task1.1_tib,
caption = "**The range, mean and *SD* of the ages of the children, mothers and fathers (measured in years)**"
) %>%
kable_styling()
Mean | SD | Min | Max | |
---|---|---|---|---|
Children | 8.40 | 0.40 | 8 | 9 |
Mothers | 32.03 | 3.74 | 25 | 42 |
Fathers | 35.41 | 4.56 | 25 | 48 |
Write a short paragraph in the style of a participants section summarising the descriptive information you calculated in the task above (max 100 words).
Participants consisted of 126 children (age range 8.00-9.00 years, M = 8.40, SD = 0.40), their mothers (age range 25.00-42.00 years, M = 32.03, SD = 3.74) and their fathers (age range 25.00-48.00, M = 35.41, SD = 4.56). Parents completed the Emotion Regulation Questionnaire (ERQ; Gross and John 2003, cited in Bariola et al., 2012), whilst children completed the Emotion Regulation Questionnaire for Children and Adolescents variant of the ERQ. In total, the number of female children who participated was 73 (57.94% of total), and the number of males was 53 (42.06% of total).
Only for the scenario you have chosen, create two linear models representing the following hypotheses and print out a summary of both models.
# Model 1: Mother's expressive suppression predicts child's suppression
lm_1 <- lm(child_supp ~ mother_supp, data)
summary(lm_1)
##
## Call:
## lm(formula = child_supp ~ mother_supp, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.7044 -2.6954 0.2774 2.8046 7.8409
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.12287 0.84283 12.011 < 2e-16 ***
## mother_supp 0.50906 0.09755 5.218 7.36e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.838 on 124 degrees of freedom
## Multiple R-squared: 0.1801, Adjusted R-squared: 0.1734
## F-statistic: 27.23 on 1 and 124 DF, p-value: 7.356e-07
broom::tidy(lm_1, conf.int = TRUE)
## # A tibble: 2 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 10.1 0.843 12.0 1.62e-22 8.45 11.8
## 2 mother_supp 0.509 0.0976 5.22 7.36e- 7 0.316 0.702
# Model 2: Both mother's and father's expressive suppression predict child's suppression
lm_2 <- lm(child_supp ~ mother_supp + father_supp, data)
summary(lm_2)
##
## Call:
## lm(formula = child_supp ~ mother_supp + father_supp, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.5376 -2.4313 0.5773 2.4592 8.2644
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.1953 1.1780 6.957 1.82e-10 ***
## mother_supp 0.4213 0.1032 4.082 7.99e-05 ***
## father_supp 0.2319 0.1007 2.302 0.023 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.773 on 123 degrees of freedom
## Multiple R-squared: 0.2139, Adjusted R-squared: 0.2011
## F-statistic: 16.74 on 2 and 123 DF, p-value: 3.723e-07
broom::tidy(lm_2, conf.int = TRUE)
## # A tibble: 3 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 8.20 1.18 6.96 1.82e-10 5.86 10.5
## 2 mother_supp 0.421 0.103 4.08 7.99e- 5 0.217 0.626
## 3 father_supp 0.232 0.101 2.30 2.30e- 2 0.0325 0.431
Compare the two models and identify which is better, then store that model in a new object called better_model
.
# comparison of lm_1 and lm_2
anova(lm_1, lm_2) %>%
broom::tidy()
## # A tibble: 2 x 6
## res.df rss df sumsq statistic p.value
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 124 1826. NA NA NA NA
## 2 123 1751. 1 75.4 5.30 0.0230
# better_model object
better_model <- lm_2
Briefly report how you decided which model was better, including any relevant statistics such as \(R^2\) for each model and the results of model comparison tests (max 100 words).
An analysis of variance (ANOVA) demonstrated that Model 2 significantly improved the fit of the model compared to Model 1 (F(1, 123) = 5.30, p<.05). Whilst model 1 explained approximately 18% (\(R^2\)= .18) of the variance in childrens’ expressive suppression scores, model 2 explained approximately 21% (\(R^2\) = .21) of the variance in childrens’ expressive suppression scores.
Thoroughly check the assumptions of better_model
and decide which version of the model to report: unadjusted; with robust parameter estimates; with robust standard errors; or bootstrapped. Store the output from the version of the model you choose in a new object called final_model
.
# Diagnostic plots for better_model
ggplot2::autoplot(better_model,
which = c(1,3,2,4),
colour = 'black',
size = 1.5,
alpha = .5,
shape = 'diamond',
smooth.colour = 'red',
) +
theme_light()
# 95% standardised residuals limit (should = ~5%)
better_model %>%
broom::augment() %>%
dplyr::filter(abs(.std.resid) >= 1.96) %>%
nrow()/nrow(data)*100
## [1] 5.555556
# 99% standardised residuals limit (should = ~1%)
better_model %>%
broom::augment() %>%
dplyr::filter(abs(.std.resid) >= 2.5) %>%
nrow()/nrow(data)*100
## [1] 1.587302
# any standardised residuals >= 3? (should = 0)
better_model %>%
broom::augment() %>%
dplyr::filter(abs(.std.resid) >= 3) %>%
nrow()
## [1] 0
# robust parameter estimates for better_model
robust::lmRob(child_supp ~ mother_supp + father_supp, data) %>%
summary()
##
## Call:
## robust::lmRob(formula = child_supp ~ mother_supp + father_supp,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.7473 -2.5443 0.3956 2.0873 8.3078
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.8025 1.3210 5.906 3.19e-08 ***
## mother_supp 0.4561 0.1164 3.918 0.000147 ***
## father_supp 0.2581 0.1142 2.260 0.025571 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.741 on 123 degrees of freedom
## Multiple R-Squared: 0.23
##
## Test for Bias:
## statistic p-value
## M-estimate 2.8732 0.4116
## LS-estimate 0.9898 0.8037
# robust confidence intervals and significance tests
parameters::parameters(better_model, robust = TRUE, vcov.type = "HC4")
## Parameter | Coefficient | SE | 95% CI | t(123) | p
## ------------------------------------------------------------------
## (Intercept) | 8.20 | 1.12 | [5.97, 10.42] | 7.30 | < .001
## mother supp | 0.42 | 0.10 | [0.23, 0.61] | 4.42 | < .001
## father supp | 0.23 | 0.10 | [0.03, 0.44] | 2.26 | 0.026
final_model <- parameters::parameters(better_model, robust = TRUE, vcov.type = "HC4")
Report the results of your checks and justify your decision on which model to report as your final model (max 300 words).
The model satisfies the linearity assumption as the distribution in the residuals-fitted plot is relatively horizontal. Additionally, the assumption of additivity as the combined effect of the mother’s and father’s expressive suppression scores on children’s expressive suppression scores are additive. The assumption of normality of residuals and sampling distribution is also satisfied as the distribution of the Q-Q plot remains positively diagonal throughout. Furthermore, there are no significantly influential cases as demonstrated by the Cook’s distance plot: the highest observed Cook’s distance being approximately 0.08, which is significantly lower that the threshold of and absolute value of 1. additionally, the expected proportion of cases have standardised residuals within the expected range, and no standardised residuals equal or exceed 3.
One assumption not met by the model however is that of homoscedastic errors. The residuals-fitted plot funnels inwards from left to right, meaning the variance of residual errors is not uniform across the distribution.
As such, it is appropriate to use robust parameter estimates for this model, rather than the unadjusted model. Additionally a bootstrap model is not appropriate since the sample size for this data is not too small (N = 126).
Create a nicely formatted table to present the results of final_model
in APA style.
final_model %>%
kable(.,) %>%
kable_styling()
Parameter | Coefficient | SE | CI | CI_low | CI_high | t | df_error | p |
---|---|---|---|---|---|---|---|---|
(Intercept) | 8.1952649 | 1.1223495 | 0.95 | 5.9736428 | 10.4168870 | 7.301883 | 123 | 0.0000000 |
mother_supp | 0.4212673 | 0.0953454 | 0.95 | 0.2325369 | 0.6099977 | 4.418328 | 123 | 0.0000216 |
father_supp | 0.2319071 | 0.1027793 | 0.95 | 0.0284618 | 0.4353524 | 2.256360 | 123 | 0.0258138 |
Obtain standardised bs for the predictor(s).
parameters::model_parameters(better_model, standardise = "refit") %>%
kable(.,) %>%
kable_styling()
Parameter | Coefficient | SE | CI | CI_low | CI_high | t | df_error | p |
---|---|---|---|---|---|---|---|---|
(Intercept) | 8.1952649 | 1.1779992 | 0.95 | 5.8634877 | 10.5270422 | 6.956936 | 123 | 0.0000000 |
mother_supp | 0.4212673 | 0.1032102 | 0.95 | 0.2169690 | 0.6255656 | 4.081643 | 123 | 0.0000799 |
father_supp | 0.2319071 | 0.1007384 | 0.95 | 0.0325015 | 0.4313126 | 2.302071 | 123 | 0.0230124 |
mother_supp(b) = 0.42
father_supp(b) = 0.23
Interpret (explain) what the statistical results for this model tell you, using your results from the tasks above (max 300 words).
Results show that parent’s combined tendency towards emotional suppression significantly predicts emotional suppression in their children (F(1, 123)= 16.74, P<0.001). The model predicts a unit increase in the mother and father’s score expressive suppression score will lead to an increase of 0.23 in the child’s expressive suppression score.