Design

For this assessment, choose from the following two scenarios. More information about both can be found on the Discovering Statistics Canvas site and the Developmental Psychology Canvas site.

Scenario 1: Vocabulary and Storybooks

Building on the work of Henderson et al. (2021), you will investigate the following research question: do vocabulary and immediate recall of novel words predict recognition of those words the following day?

Scenario 2: Emotion Regulation

Building on the work of Bariola et al. (2012), you will investigate the following research question: do the mother’s and father’s tendency towards emotional suppression predict their child’s?

Task 1

1.1

Calculate each of the following values using the dataset:

  • The total number of children who participated
  • The number and percentage of male and female children
  • The range, mean and SD of the ages of the children
  • The range, mean and SD of the ages of the mothers and the fathers
# The total number of children who participated
nrow(data) %>% 
  knitr::kable(
    ., 
    caption = "**The total number of children who participated**",
    col.names = "*N*",
    align = 'c'
  ) %>% 
kable_styling()
The total number of children who participated
N
126
# The number and percentage of male and female children
data %>% 
  dplyr::group_by(child_gender) %>% 
  dplyr::summarise(
    n = dplyr::n(),
    Percentage = round((n/nrow(data))*100, 2)
  ) %>% 
  knitr::kable(
    ., 
    caption = "**The number and percentage of male and female children**",
    col.names = c("Gender", "*N*", "Percentage")
  ) %>% 
kable_styling()
The number and percentage of male and female children
Gender N Percentage
female 73 57.94
male 53 42.06
# The range, mean and *SD* of the ages of the children, mothers and fathers
task1.1_tib <- dplyr::bind_rows(
  data %>% 
  dplyr::summarise(
    Mean = round(mean(child_age), 2),
    SD = round(sd(child_age), 3),
    Min = min(child_age),
    Max = max(child_age)
  ),
  data %>% 
  dplyr::summarise(
    Mean = mean(mother_age),
    SD = sd(mother_age),
    Min = min(mother_age),
    Max = max(mother_age)
  ),
  data %>% 
  dplyr::summarise(
    Mean = mean(father_age),
    SD = sd(father_age),
    Min = min(father_age),
    Max = max(father_age)
  )
) %>% 
round(., 2)

rownames(task1.1_tib) <- c("Children", "Mothers", "Fathers")

knitr::kable(
  task1.1_tib,
  caption = "**The range, mean and *SD* of the ages of the children, mothers and fathers (measured in years)**"
) %>% 
  kable_styling()
The range, mean and SD of the ages of the children, mothers and fathers (measured in years)
Mean SD Min Max
Children 8.40 0.40 8 9
Mothers 32.03 3.74 25 42
Fathers 35.41 4.56 25 48

1.2

Write a short paragraph in the style of a participants section summarising the descriptive information you calculated in the task above (max 100 words).

Participants consisted of 126 children (age range 8.00-9.00 years, M = 8.40, SD = 0.40), their mothers (age range 25.00-42.00 years, M = 32.03, SD = 3.74) and their fathers (age range 25.00-48.00, M = 35.41, SD = 4.56). Parents completed the Emotion Regulation Questionnaire (ERQ; Gross and John 2003, cited in Bariola et al., 2012), whilst children completed the Emotion Regulation Questionnaire for Children and Adolescents variant of the ERQ. In total, the number of female children who participated was 73 (57.94% of total), and the number of males was 53 (42.06% of total).

Task 2

2.1

Only for the scenario you have chosen, create two linear models representing the following hypotheses and print out a summary of both models.

Scenario 1 (Language)

  • Model 1: Vocabulary score predicts second-day recognition.
  • Model 2: Both vocabulary score and first-day recall predict second-day recognition.

Scenario 2 (Emotion Regulation)

  • Model 1: Mother’s expressive suppression predicts child’s suppression.
  • Model 2: Both mother’s and father’s expressive suppression predict child’s suppression.
# Model 1: Mother's expressive suppression predicts child's suppression
lm_1 <- lm(child_supp ~ mother_supp, data)
summary(lm_1)
## 
## Call:
## lm(formula = child_supp ~ mother_supp, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.7044  -2.6954   0.2774   2.8046   7.8409 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.12287    0.84283  12.011  < 2e-16 ***
## mother_supp  0.50906    0.09755   5.218 7.36e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.838 on 124 degrees of freedom
## Multiple R-squared:  0.1801, Adjusted R-squared:  0.1734 
## F-statistic: 27.23 on 1 and 124 DF,  p-value: 7.356e-07
broom::tidy(lm_1, conf.int = TRUE)
## # A tibble: 2 x 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)   10.1      0.843      12.0  1.62e-22    8.45     11.8  
## 2 mother_supp    0.509    0.0976      5.22 7.36e- 7    0.316     0.702
# Model 2: Both mother's and father's expressive suppression predict child's suppression
lm_2 <- lm(child_supp ~ mother_supp + father_supp, data) 
summary(lm_2)
## 
## Call:
## lm(formula = child_supp ~ mother_supp + father_supp, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.5376  -2.4313   0.5773   2.4592   8.2644 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8.1953     1.1780   6.957 1.82e-10 ***
## mother_supp   0.4213     0.1032   4.082 7.99e-05 ***
## father_supp   0.2319     0.1007   2.302    0.023 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.773 on 123 degrees of freedom
## Multiple R-squared:  0.2139, Adjusted R-squared:  0.2011 
## F-statistic: 16.74 on 2 and 123 DF,  p-value: 3.723e-07
broom::tidy(lm_2, conf.int = TRUE) 
## # A tibble: 3 x 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)    8.20      1.18       6.96 1.82e-10   5.86      10.5  
## 2 mother_supp    0.421     0.103      4.08 7.99e- 5   0.217      0.626
## 3 father_supp    0.232     0.101      2.30 2.30e- 2   0.0325     0.431

2.2

Compare the two models and identify which is better, then store that model in a new object called better_model.

# comparison of lm_1 and lm_2
anova(lm_1, lm_2) %>% 
  broom::tidy()
## # A tibble: 2 x 6
##   res.df   rss    df sumsq statistic p.value
##    <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl>
## 1    124 1826.    NA  NA       NA    NA     
## 2    123 1751.     1  75.4      5.30  0.0230
# better_model object
better_model <- lm_2

2.3

Briefly report how you decided which model was better, including any relevant statistics such as \(R^2\) for each model and the results of model comparison tests (max 100 words).

An analysis of variance (ANOVA) demonstrated that Model 2 significantly improved the fit of the model compared to Model 1 (F(1, 123) = 5.30, p<.05). Whilst model 1 explained approximately 18% (\(R^2\)= .18) of the variance in childrens’ expressive suppression scores, model 2 explained approximately 21% (\(R^2\) = .21) of the variance in childrens’ expressive suppression scores.

Task 3

3.1

Thoroughly check the assumptions of better_model and decide which version of the model to report: unadjusted; with robust parameter estimates; with robust standard errors; or bootstrapped. Store the output from the version of the model you choose in a new object called final_model.

# Diagnostic plots for better_model
ggplot2::autoplot(better_model,
         which = c(1,3,2,4),
         colour = 'black',
         size = 1.5,
         alpha = .5,
         shape = 'diamond',
         smooth.colour = 'red',
        ) +
theme_light()

# 95% standardised residuals limit (should = ~5%)
better_model %>% 
  broom::augment() %>% 
  dplyr::filter(abs(.std.resid) >= 1.96) %>% 
  nrow()/nrow(data)*100
## [1] 5.555556
# 99% standardised residuals limit (should = ~1%)
better_model %>% 
  broom::augment() %>% 
  dplyr::filter(abs(.std.resid) >= 2.5) %>% 
  nrow()/nrow(data)*100
## [1] 1.587302
# any standardised residuals >= 3? (should = 0)
better_model %>% 
  broom::augment() %>% 
  dplyr::filter(abs(.std.resid) >= 3) %>% 
  nrow()
## [1] 0
# robust parameter estimates for better_model
robust::lmRob(child_supp ~ mother_supp + father_supp, data) %>% 
  summary() 
## 
## Call:
## robust::lmRob(formula = child_supp ~ mother_supp + father_supp, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.7473  -2.5443   0.3956   2.0873   8.3078 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.8025     1.3210   5.906 3.19e-08 ***
## mother_supp   0.4561     0.1164   3.918 0.000147 ***
## father_supp   0.2581     0.1142   2.260 0.025571 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.741 on 123 degrees of freedom
## Multiple R-Squared: 0.23 
## 
## Test for Bias:
##             statistic p-value
## M-estimate     2.8732  0.4116
## LS-estimate    0.9898  0.8037
# robust confidence intervals and significance tests
parameters::parameters(better_model, robust = TRUE, vcov.type = "HC4")
## Parameter   | Coefficient |   SE |        95% CI | t(123) |      p
## ------------------------------------------------------------------
## (Intercept) |        8.20 | 1.12 | [5.97, 10.42] |   7.30 | < .001
## mother supp |        0.42 | 0.10 | [0.23,  0.61] |   4.42 | < .001
## father supp |        0.23 | 0.10 | [0.03,  0.44] |   2.26 | 0.026
final_model <- parameters::parameters(better_model, robust = TRUE, vcov.type = "HC4")

3.2

Report the results of your checks and justify your decision on which model to report as your final model (max 300 words).

The model satisfies the linearity assumption as the distribution in the residuals-fitted plot is relatively horizontal. Additionally, the assumption of additivity as the combined effect of the mother’s and father’s expressive suppression scores on children’s expressive suppression scores are additive. The assumption of normality of residuals and sampling distribution is also satisfied as the distribution of the Q-Q plot remains positively diagonal throughout. Furthermore, there are no significantly influential cases as demonstrated by the Cook’s distance plot: the highest observed Cook’s distance being approximately 0.08, which is significantly lower that the threshold of and absolute value of 1. additionally, the expected proportion of cases have standardised residuals within the expected range, and no standardised residuals equal or exceed 3.

One assumption not met by the model however is that of homoscedastic errors. The residuals-fitted plot funnels inwards from left to right, meaning the variance of residual errors is not uniform across the distribution.

As such, it is appropriate to use robust parameter estimates for this model, rather than the unadjusted model. Additionally a bootstrap model is not appropriate since the sample size for this data is not too small (N = 126).

Task 4

4.1

Create a nicely formatted table to present the results of final_model in APA style.

final_model %>% 
  kable(.,) %>% 
  kable_styling()
Parameter Coefficient SE CI CI_low CI_high t df_error p
(Intercept) 8.1952649 1.1223495 0.95 5.9736428 10.4168870 7.301883 123 0.0000000
mother_supp 0.4212673 0.0953454 0.95 0.2325369 0.6099977 4.418328 123 0.0000216
father_supp 0.2319071 0.1027793 0.95 0.0284618 0.4353524 2.256360 123 0.0258138

4.2

Obtain standardised bs for the predictor(s).

parameters::model_parameters(better_model, standardise = "refit") %>% 
  kable(.,) %>% 
  kable_styling()
Parameter Coefficient SE CI CI_low CI_high t df_error p
(Intercept) 8.1952649 1.1779992 0.95 5.8634877 10.5270422 6.956936 123 0.0000000
mother_supp 0.4212673 0.1032102 0.95 0.2169690 0.6255656 4.081643 123 0.0000799
father_supp 0.2319071 0.1007384 0.95 0.0325015 0.4313126 2.302071 123 0.0230124

mother_supp(b) = 0.42

father_supp(b) = 0.23

4.3

Interpret (explain) what the statistical results for this model tell you, using your results from the tasks above (max 300 words).

Results show that parent’s combined tendency towards emotional suppression significantly predicts emotional suppression in their children (F(1, 123)= 16.74, P<0.001). The model predicts a unit increase in the mother and father’s score expressive suppression score will lead to an increase of 0.23 in the child’s expressive suppression score.