Análisis estadístico de experimentos

DCA, DBA y diseños factoriales

Author

Edimer David Jaramillo

Published

November 28, 2024

Bibliotecas

Code
library(tidyverse)
library(effectsize)
library(readxl)
library(janitor)
library(broom)
library(emmeans)
library(car)

Datos

Code
datos <- read_excel("datos/experimento_ratones.xlsx") 
datos

Tamaño del efecto

Anova de una vía (DCA)

Hipótesis

\[ \begin{aligned} H_0: \mu_1 = \mu_2 \\ H_1: \mu_1 \neq \mu_2 \end{aligned} \]

Exploración

Code
datos |> 
  ggplot(aes(x = Treatment, y = GST)) +
  geom_boxplot()

Modelo

Code
modelo1 <- aov(GST ~ Treatment, data = datos)
modelo1 |> tidy()

Medias estimadas

Code
modelo1 |> 
  emmeans(specs = "Treatment") |> 
  as.data.frame()

Diferencia de medias

Code
modelo1 |> TukeyHSD()
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = GST ~ Treatment, data = datos)

$Treatment
                 diff      lwr      upr    p adj
Treated-Control 238.5 103.1061 373.8939 0.002037

Tamaño del efecto

Code
modelo1 |> 
  eta_squared()

Residuales

Code
par(mfrow = c(2, 2))
plot(modelo1)

  • Prueba de normalidad:
Code
modelo1 |> 
  residuals() |> 
  shapiro.test()

    Shapiro-Wilk normality test

data:  residuals(modelo1)
W = 0.92386, p-value = 0.1946
  • Prueba de homocedasticidad:
Code
bartlett.test(data = datos, GST ~ Treatment)

    Bartlett test of homogeneity of variances

data:  GST by Treatment
Bartlett's K-squared = 0.00092761, df = 1, p-value = 0.9757

Diseño en bloques (DBA)

  • Hipótesis principal:

\[ \begin{aligned} H_0: \mu_1 = \mu_2 \\ H_1: \mu_1 \neq \mu_2 \end{aligned} \]

  • Hipótesis de verificación:

\[ \begin{aligned} H_0: \beta_1 = \beta_2 \\ H_A: \beta_1 \neq \beta_2 \end{aligned} \]

Exploración

Code
datos |> 
  ggplot(aes(x = Treatment, y = GST)) +
  facet_wrap(~Block) +
  geom_boxplot()

Modelo

Code
modelo2 <- aov(GST ~ Treatment + Block, data = datos)
modelo2 |> tidy()

Medias estimadas

Code
modelo2 |> 
  emmeans(specs = "Treatment") |> 
  as.data.frame()

Diferencia de medias

Code
modelo2 |> TukeyHSD(which = "Treatment")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = GST ~ Treatment + Block, data = datos)

$Treatment
                 diff      lwr      upr     p adj
Treated-Control 238.5 144.2819 332.7181 0.0001077

Tamaño del efecto

Code
modelo2 |> 
  eta_squared()

Residuales

Code
par(mfrow = c(2, 2))
plot(modelo2)

  • Prueba de normalidad:
Code
modelo2 |> 
  residuals() |> 
  shapiro.test()

    Shapiro-Wilk normality test

data:  residuals(modelo2)
W = 0.84408, p-value = 0.01116
  • Prueba de homocedasticidad tratamiento:
Code
leveneTest(data = datos, GST ~ Treatment)
  • Prueba de homocedasticidad bloque:
Code
leveneTest(data = datos, GST ~ Block)

Anova de dos vías aditivo

  • Hipótesis factor 1:

\[ \begin{aligned} H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \\ H_1: \mu_i \neq \mu_j \end{aligned} \]

  • Hipótesis factor 2:

\[ \begin{aligned} H_0: \mu_1 = \mu_2 \\ H_1: \mu_1 \neq \mu_2 \end{aligned} \]

Exploración

Code
datos |> 
  ggplot(aes(x = Strain, y = GST, color = Treatment, fill = Treatment)) +
  geom_boxplot(alpha = 0.5)

Modelo

Code
modelo3 <- aov(GST ~ Treatment + Strain, data = datos)
modelo3 |> tidy()

Medias estimadas

  • Tratamiento:
Code
modelo3 |> 
  emmeans(specs = "Treatment") |> 
  as.data.frame()
  • Raza:
Code
modelo3 |> 
  emmeans(specs = "Strain") |> 
  as.data.frame()

Diferencia de medias

  • Tratamiento:
Code
modelo3 |> 
  TukeyHSD(which = "Treatment")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = GST ~ Treatment + Strain, data = datos)

$Treatment
                 diff      lwr      upr     p adj
Treated-Control 238.5 92.14634 384.8537 0.0042678
  • Raza:
Code
modelo3 |> 
  TukeyHSD(which = "Strain")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = GST ~ Treatment + Strain, data = datos)

$Strain
                  diff       lwr      upr     p adj
A/J-129/Ola      84.25 -198.7606 367.2606 0.8071951
BALB/C-129/Ola  -30.50 -313.5106 252.5106 0.9875652
NIH-129/Ola      28.75 -254.2606 311.7606 0.9895278
BALB/C-A/J     -114.75 -397.7606 168.2606 0.6275906
NIH-A/J         -55.50 -338.5106 227.5106 0.9329709
NIH-BALB/C       59.25 -223.7606 342.2606 0.9201965

Tamaño del efecto

Code
modelo3 |> 
  eta_squared()

Residuales

Code
par(mfrow = c(2, 2))
plot(modelo3)

  • Prueba de normalidad:
Code
modelo3 |> 
  residuals() |> 
  shapiro.test()

    Shapiro-Wilk normality test

data:  residuals(modelo3)
W = 0.98395, p-value = 0.9872
  • Prueba de homocedasticidad tratamiento:
Code
bartlett.test(data = datos, GST ~ Treatment)

    Bartlett test of homogeneity of variances

data:  GST by Treatment
Bartlett's K-squared = 0.00092761, df = 1, p-value = 0.9757
  • Prueba de homocedasticidad raza:
Code
bartlett.test(data = datos, GST ~ Strain)

    Bartlett test of homogeneity of variances

data:  GST by Strain
Bartlett's K-squared = 1.4085, df = 3, p-value = 0.7035

Anova de dos vías multiplicativo

  • Hipótesis factor 1:

\[\begin{aligned} H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \\ H_1: \mu_i \neq \mu_j \end{aligned}\]

  • Hipótesis factor 2:

\[\begin{aligned} H_0: \mu_1 = \mu_2 \\ H_1: \mu_1 \neq \mu_1 \end{aligned}\]

  • Hipótesis interacción:

\[\begin{aligned} H_0: (\alpha\tau)_{11} = \cdots = (\alpha\beta)_{42} \\ H_1: (\alpha\tau)_{ij} \neq (\alpha\tau)_{ij} \end{aligned}\]

Exploración

Code
datos |> 
  group_by(Strain, Treatment) |> 
  summarise(promedio = mean(GST)) |> 
  ggplot(aes(x = Strain, y = promedio, color = Treatment)) +
  geom_point() +
  geom_line(aes(group = Treatment))

Modelo

Code
modelo4 <- aov(GST ~ Treatment * Strain, data = datos)
modelo4 |> tidy()

Medias estimadas

  • Tratamiento:
Code
modelo4 |> 
  emmeans(specs = "Treatment") |> 
  as.data.frame()
  • Raza:
Code
modelo4 |> 
  emmeans(specs = "Strain") |> 
  as.data.frame()
  • Interacción raza-tratamiento:
Code
modelo4 |> 
  emmeans(specs = pairwise ~ Treatment | Strain) |> 
  as.data.frame()

Diferencia de medias

  • Tratamiento:
Code
modelo4 |> 
  TukeyHSD(which = "Treatment")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = GST ~ Treatment * Strain, data = datos)

$Treatment
                 diff      lwr      upr   p adj
Treated-Control 238.5 83.29536 393.7046 0.00758
  • Raza:
Code
modelo4 |> 
  TukeyHSD(which = "Strain")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = GST ~ Treatment * Strain, data = datos)

$Strain
                  diff       lwr      upr     p adj
A/J-129/Ola      84.25 -220.5596 389.0596 0.8127166
BALB/C-129/Ola  -30.50 -335.3096 274.3096 0.9877705
NIH-129/Ola      28.75 -276.0596 333.5596 0.9896978
BALB/C-A/J     -114.75 -419.5596 190.0596 0.6405120
NIH-A/J         -55.50 -360.3096 249.3096 0.9344206
NIH-BALB/C       59.25 -245.5596 364.0596 0.9219941

Tamaño del efecto

Code
modelo4 |> 
  eta_squared()

Residuales

Code
par(mfrow = c(2, 2))
plot(modelo4)

  • Prueba de normalidad:
Code
modelo4 |> 
  residuals() |> 
  shapiro.test()

    Shapiro-Wilk normality test

data:  residuals(modelo4)
W = 0.91331, p-value = 0.1316
  • Prueba de homocedasticidad tratamiento:
Code
bartlett.test(data = datos, GST ~ Treatment)

    Bartlett test of homogeneity of variances

data:  GST by Treatment
Bartlett's K-squared = 0.00092761, df = 1, p-value = 0.9757
  • Prueba de homocedasticidad raza:
Code
bartlett.test(data = datos, GST ~ Strain)

    Bartlett test of homogeneity of variances

data:  GST by Strain
Bartlett's K-squared = 1.4085, df = 3, p-value = 0.7035
  • Prueba de homocedasticidad interacción:
Code
leveneTest(data = datos, GST ~ Treatment * Strain)

Anova multifactorial aditivo

  • Hipótesis factor 1:

\[\begin{aligned} H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \\ H_1: \mu_i \neq \mu_j \end{aligned}\]

  • Hipótesis factor 2:

\[\begin{aligned} H_0: \mu_1 = \mu_2 \\ H_1: \mu_1 \neq \mu_1 \end{aligned}\]

  • Hipótesis de verificación:

\[ \begin{aligned} H_0: \beta_1 = \beta_2 \\ H_A: \beta_1 \neq \beta_2 \end{aligned} \]

Exploración

Code
datos |> 
  ggplot(aes(x = Strain, y = GST, color = Treatment, fill = Treatment)) +
  facet_wrap(~Block) +
  geom_point() +
  geom_line(aes(group = Treatment))

Modelo

Code
modelo5 <- aov(GST ~ Treatment + Strain + Block, data = datos)
modelo5 |> tidy()

Medias estimadas

  • Tratamiento:
Code
modelo5 |> 
  emmeans(specs = "Treatment") |> 
  as.data.frame()
  • Raza:
Code
modelo5 |> 
  emmeans(specs = "Strain") |> 
  as.data.frame()

Diferencia de medias

  • Tratamiento:
Code
modelo5 |> 
  TukeyHSD(which = "Treatment")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = GST ~ Treatment + Strain + Block, data = datos)

$Treatment
                 diff      lwr      upr     p adj
Treated-Control 238.5 145.0966 331.9034 0.0002012
  • Raza:
Code
modelo5 |> 
  TukeyHSD(which = "Strain")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = GST ~ Treatment + Strain + Block, data = datos)

$Strain
                  diff        lwr       upr     p adj
A/J-129/Ola      84.25  -97.12006 265.62006 0.5151475
BALB/C-129/Ola  -30.50 -211.87006 150.87006 0.9537274
NIH-129/Ola      28.75 -152.62006 210.12006 0.9607093
BALB/C-A/J     -114.75 -296.12006  66.62006 0.2737937
NIH-A/J         -55.50 -236.87006 125.87006 0.7867403
NIH-BALB/C       59.25 -122.12006 240.62006 0.7532299

Tamaño del efecto

Code
modelo5 |> 
  eta_squared()

Residuales

Code
par(mfrow = c(2, 2))
plot(modelo5)

  • Prueba de normalidad:
Code
modelo5 |> 
  residuals() |> 
  shapiro.test()

    Shapiro-Wilk normality test

data:  residuals(modelo5)
W = 0.96573, p-value = 0.7656
  • Prueba de homocedasticidad tratamiento:
Code
bartlett.test(data = datos, GST ~ Treatment)

    Bartlett test of homogeneity of variances

data:  GST by Treatment
Bartlett's K-squared = 0.00092761, df = 1, p-value = 0.9757
  • Prueba de homocedasticidad raza:
Code
bartlett.test(data = datos, GST ~ Strain)

    Bartlett test of homogeneity of variances

data:  GST by Strain
Bartlett's K-squared = 1.4085, df = 3, p-value = 0.7035
  • Prueba de homocedasticidad bloque:
Code
bartlett.test(data = datos, GST ~ Block)

    Bartlett test of homogeneity of variances

data:  GST by Block
Bartlett's K-squared = 0.14923, df = 1, p-value = 0.6993

Anova multifactorial multiplicativo

  • Hipótesis factor 1:

\[\begin{aligned}H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \\ H_1: \mu_i \neq \mu_j\end{aligned}\]

  • Hipótesis factor 2:

\[\begin{aligned}H_0: \mu_1 = \mu_2 \\ H_1: \mu_1 \neq \mu_1\end{aligned}\]

  • Hipótesis interacción:

\[\begin{aligned}H_0: (\alpha\tau)_{11} = \cdots = (\alpha\beta)_{42} \\ H_1: (\alpha\tau)_{ij} \neq (\alpha\tau)_{ij} \end{aligned}\]

  • Hipótesis de verificación:

\[ \begin{aligned} H_0: \beta_1 = \beta_2 \\ H_A: \beta_1 \neq \beta_2 \end{aligned} \]

Exploración

Code
datos |> 
  ggplot(aes(x = Strain, y = GST, color = Treatment, fill = Treatment)) +
  facet_wrap(~Block) +
  geom_point() +
  geom_line(aes(group = Treatment))

Modelo

Code
modelo6 <- aov(GST ~ Treatment * Strain + Block, data = datos)
modelo6 |> tidy()

Medias estimadas

  • Tratamiento:
Code
modelo6 |> 
  emmeans(specs = "Treatment") |> 
  as.data.frame()
  • Raza:
Code
modelo6 |> 
  emmeans(specs = "Strain") |> 
  as.data.frame()
  • Interacción raza-tratamiento:
Code
modelo6 |> 
  emmeans(specs = pairwise ~ Treatment | Strain) |> 
  as.data.frame()

Diferencia de medias

Code
modelo6 |> 
  emmeans(specs = pairwise ~ Treatment | Strain) |> 
  pairs()
Strain = 129/Ola:
 contrast          estimate   SE df t.ratio p.value
 Control - Treated     -216 54.4  7  -3.972  0.0054

Strain = A/J:
 contrast          estimate   SE df t.ratio p.value
 Control - Treated     -420 54.4  7  -7.733  0.0001

Strain = BALB/C:
 contrast          estimate   SE df t.ratio p.value
 Control - Treated     -199 54.4  7  -3.659  0.0081

Strain = NIH:
 contrast          estimate   SE df t.ratio p.value
 Control - Treated     -118 54.4  7  -2.179  0.0657

Results are averaged over the levels of: Block 

Tamaño del efecto

Code
modelo6 |> 
  eta_squared()

Residuales

Code
par(mfrow = c(2, 2))
plot(modelo6)

  • Prueba de normalidad:
Code
modelo6 |> 
  residuals() |> 
  shapiro.test()

    Shapiro-Wilk normality test

data:  residuals(modelo6)
W = 0.92236, p-value = 0.1841
  • Prueba de homocedasticidad tratamiento:
Code
bartlett.test(data = datos, GST ~ Treatment)

    Bartlett test of homogeneity of variances

data:  GST by Treatment
Bartlett's K-squared = 0.00092761, df = 1, p-value = 0.9757
  • Prueba de homocedasticidad raza:
Code
bartlett.test(data = datos, GST ~ Strain)

    Bartlett test of homogeneity of variances

data:  GST by Strain
Bartlett's K-squared = 1.4085, df = 3, p-value = 0.7035
  • Prueba de homocedasticidad bloque:
Code
bartlett.test(data = datos, GST ~ Block)

    Bartlett test of homogeneity of variances

data:  GST by Block
Bartlett's K-squared = 0.14923, df = 1, p-value = 0.6993
  • Prueba de homocedasticidad interacción:
Code
leveneTest(data = datos, GST ~ Treatment * Strain)

Comparación de modelos - AIC

Code
AIC(modelo1, modelo2, modelo3, modelo4, modelo5, modelo6)

Comparaciones múltiples

Tabla

Code
TukeyHSD(modelo6, which = "Treatment:Strain") |> 
  tidy()

Gráfico 1

Code
TukeyHSD(modelo6, which = "Treatment:Strain") |> 
  tidy() |> 
  ggplot(aes(x = fct_reorder(contrast, estimate), y = estimate,
             ymin = conf.low, ymax = conf.high)) +
  geom_point() +
  geom_errorbar() +
  geom_hline(yintercept = 0, lty = 2, color = "red") +
  coord_flip()