ANCOVA (Analysis of Covariance)

AIM: To compare (two or more) independent sample means while controlling for a covariate


ANCOVA (Analysis of Covariance) is applied to compare two or more independent sample means, while controlling for a covariate.

Suppose you want to compare the mean marks scored in 12th standard by students in two different schools. The results show that the mean marks for school A are significantly higher than those for school B. Here, we tend to infer that school A is much better than school B.

Then comes the next issue: school A may selectively admit more intelligent students. Now, you want to test whether the difference in marks is due to the quality of education or due to the quality of the students admitted to school A. To measure the quality of admitted students, you can consider marks obtained in previous years, before admission to these schools, for example marks obtained in 10th standard (assuming students change school after 10th standard).

Now you want to compare the mean marks in 12th AFTER controlling for marks in 10th. Here the covariate is marks in 10th standard. So, in this example, the dependent variable is marks in 12th standard, the independent variable is the school attended for 12th standard, and the covariate is the marks obtained before admission to the school. ANCOVA can be applied here to compare the two means after controlling for the covariate.

Assumptions
1. The dependent variable and the covariate should be continuous and normally distributed in all categories of the independent variable.
2. Samples are drawn using a random sampling technique.
3. The observations are independent. No observation belongs to more than one group.
4. The residuals should be normally distributed and should have equal variance across the categories of the independent variable (homogeneity of variance / homoscedasticity). Equality of variance can be tested by Levene's test of homogeneity of variance of errors.
5. There are no outliers.
6. There is one independent variable, which is nominal or ordinal, with two or more possible levels / categories.
7. The covariate should be linearly related to the dependent variable in each category of the independent variable. This can be checked with a scatter plot of the dependent variable against the covariate, one for each category of the independent variable.

Null hypothesis and Alternate Hypothesis

Null Hypothesis

A. The means for all categories of the independent variable are equal (μi = μj), after controlling for the covariate.

Alternate Hypothesis

A. There is at least one pair of means that differ from each other (μi ≠ μj), after controlling for the covariate.

Where μi and μj can be any two category means.



Following is the sample data for marks in 12th, school and previous marks.

Marks_12_Standard School Previous_marks
80 B    64
77 B    61
78 B    60
70 B    54
95 A    90
90 A    92
78 B    52
76 B    57
93 A    85
92 A    90
75 B    57
75 B    65
73 B    55
95 A    88
75 B    51
75 B    69
80 B    60
82 A    83
80 A    98
99 A    92
80 B    56
91 A    94
80 B    68
77 B    65
83 B    67
85 B    70

You can copy the above table and paste it in the textbox on the ANCOVA page. (You may need to paste it in Excel first, to get the data in the required format.)
Please choose your alpha (usually 5%). Select type III as the model type for sum of squares (SSQ). Type III SSQ is more appropriate for most situations. For details, please read about the types of SSQ in the two-way ANOVA help page.
Click on "Run test". You will get the following output.

 
Results

Descriptive: School
Category  Mean     SD      Sample Size
B         77.4706  3.6762  17
A         90.7778  6.1599  9

Reference Category = B
(Explanation: The above table gives the raw (unadjusted) mean, SD and sample size of the dependent variable (12th marks) for both schools.)
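The descriptive table can be reproduced outside the tool. The following is a minimal sketch using only the Python standard library; the names DATA, describe and summary are illustrative and are not part of the ANCOVA page.

```python
from statistics import mean, stdev

# (Marks_12_Standard, School, Previous_marks), copied from the sample data
DATA = [
    (80, "B", 64), (77, "B", 61), (78, "B", 60), (70, "B", 54), (95, "A", 90),
    (90, "A", 92), (78, "B", 52), (76, "B", 57), (93, "A", 85), (92, "A", 90),
    (75, "B", 57), (75, "B", 65), (73, "B", 55), (95, "A", 88), (75, "B", 51),
    (75, "B", 69), (80, "B", 60), (82, "A", 83), (80, "A", 98), (99, "A", 92),
    (80, "B", 56), (91, "A", 94), (80, "B", 68), (77, "B", 65), (83, "B", 67),
    (85, "B", 70),
]


def describe(rows):
    """Return {school: (mean, sample SD, n)} for the dependent variable."""
    groups = {}
    for marks, school, _previous in rows:
        groups.setdefault(school, []).append(marks)
    return {s: (mean(v), stdev(v), len(v)) for s, v in groups.items()}


summary = describe(DATA)
for school, (m, sd, n) in summary.items():
    print(f"{school}: mean={m:.4f} SD={sd:.4f} n={n}")
```

Here stdev is the sample standard deviation (divisor n - 1), which is what the SD column reports.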
ANCOVA Results


Tests of between-subjects effects
Dependent variable = Marks_12_Standard
Source           Sum of Squares (Type III)  df  Mean Sum of Squares  F        P
Corrected Model  1078.0207                  2   539.0104             25.6234  0
Previous_marks   35.9654                    1   35.9654              1.7097   0.2039
School           35.7206                    1   35.7206              1.6981   0.2054
Error            483.8254                   23  21.0359
Corrected Total  1561.8462                  26
Non-significant p value for School (P = 0.2054): The means for the categories of School are not significantly different, after adjusting for Previous_marks.
(Explanation: You need to look at the row for our independent variable, School. Here F = 1.6981 and p = 0.2054, which is non-significant. See the above interpretation.)
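To see where the F values come from, the model can be fitted by ordinary least squares and each term tested by comparing the error sum of squares with and without that term. The sketch below assumes a model with one covariate, one two-level factor and no interaction, where these drop-one-term comparisons coincide with type III SSQ. The helper names (solve, ols) are illustrative and this is not the tool's own implementation. P values are not computed because the standard library has no F distribution.

```python
# (Marks_12_Standard, School, Previous_marks), copied from the sample data
DATA = [
    (80, "B", 64), (77, "B", 61), (78, "B", 60), (70, "B", 54), (95, "A", 90),
    (90, "A", 92), (78, "B", 52), (76, "B", 57), (93, "A", 85), (92, "A", 90),
    (75, "B", 57), (75, "B", 65), (73, "B", 55), (95, "A", 88), (75, "B", 51),
    (75, "B", 69), (80, "B", 60), (82, "A", 83), (80, "A", 98), (99, "A", 92),
    (80, "B", 56), (91, "A", 94), (80, "B", 68), (77, "B", 65), (83, "B", 67),
    (85, "B", 70),
]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design, y):
    """Return (coefficients, residual sum of squares) via the normal equations."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    return beta, sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


y = [marks for marks, _, _ in DATA]
prev = [float(p) for _, _, p in DATA]
is_a = [1.0 if school == "A" else 0.0 for _, school, _ in DATA]  # B is the reference

beta, sse_full = ols([[1.0, p, a] for p, a in zip(prev, is_a)], y)
_, sse_without_school = ols([[1.0, p] for p in prev], y)
_, sse_without_covariate = ols([[1.0, a] for a in is_a], y)

df_error = len(y) - 3                      # intercept, covariate, school dummy
mse = sse_full / df_error                  # mean square for Error
ss_school = sse_without_school - sse_full  # 1 df
ss_previous = sse_without_covariate - sse_full  # 1 df
f_school = ss_school / mse
f_previous = ss_previous / mse

print("Coefficients (intercept, Previous_marks, A):", [round(b, 4) for b in beta])
print(f"School: SS={ss_school:.4f} F={f_school:.4f}")
print(f"Previous_marks: SS={ss_previous:.4f} F={f_previous:.4f}")
```

The F value for School is then referred to an F distribution with 1 and 23 degrees of freedom to obtain the p value reported by the tool.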
Parameter Estimates

Parameter       Beta Coefficients  Std Error  t       P       LB       UB
Intercept       64.2726            10.1547    6.3294  0       43.2661  85.2792
Previous_marks  0.2176             0.1664     1.3076  0.2039  -0.1267  0.5619
A               6.8711             5.2729     1.3031  0.2054  -4.0367  17.7788

Explanation: Using this regression table, we can predict the 12th standard marks for a student. The regression equation is as follows:
DV = 64.2726 + 0.2176 * (Previous_marks) + 6.8711 * (School)
Here, DV = 12th marks
School = 1, for school A
School = 0, for school B
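As a small illustration, the regression equation can be written as a function. The coefficients are the rounded values from the Parameter Estimates table; the function name predict_marks_12 and the example student (80 previous marks) are hypothetical.

```python
# Rounded coefficients taken from the Parameter Estimates table
INTERCEPT = 64.2726
SLOPE_PREVIOUS = 0.2176
EFFECT_SCHOOL_A = 6.8711  # school B is the reference category


def predict_marks_12(previous_marks, school):
    """Predicted 12th standard marks from the fitted regression equation."""
    if school not in ("A", "B"):
        raise ValueError("school must be 'A' or 'B'")
    dummy = 1 if school == "A" else 0
    return INTERCEPT + SLOPE_PREVIOUS * previous_marks + EFFECT_SCHOOL_A * dummy


# A hypothetical student with 80 previous marks, in each school
print(round(predict_marks_12(80, "A"), 4))  # 88.5517
print(round(predict_marks_12(80, "B"), 4))  # 81.6806
```

For the same previous marks, the two predictions always differ by 6.8711, the coefficient for school A.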
 
Adjusted Means

Category  Adjusted Mean  Standard Error  LB       UB
B         79.6985        2.0348          75.4891  83.9078
A         86.5696        3.563           79.1988  93.9403

Explanation: Above table shows average 12th marks, after adjusting for covariate. Please compare the means before and after adjustment.
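An adjusted mean is the model's prediction for a category when the covariate is held at its overall mean. The sketch below illustrates this with the rounded coefficients from the Parameter Estimates table, so the results agree with the Adjusted Means table only to about two decimal places. The variable names are illustrative.

```python
# Rounded coefficients taken from the Parameter Estimates table
INTERCEPT, SLOPE_PREVIOUS, EFFECT_SCHOOL_A = 64.2726, 0.2176, 6.8711

# Previous_marks column of the sample data, in the original row order
PREVIOUS_MARKS = [
    64, 61, 60, 54, 90, 92, 52, 57, 85, 90, 57, 65, 55,
    88, 51, 69, 60, 83, 98, 92, 56, 94, 68, 65, 67, 70,
]

# Overall (both schools together) mean of the covariate
covariate_mean = sum(PREVIOUS_MARKS) / len(PREVIOUS_MARKS)

# Predicted 12th marks for each school at the overall covariate mean
adjusted_b = INTERCEPT + SLOPE_PREVIOUS * covariate_mean
adjusted_a = adjusted_b + EFFECT_SCHOOL_A

print(f"Mean of Previous_marks = {covariate_mean:.4f}")
print(f"Adjusted mean B = {adjusted_b:.2f}, adjusted mean A = {adjusted_a:.2f}")
```

The raw difference between the schools (about 13.3 marks) shrinks to 6.8711 after adjustment, because students in school A had much higher previous marks.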

Figure: Column chart showing unadjusted and adjusted means
Post-hoc Group-wise comparisons

Group-wise comparisons
Comparison Between  Difference between means  Standard Error  t        P#      LB        UB
B AND A             -6.8711                   5.2729          -1.3031  0.2049  -17.7788  4.0367
# Multiple comparisons adjustments: Bonferroni
Explanation: If the table for "Tests of between-subjects effects" shows a significant p value for the independent variable, it means that at least one pair of category means of the independent variable is significantly different. The above table provides the results of the post-hoc test (Bonferroni) to identify such pair(s) of means.
(In this example, the p value for the independent variable, School, is non-significant in the table for "Tests of between-subjects effects". So the post-hoc test is actually not needed.)


Levene's test for equality of error variances
 
F       df1  df2  P
4.1021  1    24   0.0541
Levene's test is not significant (P = 0.0541): The assumption of equality of error variances (homoscedasticity) is met.
Explanation: If Levene's test is significant, then the assumption of equality of error variances is violated. In this situation, the results of ANCOVA may be biased.
In this example, Levene's test is non-significant, indicating that we can proceed with ANCOVA.
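The Levene statistic can be sketched as a one-way ANOVA on the absolute residuals of the ANCOVA model. The code below assumes the mean-centred form of Levene's test applied to residuals from a model with a common slope and one intercept per school. This reproduces the F value above on the sample data, but it is an illustration, not necessarily the tool's exact procedure. Function names are illustrative.

```python
from statistics import mean

# (Marks_12_Standard, School, Previous_marks), copied from the sample data
DATA = [
    (80, "B", 64), (77, "B", 61), (78, "B", 60), (70, "B", 54), (95, "A", 90),
    (90, "A", 92), (78, "B", 52), (76, "B", 57), (93, "A", 85), (92, "A", 90),
    (75, "B", 57), (75, "B", 65), (73, "B", 55), (95, "A", 88), (75, "B", 51),
    (75, "B", 69), (80, "B", 60), (82, "A", 83), (80, "A", 98), (99, "A", 92),
    (80, "B", 56), (91, "A", 94), (80, "B", 68), (77, "B", 65), (83, "B", 67),
    (85, "B", 70),
]


def ancova_residuals(rows):
    """Residuals per school from the common-slope model (one intercept per school)."""
    groups = {}
    for y, school, x in rows:
        groups.setdefault(school, []).append((x, y))
    centre = {s: (mean(x for x, _ in v), mean(y for _, y in v))
              for s, v in groups.items()}
    # The pooled within-group slope equals the least-squares slope of this model
    sxy = sum((x - centre[s][0]) * (y - centre[s][1])
              for s, v in groups.items() for x, y in v)
    sxx = sum((x - centre[s][0]) ** 2 for s, v in groups.items() for x, _ in v)
    slope = sxy / sxx
    return {s: [(y - centre[s][1]) - slope * (x - centre[s][0]) for x, y in v]
            for s, v in groups.items()}


def levene_f(groups):
    """One-way ANOVA F on absolute deviations from each group's mean."""
    z = {s: [abs(e - mean(v)) for e in v] for s, v in groups.items()}
    n = sum(len(v) for v in z.values())
    k = len(z)
    grand = sum(sum(v) for v in z.values()) / n
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in z.values())
    ss_within = sum((e - mean(v)) ** 2 for v in z.values() for e in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), k - 1, n - k


levene_stat, df1, df2 = levene_f(ancova_residuals(DATA))
print(f"Levene F = {levene_stat:.4f}, df1 = {df1}, df2 = {df2}")
```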

Lack of Fit test

Source       Sum of Squares  df  Mean Sum of Squares  F       P
Lack of fit  434.3254        18  24.1292              2.4373  0.1646
Pure Error   49.5            5   9.9
Lack of Fit test is not significant (P = 0.1646): The linear model fits the data adequately.
Explanation: If Lack of Fit is significant, then there may be another relationship between the variables which fits the data much better than this linear relationship.
In this example, Lack of Fit is non-significant, hence we can say that the linear relationship in ANCOVA can be accepted.
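The lack of fit table splits the Error sum of squares into two parts. Pure error is the variation among students who share the same school and the same previous marks. Lack of fit is what remains. The following is a minimal sketch of that split, assuming replicates are defined by identical (School, Previous_marks) pairs, which matches the degrees of freedom in the table. Function names are illustrative.

```python
from statistics import mean

# (Marks_12_Standard, School, Previous_marks), copied from the sample data
DATA = [
    (80, "B", 64), (77, "B", 61), (78, "B", 60), (70, "B", 54), (95, "A", 90),
    (90, "A", 92), (78, "B", 52), (76, "B", 57), (93, "A", 85), (92, "A", 90),
    (75, "B", 57), (75, "B", 65), (73, "B", 55), (95, "A", 88), (75, "B", 51),
    (75, "B", 69), (80, "B", 60), (82, "A", 83), (80, "A", 98), (99, "A", 92),
    (80, "B", 56), (91, "A", 94), (80, "B", 68), (77, "B", 65), (83, "B", 67),
    (85, "B", 70),
]


def residual_ss(rows):
    """Error SS and df of the common-slope model (one intercept per school)."""
    groups = {}
    for y, school, x in rows:
        groups.setdefault(school, []).append((x, y))
    sxx = sxy = syy = 0.0
    for pts in groups.values():
        mx = mean(x for x, _ in pts)
        my = mean(y for _, y in pts)
        sxx += sum((x - mx) ** 2 for x, _ in pts)
        sxy += sum((x - mx) * (y - my) for x, y in pts)
        syy += sum((y - my) ** 2 for _, y in pts)
    # Within-group SS of y minus the part explained by the pooled slope
    return syy - sxy ** 2 / sxx, len(rows) - len(groups) - 1


def lack_of_fit(rows):
    """Return (SS lack of fit, df, SS pure error, df, F)."""
    sse, df_error = residual_ss(rows)
    cells = {}  # replicates: same school and same previous marks
    for y, school, x in rows:
        cells.setdefault((school, x), []).append(y)
    ss_pure = sum((y - mean(v)) ** 2 for v in cells.values() for y in v)
    df_pure = len(rows) - len(cells)
    ss_lof, df_lof = sse - ss_pure, df_error - df_pure
    f = (ss_lof / df_lof) / (ss_pure / df_pure)
    return ss_lof, df_lof, ss_pure, df_pure, f


ss_lof, df_lof, ss_pure, df_pure, f_lof = lack_of_fit(DATA)
print(f"Lack of fit: SS={ss_lof:.4f} df={df_lof}")
print(f"Pure error:  SS={ss_pure:.4f} df={df_pure}")
print(f"F = {f_lof:.4f}")
```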
How to report ANCOVA results:
A one-way ANCOVA was conducted to compare the marks in 12th standard obtained by students in two schools, whilst controlling for marks obtained before admission to the school. Levene's test was carried out to test the assumption of equality of error variances, and the assumption was met (F(1, 24) = 4.1021, p = 0.0541). The ANCOVA revealed no significant difference in mean 12th standard marks between the schools after controlling for previous marks (F(1, 23) = 1.6981, p = 0.2054).
@ Sachin Mumbare