“On Bloom’s Two Sigma Problem: A Systematic Review of the Effectiveness of Mastery Learning, Tutoring, and Direct Instruction”, 2019-07-28:
Is Bloom’s “Two Sigma” phenomenon real? If so, what do we do about it?
Educational psychologist Benjamin Bloom found that one-on-one tutoring using mastery learning led to a two-sigma(!) improvement in student performance, and the results were replicated. In the paper that named the “2 Sigma Problem”, he asks: how do we achieve these results under conditions more practical (i.e. more scalable) than one-to-one tutoring?
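The “sigma” here is the standardized effect size (Cohen’s d): the difference between group means divided by their pooled standard deviation. A minimal sketch of the computation, using made-up illustrative scores (not Bloom’s data):

```python
import math

def cohens_d(treatment, control):
    """Cohen's d: mean difference over the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    m1 = sum(treatment) / n1
    m2 = sum(control) / n2
    # Sample variances (Bessel-corrected).
    v1 = sum((x - m1) ** 2 for x in treatment) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in control) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical test scores for a tutored vs a conventionally taught group:
tutored = [75, 80, 85, 78, 82]
conventional = [65, 70, 75, 68, 72]
print(round(cohens_d(tutored, conventional), 2))  # → 2.63
```

On these invented numbers the tutored group scores about 2.6 pooled standard deviations above the control group, i.e. a “two-sigma-plus” effect in Bloom’s terminology.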
In a related vein, this large-scale meta-analysis shows large (>0.5 Cohen’s d) effects from direct instruction using mastery learning. “Yet, despite the very large body of research supporting its effectiveness, DI has not been widely embraced or implemented.”
The literatures examined here are full of small-sample, non-randomized trials and highly heterogeneous results.
Tutoring in general most likely does not reach the 2-sigma level that Bloom suggested; likewise, it is unlikely that mastery learning provides a 1-sigma improvement.
But high-quality tutors and high-quality software are likely able to reach a 2-sigma improvement and beyond.
All the methods (mastery learning, direct instruction, tutoring, software tutoring, deliberate practice, and spaced repetition) studied in this essay are found to work to various degrees, outlined below.
This essay covers many kinds of subjects being taught, and likewise many groups (special education vs regular schools, college vs K-12). The effect sizes reported here are averages that serve as general guidance.
The methods studied tend to be more effective for lower skilled students relative to the rest.
The methods studied work at all levels of education, with the exception of direct instruction: There is no evidence to judge its effectiveness at the college level.
The methods work substantially better when clear objectives and facts to be learned are set. There is little evidence of learning transfer: Practicing or studying subject X does not much improve performance outside of X.
There is some suggestive evidence that the underlying reasons these methods work are increased and repeated exposure to the material, the testing effect, and fine-grained feedback on performance in the case of tutoring.
Long-term studies tend to find evidence of a fade-out effect: effect sizes decrease over time, likely because the skills learned are not subsequently practiced.
Bloom reported an effect size of around 1 (one sigma) for mastery learning, while tutoring led to d = 2; this appears to be mostly an outlier case.
Nonetheless, Bloom was on to something: Tutoring and mastery learning do have a degree of experimental support, and fortunately it seems that carefully designed software systems can completely replace the instructional side of traditional teaching, achieving better results, on par with one-to-one tutoring. However, designing such systems is a hard endeavour, and teachers provide a motivational component that may not be easily replicated by software alone.
Overall, it’s good news that the effects are present for younger and older students and across subjects, but the effect sizes of tutoring, mastery learning, and DI are not as large as they would seem from Bloom’s paper. That said, it is true that tutoring has large effect sizes, and that properly designed software does as well. The DARPA case study shows what is possible with software tutoring; in that case, the effect sizes went even beyond those in Bloom’s paper.