Hattie’s Meta-Analysis Madness: The Method is Missing!!! (Part III of III)

Why Hattie’s Research is a Starting-Point, but NOT the End-Game for Effective Schools

Dear Colleagues,

Introduction

This three-part series focuses on how states, districts, schools, and educational leaders make decisions regarding what services, supports, programs, curricula, instruction, strategies, and interventions to implement in their classrooms.  While I recognize that we need to use programs that have documented efficacy and the highest probability of implementation success, it has nonetheless been my experience that many programs are chosen “for all the wrong reasons”—to the detriment of students, staff, and schools.

Summarizing Part I of this Blog Series

In Part I of this series (posted on August 26th), The Top Ten Ways that Educators Make Bad, Large-Scale Programmatic Decisions: The Hazards of ESEA/ESSA’s Freedom and Flexibility at the State and Local Levels [CLICK HERE], I noted that:

  • Beyond the policy-level requirements in the newly-implemented Elementary and Secondary Education/Every Student Succeeds Act (ESEA/ESSA), the Act transfers virtually all of the effective school and schooling decisions, procedures, and practices away from the U.S. Department of Education, and into the “hands” of the respective state departments of education and their states’ districts and schools.
  • Because of this “transfer of responsibility,” states, districts, and schools will be more responsible (and accountable) for selecting their own approaches to curriculum, instruction, assessment, intervention, and evaluation than ever before.
  • This will result in significant variability—across states and districts—in how they define school “success” and student progress, measure school and teacher effectiveness, apply assessments to track students’ standards-based knowledge and proficiency, and implement multi-tiered academic and behavioral services and interventions for students.

All of this means that districts and schools will have more freedom—but greater responsibility—to evaluate, select, and implement their own ways of functionally addressing all students’ academic and social, emotional, and behavioral learning and instructional needs—across a multi-tiered continuum that extends from core instruction to strategic response and intensive intervention.

Part I of this series then described the “Top Ten” reasons why educational leaders make flawed large-scale, programmatic decisions—that waste time, money, and resources; and that frustrate and cause staff and student resistance and disengagement.

The flawed Reasons discussed were:

1.   The Autocrat (I Know Best)

2.   The Daydream Believer (My Colleague Says It Works)

3.   The Connected One (It’s On-Line)

4.   The Bargain Basement Boss (If it’s Free, It’s for Me)

5.   The Consensus-Builder (But the Committee Recommended It)

6.   The Groupie (But a National Expert Recommended It)

7.   The Do-Gooder (It’s Developed by a Non-Profit)

8.   The Enabler (It’s Federally or State-Recommended)

9.   The Abdicator (It’s Federally or State-Mandated)

10.   The Mad Scientist (It’s Research-based)

The hope is that, by self-reflecting on these flawed approaches, educational leaders will avoid these hazards and make their district- or school-wide programmatic decisions in more effective ways.

Summarizing Part II of this Blog Series

In Part II of this series (posted on September 9th), “Scientifically based” versus “Evidence-based” versus “Research-based”—Oh my!!! Making Effective Programmatic Decisions: Why You Need to Know the History and Questions Behind these Terms [CLICK HERE], I noted that:

  • The term “scientifically based” appeared twenty-eight times in ESEA/NCLB 2001, where it was formally defined; it also appeared in IDEA 2004 (the current federal special education law); and it was (at that time) the “go-to” definition in federal education law for evaluating the efficacy of, for example, the research or programs that states, districts, and schools needed to implement as part of their school and schooling processes.

And yet, this term is found in ESEA/ESSA ONLY four times, and it appears to have been replaced by the term “evidence-based.”

  • The term “evidence-based” DID NOT APPEAR in either ESEA/NCLB 2001 or IDEA 2004, but it DOES appear in ESEA/ESSA 2015 sixty-three times—most often when describing “evidence-based research, technical assistance, professional development, programs, methods, instruction, or intervention.”

As the new “go-to” standard for determining whether programs or interventions have been empirically demonstrated to be effective, ESEA/ESSA 2015 defines this term.

[CLICK HERE for the ESEA/NCLB 2001 “scientifically based” and ESEA/ESSA 2015 “evidence-based” definitions in Part II of this Blog]

  • The term “research-based” appeared five times in ESEA/NCLB 2001; it appears four times in IDEA 2004; and it appears once in ESEA/ESSA 2015.  When it appears, the term is largely used to describe programs that schools need to implement to support student learning.

Significantly, the term “research-based” is NOT defined in either ESEA law (2001, 2015) or in IDEA 2004.

Part II of this series went on to recommend a series of questions that educational leaders should ask when told that a program, strategy, or intervention is scientifically based, evidence-based, or research-based.

For example, I noted: If someone endorses a program as “scientifically based,” educational leaders should ask what the researcher or practitioner means by that term.  Then, the educational leader should ask for (preferably refereed) studies that “support” the program, and their descriptions of the:

  • Demographic backgrounds and other characteristics of the students participating in the studies (so you can compare and contrast these students to your students);
  • Research methods used in the studies (so you can validate that the methods were sound, objective, and that they involved control or comparison groups not receiving the program or intervention);
  • Outcomes measured and reported in the studies (so you can validate that the research was focused on student outcomes, and especially the student outcomes that you are most interested in for your students);
  • Data collection tools, instruments, or processes used in the studies (so that you are assured that they were psychometrically reliable, valid, and objective—such that the data collected and reported are demonstrated to be accurate);
  • Treatment or implementation integrity methods and data reported in the studies (so you can objectively determine that the program or intervention was implemented as it was designed, and in ways that make sense);
  • Data analysis procedures used in the studies (so you can validate that the data-based outcomes reported were based on the “right” statistical and analytic approaches);
  • Interpretations and conclusions reported by the studies [so you can objectively validate that these summarizations are supported by the data reported, and have not been inaccurately- or over-interpreted by the author(s)]; and the
  • Limitations reported in the studies (so you understand the inherent weaknesses in the studies, and can assess whether these weaknesses affected the integrity of the studies and the conclusions drawn about the efficacy of the programs or interventions).

The point of the questions and this discussion was to encourage educational leaders:

  • To go beyond “testimonials” and “hearsay” when programs, strategies, or interventions are recommended by others
  • To ask the questions and collect the information and data needed to objectively determine that a “recommended” program or intervention is independently responsible for the student outcomes that are purported and reported
  • To determine if there is enough objective data to demonstrate that the “recommended” program or intervention is appropriate for the educational leader’s own students, and if it will potentially result in the same positive and expected outcomes
  • To determine if the resources needed to implement the program are time- and cost-effective relative to the program’s “return-on-investment”
  • To determine if the “recommended” program or intervention will be acceptable to those involved (e.g., students, staff, administrators, parents) such that they are motivated to implement it with integrity and over an extended period of time

Today’s Discussion:  John Hattie and Meta-Analyses

Professor John Hattie has been the Director of the Melbourne Educational Research Institute at the University of Melbourne, Australia, since March 2011.  His research interests include performance indicators, models of measurement, and the evaluation of teaching and learning. He is best known for his books Visible Learning (2009) and Visible Learning for Teachers (2012).

Anchoring these books is Hattie’s critical review of thousands of published research studies in six areas that contribute to student learning: student factors, home factors, school factors, curricular factors, teacher factors, and teaching and learning factors.  Using the studies that met his criteria for inclusion, Hattie pooled the effect sizes from these individual studies, conducted a series of meta-analyses, and rank-ordered the positive to negative effects of over a hundred approaches—again, relative to student learning outcomes.

In Visible Learning, for example, Hattie described 138 rank-ordered influences on student learning and achievement based on a synthesis of more than 800 meta-studies covering more than 80 million students.  In his subsequent research (reported in Visible Learning for Teachers), the list of effects was expanded, and now (2016) the list—based on more than 1,200 meta-studies—includes 195 effects and six “super-factors.”  All of this research reflects one of the largest integrations of “what works best in education” available.

What is a Meta-Analysis?

A meta-analysis is a statistical procedure that combines the effect sizes from separate studies that have investigated common programs, strategies, or interventions.  The procedure results in a pooled effect size that provides a more reliable and valid “picture” of the program or intervention’s usefulness or impact, because it draws on more subjects, more implementation trials and sites, and (usually) more geographic and demographic diversity.  Typically, an effect size of 0.40 is used as the “cut-score,” with effect sizes above 0.40 reflecting a “meaningful” impact.

Significantly, when the impact (or effect) of a “treatment” is consistent across separate studies, a meta-analysis can be used to identify the common effect.  When effect sizes differ across studies, a meta-analysis can be used to identify the reason for this variability.
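To make the idea of a “pooled effect size” concrete, here is a minimal sketch (in Python, using purely hypothetical study values) of the common fixed-effect, inverse-variance approach.  It is offered only as an illustration of the arithmetic, not as Hattie’s actual procedure; it simply shows how individual effect sizes and their precision combine into one number that can then be compared against a cut-score such as 0.40.

```python
# Minimal sketch: pooling effect sizes with inverse-variance (fixed-effect) weights.
# The study values below are hypothetical and for illustration only.

studies = [
    # (effect size d, standard error of d) for each hypothetical study
    (0.55, 0.12),
    (0.30, 0.20),
    (0.48, 0.15),
]

# Each study is weighted by the inverse of its variance (1 / SE^2),
# so more precise (usually larger) studies count for more.
weights = [1 / (se ** 2) for _, se in studies]
pooled = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5

print(f"Pooled effect size: {pooled:.2f} (SE = {pooled_se:.2f})")
print("Above the 0.40 cut-score?", pooled > 0.40)
```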

Meta-analytic research typically follows some common steps.  These involve:

  • Identifying the program, strategy, or intervention to be studied
  • Completing a literature search of relevant research studies
  • Deciding on the selection criteria that will be used to include an individual study’s empirical results
  • Pulling out the relevant data from each study, and running the statistical analyses
  • Reporting and interpreting the meta-analytic results

As with all research, there are a number of subjective decisions embedded in meta-analytic research, and thus, there are good and bad meta-analytic studies.

Indeed, as emphasized throughout this three-part series, educational leaders cannot assume that “all research is good because it is published,” and they cannot assume that even “good” meta-analytic research is applicable to their communities, schools, staff, and students.

And so, educational leaders need to independently evaluate the results of any reported meta-analytic research—including research discussed by Hattie—before accepting the results.

Among the questions that leaders should ask when reviewing (or when told about the results from) meta-analytic studies are the following:

  1. Do the programs, strategies, or interventions chosen for investigation use similar implementation steps or protocols?

In many past Blogs, I have discussed the fact that the Positive Behavioral Interventions and Supports (PBIS) framework advocated by the U.S. Department of Education’s Office of Special Education Programs (and its funded national Technical Assistance centers) is a collection of different activities that, based on numerous program evaluations, different schools implement to different degrees (or not at all) and in different ways.

Given this, a meta-analysis of many separate PBIS research studies might conclude that the “PBIS framework contributes to student learning,” but the educational consumer has no idea which PBIS activities contributed to this result, nor how to functionally implement these different activities.

In addition to this issue, some researchers warn of an “agenda bias” that occurs when researchers choose specific areas to investigate based on wanting a personally- or politically-motivated conclusion.  Often, this bias tends to affect (or continue to bias) many other research-related procedures (see the other question areas below)—resulting in questionable or invalid results.

Next Question:

  2. Are the variables investigated by a meta-analytic study causally- or correlationally-related to student learning, and can they be taught to a parent, teacher, or administrator?

Educational leaders need to continually differentiate between research (including meta-analytic research) that reports causal factors and research that reports correlational factors.  Causal factors directly affect student learning, while correlational factors merely contribute to or predict student learning.

Similarly, they need to recognize that some meta-analytic results involve factors (e.g., poverty, race, the presence of a significant disability or home condition) that cannot be changed, taught, or modified. 

Moreover . . . once you read and understand Hattie’s functional definitions for the terms that he uses to summarize his meta-analyses, you realize that three out of six of his “Super Factors” cannot be changed by any teacher or classroom intervention (Teacher Estimates of Achievement, Self-Reported Grades, and Piagetian Levels).

Many of the approaches that Hattie rates as having the strongest effects on student learning contribute to (but do not cause) these student outcomes. 

For example, while effective classroom instruction and behavior management contribute to student learning, mastery, and proficiency, some students learn a great deal even when their teachers are not instructionally effective or their classrooms are not consistently well-managed.  Thus, these factors correlationally contribute to student learning, but they do not cause student learning.

This crucial point is not intended to invalidate Hattie’s meta-analytic results (or anyone else’s).  It simply is to say that educational leaders need to determine the “meaningfulness” of meta-analytic results, while also putting them into their “implementation context.”

Continuing on:  while some studies differentially and concurrently investigate multiple programs, strategies, or interventions, these individual studies are few and far between. 

Thus, educational leaders cannot simply take the “Top Ten” Hattie approaches on his list, implement them, and assume that student learning results will increase.  This is because most of these “Top Ten” approaches have never been studied together, and they might not be applicable to their students or instructional conditions.

Relatedly, educational leaders need to be wary of “Hattie consultants” who believe that they can synthesize all of the independent meta-analyses of different programs, strategies, or interventions conducted by Hattie into a meaningful implementation plan and process for their school or students.

Critically. . . Hattie has provided a useful “road map” to success. . . but remember, there are “many roads to Rome.”

Next Question:

  3. In conducting the literature review, did the researchers consider (and control for) the potential of a “publication bias”?

One of the realities of published research is that journals most often publish research that demonstrates significant effects.  Thus, a specific program or intervention may have ten published articles that showed a positive effect, and fifty other well-designed (but unpublished) studies that showed no effect or negative effects.  Because those unpublished studies are not available (or even known to the researcher), they will not be included in the meta-analysis.  And so, while the meta-analysis may show a positive effect for a specific program, this outcome may not reflect its “actual” neutral or negative impact.

There are research methods and tests (e.g., funnel plots, the Tandem Method, Egger’s regression test) to analyze the presence of publication bias, and to decrease the potential for false-positive conclusions.  These, however, are beyond the scope of this Blog.

Suffice it to say that some statisticians have suggested that 25% of meta-analyses in the psychological sciences may have inherent publication biases.
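Although a full treatment is indeed beyond the scope of this Blog, the intuition behind one of these checks, Egger’s regression test, can be sketched in a few lines.  The study values below are hypothetical, and a real analysis would use a statistics package and a formal significance test on the intercept; the sketch simply shows that when smaller, less precise studies report systematically larger effects, the regression intercept drifts away from zero, which is one warning sign of publication bias.

```python
# Minimal, illustrative sketch of Egger's regression test for funnel-plot asymmetry.
# All study values are hypothetical; a real analysis would also test whether the
# intercept differs significantly from zero.

studies = [
    # (effect size d, standard error of d) -- hypothetical
    (0.80, 0.30), (0.65, 0.25), (0.55, 0.20),
    (0.45, 0.15), (0.42, 0.12), (0.40, 0.10),
]

# Egger's approach: regress the standardized effect (d / SE) on precision (1 / SE).
x = [1 / se for _, se in studies]   # precision
y = [d / se for d, se in studies]   # standardized effect

n = len(studies)
mean_x, mean_y = sum(x) / n, sum(y) / n
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
        sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x

# An intercept far from zero suggests that small studies report systematically
# larger effects -- one warning sign of publication bias.
print(f"Egger intercept: {intercept:.2f}")
```

In this hypothetical data set, the smallest (least precise) studies report the largest effects, so the intercept comes out clearly above zero.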

What should educational leaders do? Beyond their own self-study of the meta-analytic research—including the individual research studies involved—that appears to support a specific program, strategy, or intervention, educational leaders need to:

  • Identify the short- and long-term “success indicators” of these programs specifically for their schools or with their students;
  • Conduct pilot tests before scaling up to whole-school or system-wide implementation;
  • Identify and use sensitive formative evaluation approaches that detect—as quickly as possible—programs that are not working; and
  • Maintain an “objective, data-driven perspective” regardless of how much they want the program to succeed.

In other words, educational leaders need to revalidate any selected program, strategy, or intervention when implemented with their schools, staff, and/or students—regardless of that program’s previous validation (which, once again, may be due to publication bias).

Next Question:

  4. What were the selection criteria used by the author of the meta-analysis to determine which individual studies would be included in the analysis, and were these criteria reliably and validly applied?

This is an important area of potential bias in meta-analytic research.  It occurs when researchers, whether consciously or not, choose biased selection criteria.  For example, they may favor large-participant studies over single-subject studies, or randomized controlled studies over qualitative studies.

This selection bias also occurs when researchers do not reliably and/or validly apply their own sound selection criteria.  That is, they may include certain studies that objectively don’t “qualify” for their analysis, or exclude other studies that meet all of the criteria.

Regardless, selection biases influence the individual studies included (or not included) in a meta-analysis, and this may skew the results.  Critically, the “skew” could be in any direction.  That is, the analysis might incorrectly result in negative, neutral, or positive results.

This issue is further compounded because Hattie included numerous meta-analyses conducted by other researchers in some of his own meta-analyses.  Thus, he might have pooled other authors’ selection biases with his own selection biases to create some “Super Biases” (for example) within his “Super Factors.”

I know that many educational leaders, at this point in our “conversation,” are probably wondering (maybe in frustration), “Why can’t I just ‘trust the experts’?” or “How do I do all of this?”

And I do feel your pain...

But the “short answer” to the first question (as noted in the earlier two Blogs in this series) is that “blind trust” may result in adopting a program that does not succeed; that wastes a great deal of time, training, money, and materials; and that undermines student success and staff confidence.

The “short answer” to the second question is that these questions should be posed to the researcher or the person who is advocating a “meta-analytically-proven” program.  Let them show you the studies and reveal the drilled-down data that is (presumably) at the foundation of their recommendation.

But. . . in addition. . . please recognize that many school districts have well-qualified professionals (either in-house, at a nearby university, in the community/region, or virtually on-line) with the research and analysis background to “vet and validate” programs, strategies, and interventions of interest. 

Use these resources. 

The “front-end” time spent carefully evaluating a program will virtually always save enormous amounts of “back-end” time when a poorly researched or poorly chosen program is actually implemented.

Next Question:

  5. Were the best statistical methods used in the meta-analysis?  Did one or two large-scale or large-effect studies outweigh the results of other small-scale, small-participant studies that also were included?  Did the researcher’s conclusions match the actual statistical results from the meta-analysis?

I’m not going to answer these questions in detail. . . as we’re now teetering on methodologically complex (but important) areas.  [If you want to discuss these with me privately, give me a call.]
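Without wading into that methodology, the second question above can at least be made concrete with a tiny, hypothetical illustration: under the inverse-variance weighting sketched earlier, a single very precise (large-sample) study can pull the pooled effect size almost entirely toward its own result.

```python
# Hypothetical illustration: one very precise (large-sample) study can dominate
# an inverse-variance-weighted pooled effect size.

small_studies = [(0.10, 0.25)] * 5   # five small studies, each with d = 0.10
large_study = [(0.70, 0.04)]         # one large study with d = 0.70

def pooled(studies):
    weights = [1 / (se ** 2) for _, se in studies]
    return sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)

print(f"Small studies only: {pooled(small_studies):.2f}")
print(f"With one large study added: {pooled(small_studies + large_study):.2f}")
```

In this hypothetical, five small studies that pool to 0.10 are overwhelmed by a single large study at 0.70, and the combined estimate lands above 0.60.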

My ultimate point here is that—as with any research study—we need to know that the meta-analytic research results and interpretations for any program, strategy, or intervention are sound.

As noted immediately above, educational leaders need to invest in “high-probability-of-success” programs.  Anything less is irresponsible.

But There’s More:  The Method is Missing

But. . . there IS more . . . even when the meta-analytic research is sound.

As alluded to above. . . just because we know that a program, strategy, or intervention significantly impacts student learning, we do not necessarily know the implementation steps used in the research studies that produced the significant effect . . . and we cannot assume that all or most of those studies used the same implementation steps.

To get to the point where we know exactly what implementation steps to replicate and functionally use in our schools and with our staff and students (to get the benefit of a particular effect), we (again) need to “research the research.”

Case in point.  Below are the approaches currently at the top of Hattie’s rankings—those with the strongest effects on student learning and achievement:

  1. Teacher estimates of achievement
  2. Collective teacher efficacy
  3. Self-Reported Grades
  4. Piagetian Programs
  5. Conceptual change programs
  6. Response to intervention
  7. Teacher credibility
  8. Micro teaching
  9. Cognitive task analysis
  10. Classroom discussion
  11. Interventions for LD
  12. Teacher clarity
  13. Reciprocal teaching
  14. Feedback
  15. Providing formative evaluations
  16. Acceleration
  17. Creativity programs
  18. Self-questioning
  19. Concept mapping
  20. Problem solving teaching
  21. Classroom behavior

After reviewing these. . . OK . . . I’ll admit it.  As a reasonably experienced school psychologist, I have no idea what the vast majority of these approaches are at a functional level. . . much less what implementation steps to recommend.

To begin to figure it out, I would first go back to Hattie, and look at a Glossary (for example, from Visible Learning for Teachers, 2012) that explains the research reflected in the effect sizes for the approaches he has rank-ordered.

[CLICK HERE]

Example 1:  Self-Reported Grades

One of Hattie’s Super Factors is “Self-Reported Grades.”  For this effect, the Glossary linked above provides the following information:

Self-reported grades are at the top of all influences. Children are the most accurate when predicting how they will perform. Hattie explains that if he could write his book Visible Learning for Teachers again, he would re-name this learning strategy, “Student Expectations” to express more clearly that this strategy involves the teacher finding out what are the student’s expectations, and pushing the learner to exceed these expectations. Once a student has performed at a level that is beyond their own expectations, he or she gains confidence in his or her learning ability. 
Example for Self-reported grades: 

Before an exam, ask your class to write down what mark the student expects to achieve. Use this information to engage the student to try to perform even better.

Hattie cites five meta-studies for this effect:

  1. Mabe/West (1982): Validity of self-evaluation of ability
  2. Falchikov/Boud (1989): Student Self-Assessment in Higher Education
  3. Ross (1998): Self-assessment in second language testing
  4. Falchikov/Goldfinch (2000): Student Peer Assessment in Higher Education
  5. Kuncel/Crede/Thomas (2005): The Validity of Self-Reported Grade Point Averages, Class Ranks, and Test Scores

As noted earlier in this Blog, and as defined here, a student’s Self-Reported Grades cannot be changed by having a classroom teacher “do an intervention.”  If students’ beliefs about their prospective grades are inaccurate, their teachers might be able to provide them with more data or feedback and, thus, change their accuracy.  But what happens if students accurately state that they are going to fail a test . . . and they do?

How will that change their motivation or proficiency in the future?

Or, what if students underestimate their grades on a test, and perform better than expected?  How will this necessarily improve these students’ motivation such that they master more material in the future?  Perhaps the underestimate and then the “better-than-expected grades” will lull these students into believing that they are “doing enough” to get good grades . . . they just didn’t realize it before?

Herein lies the danger. 

In order to use Hattie’s results, we need to know his definition of Self-Reported Grades, the research that was integrated into the meta-analysis, whether the variable can be externally influenced (e.g., through a teacher’s intercession or intervention), and then the explicit, scientifically-based methodology needed to effect the change.

None of these conditions are immediately or functionally apparent from a rank-ordered list of meta-analytic effect sizes.

And, there is no single consultant or “anointed” group of consultants who “hold the keys” to operationalizing Hattie’s statistics into student success.

But let’s take two more Hattie factors/approaches to further demonstrate that “The Method is Missing.”

Response to Intervention and Comprehensive Interventions for Learning Disabled Students

Response to Intervention, once again, is one of Hattie’s Super Factors. 

The Glossary defines Response to Intervention as “an educational approach that provides early, systematic assistance to children who are struggling in one or many areas of their learning. RTI seeks to prevent academic failure through early intervention and frequent progress measurement.”  In Visible Learning for Teachers, Hattie devotes one paragraph to Response to Intervention—citing seven generic “principles.”

Hattie’s meta-analysis of the research that he categorized as “Comprehensive Interventions for Learning Disabled Students” resulted in one of the top five effect sizes relative to impacting student learning and achievement.

In the cited Glossary, it was noted that:

The presence of learning disability can make learning to read, write, and do math especially challenging. Hattie admits that “it would be possible to have a whole book on the effects of various interventions for students with learning disabilities” (Hattie 2009), and he references a 1999 meta-study.

To improve achievement teachers must provide students with tools and strategies to organize themselves as well as new material; techniques to use while reading, writing, and doing math; and systematic steps to follow when working through a learning task or reflecting upon their own learning. Hattie also discusses studies that found that “all children benefited from strategy training; both those with and those without intellectual disabilities.”

Once again—for BOTH of these approaches, there is no specificity. Moreover, NO ONE reading Hattie’s books would have a clue as to where to begin the implementation process for either.

More specifically:  Response to Intervention is not a single, replicable intervention. 

Many different researchers have defined it, its components, its implementation processes, and its applicability (for example, to literacy, math, language arts, behavior) in many different ways.

And so. . . from Hattie’s research, one would conclude that this is a worthwhile area to research when students are academically struggling or presenting with challenging behavior.  But, one would have to analyze the specific research for their area of student concern.

More specifically:  Hattie describes “Comprehensive Interventions for Learning Disabled Students” in the plural.

And so. . . from Hattie’s research, which learning disabilities did his meta-analytic studies address?  What were the interventions?  At what age and level of severity did the interventions work with students?  And, how was “success” defined and measured?

As Hattie himself noted. . . he could write a book just in this area (and some esteemed educators have).

But once again, while it is important to know that some interventions for learning disabled students work, one would have to answer the questions immediately above, know the research-to-practice in a specific area of disability, and have the consultation skills to help teachers implement these interventions “in real time.”

Conclusions

I want to make it clear that this Blog is NOT questioning Hattie’s research in any way. 

Hattie has made many astounding contributions to our understanding of the research in areas that impact student learning and the school and schooling process.

However, consistent with the theme of the three Blogs in this series, I AM expressing concerns—and, hopefully, providing good guidance—as to how educational leaders need to analyze, understand, use, and make systems-level decisions based on school and psychoeducational research. . . research that varies in both quality and utility.

As noted numerous times across the three Blogs:  I fully understand how challenging it is for districts and schools to analyze the research related to the empirical efficacy of a specific program, strategy, or intervention.  I also recognize—as a practitioner who works in the schools—their limited time and more limited resources. 

And I agree that districts and schools should be able to trust the “national experts”—from their national associations, to their departments of education, to their published journals—in this regard.

But testimonials do not qualify as research, and—unfortunately—some “research” is published in the absence of impartiality.

We need to be careful.

Districts and schools need to selectively do their own due diligence. . . or at least consult with professionals who can provide objective, independent evaluations of the curricula, programs, or interventions being considered for student, staff, and school implementation. 

Hopefully, the narrative in these three Blogs will provide educational leaders with the information they need and the questions that need to be asked. . . providing an assist in the due diligence process.

In the end, schools and districts should not invest time, money, professional development, supervision, or other resources in programs that have not been fully validated for use with their students and/or staff. 

Such investments are not fair to anyone—especially when they become counterproductive by (a) not delivering the needed results, (b) leaving students further behind, and/or (c) creating staff resistance to “the next program”—which might, parenthetically, be the “right” program.

I hope that this discussion has been useful to you.

As always, I look forward to your comments. . . whether on-line or via e-mail.

I hope that your school year continues to be successful.  We are still thinking about those in the greater Houston area and across Florida. . . and now, in Puerto Rico and across the Caribbean.

If I can help you in any of the areas discussed during this Blog series, I am always happy to provide a free one-hour consultation conference call to help you clarify your needs and directions on behalf of your students, staff/colleagues, school(s), and district.

Best,

Howie