Would you use this data to rate a teacher’s performance?
The New York Times recently released “value-added” data for 18,000 elementary school teachers in New York City, joining the Los Angeles Times, which released data for LA teachers last year. But unlike in LA, the New York Times made the underlying data itself available, allowing analysts to dig in and check whether the performance assessments make sense. There’s been a raging argument about whether this “value-added” metric successfully measures teacher performance. And this data, when you actually break it down, shows that the metric doesn’t do the job.
Gary Rubinstein has been analyzing the data at his site, and he’s found some incredible results. First, he found almost no correlation between a teacher’s rating in one year and the same teacher’s rating in the next. A teacher can easily be judged effective one year and ineffective the next. The average change in performance was a relatively large 25 points, and the changes did not fit the commonly accepted belief that teachers improve over time, particularly between their first and second years.
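For readers who want to try this kind of check themselves, here is a minimal sketch of the two numbers involved: the correlation between one year's ratings and the next, and the average size of the change. The ratings below are invented for illustration, and the function names `pearson` and `mean_abs_change` are my own; none of this is Rubinstein's code or the actual New York City data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_abs_change(xs, ys):
    """Average absolute difference between paired ratings."""
    return sum(abs(y - x) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical ratings for six made-up teachers in two consecutive years.
year1 = [10, 35, 50, 65, 80, 90]
year2 = [60, 20, 85, 30, 55, 70]

r = pearson(year1, year2)
change = mean_abs_change(year1, year2)
print(f"year-over-year correlation: {r:.2f}")
print(f"average change in rating: {change:.1f} points")
```

If the ratings measured something stable about a teacher, the correlation would sit close to 1 and the average change would be small. A correlation near 0 alongside a large average change is the pattern described above.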
But there’s more. Rubinstein further found that individual teachers showed a wide degree of variance in performance across different classes in the same year. The same kind of variance showed up for teachers teaching the same subject to different grades in the same year. The scatterplot above is a representative example. As Rubinstein writes:
Rather than report about these obvious ways to check how invalid these metrics are and how shameful it is that these scores have already been used in tenure decisions, or about how a similarly flawed formula will be used in the future to determine who to fire or who to give a bonus to, newspapers are treating these scores like they are meaningful. The New York Post searched for the teacher with the lowest score and wrote an article about ‘the worst teacher in the city’ with her picture attached. The New York Times must have felt they were taking the high-road when they did a similar thing but, instead, found the ‘best’ teachers based on these ratings.
I hope that these two experiments I ran, particularly the second one where many teachers got drastically different results teaching different grades of the same subject, will bring to life the realities of these horrible formulas. Though error rates have been reported, the absurdity of these results should help everyone understand that we need to spread the word since calculations like these will soon be used in nearly every state.
Since this appeared last week, others have picked up on the data. And Rubinstein had a follow-up yesterday showing that, according to the value-added data, teachers in charter schools are providing about the same value as teachers in all other public schools. Rubinstein writes that “the high correlation in this plot reveals that the primary factor in predicting the scores for a group of students in one year score is the scores of those same students in the previous year score.”
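To see what “primary factor in predicting” means in practice, here is a rough sketch that fits a straight line from last year's scores to this year's and reports how much of the variation the line explains (R squared). The class averages are invented, and `fit_line` and `r_squared` are hypothetical helper names, not anything taken from Rubinstein's analysis.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical average scores for five made-up classes.
previous_year = [2.1, 2.6, 3.0, 3.4, 3.9]
current_year = [2.2, 2.5, 3.1, 3.3, 4.0]

slope, intercept = fit_line(previous_year, current_year)
r2 = r_squared(previous_year, current_year, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.2f}")
```

When R squared comes out close to 1, last year's scores already account for nearly all of this year's, which leaves very little variation for any teacher effect to explain.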
Lots of decisions in education are being made using this type of data. It’s used to single out “bad teachers” and argue for pay-for-performance models. Yet the data, to use a technical term, is dogshit. All it does is humiliate teachers and reinforce a flawed theory that “fixing schools” can be accomplished by throwing so-called “bad teachers” out of work.