Tuesday, September 7, 2010
Teacher Evaluation
Formula to Grade Teachers’ Skill Gains Acceptance, and Critics
By SAM DILLON
How good is one teacher compared with another?
A growing number of school districts have adopted a system called value-added modeling to
answer that question, provoking battles from Washington to Los Angeles — with some
saying it is an effective method for increasing teacher accountability, and others arguing
that it can give an inaccurate picture of teachers’ work.
The system calculates the value teachers add to their students’ achievement, based on
changes in test scores from year to year and how the students perform compared with
others in their grade.
People who analyze the data, making a few statistical assumptions, can produce a list
ranking teachers from best to worst.
Use of value-added modeling is exploding nationwide. Hundreds of school systems,
including those in Chicago, New York and Washington, are already using it to measure the
performance of schools or teachers. Many more are expected to join them, partly because
the Obama administration has prodded states and districts to develop more effective
teacher-evaluation systems than traditional classroom observation by administrators.
Though the value-added method is often used to help educators improve their classroom
teaching, it has also been a factor in deciding who receives bonuses, how much they are and
even who gets fired.
Michelle A. Rhee, the schools chancellor in Washington, fired about 25 teachers this
summer after they rated poorly in evaluations based in part on a value-added analysis of
scores.
And 6,000 elementary school teachers in Los Angeles have found themselves under scrutiny
this summer after The Los Angeles Times published a series of articles about their
performance, including a searchable database on its Web site that rates them from least
effective to most effective. The teachers’ union has protested, urging a boycott of the paper.
Education Secretary Arne Duncan weighed in to support the newspaper’s work, calling it an
exercise in healthy transparency. In a speech last week, though, he qualified that support,
noting that he had never released to news media similar information on teachers when he
was the Chicago schools superintendent.
“There are real issues and competing priorities and values that we must work through
together — balancing transparency, privacy, fairness and respect for teachers,” Mr. Duncan
said. On The Los Angeles Times’s publication of the teacher data, he added, “I don’t
advocate that approach for other districts.”
A report released this month by several education researchers warned that the value-added
methodology can be unreliable.
“If these teachers were measured in a different year, or a different model were used, the
rankings might bounce around quite a bit,” said Edward Haertel, a Stanford professor who
was a co-author of the report. “People are going to treat these scores as if they were
reflections on the effectiveness of the teachers without any appreciation of how unstable
they are.”
Other experts disagree.
William L. Sanders, a senior research manager for a North Carolina company, SAS, that
does value-added estimates for districts in North Carolina, Tennessee and other states, said
that “if you use rigorous, robust methods and surround them with safeguards, you can
reliably distinguish highly effective teachers from average teachers and from ineffective
teachers.”
Dr. Sanders helped develop value-added methods to evaluate teachers in Tennessee in the
1990s. Their use spread after the 2002 No Child Left Behind law required states to test
students in grades 3 through 8 every year, giving school districts mountains of test data that are the
raw material for value-added analysis.
In value-added modeling, researchers use students’ scores on state tests administered at the
end of third grade, for instance, to predict how they are likely to score on state tests at the
end of fourth grade.
A student whose third-grade scores were higher than 60 percent of peers statewide is
predicted to score higher than 60 percent of fourth graders a year later.
If, when actually taking the state tests at the end of fourth grade, the student scores higher
than 70 percent of fourth graders, the leap in achievement represents the value the
fourth-grade teacher added.
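In rough terms, that calculation can be sketched in a few lines of code. The sketch below is a toy illustration only; the student records, the teacher names and the simple rule that a student's predicted percentile equals last year's percentile are invented assumptions, not any district's actual model.

```python
# Toy sketch of the percentile-based value-added idea described above.
# All records and the naive prediction rule are invented for illustration.
from collections import defaultdict

# (teacher, 3rd-grade percentile, 4th-grade percentile) -- hypothetical
records = [
    ("Teacher A", 60, 70),  # beat 60% of peers, then 70%: value added
    ("Teacher A", 50, 55),
    ("Teacher B", 80, 78),
    ("Teacher B", 40, 42),
]

gains = defaultdict(list)
for teacher, prior_pct, actual_pct in records:
    predicted_pct = prior_pct  # naive rule: same percentile as last year
    gains[teacher].append(actual_pct - predicted_pct)

# A teacher's "value added" here is the average surplus over prediction,
# which yields the best-to-worst ranking the article mentions.
value_added = {t: sum(g) / len(g) for t, g in gains.items()}
for teacher, va in sorted(value_added.items(), key=lambda kv: -kv[1]):
    print(f"{teacher}: {va:+.1f} percentile points")
```

Real systems replace the naive one-to-one prediction with statistical models fitted to statewide data, which is where the assumptions critics worry about enter.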
Even critics acknowledge that the method can be more accurate for rating schools than the
system now required by federal law, which compares test scores of succeeding classes, for
instance this year’s fifth graders with last year’s fifth graders.
But when the method is used to evaluate individual teachers, many factors can lead to
inaccuracies. Different people crunching the numbers can get different results, said Douglas
N. Harris, an education professor at the University of Wisconsin, Madison. For example,
two analysts might rank teachers in a district differently if one analyst took into account
certain student characteristics, like which students were eligible for free lunch, and the
other did not.
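That sensitivity is easy to demonstrate with a hedged toy example. Every number below, including the statewide average gains, is invented for illustration; no real model, district or dataset is implied.

```python
# Toy illustration: the same scores rank teachers differently depending on
# whether the analyst adjusts for free-lunch eligibility. Invented numbers.
from collections import defaultdict

# (teacher, student eligible for free lunch?, gain in test score)
students = [
    ("Teacher 1", True, 4), ("Teacher 1", True, 5), ("Teacher 1", True, 4),
    ("Teacher 2", False, 7), ("Teacher 2", False, 7), ("Teacher 2", False, 6),
]

# Hypothetical statewide average gains by free-lunch status (assumed).
statewide_mean_gain = {True: 2.0, False: 8.0}

raw, adjusted = defaultdict(list), defaultdict(list)
for teacher, free_lunch, gain in students:
    raw[teacher].append(gain)
    # The second analyst credits only the gain beyond what similar
    # students achieve statewide.
    adjusted[teacher].append(gain - statewide_mean_gain[free_lunch])

mean = lambda xs: sum(xs) / len(xs)
print(sorted(raw, key=lambda t: -mean(raw[t])))            # ['Teacher 2', 'Teacher 1']
print(sorted(adjusted, key=lambda t: -mean(adjusted[t])))  # ['Teacher 1', 'Teacher 2']
```

Neither analyst is computing anything incorrectly; they are answering different questions, which is exactly why two rankings of the same district can disagree.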
Millions of students change classes or schools each year, so teachers can end up evaluated
on the performance of students whose records were linked to them in the fall but whom
they taught only briefly.
In many schools, students receive instruction from multiple teachers, or from after-school
tutors, making it difficult to attribute learning gains to a specific instructor. Another
problem is known as the ceiling effect. Advanced students can score so highly one year that
standardized state tests are not sensitive enough to measure their learning gains a year
later.
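A hedged toy calculation shows how the ceiling works; the maximum score and the students below are invented.

```python
# Toy illustration of the ceiling effect: a test capped at 100 cannot
# register most of a high achiever's real growth. Invented numbers.
TEST_MAX = 100

def measured(score):
    """The score the test can actually report, truncated at the ceiling."""
    return min(score, TEST_MAX)

# (student, true ability last year, true ability this year) -- hypothetical
for name, before, after in [("average student", 60, 75),
                            ("advanced student", 95, 110)]:
    gain = measured(after) - measured(before)
    print(f"{name}: true growth {after - before}, measured gain {gain}")
# average student: true growth 15, measured gain 15
# advanced student: true growth 15, measured gain 5
```

Both students learn the same amount, but the test only has room to show a third of the advanced student's growth, so a value-added model sees a weak gain.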
In Houston, a district that uses value-added methods to allocate teacher bonuses, Darilyn
Krieger said she had seen the ceiling effect as a physics teacher at Carnegie Vanguard High
School.
“My kids come in at a very high level of competence,” Ms. Krieger said.
After she teaches them for a year, most score highly on a state science test but show little
gain, so her bonus is often small compared with those of other teachers, she said.
The Houston Chronicle reports teacher bonuses each year in a database, and readers view
the size of the bonus as an indicator of teacher effectiveness, Ms. Krieger said.
“I have students in class ask me why I didn’t earn a higher bonus,” Ms. Krieger said. “I say:
‘Because the system decided I wasn’t doing a good enough job. But the system is flawed.’ ”
This year, the federal Department of Education’s own research arm warned in a study that
value-added estimates “are subject to a considerable degree of random error.”
And last October, the Board on Testing and Assessments of the National Academies, a panel
of 13 researchers led by Dr. Haertel, wrote to Mr. Duncan warning of “significant concerns”
that the Race to the Top grant competition was placing “too much emphasis on measures of
growth in student achievement that have not yet been adequately studied for the purposes
of evaluating teachers and principals.”
“Value-added methodologies should be used only after careful consideration of their
appropriateness for the data that are available, and if used, should be subjected to rigorous
evaluation,” the panel wrote. “At present, the best use of VAM techniques is in closely
studied pilot projects.”
Despite those warnings, the Department of Education made states with laws prohibiting
linkages between student data and teachers ineligible to compete in Race to the Top, and it
designed its scoring system to reward states that use value-added calculations in teacher
evaluations.
“I’m uncomfortable with how fast a number of states are moving to develop teacher-evaluation
systems that will make important decisions about teachers based on value-added
results,” said Robert L. Linn, a testing expert who is an emeritus professor at the University
of Colorado, Boulder.
“They haven’t taken caution into account as much as they need to,” Professor Linn said.
Tuesday, August 17, 2010
Standardized testing
According to a memo from the New York State Education Department (NYSED) dated July 28, 2010 (http://www.oms.nysed.gov/press/Grade3-8_Results07282010.html), “cut scores for the state’s 2010 Grade 3-8 assessments in Math and English tests were set according to new Proficiency standards redefined to align them with college-ready performance”. My first thoughts went to the many speakers I heard over the last decade who referenced educational testing and standards. How many times did we hear that the “new standards” were set so that students would be prepared for college (http://www.eagleforum.org/educate/2001/june01/standards.shtml)? How odd, then, this statement from the memo: “As a result of raising the bar for what it means to be proficient, many fewer students met or exceeded the new Mathematics and English Proficiency standards in 2010 than in previous years”. Wasn’t the bar supposed to be set there?
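The mechanics the memo describes are simple enough to demonstrate. In the hedged sketch below, the score distribution and both cut points are invented, not NYSED's actual values; the point is only that holding scores fixed and raising the cut score must lower the proficiency rate.

```python
# Toy demonstration: identical scores yield a lower proficiency rate once
# the cut score is raised. Distribution and cut points are invented.
scores = [58, 61, 64, 65, 67, 70, 72, 75, 81, 88]

def proficiency_rate(scores, cut_score):
    """Fraction of students scoring at or above the proficiency cut."""
    return sum(s >= cut_score for s in scores) / len(scores)

print(proficiency_rate(scores, 65))  # old cut: 0.7 -> 70% "proficient"
print(proficiency_rate(scores, 75))  # new cut: 0.3 -> 30% "proficient"
```

Same students, same answers, different bar.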
There are at least two logical inferences to draw from this last statement. First inference: the bar can be moved. Doesn’t that empty the word “standard” of any meaning? Do not the standards set the bar? Isn’t moving the bar considered a deceitful activity, a lie? Second inference: many students in past years were not really as proficient as advertised. Let me quote again from the memo: “We are doing a great disservice when we say that a child is proficient when that child is not. Nowhere is this more true than among our students who are most in need. There, the failure to drill down and develop accurate assessments creates a burden that falls disproportionately on English Language Learners, students with disabilities, African-American and Hispanic young people and students in economically disadvantaged districts.” But doesn’t the accuracy of the assessments in this instance rest on where the bar is placed? Doesn’t “failure to develop accurate assessments” really mean that NYSED failed to put the bar in the right place? Is this educational language an attempt to deflect that responsibility?
More lies are implicit in Chancellor Merryl H. Tisch’s statements: “The Regents and I believe these results can be a powerful tool for change. They clearly identify where we need to do more and provide real accountability to bring about the focused attention needed to implement the necessary reforms to help all of our children catch up and succeed”. If the movement of the bar created the lower proficiency rates, then how could knowing the results be a powerful tool for change? How does knowing that we are not doing as well as we thought help us “clearly identify” anything, let alone “implement necessary reforms”? The results of these tests simply raise the question: what are the necessary reforms? That is a question I believe the Outcome-based Education debate has been addressing for the better part of a decade.