Value-added assessment is a new way of analyzing test data that can measure teaching and learning. Based on a review of students’ test score gains from previous grades, researchers can predict the amount of growth those students are likely to make in a given year. Thus, value-added assessment can show whether particular students – those taking a certain Algebra class, say – have made the expected amount of progress, have made less progress than expected, or have been stretched beyond what they could reasonably be expected to achieve.
– The Center for Greater Philadelphia
Professor Andrew Ho came and spoke to my school reform class tonight about the idea of value added and its space in the conversation on American education.
We started looking at a scatterplot of local restaurants situated by their Zagat rating and the Zagat average price per meal.
Ho then plotted a regression line through the scatterplot and took note of one restaurant that had a higher score than predicted for its cost.
The temptation was to claim our overachieving restaurant was a good buy for the money. Who’d expect a restaurant with such inexpensive food to have such a high rating?
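Ho's "higher score than predicted" is just the residual from a simple linear regression. Here is a minimal sketch of that idea; the prices and ratings are invented for illustration, not real Zagat data:

```python
# Toy illustration: an "overachieving" restaurant is one with a positive
# residual from a line fit to rating vs. price. Data below is invented.
prices = [15, 20, 25, 30, 40, 55, 70]   # average price per meal ($)
ratings = [14, 18, 17, 21, 22, 25, 27]  # Zagat-style rating (0-30 scale)

n = len(prices)
mean_x = sum(prices) / n
mean_y = sum(ratings) / n

# Ordinary least squares slope and intercept, computed by hand.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(prices, ratings)) \
        / sum((x - mean_x) ** 2 for x in prices)
intercept = mean_y - slope * mean_x

# Residual = actual rating minus the rating the line predicts at that price.
residuals = [y - (intercept + slope * x) for x, y in zip(prices, ratings)]

best_buy = max(range(n), key=lambda i: residuals[i])
print(f"Restaurant {best_buy} beats its predicted rating "
      f"by {residuals[best_buy]:.1f} points")
```

The "good buy" is whichever point sits farthest above the line. Everything the line doesn't account for, including portions, ambiance, and service, lands in that residual.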
Then he asked us what we didn’t see.
Portions, ambiance, quality, location, service, selection, etc.
Any of these is familiar to someone who’s debated with a group of friends when attempting to select a restaurant.
His point was simple: expectations change based on what you base them on.
Ho relabeled the axes – this year’s test results, previous year’s test results.
He asked us what we didn’t see.
Content, delivery, socioeconomic status, race, home life, sports, after-school activities, tutoring, mentoring, etc.
This is to say nothing of the fact that perhaps there is a natural spread to knowledge and growth that is beyond the influence of a teacher or the fact that different combinations of teachers in the life of a student in a given year could have varying effects on achievement.
A psychometrician, statistician, and policy researcher, Ho then laid some data on us from the research on value added:
- Estimates of value added are unstable across models, courses a teacher might teach, and years.
- Across different value-added models, teacher effectiveness ratings differ by at least 1 decile for 56%-80% of teachers and by at least 3 deciles for 0%-14% of teachers (this is reassuring).
- Across courses taught, between 39% and 54% of teachers differ by at least 3 deciles.
- Across years, between 19% and 41% of teachers differ by at least 3 deciles.
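The mechanism behind that instability is easy to simulate. In the toy sketch below, every teacher has one fixed "true" effect, and two years of estimates differ only by noise; the noise level is an assumption chosen for illustration, not calibrated to the studies Ho cited:

```python
# Toy simulation: two noisy estimates of the same fixed teacher effect,
# compared by decile rank. The noise level (noise_sd) is an assumption
# for illustration, not a calibrated value from the cited research.
import random

random.seed(0)
n_teachers = 1000
true_effect = [random.gauss(0, 1) for _ in range(n_teachers)]

def noisy_deciles(effects, noise_sd):
    """Add measurement noise, then assign each teacher a decile 0-9."""
    est = [e + random.gauss(0, noise_sd) for e in effects]
    order = sorted(range(len(est)), key=lambda i: est[i])
    decile = [0] * len(est)
    for rank, i in enumerate(order):
        decile[i] = rank * 10 // len(est)
    return decile

# Signal and noise of equal size, i.e. a reliability of about 0.5.
year1 = noisy_deciles(true_effect, noise_sd=1.0)
year2 = noisy_deciles(true_effect, noise_sd=1.0)

diff = [abs(a - b) for a, b in zip(year1, year2)]
print("moved >= 1 decile: ", sum(d >= 1 for d in diff) / n_teachers)
print("moved >= 3 deciles:", sum(d >= 3 for d in diff) / n_teachers)
```

Even though nothing about the teachers changed between "years," a large share of them land in a different decile the second time, purely from noise.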
He then made a point that’s come up time and again in my statistics course, “Any test measures, at best, a representative sample of the target domain.”
But we’re not seeing samples that are representative. According to Ho, “In practice, it is an unrepresentative sample that skews heavily toward the quickly and cheaply measurable.” We’re not learning about the population. Put differently, we can’t know all that we want to know. Anyone who says differently is selling something.
When questioned on teacher assessment in his recent Twitter Town Hall, Sec. Duncan said he favored multiple forms of assessment in gauging teacher effectiveness. Nominally, Ho explained, this makes sense, but in effect it can have unintended negative consequences.
Here too, Ho cautioned against the current trend. Yes, value added is often used in concert with observation data or other similar measures. If those observations are counted as “meets expectations” or “does not meet expectations” and all teachers meet expectations, though, we have a problem. The effect is to mute the impact of this measure in the composite. While it may be nominally weighted at 50%, if value added is the only aspect of the composite accounting for variance, “the contribution of these measures is usually much higher than reported, as teacher effectiveness ratings discriminate much better (effective weights) than other ratings.”
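A small numeric sketch makes the nominal-versus-effective-weight point concrete. The scores below are invented, and for simplicity the sketch compares the components' variance shares while ignoring any covariance between them:

```python
# Sketch of nominal vs. effective weight: if nearly every teacher "meets
# expectations" on observations, the observation component contributes
# almost no variance, so value added drives the composite despite a
# nominal 50/50 split. Scores are invented; covariance is ignored.
import statistics

value_added = [0.9, 0.3, 0.6, 0.1, 0.8, 0.4]   # varies a lot across teachers
observation = [1.0, 1.0, 1.0, 1.0, 1.0, 0.9]   # nearly uniform "meets expectations"

# Nominal weighting: 50% value added, 50% observation.
composite = [0.5 * v + 0.5 * o for v, o in zip(value_added, observation)]

var_v = statistics.pvariance([0.5 * v for v in value_added])
var_o = statistics.pvariance([0.5 * o for o in observation])

# Effective weight: each component's share of the composite's variance.
effective = var_v / (var_v + var_o)
print(f"effective weight of value added: {effective:.2f}")
```

Under these numbers the "50%" value-added component accounts for nearly all of the variance in the composite, which is exactly the muting effect Ho described.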
Ho’s stated goal was to demystify value added. In that he succeeded.
He left us with his two concerns:
- The current incentive structures are so obviously flawed, and the mechanisms for detecting and discouraging unintended responses to incentives are not in place.
- The simplifying assumptions driving “value added,” including a dramatic overconfidence about the scope of defensible applications of educational tests (“numbers is numbers!”), will lead to a slippery slope toward less and less defensible accountability models.
I’d hate to think we’re more comprehensive in selecting restaurants than in assessing teachers.