Davis' Failed Test

by Howard Fienberg
September 23, 2002

California's ambitious and costly school testing system, the Academic Performance Index (API), appears severely flawed. The regime under which improving schools and their teachers receive cash awards fails to include the test scores of one out of every five students in the state. According to an investigative series from the Orange County Register, that omission results in an average 20-point margin of error in schools' test scores. Some California schools that got awards may not have actually earned them, while other schools were unfairly left wanting.


Governor Gray Davis pushed legislation for the API through the state legislature shortly after taking office in January 1999. The Public Schools Accountability Act was signed into law in April 1999, and the governor appointed a committee to sort out the details of creating the testing system. Schools get an API score out of a possible 1,000 points, based on the Stanford 9 standardized test. Score improvements are rewarded with cash. But it seems that some details were never really worked out - the margin of error was never discussed, except in passing at a September meeting that year.


The OC Register's investigation reveals that various loopholes mean that about 828,000 of the 4.5 million second through 11th graders statewide are dropped from the API. Not only are some students not taking the exams, but many others who do take them have their scores removed afterwards. California claims that 98-99 percent of all its students are tested, but it appears that, on average, only 82 percent of students are counted in the end. It also appears that small schools are more likely to win awards, since their scores swing more easily under the API system.


All testing systems have some error, but where is all this error in the API coming from? State lawmakers decided not to count the scores of students new to a school district. There may have been merit to this position, since it is not very fair to judge a school based on children it has not yet had a chance to educate. Disabled students were granted special accommodations, like extra time, a common practice in most testing. But the scores of disabled and special-education students were frequently excluded from the API, and parents could sign waivers to excuse their children from it entirely. In their defense, lawmakers and state officials told the OC Register that there was no way they could have known the number of students who would be excluded when they were designing the API.


Under the API, if schools meet a certain target score, they qualify for awards. However, some schools that appear to have scored too low to qualify may actually have made their target score, while others that qualified for awards may have missed theirs. The large margin of error means that, if a school's score falls within the error band around a target score, we have no way of knowing for certain whether it hit the target.
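That ambiguity can be sketched in a few lines of code. This is a minimal illustration, not the state's actual scoring method: it simply assumes the 20-point average margin of error reported by the Register, and a hypothetical target score of 700.

```python
# Illustrative sketch: with a +/- 20-point margin of error, an award
# decision is only trustworthy when a school's score is farther from
# the target than the error band. Target and scores are hypothetical.

def award_is_certain(score: int, target: int, margin: int = 20) -> bool:
    """Return True only if the margin of error cannot flip the decision."""
    return abs(score - target) > margin

# A school 5 points over its target may really be under it (and vice versa):
print(award_is_certain(705, 700))  # False: inside the error band, too close to call
print(award_is_certain(730, 700))  # True: clearly above the target even after error
```

Any school landing within 20 points of its target, on either side, falls in that "too close to call" zone - which is exactly the problem the Register identified.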


The Complications of 'Grouping'


The API awards system was not only designed to reward improvements in overall scores, but also scores in each major ethnic and racial group, as well as among the poorest students. The OC Register found the unintended consequence of the 'grouping' system was favoritism for the least diverse schools when it came to getting awards. The paper compares it to the difficulty of winning consecutive coin tosses. "The fewer groups, the fewer coin tosses a school has to win." About 58 percent of schools statewide with only one major ethnic/racial group won awards in 2001, compared to almost 29 percent of schools with four or more groups. As a result, mostly white schools received an average of $21 per student, while the most diverse schools nabbed an average of only $9 per student.
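The coin-toss arithmetic behind that disparity is easy to verify. The sketch below assumes, purely for illustration, that each demographic group independently hits its target with the same probability; the 75-percent figure is an assumption, not a number from the article.

```python
# Illustrative sketch of the Register's coin-toss analogy: if every major
# ethnic/racial group must hit its target for the school to win an award,
# more groups means more "coin tosses" to win. The per-group probability
# p is a hypothetical value, not the API's actual odds.

def chance_all_groups_hit(p: float, groups: int) -> float:
    """Probability that all `groups` independent groups hit their targets."""
    return p ** groups

for k in (1, 2, 4):
    print(f"{k} group(s): {chance_all_groups_hit(0.75, k):.0%}")
# With p = 0.75: one group wins 75% of the time, four groups only about 32%.
```

Even with identical per-group performance, the school with four groups wins far less often - roughly the 58-versus-29-percent gap the Register observed statewide.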


Such disparities in outcome would not be quite so disturbing were it not for the API's margin of error. The complexity of the state's 'grouping' system creates extra problems on top of the regular error. Error rates grow as the API measures smaller and smaller sub-groups, inevitably obscuring real gains and losses. While schools with fewer than 100 students don't receive API awards because of the unreliability of their scores, the OC Register points out that the scores of sub-groups can include as few as 30 students.
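Why small groups are so much noisier follows from basic sampling arithmetic. Under standard statistical assumptions (not the API's published methodology, which the state never disclosed), the error in a group's average score shrinks with the square root of the group's size; the 150-point spread used below is a hypothetical figure for illustration.

```python
# Illustrative sketch: the standard error of a mean over n students falls
# with sqrt(n), so a 30-student sub-group is far less reliable than a
# whole school. The per-student score spread (150) is an assumption.
import math

def standard_error(score_sd: float, n: int) -> float:
    """Standard error of the mean score over n students."""
    return score_sd / math.sqrt(n)

print(round(standard_error(150, 900), 1))  # whole school of 900: 5.0 points
print(round(standard_error(150, 30), 1))   # 30-student sub-group: 27.4 points
```

A 30-student group's average can swing by dozens of points from ordinary chance alone - more than enough to manufacture or erase an apparent "improvement."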


What Now?


Stanford University researchers, commissioned by the state to look for ways to improve the API, concluded that so many students were left out that the scores were unreliable. "I just don't think (the API) is accurate," Margaret Raymond, a co-author of the Stanford study, told the OC Register. "It's not an accounting of what they are doing with all the students in the school."


California officials call the system a work in progress. The API should be more reliable this year, as a federal education push will include more students in the counting. Further, any dispute over who gets or misses out on awards is not currently an issue - California's budget deficit has led to a temporary suspension of cash awards.


State officials told the OC Register they didn't disclose the API's error rate for three years because it would have been too confusing to the public. What is more confusing is how California could have seemingly ignored the problem while doling out $744 million to schools based on score improvements with minimal statistical significance. Measuring educational performance is a worthwhile goal, but accuracy and method should not be ignored in the process.
