
STANDARDIZED TESTS: TERMS & DEFINITIONS
Standardized test: a test in which the questions, format, instructions, scoring and reporting of scores are the same for all test takers. The procedures for creating, administering and analyzing such tests are standardized as well.
High-stakes test: a test that determines a critically important decision in a child's education, e.g. promotion or graduation. The Mayor of New York made the 3rd grade reading and math tests “high-stakes tests” by tying promotion to the test score. Regents exams in New York State are “high-stakes” because failing any one of five exams will prevent a student from earning a high school diploma.
Norm: sometimes referred to as the 50th percentile. The norm divides test takers into two groups - those above and those below the 50th percentile. Though the term is often used differently, the norm in test scores merely denotes the middle of the distribution of scores. Once it is determined, all other scores are described in reference to this norm -- hence “norm-referenced.” Establishing national norms in this way has become controversial because by definition half of all people who take the test must score below the norm.
Reliability: in testing, reliability is a measure of consistency. If a group of people took a test on two different occasions, they should get nearly the same scores both times. If a test is not reliable, it has to be abandoned. How do we know if a test is reliable? Test makers hope that the correlation between two administrations will reach .85 or higher. Of course, testing the same people twice is often inconvenient, and timing is a concern. So, often one group of questions is correlated with another group of questions on the same test.
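The correlation check described above can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the scores are hypothetical, and the Pearson coefficient is assumed here as the measure of correlation, which the definition above does not specify.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six students on two occasions (not real data).
first_sitting = [52, 61, 70, 75, 83, 90]
second_sitting = [55, 58, 72, 74, 85, 88]

r = pearson_r(first_sitting, second_sitting)
meets_target = r >= 0.85  # the .85 threshold mentioned in the definition
print(f"correlation between sittings: {r:.2f}; meets .85 target: {meets_target}")
```

The same function could be applied to two groups of questions on a single test, by passing each student's total on one group of questions and on the other.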
Validity: a test can be reliable without being valid. Validity defines how accurately a test really measures what its promoters claim it measures. Some call this "meaningfulness." Are tests always "meaningful" in the same way? No, there are different types of validity:
- predictive validity (also called criterion-related validity) measures how well a test predicts future school performance. For example, though it is generally assumed that performance on state standardized exit tests indicates how well students will fare in college, there is much evidence to the contrary.
- content validity measures how well the test covers the subject content being tested.
- consequential validity refers to the inferences that are made from test results. Some experts have expressed concerns about the consequential validity of NY State's Regents exams in Global and US History, English, Science, and Math. They believe that a student who scores well on a Regents test may not necessarily be well-prepared to pursue that subject at the college level, and vice versa.
Obviously, we should care a lot about how valid tests are. A standardized reading test may measure how successfully a child can answer the set of questions on that specific test, but may tell us very little about whether the child enjoys using the skill tested. Such an omission prevents us from knowing how well the child will continue to develop the skill.
There are different kinds of test scores:
- Norm-referenced scores:
these compare a student's test performance to the performance of a clearly defined reference group called a "norming group." The scores of the norming group are used to devise test norms -- below normal, normal, and above normal performance. The tests should have been normed on a population similar to the one taking the test.
- Criterion-referenced scores:
these scores say something about how the person tested performed relative to an absolute performance standard determined by the test-maker. Many criterion-referenced tests would be better referred to as "content-referenced."
- Cut score:
the minimum level a test-taker must attain in order to "pass" a given exam. Just where to put that level can determine whether children are labeled passing or failing. In the June 2003 Math A Regents, the cut score was set so high that nearly 70 percent of New York State students failed the Math A test. In January 2004, the cut score for the Math A Regents exam allowed 80-90 percent of the students to pass.
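The difference between a norm-referenced score and a cut score can be sketched as follows. The function names, the norming group, and the class scores are all hypothetical, and the percentile rank is computed here as the share of the norming group scoring strictly below the student, which is one common convention among several.

```python
def percentile_rank(score, norming_scores):
    """Percent of the norming group scoring strictly below `score`."""
    below = sum(1 for s in norming_scores if s < score)
    return 100.0 * below / len(norming_scores)

def pass_rate(scores, cut_score):
    """Percent of test takers scoring at or above the cut score."""
    passed = sum(1 for s in scores if s >= cut_score)
    return 100.0 * passed / len(scores)

# Hypothetical norming group and class; not real test data.
norming_group = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
class_scores = [48, 52, 57, 63, 66, 71, 78, 84]

# Norm-referenced: where does a score of 66 fall relative to the norming group?
rank_of_66 = percentile_rank(66, norming_group)

# Cut score: the same class, judged against two different passing levels.
rate_low_cut = pass_rate(class_scores, 55)
rate_high_cut = pass_rate(class_scores, 70)

print(rank_of_66, rate_low_cut, rate_high_cut)
```

With these made-up numbers, moving the cut score from 55 to 70 halves the share of the class that passes, even though no student's performance has changed.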
Measurement Error: one way in which reliability or lack of reliability of a test is indicated; all tests have measurement error.
Scaling: sometimes called "grading on a curve," scaling assigns scores received on standardized tests so they fit the classic "bell-shaped" curve used in statistics. Scaling ensures that there are equal numbers of bottom and top grades, with most people in the middle. Making the grading criteria and the objectives being assessed known in advance eliminates the issue of "scaling." Everyone can pass.
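A rough sketch of grading on a curve follows. The quota percentages (10/20/40/20/10) and the function name `grade_on_curve` are assumptions chosen only to mirror the symmetric shape described above; actual scaling procedures vary and are usually more elaborate.

```python
def grade_on_curve(scores, quotas=(("F", 10), ("D", 20), ("C", 40), ("B", 20), ("A", 10))):
    """Assign letter grades by rank, filling each grade's quota from the bottom up.

    `quotas` lists (grade, percent of test takers) from lowest to highest grade.
    Tied scores are split by their original order, which is an arbitrary choice.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, lowest score first
    grades = [None] * n
    cumulative_percent = 0
    start = 0
    for letter, percent in quotas:
        cumulative_percent += percent
        end = round(cumulative_percent * n / 100)
        for i in order[start:end]:
            grades[i] = letter
        start = end
    return grades

# Hypothetical class in which every student scored 88 or higher.
scores = [91, 95, 88, 99, 93, 97, 90, 94, 96, 92]
grades = grade_on_curve(scores)
print(list(zip(scores, grades)))
```

Under these assumed quotas the student who scored 88 still receives the bottom grade, because the grade depends on rank rather than on a standard known in advance.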
Sampling: a way to get information about a group by examining only some members of the group or by giving all members only small parts of the whole test.
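The first kind of sampling mentioned above, examining only some members of the group, might look like this in Python. The population of scores, the sample size, and the function name are hypothetical, and a simple random sample is assumed.

```python
import random

def estimate_group_mean(scores, sample_size, seed=0):
    """Estimate a group's mean score from a simple random sample of its members."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    sample = rng.sample(scores, sample_size)
    return sum(sample) / len(sample), sample

# Hypothetical group of 60 students with scores from 40 to 99.
all_scores = list(range(40, 100))
estimate, sample = estimate_group_mean(all_scores, sample_size=12)
true_mean = sum(all_scores) / len(all_scores)

print(f"estimate from 12 students: {estimate:.1f}; mean of all 60: {true_mean:.1f}")
```

The estimate describes the group, not any individual: students outside the sample receive no score at all.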
Performance Assessments, also known as Authentic Assessment or Alternative Assessment: a complex approach to assessment that uses direct measures of learning – essay writing, research projects, term papers – rather than test-driven indicators of learning. An oral defense is frequently part of the performance assessment.