Test item reliability indicates how consistent the results produced by the items on a test are. Consistency can refer to the items’ stability over time or to the consistency of the items with each other. If an item is unreliable, observed statistical relationships will appear weaker than they really are, and inappropriate conclusions may be drawn regarding the relationships between variables.
A measurement of reliability consists of the extent to which an observed score (which is the true score plus or minus error) accurately reflects the true score. Returning to the example in this week’s Introduction, if your true weight were 150 pounds and you stepped on the scale hundreds of times, it would sometimes show 149, sometimes 152, and sometimes 151. If you averaged all of those weights, you would come close to your true score. If you looked at how much the weights varied, you would have a good measure of the scale’s error. The situation is similar with a psychological test—a score on an IQ test represents an estimate of the theoretical “true” IQ; however, that observed score also includes error.
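The scale example can be sketched as a short simulation. This is only an illustration under stated assumptions: the function name simulate_weighings, the normally distributed error, the error spread of 1.5 pounds, and the 500 readings are all hypothetical choices, not values from the course materials.

```python
import random
import statistics


def simulate_weighings(true_weight, error_sd, n_readings, seed=0):
    """Return observed scores: the true score plus random error.

    Assumes (for illustration only) that the scale's error is
    normally distributed around zero with spread error_sd.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_weight + rng.gauss(0, error_sd) for _ in range(n_readings)]


# Hypothetical values: a true weight of 150 pounds, weighed 500 times.
readings = simulate_weighings(true_weight=150.0, error_sd=1.5, n_readings=500)

# Averaging the observed scores approximates the true score.
mean_reading = statistics.mean(readings)

# The variation among the observed scores estimates the scale's error.
spread = statistics.stdev(readings)

print(f"mean of readings: {mean_reading:.2f}")
print(f"spread of readings: {spread:.2f}")
```

With many readings, the mean should land near 150 and the spread near the assumed error of 1.5, which mirrors the point that an observed score is the true score plus or minus error.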
Researchers or test developers measure a test’s reliability with a reliability coefficient, generally a positive correlation coefficient that is somewhat less than 1.00. (A correlation of 1.00 would indicate perfect correlation, which is theoretically impossible due to inherent error in measurement.) Acceptable reliability coefficients for psychological tests or test items are generally at least .70. If you know a test’s reliability and the standard deviation of its scores, you can calculate its margin of error, a “plus or minus” band that indicates an interval likely to contain the true score.
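One common way to compute such a band is the standard error of measurement, which is the standard deviation of the test scores multiplied by the square root of one minus the reliability. The sketch below assumes that approach; the function names, the reliability of .90, and the 95% multiplier of 1.96 are illustrative assumptions, and the standard deviation of 15 is the conventional IQ scale rather than a value from the readings. Centering the band on the observed score is a simplification.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def score_band(observed, score_sd, reliability, z=1.96):
    """Return a plus-or-minus band around an observed score.

    z=1.96 gives an approximate 95% band (an assumption made here).
    """
    sem = standard_error_of_measurement(score_sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical example: an IQ-style test with SD 15 and reliability .90.
sem = standard_error_of_measurement(score_sd=15.0, reliability=0.90)
low, high = score_band(observed=100.0, score_sd=15.0, reliability=0.90)

print(f"SEM: {sem:.2f}")
print(f"band around 100: {low:.1f} to {high:.1f}")
```

Lowering the reliability in this sketch widens the band, which is one way to see why an unreliable item or test says less about the true score.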
For this week’s Discussion, think of a specific testing scenario in an organization. Then consider a reliable test item for that testing scenario and an unreliable item for that same testing scenario. Consider how you might know if these items are reliable or unreliable.
With these thoughts in mind:
Post by Day 4
1. Describe briefly (in one paragraph) a situation in which an organization would use a test (for example, selection, executive coaching, or a training situation).
2. Then, describe:
- A reliable test item
- An unreliable test item
3. Include at least four (4) academic citations (see the expectations announcement) drawn from the course readings.
NOTE – Part 2 of the discussion post is NOT easy. An understanding of what makes an item reliable is needed in order to write a good and a bad item. The readings do not provide easy examples, but instead convey information that will, hopefully, help you understand such items.
An alternative source that might help you understand bad (unreliable) items is:
But use what is gleaned from this (or other sites) to help understand the readings, not as a source cited in the discussion post.