Statistical Validity 2017-08-03T02:40:20+00:00


What about statistical validity and reliability?

All assessments used by the capabilities group have detailed technical manuals, created by psychometricians, which support the assessments' validity and reliability.

Put simply, an assessment must not only measure what it says it is measuring (validity); it must also provide consistent results (reliability).


Reliability is generally measured in three ways:

Stability or Test-Retest – The assessment produces similar results if taken again.
Alternate Form – Varying the assessment slightly produces similar results on both forms of the assessment.
Internal Consistency – One portion of the assessment will produce a result similar to another portion of the assessment.

Reliability is stated as a correlation between two sets of scores, such as the scores of Test 1 and Test 2. The values for reliability coefficients range from 0 to 1.0. If the reliability of a standardized test is above .80, it is said to have very good reliability; if it is below .50, it would not be considered a very reliable test.
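To make the correlation concrete, here is a minimal Python sketch of a test-retest check. It assumes the Pearson correlation is the coefficient in use; the scores are invented for illustration, and the names pearson_r and describe_reliability are hypothetical. The text does not name the band between .50 and .80, so the middle label is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)


def describe_reliability(r):
    """Apply the thresholds quoted in the text (.80 and .50)."""
    if r > 0.80:
        return "very good"
    if r < 0.50:
        return "not very reliable"
    return "intermediate"  # assumption: the text gives no label for this band


# Invented scores for six people who took the same assessment twice.
first_sitting = [72, 85, 90, 64, 78, 88]
second_sitting = [70, 88, 91, 60, 80, 85]

r = pearson_r(first_sitting, second_sitting)
print(f"test-retest r = {r:.2f} ({describe_reliability(r)})")
```

The same calculation applies to alternate-form reliability by passing the scores from the two forms instead of the two sittings.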


Even if a test is reliable, it still may not provide a valid measure. Validity refers to the accuracy of an assessment: whether or not it measures what it is supposed to measure. Validity is even more crucial than reliability.

There are three ways in which validity can be measured. They give us confidence that a test is valid and, therefore, that the inferences we make based on the test scores are valid.

Content –
Content validity is the degree to which the assessment covers the topic: whether it measures the full range of what it says it is measuring. This is more easily measured where there is a need to reflect knowledge of a particular topic area or job skill.

Criterion –
Criterion (or concrete) validity is the extent to which the assessment is related to concrete criteria in the “real” world. It comes in two forms: concurrent validity measures the extent to which scores relate to criterion data collected at around the same time, whereas predictive validity measures the extent to which an assessment can predict future performance or behavior.
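As a rough illustration of the two forms of criterion validity, the sketch below correlates assessment scores with a criterion measured at about the same time (concurrent) and with the same criterion measured later (predictive). All data are invented, the supervisor ratings are a hypothetical criterion, and using a Pearson correlation as the validity coefficient is an assumption rather than a method stated in the text.

```python
from math import sqrt


def validity_coefficient(test_scores, criterion_scores):
    """Correlation between assessment scores and a real-world criterion."""
    n = len(test_scores)
    if n != len(criterion_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_t = sum(test_scores) / n
    mean_c = sum(criterion_scores) / n
    cov = sum((t - mean_t) * (c - mean_c)
              for t, c in zip(test_scores, criterion_scores))
    ss_t = sum((t - mean_t) ** 2 for t in test_scores)
    ss_c = sum((c - mean_c) ** 2 for c in criterion_scores)
    if ss_t == 0 or ss_c == 0:
        raise ValueError("correlation is undefined when all values are identical")
    return cov / sqrt(ss_t * ss_c)


# Invented data for six employees (hypothetical criterion: supervisor ratings).
assessment_scores = [55, 62, 70, 48, 81, 66]
ratings_now = [3.1, 3.4, 3.9, 2.8, 4.5, 3.6]    # collected alongside the assessment
ratings_later = [3.0, 3.8, 3.7, 3.1, 4.4, 3.3]  # collected at a later date

concurrent = validity_coefficient(assessment_scores, ratings_now)
predictive = validity_coefficient(assessment_scores, ratings_later)
print(f"concurrent validity: {concurrent:.2f}")
print(f"predictive validity: {predictive:.2f}")
```

The only difference between the two coefficients is when the criterion data were collected; the calculation itself is identical.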

Construct –
Construct validity involves statistical analysis of the items on the assessment, including the relationships between responses to different test items. The intent is to determine whether the assessment is measuring what it says it is measuring. This is often done by comparing items to other measures that are considered valid.

Contact us to speak to our experts about validity and reliability!