Neuropsychological assessment supplements the standard neurological examination by characterizing more precisely the type and extent of the cognitive and behavioral deficits found on bedside testing. To accomplish this goal, neuropsychological tests must be standardized, valid, and reliable, and must allow meaningful comparison of patient-derived data with normative performance.
Standardization of neuropsychological testing procedures allows performance to be compared across time and across individuals. Interpreting neuropsychological tests requires consistent assessment methods, so the tests are standardized with respect to several elements:

- the testing environment
- the presentation of instructions
- the presentation of test stimuli
- the manner of seeking clarification of a patient's response
- scoring procedures
- interpretive methods

Standard testing methods give every patient an equal opportunity to perform the required task.
Another important aspect of neuropsychological assessment is the use of reliable and valid tests. Reliability refers to the degree to which a given test score reflects differences in the behavior of interest rather than error. Measures of reliability include the degree to which a test provides consistent scores across different time points (test-retest reliability); across different examiners (inter-rater reliability); and across alternate forms of the test (alternate-form reliability). They also include the degree to which the test is internally consistent, for example when scores on two halves of the test are compared (split-half reliability). A valid test is one that measures what it purports to measure. Validity is not a single property, and evidence for it is gathered in several ways. Measures of validity include the degree to which the test items cover the relevant domain of interest but not other domains (content validity); the degree to which the test agrees with either a gold standard for the domain of interest or another test designed to measure the same domain (concurrent validity); the degree to which a test does not agree with another test designed to measure a different domain (discriminant validity); and the degree to which a test measures the theoretical construct under study, such as intelligence (construct validity).
The relationship between reliability and validity is statistically constrained: a test's validity coefficient cannot exceed the square root of its reliability coefficient. Thus, a test can be highly reliable but not valid, but it can never be valid without being reliable. One way to increase the reliability of a given test is to use standardized testing methods, as discussed above. If the manner in which a test is administered varies across examiners or testing sessions, extraneous error is introduced, and this error lowers both reliability and validity.
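The constraint between reliability and validity can be made concrete. Under classical test theory, the correlation between a test x and any criterion y satisfies r_xy ≤ sqrt(r_xx · r_yy). If the criterion is perfectly reliable, the validity coefficient therefore cannot exceed the square root of the test's reliability. The helper `max_validity` below is a hypothetical illustration, not part of any published scoring software.

```python
import math

def max_validity(reliability_test, reliability_criterion=1.0):
    """Upper bound on a validity coefficient under classical test theory.

    r_xy <= sqrt(r_xx * r_yy); with a perfectly reliable criterion (r_yy = 1)
    this reduces to sqrt(r_xx).
    """
    for r in (reliability_test, reliability_criterion):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability coefficients must lie in [0, 1]")
    return math.sqrt(reliability_test * reliability_criterion)

# How the ceiling on validity falls as reliability drops
for r_xx in (0.95, 0.81, 0.64, 0.36):
    print(f"reliability {r_xx:.2f} -> validity at most {max_validity(r_xx):.2f}")
```

For example, a test with a reliability of .64 can correlate at most .80 with any criterion, however well its content matches the domain of interest.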
Whether an individual patient's performance can be compared meaningfully with normative expectations depends on how test performance is interpreted. Neuropsychological assessments are usually requested to identify and quantify some form of cognitive impairment, so neuropsychological tests are designed to detect deficits in performance. To assess impairment, however, one must know what normal performance looks like. Test results therefore need a referent for comparison. That referent is either the individual's own earlier performance or a normative comparison group.
Individual comparison of test performance is the most obvious method for identifying deficits. In an individual comparison, a patient's previous test data serve as the baseline for evaluating present performance. Suppose, for example, that a patient performed in the above-average range on a test of verbal memory before developing herpes simplex encephalitis, but in the below-average range on the same test after the illness. One could then conclude that verbal memory had declined. In practice this situation rarely arises, because neuropsychological data are usually not available from the period before the disease or trauma. Even when such data are available, a decline in scores may have other explanations. For example, if either examination was scored incorrectly, the resulting score will be artificially inflated or deflated. Again, standardization of testing methods and concerted efforts to eliminate testing bias help to reduce such errors.
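One common way to ask whether a retest decline exceeds what measurement error alone would produce is the Jacobson-Truax reliable change index (RCI). The RCI is not described in this text. The minimal sketch below assumes that the test's normative standard deviation and its test-retest reliability are known. The scores and parameters in the example are hypothetical. The basic RCI does not correct for practice effects or regression to the mean, so it addresses only one of the alternative explanations for a score change.

```python
import math

def reliable_change_index(score_before, score_after, sd, retest_reliability):
    """Jacobson-Truax reliable change index for two administrations of one test.

    sd: standard deviation of the test in its normative sample
    retest_reliability: test-retest reliability coefficient (0-1)
    """
    sem = sd * math.sqrt(1.0 - retest_reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0 * sem ** 2)               # standard error of the difference
    return (score_after - score_before) / s_diff

def is_reliable_change(rci, critical=1.96):
    """|RCI| > 1.96 corresponds to p < .05, two-tailed."""
    return abs(rci) > critical

# Hypothetical verbal memory scores on a scale with SD 15 and retest reliability .90
rci = reliable_change_index(score_before=115, score_after=85, sd=15, retest_reliability=0.90)
print(f"RCI = {rci:.2f}, reliable change: {is_reliable_change(rci)}")
```

A negative RCI beyond the critical value indicates a decline larger than the test's measurement error would be expected to produce by chance.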
A second referent for interpreting neuropsychological test performance is population-based comparison. This is the most frequently used method in neuropsychological assessment. An individual patient's test scores are compared with norms that describe how a standardization sample performed on the measure of interest, and the individual's score can then be ranked relative to that normative sample. A norm is constructed by administering the measure to a large number of individuals, called the standardization sample. The standardization sample should be representative of the population of interest. Thus, to develop norms for adult performance on a measure of general intelligence, the standardization sample must include individuals from all relevant age ranges, both genders, and all educational strata. The norms provided for the Wechsler Adult Intelligence Scale-Revised (WAIS-R) are an example of such a procedure. The test's authors gathered a random sample of 1880 adults from four geographic regions of the United States. The sample was stratified for age, gender, ethnic background, occupation, education, and urban versus rural residence according to U.S. census data. The test was administered to these individuals, providing a range of performance measures. Thus, the standardization sample is reflective of adults residing in the United States at the time of testing, and the WAIS-R norms are reflective of this population.

Figure 27.2.4-1 The normal distribution is approached when many scores on a given test are collected from a random sample. This distribution is symmetrical: its measures of central tendency (mean, median, mode) coincide, and the spread of scores (sd = standard deviation) is equal on either side of the mean. The percentages of cases falling within each standard deviation are shown between the dashed lines. The ranking of any single score may be expressed in standard deviation units, z-score units, or as a percentile rank.
Random sampling, stratified for pertinent variables, is required for the development of most norms. Only a representative sample can be expected to reproduce the distribution of scores in the population. For many abilities that population distribution is approximately normal, and as the size of a random sample increases, the distribution of its scores more closely approaches it. The characteristics of a normal distribution are particularly well suited to norm development (Fig. 27.2.4-1). In a normal distribution the measures of central tendency (mean, median, mode) coincide, and the dispersion of scores (standard deviation) on either side of the mean is symmetrical. The normal distribution is often called a bell-shaped curve because it peaks in the middle and trails off toward either tail. These characteristics allow any score to be ranked relative to average performance. Approximately 68 percent of all cases (or WAIS-R scores in our example) fall within one standard deviation of the mean, and approximately 95 percent fall within two standard deviations. This property allows an individual's score to be compared meaningfully with the standardization sample.

An individual's performance can be expressed in three ways:

- as the number of standard deviations from the mean of the standardization sample
- as a z-score, z = (x - M) / SD, where x is the individual's score and M and SD are the mean and standard deviation of the standardization sample
- as a percentile rank within the standardization sample

Most neuropsychological assessment reports use one of these methods to rank an individual's performance.
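The conversions described above can be sketched in a few lines. The sketch assumes that the norms are summarized by a mean and standard deviation and that scores are normally distributed. It uses Python's `statistics.NormalDist`. The example adopts the familiar IQ metric (mean 100, SD 15), and the patient's score of 85 is hypothetical.

```python
from statistics import NormalDist

def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the normative mean, in SD units."""
    return (raw - norm_mean) / norm_sd

def percentile_rank(raw, norm_mean, norm_sd):
    """Percent of an (assumed normal) standardization sample scoring at or below raw."""
    return 100.0 * NormalDist(norm_mean, norm_sd).cdf(raw)

# Share of a normal distribution lying within 1 and 2 SD of the mean
std_normal = NormalDist()  # mean 0, SD 1
within_1sd = std_normal.cdf(1) - std_normal.cdf(-1)
within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)

# Hypothetical patient score on a scale with normative mean 100 and SD 15
z = z_score(85, norm_mean=100, norm_sd=15)
pct = percentile_rank(85, norm_mean=100, norm_sd=15)
print(f"within 1 SD: {within_1sd:.1%}, within 2 SD: {within_2sd:.1%}")
print(f"z = {z:.2f}, percentile rank = {pct:.1f}")
```

All three expressions carry the same information. A score one standard deviation below the mean has z = -1 and falls at roughly the 16th percentile. The percentile conversion is only as accurate as the assumption that the normative scores are normally distributed.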
Comparing an individual's performance with these norms provides a meaningful yardstick for identifying strengths and weaknesses. If someone performs above the mean of the standardization sample, that performance is above average for the population. Such a comparison scheme is very helpful in tracking patients' abilities and disabilities. It does, however, raise the question of whether below-average performance really equals impaired performance. Because scores in the standardization sample are assumed to be normally distributed, the probability of obtaining any given score or lower can be calculated. In most clinical neuropsychological assessments, performance is defined as impaired when the probability of obtaining a score that low or lower in the standardization sample is less than 5 percent. This threshold corresponds to the conventional significance level used in most statistical analyses.
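The 5 percent criterion can be written as a one-tailed test against the normal distribution of the norms. In the sketch below, the function names are hypothetical, and the cutoff is computed rather than taken from any particular test manual. It assumes that lower scores mean worse performance. For measures where higher scores are worse, such as completion time, the tail would have to be reversed.

```python
from statistics import NormalDist

def impairment_cutoff_z(alpha=0.05):
    """z score below which only `alpha` of a normal standardization sample falls."""
    return NormalDist().inv_cdf(alpha)

def is_impaired(raw, norm_mean, norm_sd, alpha=0.05):
    """True if a score this low or lower has probability < alpha in the norms.

    Assumes lower scores indicate worse performance.
    """
    p_this_low = NormalDist(norm_mean, norm_sd).cdf(raw)
    return p_this_low < alpha

print(f"cutoff: z = {impairment_cutoff_z():.3f}")
# Hypothetical scores on a scale with normative mean 100 and SD 15
for score in (80, 76, 75):
    print(score, is_impaired(score, norm_mean=100, norm_sd=15))
```

With alpha = .05, the cutoff falls about 1.645 standard deviations below the mean. On this hypothetical scale, a score of 75 is classified as impaired but 76 is not. This shows how sharply a fixed probability criterion divides scores that are clinically very similar.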
Because the standardization sample is used to interpret individual test performance, the adequacy of that sample is an important consideration in all neuropsychological assessments. Even with well-normed tests such as the WAIS-R, care must be taken when comparing specific individuals with the standardization sample. For example, the WAIS-R standardization sample adequately represents individuals up to the age of 75 but provides no normative information for individuals older than 75. Thus, if the performance of a 90-year-old patient on the WAIS-R is compared with this standardization sample, the results may not accurately reflect how that patient ranks among 90-year-old persons.
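One way to guard against the problem just described is to look norms up by age band and to refuse to score patients whose age falls outside the sampled range. In the minimal sketch below, `AgeBandNorm`, the function names, and all the numbers in the norm table are hypothetical placeholders. They are not actual WAIS-R norms. Only the upper age limit of 75 follows the example in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgeBandNorm:
    """Normative mean and SD for one age band of a standardization sample."""
    min_age: int
    max_age: int
    mean: float
    sd: float

# Hypothetical norm table with placeholder values (NOT actual WAIS-R norms).
# As in the example in the text, nobody older than 75 was sampled.
NORMS = [
    AgeBandNorm(16, 34, 50.0, 10.0),
    AgeBandNorm(35, 54, 48.0, 10.0),
    AgeBandNorm(55, 75, 44.0, 11.0),
]

def find_norm(age, norms=NORMS):
    """Return the age band covering `age`; refuse to extrapolate past the norms."""
    for band in norms:
        if band.min_age <= age <= band.max_age:
            return band
    lowest = min(b.min_age for b in norms)
    highest = max(b.max_age for b in norms)
    raise ValueError(
        f"age {age} is outside the normed range {lowest}-{highest}; "
        "no normative comparison group is available"
    )

def age_corrected_z(raw, age, norms=NORMS):
    """z score of `raw` relative to the patient's own age band."""
    band = find_norm(age, norms)
    return (raw - band.mean) / band.sd

print(age_corrected_z(38.0, age=70))
try:
    age_corrected_z(38.0, age=90)
except ValueError as err:
    print(err)
```

Raising an error instead of silently using the oldest available band makes the gap in the norms visible to whoever interprets the score. Silently substituting the oldest band is exactly the kind of comparison the text cautions against.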
Many other tests used in neuropsychological assessment have much more limited standardization samples and may therefore be subject to erroneous interpretation. The interpretation of many neuropsychological tests is based on standardization samples of fewer than 100 scores. Heaton and colleagues have attempted to generate better norms for many commonly used neuropsychological tests, and their efforts have improved interpretive methods.
However, even this effort suffers from a lack of generalizability to the populations of interest. This is because the standardization samples they used were not randomly selected but rather were composed of volunteers from limited geographic regions. Therefore, it is important for the consumer of neuropsychological assessment results to be aware of how the interpretations are made and the norms used in those interpretations.