Interview with Kathleen T. Williams, Ph.D., NCSP, Author of the Expressive Vocabulary Test, and VP for the Office of Academic Initiatives and Test Development at the College Board

Schreiber: Hi Kathleen. Thanks for taking the time this morning to talk with me about the Expressive Vocabulary Test. Let's tell the readers what the Expressive Vocabulary Test is and what it will measure.

Williams: The test [the Expressive Vocabulary Test] was designed to give the examinee a chance to talk and to demonstrate the size of his or her vocabulary knowledge by speaking. It was designed as the co-normed measure to the third edition of the Peabody Picture Vocabulary Test. That's how it was created. And I had the opportunity to author the first version and now the revision. The PPVT-4, which is widely used, is easy to give because the examiner says a word and all the examinee has to do is point. That has been and continues to be a very popular test with young children or individuals with disabilities (physical disabilities or other handicapping conditions) because it's very nonthreatening. The respondent can even indicate his or her response with an eye blink. But it samples only the vocabulary that's presented by the examiner. With the Expressive Vocabulary Test, the examinee is presented with pictures and situations along with the stimulus question, and the examinee has the freedom to tell the examiner, from his or her own experiences and background, what his or her label for that concept would be. So what I like to tell people is that it's like getting the other half of the picture. With the PPVT, you get the receptive half: a sampling of how much vocabulary an individual understands. With the EVT, you get an understanding of how much vocabulary the individual can actually use and what kinds of concepts he or she knows.

Schreiber: It is a test that's administered individually, and the age range is similar to the PPVT?

Williams: Exactly the same. And the original, and now the revision, are both normed on the same population as the PPVT. Co-norming filled a real unmet need, because if a professional wanted the receptive/expressive picture, he or she was forced to use two tests that were normed on different populations. Psychometrically, that's an invalid approach. I don't think you have to have a lot of measurement knowledge to understand why that's an invalid comparison of scores. You'd have populations that were sampled at different points in time. Say I sampled people in 1980 with one test and came up with normative scores, and then I sampled people in 1990 with another test. Especially in the area of vocabulary, that matters: vocabulary changes, and the usage of a word changes. "Vehicle," for example, was not as common in 1980 as it was in 1990. So if you compare two tests normed on different populations at different points in time, you don't have to get very deep into psychometric principles to understand qualitatively why it's not an appropriate approach. So we had this unmet need.

Schreiber: So that was the original EVT and that was published in about 1997, correct?

Williams: Correct.

Schreiber: So now you've revised the EVT, and the EVT-2 is the result. Give us a glimpse of the revision and some of its key features.

Williams: Well, revising a vocabulary test approximately every 10 years is important because vocabulary usage changes. How common words are to people at any age changes. So there were items that needed to be changed and illustrations that needed to be updated. We had also learned a lot in the approximately 7 or 8 years since the EVT had been out. In the original, we used both labeling and synonym items because we wanted to get at a wider range of vocabulary. We also wanted to make it more focused, and that's why we came up with the synonym item type, where the examiner says a word and the examinee says another word that means the same thing and goes with the picture context. That way we could get at not just nouns, but verbs, adjectives, and adverbs. In the revision, we expanded that even more by asking a very specific stimulus question with each item. In that way, we got at numbers, colors, and what I like to call life-skill words: words that are important for functioning in an environment. For example, it would be very important to know that an adult with disabilities understands what a "banister" or a "railing" is, in case he or she is given instructions to use the railing in a stairwell.

And we enhanced the test with words that individuals would encounter only in classroom settings. In fact, one of the features of the new editions of both the PPVT-4 and the EVT-2 is the set of descriptive analyses that I developed. There are four different descriptive analyses that people can use; one, for example, categorizes words as words at home and words at school. We can go into that a little more if we have time, but that's what we did with the new EVT-2.

One of the things we found out, and one of the questions I got asked over and over again when I'd go out and do workshops on vocabulary development and language, is that people would come up to me and say, "I think there's something wrong with your test. I got a higher score on the EVT-2 than the PPVT-4, and that's just impossible. It's easier to hear something and just point than it is to speak." Well, that's not exactly true. When you're sitting in a classroom listening to a lecture, you're the listener, and in any situation the listener is not in control. The listener is not in control of the syntax the speaker uses and not in control of the vocabulary the speaker selects. So somebody can be in a receptive situation and not perform as well as when we let him or her talk. By letting the individual (the child) talk, he or she can demonstrate: "OK, you've shown me a picture of a couch on the PPVT-4 and you said, 'Show me sofa.' I've never heard that word, but if you show me that same picture on the EVT-2 and say, 'What is this?', I can tell you it's a davenport, or a couch, or whatever the word is that I've learned in my context. I understand the concept and I have a label for it, but it's not the one you picked for me when you gave me the receptive test."

When you see a student who scores higher on the EVT-2 than on the PPVT-4, that's valid; that certainly can be valid. But you need to spend some time exploring why. One of the interesting scenarios I've seen is individuals from a somewhat different culture, not the common middle-class culture, scoring higher on the EVT-2, in the average range. Well, what does that mean? It certainly doesn't mean that they're language deprived. It just means that their language is different. It's not the same set of labels you tested them on in the PPVT-4, but the EVT-2 has allowed them to speak and demonstrate their vocabulary knowledge. Well, guess what? You find out they have a pretty good lexicon. It may be culturally different, but the labels are appropriate labels.

Another example, and let's be really dramatic: you give the PPVT-4 and the standard score is 75, and then you give the EVT-2 and the standard score is 100. OK, here's a child who performs better and engages in the task better when you let him speak. This is a good thing to learn about the child. The other thing you need to recognize is that this child has conceptual knowledge. Teaching a child that there's more than one label for something is so much easier if he or she already has a label for it, that is, already has the concept. So if you see a child like that, you say, great, this child may need some intervention, but it's simply to learn multiple labels for things.

The other thing that can happen is that a child can score a lot higher, way above average, on the EVT-2. And you ask yourself: if the child does better when he's allowed to speak, is the reason he's having difficulty in the classroom that the classroom is so receptive? When I was working in the schools (I worked in the schools both as a speech-language pathologist and a school psychologist, and I also have a license in elementary education), I found that the main reasons kids get referred are that they don't listen, they don't follow directions, and they can't copy from the board. You could take those three referral reasons and just make a checklist; you know yourself, that's what teachers say. Well, here you took this little kid and in just a few minutes (the PPVT-4 takes less than 10 minutes to give, the EVT-2 maybe around 10 to 15 minutes), you find out that this kid has some really good verbal skills if you let him speak.

Schreiber: That's a very interesting concept.

Williams: It is very interesting and it's very informative. Most importantly, we don't want a child to get mislabeled. We don't want to under-identify with any of these brief measures. The EVT-2 and the PPVT-4 are both very brief measures. They're not giving you the whole picture. They're giving a nice, valid, reliable snapshot of somebody's vocabulary. But if you're really going to look at their language, you want to use something that delves into different language skills.

In doing the revision, we saw that there were a lot of reasons you could get really helpful information by giving both the PPVT-4 and the EVT-2. So one of the things we did was to expand the item pool greatly, and we came up with two parallel testing forms for both the PPVT-4 and the EVT-2. We foresaw that people would want to do a lot of retesting and tracking of progress, receptively and expressively. We've added a growth scale value for monitoring progress. And because we have a bigger item pool and expanded item types, there's a descriptive analysis: you can now look at the words as home or school words, or by part of speech. You can also do an analysis using Beck's three-tier approach, where you divide the words into tiers.

Schreiber: And there's better balance for diversity, isn't there?

Williams: Right.

Schreiber: You've talked about the atypical profile you would get with a child who scores high on the EVT-2 and lower receptively, but it's more typical to have the reverse. And when you have that profile, you could be dealing with a word-retrieval problem.

Williams: Exactly. That's usually the first thing people think about: is this a retrieval problem? And they begin looking for that higher-receptive, lower-expressive profile. But psychometrically it comes out to be 50/50: about half of examinees will score higher on one test, and about half on the other.
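One way to see why the split tends toward 50/50: when both tests are co-normed on the same population and share the same score scale, the difference between the two scores is, under simple assumptions, symmetric around zero. The Python sketch below simulates this, assuming bivariate-normal standard scores (mean 100, SD 15) and a hypothetical correlation of 0.8 between the tests. Neither the distribution nor the correlation comes from the interview or the test manual; they are illustrative only.

```python
import math
import random


def share_expressive_higher(n: int = 100_000, r: float = 0.8, seed: int = 1) -> float:
    """Simulate co-normed standard scores (mean 100, SD 15) for n examinees.

    Hypothetical assumptions: scores are bivariate normal and the
    receptive/expressive correlation is r. Returns the share of simulated
    examinees whose expressive (EVT) score exceeds their receptive (PPVT) score.
    """
    rng = random.Random(seed)
    higher = 0
    for _ in range(n):
        z_receptive = rng.gauss(0.0, 1.0)
        # Build an expressive z-score correlated r with the receptive one.
        z_expressive = r * z_receptive + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        ppvt = 100 + 15 * z_receptive
        evt = 100 + 15 * z_expressive
        if evt > ppvt:
            higher += 1
    return higher / n


share = share_expressive_higher()
print(f"EVT higher: {share:.3f}, PPVT higher: {1 - share:.3f}")
```

Changing `r` changes how large the typical discrepancy is, but not the roughly even split, because the simulated score difference stays centered on zero.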

Schreiber: Well, that's logical. And is this discussed in the User Manual?

Williams: Yes, how to do the comparison is discussed in the manual, along with some of the reasons for it.

Schreiber: Now, the sample size for your normative data is a lot larger than in the past. And you matched it to current U.S. Census data.

Williams: Very closely! Amazingly close! That's AGS/Pearson's good work. And one of the reasons they went for such a large sample was to have grade-based norms as well as age-based norms. Grade-based norms are, I guess, what a lot of schools are using for Reading First programs. The sample is exactly the same for both. That took a lot of work.

Schreiber: I bet it did, with a lot of hands helping in the process. I know our readers would be interested in knowing about the growth scale value (GSV) because the previous version didn't have that feature, correct?

Williams: Right, that's been added to both the PPVT-4 and the EVT-2. And there is a conversion formula for old PPVT-III or original EVT scores. The GSV was developed using Rasch scaling from item response theory (IRT). The best way to explain it to people is that it's an equal-interval scale, just like measuring somebody in inches. Let's say you're an 8-year-old and you get a standard score of 100 based on a certain number of raw score units. Then the next year, when you're 9, if you've made a year's growth in vocabulary, you should get the same standard score (100), because you're now being compared with other 9-year-olds. But what if you really wanted to see the growth? How would you put a level of ability on the more difficult items that had to be answered to get a higher raw score? That's where the Rasch scaling helps. And this is what's good about having multiple forms of the test. Some people would like to use this as a measure of a student's response to intervention (RTI). Vocabulary is influenced so much by reading and other programs that if you wanted to measure the impact of your reading program, you could actually do it with a vocabulary test. Even though the test is oral, it would demonstrate how much the student is profiting from the reading program. So let's say you gave one form and got a raw score of 61, a GSV of 136. Then, 6 months later, you got a raw score of 70, a GSV of 143. You could plot these scores to get a visual idea of how much change occurred over time. Of course, the raw scores can be converted to standard scores, and the standard score may still indicate below-average performance, but growth is still occurring. So the GSV gives you that equal-interval scale to say that even though the student I'm working with is still below average, there's measurable growth occurring even in this short time interval.

I think it's going to be a helpful device for people who want to give parents or others they're working with an indication of growth.
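To make the equal-interval idea concrete, here is a minimal Python sketch. It computes the GSV change from the two administrations in the example above (61/136, then 70/143 six months later), and it uses the standard Rasch (one-parameter logistic) model to show why a logit-based scale has equal intervals: a one-unit ability gain multiplies the odds of a correct answer by the same factor on every item. The published GSV is a transformed scale, not raw logits. The ability and item-difficulty values below are hypothetical and are not the tests' actual scaling constants.

```python
import math


def rasch_p_correct(ability: float, difficulty: float) -> float:
    """Rasch (1PL) model: P(correct) depends only on ability minus item
    difficulty, both expressed on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# The two administrations from the interview's example (alternate forms, 6 months apart).
administrations = [
    {"month": 0, "raw": 61, "gsv": 136},
    {"month": 6, "raw": 70, "gsv": 143},
]

first, second = administrations[0], administrations[1]
gsv_gain = second["gsv"] - first["gsv"]  # 7 GSV units
months = second["month"] - first["month"]
print(f"GSV gain: {gsv_gain} over {months} months ({gsv_gain / months:.2f} per month)")

# Hypothetical item difficulties (in logits): the same one-unit ability gain
# multiplies the odds of success by e for every item, easy or hard.
for difficulty in (-1.0, 0.0, 2.0):
    before = rasch_p_correct(0.0, difficulty)
    after = rasch_p_correct(1.0, difficulty)
    odds_ratio = (after / (1 - after)) / (before / (1 - before))
    print(f"item difficulty {difficulty:+.1f}: P {before:.2f} -> {after:.2f}, odds x{odds_ratio:.2f}")
```

The probability gain differs from item to item, but the odds ratio does not. That constant ratio is what lets a Rasch-based score such as the GSV register growth even when an age-based standard score stays flat.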

Schreiber: If a student were receiving services for most of his or her school years, you could see the growth pattern over time and it would be very helpful.

Williams: Exactly.

Schreiber: The revised record form is so user-friendly; the examiner doesn't have to do a lot of note taking or writing. You've listed a lot of possible responses, so you can just circle the examinee's response.

Williams: In the previous version, we listed the most common correct and the most common incorrect responses, and we put the least common correct and least common incorrect responses in the manual. But with this version, we put all possible correct responses on the record form, so there's nothing to look up. If the child doesn't say one of the listed responses, it wasn't accepted during standardization. And that's what people want to know; they don't want to have to make any guesses. For the incorrect column, because it's an open-ended response and you could get a thousand different incorrect answers, we listed just the most common incorrect responses, and we certainly included the prompts, that is, which responses need to be prompted for a better answer. Also, there are certain words that can't be prompted, and that's explained both in the manual and briefly on the record form.

Schreiber: Kathleen, you've given us excellent insight into the revisions for the EVT-2 and the interpretation of examinee performance. It truly will be a test that balances the PPVT-4. For more information, readers can visit www.speechandlanguage.com.

Schreiber: Thank you so much for your interview today and best wishes at the College Board.

Williams: Thank you.