And they might find that interesting too to hear, just something about that ‘cause it’s---it’s very unique. Yeah, Charles Stansfield, that’s who I am.
Well, my company is called Second Language Testing Incorporated, and I created it, it’s a little over six years old now, it was created in a---early April of 1994, um---and---a---at that---that was just at the time that I was leaving the Center for Applied Linguistics. Um---and I had the idea that I a---it was clear to me that---that---more and more the possibility of a virtual company was becoming a reality, um---actually I had done a test virtually, in a sense, a---through the web, in 1989 with Elana Shohamy, um---in those---those were the days of BITNET and a---we used to a---it was a Hebrew test that I’d gotten a contract for, and um---we um---a---subcontracted with a---with Elana at a---at Tel Aviv University. And I was then at CAL, and every day a---she would send us um---stuff that her---she and her graduate students had developed for the test, various things that they had put together. And there’s no---an eight or nine hour time difference between a---Israel and the east coast, and so, they would e-mail that to us every day as a file and we would get it just as we were coming in to work and then by the end of the day we would a---have prepared our reply and our critique of everything they had done, and then we would e-mail it out to them before the end of the day, and that process repeated itself every day for a year. And actually it worked out very well because um---it---all of the feedback that they got was all in writing and had to be very specific and very plainly stated. And um---in the days prior to that everything was done, in a---in a sense, through oral communication, or much of it was done through oral communication, so it---I immediately saw that this a---a---this had good potential for giving good clear advice, and---an---an---and you could work with anyone, anywhere in the world.
And a---so when I---when I left CAL I decided that that would be a model to follow, and a---the company um---has followed that model, and any given year I’ll work with about fifty people a---all over the world um---whatever contracts that I get, or work that I get t---to carry out I try to s---to---a---to delegate out everything I can, and then what I can’t I---I do myself. And um---so, I’m constantly a---a---in---interacting with people through e-mail, sending and receiving files, um---etcetera. The company does test translation, a---test adaptation, and we also do foreign language proficiency test development, foreign language and ESL. Um---the um---the work is---I guess pretty much split between all of those, a---we’re doing quite a bit a---a growing amount of test translation, test adaptation, we’ve a---translated the GED tests into Spanish, or adapted them I should say, created an adaptation in that case, for Spanish, and a---so that was really a---the GED is a five test battery, so we’ve got a---and we did three parallel forms, so that’s fifteen instruments, plus we did practice tests, so that was quite a bit of work. And then a---for state assessments we’ve done a---Massachusetts for---since 1996, and Rhode Island also so---no we did Massachusetts since 1997, and Rhode Island since 1996, a---translating the state assessments to a---to Spanish, and in the case of Rhode Island, the first year, in ’96, we also translated to Portuguese and---and a---Lao and Khmer. And a---we’ve done some translation to Haitian Creole and to Navajo and um---so we do quite a bit of test translation, I think we’ve done more than anyone in terms of a---of that.
Um---and---a---of course there we’re working with translators who have to be trained in item writing techniques because test translation is a lot like literary translation, it’s a specialized form of translation, just as you can’t translate poetry unless you’re a poet, um---you can’t um---really translate a test and---and do a good job of it unless you were an experienced item writer. Um---and so the a---the---the translators have to be trained in item writing before they can begin to translate the test, and then we get it reviewed by another translator/item writer as well, and usually a couple of reviews along those lines. So um---we---and then of course foreign languages---a---whatever---um---I typically will a---try to get contracts with the federal government, or sometimes subcontract with state---with larger, what you might call, full service testing companies who don’t have a foreign language group within them. When they get a contract that involves multiple tests and amongst them is a foreign language then they’ll subcontract that out to me. So all of this works virtually, and I’m the only full time employee, and I sit at my computer from about a, 8 AM to 8 PM, a---just about seven days a week, and um---it’s a lot of fun.
For that is an article---is a---is a volume that I a---edited several years ago---a---must a been about 1987 it came out. It’s dated, but there hasn’t been that many new tests come out. The Woodcock-Muñoz a---is a new one, but---but there hasn’t been very many new tests come out.
Well, I think as we---you know the accountability movement has---has been strong at least since the ‘70s, and someone here---I was at a presentation the other day and---said that it---it goes back to the ‘50s, but a---I certainly remember it becoming very strong in the ‘70s, kind a been as a reaction to the ‘60s, which was a very much feel good about yourself period. That is a---we told a---educators that the most important thing was not a---student learning, or student achievement, but rather that the---the child develop a positive self-image. And so---a---everything that---that---essentially the educator would infer from that a---an---and in fact, often was very directly stated, a---that the students need a---positive reinforcement, a---not negative reinforcement. A---they don’t need to a---be pushed, they’ll learn when they’re ready a---etcetera. And so this was---this pretty much characterized education in the United States during the ‘60s, and into the early ‘70s, but about that period a---a---a reaction---a---developed---a---which essentially said that um---a---um---a---we need accountability, we’re not sure what’s being---what’s being learned, if anything, and um---a---so state legislatures started um---to get the message and they started a---um---creating um---various mechanism for establishing accountability. We go back in the ‘70s, it was performance objectives and a---and continued with a minimal cont---minimal competency testing---a---basic skills testing in the ‘80s, and---a--- now in the ‘90s we have standards based testing, and---a---so that really takes us up---up to the current day.
Well, I---I would say that, that at the moment um---since the---since the late 1980s we’ve been---we’ve seen something a---occur, and that’s something called a---alternative testing. And a---in the nin---late 1980s this term started being used. Today it’s often a---can be confused with the term alternate testing, which is someth---an alternate test, which is something different. But an alternative test a---going back into the ‘80s, was any test that was not multiple choice. And um---and so almost by definition the student then had to do something, had to produce something, typically write something, speak something, a---or do---or---or demonstrate, give a response in some other way. Um---and that was viewed as the alternative to um---to---to the traditional multiple-choice test. Um---other things surrounded that, like a---the concept of um---authentic testing, authentic assessment, um---which um----a was an effort that---that came about in fields other than language, but they borrowed a term out of the language field, a---authentic, um---to essentially say that we want to create tests which will re---have tasks on ‘em, like the tasks that one has to use in the real world, or one has to carry out in the real world. So an algebra test, a---the algebra problems, might go way beyond mere presenting a---a formula to---to solve, or asking a question a---outside of a context. A---everything of---a would be put within a context, a---like the old word problems, so um---in a sense what all---what---what authentic assessment does is to go back and create um---a---a word problem a---and a fairly complex word problem, which becomes the contextualization for an item, and that would be the case in algebra and social studies and math, a---science, etcetera. Everything is contextualized rather than decontextualized now.
Well I think that um---we’re---we’re at a stage right now where---where alternative assessment is still very popular. A---we may---we may see some decline in popularity, um---I---I think that what’s happened is that alternative assessment has posed some problems. A---the um---a---the alternative assessment, authentic assessment, it was said, again going back ten years, that this form of assessment would be f---more fair to the language minority student. Um---actually I felt at the time that wasn’t going to be the case and it’s turned out that that has been the case, a---because it---what it’s done is put a heavier language load into the assessment. A---I---a currently am working with the state of Delaware on a---um---a project that---involving linguistic simplification of a---of the state assessments, and a---there we’re dealing with the science assessments, and we were looking, a couple a weeks ago at---at eighth grade science assessments, and we found a---an item, an eighth grade science item, which had a readability level of grade fourteen. Now that’s because, again, of the---the large amount of contextualization, the use of a---low frequency vocabulary, as science vocabulary often is in terms of daily conversation, and a---as a result we have now a test that is more a measure of reading, and reading comprehension, than it is a measure of science. It’s tapping m---reading much more than it’s tapping science. Um---so---a---of course, items like that require a---a great deal of sim---of simplification, and unless you’re in the language you don’t rea---you’re not attuned to language being an issue that you’re---that you’re---that you’re tapping---a---with your tests.
So when we go to---when we look at al---at alternative assessment a---particularly it---when it means that the student must write out the response, which is typi---the typical type of response that’s involved, um---now the student doesn’t simply have to have the receptive skills to understand the question, they also have to have the productive skills to express the answer. And a---al---alternative assessment, authentic assessment, a---there’s an issue a---in a process in all of these, a---an emphasis on process, more than simply giving the right answer, explain how you get to the right answer, and show that you understand the process behind it. Well that requires a---a good command of the rhetoric, of the language, to---to explain all of that. And of course language minority students don’t a---don’t have that, and---and in fact, a---that’s been one of the downfalls a---of---of---of process based questions is that um---students, whether they are even native speaking students, or---or not native English students, are not taught a---how to think through the process of how they got to the correct answer. They learn to do it once, they automatize the process, and then they forget how to describe the process. Um---so um---a---but at any rate all of this has a---has---has---had a---the effect of putting a great language load in---in---in assessments that are being used today in the schools.
Well, a---I think that that---that is a possibility. I think the problem with simplifying a---all text is that we also want the students to grow and to be able to handle increasing texts of increasing sophistication, in linguistic sophistications, in linguistic complexity. Um---but is---as far as what linguistic simplification is; it’s the process of---it---of---of---I often define it in the simple term of reducing the readability level, or the reading---a---the reading level of a text. Now to do that, that’s easy to say, how do we do it really is the issue. And a---that will involve substituting high frequency vocabulary for low frequency vocabulary. So if we see a word like inebriate we’re gonna replace it with a word such as drunk---a---which is a much higher frequency. And um---so in a sense we um---we’ll look at words which a---we read a text, underline words which we think have a---a lower frequency form. A---we actually often do it with a word frequency a---list as well, and with a s---with a thesaurus, because a---sometimes we’re---we’re asking ourselves, now what is a higher frequency word for this, and---and a---so we can---we can approach it that way. The important thing to be careful of in testing, particularly in content based testing, testing in areas other than language, is that we don’t a---simple---we don’t attempt to simplify terms that---which are terms that are jargon a---that are germane to the subject being tested. Um---so---a---if we---if we do that then we remove a---a knowledge of the vocabulary, the jargon of the field, which---then teachers in that field feel that is a part of knowledge of the field. So we’re, in a sense, taking out content from the text, we don’t wanna remove content, we simply wanna t---a reduce the language load. So um---we will---wa---one issue---wa---one fir---that the first step, or one important step is reducing the um---um---the---a---difficulty of the vocabulary that the students are bei---are having to read.
The second step is reducing the a---grammatical complexity of the a---of the sentences. Um---that often involves a---taking complex sentences and breaking them down into simple se---simpler sentences. A---one sentence that’s long may become two sentences that are short, or three sentences that are short. Um---sometimes, actually, um---if---in---a---it can---it can work the reverse, that is sometimes we will take a uh---an idea which is expressed in a fairly short sentence, which has a lot of embedding, and we will take that idea and um---a---put more phrases, more clauses into the sentence to actually a---s---a---make the sentence, at least in terms of surface structure, um---more transparent. Um---and um---a---the deep structure is a---is---is less simple, in that case. And sometimes, of course, we break it into separate sentences. So it a---it---it---it often results, in fact, in more words in the um---in the text, although not always because many times in---when we’re looking at assessments we see that the emphasis---the emphasis on um---authentic assessment and on contextualization, a---means that there’s a lot of verbiage in there that doesn’t need to be there. And so as we get to the issue of simplifying test questions in the content areas, we’re also looking at a---is---does this verbiage need to be there. And um---it’s not that we throw out all the contextualization, but we throw out unnecessary verbiage in the contextualization, because that again, that just simply adds to the reading load, to the processing time for the student a---etcetera. So this is really what the linguistic simplification of the test item looks like. And it’s a lot like that in terms of a---um---um---a text itself, you’re gonna simplify a text, it’s gonna be approached pretty much the same way.
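[Editorial illustration, not part of the interview.] The vocabulary step described in this answer, replacing low frequency words with higher frequency ones while leaving subject-matter jargon alone, can be sketched roughly as follows. This is only a minimal sketch: the substitution table, the protected-term list, and the function name `simplify_vocabulary` are all hypothetical stand-ins, not the word frequency list or thesaurus the speaker's team actually uses, and the grammatical step is not attempted here.

```python
import re

# Hypothetical substitution table (low frequency word -> higher frequency word).
# In practice this would come from a word frequency list and a thesaurus.
SUBSTITUTIONS = {
    "inebriated": "drunk",
    "approximately": "about",
    "velocity": "speed",
}

# Hypothetical list of terms that belong to the subject being tested.
# These are never simplified, even if a substitution exists for them.
PROTECTED_TERMS = {"velocity"}


def simplify_vocabulary(text, substitutions=SUBSTITUTIONS, protected=PROTECTED_TERMS):
    """Replace low frequency words with higher frequency ones, skipping jargon."""

    def replace(match):
        word = match.group(0)
        key = word.lower()
        # Leave the word alone if it is subject jargon or has no substitute.
        if key in protected or key not in substitutions:
            return word
        new_word = substitutions[key]
        # Preserve an initial capital letter (for example at a sentence start).
        return new_word.capitalize() if word[0].isupper() else new_word

    return re.sub(r"[A-Za-z]+", replace, text)


item = "The inebriated driver had approximately zero velocity."
simplified = simplify_vocabulary(item)
```

In this sketch "velocity" stays in the item even though a simpler word is available, which mirrors the caution in the answer above about not removing the jargon that content teachers regard as part of the knowledge being tested.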
Well---um---typically people acquire receptive skills before their---they acquire product---a---productive skills. And so---a---when we get to authentic assessments that require the students to produce a response, then we are tapping a skill and ability which may be less developed than the---than the receptive skill. A---one of the nice things about the old multiple choice test is that it doesn’t require production, and fo---that may actually favor a---an English language learner. But because it’s not it’s---it’s going with the skill which they probably have more developed a---than---than the one that they have less---less developed.
Well, I guess a principal cautionary note would be that with the um---a---when the student does have to write out the response, in alternative assessment, the teachers in the content areas have to be sensitive to the fact that they’re dealing with an English language learner. And they don’t want to be grading on expression. Um---they need to look past a cer---surface structure errors and English expression and get to the content. Now, obviously that means that they’re not going to grade off for spelling, and things of that nature, for missing definite articles, but it also means that they have to be a little bit generous, because many times the student will know the content um---but will not be able to express it. And so the---the teacher needs to look for key words that a---that---that---that are---underlie a concept and see if that key word is there in the a---in---in the um---response. It may not be developed, it may not be adequately elaborated, the idea behind that key word, but if the key word is there then the teacher says, ca---is in a position to say to themselves; well, this student does grasp the concept and is simply lacking the language skills to express it. And at that point they’re going to be much more generous in the way they approach the a---th---approach the grading of that---that paper.
Well I think that the key in---in terms of the nature of language proficiency, is the word proficiency. An---and proficiency in testing in general, if we are taking it outside of the area of language, means a---a measure which allows us to make some statement that---that a person is probably gonna be able to perform a---at a given level, carry out a task successfully. And so we apply that then to um---to language, and what we find is that a proficiency test is a test that is based on language, your ability to use language in the real world. And so the nature of language proficiency as---as a construct in itself, proficiency, what does proficiency mean in language testing, it means the ability to use language in the real world. Um---so if we take a scale, like the Foreign Service Institute scale, or the Interagency Language Roundtable scale, or the government scale, all those are referring to the same scale. Um---we have a zero to five scale there with points one through five being clearly defined, and each point representing um---in---in real world terms, the a---what the pur---what the kinds of tasks that the person would be able, speaking task of---and---a situations that weren’t---the---that the person would be able to handle, and with what degree of general accuracy. So that if a person re---it---a---after an o---oral proficiency interview achieves a 3, a level 3, then we have certain knowledge and expectations about what that person can do in the real world. And those are broad expectations a---because it’s a broad---a broad level a---a---a---of proficiency that’s demonstrated. So proficiency um---is of course---I could say a lot more about it. It’s certainly not a---based on---on instruction. Um---a---the point of reference is the real world, it’s not any particular curricum---curriculum. Curricular-based test is a progress test, or an achievement test.
Um---so---a---it’s hard to prepare for a proficiency test other than just generally improving your language competence. Whereas um---an achievement test would be based on um---um---the results would be based on how well you’ve mastered a particular curriculum or course of study.
Well I think there’s problems a---with um---with the use of a single instrument a---to---to make a---a decision about a student. But certainly that’s typically what the teacher is offered, they’re offered a score of one through five, or zero through five, on the LAS test for instance, the Language Assessment Scales, uh---and in the case of most language minorities st---minority students it’s going to be a---something between one and a four or one and a three. Um---those are pretty broad distinctions and um---actually the a---um---level three is a---a---what we might call a level of interlanguage, there’s not really much structure there, and at level four now, we’re beginning to see some a---some fairly complete a---approaching the native speaker level a---set of a---of a---um---of---of internalized language a---proficiency. Um---so, we’re really down to a two level system, because level one is really level zero on the LAS, and level two is the level of listening comprehension, so the LAS really is a two point scale for most a---for most students. So a---it doesn’t tell us um---a great deal, but it’s a starting point, I don’t wanna single out the LAS, a---it is the most widely used instrument, but there are other instruments as well that um---that---that don’t---that also have their problems, um---that are used in the public schools. Ultimately what we need is to consider a---a score on something like that as a starting point. A---if we see a student with a---has a LAS score of three, um---if we see a student with a LAS score of two, we can se---realize that on the---on the actual test itself a---the student was able to say very little. And a---so they really are at a listening comprehension only level, not at a speaking level. And level three they---they---they have um---a---a very elementary command of spoken language.
Um---so both of those a---two types of generalization that we can make about those scores, a---are useful to us, and they both, of course, indicate that the child needs a---a---a---an extensive amount of instruction um---in order to bring up their English skills, a---in order for them to become mainstream. Um---so we’re talking about a student that is going to require quite a bit of a---instruction, but as you can see, that’s essentially a classification decision. A---as far as getting a profile of strengths and weaknesses, a diagnostic profile, we have a long ways to go.
Well, I think we need um---a---instruments that would allow a---a---kind of a---um---if we go back to the Bilingual Syntax Measure um---it had, I think, a---eighteen questions on it that were scored and each of them was assessing a different grammatical structure, was getting a different grammatical, sometimes there was two questions, generally of the same structure, and the student had to produce the response, there was no guessing factor. Um---if we could take an instrument like that and look and see which um---um---grammatical structure thes---thi---the student demonstrated in either a partial or fu---fu---full able---ability with, then, now we are getting at diagnostic information which can be useful to---to the teacher. Um---um---the BSM has fallen into disuse an---but a---but I think that it had it---it had some real strengths in that respect, in terms of being, you know, provide diag---diagnostic information to the teacher. A---the IPT a---test is a---is somewhat similar to the BSM, it’s um---a---and probably the second most widely used a---test today, um---after the LAS, possibly, I’m not sure, the Woodcock-Muñoz is a---has a lot of users as well, as does the LAB, a---particularly in the northeast, the Language Assessment Battery. But um---um---the a---the IPT um---one could transcribe, in fact one does have to transcribe in many cases, the student’s responses. And a---and then do a kind of content analysis, content profile, to see which grammatical structures the student has a---has a---acquired a knowledge of and ability to utilize, and um---and---and then go fo---move forward from there. Ideally, testing has always provided diagnostic information, but it’s just never been utilized. And um---a---but of course in multiple choice testing we have the guessing factor, so it’s hard to make any assumptions, a solid assumption, about how the student got to that correct answer.
In productive skills testing we can make, a---particularly in the case of short answer questions, a---we can make a---a---some---we do get some clear information about what the student knows and---and can do.
Well, a---I think that um---a---cognitive academic language ability, of course, is a---is in a sense a high level of language proficiency. Um---the a---and we have to combine it with the whole idea of content knowledge, a---because as we acquire cognitive academic language we are talking about the kinds of---the kind of language that is associated with the mastery of given---given areas of content. Um---so, it’s hard to separate, actually, language from content, um---certainly language proficiency i---should grow, is supposed to grow, as---a---as---as we acquire more content knowledge. More content gives us more to talk about, and um---um---one it’s hard to im---one of the reasons that the scale, like the government scale, is said to be biased against a---illiterate people, is because um---the higher points on the scale assume that one has a college degree and can talk about a wide variety of the kinds of things that a college---a---educated person should be able to say something about. But the fact---and some people question: is that real, is that really, then, a measure of communicative competence? But the fact is that an educated person can talk about more things than an uneducated person. And so if the language competence underlying that education is also there, then um---at that point we see the person has greater communicative competence. As far as teasing out the two, separating the two, I think it’s important to understand that they’re both intertwined, and yet one can exist without the other. One can have a---a knowledge of the subject matter without the language skills, one can have a kind of general language competence without the specialized a---vocabulary of a particular subject area of which one may only wa---know in one’s native language.
Um---and---similarly one can have good language proficiency, pretty good language proficiency, with a---in fact excellent language proficiency, but without the a---a---the knowledge of a particular subject area in the vocabulary and the expression routines that are associated with that subject area.
Well I think that a---when we get to the issue of---of state assessments a---in particular, because a---that’s where we see a great deal of testing of English language learners today, or an increasing amount of testing, um---the principal factor is of course, language. Um---a---we’re trying to get to---a---we’re trying to administer a test of a subject area, typically math or science and third most frequently, social studies, um---sometimes health a---related to science but a---sometimes a separate assessment for that. Um---and in each of these cases the student has content knowledge which they---they can express, the test is designed to assess that content knowledge, but they---they are going to have to demonstrate it by understanding complex questions, and they’re going to have to often give a---short answers or extended response a---types of answers. Um---so language plays a role and in that case it’s hard to separate language from---from content knowledge, because language is the vehicle by which we measure content knowledge, and so at that point we have a co---a confounding of the two. Um---the---the solution a---in this clearly is the---is to reduce the language load and the---in the tests of content knowledge and a---I think we have a long ways to go in that area. Um---a---I think we will begin to see that in the next couple of years, a---that---that as the a---new Title One regulations require all students to be included in state assessment programs, and the states are held accountable for the limited English proficient students, and they see the scores looking quite poor, they’re gonna ask themselves: what can we do---a---to make these tests less biased against these students, because now they will be---they will have an interest in the---the test being linguistically fair to the students themselves.
That is the---the district will have, and the state will have an interest in---in seeing the tests be fair to the student, because if it---if it---a---considers um---the student’s lack of full English language proficiency, and if it’s designed in such a way so it doesn’t require a high degree of language proficiency, a---and is focusing more on the content knowledge, then the state, actually, will look better to provide a more valid score, and the state will a---will---seem---will---will---be able to show that it’s actually educating the student, in that content area. So, language is the principal issue, but there are other issues as well, a---there’s familiarity with the testing context, with the formal test situation, um---there’s a---familiarity with multiple-choice answer sheets, a---with the multiple-choice test item format, um---with the pressure surrounding high stakes testing, um---all of these things um---a---if a student is not accustomed to them, a---then they’re going to pr---affect performance. And finally, if we just think of assessing content knowledge, the um---a---the---the student who is a---is a recent arrival, say at the secondary level, a---some grade level on---at the secondary level from another country, has content knowledge, and even at the advanced, at the elementary grades as well, a---they have content knowledge but the content knowledge was acquired in the---in another language. And it’s another content, it’s not the same curriculum, it’s not based on the state standard, it’s based on a---on a different a---set of a---of ideas about what the curriculum should include. And so in that case, the test is---is going to probably be an underassessment, the test score is going to reflect an underassessment, of the student’s actual command of that content, because our standards based tests are, in many respects, curriculum oriented tests.
OK. Well that’s a very interesting a---and complex set of issues actually. Of course if we go back, only five years, a---certainly seven years, a---we would see that it’s---it was routine to exclude a---limited English proficient students from a---any of the state assessment programs, they simply didn’t have to participate. Um---there were two reasons, kind of opposite reasons given. Um---first a---it was---a---it was felt that the---that the district a---that a---let me say first it was felt, the test itself, was unfair to the student because the students lack of engli---of English language proficiency, there was an awareness at that time um---that a---that the te---that there wasn---that the test itself would be unfair for the student. And so the students um---was excluded from having to take the test, and as a result, and this was never discussed in terms of the rationalization for excluding the student, but the district then wasn’t held accountable, and the school wasn’t held accountable, for the students scores, because the student didn’t take the test, the scores never appeared in the school totals, or the district totals, or the state totals. And so we go back of---about a---in the late ‘80s there was a bit of a scandal because someone published an article a---talking about the Lake Woebegone effect. A---actually a pediatrician who a---worked in Appalachia a---found that um---um---that---he was very about the---level of knowledge of the students in---of the children that he was treating, and a---he checked a---got some test results from the district, and found that a---in spite of the fact that he was in a very poor district, the students a---the district scores---the district means on all the standardized test administered in the district, were above the national average. Um---and he said that just can’t be, so he checked with neighboring districts and their means were all above the national ass---average. 
And so on his own he sent out a questionnaire to districts all around the country. And every district that responded was---had---had mean scores, average scores, above the national average on these standardized tests, and so it caused a---a good bit of a scandal. A---b---but selection of students of students, or exclusion of students, was one of the reasons um---that um---a---that that turned out to be the case. And um---a---so what we’ve seen a---is a growing awareness um---on the part of educators, particularly language minority educators, um---that---when students are excluded from the testing program, the district really isn’t held accountable. Um---and that if students a---are going to improve, if---if---lan---language minority students are going to get the fu---get the---enjoy the full affects of education reform, a---which is to have a refocusing, a rechanneling of energy in developing content knowledge, particularly in math and science, then those students are going to have to be um---a---not simply set aside in some special program down the hall, but rather they’re gonna be brought in---have to be brought into the mainstream and made a part of the---of the---of the---the---a---the school districts a---total population. And that, in the past, a---simply wasn’t the case, I mean, ESL teachers and bilingual teachers enjoyed the um---you might say the advantage of being left alone, to educate the children on their own, a---and---and I think they did a good job, and they worked hard at it, but a---for a variety of reasons um---the---the children weren’t brought into the mainstream um---and---a---for the ESL teacher and bilingual teacher that was often very comfortable a---to have full responsibility for those---for those---those children. And they identified with them a great deal, but somewhere in the mix um---things have---haven’t turned out a---as---as people would have hoped. 
A---and so what we see is that we have a high dropout rate among language minority children. Hispanics, for instance, nationally have a dropout rate depen---from state to state, between thirty and thirty-five, sometimes higher, percent. That proportion of students aren’t completing high school and are dropping out of high school. That’s not good, in spite of all of the effort and energy that’s been put into their education. Um---well, part of it has to do with---with other things that happened besides what the ESL or bilingual teacher does. Um---the---those teachers are ultimately trying to make up for deficiencies, and at some point the student is supposed to exit that program and go into a mainstream program, with a mainstream content teacher. Um---as that happens a---what we find then is a whole new set of---of a---of expectations come into place, and those expectations are often expectations that the student won’t have to perform, won’t a---be asked to do things, and as a result, they’re not given the attention that they---that they would otherwise get. They’re not encouraged to go into advanced placement a---they---a---enroll in courses that are going to lead to take---taking advanced placement exams, they’re not expected to go to college, um---and they’re---they’re tracked out, or not count---it---they’re tracked out in a negative way, or they’re not counseled in a positive way. So by including students in the state assessments, um---we are, and the state assessments being the vehicle for accountability, we are ensuring that these students get the same attention, in fact will probably get more attention, in order to make sure that um---that they are able to perform well a---on those state assessments. That’s the rationale behind it, and a---I think it’s a good rationale.
Well, when districts publish test scores, um---a---a---going back to what I was saying earlier about a---trying to make district (coughs) responsible for those students.
When districts publish test scores, um---those scores are essentially mean scores, um---for each school in the district, and each subject area. And um---statistics are nice, but they a---the---s---individuals get lost in statistics, and that’s the weakness with statistics. And in a---and so do groups, um---so if we have a---a mean score for all students, it looks like it’s about what it should be, a---that looks fine, but it hides all of the students that are---that are below that mean. A---it hides all the levels of individual performance, and it also hides all the groups that are below that mean. So, in order to be able to judge (unintelligible) whether a district is doing a good job of educating its language minority students, we really need to see these scores broken out, a---what we call disaggregated, by different types of groups. A---historically in testing we’ve separated by gender, male, female, so we think of a test like the SAT, there’s one set of norms for men and another set of norms for women. Um---and---and---um---typically in test programs they will---they will do that type of breakout as well to see if---if there’s any potential bias in the a---in the results, and if the---if men do better than women, or if women do better than men, they ask themselves why is this, and is this reasonable, is it to be expected. If it’s reasonable, to be expected, then actually that---that---that works in favor of the validity of the test, if it’s not to be expected then it’s a threat to validity and people have to examine it much more carefully. But, going back to the issue of disaggregating test scores, by both including language minority students in the total we um---um---we get the effect of have---making the district responsible for those students.
But also by disaggregating them from the total, we allow the public to look at how this district is---how well---a---how good a job the district is doing at educating specific groups of students. And so, actually, it---it enhances the accountability by disaggregating scores. So scores should be both aggregated and disaggregated, and um---what we see is that states have been more and more willing to aggregate, they’re not happy about disaggregating a---those scores, but ultimately a---that’s necessary for the state, for the district, and for the school to have a clear picture of how well they’re doing with---with specific groups
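To make the point about aggregating and disaggregating concrete, here is a minimal Python sketch. The group labels, the scores, and the function name are all made up for illustration and do not come from any district's data. It only shows how an overall mean can look acceptable while one group's mean sits well below it.

```python
from collections import defaultdict
from statistics import mean


def aggregate_and_disaggregate(records):
    """Return (overall mean, {group: group mean}) for (group, score) pairs."""
    # Aggregated score: every student counts toward the district total.
    overall = mean(score for _, score in records)

    # Disaggregated scores: the same students, broken out by group.
    scores_by_group = defaultdict(list)
    for group, score in records:
        scores_by_group[group].append(score)
    group_means = {group: mean(scores) for group, scores in scores_by_group.items()}
    return overall, group_means


# Hypothetical scores, invented for this example only.
records = [
    ("English proficient", 72),
    ("English proficient", 68),
    ("English proficient", 70),
    ("English language learner", 50),
    ("English language learner", 40),
]

overall, group_means = aggregate_and_disaggregate(records)
print("Overall mean:", overall)
for group, group_mean in group_means.items():
    print(group, "mean:", group_mean)
```

With these invented numbers the overall mean is 60, while the two group means are 70 and 45. Leaving the second group out of the total would raise the reported mean to 70, which is the exclusion problem described earlier; reporting only the total would hide the 45.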
Well, the inclu---Improving America’s Schools Act of 1994 is essentially the reauthorization of the old Elementary a---s---a---and---a---Secondary Education Act of 1965, which gets reauthorized, I think, every five to seven years. And um---it was given a new name in 1994, and so that’s the Improving America’s Schools Act, or ni---IASA, as it’s often called. A---it---a---requires that by the year---the academic year 2000, 2001, scho---states and school districts um---adm---a---have a program of assessment for all students in the district. It does---the wording of the act actually does not allow for any exclusions um---on any basis. Now it may end up that---that---a---as we get a---closer to the a---the deadline that um---that---that exceptions are allowed on a---in extreme cases, but a---they’re not going to be allowed for language proficiency, they may be allowed in certain areas of special education. Um---and what it’s going to require a---is that---a---again all students be included in the assessment program, and so then that takes to the issue of (interrupts for cough) So then that takes us to the issue of: how do we include all students in the assessment program, and do so in a way that’s fair to the student and that produces the most valid test score. Um---because um---I guess I would say---stop to say that the test itself is not valid a---f---for everybody, there is such a thing as a fit between the test and the examinee. So while we may have provide---may---a---the test publisher may offer evidence of the validity, and that evidence may be very credible, a---we still have to be concerned about the fit between the test and any given examinee. So, how are we going to make this test more valid for these language minority students is the---is the issue the districts will having---will be having to face. And they’re gonna have to face that by the academic year 2000-2001.
A by the end---by the spring of 2001, which is essentially a year from---from right now, a---they’re going to have to have, not only a---a formal program in place, and be able to have a description of it, a---but also they’re gonna have to have criteria for accommodating these students into that program. And by accommodating I mean a---both the general meaning as well as the specific meaning which we use in the testing field, which is some a---change in the, typically in the test administration procedure, or in some aspect of the---of the testing program to a---address a---a difficulty or deficiency which a student may have which is not related to the subject that’s being tested on the test. Um---so, at the moment a---I’m wearing glasses and that’s an accommodation that’s given to me. Um---I don’t really need it at the moment, but I’m---when I’m looking at the computer a---I do need it, a---without it I wouldn’t be able to perform very well ‘cause I have terrible vision. Um---and---and if you were to offer that accommodation to someone who doesn’t use glasses, who doesn’t need glasses, if I were to hand my glasses to someone who doesn’t need them, a---they---they wouldn’t want them, a---not only because it would mess up their vision, but because a---in addition it simply wouldn’t help their vision. And a---the same can be said an---a---I think the glasses as an example, is a good example of what a good accommodation is, and it provides us a frame of reference for judging an accommodation. A---a---an accommodation in an ideal world makes up for a deficiency which is not related to the content being assessed by the test. Um---the districts then have a---a---a large number of accommodation options that are open to them. Many of these accommodations are a---simple and easy to a---allow. Um---if we go back ten years we see that very few districts allowed extra time on a standardized test. Today most districts are allowing extra time on state assessments. 
Um---it was previously believed that standardized tests should all be timed and that included state assessments, and so any time a student received an accommodation such as extra time, that was considered to break standardization and therefore to, in a sense, invalidate the score, invalidate any comparison between that score and another---and another student. In spite of the fact that---that it was considered that---that achievement tests, standards based assessments, a---were supposed to be power tests and not speeded tests. And therefore speed of res---getting through the test was supposed to have nothing to do with your actual score. So in theory, getting additional time to complete the test shouldn’t invalidate the test score at all, but nonetheless, this is the way things were---were looked at. Um---extra time is, today, a common um---accommodation the districts make, and I think it’s a useful one. It’s of use to---to---students who are kind of at a mid level in their language proficiency, they’re---they’re---they’re---a---perhaps mainstreamed already, they can deal with the English language in a fairly broad way, with a wide variety of English language, but their processing time is slower a---because they have to go through a---a---a longer process of decoding a---what they’re---takes them longer to decode what they’re reading, and---and the---they are having to tu---to be---to pay more attention to the um---a---to the language itself, and---and think through those issues in the decoding process, and then get to the content. So, all of which puts additional stress on ‘em, slows them down, and takes them longer to get through the---through the assessment. Um---that’s um---um---that’s one aspect of accommodation, but that’s only one accommodation, and I could talk a, you know, a great deal about different accommodations, there are---there are many accommodations a district---districts can make.
Some of them deal with the---with the a---administrative setup um---um---in the sense of a---I just gave an example, the timing or scheduling accommodation, a---there are also response accommodations so that the student may actually dictate their answer and then that answer would be written down by someone else, that---that would be what we would call a response accommodation. (Pause for coughs and drinks) And so besides presentation and res---and a---response accommodations, um---I’m sorry, ti---besides timing, scheduling, and response accommodations, we also have presentation accommodations. And those are accommodations typically in the---in the test format, um---that is they a---they may involve a translation or a native language version of the test. They may involve um---access to um---a---a glossary, or---which is actually on the test, um---a---at the bottom of the page there may be certain words in English which are defined in English which are not relevant to the content, but nonetheless, may be words the student may not know. Those are---those are all presentation a---accommodations. Um---well, that’ll---that’ll get us a---that gives us---gives us some idea of the kinds of accommodations a district can make.
Well, historically accommodations a---have been used um---for sometime in the field of special ed---education, essentially over a decade, and um---as a result, special ed programs have developed a list of accommodations and they’ve classified the accommodations in the way that I just described. So that classification system of presentation, response, setting, I didn’t mention, but the whole idea of taking a test in a---a---in a small room as opposed to a large um---a---a---a large auditorium or a cafeteria, or taking a test with a familiar teacher, such as a special teacher or the ESL teacher, these are all setting accommodations, which are possible as well. And ESL students have been---are typically granted a---many of those accommodations which are granted to a---to special ed students. The problem with those kinds of accommodations is they don’t really address the---the particular deficiency that the ESL student has, which is a language deficiency. So if we say that we will allow the ESL student, the English language learner, the opportunity to dictate their answer and have someone write it down, um---that---that may be a---useful a---to special ed students who have some type of physical impairment, or have some type of dyslexic or other type of problem which prevents them from writing um---but they have the content knowledge and we want to assess that content knowledge. Um---a---a blind student may have this sa---similar problem, a---but they have content knowledge. So that’s a relevant accommodation for many students in special education, and---a---it addresses their deficiency. Um---in the language fi---in---with English language learners, when we transfer these accommodations over, they---they don’t do any harm, but the question is how much good do they do, and for how many students.
A---if---so what we have to do is to ask ourselves as teachers who have English language learners as students in our classrooms, what are the language---what are the deficiencies that this student has a---that are going to affect their performance on their test, which are not relevant to the content of the test. What, in measurement, we would call construct-irrelevant variance, a---that is, they’re going to---those deficiencies are going to play a role in the outcome, but they’re not relevant to the construct being assessed. (Faint siren in the background) Um---and it---again, in the case of the ESL student, of the a---English language learner, the large deficiency---the largest of---amount of variance, of construct-irrelevant variance, is going to come from the language factor. So, how do we accommodate the language factor? Well, of course, that’s the central issue we have to ask ourselves in standardized testing for limited---for English language learners. An obvious way is through creating a translated version of the assessment. That is to translate the version, a---the---the standards based assessment, directly into the students’ native language, and allow the student to take the test in the native language. Translation’s not a panacea, I just wanna say that much about them, and I wanna go on (voice changes) um---that in great depth, a---but it will address the needs of many students, particularly students who have been educated in the language, so if we’re dealing with a bilingual program or an immersion program where the students have been educated in the language or are being educated in the language, um---then a translation---translated version of the test---of the assessment’s gonna be very appropriate.
Um---on---if we’re dealing with a student who’s recently arrived from another country and has good literacy skills in---in the native language, then the translated version will be more appropriate than an English version, if the student doesn’t have a good command of English. It may not be fully appropriate, because that student hasn’t received a---instruction in the same um---in the same curriculum. Maybe the same content but not the same curriculum. Um---but still, it’ll be more appropriate than the English language assessment. Um---but there are other kinds of linguistic accommodations that can be a---that---that can be carried, that can be allowed to the student, one, of course, is a linguistically simplified version of the test, where we’re---where we’re replacing low frequency vocabulary in English with high frequency vocabulary in English. And generally reducing the---the reading---the readability level of the test, the grade reading level of the---of the---of the test items. So that now we’re not, instead of test---of assessing say science and English a---proficiency, the test is much more tapping science than it is English proficiency. So, that’s another thing that we can do to address the---the a---to make an accommodation which---which addresses the deficiency that the English language learner has. We can create a glossary, in English, of terms that are not related to the content a---define those terms, usually at the bottom of the page is the way it’s done often, um---and---so we create a parallel test booklet for the---for the English language learner with the same items on the page, but at the bottom of the page there’s four, five, or six definitions of---of words in English which a---the English language learner a---may not know. And um---n---um---and so that’s a---another thing that we can do, and of course there’s bilingual dictionaries, a---which we can allow the English language learner to use.
Bilingual dictionaries, there’s a lot of confusion about bilingual dictionaries, um---if the---the average a---monolingual a---speaker of English says, you say dictionary to ‘em and they say: no, that’s not fair, because dictionaries define words, and so if a student doesn’t know a term, they can look it up in the dictionary and they’ll get a definition of it, and this is providing the English language learner with an unfair advantage. Well that’s not the case of---of the standard bilingual dictionary. Now when you get to a---a very large, thick, um---comprehensive, bilingual dictionary (pause) they do um---in some cases if they can’t give a synonym, then they may a---they---they may give some type of a---of a definition. A---and they will certainly also give um---um---um---the---um---um---use the word in context, a---in a---in a---in a kind of---in---in a context which wou---might be an or---ordinary usage for that term. But if we take a small, bilingual dictionary, what you might call a school dictionary, school based, bilingual dictionary, which is not a three or four inch thick dictionary, maybe a one inch thick dictionary, um---they are---the bilingual dictionary is simply going to be a---a list of paired associates, you’re gonna have one word in English and one word in the other language, and vice versa. Um---and that doesn’t define terms, and so a---a bilingual dictionary, a small, bilingual dictionary, of that nature is actually a big assist to the student because it takes this word in English, which doesn’t have any meaning for the student, and it gives them, the student, that word in their native language and now all of a sudden the word and the---and the---and the---the---the---the sentence and the question acquire meaning.
And, so that’s another thing that we can do to help the English language learner, it’s a relevant modification, a---which a---can be applied to English language learners to help them show what they can actually do on the test and reduce the language load.
Well, actually the---the question of reliability and validity of classroom tests has---has long been a problem. To some extent um---it’s viewed that---that---a---this is a measurement specialist bringing this issue to bear on classroom teachers and judging their tests, and they shouldn’t be judged that way. And I think there’s some---there’s---there’s some truth in that, but on the other hand, if teachers are going to draw inferences about whether the student has learned the content, they need to be sensitive to the issues of reliability and validity. And they can make a big difference, actually, the failure to consider those issues, not only in a---in---and appreciate, and the meaningfulness of a students score, but even in what a student learns. Let me start with a reliability example. Um---to improve the reliability of a test it’s---it’s---it’s pretty easy, all we---the---for---a---a basic rule is to increase the length of the test. So if we have a ten item test, um---that’s gonna give us a low reliability. If we increase it to thirty items, now we’re gonna start to get pretty moderate reliability, pretty good reliability, um---with a thirty item test. What that means is that if we picked another thirty items, or if another teacher came up with a thirty item test of the same content, then there would be a high correlation, or pretty good correlation, between the two scores that the student would take---from taking test, would---would obtain, from taking test A and test B. So now we can have some confidence in the score that we can’t have with a thirty item---with a ten-item test. In a thirty-item test how much confidence does that allow? Well, not as much as a forty item test. When we get to forty items, now we’re getting to um---um---probably a---a test of---of---of some respectable degree of reliability. 
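The link between test length and reliability described here is usually quantified with the Spearman-Brown formula, which the speaker does not name. The sketch below is only an illustration under stated assumptions: the starting reliability of 0.50 for a ten-item test is a hypothetical figure, the function name is invented, and the formula assumes the added items are parallel to the existing ones, meaning they measure the same content equally well.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor.

    reliability: reliability of the original test (between 0 and 1).
    length_factor: new length divided by original length (3 means three times as long).
    Assumes the added items are parallel to the original ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical starting point: a ten-item test with reliability 0.50.
ten_item_reliability = 0.50

for items in (10, 20, 30, 40):
    factor = items / 10
    predicted = spearman_brown(ten_item_reliability, factor)
    print(f"{items} items: predicted reliability {predicted:.2f}")
```

Under that assumed starting value, the predicted reliability rises to 0.75 at thirty items and 0.80 at forty, so each added block of ten items helps a little less than the one before it. Real tests will differ, since added items are rarely perfectly parallel.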
So, um---the---the issue of reliability is um---wh---it---it may sound simplistic to simply say increase the test length, but if you realize that a twenty item test is like two, ten item tests, and a forty item test is like four, ten item tests, now we’ve got four different measures of a student’s competency on a---in the given subject area, or on a given area of the content. And those four tests, or that test that’s four times as long, is going to give us a much clearer picture of that student’s strengths and weaknesses in terms of mastery of that aspect of the a---of the subject matter. So that’s why reliability is important. Now, validity is um---a---is a more complex issue, but um---let me give an example from the foreign language field. A---foreign language teachers have no---have been notorious over the years for saying: ‘I want my students to learn to speak French,’ or, ‘I want my students to learn to speak Spanish,’ a---and then---and then focusing their instruction actually on developing listening, and listening comprehension, and speaking skills. Well, what happens when the tests come up---comes up, it’s not a test of listening skills, it’s not a test of re---of ---of speaking skills, it’s entirely a test of grammar and vocabulary, and it’s entirely a written test. And, so, the teacher, then, doesn’t understand why the student doesn’t have a good attitude as they’re working hard to drill the teachers---to drill the students to a---to develop those listening and---and speaking skills. A---the students are a---bright, and they soon learn what the teacher feels is important, and they judge what the teacher feels is important by what the teacher tests. And that’s a fair a---conclusion to draw.
And so, again, when the tea---when there’s this---disjuncture between what the teacher says is important, or what the teacher does in the classroom in their---in their regular instruction, and what they test, then we get results that um---that don’t reflect the curriculum, and don’t reflect what um---um---a true measure of what the tea---perhaps the student has learned, and don’t provide any positive backwash to the instruction and to the goals of the teacher. When the student is constantly being given tests that are written, and yet the teacher is saying that their first goal is to develop speaking skills, a---then there is no backwash, there is no integration between the types of assessment used and the curriculum that’s---that’s a---being given to the student. So that’s an example of the lack of validity in testing. Now, in the different content areas, each---teachers in each content area would have to---would have to look and see the um---a---the issue of the match between what their goals are in that content, what are the state’s standards in that content area, and make sure that the test itself reflects those state standards and those goals. If---if---if---the test does, then there’s gonna be this positive backwash between the assessment and the instruction.
Well, if we’re---if we’re---if we’re talking about an alternative assessment, that’s a fairly high stakes assessment, the a---high stakes for the in---individual student, or for the district, for the school districts, because a---typically your standards based assessments are not really high stakes for the individual student, but rather for the district, because the district um---if the district performs poorly um---quite frankly um---the um---superintendent may have to leave, um---there may be shake-ups of various ty---various types. If the school performs poorly a---there can be big repercussions, again, the principal may have to leave, there may be pressure on the principal, a---or on the superintendent to put a---bring a new principal in to improve the school’s scores, and---and this comes from---from parents, not only who are interested in their children, it also comes from realtors who are interested in property values, and teachers, a---b---wh---buyers of---of real estate, of homes, will look and see how well students in that school are performing on statewide assessments before buying a home in---in---in the region that’s served by that school. So this has um---a---actually a---a---a---real effect in the real world, um---and um---um---standards based assessments ha---a---are---are really quite important in the real world.
Well, um---the---there is---there is a certain familiarity effect in performance. That is, once a person has practiced something, they’re gonna be---they’re gonna d---be better able to do something that they’ve practiced, not something that they haven’t practiced. So we have the issue of practice, and if one has practiced a particular kind of language, or one is used to talking about a topic or writing about a topic or reading about a topic, then you’re gonna be more familiar, more facile, in dealing with that topic in terms of language. Um---there is also a content a---matter, if you’re familiar with the content um---then you’re going to be able to use the language to talk about that content, if you---if you studied the content in that language. So, these are---these are both factors that a---that---that---that come into play here.
Well, um---a---I would say in---sh---in---in---in