JAMAL ABEDI:   My name is Jamal Abedi. I’m director of technical projects at the UCLA Center for the Study of Evaluation, CRESST, the National Center for Research on Evaluation, Standards, and Student Testing.

The issue of standardized testing and students with limited English proficiency is extremely important. We have done some research, and there is a lot of research on standardized achievement tests, and it all says that we have to be careful and make sure that standardized achievement tests are used correctly with students of limited English proficiency. The reason is that language is an issue. Some of the items may be too complex linguistically for those students. If they do not understand the language of the questions, they will not be able to respond to the questions. We have a lot of research literature suggesting that students with limited English proficiency have the content knowledge, but they may not have the language capability to express the content that they know. So we have to be extremely careful to use the type of language they understand and not to use complex language. We try to differentiate between linguistic complexity that is related to the content being assessed and complexity that is unrelated. We have to avoid unrelated language complexity in the test items. That is the type of thing that would be problematic. It reduces the validity of the test and creates measurement conditions that may not be productive, may not be good.

A very simple example of language complexity would be the length of items. Longer items are more difficult, for instance. We have done some research and we have found that students with limited English proficiency may have difficulty understanding and following items with long stems and long options, because items with a lot of language are difficult for them to absorb, to understand. So, like I said, that would be a simple example: the shorter, the better it would be for them. We talk about some other linguistic issues too, for instance passive versus active voice. It would be easier for them if the questions were in active rather than passive voice. It would also be easier for them if we stayed with simple test language rather than, for instance, conditional clauses. It is just a matter of trying to make it as easy, as understandable, as possible.

On the difference between related and unrelated vocabulary in items: there are some terms that are related to the content, to the assessment. For instance, in math, circle, rectangle, fractions, all those things. Things like that are part of the assessment. But, for instance, we had an item in the national assessment that said that a certain reference file contains such and such. In that particular item, the issue was how many hundreds are in millions. But when you say a certain reference file contains that, the students think that they have to know exactly what is meant by a certain reference file. We changed that item to that many hamburgers, rather than a certain reference file containing that many files, and they could understand that very well. So by unrelated vocabulary I mean anything that is not related to the content being assessed.

Yeah, you know, our research and also some other national research have clearly shown that when you avoid complex language, everyone benefits from it. So it’s not just the students with limited English proficiency. However, when you reduce the complex language, you reduce the gap between the students with limited English proficiency and others. So the gap becomes smaller and smaller, but everyone benefits. I don’t think that anyone wants to have very complex language that is not related to the content, because you are not testing the language, you are testing the content knowledge. So the students have to understand, and the teachers have to make sure that the questions are so clear that they get to the point, the point being the content being assessed rather than the language. And it is language and culture both, not only the language. Anything that could be thought of as biased language or cultural bias should be avoided in questions.

Anything that relates to a culture, for instance any kind of holiday, anything that is culturally related, can be a problem if the test construction assumes that all different cultures know about that particular term. I may not be able to give you a good example right now, but think of different types of holidays for different cultures. Cinco de Mayo, for instance, may be understandable for one culture; for other cultures it may not. If you use that as a term and the others do not understand it, that may be a cultural bias.

Reliability and validity are very, very important topics; at the same time they’re complex measurement issues. I don’t want to go through technical terms, because they are statistical, very complex statistical terms. We teach those concepts in classrooms in graduate schools and there are, as I said, some very complex technical issues involved in those. But to put it in a way that we all understand: when you are talking about reliability, you are talking about consistency of assessment. When you use an assessment on one occasion and you use it on another occasion, you get the same results. If you do not get the same results, then there is some measurement error involved. If some questions are not clear, then on one occasion a student may have one kind of understanding, and on another occasion may have a different type of understanding. And again, language may be a very important part of this. If the language is complex, then the understanding may be different on different occasions and for different subgroups. And by validity we mean that the test should measure what it is intended to measure. If a test is going to measure, let’s say, content knowledge in math and science, it should not measure language, for instance. Again, I want to stress the importance of the complexity of language, or any other factor, culture, language or anything else, that is not related to the content being assessed. That actually creates validity problems. So in order for a test to be valid, it must measure whatever it is intended to measure, and we have to avoid any kind of unrelated factor that may cause validity problems for those kinds of tests.

First of all, when you are talking about alternative assessment, or authentic assessment, you have to have rubrics, because one of the main issues in alternative assessment is scoring. With multiple choice, anyone can score it and they come to the same answer, the same scores. Even computers, machines, can score it because the valid options are quite clear. But in alternative assessment there are open-ended questions, and the main issue for open-ended questions is scoring, because the scoring is to some extent a little bit more subjective. So in order to come up with objective scoring, you have to have rubrics. And to be more objective, of course, you have to have a fair rubric, a more defined, more objective rubric. You have to have clear instructions on how to score, and you have to give examples, sample items. For instance, if you have a rubric that runs from 1 to 4 or 1 to 5, you have to clearly define what you mean by 1, what you mean by 2, and so forth: how much of a response, what type of response you need to get in order to assign a 1, or 2, or 3, or 4. So it clearly defines the different levels. What is the purpose? What is the instruction? What do you expect from the students in order to assign a certain level of score? Then, once you create this rubric, you have to validate it. You have to actually give it to a group of scorers and ask them to use it to score a test, and then, if possible, compare the results with some other existing scores. For instance, you have grade point averages and, let’s say, you want to create a rubric for history or math. You have existing math and history scores to compare against, to see if the scores that are based on the scoring rubric are consistent with the previous scores. So it is a kind of quality assessment, concurrent validity, meaning that you have to correlate, you have to compare, your new scores with the valid, existing scores, if possible.
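As a rough illustration of the comparison described above, the following minimal sketch computes a Pearson correlation between new rubric scores and existing scores for the same students. The function name `pearson_r` and all of the numbers are hypothetical and chosen only for illustration; the interview does not specify a particular statistic, and a Pearson correlation is simply one common way to check this kind of consistency.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: rubric scores (1 to 5) and existing math grades
# for the same eight students, listed in the same order.
rubric_scores = [1, 2, 2, 3, 3, 4, 5, 5]
existing_grades = [55, 60, 68, 70, 75, 82, 90, 95]

r = pearson_r(rubric_scores, existing_grades)
print(f"correlation between rubric scores and existing grades: {r:.2f}")
```

A value close to 1 would suggest that the rubric ranks students in much the same way as the existing scores; a low value would be a reason to revisit the rubric definitions or the scoring instructions.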

OK, should we put it in some kind of content area, for instance math or science? (interruption)  In math, let’s say for Grade 4 students, you expect them to do multiplication and division well, and you expect them to come up with some explanation of how they did that. If that’s the case, then that would probably be good. But if they cannot explain what they have found, then that may be so-so, average, or poor, maybe, if they couldn’t even follow the instructions of the operation. So you need to define exactly what the expectations are. What do you expect the student to do? And again, when you write this rubric, you have to provide clear examples. Maybe provide some examples of good responses from students and just clearly say, “These are the types of things that deserve a good categorization or a good score. This is the type of response that is not so good.” So come up with as many examples as possible.

Anything has to have a base. Without a base, the type of scoring rubric you create may not be valid. So let’s say you want to create a rubric for history or geography. It has to be based on the curriculum, on the type of knowledge the students should have, and it has to be based on some kind of principle of measurement, principle of assessment: measurement theory and things like that. And again, it has to be based on some techniques, validation techniques and things like that. So I suggest that if teachers are going to start creating rubrics, they familiarize themselves with some basic concepts of measurement, of assessment in classrooms. There are good books that are really very, very useful to read and follow. I’m not talking about very technical books on the theory of measurement, because those may be too much, but there are some books that are written particularly for classroom teachers, and they will be extremely helpful to read and follow. See, for instance, what reliability and validity are. In general, what is the theory of measurement? What is meant by error of measurement? How do we avoid measurement errors, and things like that? So that’s something that I highly recommend to teachers.

Experience first. (interruption)  What teachers need to know in order to do a good assessment: first of all, of course, they need to know some basic concepts in assessment, and experience is very important. But experience alone and knowledge alone may not work. It has to be a combination of the two. Again, referring to some books that are designed for classroom teachers would be extremely helpful, just to gain some knowledge and understanding of assessment in general, especially because the theory of measurement and the measurement issues that we teach in measurement classes at universities are very different from what classroom teachers actually need to know. Classroom teachers need to know the practical aspects of their assessment. It is a combination of experience and existing theories.

What questions should teachers consider for evaluating the tests that they construct? The first question is relevance. How relevant are those questions? How clear are those questions? Content-wise, are they relevant to those students? Because opportunity to learn, for instance, is one of the most important issues in assessment. When teachers create questions, they have to think about whether the students have had the opportunity to learn those concepts; they have to know whether that content was actually covered in the classroom. So the match between classroom instruction and assessment is so important. The issue of culture and language is an extremely important part of that relevance. If a teacher teaches in a classroom where the majority of the students are limited English proficient, the broad questions to ask are, “Are these questions simple enough linguistically for them to understand? Are there any technical terms, linguistic terms, that may not be relevant or that are not part of the content and could be avoided?” Is there any kind of linguistic complexity or cultural bias that could be avoided? The question is, “Are those still there? Do we think about them?” And maybe just try the questions with a small group of students: interview a couple of the students, use the questions and ask them, “Are these sentences clear? Are the questions clear? Is there any part that you may not understand?” Just a short little interview with a couple of the students would be very helpful. Occasionally we do that, and we find that talking to one, two, or three students and asking them whether they followed the questions is extremely helpful.

On the choice between alternative and traditional assessment: by traditional assessment we usually mean multiple choice and standardized achievement tests. I think a balance between these two should be used. Both of them should be used, and both of them have some benefits, some usefulness. The good thing about traditional assessment is that it has representation of the content. When we have a large enough number of items, we know that those items are actually representative of the content. In alternative assessment, we go above and beyond recognition, because multiple choice, as you know, is mainly recognition, and we reach a higher level of thinking, but we are limited in the number of questions. We may not have enough questions to represent the different parts of the text or instruction. So a balance between those two would be great: using alternative assessment while understanding the limitations of alternative assessment. One of the big limitations in alternative assessment is that they are built for native English speakers, and the students with limited English proficiency may have difficulty with some of the language. So understanding those limitations and using them, again, in conjunction with authentic or alternative assessment would be the best thing to do. Another major limitation with traditional assessment is that of being only one single criterion. One of the dangers of using only one criterion is that, whatever that criterion is, it is not reliable. For assessment, the most important thing that we need to do is to use multiple criteria. Do not rely on a single criterion, whatever that criterion might be.

We have done some research, and I don’t want to name the particular test, but it was one of the very commonly used language proficiency tests, and we found that the assessment characteristics, the outcome of that measurement, were not adequate at all. The distribution, in statistical terms, was extremely skewed, meaning that even for students with limited English proficiency there was a ceiling effect. The test did not have enough power to distinguish between high-performing and low-performing students in language proficiency. One thing I wanted to say is that when we’re talking about students with limited English proficiency, we are talking about a group with very different characteristics, very different levels of knowledge. We really are not talking about one group that is all the same; they are very different. They range from very high levels of language proficiency to very low. Sometimes some students with limited English proficiency may even have higher language proficiency than native English speakers, but they are categorized as LEP, so there is a big range. When we are talking about that, we have to understand that we are not talking about one very homogeneous group of students. So some of these tests may not be able to actually assess this large range of ability. If we rely on only one of those tests, as I just told you in that particular case, we may not be able to. So there are major, major limitations with some of those existing tests, and if we use only one of those tests, we will be, how should I say that, in trouble. We’ll really be in trouble, because we should not trust a single test. So use more than one criterion, more than one technique. If we use, let’s say, a multiple-choice test, we also use interviews, we also use other types of assessment.
Teacher judgment is a valuable tool, because teachers have experience working with the students. They spend time with them. They know them. So teachers should use whatever criteria they have, but avoid using only a single criterion and basing their judgment on that single criterion.
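To make the ceiling effect mentioned above more concrete, here is a minimal sketch of two simple checks on a score distribution: the share of students at the maximum score, and the skewness of the scores. The function names `ceiling_share` and `skewness` and the scores themselves are hypothetical, invented only for illustration; they are not taken from the study described in the interview.

```python
def ceiling_share(scores, max_score):
    """Fraction of test takers who obtained the maximum possible score."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s >= max_score) / len(scores)


def skewness(scores):
    """Population skewness (third standardized moment).

    A clearly negative value means the scores pile up at the high end,
    with a tail toward the low end.
    """
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n  # variance
    m3 = sum((s - mean) ** 3 for s in scores) / n  # third central moment
    if m2 == 0:
        raise ValueError("all scores are identical; skewness is undefined")
    return m3 / m2 ** 1.5


# Hypothetical scores on a 20-point language proficiency test.
scores = [20, 20, 20, 19, 20, 18, 20, 20, 12, 20]

print(f"share at the maximum score: {ceiling_share(scores, 20):.0%}")
print(f"skewness: {skewness(scores):.2f}")
```

When most students sit at or near the maximum, the test cannot separate higher from lower proficiency within that group, which is one more reason to combine it with other measures such as interviews and teacher judgment.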

How can teachers provide a judgment that is useful in terms of measurement, a kind of scoring scale, and how can we formalize that? The best way would probably be to follow some kind of rubric, let’s say 1 to 5 or 1 to 10, or however you want to put it, and make it a little bit more objective. So a teacher has some kind of scoring instruction for herself or himself that says, for instance, “If a student has this level of proficiency, I give him or her a score of 5,” or something like that. So try to make it as objective as possible and try to convert it into some kind of score. Teachers can use whatever range they want, but they should write some kind of instruction so that if other teachers follow the same instruction, they come up with the same scores. So make it as objective, rather than subjective, as possible. The research literature suggests that teachers’ judgment is important, is reliable, and is a very, very good tool to use for assessment in conjunction with other assessment tools, such as traditional tests and others. It is very helpful.

Yeah, what are the downsides of performance assessment for language minority students? Performance assessments are important. They touch a higher level of mental process, of mental abilities. So they are very important, but at the same time there are some downsides. The first, especially for language minority students, is that they involve more language, because when we use alternative assessment we are usually talking about essays. Essays involve more language, not only in the questions but also in the response. In order for students to respond to an alternative assessment, they have to write. They have to use more language. So there is more language load, more language demand, in alternative assessment. And the issue of language becomes even more critical, more important, in alternative assessment. Teachers have to be aware of that. For multiple choice tests, or traditional tests as they are sometimes called, test makers sometimes try to avoid that, because they are making this test for a logical purpose for the students, making sure that they take out at least some of the complexity of the language. But for alternative assessment, that may not happen. So there are language issues, cultural issues, and a lack of representativeness, because one of the issues in assessment is that the assessment tools, the assessment questions, should be representative of the content, and when we have only two or three questions, that may not be a good representation. So there are major issues and major limitations with alternative assessment, and teachers have to be aware of those issues.

How do teachers strike the balance between alternative and traditional assessment? Those are, I should say, two different tools, sometimes completely different, but they are aimed at measuring the same thing, that is, knowledge, and a balance between the two, using both of them, would be extremely useful. That means knowing the limitations. Alternative assessments sometimes have the limitation of a lack of representation, and also more language and cultural load, while at the same time they measure a higher level of mental ability. Traditional assessment has the advantage of being more objective, the objectivity of the scoring, and also of more representation. So using both of them would be extremely helpful, but at the same time you have to be aware of assessment time. You don’t want to take too much of the students’ instruction time for assessment. This is again the teacher’s judgment: how much of the students’ time he or she wants to take for assessment. That’s another decision that teachers have to make.

On the issue of higher-order thinking in traditional and alternative assessment: we all know that multiple-choice items are mainly, not always, measuring recognition. In multiple choice, for instance, with four or five choices, students have to recognize which choice is the correct answer. Sometimes guessing is a big factor, because they may not know the right answer but they guess. It’s one of the issues in error of measurement in multiple-choice tests. But in alternative assessment we measure a higher level of thinking, a higher level of mental processes, because students have to respond, have to remember, have to write, have to come up with some answers. The answers are not there. So they have to think, they have to use their knowledge, they have to use their understanding, and they have to come up with good answers, unlike multiple choice, where they select from the options.

One of the main issues, the main concern for ESL students, and not just for ESL teachers but for all teachers, is language, because when we are talking about ESL students in general, language is the most important factor, the most important issue. In order to have a fair assessment for everyone, for all, we have to think of the content more than the language. Try to measure the content to the extent possible. Try to avoid any kind of complexity in the language of the test. How can we do that? Try to understand the difference between related and unrelated vocabulary. That is something the teacher judges, or maybe the teacher consults with other teachers, or maybe even consults with a linguistic expert or a content expert. Teachers are content experts. Just try to make a judgment about which vocabulary is related to the content and which is not, and try to avoid the vocabulary that is not related.

My experience as a person who came to this country for higher education, and what experiences I can share with you: I just have one story that is very interesting. I got my master’s degree from the University of Tehran and came here to Vanderbilt University, and I got a second master’s and a PhD in psychometrics. At the University of Tehran, of course, we had to read most of our textbooks in English, so when I arrived I had a good level of understanding of reading and writing in English, but not of speaking. In the first few months after I came here, I was with a group of fellow students. One of them asked me, “Do you mind if I smoke?” I didn’t understand the question. I thought it was polite always to say yes, so I said, “Yes,” and then the person actually didn’t smoke. After a year I realized I should have said, “No, I don’t mind.” But I said yes, so the person didn’t smoke. We have a lot of experiences like that. And I went to class for the first time, and the teacher gave me 400 pages to read for the first week, and it was extremely difficult for me. I really had to stay and read the materials, and I couldn’t read even a third of the materials that they expected us to read, so there were a lot of difficulties. Teachers of ESL students must realize that the students are not as fluent as the teachers are. It’s not their native language, and it takes some time for them. So teachers have to be patient. They have to understand the students’ limitations; they have to understand their cultural differences and things like that. If they do, then not only for assessment but for instruction and everything, they will find them to be good students. But if they don’t, ESL students, because of their limitations, are shy; they cannot relate.
So teachers have to be a little bit more accommodating and, by understanding the students’ limitations, try to work with them.

Actually both, because the culture has very different principles and things. The cultural differences are a very important issue, but when you mix them, language and culture, it becomes a very, very difficult mixture and it is very, very hard to really overcome the problems. But I should say language is the hardest part, because when you come here you cannot relate. Even though you think you know the language, you cannot speak the language. You know that others cannot understand you. That’s difficult.

Each culture, of course, has its own unique characteristics and things like that. In our culture, there is more emotion, more friendliness, I mean caring. For instance, if a friend asks a friend for anything, a good friend actually does that. It is less money oriented and more, how do I say that, emotional. When we come here, I’m not saying that that emotion doesn’t exist, but there is less of it. So these are the types of things: when you come here, you may not expect your friends to do as much as you would expect of them in our culture. Or family, for instance: fathers and mothers and brothers and sisters live with each other for a long time, even after they get married. They’re still one family. Here you see the same thing, but not with all families. Some families may not be as close. So there are differences.