I’m Thomas Romberg. I’m a professor at the University of Wisconsin, Madison.

Well, we don’t think of assessment in terms of at least four types of assessment. Let me start by talking about external assessment, an assessment that’s given by somebody other than the teacher in order to make judgments about their class or the school or whatever. That assessment practice began in the 1830s in Massachusetts, when the school board decided they couldn’t trust teachers’ judgment about students’ progress, and that they needed to have some series of questions that they gave to students in order to promote them to the next grade or whatever. And it became a part of the history of American education that we not only expect teachers to make judgments about kids’ progress (classroom assessment, such as giving grades and that kind of thing) but also have someone else externally make judgments as to whether or not the teacher did a good job. The kind of accountability issue that is now central in a lot of the current debates has a long history in American education. But the kinds of tests that were given then are not at all like the tests that are currently given. They were a series of questions that students needed to respond to, usually in some written form. The next step in the history of testing was the work associated with intelligence testing: the Binet, the Stanford-Binet, the Wechsler, and so on. Now those are clinical tests that are expensive and difficult to give, given on an individual basis in order to identify where people are with respect to some normal distribution, beginning with intelligence but later with respect to mathematical ability or reading ability or whatever. The third part in the history of external testing happened around World War I. The intelligence testing in depth was back at the turn of the last century, not this one.
But during World War I, one of the problems that the American military, the Army, had was that for the first time ever we had conscription. We, you know, dragged everybody off the street to be part of the army. And in order to make determinations of who ought to be an officer and who ought to be an infantryman and so on, the military decided they needed to have some quick way of judging how people fit, and developed what was then called “Army Alpha,” which is the first multiple-choice test. It was the beginning point of a quick and, as they referred to it, dirty approximation to intelligence. It was an attempt to try to build an aptitude measure. It proved to be very successful in terms of the initial breakout. As a consequence of that, the testing industry as we now know it gradually grew. The Eastern Ivy League schools following World War I decided that as they got applicants from all over the country (it might even be Provo, Utah or some place) they were unable to make the same kind of judgment about the applicants as they had about those from the prep schools in the East, whether they came from Choate or Boston Latin or some of the other prep schools there. So they formed what was called “The College Entrance Examination Board,” and they developed a company, Educational Testing Service, that developed the SAT, at that time called the Scholastic Aptitude Test, which is a multiple-choice test based on the ideas out of “Army Alpha.” Again that proved to be rather successful, useful as an initial screening device. You’re able then to place people on some normal distribution and indicate where they fit in that distribution, if you give enough tests. That became the kind of American notion of an indicator, a test to indicate very quickly, very simply where people were in some distribution.
And, of course, what happened next was that people started using that technology to develop achievement tests in mathematics, and science, and reading, and social studies. And the fourth event in the history of testing is post-World War II, when people began to realize, particularly in the writing and reading area, that the multiple-choice test was not a very good indicator of the underlying construct of writing or of reading. If you answered a series of multiple-choice questions after reading a passage, or after examining something and then being told, “Find the writing errors in this piece,” it didn’t tell you very much about whether you could read or whether you could write. And so in the 1960s there began what is now referred to as “Performance Assessment,” which is about posing students something of a broader nature and then having to go through a process of making judgments about the quality of the responses. Please note the efficiency of the multiple-choice test: I can count the number of correct responses by machine, by technology. I don’t have to spend a lot of time grading them and so on. I can simply count the number that were correct. It’s a very easy and cost-efficient way to gather some information, but the real question is, “Does it gather the right kind of information to make some of the judgments we want to make?” So in writing now, in reading and so on, people still use multiple-choice tests because they’re an efficient way to capture some features of a student’s knowledge in a domain. Can they spell? Can they do computation? Can they do certain kinds of relatively simple, routine tasks that schools want? We want to know that. But if you want to go beyond that, “Can they write a paragraph? Can they solve a complex problem? Can they organize a science experiment?”, a multiple-choice test isn’t going to give you that information.
What you need is additional information about “Can they use these ideas in solving problems, in writing a passage?” or something of that nature. So the notion is that in our judgments about students, in terms of external assessments, we need to go beyond just the low-level kind of skills to something beyond that. And that’s kind of where we are today: going through the whole variety of processes of trying to build tests that have more than multiple-choice questions. Now please note, I’m not saying there’s anything wrong with that as a part of what needs to be done. It’s an easy and efficient way to gather information about pieces of knowledge students might know. But whether they can use that knowledge to solve a problem or do something else is now what’s being asked in a lot of the fields. Now that’s really external testing and part of the history of that. There’s also clinical testing, making judgments about where people are with respect to various attributes or characteristics, which psychologists do an awful lot of. Business people do a lot of clinical testing in a variety of different ways. There’s classroom testing, what teachers do in making judgments about kids’ progress. So there’s, you know, a variety of things. And there’s classroom assessment, the ways in which teachers make judgments about individual students and their progress toward learning outcomes that they’re interested in. Currently much of my research deals with trying to help teachers make better judgments about kids’ progress, but that’s an area that has not been particularly well investigated and needs a fair amount of additional information.
But the external assessment, which is the thing that drives a lot of the current political arguments, is still with us in the judgment that says, “Look, we can’t really trust teachers to make statements about growth and change and where kids are. We need some externally designed and administered test that will serve as an accountability vehicle.”

OK. The question is, “Why should we feel differently about testing than we did half a century ago or the…” I mean, think of it in the following sense. Take a new development in a particular field, like the development of the multiple-choice test. It proved to be very efficient: easy to administer, easy to score, easy to summarize, easy to develop. Given the characteristics of that particular test, the whole field of psychometrics grew up as a consequence of how one deals with that kind of information. It’s very reliable, OK? It produces a certain kind of information. The difficulty is that it produces only information about a small part of what we expect students to accomplish as a result of schooling. Clearly, the examples within writing or reading make it very clear that if you want to test students in their ability to write, you don’t ask them a series of questions about writing, you ask them to write. Similarly in mathematics, if you want students to learn to solve non-routine problems, you don’t ask them a number of questions about “Can you do this, that?” You know, “Can you identify a particular figure? Can you describe something?” You want to say, “Here’s a problem, can you solve it?” And, you know, “the mismeasurement of math” is basically saying that the picture one gets from the kind of assessments that are typically given, the standardized tests, is a very shallow picture of what students are able to do. If you really want to know what students are able to do, to use the mathematical ideas, to be able to write, to be able to synthesize something, you’re not going to get it in that kind of test format.

But because it is easy and because it’s a rough indicator and because it’s very reliable, people have a tendency to say, “Well, it costs too much to do the other. This will be the easiest thing as an initial indicator.” If it’s taken for its original purpose, like “Army Alpha” was, as a rough indicator, that’s fine, but then it needs to be added to.

Well, it’s true that in almost any field, when something new comes along, whether it be the standardized test or the multiple-choice test that allows you to place people in terms of some percentile rank in a norm population, it becomes a very useful tool, something that people haven’t had before, and they can begin to use it. We do that all the time. We invent a handheld calculator and the next thing you know people are using it for a whole lot of things. It becomes a very efficient tool to do certain things, but it doesn’t solve all the problems; it certainly does a piece of the problem. And that historically happens in all fields. We invent something, and everybody gets excited. I’ve been through a lot of this over my 50 years of being involved in education now, and, you know, we thought about that in terms of television. I commented earlier that I taught mathematics over television for a year. We thought that was going to be the savior of education. OK? We now think the computer is the savior, and it’s been around for 25 years or more. It hasn’t changed much of what’s happening in classrooms in lots of ways, because it’s a very efficient tool to do certain things. New tools are useful and helpful, but they don’t solve the problem of “Are you going to get kids to learn to, for example, solve non-routine problems?” It’s a tool that allows you to do some problems you couldn’t do before, but it doesn’t magically change what’s important for students to learn about writing or reading or mathematics or science.

The history of education in this country is clearly marked by periods of arguments about the need to reform, the need to change, and the counterreaction of saying let’s go back to the basics, or let’s go back to something we did better in the past. I mean, we go back and forth on this in several ways. The current reform movement is a consequence in large part of the questions raised in the early 1980s by “A Nation at Risk” and “Educating Americans for the 21st Century,” basically arguing from an economic point of view that in mathematics and science in particular, we were falling behind countries in the rest of the world in terms of our general preparation of the population for the shift in emphasis from an industrial society to an information society. I mean, that’s all part of the background to it. So the reform movement in mathematics is basically an attempt to upgrade the kind of mathematics we’re expecting students to learn, first as a consequence of the change in technology, second as a consequence of what we know about how human beings learn, and third as a consequence of our breaking out of isolation and understanding what is being done in other countries. Putting those ideas together, back in the early 1980s there was a call for reform and a call for change. And for the last 20 years we’ve been involved in that. Now clearly, part of the argument is related to assessment. Part of the initial rationale for why we were falling behind came from the international comparison studies, the second international math comparison, which basically says, you know, we’re kind of a third-world country in terms of our overall mathematical performance. What are we going to do about it? So the reform movement over the past two decades has been gradually growing.
Part of it is associated with saying, “Hey, our kids aren’t doing well enough” in contrast to students in the rest of the world, and part of it is saying, “Hey, we really need to rethink the mathematics we expect all students to learn and how they’re to learn it.” It’s not that we had answers, but there was a problem to be solved and people have been working at it. Now one component of that problem is that, yes, you want changes in the curriculum. Yes, you want changes in the way in which mathematics is taught. Yes, you want changes in the way in which you judge mathematical performance. And so the question now becomes, “Oh, well, that’s how teachers make judgments about kids, but also how external tests are organized, delivered, given, and so on.” And so all of it is a question of making some changes to reflect the underlying changes that are involved in shifting from an industrial to an information society.

The cross-national studies began shortly after Sputnik, out of questions about, you know, wanting to be competitive with other countries. The first international study back in the ’60s basically showed that American kids in math and science were not learning the same mathematics at the same level as those in other countries. We tend to be very provincial, historically provincial. That is, we assume we’re, you know, a great country with a powerful society, so we must be doing a good job in education. And it’s a kind of shock when you find their kids are doing something different. We were very provincial, not paying much attention to what other people did back in the ’60s. Gradually what’s happened over the last few years is that we have begun to examine in some depth what other people do. You hear lots about what the Japanese do, and now the Singapore kids. And studies have been done looking at kids from the Netherlands and from Germany and other industrialized nations that we might be competing with industrially. First of all you learn that they don’t teach the same mathematics we teach. They don’t put the big emphasis on computational skills. They tend to teach a much more integrated mathematics; notions of algebra and geometry and statistics permeate the curriculum early. There’s no such program as we have had historically at the middle school, where we taught arithmetic up through roughly grade 6, and then in grades 7 and 8 (or 6, 7, and 8), as has been discussed in the literature, you repeat an awful lot of what was done the first six years. It’s not really very interesting. It’s making sure everybody knows the arithmetic skills that were important for somebody back in the 1920s or so. So the middle school was an area that really needed a lot of changing.
You look at other countries and they’ve got a lot of algebra, a lot of geometry and other things in their curriculum. Have for years. OK? They don’t teach a year-long course in algebra or geometry. I mean, their organization at the high school isn’t the same at all. So the question then becomes, is that the reason their kids are doing better? It’s part of it, not all of it. Part of it also has to do with the way in which curriculum is organized. I mean, we have a system with a very differentiated policy with respect to education. We have 15,000 or 16,000 different school districts that all make individual decisions. Most other countries have a Minister of Education who makes the basic decisions for the local schools, school districts, and so on about what is going to be emphasized and how it is going to be taught. So part of the issue is that we have a very differentiated society making these decisions, and historically what’s grown up is the kind of things that are assumed to be needed, particularly in the high school math program. These are courses that are assumed to be needed for university or college entrance. You know, this is the college entrance track. Other countries don’t do that in quite the same way. They begin to differentiate students at about age 14 or 15 and then separate them in a variety of ways. The other thing we’ve learned is that other countries have never bought into the multiple-choice test as a vehicle for making reasonable assessments. They see the value from an economic point of view. The tests are easy to administer and score and all of that, but those countries have never believed that they’re very valid for what it is they wanted. It’s interesting to note that the international studies usually involved have used multiple-choice tests because of their efficiency.
And countries like Japan or even Singapore or, you know, many of the European countries who do well on those tests say, “Yeah, but this doesn’t measure what we teach our kids.” The Japanese are very unhappy with their performance on the international studies, even though they rank right at the top, because they say, “That’s not what we want our kids to know.” OK? It’s interesting, and we’d like to compete with them, right? We’d like to be up there and say that’s a good indicator. There is a new international test that was given this spring. It’s called “The Program for International Student Assessment,” PISA, done by the Organization for Economic Cooperation and Development, OECD, in Paris. This test is a consequence of the concern of many of the other industrialized nations about TIMSS and about SIMS and the earlier studies, saying, you know, “We no longer want to administer the American kind of test. We want to administer a test that more reflects what we think kids ought to have learned and how they approach it.” So that new international test was administered this spring and results will be out. But please note it’s an expensive proposition to now go through and have people making judgments about the performance on rather open-ended tasks, where you have to have two judges go through and come to agreement that this is the appropriate score, and all of these performance things that we think are critical. And the Europeans and the other industrialized nations say, “This is a much fairer picture of mathematics performance or achievement than what we’ve typically done.” We’re making progress.

Well, we don’t do everything other countries do because of the history of our educational system and what we’re trying to do. Historically we have had, and most countries have, a common school up through a particular grade. Everybody studies basically the same things and then at some point they begin to differentiate. They begin to say some students need to begin to specialize, and so we differentiate in certain ways. Historically in the United States we had the common school that went from grade 1 to grade 8, in which we assumed all students would be in school and certain skills (reading, writing, arithmetic, etc.) would be the common understanding. And then until the Great Depression of the 1930s, only a few of the students who finished grade 8 went on to high school. High school was college preparatory, or there might be some technical training or something else. But we differentiated right after grade 8. The Depression changed that for us. I mean, kids when they finished grade 8 couldn’t go out and find a job. There weren’t jobs to be had. You had to do something with them, so over time we created the comprehensive high school kind of thing, the junior high and the high school, and some way to keep a larger proportion of the students in that track. Most European countries keep students a little longer in a common track, like up through about grade 9 or something of that nature. And then they differentiate, but when they differentiate they differentiate into schools. For example, in the Netherlands, at that point you shift into a school that might have a math/science emphasis or might be liberal arts, both college bound. Or it might be industrial, or it might be apprentice training. You know, there are a number of different ones, but you’re physically in different schools doing different things after that.
We’re not in a position socially to adopt the kind of things that other countries do, for a whole variety of reasons. I work extensively now with a group of scholars from the Freudenthal Institute at the University of Utrecht, and they’re working with us in the design of a number of instructional materials and tests and other things based on some of their ideas. They find the American system to be very strange in lots of ways. The local school boards, the decisions about local things, the emphasis on athletics; I mean, a whole lot of things that just aren’t part of their culture. There’s the notion that we have teachers who teach all subjects for the first six or seven years; you know, they teach every subject to every kid in their class. You know, the self-contained classroom. “Why do you do that?” is the kind of question that is raised. Well, these are historical traditions, and traditions are awfully hard to change. Even the tradition of algebra, geometry, advanced algebra is a sequence which European countries all think is absurd. “Why would you do it that way?” You know, that isn’t going to change. We may include a little more geometry in an algebra course and a little more algebra in a geometry course, but we’re still very likely going to label them algebra, geometry, advanced algebra, because that’s what schools think universities want on the transcripts when students go in for freshman admissions.

Well, as chairman of the group that produced the Curriculum and Evaluation Standards for the National Council of Teachers of Mathematics, I obviously think that’s an important contribution to current American education in the reform efforts. What we were trying to do was to paint a vision for teachers about where we ought to be heading. Not a recipe to follow, but a vision, a sense of here are some things that people need to start looking at. It grew directly out of the arguments about the problems associated with teaching and learning in math and science in “A Nation at Risk” and in “Educating Americans for the 21st Century.” It was an attempt to say, “Hey, this is a general picture of what we ought to be thinking about. We ought to be thinking about mathematics in a slightly different way because the technology has made it possible for us to do some different things, because we now know a lot more about human learning, and because we know a little bit more about what other countries are doing and maybe we can learn a little bit from them as well. And here are some ideas that ought to be there.” It’s proven to be sometimes a little more than we thought, and people have taken it as a bible or as, you know, a set of recipes to be followed, and it was not intended as that. It was intended as a starting point for people to think about the teaching of algebra or the teaching of fractions or whatever and rethink what might happen. It’s proven to be much more powerful in some respects. All 50 states have made changes in their state curriculum plans and organization and frameworks in light of the standards. Some better than others, but it’s clearly had an influence there.
It’s had an influence with respect to the National Science Foundation and their funding of several curriculum projects that say, you know, “If you’re really thinking that this is the direction we ought to go, then here are some kinds of ideas that teachers might want to use.” Clearly the assessment issues became central, and NCTM decided to produce a separate document called “Assessment Standards” that says, “Look, if we’re really going to assess kids for different purposes, both classroom assessments, so teachers can make judgments about what to do next and how to judge kids, and external assessments, to judge student progress or classroom progress or accountability, here are things to think about.” But the term “standard” is an interesting one. It has several different connotations in our language, and one of them is that a standard is like the flag you carry in front of the troops. And that was what we had in mind. That is, not as a set of criteria to be met, but as a vision, as a starting point for the kind of reform we have in mind.

The term “assessment literacy,” meaning understanding and being able to use ideas about assessment, is an important idea in what you’re talking about. To be literate in an area means you not only understand the procedures, but you understand the situations in which those procedures can be used. When we wrote the assessment standards for the National Council of Teachers of Mathematics, we tried to portray that in a variety of ways and say, “Look, you want to gather evidence about what students can do. And so what you want to do is think about what sources of evidence are available for different purposes.” You want to judge how a lesson works so that you can plan tomorrow’s lesson. You need to observe kids; you need to listen to what they’re doing. You need to be able to organize that information and say, “Oh, I guess maybe I ought to shift emphasis tomorrow a little bit or…” You know, a lot of those decisions are made on the run by a teacher, but they ought to be aware that listening to students, seeing what they do, hearing and understanding their questions, are a starting point for making instructional decisions. Second, if you want to be able to judge progress, you need to know not only what the intent of this lesson or this unit or this chapter is, but how that fits in a broader picture over some year or two years or five years. This is a part of student growth and development. And that’s a different purpose, and a different kind of evidence might be there, and simply giving them a grade isn’t what it’s about. It’s saying, “Where are they with respect to this?” Likewise, understanding the purpose and intent of external assessment: accountability issues, being able to demonstrate for others. I mean, the statement, you know, “Accountability is with us.” It’s been with us for almost 200 years; it’s going to be with us.
And teachers need to understand that’s important. But, you know, the best instrument is not going to be a complete picture; it’s going to be an indicator. It’s going to be able to say something as you aggregate across students and across pieces of information; this is a summary picture, not a very… But they ought to understand that these are all different kinds of purposes. You know, you use information to make different decisions, as long as you’re saying the information is valid for making that kind of decision, and that it’s reliable in the sense that if tomorrow I wanted to gather other evidence, it would be very similar to that. That’s what you’re looking for.

Well, the other comment that I guess I would like to make with respect to the standards and reform movement is that there’s a tendency among policymakers and administrators and others to think of the movement this way: “Well, if we can’t get it done in a year or a few months, then what’s this all about?” When we wrote the NCTM Curriculum and Evaluation Standards, we figured it would take a generation of teachers. It would be 20, 25 years before many of the ideas came into practice. But these are ideas that need to be discussed, argued, modified, extended, adapted in a variety of ways to meet local conditions in local situations. It’s not a panacea. You know, it’s not a recipe to be followed; it’s a set of ideas to be thought about.