LANGUAGE TESTS
The
previous chapters introduced a number of building blocks for designing language
tests. You now have a sense of where tests belong in the larger domain of assessment.
You have sorted through differences between formal and informal tests, formative
and summative tests, and norm- and criterion-referenced tests. You have traced
some of the historical lines of thought in the field of language assessment.
You have a sense of major current trends in language assessment, especially the
present focus on communicative and process-oriented testing that seeks to
transform tests from anguishing ordeals into challenging and intrinsically
motivating learning experiences. By now, certain foundational principles have
entered your vocabulary: practicality, reliability, validity, authenticity, and
washback. And you should now possess a few tools with which you can evaluate
the effectiveness of a classroom test.
TEST TYPES
The
first task you will face in designing a test for your students is to determine
the purpose for the test. Defining your purpose will help you choose the right
kind of test, and it will also help you to focus on the specific objectives of
the test. We will look first at two test types that you will probably not have
many opportunities to create as a classroom teacher (language aptitude tests and
language proficiency tests) and then at three types that you will almost certainly
need to create (placement tests, diagnostic tests, and achievement tests).
LANGUAGE APTITUDE TESTS
One
type of test, although admittedly not a very common one, predicts a person's success
prior to exposure to the second language. A language aptitude test is designed
to measure capacity or general ability to learn a foreign language and ultimate
success in that undertaking. Language aptitude tests are ostensibly designed to
apply to the classroom learning of any language. Two standardized aptitude
tests have been used in the United States: the Modern Language Aptitude Test
(MLAT) (Carroll & Sapon, 1958) and the Pimsleur Language Aptitude Battery
(PLAB) (Pimsleur, 1966). Both are English language tests and require students
to perform a number of language-related tasks. The MLAT, for example, consists
of five different tasks.
Tasks in the Modern
Language Aptitude Test
1.
Number learning: Examinees must learn a
set of numbers through aural input and then discriminate different combinations
of those numbers.
2.
Phonetic script: Examinees must learn a
set of correspondences between speech sounds and phonetic symbols.
3.
Spelling clues: Examinees must read
words that are spelled somewhat phonetically, and then select from a list the
one word whose meaning is closest to the "disguised" word.
4.
Words in sentences: Examinees are given
a key word in a sentence and are then asked to select a word in a second
sentence that performs the same grammatical function as the key word.
5.
Paired associates: Examinees must
quickly learn a set of vocabulary words from another language and memorize their
English meanings.
PROFICIENCY TESTS
If your aim is to test global competence in a language, then you are, in
conventional terminology, testing proficiency. A proficiency test is not limited
to any one course, curriculum, or single skill in the language; rather, it tests
overall ability. Proficiency tests have traditionally consisted of standardized
multiple-choice items on grammar, vocabulary, reading comprehension, and aural
comprehension. Sometimes a sample of writing is added, and more recent tests
also include oral production performance. As noted in the previous chapter,
such tests often have content validity weaknesses, but several decades of
construct validation research have brought us much closer to constructing
successful communicative proficiency tests.
A typical example of a standardized proficiency test is the Test of English as a Foreign
Language (TOEFL) produced by the Educational Testing Service. The TOEFL is
used by more than a thousand institutions of higher education in the United
States as an indicator of a prospective student's ability to undertake academic
work in an English-speaking milieu. The TOEFL consists of sections on listening
comprehension, structure (or grammatical accuracy), reading comprehension, and
written expression. The new computer-scored TOEFL announced for 2005 will also
include an oral production component. With the exception of its writing
section, the TOEFL (as well as many other large-scale proficiency tests) is
machine-scorable for rapid turnaround and cost effectiveness (that is, for
reasons of practicality).
Research is in progress (Bernstein et al., 2000) to determine, through the technology of
speech recognition, whether oral production performance can be adequately machine-scored.
(Chapter 4 provides a comprehensive look at the TOEFL and other standardized
tests.) A key issue in testing proficiency is how the constructs of language
ability are specified. The tasks that test-takers are required to perform must
be legitimate samples of English language use in a defined context. Creating
these tasks and validating them with research is a time-consuming and costly
process. Language teachers would be wise not to create an overall proficiency
test on their own. A far more practical method is to choose one of a number of
commercially available proficiency tests.
PLACEMENT TESTS
Certain
proficiency tests can act in the role of placement tests, the purpose of which
is to place a student into a particular level or section of a language curriculum
or school. A placement test usually, but not always, includes a sampling of the
material to be covered in the various courses in a curriculum; a student's performance
on the test should indicate the point at which the student will find material neither
too easy nor too difficult but appropriately challenging. The English as a
Second Language Placement Test (ESLPT) at San Francisco State University has
three parts. In Part I, students read a short article and then write a summary
essay. In Part II, students write a composition in response to an article. Part
III is multiple-choice: students read an essay and identify grammar errors in
it. The maximum time allowed for the test is three hours. Justification for
this three-part structure rests largely on the test's content validation. Most
of the ESL courses at San Francisco State involve a combination of reading and
writing, with a heavy emphasis on writing. The first part of the test acts as
both a test of reading comprehension and a test of writing (a summary). The second
part requires students to state opinions and to back them up, a task that forms
a major component of the writing courses.
DIAGNOSTIC TESTS
A
diagnostic test is designed to diagnose specified aspects of a language. A test
in pronunciation, for example, might diagnose the phonological features of
English that are difficult for learners and should therefore become part of a
curriculum. Usually, such tests offer a checklist of features for the
administrator (often the teacher) to use in pinpointing difficulties. A writing
diagnostic would elicit a writing sample from students that would allow the
teacher to identify those rhetorical and linguistic features on which the
course needed to focus special attention. Diagnostic and placement tests, as we
have already implied, may sometimes be indistinguishable from each other.
ACHIEVEMENT TESTS
An
achievement test is related directly to classroom lessons, units, or even a
total curriculum. Achievement tests are or should be limited to particular
material addressed in a curriculum within a particular time frame and are
offered after a course has focused on the objectives in question. Achievement
tests can also serve the diagnostic role of indicating what a student needs to
continue to work on in the future, but the primary role of an achievement test
is to determine whether course objectives have been met, and appropriate knowledge
and skills acquired, by the end of a period of instruction. Achievement tests
are often summative because they are administered at the end of a unit or term
of study. They also play an important formative role. An effective achievement
test will offer washback about the quality of a learner's performance in
subsets of the unit or course.
SOME PRACTICAL STEPS TO
TEST CONSTRUCTION
The
descriptions of types of tests in the preceding section are intended to help
you understand how to answer the first question posed in this chapter: What is
the purpose of the test? It is unlikely that you would be asked to design an
aptitude test or a proficiency test, but for the purposes of interpreting those
tests, it is important that you understand their nature. However, your opportunities
to design placement, diagnostic, and achievement tests (especially the latter) will
be plentiful. In the remainder of this chapter, we will explore the four remaining
questions posed at the outset, and the focus will be on equipping you with the
tools you need to create such classroom-oriented tests.
ASSESSING CLEAR,
UNAMBIGUOUS OBJECTIVES
In
addition to knowing the purpose of the test you're creating, you need to know
as specifically as possible what it is you want to test. Sometimes teachers
give tests simply because it's Friday of the third week of the course, and
after hasty glances at the chapter(s) covered during those three weeks, they
dash off some test items so that students will have something to do during the
class. This is no way to approach a test. Instead, begin by taking a careful
look at everything that you think your students should "know" or be
able to "do," based on the material that the students are responsible
for. In other words, examine the objectives for the unit you are testing.
Selected objectives for a unit in a low-intermediate integrated-skills course
Form-focused objectives (listening and speaking)
Students will
1. recognize and produce tag questions, with the correct grammatical form and
   final intonation pattern, in simple social conversations.
2. recognize and produce wh- information questions with correct final
   intonation pattern.
Communication skills (speaking)
Students will
3. state completed actions and events in a social conversation.
4. ask for confirmation in a social conversation.
5. give opinions about an event in a social conversation.
6. produce language with contextually appropriate intonation, stress, and rhythm.
Reading skills (simple essay or story)
Students will
7. recognize irregular past tense of selected verbs in a story or essay.
Writing skills (simple essay or story)
Students will
8. write a one-paragraph story about a simple event in the past.
9. use conjunctions so and because in a statement of opinion.
DRAWING UP TEST
SPECIFICATIONS
Test
specifications for classroom use can be a simple and practical outline of your test.
(For large-scale standardized tests [see Chapter 4] that are intended to be widely
distributed and therefore are broadly generalized, test specifications are more
formal and detailed.) In the unit discussed above, your specifications will
simply comprise (a) a broad outline of the test, (b) what skills you will test,
and (c) what the items will look like. Let's look at the first two in relation to
the midterm unit assessment already referred to above: (a) outline of the test
and (b) skills to be included. Because of the constraints of your curriculum,
your unit test must take no more than 30 minutes.
DEVISING TEST TASKS
Your
oral interview comes first, and so you draft questions to conform to the accepted
pattern of oral interviews (see Chapter 7 for information on constructing oral
interviews). You begin and end with nonscored items (warm-up and wind-down)
designed to set students at ease, and then sandwich between them items intended
to test the objective (level check) and a little beyond (probe).
Oral interview format
A. Warm-up: questions and comments
B. Level-check questions (objectives 3, 5, and 6)
   1. Tell me about what you did last weekend.
   2. Tell me about an interesting trip you took in the last year.
   3. How did you like the TV show we saw this week?
C. Probe (objectives 5, 6)
   1. What is your opinion about ______? (news event)
   2. How do you feel about ______? (another news event)
D. Wind-down: comments and reassurance
You are now ready to
draft other test items. To provide a sense of authenticity and interest, you
have decided to conform your items to the context of a recent TV sitcom that
you used in class to illustrate certain discourse and form-focused factors.
DESIGNING MULTIPLE-CHOICE TEST ITEMS
In the sample achievement test above, two of the five components (both of
the listening sections) specified a multiple-choice format for items. This was
a bold step to take. Multiple-choice items, which may appear to be the simplest
kind of item to construct, are extremely difficult to design correctly. Hughes
(2003, pp. 76-78) cautions against a number of weaknesses of multiple-choice
items: The technique tests only recognition knowledge. Guessing may have a considerable
effect on test scores. The technique severely restricts what can be tested. It
is very difficult to write successful items. Washback may be harmful.
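To make the point about guessing concrete, the following is a minimal sketch rather than anything taken from Hughes. It computes the score a test-taker could expect from blind guessing alone. The function name and the figures in the example (10 items, 4 options per item, 2 points per item) are illustrative assumptions, not specifications from this chapter.

```python
def expected_chance_score(num_items: int, options_per_item: int,
                          points_per_item: float = 1.0) -> float:
    """Expected score if every item is answered by blind guessing.

    Assumes (for illustration only) that each item has exactly one
    correct option and that all options are equally likely to be chosen.
    """
    if num_items < 0 or options_per_item < 1:
        raise ValueError("need num_items >= 0 and options_per_item >= 1")
    # Each item is answered correctly with probability 1 / options_per_item.
    return num_items * points_per_item / options_per_item


if __name__ == "__main__":
    # Hypothetical section: 10 four-option items worth 2 points each.
    print(expected_chance_score(10, 4, 2))  # 5.0 of a possible 20 points
```

Under these assumed figures, a quarter of the section's points could be earned with no knowledge at all, which is one reason guessing deserves attention when you interpret multiple-choice scores.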
SCORING, GRADING, AND
GIVING FEEDBACK
SCORING
As you design a classroom test, you must consider how the test will be scored
and graded. Your scoring plan reflects the relative weight that you place on
each section and items in each section. The integrated-skills class that we
have been using as an example focuses on listening and speaking skills with
some attention to reading and writing. Three of your nine objectives target
reading and writing skills. How do you assign scoring to the various components
of this test? Because oral production is a driving force in your overall
objectives, you decide to place more weight on the speaking (oral interview)
section than on the other three sections. Five minutes is actually a long time to
spend in a one-on-one situation with a student, and some significant information
can be extracted from such a session. You therefore designate 40 percent of the grade
to the oral interview. You consider the listening and reading sections to be
equally important, but each of them, especially in this multiple-choice format,
is of less consequence than the oral interview. So you give each of them a 20
percent weight.
That
leaves 20 percent for the writing section, which seems about right to you given
the time and focus on writing in this unit of the course. Your next task is to
assign scoring for each item. This may take a little numerical common sense,
but it doesn't require a degree in math. To make matters simple, you decide to
have a 100-point test in which the listening and reading items are each worth 2
points. The oral interview will yield four scores ranging from 5 to 1,
reflecting fluency, prosodic features, accuracy of the target grammatical
objectives, and discourse appropriateness. To weight these scores appropriately,
you will double each individual score and then add them together for a possible
total score of 40. (Chapters 4 and 7 will deal more extensively with scoring
and assessing oral production performance.) The writing sample has two scores:
one for grammar/mechanics (including the correct use of so and because) and one
for overall effectiveness of the message, each ranging from 5 to 1. Again, to
achieve the correct weight for writing, you will double each score and add them,
so the possible total is 20 points.
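The arithmetic of this scoring plan can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed scoring tool. The function and constant names are hypothetical, and the figure of 10 items in each of the listening and reading sections is inferred from the 20 percent weights and the 2 points per item rather than stated in the text.

```python
POINTS_PER_ITEM = 2      # listening and reading items are worth 2 points each
RATING_MULTIPLIER = 2    # each 1-5 rating is doubled


def rated_section_points(ratings, expected_count):
    """Double each 1-5 rating and add the results together."""
    if len(ratings) != expected_count:
        raise ValueError(f"expected {expected_count} ratings, got {len(ratings)}")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    return RATING_MULTIPLIER * sum(ratings)


def unit_test_total(listening_correct, reading_correct,
                    oral_ratings, writing_ratings, items_per_section=10):
    """Total score on the 100-point unit test described above.

    items_per_section=10 is an inference (20 points / 2 points per item),
    not a figure given in the text.
    """
    for correct in (listening_correct, reading_correct):
        if not 0 <= correct <= items_per_section:
            raise ValueError("number correct is out of range")
    listening = listening_correct * POINTS_PER_ITEM      # up to 20 points
    reading = reading_correct * POINTS_PER_ITEM          # up to 20 points
    # Oral interview: fluency, prosodic features, grammatical accuracy,
    # and discourse appropriateness (four ratings, up to 40 points).
    oral = rated_section_points(oral_ratings, 4)
    # Writing: grammar/mechanics and overall effectiveness (up to 20 points).
    writing = rated_section_points(writing_ratings, 2)
    return listening + reading + oral + writing


if __name__ == "__main__":
    # Hypothetical student: 8 listening and 7 reading items correct.
    print(unit_test_total(8, 7, [4, 3, 4, 5], [3, 4]))  # 16 + 14 + 32 + 14 = 76
```

One consequence of this plan is that, because every rating starts at 1, the lowest possible total is 12 rather than 0. That is worth keeping in mind when you convert totals into grades.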
GRADING
Your
first thought might be that assigning grades to student performance on this test
would be easy: just give an "A" for 90-100 percent, a "B" for 80-89
percent, and so on. Not so fast! Grading is such a thorny issue that all of
Chapter 11 is devoted to the topic. How you assign letter grades to this test
is a product of the country, culture, and context of this English classroom; institutional
expectations (most of them unwritten); explicit and implicit definitions of grades
that you have set forth; the relationship you have established with this class;
and student expectations that have been engendered in previous tests and quizzes
in this class. For the time being, then, we will set aside issues that deal with
grading this test in particular, in favor of the comprehensive treatment of
grading in Chapter 11.
GIVING FEEDBACK
A section on scoring
and grading would not be complete without some consideration of the forms in
which you will offer feedback to your students, feedback that you want to
become beneficial washback. In the example test that we have been referring to
here, which is not unusual in the universe of possible formats for periodic
EXERCISES
[Note: (I) Individual work; (G) Group or pair work; (C) Whole-class discussion.]
1. (I/C) Consult the MLAT website address on page 44 and obtain as much information as
   you can about the MLAT. Aptitude tests propose to predict one's performance in a
   language course. Review the rationale supporting such testing, and then
   summarize the controversy surrounding aptitude tests. What can you say about
   the validity and the ethics of aptitude testing?
2. (G) In pairs, each assigned to one type of test (aptitude, proficiency, placement,
   diagnostic, or achievement), create a list of broad specifications for the test
   type you have been assigned: What are the test criteria? What kinds of items
   should be used? How would you sample among a number of possible objectives?
3. (G) Look again at the discussion of objectives (page 49). In a small group, discuss
   the following scenario: In the case that a teacher is faced with more objectives
   than are possible to sample in a test, draw up a set of guidelines for choosing
   which objectives to include on the test and which ones to exclude. You might start
   with considering the issue of the relative importance of all the objectives in the
   context of the course in question. How does one adequately sample objectives?
4. (I/C) Figure 3.1 depicts various modes of elicitation and response. Are there other
   modes of elicitation that could be included in such a chart? Justify your additions
   with an example of each.
5. (G) Select a language class in your immediate environment for the following
   project: In small groups, design an achievement test for a segment of the course
   (preferably a unit for which there is no current test or for which the present
   test is inadequate). Follow the guidelines in this chapter.
6. (G) Find an existing, recently used standardized multiple-choice test for which
   there is accessible data on student performance. Calculate the item facility (IF)
   and item discrimination (ID) index for selected items. If there are no data for an
   existing test, select some items on the test and analyze the structure of those
   items in a distractor analysis to determine if they have (a) any bad distractors,
   (b) any bad stems, or (c) more than one potentially correct answer.
7. (I/C) On page 63, nine different options are listed for giving feedback to students
   on assessments. Review the practicality of each and determine the extent to which
   practicality (principally, more time expended) is justifiably sacrificed in order
   to offer better washback to learners.
FOR YOUR FURTHER READING
Carroll, John B. (1990). Cognitive abilities in foreign language aptitude: Then and
now. In Thomas S. Parry & Charles W. Stansfield (Eds.), Language aptitude
reconsidered. Englewood Cliffs, NJ: Prentice Hall Regents.
Carroll, the original developer of the MLAT, updates arguments for and against
some of the original cognitive hypotheses underlying the MLAT. In the same
volume, note articles by Oxford and by Ehrman contending that styles,
strategies, and personality may be further factors in the construct of language
aptitude.
Brown, James Dean.
Source:
Brown, H. Douglas.
2003. Language Assessment Principles and
Classroom Practices. San Francisco, California