Tugas Language Assessment: Assignment 5

LANGUAGE TESTS

The previous chapters introduced a number of building blocks for designing language tests. You now have a sense of where tests belong in the larger domain of assessment. You have sorted through differences between formal and informal tests, formative and summative tests, and norm- and criterion-referenced tests. You have traced some of the historical lines of thought in the field of language assessment. You have a sense of major current trends in language assessment, especially the present focus on communicative and process-oriented testing that seeks to transform tests from anguishing ordeals into challenging and intrinsically motivating learning experiences. By now, certain foundational principles have entered your vocabulary: practicality, reliability, validity, authenticity, and washback. And you should now possess a few tools with which you can evaluate the effectiveness of a classroom test.

TEST TYPES

The first task you will face in designing a test for your students is to determine the purpose of the test. Defining your purpose will help you choose the right kind of test, and it will also help you to focus on the specific objectives of the test. We will look first at two test types that you will probably not have many opportunities to create as a classroom teacher (language aptitude tests and language proficiency tests) and then at three types that you will almost certainly need to create (placement tests, diagnostic tests, and achievement tests).

LANGUAGE APTITUDE TESTS

One type of test, although admittedly not a very common one, predicts a person's success prior to exposure to the second language. A language aptitude test is designed to measure capacity or general ability to learn a foreign language and ultimate success in that undertaking. Language aptitude tests are ostensibly designed to apply to the classroom learning of any language. Two standardized aptitude tests have been used in the United States: the Modern Language Aptitude Test (MLAT) (Carroll & Sapon, 1958) and the Pimsleur Language Aptitude Battery (PLAB) (Pimsleur, 1966). Both are English language tests and require students to perform a number of language-related tasks. The MLAT, for example, consists of five different tasks.

Tasks in the Modern Language Aptitude Test

1. Number learning: Examinees must learn a set of numbers through aural input and then discriminate different combinations of those numbers.

2. Phonetic script: Examinees must learn a set of correspondences between speech sounds and phonetic symbols.

3. Spelling clues: Examinees must read words that are spelled somewhat phonetically, and then select from a list the one word whose meaning is closest to the "disguised" word.

4. Words in sentences: Examinees are given a key word in a sentence and are then asked to select a word in a second sentence that performs the same grammatical function as the key word.

5. Paired associates: Examinees must quickly learn a set of vocabulary words from another language and memorize their English meanings.

PROFICIENCY TESTS

If your aim is to test global competence in a language, then you are, in conventional terminology, testing proficiency. A proficiency test is not limited to any one course, curriculum, or single skill in the language; rather, it tests overall ability. Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, and aural comprehension. Sometimes a sample of writing is added, and more recent tests also include oral production performance. As noted in the previous chapter, such tests often have content validity weaknesses, but several decades of construct validation research have brought us much closer to constructing successful communicative proficiency tests.

A typical example of a standardized proficiency test is the Test of English as a Foreign Language (TOEFL) produced by the Educational Testing Service. The TOEFL is used by more than a thousand institutions of higher education in the United States as an indicator of a prospective student's ability to undertake academic work in an English-speaking milieu. The TOEFL consists of sections on listening comprehension, structure (or grammatical accuracy), reading comprehension, and written expression. The new computer-scored TOEFL announced for 2005 will also include an oral production component. With the exception of its writing section, the TOEFL (as well as many other large-scale proficiency tests) is machine-scorable for rapid turnaround and cost effectiveness (that is, for reasons of practicality).

Research is in progress (Bernstein et al., 2000) to determine, through the technology of speech recognition, whether oral production performance can be adequately machine-scored. (Chapter 4 provides a comprehensive look at the TOEFL and other standardized tests.) A key issue in testing proficiency is how the constructs of language ability are specified. The tasks that test-takers are required to perform must be legitimate samples of English language use in a defined context. Creating these tasks and validating them with research is a time-consuming and costly process. Language teachers would be wise not to create an overall proficiency test on their own. A far more practical method is to choose one of a number of commercially available proficiency tests.

PLACEMENT TESTS

Certain proficiency tests can act in the role of placement tests, the purpose of which is to place a student into a particular level or section of a language curriculum or school. A placement test usually, but not always, includes a sampling of the material to be covered in the various courses in a curriculum; a student's performance on the test should indicate the point at which the student will find material neither too easy nor too difficult but appropriately challenging. The English as a Second Language Placement Test (ESLPT) at San Francisco State University has three parts. In Part I, students read a short article and then write a summary essay. In Part II, students write a composition in response to an article. Part III is multiple-choice: students read an essay and identify grammar errors in it. The maximum time allowed for the test is three hours. Justification for this three-part structure rests largely on the test's content validation. Most of the ESL courses at San Francisco State involve a combination of reading and writing, with a heavy emphasis on writing. The first part of the test acts as both a test of reading comprehension and a test of writing (a summary). The second part requires students to state opinions and to back them up, a task that forms a major component of the writing courses.

DIAGNOSTIC TESTS

A diagnostic test is designed to diagnose specified aspects of a language. A test in pronunciation, for example, might diagnose the phonological features of English that are difficult for learners and should therefore become part of a curriculum. Usually, such tests offer a checklist of features for the administrator (often the teacher) to use in pinpointing difficulties. A writing diagnostic would elicit a writing sample from students that would allow the teacher to identify those rhetorical and linguistic features on which the course needed to focus special attention. Diagnostic and placement tests, as we have already implied, may sometimes be indistinguishable from each other.

ACHIEVEMENT TESTS

An achievement test is related directly to classroom lessons, units, or even a total curriculum. Achievement tests are (or should be) limited to particular material addressed in a curriculum within a particular time frame and are offered after a course has focused on the objectives in question. Achievement tests can also serve the diagnostic role of indicating what a student needs to continue to work on in the future, but the primary role of an achievement test is to determine whether course objectives have been met, and appropriate knowledge and skills acquired, by the end of a period of instruction. Achievement tests are often summative because they are administered at the end of a unit or term of study. They also play an important formative role. An effective achievement test will offer washback about the quality of a learner's performance in subsets of the unit or course.

SOME PRACTICAL STEPS TO TEST CONSTRUCTION

The descriptions of types of tests in the preceding section are intended to help you understand how to answer the first question posed in this chapter: What is the purpose of the test? It is unlikely that you would be asked to design an aptitude test or a proficiency test, but for the purposes of interpreting those tests, it is important that you understand their nature. However, your opportunities to design placement, diagnostic, and achievement tests (especially the latter) will be plentiful. In the remainder of this chapter, we will explore the four remaining questions posed at the outset, and the focus will be on equipping you with the tools you need to create such classroom-oriented tests.

ASSESSING CLEAR, UNAMBIGUOUS OBJECTIVES

In addition to knowing the purpose of the test you're creating, you need to know as specifically as possible what it is you want to test. Sometimes teachers give tests simply because it's Friday of the third week of the course, and after hasty glances at the chapter(s) covered during those three weeks, they dash off some test items so that students will have something to do during the class. This is no way to approach a test. Instead, begin by taking a careful look at everything that you think your students should "know" or be able to "do," based on the material that the students are responsible for. In other words, examine the objectives for the unit you are testing.

Selected objectives for a unit in a low-intermediate integrated-skills course

Form-focused objectives (listening and speaking)

Students will

1. recognize and produce tag questions, with the correct grammatical form and final intonation pattern, in simple social conversations.

2. recognize and produce wh-information questions with correct final intonation pattern.

Communication skills (speaking)

Students will

3. state completed actions and events in a social conversation.

4. ask for confirmation in a social conversation.

5. give opinions about an event in a social conversation.

6. produce language with contextually appropriate intonation, stress, and rhythm.

Reading skills (simple essay or story)

Students will

7. recognize irregular past tense of selected verbs in a story or essay.

Writing skills (simple essay or story)

Students will

8. write a one-paragraph story about a simple event in the past.

9. use conjunctions so and because in a statement of opinion.

DRAWING UP TEST SPECIFICATIONS

Test specifications for classroom use can be a simple and practical outline of your test. (For large-scale standardized tests [see Chapter 4] that are intended to be widely distributed and therefore are broadly generalized, test specifications are more formal and detailed.) In the unit discussed above, your specifications will simply comprise (a) a broad outline of the test, (b) what skills you will test, and (c) what the items will look like. Let's look at the first two in relation to the midterm unit assessment already referred to above: (a) outline of the test and (b) skills to be included. Because of the constraints of your curriculum, your unit test must take no more than 30 minutes.

DEVISING TEST TASKS

Your oral interview comes first, and so you draft questions to conform to the accepted pattern of oral interviews (see Chapter 7 for information on constructing oral interviews). You begin and end with nonscored items (warm-up and wind-down) designed to set students at ease, and then sandwich between them items intended to test the objective (level check) and a little beyond (probe).

Oral interview format

A. Warm-up: questions and comments

B. Level-check questions (objectives 3, 5, and 6)

1. Tell me about what you did last weekend.

2. Tell me about an interesting trip you took in the last year.

3. How did you like the TV show we saw this week?

C. Probe (objectives 5, 6)

1. What is your opinion about ______? (news event)

2. How do you feel about ______? (another news event)

D. Wind-down: comments and reassurance

You are now ready to draft other test items. To provide a sense of authenticity and interest, you have decided to conform your items to the context of a recent TV sitcom that you used in class to illustrate certain discourse and form-focused factors.

DESIGNING MULTIPLE-CHOICE TEST ITEMS

In the sample achievement test above, two of the five components (both of the listening sections) specified a multiple-choice format for items. This was a bold step to take. Multiple-choice items, which may appear to be the simplest kind of item to construct, are extremely difficult to design correctly. Hughes (2003, pp. 76-78) cautions against a number of weaknesses of multiple-choice items:

1. The technique tests only recognition knowledge.

2. Guessing may have a considerable effect on test scores.

3. The technique severely restricts what can be tested.

4. It is very difficult to write successful items.

5. Washback may be harmful.

SCORING, GRADING, AND GIVING FEEDBACK

SCORING

As you design a classroom test, you must consider how the test will be scored and graded. Your scoring plan reflects the relative weight that you place on each section and on the items in each section. The integrated-skills class that we have been using as an example focuses on listening and speaking skills with some attention to reading and writing. Three of your nine objectives target reading and writing skills. How do you assign scoring to the various components of this test? Because oral production is a driving force in your overall objectives, you decide to place more weight on the speaking (oral interview) section than on the other three sections. Five minutes is actually a long time to spend in a one-on-one situation with a student, and some significant information can be extracted from such a session. You therefore assign 40 percent of the grade to the oral interview. You consider the listening and reading sections to be equally important, but each of them, especially in this multiple-choice format, is of less consequence than the oral interview. So you give each of them a 20 percent weight.

That leaves 20 percent for the writing section, which seems about right to you given the time and focus on writing in this unit of the course. Your next task is to assign scoring for each item. This may take a little numerical common sense, but it doesn't require a degree in math. To make matters simple, you decide to have a 100-point test in which the listening and reading items are each worth 2 points. The oral interview will yield four scores ranging from 5 to 1, reflecting fluency, prosodic features, accuracy of the target grammatical objectives, and discourse appropriateness. To weight these scores appropriately, you will double each individual score and then add them together for a possible total score of 40. (Chapters 4 and 7 will deal more extensively with scoring and assessing oral production performance.) The writing sample has two scores: one for grammar/mechanics (including the correct use of so and because) and one for overall effectiveness of the message, each ranging from 5 to 1. Again, to achieve the correct weight for writing, you will double each score and add them, so the possible total is 20 points.
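The arithmetic of this scoring plan can be sketched in a few lines of Python. This is only an illustration, not part of the original chapter: the function and category names are hypothetical, and the figure of 10 items per listening or reading section is an assumption inferred from the 20-point weight and the 2 points awarded per item.

```python
# Sketch of the 100-point scoring plan described above.
# All names are hypothetical and chosen for illustration only.

ORAL_CATEGORIES = ("fluency", "prosodic_features", "grammatical_accuracy",
                   "discourse_appropriateness")
WRITING_CATEGORIES = ("grammar_mechanics", "overall_effectiveness")
POINTS_PER_ITEM = 2       # each listening or reading item
ITEMS_PER_SECTION = 10    # assumption: 20 points / 2 points per item


def rated_section_score(ratings, categories):
    """Double each 1-5 rating and add the doubled ratings together."""
    total = 0
    for category in categories:
        rating = ratings[category]
        if not 1 <= rating <= 5:
            raise ValueError(f"{category} rating must be between 1 and 5")
        total += 2 * rating
    return total


def item_section_score(num_correct):
    """Award 2 points for every correct listening or reading item."""
    if not 0 <= num_correct <= ITEMS_PER_SECTION:
        raise ValueError("number of correct items is out of range")
    return num_correct * POINTS_PER_ITEM


def total_score(oral, listening_correct, reading_correct, writing):
    """Combine the four sections into a score out of 100."""
    return (rated_section_score(oral, ORAL_CATEGORIES)
            + item_section_score(listening_correct)
            + item_section_score(reading_correct)
            + rated_section_score(writing, WRITING_CATEGORIES))


if __name__ == "__main__":
    oral = {"fluency": 4, "prosodic_features": 3,
            "grammatical_accuracy": 4, "discourse_appropriateness": 5}
    writing = {"grammar_mechanics": 4, "overall_effectiveness": 3}
    # 32 (oral) + 16 (listening) + 18 (reading) + 14 (writing) = 80
    print(total_score(oral, 8, 9, writing))
```

With top ratings in every category and all items correct, the sections contribute 40, 20, 20, and 20 points, which matches the intended weights. Because every rating starts at 1 rather than 0, the lowest possible oral and writing totals under this plan are 8 and 4 points.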

GRADING

Your first thought might be that assigning grades to student performance on this test would be easy: just give an "A" for 90-100 percent, a "B" for 80-89 percent, and so on. Not so fast! Grading is such a thorny issue that all of Chapter 11 is devoted to the topic. How you assign letter grades to this test is a product of the country, culture, and context of this English classroom, institutional expectations (most of them unwritten), explicit and implicit definitions of grades that you have set forth, the relationship you have established with this class, and student expectations that have been engendered in previous tests and quizzes in this class. For the time being, then, we will set aside issues that deal with grading this test in particular, in favor of the comprehensive treatment of grading in Chapter 11.

GIVING FEEDBACK

A section on scoring and grading would not be complete without some consideration of the forms in which you will offer feedback to your students, feedback that you want to become beneficial washback. In the example test that we have been referring to here, which is not unusual in the universe of possible formats for periodic

EXERCISES

[Note: (I) Individual work; (G) Group or pair work; (C) Whole-class discussion.]

1. (I/C) Consult the MLAT website address on page 44 and obtain as much information as you can about the MLAT. Aptitude tests propose to predict one's performance in a language course. Review the rationale supporting such testing, and then summarize the controversy surrounding aptitude tests. What can you say about the validity and the ethics of aptitude testing?

2. (G) In pairs, each assigned to one type of test (aptitude, proficiency, placement, diagnostic, or achievement), create a list of broad specifications for the test type you have been assigned: What are the test criteria? What kinds of items should be used? How would you sample among a number of possible objectives?

3. (G) Look again at the discussion of objectives (page 49). In a small group, discuss the following scenario: In the case that a teacher is faced with more objectives than are possible to sample in a test, draw up a set of guidelines for choosing which objectives to include on the test and which ones to exclude. You might start with considering the issue of the relative importance of all the objectives in the context of the course in question. How does one adequately sample objectives?

4. (I/C) Figure 3.1 depicts various modes of elicitation and response. Are there other modes of elicitation that could be included in such a chart? Justify your additions with an example of each.

5. (G) Select a language class in your immediate environment for the following project: In small groups, design an achievement test for a segment of the course (preferably a unit for which there is no current test or for which the present test is inadequate). Follow the guidelines in this chapter.

6. (G) Find an existing, recently used standardized multiple-choice test for which there is accessible data on student performance. Calculate the item facility (IF) and item discrimination (ID) index for selected items. If there are no data for an existing test, select some items on the test and analyze the structure of those items in a distractor analysis to determine if they have (a) any bad distractors, (b) any bad stems, or (c) more than one potentially correct answer.

7. (I/C) On page 63, nine different options are listed for giving feedback to students on assessments. Review the practicality of each and determine the extent to which practicality (principally, more time expended) is justifiably sacrificed in order to offer better washback to learners.

FOR YOUR FURTHER READING

Carroll, John B. (1990). Cognitive abilities in foreign language aptitude: Then and now. In Thomas S. Parry & Charles W. Stansfield (Eds.), Language aptitude reconsidered. Englewood Cliffs, NJ: Prentice Hall Regents.

Carroll, the original developer of the MLAT, updates arguments for and against some of the original cognitive hypotheses underlying the MLAT. In the same volume, note articles by Oxford and by Ehrman contending that styles, strategies, and personality may be further factors in the construct of language aptitude.

Brown, James Dean.

Source:

Brown, H. Douglas. 2003. Language Assessment: Principles and Classroom Practices. San Francisco, California

Friday, 27 March 2020
