SUMMARY
Assessing Listening
Observing The Performance Of The Four Skills
One important principle for assessing a learner's competence is to consider the fallibility of the results of a single performance, such as that produced in a test.
test. As with any attempt at
measurement, it is your obligation as a teacher to triangulate your
measurements: consider at least two (or more) performances and/or contexts
before drawing a conclusion. That could take the form of one or more of the following designs:
· several tests that are combined to form an assessment
· a single test with multiple test tasks to account for learning styles and performance variables
· in-class and extra-class graded work
· alternative forms of assessment (e.g., journal, portfolio, conference, observation, self-assessment, peer-assessment).
Multiple
measures will always give you a more reliable and valid assessment than a single measure. A second principle is
one that we teachers often forget. We must rely as much as possible on
observable performance in our assessments of students. Observable means being able to see or
hear the performance of the learner (the senses of touch, taste, and smell don't apply very
often to language testing!). What, then, is observable among the four skills of listening, speaking, reading, and writing?
THE
IMPORTANCE OF LISTENING
Listening has often played second fiddle
to its counterpart, speaking. In the standardized testing industry, a number of
separate oral production tests are available, but it is rare to find just a
listening test. One
reason for this emphasis is that listening is often implied as a component of speaking. How could you
speak a language without also listening? In addition, the overtly observable
nature of speaking renders it more empirically measurable than listening. But perhaps
a deeper cause lies in universal biases toward speaking. A good speaker is often
(unwisely) valued more highly than a good listener. To determine if someone is
a proficient user of a language, people customarily ask, "Do you speak Spanish?"
People rarely ask, "Do you understand and speak Spanish?"
Every teacher of language knows that one's oral production ability (other than monologues, speeches, reading aloud, and the like) is only as good as one's listening comprehension ability.
But of even further impact is the likelihood that input in the aural-oral mode
accounts for a large proportion of successful language acquisition. In a typical day, we do
measurably more listening than speaking (with the exception of one or two of your
friends who may be nonstop chatterboxes!).
Basic
Types Of Listening
As
with all effective tests, designing appropriate assessment tasks in listening
begins with
the specification of objectives, or criteria. Those objectives may be
classified in terms
of several types of listening performance. Think about what you do when you listen. Literally in
nanoseconds, the following processes flash through your brain:
1. You
recognize speech sounds and hold a temporary "imprint" of them in short-term memory.
2. You
simultaneously determine the type of speech event (monologue, interpersonal
dialogue, transactional dialogue) that is being processed and attend to its context (who the
speaker is, location, purpose) and the content of the message.
3. You
use (bottom-up) linguistic decoding skills and/or (top-down) background schemata to bring a
plausible interpretation to the message, and assign a literal and intended
meaning to the utterance.
4. In
most cases (except for repetition tasks, which involve short-term memory only), you delete the exact
linguistic form in which the message was originally received in favor of
conceptually retaining important or relevant information in long-term memory.
Each of these stages represents a
potential assessment objective:
• comprehension of surface structure elements such as phonemes, words, intonation, or a grammatical category
•
understanding of
pragmatic context
•
determining meaning of
auditory input
•
developing the gist, a
global or comprehensive understanding
From these stages we can derive four commonly identified types of listening performance, each of which comprises a category within which to consider assessment tasks and procedures.
1. Intensive.
Listening for perception of the components (phonemes, words, intonation, discourse
markers, etc.) of a larger stretch of language.
2. Responsive.
Listening to a relatively short stretch of language (a greeting, question, command,
comprehension check, etc.) in order to make an equally short response.
3. Selective.
Processing stretches of discourse such as short monologues for several minutes
in order to "scan" for certain information. The purpose of such performance is not
necessarily to look for global or general meanings, but to be able to comprehend
designated information in a context of longer stretches of spoken
language (such as classroom directions from a teacher, TV or radio news items, or stories). Assessment tasks in selective listening could ask students, for example, to listen for names, numbers, a grammatical category, directions (in a map exercise), or certain facts and events.
4. Extensive.
Listening to develop a top-down, global understanding of spoken language. Extensive
performance ranges from listening to lengthy lectures to listening to a
conversation and deriving a comprehensive message or purpose. Listening for the
gist, for the main idea, and making inferences are all part of extensive
listening.
Micro-
And Macroskills Of Listening
The micro- and macroskills of listening (adapted from Richards, 1983) provide 17 different objectives to assess.
Microskills:
1. Discriminate
among the distinctive sounds of English.
2. Retain
chunks of language of different lengths in short-term memory.
3. Recognize
English stress patterns, words in stressed and unstressed positions, rhythmic
structure, intonation contours, and their role in signaling information.
4. Recognize
reduced forms of words.
5. Distinguish
word boundaries, recognize a core of words, and interpret word order patterns and
their significance.
6. Process
speech at different rates of delivery.
7. Process
speech containing pauses, errors, corrections, and other performance variables.
8. Recognize
grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement,
pluralization), patterns, rules, and elliptical forms.
9. Detect
sentence constituents and distinguish between major and minor constituents.
10. Recognize
that a particular meaning may be expressed in different grammatical forms.
11. Recognize
cohesive devices in spoken discourse.
Macroskills:
12. Recognize
the communicative functions of utterances, according to situations,
participants, goals.
13. Infer
situations, participants, goals using real-world knowledge.
14. From
events, ideas, and so on, described, predict outcomes, infer links and connections between
events, deduce causes and effects, and detect such relations as main idea,
supporting idea, new information, given information, generalization, and
exemplification.
15. Distinguish
between literal and implied meanings.
16. Use
facial, kinesic, body language, and other nonverbal clues to decipher meanings.
17. Develop
and use a battery of listening strategies, such as detecting key words, guessing the
meaning of words from context, appealing for help and signaling
comprehension or lack thereof.
Developing a sense of which aspects of listening performance are predictably difficult will help you to challenge your students appropriately and to assign weights to items. Consider the following list of what makes listening difficult (adapted from Richards, 1983; Ur, 1984; Dunkel, 1991):
1. Clustering: attending to appropriate "chunks" of language (phrases, clauses, constituents)
2. Redundancy:
recognizing the kinds of repetitions, rephrasing, elaborations, and insertions that
unrehearsed spoken language often contains, and benefiting from that
recognition
3. Reduced forms: understanding the reduced forms that may not have been a part of an English learner's past learning experiences in classes where only formal "textbook" language has been presented
4. Performance variables: being able to "weed out" hesitations, false starts, pauses, and corrections in natural speech
5. Colloquial
language: comprehending idioms, slang, reduced forms, shared cultural knowledge
6. Rate of delivery: keeping up with the speed of delivery, processing automatically as the speaker continues
7. Stress, rhythm, and intonation: correctly understanding prosodic elements of spoken language, which is almost always much more difficult than understanding the smaller phonological bits and pieces
8. Interaction: managing the interactive flow of language from listening to speaking to listening, etc.
Designing
Assessment Tasks: Intensive Listening
Once you have determined objectives, your next step is to design the task, including making decisions about how you will elicit performance and how you expect the test-taker to respond. We will look at tasks that range from intensive listening performance, such as minimal phonemic pair recognition, to extensive comprehension of language in communicative contexts. The focus in this section is on the microskills of intensive listening.
Recognizing
Phonological and Morphological Elements
A typical form of intensive listening at
this level is the assessment of recognition of phonological and
morphological elements of language. A classic test task gives a spoken stimulus and asks test-takers to identify the stimulus from two or more choices.
Paraphrase
Recognition
The next step up on the scale of
listening comprehension microskills is words phrases and sentences, which
are frequently assessed by providing a stimulus sentence and asking the test-taker
to choose the correct paraphrase from a number of choices. Designing Assessment
Tasks: Responsive Listening. A
question-and-answer format can provide some interactivity in these lower-end
listening tasks. The test-taker's response is the appropriate answer to a
question. Appropriate
response to a question
Test-takers
hear:
How
much time did you take to do your homework?
Test-takers
read:
(a)
In about an hour.
(b)
About an hour.
(c)
About $10.
(d)
Yes, I did.
The objective of this item is recognition of the wh-question how much and its appropriate response. Distractors are chosen to represent common learner errors: (a) responding to how much vs. how much longer; (c) confusing how much in reference to time vs. the more frequent reference to money; (d) confusing a wh-question with a yes/no question.

None of the tasks discussed so far has to be framed in a multiple-choice format. They can be offered in a more open-ended framework in which test-takers write or speak the response.
Designing
Assessment Tasks: Selective Listening
A third type of listening performance is selective listening, in which the test-taker listens to a limited quantity of aural input and must discern within it some specific information. A number of techniques have been used that require selective listening.
Listening
Cloze
Listening cloze tasks (sometimes called
cloze dictations or partial dictations) require the test-taker to listen to a
story, monologue, or conversation and simultaneously read the written text
in which selected words or phrases have been deleted. In its generic form, the test consists of a passage in which every nth word (typically every seventh word) is deleted and the test-taker is asked to supply an appropriate word. In a listening cloze task, test-takers see a transcript of the passage that they are listening to and fill in the blanks with the words or phrases that they hear.
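The fixed-ratio deletion just described is mechanical enough to script. The sketch below is only an illustration (the function name and the default of every seventh word are assumptions drawn from the "typically every seventh word" convention mentioned above); it blanks out every nth word of a transcript and returns the answer key:

```python
import re

def make_cloze(passage, n=7):
    """Delete every nth word of a passage, returning the cloze
    text and the list of deleted words (the answer key)."""
    words = passage.split()
    cloze_words, answers = [], []
    for i, word in enumerate(words, start=1):
        if i % n == 0:
            # Keep trailing punctuation attached to the blank.
            core = re.sub(r"[^\w'-]+$", "", word)
            trail = word[len(core):]
            answers.append(core)
            cloze_words.append("_____" + trail)
        else:
            cloze_words.append(word)
    return " ".join(cloze_words), answers

text = ("The test consists of a passage in which every seventh "
        "word is deleted and the test-taker supplies it.")
cloze, key = make_cloze(text, n=7)
```

A teacher would, of course, inspect the output: purely mechanical deletion sometimes removes function words that are trivial (or impossible) to recover from listening, so rational deletion of selected content words is a common manual alternative.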
Information
Transfer
Selective listening can also be assessed
through an information transfer technique in which aurally processed information
must be transferred to a visual representation, such as labeling a diagram,
identifying an element in a picture, completing a form, or showing
routes on a map. At the lower end of the scale of linguistic complexity, simple picture-cued items are sometimes efficient rubrics for assessing certain selected information.
Sentence
Repetition
The task of simply repeating a sentence
or a partial sentence, or sentence repetition,
is also used as an assessment of listening comprehension. As in a dictation (discussed below), the test-taker must retain a stretch of language long enough to reproduce it, and then must respond with an oral repetition of that stimulus. Incorrect listening comprehension, whether at the phonemic or discourse level, may be manifested in the correctness of the repetition. A miscue in repetition is scored as a miscue in listening.
Sentence repetition is far from a flawless listening assessment task. Buck (2001, p. 79) noted that such tasks are not just tests of listening, but tests of general skills. Further, this task may test only recognition of sounds, and it can easily be contaminated by lack of short-term memory ability, thus invalidating it as an assessment of comprehension alone. And the teacher may never be able to distinguish a listening comprehension error from an oral production error. Therefore, sentence repetition tasks should be used with caution.
Designing
Assessment Tasks: Extensive Listening
1. Can listening performance be distinguished from cognitive processing factors such as memory, associations, storage, and recall?
2. As assessment procedures become more communicative, does the task take into account test-takers' ability to use grammatical expectancies, lexical collocations, semantic interpretations, and pragmatic competence?
3. Are test tasks themselves correspondingly content valid and authentic, that is, do they mirror real-world language and context?
4. As assessment tasks become more and more open-ended, they more closely resemble pedagogical tasks, which leads one to ask what the difference is between assessment and teaching tasks. The answer lies in scoring: the former imply specified scoring procedures, while the latter do not.
Dictation
Dictation is a widely researched genre
of assessing listening comprehension. In a dictation, test-takers
hear a passage, typically of 50 to 100 words, recited three times: first, at normal speed;
then, with long pauses between phrases or natural word groups, during which
time test-takers write down what they have just heard; and finally, at normal speed once more so they can check their work and proofread.
Kinds of errors:
1. spelling error only (the word appears to have been heard correctly)
2. spelling and/or obvious misrepresentation of a word, illegible word
3. grammatical error (for example, test-taker hears I can't do it, writes I can do it)
4. skipped word or phrase
5. permutation of words
6. additional words not in the original
7. replacement of a word with an appropriate synonym
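Several of these error categories can be made operational with a word-level alignment of the original passage against the test-taker's transcript. The sketch below is only an illustration, not Brown's scoring method: it uses Python's difflib and collapses the taxonomy to skipped, added, and replaced words (spelling-only errors and synonym replacements would still need human judgment):

```python
import difflib

def classify_errors(original, transcript):
    """Align the dictation passage with what the test-taker wrote
    and tally skipped, added, and replaced words."""
    orig = original.lower().split()
    written = transcript.lower().split()
    counts = {"skipped": 0, "added": 0, "replaced": 0}
    matcher = difflib.SequenceMatcher(a=orig, b=written)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "delete":        # word in passage, missing from transcript
            counts["skipped"] += i2 - i1
        elif op == "insert":      # word in transcript, not in passage
            counts["added"] += j2 - j1
        elif op == "replace":     # word substituted (grammar, mishearing)
            counts["replaced"] += max(i2 - i1, j2 - j1)
    return counts

errors = classify_errors("I can't do it today",
                         "I can do it now today")
```

On this example the alignment flags one replaced word (can't → can, a grammatical error in the taxonomy above) and one added word (now).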
Here are some other possibilities for assessing extensive listening.
1. Note-taking. In the academic world, classroom lectures by professors are common features of a non-native English-user's experience. One form of a midterm examination at the American Language Institute at San Francisco State University (Kahn, 2002) uses a 15-minute lecture as a stimulus.
2. Editing. Another authentic task provides both a written and a spoken stimulus and requires the test-taker to listen for discrepancies. Scoring achieves relatively high reliability, as there are usually a small number of specific differences that must be identified. Here is the way the task proceeds: editing a written version of an aural stimulus.
3. Interpretive tasks. One of the intensive listening tasks described above was paraphrasing a story or conversation. An interpretive task extends the stimulus material to a longer stretch of discourse and forces the test-taker to infer a response.
Assessing Speaking
Basic
Types Of Speaking
1. Imitative. At one end of a continuum of types of speaking performance is the ability to simply parrot back (imitate) a word or phrase or possibly a sentence. While this is a purely
phonetic level of oral production, a number of prosodic, lexical, and
grammatical properties of language may be included in the criterion performance.
We are interested only in what is traditionally labeled
"pronunciation"; no
inferences
are made about the test-taker's ability to understand or convey meaning or to participate in an
interactive conversation. The only role of listening here is in the short-term storage
of a prompt, just long enough to allow the speaker to retain the short stretch of
language that must be imitated.
2. Intensive.
A second type of speaking frequently employed in assessment contexts is the
production of short stretches of oral language designed to demonstrate
competence in a narrow band of grammatical, phrasal, lexical, or phonological
relationships (such as prosodic elements: intonation, stress, rhythm, juncture).
The speaker must be aware of semantic properties in order to be able to respond, but
interaction with an interlocutor or test administrator is minimal at best.
3. Responsive.
Responsive assessment tasks include interaction and test comprehension but at
the somewhat limited level of very short conversations, standard greetings and small
talk, simple requests and comments, and the like. The stimulus is almost always a
spoken prompt (in order to preserve authenticity), with perhaps only one or two
follow-up questions or retorts.
4. Interactive.
The difference between responsive and interactive speaking is in the length and
complexity of the interaction, which sometimes includes multiple exchanges and/or
multiple participants. Interaction can take the two forms of transactional language, which has the purpose of exchanging specific information, or interpersonal exchanges, which have the purpose of maintaining social relationships.
Micro
And Macroskills Of Speaking
Microskills
1. Produce
differences among English phonemes and allophonic variants.
2. Produce
chunks of language of different lengths.
3. Produce
English stress patterns, words in stressed and unstressed positions, rhythmic structure, and
intonation contours.
4. Produce
reduced forms of words and phrases.
5. Use
an adequate number of lexical units (words) to accomplish pragmatic purposes.
6. Produce
fluent speech at different rates of delivery.
7. Monitor
one's own oral production and use various strategic devices pauses, fillers,
selfcorrections, backtracking-to enhance the clarity of the message.
8. Use
grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement,
pluralization), word order, patterns, rules, and elliptical forms.
9. Produce
speech in natural constituents: in appropriate phrases, pause groups, breath groups,
and sentence constituents.
10. Express
a particular meaning in different grammatical forms.
11. Use
cohesive devices in spoken discourse.
Macroskills
12. Appropriately
accomplish communicative functions according to situations,
participants, and goals.
13. Use
appropriate styles, registers, implicature, redundancies, pragmatic conventions,
conversation rules, floor-keeping and -yielding, interrupting, and other
sociolinguistic features in face-to-face conversations.
14. Convey links and connections between events and communicate such relations as focal and peripheral ideas, events and feelings, new information and given information, and generalization and exemplification.
15. Convey facial features, kinesics, body language, and other nonverbal cues along with verbal language.
16. Develop
and use a battery of speaking strategies, such as emphasizing words, rephrasing,
providing a context for interpreting the meaning of words, appealing for
help, and accurately assessing how well your interlocutor is
understanding you.
There is such an array of oral
production tasks that a complete treatment is almost impossible
within the confines of one chapter in this book. Below is a consideration of
the most common techniques with brief allusions to related tasks. As already noted in the
introduction to this chapter, consider three important issues as you set out to
design tasks:
1. No speaking task is capable of isolating the single skill of oral production. Concurrent involvement of the additional performance of aural comprehension, and possibly reading, is usually necessary.
2. Eliciting the specific criterion you have designated for a task can be tricky because, beyond the word level, spoken language offers a number of productive options to test-takers. Make sure your elicitation prompt achieves its aims as closely as possible.
3. Because
of the above two characteristics of oral production assessment, it is important to carefully
specify scoring procedures for a response so that ultimately you achieve as high a
reliability index as possible.
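One concrete way to monitor the reliability called for in point 3 is to have two raters score the same set of responses and correlate their scores. The sketch below is illustrative only (the rater scores are invented, and a Pearson correlation is just one of several possible inter-rater reliability indices):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two raters' scores
    over the same set of test-takers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores from two raters on five speakers (0-5 scale)
rater_a = [4, 3, 5, 2, 4]
rater_b = [4, 2, 5, 3, 4]
r = pearson(rater_a, rater_b)
```

A low coefficient signals that the scoring procedure needs tighter specification (clearer descriptors, rater training, or both) before scores can be trusted.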
Designing
Assessment Tasks: Imitative Speaking
You may be surprised to see the
inclusion of simple phonological imitation in a consideration of assessment of
oral production. After all, endless repeating of words, phrases, and sentences
was the province of the long-since-discarded Audiolingual Method, and in an era
of communicative language teaching, many believe that non-meaningful imitation
of sounds is fruitless. Such opinions have faded in recent years as we discovered that
an overemphasis on fluency can sometimes lead to the decline of accuracy in
speech. And so we have been paying more attention to pronunciation, especially
suprasegmentals, in an attempt to help learners be more comprehensible.
Test
Of Spoken English (TSE)
Somewhere straddling responsive, interactive, and extensive speaking tasks is another popular commercial oral production assessment, the Test of Spoken English (TSE). The TSE is a 20-minute audiotaped test of oral language ability within an academic or professional environment. TSE scores are used by many North American institutions of higher education to select international teaching assistants.
The scores are also used for selecting
and certifying health professionals such as physicians, nurses,
pharmacists, physical therapists, and veterinarians. The tasks on the TSE
are designed to elicit oral production in various discourse categories rather than
in selected phonological, grammatical, or lexical targets. The following content
specifications for the TSE represent the discourse and pragmatic contexts assessed in
each administration:
1. Describe
something physical.
2. Narrate
from presented material.
3. Summarize
information of the speaker's own choice.
4. Give directions based on visual materials.
5. Give
instructions.
6. Give
an opinion.
7. Support
an opinion.
8. Compare/contrast.
9. Hypothesize.
10. Function "interactively."
11. Define.
Using
these specifications, Lazaraton and Wagner (1996) examined 15 different specific
tasks in collecting background data from native and non-native speakers of English.
1. giving
a personal description
2. describing
a daily routine
3. suggesting
a gift and supporting one's choice
4. recommending
a place to visit and supporting one's choice
5. giving
directions
6. describing
a favorite movie and supporting one's choice
7. telling
a story from pictures
8. hypothesizing
about future action
9. hypothesizing
about a preventative action
10. making
a telephone call to the dry cleaner
11. describing
an important news event
12. giving
an opinion about animals in the zoo
13. defining
a technical term
14. describing
information in a graph and speculating about its implications
15. giving
details about a trip schedule
The final two categories of oral
production assessment (interactive and extensive speaking) include tasks
that involve relatively long stretches of interactive discourse (interviews, role
plays, discussions, games) and tasks of equally long duration but that involve less
interaction (speeches, telling longer stories, and extended explanations and translations). The obvious difference between the two sets of tasks is the degree of interaction with an interlocutor. Also, interactive tasks are what some would describe as interpersonal, while the final category includes more transactional speech events.
Interview
When "oral production
assessment" is mentioned, the first thing that comes to mind is an oral interview: a
test administrator and a test-taker sit down in a direct face-to-face exchange
and proceed through a protocol of questions and directives. The interview, which may be tape-recorded for re-listening, is then scored on one or more parameters such as accuracy in pronunciation and/or grammar, vocabulary usage, fluency, sociolinguistic/pragmatic appropriateness, task accomplishment, and even comprehension. Interviews can vary in
length from perhaps five to forty-five minutes, depending on their
purpose and context. Placement interviews, designed to get a quick spoken sample
from a student in order to verify placement into a course, may need only five minutes
if the interviewer is trained to evaluate the output accurately. Longer comprehensive
interviews such as the OPI (see the next section) are designed to cover
predetermined oral production contexts and may require the better part of an hour.
Every effective interview contains a number of mandatory stages. Two decades ago, Michael Canale (1984) proposed a framework for oral proficiency testing that has withstood the test of time. He suggested that test-takers will perform at their best if they are led through four stages:
1. Warm-up. In a minute or so of preliminary small talk, the interviewer directs mutual introductions, helps the test-taker become comfortable with the situation, apprises the test-taker of the format, and allays anxieties. No scoring of this phase takes place.
2. Level check. Through a series of preplanned questions, the interviewer stimulates the test-taker to respond using expected or predicted forms and functions. If, for example, from previous test information, grades, or other data, the test-taker has been judged to be a "Level 2" (see below) speaker, the interviewer's prompts will attempt to confirm this assumption.
3. Probe. Probe questions and prompts challenge test-takers to go to the heights of their ability, to extend beyond the limits of the interviewer's expectations, through increasingly difficult questions. Probe questions may be complex in their framing and/or complex in their cognitive and linguistic demand. Through probe items, the interviewer discovers the ceiling or limitation of the test-taker's proficiency. This need not be a separate stage entirely, but might be a set of questions that are interspersed into the previous stage. At the lower levels of proficiency, probe items may simply demand a higher range of vocabulary or grammar from the test-taker than predicted.
4. Wind-down. This final phase of the interview is simply a short period of time during which the interviewer encourages the test-taker to relax with some questions, sets the test-taker's mind at ease, and provides information about when and where to obtain the results of the interview.
Discussions
and Conversations
As formal assessment devices,
discussions and conversations with and among students are difficult to specify
and even more difficult to score. But as informal techniques to assess
learners, they offer a level of authenticity and spontaneity that other assessment techniques
may not provide. Discussions may be especially appropriate tasks through which to
elicit and observe such abilities as
topic
nomination, maintenance, and termination.
Games
Among informal assessment devices are a variety of games that directly involve language production. Consider the following types:
Assessment
games
1. "Tinkertoy"
game: A Tinkertoy (or Lego block) structure is built behind a screen. One or two
learners are allowed to view the structure. In successive stages of
construction, the learners tell "runners" (who can't observe the structure)
how to re-create the structure. The runners then tell "builders"
behind another screen how to build the structure. The builders may question or confirm
as they proceed, but only through the two degrees of separation. Object: re-create
the structure as accurately as
possible.
2. Crossword
puzzles are created in which the names of all members of a class are clued by
obscure information about them. Each class member must ask questions of
others to determine who matches the clues in the puzzle.
3. Information gap grids are created such that class members must conduct mini-interviews of other classmates to fill in boxes, e.g., "born in July," "plays the violin," "has a two-year-old child," etc.
4. City
maps are distributed to class members. Predetermined map directions are given to one
student who, with a city map in front of him or her, describes the route to
a partner, who must then trace the route and get to the correct final
destination.
Oral
Proficiency Interview (OPI)
The best-known oral interview format is one that has gone through a considerable metamorphosis over the last half-century, the Oral Proficiency Interview (OPI). Originally known as the Foreign Service Institute (FSI) test, the OPI is the result of a historical progression of revisions under the auspices of several agencies, including the Educational Testing Service and the American Council on the Teaching of Foreign Languages (ACTFL).
Source: Brown, H. Douglas. 2003. Language Assessment: Principles and Classroom Practices (pp. 112-184). San Francisco, California.