SUMMARY
CHAPTER 10
BEYOND TESTS: ALTERNATIVES IN ASSESSMENT
In
the public eye, tests have acquired an aura of infallibility in our culture of
mass producing
everything, including the education of school children. Everyone wants a test for everything,
especially if the test is cheap, quickly administered, and scored instantaneously. But we
saw in Chapter 4 that while the standardized test industry has become a powerful
juggernaut of influence on decisions about people's lives, it also has come under
severe criticism from the public (Kohn, 2000). A more balanced viewpoint is
offered by Bailey (1998, p. 204): "One of the disturbing things about tests is the
extent to which many people accept the results uncritically, while others believe that all
testing is invidious. But tests are simply measurement tools. It is the use to which we
put their results that can be appropriate or inappropriate."
It is clear by now that tests are one of
a number of possible types of assessment. In Chapter 1, an important
distinction was made between testing and assessing. Tests are formal
procedures, usually administered within strict time limitations, to sample the performance
of a test-taker in a specified domain. Assessment connotes a much broader concept
in that most of the time when teachers are teaching, they are also assessing.
Assessment includes all occasions from informal impromptu observations and comments
up to and including tests.
Early in the decade of the 1990s, in a
culture of rebellion against the notion that all people and all
skills could be measured by traditional tests, a novel concept emerged that began to
be labeled "alternative" assessment. As teachers and students were becoming aware of
the shortcomings of standardized tests, "an alternative to standardized testing
and all the problems found with such testing" (Huerta-Macías, 1995, p. 8) was
proposed. That proposal was to assemble additional measures of students (portfolios, journals, observations, self-assessments, peer-assessments, and the like) in an effort to triangulate data about students. For some, such alternatives held "ethical potential" (Lynch, 2001, p. 228) in their promotion of fairness and the balance of power relationships in the classroom.
Why, then, should we even refer to the notion of "alternative" when assessment already encompasses such a range of possibilities? This was the question to which Brown and Hudson (1998) responded in a TESOL Quarterly article. They noted that to speak of alternative assessments is counterproductive because the term implies something new and different that may be "exempt from the requirements of responsible test construction" (p. 657). So they proposed to refer to "alternatives" in assessment instead. Their term is a perfect fit within a model that considers tests as a subset of assessment. Throughout this book, you have been reminded that all tests are assessments but, more important, that not all assessments are tests.
The defining characteristics of the various alternatives in assessment that have been commonly used across the profession were aptly summed up by Brown and Hudson (1998, pp. 654-655). Alternatives in assessment
1. require students to perform, create, produce, or do something;
2. use real-world contexts or simulations;
3. are nonintrusive in that they extend the day-to-day classroom activities;
4. allow students to be assessed on what they normally do in class every day;
5. use tasks that represent meaningful instructional activities;
6. focus on processes as well as products;
7. tap into higher-level thinking and problem-solving skills;
8. provide information about both the strengths and weaknesses of students;
9. are multiculturally sensitive when properly administered;
10. ensure that people, not machines, do the scoring, using human judgment;
11. encourage open disclosure of standards and rating criteria; and
12. call upon teachers to perform new instructional and assessment roles.
THE DILEMMA OF MAXIMIZING BOTH PRACTICALITY AND WASHBACK
The principal purpose of this chapter is to examine some of the alternatives in assessment that are markedly different from formal tests. Tests, especially large-scale standardized tests, tend to be one-shot performances that are timed, multiple-choice, decontextualized, norm-referenced, and that foster extrinsic motivation. On the other hand, tasks like portfolios, journals, and self-assessment are
· open-ended in their time orientation and format,
· contextualized to a curriculum,
· referenced to the criteria (objectives) of that curriculum, and
· likely to build intrinsic motivation.
One way of looking at this contrast poses a challenge to you as a teacher and test designer. Formal standardized tests are almost by definition highly practical, reliable instruments. They are designed to minimize time and money on the part of the test designer and test-taker, and to be painstakingly accurate in their scoring. Alternatives such as portfolios, conferencing with students on drafts of written work, or observations of learners over time all require considerable time and effort on the part of the teacher and the student. Even more time must be spent if the teacher hopes to offer a reliable evaluation within students across time, as well as across students (taking care not to favor one student or group of students). But the alternative techniques also offer markedly greater washback, are superior formative measures, and, because of their authenticity, usually carry greater face validity.
This relationship can be depicted in a
hypothetical graph that shows
practicality/reliability
on one axis and washback/authenticity on the other, as shown in Figure 10.1. Notice
the implied negative correlation: as a technique increases in its washback and
authenticity, its practicality and reliability tend to be lower. Conversely, the greater
the practicality and reliability, the less likely you are to achieve beneficial
washback and authenticity. I have placed three types of assessment on the
regression line to illustrate.
The figure appears to imply the
inevitability of the relationship: large-scale multiple-choice tests
cannot offer much washback or authenticity, nor can portfolios and such
alternatives achieve much practicality or reliability. This need not be the case! The challenge
that faces conscientious teachers and assessors in our profession is to change
the directionality of the line: to "flatten" that downward slope to some degree, or
perhaps to push the various assessments on the chart leftward and upward. Surely we
should not sit idly by accepting the presumably inescapable conclusion
that all standardized tests will be devoid of washback and authenticity. With some
creativity and effort, we can transform otherwise inauthentic and
negative-washback-producing tests into more pedagogically fulfilling learning experiences. A
number of approaches to accomplishing this end are possible, many of which have
already been implicitly presented in this book:
· building as much authenticity as possible into multiple-choice task types and items
· designing classroom tests that have both objective-scoring sections and open-ended response sections, varying the performance tasks
· turning multiple-choice test results into diagnostic feedback on areas of needed improvement
· maximizing the preparation period before a test to elicit performance relevant to the ultimate criteria of the test
· teaching test-taking strategies
· helping students to see beyond the test ("don't teach to the test")
· triangulating information on a student before making a final assessment of competence
The flip side of this challenge is to understand that the alternatives in assessment are not doomed to be impractical and unreliable. As we look at alternatives in assessment in this chapter, we must remember Brown and Hudson's (1998) admonition to scrutinize the practicality, reliability, and validity of those alternatives at the same time that we celebrate their face validity, washback potential, and authenticity. It is easy to fly out of the cage of traditional testing rubrics, but it is tempting in doing so to flap our wings aimlessly and to accept virtually any classroom activity as a viable alternative. Assessments proposed to serve as triangulating measures of competence imply a responsibility to be rigorous in determining objectives, response modes, and criteria for evaluation and interpretation.
PERFORMANCE-BASED ASSESSMENT
Before proceeding to a direct consideration of types of alternatives in assessment, a word about performance-based assessment is in order. There has been a great deal of press in recent years about performance-based assessment, sometimes merely called performance assessment (Shohamy, 1995; Norris et al., 1998). Is this different from what is being called alternative assessment?
The push toward more performance-based assessment is part of the same general educational reform movement that has raised strong objections to using standardized test scores as the only measures of student competencies (see, for example, Valdez Pierce & O'Malley, 1992; Shepard & Bliem, 1993). The argument, you can guess, was that standardized tests do not elicit actual performance on the part of test-takers. If a child were asked, for example, to write a description of the earth as seen from space, to work cooperatively with peers to design a three-dimensional model of the solar system, to explain the project to the rest of the class, and to take notes on a videotape about space travel, traditional standardized testing would be involved in none of those performances. Performance-based assessment, however, would require the performance of the above-named actions, or samples thereof, which would be systematically evaluated through direct observation by a teacher and/or possibly by self and peers.
Performance-based assessment implies
productive, observable skills, such as speaking and writing, of content-valid
tasks. Such performance usually, but not always, brings with it an air of authenticity: real-world tasks that students have had time to develop. It
often implies an integration of language skills, perhaps all four skills in the case of
project work. Because the tasks that students perform are consistent with
course goals and curriculum, students and teachers are likely to be more motivated to
perform them, as opposed to a set of multiple-choice questions about facts and figures
regarding the solar system.
O'Malley and Valdez Pierce (1996) considered performance-based assessment to be a subset of authentic assessment. In other words, not all authentic assessment is performance-based. One could infer that reading, listening, and thinking have many authentic manifestations, but since they are not directly observable in and of themselves, they are not performance-based. According to O'Malley and Valdez Pierce (p. 5), the following are characteristics of performance assessment:
1. Students make a constructed response.
2. They engage in higher-order thinking, with open-ended tasks.
3. Tasks are meaningful, engaging, and authentic.
4. Tasks call for the integration of language skills.
5. Both process and product are assessed.
6. Depth of a student's mastery is emphasized over breadth.
Performance-based assessment needs to be approached with caution. It is tempting for teachers to assume that if a student is doing something, then the process has fulfilled its own goal and the evaluator needs only to make a mark in the grade book that says "accomplished" next to a particular competency. In reality, performances as assessment procedures need to be treated with the same rigor as traditional tests. This implies that teachers should
· state the overall goal of the performance,
· specify the objectives (criteria) of the performance in detail,
· prepare students for performance in stepwise progressions,
· use a reliable evaluation form, checklist, or rating sheet,
· treat performances as opportunities for giving feedback and provide that feedback systematically, and
· if possible, utilize self- and peer-assessments judiciously.
To sum up, performance assessment is not completely synonymous with the concept of alternative assessment. Rather, it is best understood as one of the primary traits of the many available alternatives in assessment.
PORTFOLIOS
One of the most popular alternatives in assessment, especially within a framework of communicative language teaching, is portfolio development. According to Genesee and Upshur (1996), a portfolio is "a purposeful collection of students' work that demonstrates ... their efforts, progress and achievements in given areas" (p. 99). Portfolios include materials such as
· essays and compositions in draft and final forms;
· reports, project outlines;
· poetry and creative prose;
· artwork, photos, newspaper or magazine clippings;
· audio and/or video recordings of presentations, demonstrations, etc.;
· journals, diaries, and other personal reflections;
· tests, test scores, and written homework exercises;
· notes on lectures; and
· self- and peer-assessments (comments, evaluations, and checklists).
Until recently, portfolios were thought
to be applicable only to younger children who assemble a portfolio of artwork and
written work for presentation to a teacher and/or a parent. Now
learners of all ages and in all fields of study are benefiting from the tangible, hands-on
nature of portfolio development.
Gottlieb
(1995) suggested a developmental scheme for considering the nature and purpose of
portfolios, using the acronym CRADLE to designate six possible
attributes
of a portfolio:
Collecting
Reflecting
Assessing
Documenting
Linking
Evaluating
As Collections, portfolios are an expression of students' lives and identities. The appropriate freedom of students to choose what to include should be respected, but at the same time the purposes of the portfolio need to be clearly specified. Reflective practice through journals and self-assessment checklists is an important ingredient of a successful portfolio. Teacher and student both need to take the role of Assessment seriously as they evaluate quality and development over time. We need to recognize that a portfolio is an important Document in demonstrating student achievement, and not just an insignificant adjunct to tests and grades and other more traditional evaluation. A portfolio can serve as an important Link between student and teacher, parent, community, and peers; it is a tangible product, created with pride, that identifies a student's uniqueness. Finally, Evaluation of portfolios requires a time-consuming but fulfilling process of generating accountability.
The advantages of engaging students in portfolio development have been extolled in a number of sources (Genesee & Upshur, 1996; O'Malley & Valdez Pierce, 1996; Brown & Hudson, 1998; Weigle, 2002). A synthesis of those characteristics gives us a number of potential benefits. Portfolios
· foster intrinsic motivation, responsibility, and ownership,
· promote student-teacher interaction with the teacher as facilitator,
· individualize learning and celebrate the uniqueness of each student,
· provide tangible evidence of a student's work,
· facilitate critical thinking, self-assessment, and revision processes,
· offer opportunities for collaborative work with peers, and
· permit assessment of multiple dimensions of language learning.
At the same time, care must be taken lest portfolios become a haphazard pile of "junk" the purpose of which is a mystery to both teacher and student. Portfolios can fail if objectives are not clear, if guidelines are not given to students, if systematic periodic review and feedback are not present, and so on. Sometimes the thought of asking students to develop a portfolio is a daunting challenge, especially for new teachers and for those who have never created a portfolio on their own. Successful portfolio development will depend on following a number of steps and guidelines.
1. State objectives clearly. Pick one or more of the CRADLE attributes named above and specify them as objectives of developing a portfolio. Show how those purposes are connected to, integrated with, and/or a reinforcement of your already stated curricular goals. A portfolio attains maximum authenticity and washback when it is an integral part of a curriculum, not just an optional box of materials. Show students how their portfolios will include materials from the course they are taking and how that collection will enhance curricular goals.
2. Give guidelines on what materials to include. Once the objectives have been determined, name the types of work that should be included. There is some disagreement among "experts" about how much negotiation should take place between student and teacher over those materials. Hamp-Lyons and Condon (2000) suggested advantages for student control of portfolio contents, but teacher guidance will keep students on target with curricular objectives. It is helpful to give clear directions on how to get started since many students will never have compiled a portfolio and may be mystified about what to do. A sample portfolio from a previous student can help to stimulate some thoughts on what to include.
3. Communicate assessment criteria to students. This is both the most important aspect of portfolio development and the most complex. Two sources, self-assessment and teacher assessment, must be incorporated in order for students to receive the maximum benefit. Self-assessment should be as clear and simple as possible. O'Malley and Valdez Pierce (1996) suggested a half-page self-evaluation of a writing sample (with spaces for students to write) for elementary school English language students.
Genesee and Upshur (1996) recommended
using a questionnaire format for
self-assessment,
with questions like the following for a project: Portfolio project
self-assessment questionnaire
1. What
makes this a good or interesting project?
2. What
is the most interesting part of the project?
3. What
was the most difficult part of the project?
4. What
did you learn from the project?
5. What
skills did you practice when doing this project?
6. What
resources did you use to complete this project?
7. What
is the best part of the project? Why?
8. How
would you make the project better?
The teacher's assessment might mirror self-assessments, with similar questions designed to highlight the formative nature of the assessment. Conferences are important checkpoints for both student and teacher. In the case of requested written responses from students, help your students to process your feedback and show them how to respond to your responses. Above all, maintain reliability in assessing portfolios so that all students receive equal attention and are assessed by the same criteria. An option that works for some contexts is to include peer-assessment or small-group conferences to comment on one another's portfolios. Where the classroom community is relatively closely knit and supportive and where students are willing to expose themselves by revealing their portfolios, valuable feedback can be achieved from peer reviews. Such sessions should have clear objectives lest they erode into aimless chatter. Checklists and questions may serve to preclude such an eventuality.
4. Designate
time within the curriculum for portfolio development. If students feel rushed
to gather materials and reflect on them, the effectiveness of the portfolio process is
diminished. Make sure that students have time set aside for portfolio work
(including in-class time) and that your own opportunities for conferencing are
not compromised.
5. Establish periodic schedules for review and conferencing. By doing so, you will prevent students from throwing everything together at the end of a term.
6. Designate
an accessible place to keep portfolios. It is inconvenient for students to
carry collections of papers and artwork. If you have a self-contained classroom
or a place in a reading room or library to keep the materials, that may provide a good option. At the
university level, designating a storage place on the campus may involve impossible
logistics. In that case, encourage students to create their own accessible location
and to bring to class only the materials they need.
7. Provide positive, washback-giving final assessments. When a portfolio has been completed and the end of a term has arrived, a final summation is in order. Should portfolios be graded? Be awarded specific numerical scores? Opinion is divided; every advantage is balanced by a disadvantage. For example, numerical scores serve as convenient data to compare performance across students, courses, and districts. For portfolios containing written work, Wolcott (1998) recommended a holistic scoring scale ranging from 1 to 6 based on such qualities as inclusion of out-of-class work, error-free work, depth of content, creativity, organization, writing style, and "engagement" of the student. Such scores are perhaps best viewed as numerical equivalents of letter grades.
One could argue that it
is inappropriate to reduce the personalized and creative process of compiling a
portfolio to a number or letter grade and that it is more appropriate to
offer a qualitative evaluation for a work that is so open-ended. Such evaluations might
include a final appraisal of the work by the student, with questions such as those
listed above for self-assessment of a project, and a narrative evaluation of perceived strengths and weaknesses by the teacher. Those final evaluations should emphasize strengths but also point the way toward future learning challenges.
It is clear that
portfolios get a relatively low practicality rating because of the time it takes for
teachers to respond and conference with their students. Nevertheless, following
the guidelines suggested above for specifying the criteria for evaluating portfolios can
raise the reliability to a respectable level, and without question the washback
effect, the authenticity, and the face validity of portfolios remain exceedingly high.
In the above
discussion, I have tried to subject portfolios to the same specifications that
apply to more formal tests: it should be made clear what the objectives are, what tasks are
expected of the student, and how the learner's product will be evaluated. Strict
attention to these demands is warranted for successful portfolio development to take
place.
JOURNALS
Fifty
years ago, journals had no place in the second language classroom. When language
production was believed to be best taught under controlled conditions, the concept of
"free" writing was confined almost exclusively to producing essays on assigned topics. Today,
journals occupy a prominent role in a pedagogical model that stresses the
importance of self-reflection in the process of students taking control of
their own destiny.
A journal is a log (or
"account") of one's thoughts, feelings, reactions, assessments,
ideas, or progress toward goals, usually written with little attention to structure,
form, or correctness. Learners can articulate their thoughts without the threat of those thoughts being
judged later (usually by the teacher). Sometimes journals are rambling sets of
verbiage that represent a stream of consciousness with no particular point,
purpose, or audience. Fortunately, models of journal use in educational practice have sought to
tighten up this style of journal writing in order to give it some focus (Staton et al., 1987). The result is the emergence of a number of overlapping categories or purposes in journal writing, such as the following:
· language-learning logs
· grammar journals
· responses to readings
· strategies-based learning logs
· self-assessment reflections
· diaries of attitudes, feelings, and other affective factors
· acculturation logs
Most classroom-oriented journals are what have now come to be known as dialogue journals. They imply an interaction between a reader (the teacher) and the student through dialogues or responses. For the best results, those responses should be dispersed across a course at regular intervals, perhaps weekly or biweekly. One of the principal objectives in a student's dialogue journal is to carry on a conversation with the teacher. Through dialogue journals, teachers can become better acquainted with their students, in terms of both their learning progress and their affective states, and thus become better equipped to meet students' individual needs.
The following journal entry from an
advanced student from China, and the
teacher's
response, is an illustration of the kind of dialogue that can take place.
Dialogue
journal sample
Journal
entry by Ming Ling, China:
Yesterday at about eight o'clock I was sitting
in front of my table holding
a fork and eating tasteless noodles which I usually really like to eat but I lost
my taste yesterday because I didnt feel well. I had a headache and a
fever. My head seemed to be broken. I sometimes felt cold, sometimes hot. I didn't
feel comfortable standing
up
and I didn't feel comfortable sitting down. I hated everything around me. It seemed to
me that I got a great pressure from the atmosphere and I could not breath. I was
so sleepy since I had taken
some
medicine which functioned as an antibiotic.
The room was so quiet. I was there by
myself and felt very solitary. This dinner reminded me of my mother. Whenever I
was sick in China,
my mother always took care of me and cooked rice gruel, which has to cook more than three hours and is very delicious, I think. I would be better very soon under the care of my mother. But yesterday, I had to cook by myself even though I was sick. The more I thought, the less I wanted to eat. Half an hour passed. The noodles were cold, but I was still sitting there and thinking about my mother. Finally I threw out the noodles and went to bed.
Teacher's
response:
This is a powerful piece of writing
because you really communicate what you were feeling. You used vivid details,
like "eating tasteless noodles," "my head seemed to be
broken" and "rice gruel, which has to cook more than three hours and is very
delicious." These make it easy for the reader to picture exactly what you
were going
through. The other strong point about this piece is that you bring the reader full circle by beginning and ending with the "noodles."
Being
alone when you are sick is difficult. Now, I know why you were so quiet in class. If you want to do
another entry related to this one, you could have a dialogue with your "sick"
self. What would your "healthy" self say to the "sick"
self? Is there some
advice that could be exchanged about how to prevent illness or how to take care of yourself better
when you do get sick? Start the dialogue with your "sick" self speaking first.
With the widespread availability of Internet communications, journals and other student-teacher dialogues have taken on a new dimension. With such innovations as "collaboratories" (where students in a class are regularly carrying on email discussions with each other and the teacher), on-line education, and distance learning, journals (out of several genres of possible writing) have gained additional prominence.
Journals obviously serve important pedagogical purposes: practice in the mechanics of writing, using writing as a "thinking process," individualization, and communication with the teacher. At the same time, the assessment qualities of journal writing have assumed an important role in the teaching-learning process. Because most journals are, or should be, a dialogue between student and teacher, they afford a unique opportunity for a teacher to offer various kinds of feedback.
On the other side of the issue, it is
argued that journals are too free a form to be assessed accurately.
With so much potential variability, it is difficult to set up criteria for
evaluation. For some English language learners, the concept of free and unfettered writing is
anathema. Certain critics have expressed ethical concerns: students may be
asked to reveal an inner self, which is virtually unheard of in their own culture. Without a
doubt, the assessing of journal entries through responding is not an exact science. It is
important to turn the advantages and potential drawbacks of journals into positive general steps
and guidelines for using journals as assessment instruments. The following steps are
not coincidentally parallel to those cited above for portfolio development:
1.
Sensitively introduce
students to the concept of journal writing. For many students, especially
those from educational systems that play down the notion of teacher-student
dialogue and collaboration, journal writing will be difficult at first. University-level
students, who have passed through a dozen years of product writing, will have
particular difficulty with the concept of writing without fear of a teacher's scrutinizing
every grammatical or spelling error. With modeling, assurance, and purpose,
however, students can make a remarkable transition into the potentially
liberating process of journal writing. Students who are shown examples of journal entries and are
given specific topics and schedules for writing will become comfortable with the
process.
2.
State the objective(s) of the journal. Integrate journal writing into the objectives of the curriculum in some way, especially if journal entries become topics of class discussion.
The list of types of journals at the beginning of this section may coincide with the
following examples of some purposes of journals:
Language-learning logs. In English language teaching, learning logs have the advantage of sensitizing students to the importance of setting their own goals and then self-monitoring their achievement. McNamara (1998) suggested restricting the number of skills, strategies, or language categories that students comment on; otherwise students can become overwhelmed with the process. A weekly schedule of a limited number of strategies usually accomplishes the purpose of keeping students on task.
Grammar journals. Some journals are focused only on grammar acquisition. These types of journals are especially appropriate for courses and workshops that focus on grammar. "Error logs" can be instructive processes of consciousness raising for students: their successes in noticing and treating errors spur them to maintain the process of awareness of error.
Responses to readings. Journals may have the specified purpose of simple responses to readings (and/or to other material such as lectures, presentations, films, and videos). Entries may serve as precursors to freewrites and help learners to sort out thoughts and opinions on paper. Teacher responses aid in the further development of those ideas.
Strategies-based learning logs. Closely allied to language-learning logs are specialized journals that focus only on strategies that learners are seeking to become aware of and to use in their acquisition process. In H. D. Brown's (2002) Strategies for Success: A Practical Guide to Learning English, a systematic strategies-based journal-writing approach is taken where, in each of 12 chapters, learners become aware of a strategy, use it in their language performance, and reflect on that process in a journal.
Self-assessment reflections. Journals can be a stimulus for self-assessment in a more open-ended way than through using checklists and questionnaires. With the possibility of a few stimulus questions, students' journals can extend beyond the scope of simple one-word or one-sentence responses.
Diaries of attitudes, feelings, and other affective factors. The affective states of learners are an important element of self-understanding. Teachers can thereby become better equipped to effectively facilitate learners' individual journeys toward their goals.
Acculturation logs. A variation on the above affectively based journals is one that focuses exclusively on the sometimes difficult and painful process of acculturation in a non-native country. Because culture and language are so strongly linked, awareness of the symptoms of acculturation stages can provide keys to eventual language success.
3. Give guidelines on what kinds of topics to include. Once the purpose or type of journal is clear, students will benefit from models or suggestions on what kinds of topics to incorporate into their journals.
4. Carefully specify the criteria for assessing or grading journals. Students need to understand the freewriting involved in journals, but at the same time, they need to know assessment criteria. Once you have clarified that journals will not be evaluated for grammatical correctness and rhetorical conventions, state how they will be evaluated. Usually the purpose of the journal will dictate the major assessment criterion. Effort as exhibited in the thoroughness of students' entries will no doubt be important. Also, the extent to which entries reflect the processing of course content might be considered. Maintain reliability by adhering conscientiously to the criteria that you have set up.
5. Provide optimal feedback in your responses. McNamara (1998, p. 39) recommended three different kinds of feedback to journals:
· cheerleading feedback, in which you celebrate successes with the students or encourage them to persevere through difficulties,
· instructional feedback, in which you suggest strategies or materials, suggest ways to fine-tune strategy use, or instruct students in their writing, and
· reality-check feedback, in which you help the students set more realistic expectations for their language abilities.
The ultimate purpose of responding to
student journal entries is well captured in McNamara's threefold classification
of feedback. Responding to journals is a very personalized matter,
but closely attending to the objectives for writing the journal and its specific
directions for an entry will focus those responses appropriately.
Peer responses to
journals may be appropriate if journal comments are relatively
"cognitive," as opposed to very personal. Personal comments could
make students feel threatened by other pairs of eyes on their inner thoughts
and feelings.
6. Designate appropriate time frames and schedules for review. Journals, like portfolios, need to be esteemed by students as integral parts of a course. Therefore, it is essential to budget enough time within a curriculum for both writing journals and for your written responses. Set schedules for submitting journal entries periodically, and return them in short order.
7.
Provide formative,
washback-giving final comments. Journals, perhaps even more than
portfolios, are the most formative of all the alternatives in assessment. They
are day-by-day (or at least weekly) chronicles of progress whose purpose is to
provide a thread of continuous assessment and reassessment, to recognize mid-stream direction
changes, and/or to refocus on goals. Should you reduce a final assessment of such a
procedure to a grade or a score? Some say yes, some say no (Peyton & Reed,
1990), but it appears to be in keeping with the formative nature of journals not to do so.
Credit might be given for the process of actually writing the journal, and possibly a
distinction might be made among high, moderate, and low effort and/or quality.
But to accomplish the goal of positive washback, narrative summary comments and
suggestions are clearly in order.
In
sum, how do journals score on principles of assessment? Practicality remains relatively low,
although the appropriation of electronic communication increases practicality by
offering teachers and students convenient, rapid (and legible!) means of responding.
Reliability can be maintained by the journal entries adhering to stated purposes and
objectives, but because of individual variations in writing and the accompanying
variety of responses, reliability may reach only a moderate level. Content and face
validity are very high if the journal entries are closely interwoven with curriculum goals
(which in turn reflect real-world needs). In the category of washback, the potential
in dialogue journals is off the charts!
CONFERENCES AND INTERVIEWS
For a number of years, conferences have been a routine part of language classrooms, especially of courses in writing. In Chapter 9, reference was made to conferencing as a standard part of the process approach to teaching writing, in which the teacher, in a conversation about a draft, facilitates the improvement of the written work. Such interaction has the advantage of one-on-one interaction between teacher and student and the teacher's being able to direct feedback toward a student's specific needs.
Conferences are not limited to drafts of
written work. Including portfolios and journals discussed above, the list of
possible functions and subject matter for conferencing is substantial:
· commenting on drafts of essays and reports
· reviewing portfolios
· responding to journals
· advising on a student's plan for an oral presentation
· assessing a proposal for a project
· giving feedback on the results of performance on a test
· clarifying understanding of a reading
· exploring strategies-based options for enhancement or compensation
· focusing on aspects of oral production
· checking a student's self-assessment of a performance
· setting personal goals for the near future
· assessing general progress in a course
Conferences must assume that the teacher plays the role of a facilitator and guide, not of an administrator of a formal assessment. In this intrinsically motivating atmosphere, students need to understand that the teacher is an ally who is encouraging self-reflection and improvement. So that the student will be as candid as possible in self-assessing, the teacher should not consider a conference as something to be scored or graded. Conferences are by nature formative, not summative, and their primary purpose is to offer positive washback.
Genesee and Upshur (1996, p. 110)
offered a number of generic kinds of questions that may be useful to pose in a
conference:
· What did you like about this work?
· What do you think you did well?
· How does it show improvement from previous work? Can you show me the improvement?
· Are there things about this work you do not like? Are there things you would like to improve?
· Did you have any difficulties with this piece of work? If so, where, and what did you do [will you do] to overcome them?
· What strategies did you use to figure out the meaning of words you could not understand?
· What did you do when you did not know a word that you wanted to write?
Discussions
of alternatives in assessment usually encompass one specialized kind of conference: an
interview. This term is intended to denote a context in which a teacher
interviews a student for a designated assessment purpose. (We are not talking about a
student conducting an interview of others in order to gather information on a
topic.) Interviews may have one or more of several possible goals, in which the teacher
· assesses the student's oral production,
· ascertains a student's needs before designing a course or curriculum,
· seeks to discover a student's learning styles and preferences,
· asks a student to assess his or her own performance, and
· requests an evaluation of a course.
One overriding principle of effective
interviewing centers on the nature of the questions that will be asked. It is easy
for teachers to assume that interviews are just informal conversations
and that they need little or no preparation. To maintain the all-important
reliability factor, interview questions should be constructed carefully to elicit as focused a
response as possible. When interviewing for oral production assessment, for
example, a highly specialized set of probes is necessary to accomplish
predetermined objectives. (Look back at Chapter 7, where oral interviews were discussed.)
Because interviews have multiple
objectives, as noted above, it is difficult to generalize principles
for conducting them, but the following guidelines may help to frame the questions
efficiently:
· Offer an initial atmosphere of warmth and anxiety-lowering (warm-up).
· Begin with relatively simple questions.
· Continue with level-check and probe questions, but adapt to the interviewee as needed.
· Frame questions simply and directly.
· Focus on only one factor for each question. Do not combine several objectives in the same question.
· Be prepared to repeat or reframe questions that are not understood.
· Wind down with friendly and reassuring closing comments.
How do conferences and interviews score
in terms of principles of assessment?
Their practicality, as is true for many of the alternatives in assessment, is low because they
are time-consuming. Reliability will vary between conferences and interviews. In the case of
conferences, it may not be important to have rater reliability because the whole purpose is to
offer individualized attention, which will vary greatly from student to student. For
interviews, a relatively high level of reliability should be maintained with careful
attention to objectives and procedures. Face validity for both can be maintained
at a high level due to their individualized nature. As long as the subject matter of
the conference/interview is clearly focused on the course and course objectives,
content validity should also be upheld. Washback potential and authenticity are high
for conferences, but possibly only moderate for interviews unless the results of
the interview are clearly folded into subsequent learning.
OBSERVATIONS
All teachers, whether they are aware of
it or not, observe their students in the classroom almost constantly. Virtually
every question, every response, and almost every nonverbal behavior is,
at some level of perception, noticed. All those intuitive perceptions are
stored as little bits and pieces of information about students that can form a composite
impression of a student's ability. Without ever administering a test or a quiz, teachers
know a lot about their students. In fact, experienced teachers are so good at this almost
subliminal process of assessment that their estimates of a student's competence
are often highly correlated with actual independently administered test scores.
(See Acton, 1979, for an example.)
How do all these chunks of information
become stored in a teacher's brain
cells?
Usually not through rating sheets and checklists and carefully completed observation charts.
Still, teachers' intuitions about students' performance are not infallible, and
certainly both the reliability and face validity of their feedback to students
can be increased with the help of empirical means of observing their language
performance. The value of systematic observation of students has been extolled for decades (Flanders, 1970; Moskowitz, 1971; Spada & Fröhlich, 1995), and its utilization greatly enhances a teacher's intuitive impressions by offering
tangible corroboration
of conclusions. Occasionally, intuitive information is disconfirmed by observation data.
We will not be concerned in this section
with the kind of observation that rates a formal presentation or any other
prepared, prearranged performance in which the student is fully aware
of some evaluative measure being applied, and in which the teacher scores or
comments on the performance. We are talking about observation as a systematic,
planned procedure for real-time, almost surreptitious recording of student verbal and
nonverbal behavior. One of the objectives of such observation is to assess students
without their awareness (and possible consequent anxiety) of the observation so that the
naturalness of their linguistic performance is maximized.
The list could be even more specific to
suit the characteristics of students, the focus of a lesson or module, the
objectives of a curriculum, and other factors. The list might expand, as well,
to include other possible observed performance. In order to carry out classroom
observation, it is of course important to take the following steps:
· Determine the specific objectives of the observation.
· Decide how many students will be observed at one time.
· Set up the logistics for making unnoticed observations.
· Design a system for recording observed performances.
· Do not overestimate the number of different elements you can observe at one time; keep them very limited.
· Plan how many observations you will make.
· Determine specifically how you will use the results.
Designing a system for observing is no
simple task. Recording your observations can take the form of anecdotal
records, checklists, or rating scales. Anecdotal records should be as
specific as possible in focusing on the objective of the observation, but they
are so varied in form that to suggest formats here would be counterproductive.
Their very purpose is more note-taking than record-keeping. The key is to devise a system
that maintains the principle of reliability as closely as possible. Checklists are a viable
alternative for recording observation results. Some checklists of student
classroom performance, such as the COLT observation scheme devised by Spada and Fröhlich
(1995), are elaborate grids referring to such variables as
· whole-class, group, and individual participation,
· content of the topic,
· linguistic competence (form, function, discourse, sociolinguistic),
· materials being used, and
· skill (listening, speaking, reading, writing),
with
subcategories for each variable. The observer identifies an activity or episode as well as the starting
time for each, and checks appropriate boxes along the grid. Completing such a form
in real time may present some difficulty with so many factors to attend to at
once.
SELF- AND PEER-ASSESSMENTS
A conventional view of language assessment might consider the notion of self- and peer-assessment as an absurd reversal of politically correct power relationships. After all, how could learners who are still in the process of acquisition, especially the early processes, be capable of rendering an accurate assessment of their own performance? Nevertheless, a closer look at the acquisition of any skill reveals the importance, if not the necessity, of self-assessment and the benefit of peer-assessment. What successful learner has not developed the ability to monitor his or her own performance and to use the data gathered for adjustments and corrections? Most successful learners extend the learning process well beyond the classroom and the presence of a teacher or tutor, autonomously mastering the art of self-assessment. Where peers are available to render assessments, the advantage of such additional input is obvious.
Self-assessment derives its theoretical justification from a number of well-established principles of second language acquisition. The principle of autonomy stands out as one of the primary foundation stones of successful learning. The ability to set one's own goals both within and beyond the structure of a classroom curriculum, to pursue them without the presence of an external prod, and to independently monitor that pursuit are all keys to success. Developing intrinsic motivation that comes from a self-propelled desire to excel is at the top of the list of successful acquisition of any set of skills.
Peer-assessment appeals to similar principles, the most obvious of which is cooperative learning. Many people go through a whole regimen of education from kindergarten up through a graduate degree and never come to appreciate the value of collaboration in learning: the benefit of a community of learners capable of teaching each other something. Peer-assessment is simply one arm of a plethora of tasks and procedures within the domain of learner-centered and collaborative education.
Researchers (such as Brown & Hudson, 1998) agree that the above theoretical underpinnings of self- and peer-assessment offer certain benefits: direct involvement of students in their own destiny, the encouragement of autonomy, and increased motivation because of their self-involvement. Of course, some noteworthy drawbacks must also be taken into account. Subjectivity is a primary obstacle to overcome. Students may be either too harsh on themselves or too self-flattering, or they may not have the necessary tools to make an accurate assessment.
TYPES OF SELF- AND PEER-ASSESSMENT
It is important to distinguish among several different types of self- and peer-assessment and to apply them accordingly. I have borrowed from widely accepted classifications of strategic options to create five categories of self- and peer-assessment: (1) direct assessment of performance, (2) indirect assessment of performance, (3) metacognitive assessment, (4) assessment of socioaffective factors, and (5) student self-generated tests.
1. Assessment of (a specific) performance. In this category, a student typically monitors him- or herself, in either oral or written production, and renders some kind of evaluation of performance. The evaluation takes place immediately or very soon after the performance. Thus, having made an oral presentation, the student (or a peer) fills out a checklist that rates performance on a defined scale. Or perhaps the student views a video-recorded lecture and completes a self-corrected comprehension quiz. A journal may serve as a tool for such self-assessment. Peer editing is an excellent example of direct assessment of a specific performance.
2. Indirect assessment of (general) competence. Indirect self- or peer-assessment targets larger slices
of time with a view to rendering an evaluation of general ability, as opposed to one
specific, relatively time-constrained performance. The distinction between direct and
indirect assessments is the classic competence-performance distinction. Self- and
peer-assessments of performance are limited in time and focus to a relatively short
performance. Assessments of competence may encompass a lesson over several
days, a module, or even a whole term of course work, and the objective is to ignore
minor, nonrepeating performance flaws and thus to evaluate general ability.
In a successful experiment to introduce self-assessment in his advanced intermediate preuniversity ESL class, Phillips (2000) created a questionnaire (Figure 10.2) through which his students evaluated themselves on their class participation. The items were simply formatted with just three options to check for each category, which made the process easy for students to perform. They completed the questionnaire at midterm, which was followed up immediately with a teacher-student conference during which students identified weaknesses and set goals for the remainder of the term.
Of course, indirect self- and peer-assessment is not confined to scored rating sheets and questionnaires.
An ideal genre for self-assessment is through journals, where students engage
in more open-ended assessment and/or make their own further comments on the results
of completed checklists.
3. Metacognitive
assessment (for setting goals). Some kinds of evaluation are more strategic in
nature, with the purpose not just of viewing past performance or competence but of
setting goals and maintaining an eye on the process of their pursuit. Personal
goal-setting has the advantage of fostering intrinsic motivation and of providing learners with
that extra-special impetus from having set and accomplished one's own goals.
Strategic planning and self-monitoring can take the form of journal entries,
choices from a list of possibilities, questionnaires, or cooperative (oral) pair or group
planning.
4. Socioaffective assessment. Yet another type of self- and peer-assessment comes in the form of methods of examining affective factors in learning. Such assessment is quite different from looking at and planning linguistic aspects of acquisition. It requires looking at oneself through a psychological lens and may not differ greatly from self-assessment across a number of subject matter areas or for any set of personal skills. When learners resolve to assess and improve motivation, to gauge and lower their own anxiety, to find mental or emotional obstacles to learning and then plan to overcome those barriers, an all-important socioaffective domain is invoked. A checklist form of such items may look like many of the questionnaire items in Brown (2002), in which test-takers must indicate preference for one statement over the one on the opposite side.
5. Student-generated
tests. A final type of assessment that is not usually classified strictly as self-
or peer-assessment is the technique of engaging students in the process of constructing
tests themselves. The traditional view of what a test is would never allow
students to engage in test construction, but student-generated tests can be
productive, intrinsically motivating, autonomy-building processes.
GUIDELINES FOR SELF- AND PEER-ASSESSMENT
Self- and peer-assessment are among the best possible formative types of assessment, and possibly the most rewarding, but they must be carefully designed and administered for them to reach their potential. Four guidelines will help teachers bring this intrinsically motivating task into the classroom successfully:
· Tell students the purpose of the assessment. Self-assessment is a process that many students, especially those in traditional educational systems, will initially find quite uncomfortable. They need to be sold on the concept. It is therefore essential that you carefully analyze the needs that will be met in offering both self- and peer-assessment opportunities, and then convey this information to students.
· Define the task(s) clearly. Make sure the students know exactly what they are supposed to do. If you are offering a rating sheet or questionnaire, the task is not complex, but an open-ended journal entry could leave students perplexed about what to write. Guidelines and models will be of great help in clarifying the procedures.
· Encourage impartial evaluation of performance or ability. One of the greatest drawbacks to self-assessment is the threat of subjectivity. By showing students the advantage of honest, objective opinions, you can maximize the beneficial washback of self-assessments. Peer-assessments, too, are vulnerable to unreliability as students apply varying standards to their peers. Clear assessment criteria can go a long way toward encouraging objectivity.
· Ensure beneficial washback through follow-up tasks. It is not enough to simply toss a self-checklist at students and then walk away. Systematic follow-up can be accomplished through further self-analysis, journal reflection, written feedback from the teacher, conferencing with the teacher, purposeful goal-setting by the student, or any combination of the above.
A TAXONOMY OF SELF- AND PEER-ASSESSMENT TASKS
An evaluation of self- and peer-assessment according to our classic principles of assessment yields a pattern that is quite consistent with other alternatives in assessment that have been analyzed in this chapter. Practicality can achieve a moderate level with
such procedures as checklists and questionnaires, while reliability risks remaining at a
low level, given the variation within and across learners. Once students accept the
notion that they can legitimately assess themselves, then face validity can be raised
from what might otherwise be a low level. Adherence to course objectives will
maintain a high degree of content validity. Authenticity and washback both have very
high potential because students are centering on their own linguistic needs
and are receiving useful feedback.
Source: Brown, H. Douglas. 2003. Language Assessment: Principles and Classroom Practices (pp. 251-280). San Francisco, California.