SUMMARY
Assessing Grammar
A. Differing notions of ‘grammar’ for assessment
In
reaction to the grammar-translation approach that had become more about
learning a set of abstract linguistic rules than about learning to use a
language for some communicative purpose, some language teachers began to seek
alternative approaches to language teaching based on what students could ‘do’
with the language. These teachers insisted that the grammar should not only be
learned, but also applied to some linguistic or communicative purpose. They
recommended that grammatical analysis be accompanied by application, where
students are asked to answer questions, write illustrative examples, combine
sentences, correct errors, write paragraphs and so forth. To know a language meant
to be able to apply the rules – an approach relatively similar to what is done
in many classrooms today. In this approach, knowledge of grammar was assessed
by having students apply rules to language in some linguistic context.
Most
of the early debates about language teaching have now been resolved; however,
others continue to generate discussion. For example, most language teachers
nowadays would no longer expect their students to devote most of their time to
describing and analyzing language systems, to translating texts or to learning
a language solely for access to its literature; rather, they would want their
students to learn the language for some communicative purpose. In other words,
the primary goal of language learning today is to foster communicative
competence, or the ability to communicate effectively and spontaneously in
real-life settings. Language teachers today would not deny that grammatical
competence is an integral part of communicative language ability, but most
would maintain that grammar should be viewed as an indispensable resource for
effective communication and not, except under special circumstances, an object
of study in itself. Current teaching controversies revolve around the role, if
any, that grammar instruction should play in the language classroom and the
degree to which the grammatical system of a language can be acquired through
instruction. These questions have, since the 1980s, produced an explosion of
empirical research, which is of critical importance to language teachers.
Grammar and Linguistics
Since the 1950s, many
linguistic theories – too numerous to list here – have been proposed to
explain language phenomena. Many of these theories have helped shape how L2
educators currently define grammar in educational contexts. Although it is
beyond the purview of this book to provide a comprehensive review of these
theories, it is, nonetheless, helpful to mention a few, considering both the impact
they have had on L2 education and the role they play in helping define grammar
for assessment purposes. Generally speaking, most linguists have embraced one
of two general perspectives to describe linguistic phenomena. Either they take
a syntactocentric perspective of language, where syntax, or the way in which words
are arranged in a sentence, is the central feature to be observed and analyzed;
or they adopt a communication perspective of language, where the observational
and analytic emphasis is on how language is used to convey meaning (Van Valin
and LaPolla, 1997). I will use these two perspectives to classify some of the
more influential grammatical paradigms in our field.
Form-Based
Perspectives Of Language
Several syntactocentric, or form-based, theories
of language have provided grammatical insights to L2 teachers. I will describe
three: traditional grammar, structural linguistics and
transformational-generative grammar. One of the oldest theories to describe the
structure of language is traditional grammar. Originally based on the study of
Latin and Greek, traditional grammar drew on data from literary texts to
provide rich and lengthy descriptions of linguistic form. Unlike some other
syntactocentric theories, traditional grammar also revealed the linguistic
meanings of these forms and provided information on their usage in a sentence (Celce-Murcia
and Larsen-Freeman, 1999). Traditional grammar supplied an extensive set of
prescriptive rules along with the exceptions. A typical rule in a traditional English
grammar might be:
The
first-person singular of the present tense verb ‘to be’ is ‘I am’. ‘Am’ is used
with ‘I’ in all cases, except in first-person singular negative tag and yes/no
questions, which are contracted. In this case, the verb ‘are’ is used instead
of ‘am’. For example, ‘I’m in a real bind, aren’t I?’ or ‘Aren’t I trying my
best?’
Probably the best-known syntactocentric theory is
Chomsky’s (1965) transformational-generative grammar and its later, broader
instantiation, universal grammar (UG). Unlike the traditional or structural grammars
that aim to describe one particular language, transformational-generative grammar
endeavored to provide a ‘universal’ description of language behavior, revealing
the internal linguistic system to which all humans are predisposed (Radford,
1988). Transformational-generative grammar claims that the underlying properties
of any individual language system can be uncovered by means of a detailed,
sentence-level analysis. In this regard, Chomsky proposed a set of
phrase-structure rules that describe the underlying structures of all languages.
These phrase structure rules join with lexical items to offer a semantic
representation to the rules. Following this, a series of ‘transformation’ rules
are applied to the basic structure to add, delete, move or substitute the underlying
constituents in the sentence. Morphological rules are then applied, followed by
phonological or orthographic rules (for further information, see Radford, 1988,
or Celce-Murcia and Larsen-Freeman, 1999).
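The derivational pipeline sketched above – phrase-structure rules expanded and then joined with lexical items – can be illustrated with a toy example. The rules and lexicon below are invented for illustration only; they are a minimal sketch, not Chomsky’s actual formalism, and they omit the transformational, morphological and phonological stages:

```python
import random

# Invented toy phrase-structure rules (illustrative only, not from Chomsky).
RULES = {
    "S": [["NP", "VP"]],    # a sentence is a noun phrase plus a verb phrase
    "NP": [["Det", "N"]],   # a noun phrase is a determiner plus a noun
    "VP": [["V", "NP"]],    # a verb phrase is a verb plus a noun phrase
}

# Lexical items joined to the phrase structure to give it content.
LEXICON = {
    "Det": ["the"],
    "N": ["student", "teacher"],
    "V": ["saw"],
}

def expand(symbol):
    """Recursively expand a symbol into a list of words."""
    if symbol in LEXICON:
        return [random.choice(LEXICON[symbol])]
    production = random.choice(RULES[symbol])
    words = []
    for sym in production:
        words.extend(expand(sym))
    return words

print(" ".join(expand("S")))  # e.g. "the student saw the teacher"
```

The point of the sketch is only that a small set of underlying rules can generate well-formed surface strings – the ‘generative’ idea in miniature.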
Form-
And Use-Based Perspectives Of Language
The three theories of linguistic analysis described
thus far have provided insights to L2 educators on several grammatical forms.
These insights provide information to explain what structures are theoretically
possible in a language. Other linguistic theories, however, are better equipped
to examine how speakers and writers actually exploit linguistic forms during
language use. For example, if we wish to explain how seemingly similar
structures like I like to read and I like reading connote different meanings,
we might turn to those theories that study grammatical form and use interfaces.
This would address questions such as: Why does a language need two or more
structures that are similar in meaning? Are similar forms used to convey
different specialized meanings? To what degree are similar forms a function of
written versus spoken language, or to what degree are these forms characteristic
of a particular social group or a specific situation? It is important for us to
discuss these questions briefly if we ultimately wish to test grammatical forms
along with their meanings and uses in context.
Biber et al. (1998) identified a second kind of
corpus-based study that relates grammatical forms to different types of texts.
For example, how do academic texts differ from informal conversations in terms
of the passive voice? Besides showing which linguistic features are possible in
texts, corpus linguistics strives to identify which are probable. In other
words, to what degree are linguistic features likely to occur in certain texts
and in what circumstances? For example, in physical descriptions of objects the
majority of the verbs are non-progressive or stative. Unlike descriptive linguistics
or UG, corpus linguistics is not primarily concerned with syntax; rather, it
focuses on how words co-occur with other words in a single sentence or text.
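The kind of ‘possible versus probable’ question corpus linguists ask – for instance, how academic texts differ from conversation in their use of the passive voice – can be sketched with a minimal example. The two mini-corpora and the simple passive pattern below are invented toy data, not drawn from an actual corpus study:

```python
import re

# Invented mini-corpora (illustrative only).
academic = [
    "The samples were analyzed in the laboratory.",
    "The results were reported in the appendix.",
    "The solution was heated to 90 degrees.",
]
conversation = [
    "I heated the soup.",
    "Did you see the game last night?",
    "She reported the results to me.",
]

# A crude approximation of the passive: was/were + a word ending in -ed.
PASSIVE = re.compile(r"\b(was|were)\s+\w+ed\b")

def passive_rate(sentences):
    """Proportion of sentences containing the simple passive pattern."""
    hits = sum(1 for s in sentences if PASSIVE.search(s))
    return hits / len(sentences)

print(passive_rate(academic))      # 1.0 in this toy sample
print(passive_rate(conversation))  # 0.0 in this toy sample
```

Both registers could in principle contain passives (the feature is ‘possible’ in each); what the counts show is how probable the feature is in each text type – the distinction the passage draws.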
Communication-Based
Perspectives Of Language
Other theories have provided grammatical insights
from a communication-based perspective. Such a perspective expresses the notion
that language involves more than linguistic form. It moves beyond the view of language
as patterns of morphosyntax observed within relatively decontextualized sentences
or sentences found within naturally occurring corpora. Rather, a communication-based
perspective views grammar as a set of linguistic norms, preferences and expectations
that an individual invokes to convey a host of pragmatic meanings that are
appropriate, acceptable and natural depending on the situation. The assumption
here is that linguistic form has no absolute, fixed meaning in language use (as
seen in sentences 1.5 and 1.7 above), but is mutable and open to interpretation
by those who use it in a given circumstance. Grammar in this context is often
co-terminous with language itself, and stands not only for form, but also for
meaningfulness and pragmatic appropriacy, acceptability or naturalness – a
topic I will return to later since I believe that a blurring of these concepts
is misleading and potentially problematic for language educators.
What
is pedagogical grammar?
Many language teachers who have taken courses in
linguistic analysis and learned to examine language within the frameworks of
formal, grammatical theories have often felt that these courses did not
adequately meet their immediate needs. This is often because courses in
linguistic analysis rarely address classroom concerns such as what grammar to teach,
how to teach it and how to test it. Furthermore, it is unlikely that language
teachers would attempt to teach phrase-structure rules, parameter setting
conditions or abstract notions of time and space, and certainly, they would
never test students on these principles.
In this chapter, I have attempted to answer the
question ‘What do we mean by grammar?’ In this respect, I have differentiated
between language and language analysis or linguistics. I have also discussed
several schools of linguistics and have shown how each has broadened our understanding
of what is meant by ‘grammar’. Finally, I have shown how these different
notions of grammar provide complementary information that could be drawn on for
purposes of teaching or assessing grammar. In the next chapter I will discuss how
second language grammatical knowledge is acquired. In this respect, we will
examine how grammatical ability has been conceptualized in L2 grammar teaching
and learning, and how L2 grammar teaching and learning are intrinsically linked
to assessment.
B. Research on L2 grammar teaching, learning and assessment
In
recent years, some of these same questions have been addressed by second
language acquisition (SLA) researchers in a variety of empirically based
studies. These studies have principally focused on describing how a
learner’s interlanguage (Selinker, 1972), or L2, develops over
time and on the effects that L2 instruction may have on this progression. In
most of these studies, researchers have investigated the effects of learning
grammatical forms by means of one or more assessment tasks. Based on the
conclusions drawn from these assessments, SLA researchers have gained a much
better understanding of how grammar instruction impacts both language learning
in general and grammar learning in particular. However, in far too many SLA
studies, the ability under investigation has been poorly defined or defined
with no relation to a model of L2 grammatical ability.
Comparative
Methods Studies
The comparative methods studies sought to compare
the effects of different language-teaching methods on the acquisition of an L2.
These studies occurred principally in the 1960s and 1970s, and stemmed from a reaction
to the grammar-translation method, which had dominated language instruction
during the first half of the twentieth century. More generally, these studies
were in reaction to form-focused instruction (referred to as ‘focus on forms’
by Long, 1991), which used a traditional structural syllabus of grammatical
forms as the organizing principle for L2 instruction. According to Ellis
(1997), form-focused instruction contrasts with meaning-focused instruction in
that meaning-focused instruction emphasizes the communication of messages
(i.e., the act of making a suggestion and the content of such a suggestion)
while form-focused instruction stresses the learning of linguistic forms. These
can be further contrasted with form-and-meaning focused instruction (referred
to by Long (1991) as ‘focus-on-form’), where grammar instruction occurs in a
meaning-based environment and where learners strive to communicate meaning
while paying attention to form. (Note that Long’s version of ‘focus-on-form’
stresses a meaning orientation with an incidental focus on forms.) These
comparative methods studies all shared the theoretical premise that grammar has
a central place in the curriculum, and that successful learning depends on the
teaching method and the degree to which it promotes grammar processing.
Empirical
Studies In Support Of Non-Intervention
The non-interventionist position was examined
empirically by Prabhu (1987) in a project known as the Communicational Teaching
Project (CTP) in southern India. This study sought to demonstrate that the
development of grammatical ability could be achieved through a task-based, rather
than a form-focused, approach to language teaching, provided that the tasks
required learners to engage in meaningful communication. In the CTP, Prabhu
(1987) argued against the notion that the development of grammatical ability
depended on a systematic presentation of grammar followed by planned practice.
Possible Implications Of Fixed Developmental Order For Language Assessment
The notion that structures appear to be acquired in
a fixed developmental order and in a fixed developmental sequence might
conceivably have some relevance to the assessment of grammatical ability. First
of all, these findings could give language testers an empirical basis for
constructing grammar tests that would account for the variability inherent in a
learner’s interlanguage. In other words, information on the acquisition order
of grammatical items could conceivably serve as a basis for selecting grammatical
content for tests that aim to measure different levels of developmental
progression, as Chang (2002, 2004) did in examining the underlying
structure of a test that attempted to measure knowledge of relative
clauses. These findings also suggest a substantive approach to defining test
tasks according to developmental order and sequence on the basis of how grammatical
features are acquired over time (Ellis, 2001b). In other words, one task could
potentially tap into developmental level one, while another taps into
developmental level two, and so forth.
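One way such level-by-level test tasks might be scored can be sketched as follows. The levels, the example structures in the comments, the mastery threshold and the scoring rule are all invented for illustration; they do not come from an actual acquisition study and, as the next section argues, any real application would need far stronger empirical grounding:

```python
# Invented mastery threshold (illustrative only).
MASTERY = 0.8

def placement(responses):
    """responses: {level: [True/False per item]}.

    Returns the highest developmental level mastered, counting up from
    level 1 and stopping at the first level below the threshold.
    """
    level = 0
    for lvl in sorted(responses):
        proportion_correct = sum(responses[lvl]) / len(responses[lvl])
        if proportion_correct >= MASTERY:
            level = lvl
        else:
            break
    return level

# A hypothetical learner's item-level results on tasks tagged by level.
learner = {
    1: [True, True, True, True, True],     # e.g. items on an early-acquired form
    2: [True, True, True, True, False],    # e.g. items on a mid-sequence form
    3: [True, False, False, True, False],  # e.g. items on a late-acquired form
}
print(placement(learner))  # 2: levels 1-2 mastered, level 3 not yet
```

The implicational logic (stop at the first non-mastered level) reflects the idea of a fixed sequence: a learner is not credited with level three before mastering level two.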
Problems
With The Use Of Development Sequences As A Basis For Assessment
Although developmental sequence research offers an
intuitively appealing complement to accuracy-based assessments in terms of
interpreting test scores, I believe this method is fraught with a number of
serious problems, and language educators should use extreme caution in applying
this method to language testing. This is because our understanding of natural
acquisitional sequences is incomplete and at too early a stage of research to
be the basis for concrete assessment recommendations (Lightbown, 1985; Hudson,
1993).
Interventionist
Studies
Not all L2 educators are in agreement with the
non-interventionist position on grammar instruction. In fact, several (e.g.,
Schmidt, 1983; Swain, 1991) have maintained that although some L2 learners are
successful in acquiring selected linguistic features without explicit grammar
instruction, the majority fail to do so. Testimony to this is the large number
of non-native speakers who emigrate to countries around the world, live there
all their lives and fail to learn the target language, or fail to learn it well
enough to realize their personal, social and long-term career goals.
Empirical
Studies In Support Of Intervention
Aside from anecdotal evidence, the non-interventionist
position has come under intense attack on both theoretical and empirical
grounds with several SLA researchers affirming that efforts to teach L2 grammar
typically result in the development of L2 grammatical ability. Hulstijn (1989)
and Alanen (1995) investigated the effectiveness of L2 grammar instruction on
SLA in comparison with no formal instruction.
Research
On Instructional Techniques And Their Effects On Acquisition
Much of the recent research on teaching grammar has
focused on four types of instructional techniques and their effects on acquisition.
Although a complete discussion of teaching interventions is outside the purview
of this book (see Ellis, 1997; Doughty and Williams, 1998), these techniques
include form- or rule-based techniques, input-based techniques, feedback-based
techniques and practice-based techniques (Norris and Ortega, 2000).
Grammar
Processing And Second Language Development
It
is important for language teachers and testers to understand these processes, especially
for classroom assessments. For example, I have had students fake their way
through an entire lesson on the second conditional. They knew the form and
could produce it well enough, but it was not until the end of the lesson that I
realized they had not really understood the meaning of the hypothetical or
counterfactual conditional. In other words, meaning was not mapped onto the
form. A short comprehension test earlier in the lesson might have allowed me to
re-teach the meaning of the conditionals before moving ahead.
Implicit grammatical knowledge refers to ‘the
knowledge of a language that is typically manifest in some form of naturally
occurring language behavior such as conversation’ (Ellis, 2001b, p. 252). In
terms of processing time, it is unconscious and is accessed quickly. DeKeyser (1995)
classifies grammatical instruction as implicit when it does not involve rule
presentation or a request to focus on form in the input; rather, implicit
grammatical instruction involves semantic processing of the input without any
degree of awareness of grammatical form.
In this chapter, I have demonstrated how the teaching,
learning and assessment of L2 grammatical ability are intrinsically related.
Language educators depend on linguists for information on the nature of
language, so that teaching, learning and assessment can reflect current notions
of language. Language educators also depend on experience, other language teachers
and SLA researchers for insights on teaching and learning, so that the
processes underlying instruction and acquisition can be obtained and so that
information on how learning can be maximized can be generated. Finally, both
language educators and SLA researchers depend on language testers for expertise
in the design and development of assessments so that samples of learner performance
can be consistently elicited, and so that the information observed from
assessments can be used to make claims about what a learner does or does not
know. In the next two chapters I will discuss how grammar has been defined in
models of language proficiency and will argue for a coherent model of grammatical
ability – one that could be used for test development and test validation
purposes.
C. The role of grammar in models of communicative language ability
In
this chapter I will discuss the role that grammar plays in models of communicative
competence. I will then endeavor to define grammar for assessment purposes. In
this discussion I will describe in some detail the relationships among
grammatical form, grammatical meaning and pragmatic meaning. Finally, I will
present a theoretical model of grammar that will be used in this book as a basis
for a model of grammatical knowledge. This will, in turn, be the basis for
grammar-test construction and validation. In the following chapter I will
discuss what it means for L2 learners to have grammatical ability.
The
Role Of Grammar In Models Of Communicative Competence
Every language educator who has ever attempted to
measure a student’s communicative language ability has wondered: ‘What exactly
does a student need to “know” in terms of grammar to be able to use it well enough
for some real-world purpose?’ In other words, they have been faced with the
challenge of defining grammar for communicative purposes. To complicate matters
further, linguistic notions of grammar have changed over time, as we have seen,
and this has significantly increased the number of components that could be
called ‘grammar’. In short, definitions of grammar and grammatical knowledge
have changed over time and across context, and I expect this will be no
different in the future.
Rea-Dickins’
Definition Of Grammar
In
discussing more specifically how grammatical knowledge might be tested within a
communicative framework, Rea-Dickins (1991) defined ‘grammar’ as the single
embodiment of syntax, semantics and pragmatics. She argued against Canale and
Swain’s (1980) and Bachman’s (1990b) multi-componential view of communicative
competence on the grounds that componential representations overlook the
interdependence and interaction between and among the various components. She
further stated that in Canale and Swain’s (1980) model, the notion of grammatical
competence was limited since it defined grammar as ‘structure’ on the one hand
and as ‘structure and semantics’ on the other, but ignored the notion of
‘structure as pragmatics’. Similarly, she added that in Bachman’s (1990b)
model, grammar was defined as structure at the sentence level and as cohesion
at the suprasentential level, but this model failed to account for the pragmatic
dimension of communicative grammar. Instead, Rea-Dickins (1991) argued that for
grammar to be truly ‘communicative’, it had to ‘allow for the processing of
semantically acceptable syntactic forms, which are in turn governed by
pragmatic principles’ (p. 114), and not be solely an embodiment of
morphosyntax.
Larsen-Freeman’s
Definition Of Grammar
Another
conceptualization of grammar that merits attention is Larsen-Freeman’s (1991,
1997) framework for the teaching of grammar in communicative language teaching
contexts. Drawing on several linguistic theories and influenced by language teaching
pedagogy, she has also characterized grammatical knowledge along three
dimensions: linguistic form, semantic meaning and pragmatic use. Form is
defined as both morphology, or how words are formed, and syntactic patterns, or
how words are strung together. This dimension is primarily concerned with linguistic
accuracy. The meaning dimension describes the inherent or literal message
conveyed by a lexical item or a lexico-grammatical feature.
What
Is Meant By ‘Grammar’ For Assessment Purposes?
Now with a better understanding of how grammar has
been conceptualized in models of language ability, how might we define
‘grammar’ for assessment purposes? It should be obvious from the previous
discussion that there is no one ‘right’ way to define grammar. In one testing
situation the assessment goal might be to obtain information on students’ knowledge
of linguistic forms in minimally contextualized sentences, while in another, it
might be to determine how well learners can use linguistic forms to express a
wide range of communicative meanings. Regardless of the assessment purpose, if
we wish to make inferences about grammatical ability on the basis of a grammar
test or some other form of assessment, it is important to know what we mean by
‘grammar’ when attempting to specify components of grammatical knowledge for
measurement purposes. With this goal in mind, we need a definition of
grammatical knowledge that is broad enough to provide a theoretical basis for the
construction and validation of tests in a number of contexts. At the same time,
we need our definition to be precise enough to distinguish it from other areas
of language ability.
Given the central role that construct definition
plays in test development and validation, my intention in this chapter has been
to discuss the ‘what’ of grammar assessment. I have examined how grammar has
been depicted in models of communicative language ability over the years, and
have argued that for assessment purposes grammar should be clearly differentiated
from pragmatics. Grammar should also be defined to include a form and meaning
component on both the sentence and discourse levels. I have also argued that meaning
can be characterized as literal and intended. Further, the pragmatic dimension of
language constitutes an extrapolation of both the literal meaning and the
speaker’s intended meaning, while using contextual information beyond what is expressed
in grammatical forms. I have argued that pragmatic meanings may be
simultaneously superimposed upon grammatical forms and their meanings (e.g., as
in a joke). In short, grammar should not be viewed solely in terms of
linguistic form, but should also include the role that literal and intended
meaning plays in providing resources for all types of communication. Although
forms and meanings are highly related, it is important for testers to make
distinctions among these components, when possible, so that assessments can be
used to provide more precise information to users of test results. In the next
chapter, I will use this model of grammar as a basis for defining second or
foreign language grammatical ability for assessment.
D. Towards a definition of grammatical ability
What is meant by grammatical ability?
Having described how grammar has
been conceptualized, we are now faced with the challenge of defining what it
means to ‘know’ the grammar of a language so that it can be used to achieve
some communicative goal. In other words, what does it mean to have ‘grammatical
ability’?
Defining
grammatical constructs
Although our basic underlying model of grammar will
remain the same in all testing situations (i.e., grammatical form and meaning),
what it means to ‘know’ grammar for different contexts will most likely change (see
Chapelle, 1998). In other words, the type, range and scope of grammatical features
required to communicate accurately and meaningfully will vary from one
situation to another. For example, the type of grammatical knowledge needed to
write a formal academic essay would be very different from that needed to make
a train reservation. Given the many possible ways of interpreting what it means
to ‘know’ grammar, it is important that we define what we mean by ‘grammatical
knowledge’ for any given testing situation. A clear definition of what we
believe it means to ‘know’ grammar for a particular testing context will then
allow us to construct tests that measure grammatical ability.
Grammatical
Knowledge
Knowledge refers to a set of informational
structures that are built up through experience and stored in long-term memory.
These structures include knowledge of facts that are stored in concepts,
images, networks, production-like structures, propositions, schemata and
representations (Pressley, 1995). Language knowledge is then a mental
representation of informational structures related to language. The exact
components of language knowledge, like any other construct, need to be defined.
In this book, grammar refers to a system of language whereas grammatical knowledge
is defined as a set of internalized informational structures related to the theoretical
model of grammar proposed in Figure 3.2 (p.62). In this model, grammar is
defined in terms of grammatical form and meaning, which are available to be
accessed in language use. To illustrate, suppose a student learning French
knows that the passive voice is constructed with a form of the verb ĂȘtre (to
be) plus a past participle, and is able to produce this form accurately and
with ease.
Grammatical
Ability
Grammatical ability is, then, the combination of
grammatical knowledge and strategic competence; it is specifically defined as
the capacity to realize grammatical knowledge accurately and meaningfully in
testing or other language-use situations. Hymes (1972) distinguished between competence
and performance, stating that communicative competence includes the underlying
potential of realizing language ability in instances of language use, whereas language
performance refers to the use of language in actual language events. Carroll
(1968) refers to language performance as ‘the actual manifestation of linguistic
competence . . . in behavior’ (p. 50).
Metalinguistic
Knowledge
Finally, metalanguage is the language used to
describe a language. It generally consists of technical linguistic or
grammatical terms (e.g., noun, verb). Metalinguistic knowledge, therefore, refers
to informational structures related to linguistic terminology. We must be clear
that metalinguistic knowledge is not a component of grammatical ability;
rather, the knowledge of linguistic terms would more aptly be classified as a
kind of specific topical knowledge that might be useful for language teachers
to possess. Some teachers almost never present metalinguistic terminology to their
students, while others find it useful as a means of discussing the language and
learning the grammar. It is important to remember that knowing the grammatical
terms of a language does not necessarily mean knowing how to communicate in the
language.
What
Is ‘Grammatical Ability’ For Assessment Purposes?
The approach to the assessment of grammatical
ability in this book is based on several specific definitions. First, grammar
encompasses grammatical form and meaning, whereas pragmatics is a separate, but
related, component of language. A second is that grammatical knowledge, along with
strategic competence, constitutes grammatical ability. A third is that grammatical
ability involves the capacity to realize grammatical knowledge accurately and
meaningfully in test-taking or other language-use contexts. The capacity to
access grammatical knowledge to understand and convey meaning is related to a
person’s strategic competence. It is this interaction that enables examinees to
implement their grammatical ability in language use. Next, in tests and other
language-use contexts, grammatical ability may interact with pragmatic ability
(i.e., pragmatic knowledge and strategic competence) on the one hand, and with
a host of non-linguistic factors such as the test-taker’s topical knowledge,
personal attributes, affective schemata and the characteristics of the task on the
other. Finally, in cases where grammatical ability is assessed by means of an
interactive test task involving two or more interlocutors, the way grammatical
ability is realized will be significantly impacted by both the contextual and
the interpretative demands of the interaction.
Knowledge
Of Phonological Or Graphological Form And Meaning
Knowledge of phonological/graphological form enables
us to understand and produce features of the sound or writing system, with the exception
of meaning-based orthographies such as Chinese characters, as they are used to
convey meaning in testing or language-use situations. Phonological form
includes the segmentals (i.e., vowels and consonants) and prosody (i.e.,
stress, rhythm, intonation contours, volume, tempo). These forms can be used
alone or in conjunction with other grammatical forms to encode phonological
meaning. For example, the ability to hear or pronounce meaning-distinguishing
sounds such as the /b/ vs. /v/ could be used to differentiate the meaning
between different nouns (boat/vote), and the ability to hear or pronounce the
prosodic features of the language (e.g., intonation) could allow students to
understand or convey the notion that a sentence is an interrogative.
Knowledge
Of Lexical Form And Meaning
Knowledge of lexical form enables us to understand
and produce those features of words that encode grammar rather than those that
reveal meaning. This includes words that mark gender (e.g., waitress),
countability (e.g., people) or part of speech (e.g., relate, relation). For
example, when the word think in English is followed by the preposition about before
a noun, this is considered the grammatical dimension of lexis, representing a
co-occurrence restriction with prepositions. One area of lexical form that
poses a challenge to learners of some languages is word formation.
Knowledge
Of Morphosyntactic Form And Meaning
Knowledge of morphosyntactic form permits us to
understand and produce both the morphological and syntactic forms of the
language. This includes the articles, prepositions, pronouns, affixes (e.g.,
-est), syntactic structures, word order, simple, compound and complex
sentences, mood, voice and modality. A learner who knows the morphosyntactic form
of the English conditionals would know that: (1) an if-clause sets up a
condition and a result clause expresses the outcome; (2) both clauses can be in
the sentence-initial position in English; (3) if can be deleted under certain
conditions as long as the subject and operator are inverted; and (4) certain
tense restrictions are imposed on if and result clauses.
Knowledge
Of Cohesive Form And Meaning
Knowledge
of cohesive form enables us to use the phonological, lexical and
morphosyntactic features of the language in order to interpret and express
cohesion on both the sentence and the discourse levels. Cohesive form is
directly related to cohesive meaning through cohesive devices (e.g., she, this,
here) which create links between cohesive forms and their referential meanings
within the linguistic environment or the surrounding co-text. Halliday and
Hasan (1976, 1989) list a number of grammatical forms for displaying cohesive
meaning. This can be achieved through the use of personal referents to convey
possession or reciprocity; demonstrative referents to display spatial, temporal
or psychological links; comparative referents
to encode similarity, difference and equality; and logical connectors to signal
a wide range of meanings such as addition, logical conclusion and contrast.
Knowledge
Of Information Management Form And Meaning
Knowledge
of information management form allows us to use linguistic forms as a resource
for interpreting and expressing the information structure of discourse. Some
resources that help manage the presentation of information include, for
example, prosody, word order, tense-aspect and parallel structures. These forms
are used to create information management meaning. In other words, information
can be structured to allow us to organize old and new information (i.e.,
topic/comment), topicalize, emphasize information and provide information
symmetry through parallelism and tense concordance.
Knowledge Of Interactional Form And Meaning
Knowledge of interactional form enables us to
understand and use linguistic forms as a resource for understanding and
managing talk-in-interaction. These forms include discourse markers and
communication management strategies. Discourse markers consist of a set of
adverbs, conjunctions and lexicalized expressions used to signal certain language
functions. For example, well . . . can signal disagreement, ya know or uh-huh can
signal shared knowledge, and by the way can signal topic diversion. Conversation-management
strategies include a wide range of linguistic forms that serve to facilitate smooth
interaction or to repair interaction when communication breaks down. For
example, when interaction stops because a learner does not understand
something, one person might try to repair the breakdown by asking, *What means
that? Here the learner knows the interactional meaning, but not the form.
Given the central role that construct definition
plays in test development and validation, my intention in this chapter has been
to discuss the ‘what’ of grammatical knowledge invoked by grammar assessment.
After describing grammatical constructs and defining key terms in this book, I have
proposed a theoretical model of grammatical ability that relates grammatical
knowledge to pragmatic knowledge and that specifies grammatical form and
meaning on the sentence and discourse levels. I have provided operational
descriptions of each part of the model along with examples that differentiate
knowledge of grammatical form and meaning from knowledge of pragmatic meaning.
This model aims to provide a broad theoretical basis for the definition of
grammatical knowledge in creating and interpreting tests of grammatical ability
in a variety of language use settings. In the next chapter, I will discuss how
this model can be used to design tasks that measure one or more components of
grammatical ability.
E.
Designing test
tasks to measure L2 grammatical ability
In
fact, test scores can vary as a result of the personal attributes of test-takers
such as their age (Farhady, 1983; Zeidner, 1987), gender (Kunnan, 1990;
Sunderland, 1995) and language background (Zeidner, 1986, 1987). They can also
fluctuate due to their strategy use (Cohen, 1994; Purpura, 1999), motivation
(Gardner, 1985) and level of anxiety (Gardner, Lalonde, Moorcroft and Evans,
1987). However, some of the most important factors that affect grammar test
scores, aside from grammatical ability, are the characteristics of the test
itself. In fact, anyone who has ever taken a grammar test, or any test for that
matter, knows that the types of questions on the test can severely impact
performance. For example, some test-takers perform better on multiple-choice
tasks than on oral interview tasks; others do better on essays than on cloze
tasks; and still others score better if asked to write a letter than if asked
to interpret a graph. Each of these tasks has a set of unique characteristics,
called test-task characteristics. These characteristics can potentially
interact with the characteristics of the examinee (e.g., his or her grammatical
knowledge, personal attributes, topical knowledge, affective schemata) to
influence test performance.
How
Does Test Development Begin?
Every
grammar-test development project begins with a desire to obtain (and often
provide) information about how well a student knows grammar in order to convey
meaning in some situation where the target language is used. The information
obtained from this assessment then forms the basis for decision-making. Those situations
in which we use the target language to communicate in real life or in which we
use it for instruction or testing are referred to as the target language use
(TLU) situations (Bachman and Palmer, 1996). Within these situations, the tasks
or activities requiring language to achieve a communicative goal are called the
target language use tasks. A TLU task is one of many language-use tasks that
test-takers might encounter in the target language use domain. It is to this
domain that language testers would like to make inferences about language
ability, or more specifically, about grammatical ability.
What
Are The Characteristics Of Grammatical Test Tasks?
As the goal of grammar assessment is to provide as
useful a measurement as possible of our students’ grammatical ability, we need
to design test tasks in which the variability of our students’ scores is
attributed to the differences in their grammatical ability, and not to
uncontrolled or irrelevant variability resulting from the types of tasks or the
quality of the tasks that we have put on our tests. As all language teachers
know, the kinds of tasks we use in tests and their quality can greatly
influence how students will perform. Therefore, given the role that the effects
of task characteristics play on performance, we need to strive to manage (or at
least understand) the effects of task characteristics so that they will
function the way we designed them to – as measures of the constructs we want to
measure (Douglas, 2000). In other words, specifically designed tasks will work
to produce the types of variability in test scores that can be attributed to
the underlying constructs given the contexts in which they were measured
(Tarone, 1998). To understand the characteristics of test tasks better, we turn
to Bachman and Palmer’s (1996) framework for analyzing target language use
tasks and test tasks.
The
Bachman And Palmer Framework
Bachman and Palmer’s (1996) framework of task
characteristics represents the most recent thinking in language assessment of
the potential relationships between task characteristics and test performance.
In this framework, they outline five general aspects of tasks, each of which is
characterized by a set of distinctive features. These five aspects describe characteristics
of (1) the setting, (2) the test rubrics, (3) the input, (4) the expected
response and (5) the relationship between the input and response.
Characteristics
Of The Setting
The characteristics of the setting include the
physical characteristics, the participants, and the time of the task. Obviously
these characteristics can have a serious, unexpected effect on performance. For
example, I once gave a speaking test to a group of ESL students in my
discussion skills class at UCLA. I randomly placed students into groups of
four, and each was given a problem to solve. Individuals then had to
participate in a discussion in which they had to try to persuade their partners
of their opinion. Each group’s 20-minute discussion was videotaped. After the exam,
I learned that a few students were so nervous being videotaped that they
seriously questioned the quality of their performance. I also learned that, in
one group, a participant became angry when the others did not agree with her
and openly told them their ideas were ‘stupid’. She also berated them for being
quiet. The other students were so embarrassed they hardly said a word. In such
a case, one participant had an undue effect on the others’ ability to perform
their best.
Characteristics
Of The Test Rubrics
The test rubrics include the instructions, the
overall structure of the test, the time allotment and the method used to score
the response. These characteristics can obviously influence test scores in
unexpected ways (Madden, 1982; Cohen, 1984, 1993). The overall test
instructions (when included) introduce test-takers to the entire test. They
make explicit the purpose of the overall test and the area(s) of language
ability being measured. They also introduce examinees to the different parts of
the test and their relative importance. The instructions make explicit the
procedures for taking the entire test. Overall test instructions are common in
all high-stakes tests.
Characteristics
Of The Input
According to Bachman and Palmer (1996), the
characteristics of the input (sometimes called the stimulus) are critical
features of performance in all test and TLU tasks. The input is the part of the
task that test-takers must process in order to answer the question. It is
characterized in terms of the format and language.
Characteristics
Of The Expected Response
When we design a test task, we specify the rubric
and input so that test takers will respond in a way that will enable us to make
inferences about the aspect of grammar ability we want to measure. The
‘expected response’ thus refers to the type of grammatical performance we want
to elicit. The characteristics of the expected response are also considered in terms
of the format and language. Similar to the input, the expected response of
grammar tasks can vary according to channel (aural or visual), form (verbal,
non-verbal), language (native or target) and vehicle (live or reproduced).
Relationship
Between The Input And Response
A final category of task characteristics to consider
in examining how test tasks impact performance is seen in how characteristics
of the input can interact with characteristics of the response. One
characteristic of this relationship involves ‘the extent to which the input or
the response affects subsequent input and responses’ (Bachman and Palmer, 1996,
p. 55). This is known as reactivity. Reciprocal tasks, which involve both interaction
and feedback between two or more examinees, are examples of tasks that have a
high degree of reactivity. However, non-reciprocal tasks, such as writing in a
journal, have no reactivity since no interaction or feedback is required to
complete the task. Finally, in adaptive test tasks there is no feedback, but
there is interaction in the sense that the responses influence subsequent
language use. For example, in computer adaptive tests such as the BEST Plus
(Center for Applied Linguistics, 2002), students are presented with test questions
tailored to their ability level. In other words, as the student responds to
input, subsequent input is tailored to their proficiency level.
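The adaptive logic just described can be sketched as a simple loop in which each response moves the difficulty of the next item up or down. This is a deliberately simplified illustration, not the BEST Plus algorithm (operational adaptive tests estimate ability statistically, e.g. with item response theory); the item pool and step rule below are invented.

```python
# Illustrative sketch of adaptive item selection (not the BEST Plus
# algorithm): a correct answer raises the difficulty of the next item,
# an incorrect answer lowers it, within the bounds of the item pool.

def run_adaptive_test(items_by_difficulty, answer_fn, start, n_items):
    """items_by_difficulty maps a difficulty level to a list of items;
    answer_fn(item) returns True if the test-taker answers correctly."""
    level = start
    administered = []
    for _ in range(n_items):
        item = items_by_difficulty[level].pop(0)
        correct = answer_fn(item)
        administered.append((item, level, correct))
        # Subsequent input is tailored to the response.
        if correct:
            level = min(level + 1, max(items_by_difficulty))
        else:
            level = max(level - 1, min(items_by_difficulty))
    return administered
```

A test-taker who keeps answering correctly is stepped up toward the hardest items in the pool; one who keeps missing is stepped down, so each examinee sees questions near his or her level.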
Describing
Grammar Test Tasks
When language teachers consider tasks for grammar
tests, they call to mind a large repertoire of task types that have been
commonly used in teaching and testing contexts. We now know that these holistic
task types constitute collections of task characteristics for eliciting
performance and that these holistic task types can vary on a number of
dimensions. We also need to remember that the tasks we include on tests should
strive to match the types of language-use tasks found in real-life or language
instructional domains. Traditionally, there have been many attempts at
categorizing the types of tasks found on tests. Some have classified tasks
according to scoring procedure. For example, objective test tasks (e.g.,
true–false tasks) are those in which no expert judgment is required to evaluate
performance with regard to the criteria for correctness. Subjective test tasks
(e.g., essays) are those that require expert judgment to interpret and evaluate performance
with regard to the criteria for correctness.
Selected-Response
Task Types
Selected-response tasks present input in the form of
an item, and test takers are expected to select the response. Other than that,
all other task characteristics can vary. For example, the form of the input can
be language, non-language or both, and the length of the input can vary from a word
to larger pieces of discourse. In terms of the response, selected response tasks
are intended to measure recognition or recall of grammatical form and/or
meaning. They are usually scored right/wrong, based on one criterion for
correctness; however, in some instances, partial-credit scoring may be useful,
depending on how the construct is defined. Finally, selected-response tasks can
vary in terms of reactivity, scope and directness.
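The scoring options mentioned above, dichotomous right/wrong scoring against one criterion for correctness versus partial-credit scoring tied to the construct definition, can be contrasted in a brief sketch. The conditional item and the credit weights below are invented for illustration.

```python
# Dichotomous vs. partial-credit scoring of a selected-response item.
# The item and the credit weights are invented for illustration.

def score_dichotomous(response, key):
    """One criterion for correctness: full credit or none."""
    return 1 if response == key else 0

def score_partial(response, credit_map):
    """Partial credit: options reflecting partial mastery of the
    construct earn a fraction of the full score."""
    return credit_map.get(response, 0)

# 'If she ___ earlier, she would have caught the train.'
credit_map = {
    'had left': 1.0,    # target form
    'left': 0.5,        # right clause type, wrong tense restriction
    'leaves': 0.0,
    'will leave': 0.0,
}
```

Under dichotomous scoring, choosing *left* earns nothing; under partial-credit scoring it earns half credit, on the reasoning that the response shows some knowledge of the conditional's morphosyntactic form.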
Given the central role of task in the development of
grammar tests, this chapter has addressed the notion of task and task
specification in the test development process. I discussed how task was
originally conceptualized as a holistic method of eliciting performance and
argued that the notion of task as a monolithic entity falls short of providing
an adequate framework from which to specify tasks for the measurement of
grammatical ability. I also argued that given the diversity of tasks that could
emerge from real-life and instructional domains, a broad conceptualization of task
is needed in grammatical assessment – one that could accommodate selected-response,
limited-production and extended-production tasks. For assessment, the process
of operationalizing test constructs and the specification of test tasks are
extremely important. They provide a means of controlling what is being measured,
what evidence needs to be observed to support the measurement claims, what
specific features can be manipulated to elicit the evidence of performance, and
finally how the performance should be scored. This process is equally important
for language teachers, materials writers and SLA researchers since any
variation in the individual task characteristics can potentially influence what
is practiced in classrooms or elicited on language tests. In this chapter, I argued
that in developing grammar tasks, we needed to strive to control, or at least
understand, the effects of these tasks in light of the inferences we make about
examinees’ grammatical ability. Finally, I described Bachman and Palmer’s
(1996) framework for characterizing test tasks and showed how it could be used
to characterize L2 grammar tasks. This framework allows us to examine tasks
that are currently in use, and more interestingly, it allows us to show how
variations in task characteristics can be used to create new task types that
might better serve our educational needs and goals. In the next chapter, I will
discuss the process of constructing a grammar test consisting of several tasks.
Assessing Vocabulary
Chapter One
The Place Of Vocabulary
In Language Assessment
At
first glance, it may seem that assessing the vocabulary knowledge of second
language learners is both necessary and reasonably straightforward. It is
necessary in the sense that words are the basic building blocks of language,
the units of meaning from which larger structures such as sentences, paragraphs
and whole texts are formed. For native speakers, although the most rapid growth
occurs in childhood, vocabulary knowledge continues to develop naturally in
adult life in response to new experiences, inventions, concepts, social trends
and opportunities for learning. For learners, on the other hand, acquisition of
vocabulary is typically a more conscious and demanding process. Even at an
advanced level, learners are aware of limitations in their knowledge of second
language (or L2) words.
Recent Trends In
Language Testing
However,
scholars in the field of language testing have a rather different perspective
on vocabulary-test items of the conventional kind. Such items fit neatly into
what language testers call the discrete point approach to testing. This
involves designing tests to assess whether learners have knowledge of
particular structural elements of the language: word meanings, word forms,
sentence patterns, sound contrasts and so on. In the last thirty years of the
twentieth century, language testers progressively moved away from this
approach, to the extent that such tests are now quite out of step with current thinking
about how to design language tests, especially for proficiency assessment.
Three Dimensions Of
Vocabulary Assessment
Up
to this point, I have outlined two contrasting perspectives on the role of
vocabulary in language assessment. One point of view is that it is perfectly
sensible to write tests that measure whether learners know the meaning and
usage of a set of words, taken as independent semantic units. The other view is
that vocabulary must always be assessed in the context of a language-use task,
where it interacts in a natural way with other components of language
knowledge. To some extent, the two views are complementary in that they relate
to different purposes of assessment. Conventional vocabulary tests are most likely
to be used by classroom teachers for assessing progress in vocabulary learning and diagnosing areas of
weakness. Other users of these tests are researchers in second language
acquisition with a special interest in how learners develop their knowledge of,
and ability to use, target-language words. On the other hand, researchers in
language testing and those who undertake large testing projects tend to be more
concerned with the design of tests that assess learners' achievement or
proficiency on a broader scale. For such purposes, vocabulary knowledge has a
lower profile, except to the extent that it contributes to, or detracts from,
the performance of communicative tasks. As with most dichotomies, the
distinction I have made between the two perspectives on vocabulary assessment
oversimplifies the matter. There is a whole range of reasons for assessing
vocabulary knowledge and use, with a corresponding variety of testing
procedures. In order to map out the scope of the subject, I propose three
dimensions. The dimensions represent ways in which we can expand our conventional
ideas about what a vocabulary test is in order to include a wider range of
lexical assessment procedures. I introduce the dimensions here, then
illustrate and discuss them at various points in the following chapters. Let us
look at each one in turn.
Discrete - Embedded
The
first dimension focuses on the construct which underlies the assessment
instrument. In language testing, the term construct refers to the mental
attribute or ability that a test is designed to measure. In the case of a
traditional vocabulary test, the construct can usually be labelled as
'vocabulary knowledge' of some kind. The practical significance of defining
the construct is that it allows us to clarify the meaning of the test results.
Normally we want to interpret the scores on a vocabulary test as a measure of
some aspect of the learners' vocabulary knowledge, such as their progress in
learning words from the last several units in the course book, their ability to
supply derived forms of base words (like scientist and scientific, from
science), or their skill at inferring the meaning of unknown words in a reading
passage.
Selective -
Comprehensive
The
second dimension concerns the range of vocabulary to be included in the
assessment. A conventional vocabulary test is based on a set of target words
selected by the test-writer, and the test-takers are assessed according to how
well they demonstrate their knowledge of the meaning or use of those words.
This is what I call a selective vocabulary measure. The target words may either
be selected as individual words and then incorporated into separate test items,
or alternatively the test-writer first chooses a suitable text and then uses certain
words from it as the basis for the vocabulary assessment.
Context-Independent -
Context-Dependent
The
role of context, which is an old issue in vocabulary testing, is the basis for
the third dimension. Traditionally contextualisation has meant that a word is
presented to test-takers in a sentence rather than as an isolated element. From
a contemporary perspective, it is necessary to broaden the notion of context to
include whole texts and, more generally, discourse. In addition, we need to
recognise that contextualisation is more than just a matter of the way in which
vocabulary is presented. The key question is to what extent the test-takers
are being assessed on the basis of their ability to engage with the context
provided in the test. In other words, do they have to make use of contextual
information in order to give the appropriate response? Judgements about appropriateness take
us beyond the text to consider the wider social context. For instance, take a
proficiency test in which the test-takers are doctors and the test task is a
role play simulating a consultation with a patient. If vocabulary use is one of
the criteria used in rating the doctors' performance, they need to demonstrate
an ability to meet the lexical requirements of the situation; for example:
understanding the colloquial expressions that patients use for common symptoms
and ailments, explaining medical concepts in lay terms, avoiding medical
jargon, offering reassurance to someone who is upset or anxious, giving advice
in a suitable tone and so on. Vocabulary use in the task is thus influenced by
the doctor's status as a highly educated professional, the expected role relationship
in a consultation and the affective dimension of the situation. This is a much
broader view of context than we are used to thinking of in relation to vocabulary
testing, but a necessary one nonetheless if we are to assess vocabulary in
contemporary performance tests.
An Overview Of The Book
The
three dimensions are not intended to form a comprehensive model of vocabulary
assessment. Rather, they provide a basis for locating the variety of assessment
procedures currently in use within a common framework and, in particular, they
offer points of contact between tests which treat words as discrete units and
ones that assess vocabulary more integratively in a task-based testing context.
At various points through the book I refer to the dimensions and exemplify them.
Since a large proportion of work on vocabulary assessment to date has involved
instruments which are relatively discrete, selective and context independent in
nature, this approach may seem to be predominant in several of the following
chapters. However, my aim is to present a balanced view of the subject, and I
discuss measures that are more embedded, comprehensive and context-dependent
wherever the opportunity arises, and especially in the last two chapters of the
book.
Chapter Two
The Nature Of
Vocabulary
Before
we start to consider how to test vocabulary, it is necessary first to explore
the nature of what we want to assess. Our everyday concept of vocabulary is
dominated by the dictionary. We tend to think of it as an inventory of
individual words, with their associated meanings. This view is shared by many
second language learners, who see the task of vocabulary learning as a matter
of memorising long lists of L2 words, and their immediate reaction when they encounter
an unknown word is to reach for a bilingual dictionary. From this perspective,
vocabulary knowledge involves knowing the meanings of words and therefore the
purpose of a vocabulary test is to find out whether the learners can match each
word with a synonym, a dictionary-type definition or an equivalent word in
their own language. However, when we look more closely at vocabulary in the
light of current developments in language teaching and applied linguistics, we
find that we have to address a number of questions that have the effect of
progressively broadening the scope of what we need to assess. The first question is: What is a word?
This is an issue that is of considerable interest to linguists on a theoretical
level, but for testing purposes we have more practical reasons for asking it.
For example, it becomes relevant if we want to make an estimate of the size of
a learner's vocabulary. Researchers who have attempted to measure how many
words native speakers of English know have produced wildly varying figures, at
least partly because of their different ways of defining what a word is.
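The estimation logic behind such figures is simple proportional sampling, which makes clear why the definition of 'word' matters: the resulting figure scales directly with the size of the word list used. The sketch below is a generic illustration with invented numbers, not any particular researcher's procedure.

```python
# Proportional estimate of vocabulary size: test a sample drawn evenly
# from a word list and scale the proportion known up to the whole list.
# The estimate scales with list_size, so the choice of counting unit
# (word forms vs. word families) directly changes the resulting figure.

def estimate_vocab_size(known_in_sample, sample_size, list_size):
    proportion_known = known_in_sample / sample_size
    return proportion_known * list_size
```

With the same test performance, say 50 of 100 sampled items known, a list of 3,000 word families and a list of 10,000 word forms yield estimates of 1,500 and 5,000 respectively, which is one reason published figures vary so widely.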
What Is A Word?
A
basic assumption in vocabulary testing is that we are assessing knowledge of
words. But the word is not an easy concept to define, either in theoretical
terms or for various applied purposes. There are some basic points that we need
to spell out from the start. One is the distinction between tokens and types,
which applies when we count the running words in a text. A related question is whether
it is the individual word form that is being assessed or the whole
word family to which that word form belongs. One further complication in defining
what words are is the existence of homographs. These are single word forms that
have at least two meanings that are so different that they obviously belong to different
word families. One commonly cited example is the noun bank, which has two major
meanings: an institution which provides financial services, and the sloping
ground beside a river. It also refers to a row of dials or switches, and to the
tilting of an aircraft's wings as it turns. There is no underlying meaning that
can usefully link all four of these definitions, so in a real sense we have
several distinct word families here. In dictionaries, they are generally
recognised as such by being given separate entries (rather than separate senses
under a single entry). In the testing context, we cannot assume, just because learners
demonstrate knowledge of one meaning, that they have acquired any of the
others.
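The token/type distinction mentioned above is easy to make concrete: tokens count every running word, while types count distinct word forms. A minimal sketch follows; note that it treats a homograph such as *bank* as a single type even though it spans several word families, which is precisely the limitation just noted, and it makes no attempt to group inflected forms into families.

```python
# Tokens vs. types: tokens count every running word; types count
# distinct word forms. A homograph such as 'bank' is counted as one
# type even though it belongs to several word families, and no
# grouping of inflected forms into families is attempted here.

def count_tokens_and_types(text):
    tokens = text.lower().split()
    return len(tokens), len(set(tokens))

counts = count_tokens_and_types("the bank by the river and the bank in town")
# ten tokens, but only seven distinct types
```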
What About Larger
Lexical Items?
The
second major point about vocabulary is that it consists of more than just
single words. For a start, there are the phrasal verbs (get across, move out,
put up with) and compound nouns (fire fighter, love letters, practical joke,
personal computer, applied social science, milk of magnesia), which are
generally recognised as lexical units consisting of more than one word form.
Then there are idioms like a piece of cake, the Good Book, to go the whole hog,
let the cat out of the bag. These are phrases and sentences that cause great
difficulty for second language learners because the whole unit has a meaning
that cannot be worked out just from knowing what the individual words mean. Working
from a similar point of view, Nattinger and DeCarrico (1992) have developed the
concept of a lexical phrase, which is a group of words that looks like a grammatical
structure but operates as a unit, with a particular function in spoken or
written discourse. They identify four categories of lexical phrases:
1.
Polywords: short fixed phrases that perform a
variety of functions, such as for the most part (which they call a qualifier),
at any rate and so to speak (fluency devices), and hold your horses (disagreement
marker).
2.
Institutionalised expressions: longer
utterances that are fixed in form and include proverbs, aphorisms and formulas
for social interaction. Examples are: a watched pot never boils, how do you
do?, long time no see, and once upon a time ... and they lived happily ever
after.
3.
Phrasal constraints: short- to medium-length
phrases consisting of a basic frame with one or two slots that can be filled
with various words or phrases. These include a (day / year / long time) ago,
yours (sincerely / truly), as far as I (know / can tell / am aware), and the
(sooner) the (better).
4.
Sentence builders: phrases that provide the
framework for a complete sentence, with one or more slots in which a whole idea
can be expressed. Examples are: I think that X; not only X, but also Y; and that
reminds me of X.
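Phrasal constraints in particular, a fixed frame with one or two constrained slots, can be modelled quite directly as a pattern. The sketch below uses a regular expression for one of the frames cited above; this is merely one way of representing 'frame plus licensed fillers', not a claim about Nattinger and DeCarrico's own formalism.

```python
import re

# One of the 'phrasal constraints' cited above, modelled as a fixed
# frame with a constrained slot: the frame is literal text, the slot
# admits only the licensed fillers.
FRAME = re.compile(r"as far as I (know|can tell|am aware)")

def is_licensed(utterance):
    """True if the utterance instantiates the frame with one of the
    licensed slot fillers."""
    return FRAME.fullmatch(utterance) is not None
```

Here is_licensed('as far as I can tell') holds, while 'as far as I believe', though grammatical, does not instantiate the fixed phrase.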
Pawley
and Syder (1983: 206-208) offer a lengthy list of longer utterances of a
similar kind. Here are some of their items:
It's
on the tip of my tongue.
I'll
be home all weekend.
Have
you heard the news?
What
does it mean to know a lexical item?
Let
us now leave aside the question of what units vocabulary is composed of and
take up the issue of what it means to know lexical items of various kinds. To
put it another way, how do we go about describing the nature of vocabulary
knowledge? One approach is to try to spell out all that the learners should
know about a word if they are to fully acquire it. An influential statement along
these lines was produced by Richards (1976). In his article he outlined a
series of assumptions about lexical competence, growing out of developments in
linguistic theory in the 1960s and 1970s.
1.
The first assumption is that the
vocabulary knowledge of native speakers continues to expand in adult life, in contrast to the
relative stability of their grammatical competence. The other seven assumptions
cover various aspects of what is meant by knowing a word:
2.
Knowing a word means knowing the degree
of probability of encountering that word in speech or print. For many words we
also know the sort of words most likely to be found associated with the word.
3.
Knowing a word implies knowing the
limitations on the use of the word according to variations of function and
situation.
4.
Knowing a word means knowing the
syntactic behaviour associated with the word.
5.
Knowing a word entails knowledge of the
underlying form of a word and the derivations that can be made from it.
6.
Knowing a word entails knowledge of the
network of associations between that word and other words in the language.
7.
Knowing a word means knowing the
semantic value of a word.
8.
Knowing a word means knowing many of the
different meanings associated with a word. (Richards, 1976: 83)
What Is Vocabulary
Ability?
Three dimensions of
vocabulary assessment represent one attempt to incorporate the two perspectives
within a single framework. However, a more ambitious effort has been undertaken
by Chapelle (1994), who proposed a definition of vocabulary ability based on
Bachman's (1990; see also Bachman and Palmer, 1996) general construct of
language ability.
The Context Of
Vocabulary Use
Traditionally
in vocabulary testing, the term context has referred to the sentence or
utterance in which the target word occurs. For instance, in a multiple-choice
vocabulary item, it is normally recommended that the stem should consist of a
sentence containing the word to be tested, as in the following example:
The committee endorsed
the proposal.
a. discussed
b. supported
c. knew about
d. prepared
Under
the influence of integrative test formats, such as the cloze procedure, our
notion of context has expanded somewhat beyond the sentence level. Advocates of
the cloze test, especially Oller (1979), pointed out that many of the blanks
could be filled successfully only by picking up on contextual clues in other
sentences or paragraphs of the text. Thus, in this sense the whole text forms
the context that we draw on to interpret the individual lexical items within
it. However, from a communicative point of view, context is more than just a
linguistic phenomenon.
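The fixed-ratio cloze procedure mentioned above can be sketched in a few lines: every nth word is deleted, and filling the gaps often requires clues from elsewhere in the text. The deletion ratio and the sample passage here are illustrative choices, not prescriptions from the testing literature.

```python
def make_cloze(text: str, n: int = 7, blank: str = "______"):
    """Delete every nth word; return the gapped text and the answer key."""
    words = text.split()
    gapped, answers = [], []
    for i, word in enumerate(words, start=1):
        if i % n == 0:
            # This word becomes a blank; record it in the answer key.
            gapped.append(blank)
            answers.append(word)
        else:
            gapped.append(word)
    return " ".join(gapped), answers

passage = ("The whole text forms the context that readers draw on "
           "to interpret the individual lexical items within it.")
gapped, key = make_cloze(passage, n=5)
```

A deletion ratio of one word in five to one in seven is typical of classroom cloze tests; the smaller the ratio, the more heavily the test-taker must rely on the surrounding discourse.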
Vocabulary Knowledge
And Fundamental Processes
The
second component in Chapelle's (1994) framework of vocabulary ability is the
one that has received the most attention from applied linguists and second
language teachers. Chapelle outlines four dimensions of this component:
1.
Vocabulary size: This refers to the
number of words that a person knows. In work with native speakers scholars have
attempted to measure the total size of their vocabulary by taking a sample
of words from a large unabridged
dictionary. In the case of second language learners the goal is normally more
modest: it is to estimate how many of the more common words they know based on a
test of their knowledge of a sample of items from a word-frequency list.
I discuss this further in Chapter 4, and in Chapter 5 we look at two
vocabulary-size tests. As Chapelle (1994: 165) points out, though, if we follow
the logic of a communicative approach to vocabulary ability, we should not just
seek to measure vocabulary size in an absolute sense, but rather in relation to
particular contexts of use.
2.
Knowledge of word characteristics: I
discussed the frameworks developed by Richards (1976) and Nation (1990) earlier
in the chapter, and this is where they fit into Chapelle's definition. Just as
native speakers do, second language learners know more about some words than
others. Their understanding of particular lexical items may range from vague to
more precise (Cronbach, 1942). As Laufer (1990) points out, learners are likely
to be confused about some of the words that they have learned, because the
words share certain common features, e.g. affect, effect; quite, quiet;
simulate, stimulate; embrace, embarrass. And again, as with vocabulary size,
the extent to which a learner knows a word varies according to the context in which
it is used.
3.
Lexicon organization: This concerns the
way in which words and other lexical items are stored in the brain. Aitchison's
book Words in the Mind (1994) provides a comprehensive and very readable account
of psycholinguistic research on the mental lexicon of proficient language
users. There is a research role here for vocabulary tests to explore the
developing lexicon of second language learners and the ways in which their
lexical storage differs from that of native speakers. Meara (1984; 1992b) has
worked in this area using word-association and lexical-network tasks.
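The sampling logic behind the vocabulary-size measures described under the first dimension can be sketched as follows: a learner is tested on a sample of items from each frequency band of a word list, and the proportion known is extrapolated to the whole band. The band sizes, sample sizes and scores below are invented for illustration.

```python
def estimate_vocab_size(band_results):
    """Extrapolate vocabulary size from samples of frequency bands.

    band_results: list of (band_size, items_tested, items_correct) tuples,
    one per frequency band of the word list.
    """
    total = 0.0
    for band_size, tested, correct in band_results:
        # Proportion known in the sample, scaled up to the whole band.
        total += band_size * (correct / tested)
    return round(total)

# e.g. three 1,000-word frequency bands, 30 items sampled from each
results = [(1000, 30, 27), (1000, 30, 18), (1000, 30, 9)]
estimate = estimate_vocab_size(results)  # 900 + 600 + 300 = 1800
```

The declining scores across bands mirror the usual finding that knowledge thins out as words become less frequent, which is why tests of this kind sample each band separately rather than the list as a whole.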
Metacognitive Strategies For Vocabulary Use
This is the third component of Chapelle's definition of vocabulary ability, and is what Bachman (1990) refers to as 'strategic competence'. The strategies are employed by all language users to manage the ways that they use their vocabulary knowledge in communication. Most of the time, we operate these strategies without being aware of it. It is only when we have to undertake unfamiliar or cognitively demanding communication tasks that the strategies become more conscious. For example, I am carefully choosing my words as I write this chapter, trying (or should that be attempting or striving, perhaps?) both to express the ideas clearly and to achieve the level of (in)formality that the editors seem to be looking for. Here are some other situations in which native speakers may need to apply more conscious strategies: deciphering illegible handwriting in a personal letter, reading aloud a scripted speech, breaking the news of a relative's death to a young child, or conversing with a foreigner. As language teachers we become skilled at modifying the vocabulary that we use so that our learners can readily understand us. By contrast, you have probably observed inexperienced native speakers failing to communicate with foreigners because they use slang expressions, they do not articulate key words clearly, they are unable to rephrase an utterance that the other person has obviously not understood and so on. Of course, more is involved in all these cases than just vocabulary, but the point is that lexical strategies play a significant role.
Chapter Three
Research On Vocabulary
Acquisition And Use
The
focus of this chapter is on research in second language vocabulary acquisition
and use. There are three reasons for reviewing this research in a book on
vocabulary assessment. The first is that the researchers are significant users
of vocabulary tests as instruments in their studies. In other words, the purpose
of vocabulary assessment is not only to make decisions about what individual
learners have achieved in a teaching/learning context but also to advance our
understanding of the processes of vocabulary acquisition. Secondly, in the
absence of much recent interest in vocabulary among language testers,
acquisition researchers have often had to deal with assessment issues
themselves as they devised the instruments for their research. The third reason
is that the results of their research can contribute to a better understanding
of the nature of the construct of vocabulary ability, which - as I explained in
the previous chapter – is important for the validation of vocabulary tests.
Systematic Vocabulary
Learning
Given
the number of words that learners need to know if they are to achieve any kind
of functional proficiency in a second language, it is understandable that
researchers on language teaching have been interested in evaluating the
relative effectiveness of different ways of learning new words.
The
findings of studies that address these questions have been reviewed by a
number of authors (e.g. Higa, 1965; Nation, 1982; Cohen, 1987; Nation, 1990:
Chapter 3; Ellis and Beaton, 1993b; Laufer, 1997b). In brief, some of the
significant findings are as follows:
1.
Words belonging to different word
classes vary according to how difficult they are to learn. Rodgers (1969) found
that nouns are easiest to learn, followed by adjectives; on the other hand,
verbs and adverbs were the most difficult. Ellis and Beaton (1993b) confirmed
that nouns are easier than verbs, because learners can form mental images of
them more readily.
2.
Mnemonic techniques are very effective
methods for gaining an initial knowledge of word meanings in a second language
(Cohen, 1987; Hulstijn, 1997). One method in particular, the keyword technique,
has been extensively researched (see, for example, Paivio and Desrochers, 1981;
Pressley, Levin and McDaniel, 1987). It involves teaching learners to form
vivid mental images which link the meanings of an L2 word and an L1 word that
has a similar sound. This technique works best for the receptive learning of
concrete words.
3.
In order to be able to retrieve L2 words
from memory - rather than just recognising them when presented - learners need
to say the word to themselves as they learn it (Ellis and Beaton, 1993a).
4.
Words which are hard to pronounce are
learned more slowly than ones that do not have significant pronunciation
difficulty (Rodgers, 1969; Ellis and Beaton, 1993b).
5.
Learners at a low level of language
learning store vocabulary according to the sound of words, whereas at more
advanced levels words are stored according to meaning (Henning, 1973).
6.
Lists of words which are strongly associated with each other – such as near-synonyms, opposites or members of the same lexical set – are more difficult to learn than lists of unrelated words.
Incidental Vocabulary Learning
Since
the early 1980s a number of reading researchers have focused on vocabulary
acquisition by native speakers of English. While there is a great deal of
variation in the estimates of the number of words known by native speakers of
various ages and levels of education, there is general agreement that vocabulary
acquisition occurs at an impressively fast rate from childhood throughout the
years of formal education and at a slower pace on into adult life. On the face
of it, a large proportion of these words are not taught by parents or teachers,
or indeed learned in any formal way. The most plausible explanation for this is
that native speakers acquire words 'incidentally' as they encounter them in the
speech and writing of other people.
Research With Native
Speakers
The
first step in investigating this kind of vocabulary acquisition was to obtain
evidence that it actually occurs. Teams of reading researchers in the United
States (Jenkins, Stein and Wysocki, 1984; Nagy, Herman and Anderson, 1985;
Nagy, Anderson and Herman, 1987) undertook a series of studies with
native-English-speaking school children. The basic research design involved
asking the subjects to read texts appropriate to their age level that contained
unfamiliar words. The children were not told that the researchers were interested
in vocabulary. After they had completed the reading task, they were given,
without prior announcement, at least one test of their knowledge of the target words in the
text. Then the researchers obtained a measure of vocabulary learning by
comparing the test scores of the students who had read a particular text with those
of other students who had not. The results showed that, in these terms, a
small, statistically significant amount of learning had indeed occurred. In
their 1985 study, Nagy, Herman and Anderson estimated that the probability of learning
a word while reading was between 10 and 25 per cent (depending how strict the
criterion was for knowing a word), whereas in the 1987 study they calculated a
probability of just 5 per cent. One reason for the discrepancy was that in the
latter study the test was administered six days after the children did the
reading task rather than immediately afterwards.
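The way such probabilities are derived from the reader/non-reader comparison can be sketched as follows. The formula, a gain over the control-group baseline rescaled by the room left for learning, is one common way to express incidental gains; it is offered as an illustration rather than as the exact computation Nagy and his colleagues used, and the proportions are invented.

```python
def learning_probability(p_readers: float, p_controls: float) -> float:
    """Estimate the proportion of initially unknown target words
    that were picked up through reading.

    p_readers:  proportion of target words answered correctly by readers
    p_controls: proportion answered correctly by students who did not read
    """
    # Gain over baseline, rescaled by the words the controls did not know.
    return (p_readers - p_controls) / (1.0 - p_controls)

p = learning_probability(p_readers=0.40, p_controls=0.30)  # about 0.14
```

A figure of this order, somewhere between the 5 per cent and the 10 to 25 per cent reported in the two studies, shows why the gains look small per encounter yet substantial when accumulated over years of reading.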
Second Language
Research
Now,
how about incidental learning of second language vocabulary? In a study that
predates the L1 research in the US, Saragi, Nation and Meister (1978) gave a
group of native speakers of English the task of reading Anthony Burgess's novel
A Clockwork Orange, which contains a substantial number of Russian-derived words
functioning as an argot used by the young delinquents who are the main
characters in the book. When the subjects were subsequently tested, it was
found on average that they could recognise the meaning of 76 per cent of the 90
target words. Pitts, White and Krashen (1989) used just excerpts from the novel
with two groups of American university students and also found some evidence of
vocabulary learning; however, as you might expect, the reduced scope of the
study resulted in fewer target words being learned. In another study, a
researcher examined incidental learning from the video disc program Raiders of the Lost Ark. She was
particularly interested in factors that influenced the learning of unfamiliar
words. It appeared that, in terms of frequency, learning was associated with
the general frequency of words in the language rather than how often they
occurred in that particular text. In addition, she found that words were more
likely to be learned if they were salient, in the sense of being important for
understanding a specific part of the program.
Chapter Four
Research On Vocabulary
Assessment
In
the previous chapter, we saw how tests play a role in research on vocabulary
within the field of second language acquisition (SLA). Now we move on to
consider research in the field of language testing, where the focus is not so
much on understanding the processes of vocabulary learning as on measuring the
level of vocabulary knowledge and ability that learners have reached. Language
testing is concerned with the design of tests to assess learners for a variety
of practical purposes that can be summarised under labels such as placement,
diagnosis, achievement and proficiency. However, in practice this distinction
between second language acquisition research and assessment is difficult to maintain
consistently, because, on the one hand, language testing researchers have paid
relatively little attention to vocabulary tests and, on the other hand, second language
acquisition researchers working on vocabulary acquisition have often needed to
develop tests as an integral part of their research design. Thus, some of the
important work on how to measure vocabulary knowledge and ability has been produced
by vocabulary acquisition researchers rather than language testers; the latter
have tended either to take vocabulary tests for granted or, in the 1990s, to be
interested in more integrative and communicative measures of language
proficiency.
Multiple-Choice
Vocabulary Items
Although
the multiple-choice format is one of the most widely used methods of vocabulary
assessment, both for native speakers and for second language learners, its
limitations have also been recognised for a long time. Wesche and Paribakht
summarise the criticisms of these items as follows:
1.
They are difficult to construct, and
require laborious field-testing, analysis and refinement.
2.
The learner may know another meaning for
the word, but not the one sought.
3.
The learner may choose the right word by
a process of elimination, and has in any case a 25 per cent chance of guessing
the correct answer in a four-alternative format.
4.
Items may test students' knowledge of
distractors rather than their ability to identify an exact meaning of the target
word.
5.
The learner may miss an item either for
lack of knowledge of words or lack of understanding of syntax in the
distractors.
6.
This format permits only a very limited
sampling of the learner's total vocabulary (for example, a 25-item
multiple-choice test samples one word in 400 from a 10,000-word vocabulary).
(Wesche and Paribakht, 1996: 17)
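Two of the criticisms in the list above reduce to simple arithmetic, sketched here: the sampling rate of a short test against a large vocabulary (point 6), and the score a test-taker can expect from blind guessing on four-option items (point 3). The figures are those quoted in the list.

```python
def sampling_rate(items: int, vocab_size: int) -> float:
    """Fraction of the target vocabulary a test of this length samples."""
    return items / vocab_size

def expected_chance_score(items: int, options: int = 4) -> float:
    """Expected number of items answered correctly by pure guessing."""
    return items / options

rate = sampling_rate(25, 10_000)       # 0.0025, i.e. one word in 400
chance = expected_chance_score(25, 4)  # 6.25 items right by luck alone
```

Seen this way, a 25-item multiple-choice score blends a very thin sample of the learner's lexicon with a non-trivial guessing component, which is why longer checklists and corrected scoring formats have been attractive alternatives.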
Chapter Five
Vocabulary Tests: Four
Case Studies
In
this chapter I discuss four tests that assess vocabulary knowledge as case
studies of test design and validation. I have referred to all four of them in
earlier chapters, especially Chapter 4, and so the case studies give me the opportunity to explore
issues raised earlier in greater depth, in relation to particular well-known
language tests. The four tests are:
1.
The Vocabulary Levels Test;
2.
The Eurocentres Vocabulary Size Test
(EVST);
3.
The Vocabulary Knowledge Scale (VKS);
and
4.
The Test of English as a Foreign
Language (TOEFL).
These
tests do not represent the full range of measures covered by the three
dimensions of vocabulary assessment. Three of them are discrete,
context-independent tests and all four are selective rather than comprehensive.
However, I have chosen them because they are widely known and reasonably well
documented in the literature. More specifically, there is research evidence
available concerning their validity as assessment procedures for their intended
purpose. They also represent innovations in vocabulary assessment and serve to
highlight interesting issues in test design. However, there is a limited number
of instruments that I could have considered for inclusion as case studies in
this chapter, which reflects the fact that, despite the upsurge in second language
vocabulary studies since the early 1980s, the design of tests that could
function as standard instruments for research or other assessment purposes has
been a neglected area.
The Eurocentres
Vocabulary Size Test
Like
the Vocabulary Levels Test, the Eurocentres Vocabulary Size Test (EVST) makes
an estimate of a learner's vocabulary size using a graded sample of words covering
numerous frequency levels. However, there are several differences in the way
that the two tests are designed and so it is worthwhile to look at the EVST in some
detail as well. The EVST is a check-list test which presents learners with a
series of words and simply requires them to indicate whether they know each one
or not. It includes a substantial proportion of non-words to provide a basis for
adjusting the test-takers' scores if they appear to be overstating their
vocabulary knowledge. Another distinctive feature of the EVST is that it is
administered by computer rather than as a pen-and-paper test. Let us now look
at the test from two perspectives: first as a placement instrument and then as
a measure of vocabulary size.
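The adjustment for over-claiming can be sketched with the standard correction used in yes/no vocabulary tests: the false-alarm rate on non-words discounts the hit rate on real words. This particular formula is one common choice in the checklist-testing literature, not necessarily the exact algorithm implemented in the EVST, and the response counts are invented.

```python
def adjusted_knowledge(hits: int, real_words: int,
                       false_alarms: int, nonwords: int) -> float:
    """Estimate the proportion of real words genuinely known,
    discounting 'yes' responses attributable to over-claiming."""
    h = hits / real_words        # 'yes' responses to real words
    f = false_alarms / nonwords  # 'yes' responses to non-words
    if f >= 1.0:
        # Test-taker claimed to know every non-word: no usable signal.
        return 0.0
    return max(0.0, (h - f) / (1.0 - f))

score = adjusted_knowledge(hits=40, real_words=60,
                           false_alarms=5, nonwords=20)
```

A learner who says yes to many non-words thus has the same raw hit rate rescaled sharply downwards, which is precisely the role the non-words play in the test design.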
Chapter Six
The Design Of Discrete
Vocabulary Tests
In
this chapter I review various considerations that influence the design of
discrete vocabulary tests. Discrete tests most commonly focus on vocabulary
knowledge: whether the test-takers know the meaning or use of a selected set of
content words in the target language. They may also assess particular
strategies of vocabulary learning or vocabulary use. Such tests are to be distinguished
from broader measures of vocabulary ability that are embedded in the assessment
of learners' performance of language-use tasks.
The
discussion of vocabulary-test design in the first part of this chapter is based
on the framework for language-test development presented in Bachman and
Palmer's (1996) book Language Testing in Practice. Since the full framework is
too complex to cover here, I have chosen certain key steps in the test-development
process as the basis for a discussion of important issues in the design of
discrete vocabulary tests in particular. In the second part of the chapter, I
offer a practical perspective on the development of vocabulary tests by means
of two examples. One looks at the preparation of classroom progress tests, and
the other describes the process by which I developed the word-associates
format as a measure of depth of vocabulary knowledge.
Receptive And
Productive Vocabulary
From
our experience as users of both first and second languages, we can all vouch
for the fact that the number of words we can recognise and understand is rather
larger than the number we use in our own speech and writing. This distinction
between receptive and productive vocabulary is one that is accepted by scholars
working on both first and second language vocabulary development, and it is
often referred to by the alternative terms passive and active. As Melka (1997)
points out, though, there are still basic problems in conceptualising and
measuring the two types of vocabulary, in spite of a lengthy history of
research on the subject. The difficulty at the conceptual level is to find
criteria for distinguishing words that have receptive status from those which
are part of a person's productive vocabulary. It is generally assumed that words
are known receptively first and only later become available for productive use.
Melka (1997) suggests that it is most useful to think in terms of a receptive
to productive continuum, representing increasing degrees of knowledge or
familiarity with a word. Thus, when they first encounter a new word, learners
have limited knowledge of it and may not even remember it until they come across
it again. It is only after they gain more knowledge of its pronunciation,
spelling, grammar, meaning, range of use and so on that they are able to use it
themselves. The problem is to locate the threshold at which the word passes
from receptive to productive status. Is there a certain minimum amount of word
knowledge that is required before productive use is possible? Melka acknowledges
that, if there is a continuum here, it is not a simple smooth one; furthermore,
there is a fluid boundary and a great deal of interaction between receptive and
productive vocabulary.
Chapter Seven
Comprehensive Measures
Of Vocabulary
Comprehensive
measures are particularly suitable for assessment procedures in which
vocabulary is embedded as one component of the measurement of a larger construct,
such as communicative competence in speaking, academic writing ability or
listening comprehension. However, we cannot simply say that all comprehensive measures
are embedded ones, because they can also be used on a discrete basis. For
example, a number of the studies which have applied lexical statistics to
learner compositions have been conducted by L2 vocabulary researchers who were
not interested in an overall assessment of writing ability but just in making
inferences about the learners' productive vocabulary knowledge. These
researchers are clearly treating vocabulary as a separate construct and not
making any more general assessment of the quality of the learners' writing.
Statistical Measures Of
Writing
One
way in which we can assess the written production of learners is by calculating
various statistics that reflect their use of vocabulary. Some of these
measurements were originally developed by literary scholars to analyse the
stylistic features of major authors and to date they have been applied to
second language writing only to a limited extent. Researchers in second language
acquisition have certainly been interested in quantitative, or 'objective',
measures of learner production but, in keeping with the general orientation of
their work, they have mostly counted grammatical units, such as the length of sentences
or of clauses. Those scholars who have worked with lexical measures have used
them for research purposes rather than for assessment of learners. The relative
complexity of the procedures involved in calculating the statistics makes it
difficult to apply them in an operational writing test, although the results of
such studies may provide valuable input into the design of the rating scales
that are most commonly used for the assessment of learner production in a less
time-consuming, qualitative manner.
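Two of the simplest such statistics can be sketched directly: the type-token ratio (a measure of lexical variation) and lexical density (the proportion of content words). The small stop-word set standing in for 'grammatical items' is an illustrative assumption; operational studies use much fuller lists.

```python
# Illustrative stand-in for a list of grammatical (function) words.
GRAMMATICAL = {"the", "a", "an", "and", "or", "but", "of", "to", "in",
               "is", "are", "was", "were", "it", "that", "this", "on"}

def type_token_ratio(tokens):
    """Distinct word forms divided by total running words."""
    return len(set(tokens)) / len(tokens)

def lexical_density(tokens):
    """Proportion of tokens that are lexical (content) words."""
    lexical = [t for t in tokens if t not in GRAMMATICAL]
    return len(lexical) / len(tokens)

text = "the learners wrote the essays and the essays were short"
tokens = text.split()
ttr = type_token_ratio(tokens)     # 7 distinct forms / 10 tokens = 0.7
density = lexical_density(tokens)  # 5 lexical tokens / 10 tokens = 0.5
```

Both measures are sensitive to text length, which is one reason researchers such as Laufer and Nation (1995) developed length-adjusted alternatives for comparing learner compositions.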
The researchers have
used the lexical statistics to investigate a variety of research questions:
1.
Do these measures give consistent
results when they are applied to two compositions written by the same learners,
with only a short time interval in between? (Arnaud, 1992; Laufer and Nation,
1995).
2.
How do the compositions of second
language learners compare with those of native speakers of a similar age and/or
educational level? (Arnaud, 1984; Linnarud, 1986; Waller, 1993).
3.
What is the relationship between the
lexical statistics and holistic ratings of the quality of the learners'
compositions? (Nihalani, 1981; Linnarud, 1986; Engber, 1995).
4.
What is the relationship between the
lexical quality of learners' writing and their vocabulary knowledge, as
measured by a discrete point vocabulary test? (Arnaud, 1984; 1992; Laufer and
Nation, 1995).
5.
Does the lexical quality of advanced
learners' writing increase after one or two semesters of English study?
(Laufer, 1991; 1994).
Chapter Eight
Further Developments In
Vocabulary Assessment
In
earlier chapters, I have surveyed a diverse range of work on second language
vocabulary assessment and proposed three dimensions which allow us to locate
the different types of measure within a common framework. Conventional vocabulary
tests - which I would describe as predominantly discrete, selective and context
independent - are effective research tools for certain purposes and are routinely
administered in second language teaching programmes around the world. Existing
tests of this kind will continue to be used and new
ones devised. They work
best in assessment situations where it makes sense to focus on vocabulary as a
discrete form of language knowledge and to treat lexical items as individual
units of meaning. At a time when the pendulum in language-teaching methodology
is moving back to a greater emphasis on form-focused instruction, there is
renewed interest in giving explicit attention to learners' mastery of the
structural features of the language, including its lexical forms.
The Vocabulary Of
Informal Speech
In
the preceding section, I have made reference to the lexical features of spoken
English in discussing Skehan's work. The vocabulary of speech is the second
area of vocabulary study that has received less attention than it should have,
as indicated by the fact that perhaps the most frequently cited research study
is the one conducted by Schonell et al. (1956) in the 1950s on the spoken
vocabulary of Australian workers. I mention this not to cast aspersions on
research coming from Australia (tempting though it may be for a New Zealander
to do so) but simply to highlight the limited number of more recent studies
from anywhere else. There are several reasons for this (see also McCarthy and
Carter, 1997: 20): As I have frequently noted, in Chapter 4 and elsewhere, a
large proportion of the research on vocabulary has been undertaken by reading
researchers, who obviously focus on words in written texts. There is no equivalent
research tradition on the vocabulary of spoken language, especially in informal
settings.
1.
Almost all the established
word-frequency lists have been compiled by counting words in corpora of written
texts. Although spoken corpora are becoming more common now, they are usually
much smaller than the corresponding written ones because samples of spoken
language are difficult both to collect and to store. Making recordings of
natural speech is quite a challenge: people tend to be self-conscious when they
know they are being recorded, and there are legal and ethical constraints on
recording speakers without their knowledge or consent. Once the speech has been
recorded, it then has to be painstakingly transcribed before being entered into
the computer for analysis.
2.
Spoken language also creates problems of
analysis. Speech is not 'grammatical', at least according to the rules for the
sentences of written language, and McCarthy and Carter (1997: 28-29) point out various
difficulties in the identification of vocabulary items as well. For instance,
are vocalisations like mm, er and um to be considered as lexical items? Should
contracted forms like don't, it's and gonna be counted as one word form or two?
O'Loughlin (1995) faced such problems when he applied the lexical density
statistic to speaking test data and I showed in Chapter 7 (Table 7.2) how he
had to develop quite an elaborate set of rules for distinguishing lexical and grammatical
items.
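The counting problems that McCarthy and Carter raise have to be resolved by explicit rules before any lexical statistic can be computed over speech. This sketch adopts one defensible set of rules, excluding filler vocalisations and splitting common contracted forms; O'Loughlin's actual rules, as noted above, were considerably more elaborate, and the word lists here are small illustrative samples.

```python
# Illustrative rule sets, not O'Loughlin's full scheme.
FILLERS = {"mm", "er", "um"}
CONTRACTIONS = {"don't": ["do", "not"],
                "it's": ["it", "is"],
                "gonna": ["going", "to"]}

def tokenise_speech(utterance: str):
    """Tokenise transcribed speech under explicit counting rules."""
    tokens = []
    for w in utterance.lower().split():
        if w in FILLERS:
            continue  # vocalisations are not counted as lexical items
        # Contracted forms are counted as their two component words.
        tokens.extend(CONTRACTIONS.get(w, [w]))
    return tokens

tokens = tokenise_speech("um it's gonna rain er don't worry")
# → ['it', 'is', 'going', 'to', 'rain', 'do', 'not', 'worry']
```

The point of making the rules explicit in code is that any lexical-density figure for speech is only interpretable relative to decisions like these.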
Source:
·
Purpura, James E. 2004. Assessing Grammar (1–145). Cambridge University Press.
·
Read, John. 2000. Assessing Vocabulary (1–236). Cambridge University Press.