Friday, 27 March 2020

Assignment 5





LANGUAGE TESTS
The previous chapters introduced a number of building blocks for designing language tests. You now have a sense of where tests belong in the larger domain of assessment. You have sorted through differences between formal and informal tests, formative and summative tests, and norm- and criterion-referenced tests. You have traced some of the historical lines of thought in the field of language assessment. You have a sense of major current trends in language assessment, especially the present focus on communicative and process-oriented testing that seeks to transform tests from anguishing ordeals into challenging and intrinsically motivating learning experiences. By now, certain foundational principles have entered your vocabulary: practicality, reliability, validity, authenticity, and washback. And you should now possess a few tools with which you can evaluate the effectiveness of a classroom test.
TEST TYPES
The first task you will face in designing a test for your students is to determine the purpose for the test. Defining your purpose will help you choose the right kind of test, and it will also help you to focus on the specific objectives of the test. We will look first at two test types that you will probably not have many opportunities to create as a classroom teacher (language aptitude tests and language proficiency tests) and three types that you will almost certainly need to create: placement tests, diagnostic tests, and achievement tests.
LANGUAGE APTITUDE TESTS
One type of test, although admittedly not a very common one, predicts a person's success prior to exposure to the second language. A language aptitude test is designed to measure capacity or general ability to learn a foreign language and ultimate success in that undertaking. Language aptitude tests are ostensibly designed to apply to the classroom learning of any language. Two standardized aptitude tests have been used in the United States: the Modern Language Aptitude Test (MLAT) (Carroll & Sapon, 1958) and the Pimsleur Language Aptitude Battery (PLAB) (Pimsleur, 1966). Both are English language tests and require students to perform a number of language-related tasks. The MLAT, for example, consists of five different tasks.
Tasks in the Modern Language Aptitude Test
1.      Number learning: Examinees must learn a set of numbers through aural input and then discriminate different combinations of those numbers.
2.      Phonetic script: Examinees must learn a set of correspondences between speech sounds and phonetic symbols.
3.      Spelling clues: Examinees must read words that are spelled somewhat phonetically, and then select from a list the one word whose meaning is closest to the "disguised" word.
4.      Words in sentences: Examinees are given a key word in a sentence and are then asked to select a word in a second sentence that performs the same grammatical function as the key word.
5.      Paired associates: Examinees must quickly learn a set of vocabulary words from another language and memorize their English meanings.
PROFICIENCY TESTS
If your aim is to test global competence in a language, then you are, in conventional terminology, testing proficiency. A proficiency test is not limited to any one course, curriculum, or single skill in the language; rather, it tests overall ability. Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, and aural comprehension. Sometimes a sample of writing is added, and more recent tests also include oral production performance. As noted in the previous chapter, such tests often have content validity weaknesses, but several decades of construct validation research have brought us much closer to constructing successful communicative proficiency tests.
A typical example of a standardized proficiency test is the Test of English as a Foreign Language (TOEFL) produced by the Educational Testing Service. The TOEFL is used by more than a thousand institutions of higher education in the United States as an indicator of a prospective student's ability to undertake academic work in an English-speaking milieu. The TOEFL consists of sections on listening comprehension, structure (or grammatical accuracy), reading comprehension, and written expression. The new computer-scored TOEFL announced for 2005 will also include an oral production component. With the exception of its writing section, the TOEFL (as well as many other large-scale proficiency tests) is machine-scorable for rapid turnaround and cost effectiveness (that is, for reasons of practicality).
Research is in progress (Bernstein et al., 2000) to determine, through the technology of speech recognition, if oral production performance can be adequately machine-scored. (Chapter 4 provides a comprehensive look at the TOEFL and other standardized tests.) A key issue in testing proficiency is how the constructs of language ability are specified. The tasks that test-takers are required to perform must be legitimate samples of English language use in a defined context. Creating these tasks and validating them with research is a time-consuming and costly process. Language teachers would be wise not to create an overall proficiency test on their own. A far more practical method is to choose one of a number of commercially available proficiency tests.
PLACEMENT TESTS
Certain proficiency tests can act in the role of placement tests, the purpose of which is to place a student into a particular level or section of a language curriculum or school. A placement test usually, but not always, includes a sampling of the material to be covered in the various courses in a curriculum; a student's performance on the test should indicate the point at which the student will find material neither too easy nor too difficult but appropriately challenging. The English as a Second Language Placement Test (ESLPT) at San Francisco State University has three parts. In Part I, students read a short article and then write a summary essay. In Part II, students write a composition in response to an article. Part III is multiple-choice: students read an essay and identify grammar errors in it. The maximum time allowed for the test is three hours. Justification for this three-part structure rests largely on the test's content validation. Most of the ESL courses at San Francisco State involve a combination of reading and writing, with a heavy emphasis on writing. The first part of the test acts as both a test of reading comprehension and a test of writing (a summary). The second part requires students to state opinions and to back them up, a task that forms a major component of the writing courses.
DIAGNOSTIC TESTS
A diagnostic test is designed to diagnose specified aspects of a language. A test in pronunciation, for example, might diagnose the phonological features of English that are difficult for learners and should therefore become part of a curriculum. Usually, such tests offer a checklist of features for the administrator (often the teacher) to use in pinpointing difficulties. A writing diagnostic would elicit a writing sample from students that would allow the teacher to identify those rhetorical and linguistic features on which the course needed to focus special attention. Diagnostic and placement tests, as we have already implied, may sometimes be indistinguishable from each other.
ACHIEVEMENT TESTS
An achievement test is related directly to classroom lessons, units, or even a total curriculum. Achievement tests are (or should be) limited to particular material addressed in a curriculum within a particular time frame and are offered after a course has focused on the objectives in question. Achievement tests can also serve the diagnostic role of indicating what a student needs to continue to work on in the future, but the primary role of an achievement test is to determine whether course objectives have been met, and appropriate knowledge and skills acquired, by the end of a period of instruction. Achievement tests are often summative because they are administered at the end of a unit or term of study. They also play an important formative role. An effective achievement test will offer washback about the quality of a learner's performance in subsets of the unit or course.
SOME PRACTICAL STEPS TO TEST CONSTRUCTION
The descriptions of types of tests in the preceding section are intended to help you understand how to answer the first question posed in this chapter: What is the purpose of the test? It is unlikely that you would be asked to design an aptitude test or a proficiency test, but for the purposes of interpreting those tests, it is important that you understand their nature. However, your opportunities to design placement, diagnostic, and achievement tests (especially the latter) will be plentiful. In the remainder of this chapter, we will explore the four remaining questions posed at the outset, and the focus will be on equipping you with the tools you need to create such classroom-oriented tests.
ASSESSING CLEAR, UNAMBIGUOUS OBJECTIVES
In addition to knowing the purpose of the test you're creating, you need to know as specifically as possible what it is you want to test. Sometimes teachers give tests simply because it's Friday of the third week of the course, and after hasty glances at the chapter(s) covered during those three weeks, they dash off some test items so that students will have something to do during the class. This is no way to approach a test. Instead, begin by taking a careful look at everything that you think your students should "know" or be able to "do," based on the material that the students are responsible for. In other words, examine the objectives for the unit you are testing.
Selected objectives for a unit in a low-intermediate integrated-skills course
Form-focused objectives (listening and speaking)
1.      Students will recognize and produce tag questions, with the correct grammatical form and final intonation pattern, in simple social conversations.
2.      Students will recognize and produce wh- information questions with correct final intonation pattern.
Communication skills (speaking)
3.      Students will state completed actions and events in a social conversation.
4.      Students will ask for confirmation in a social conversation.
5.      Students will give opinions about an event in a social conversation.
6.      Students will produce language with contextually appropriate intonation, stress, and rhythm.
Reading skills (simple essay or story)
7.      Students will recognize irregular past tense of selected verbs in a story or essay.
Writing skills (simple essay or story)
8.      Students will write a one-paragraph story about a simple event in the past.
9.      Students will use conjunctions so and because in a statement of opinion.
DRAWING UP TEST SPECIFICATIONS
Test specifications for classroom use can be a simple and practical outline of your test. (For large-scale standardized tests [see Chapter 4] that are intended to be widely distributed and therefore are broadly generalized, test specifications are more formal and detailed.) In the unit discussed above, your specifications will simply comprise (a) a broad outline of the test, (b) what skills you will test, and (c) what the items will look like. Let's look at the first two in relation to the midterm unit assessment already referred to above: (a) outline of the test and (b) skills to be included. Because of the constraints of your curriculum, your unit test must take no more than 30 minutes.
DEVISING TEST TASKS
Your oral interview comes first, and so you draft questions to conform to the accepted pattern of oral interviews (see Chapter 7 for information on constructing oral interviews). You begin and end with nonscored items (warm-up and wind-down) designed to set students at ease, and then sandwich between them items intended to test the objective (level check) and a little beyond (probe).
Oral interview format
A. Warm-up: questions and comments
B. Level-check questions (objectives 3, 5, and 6)
1. Tell me about what you did last weekend.
2. Tell me about an interesting trip you took in the last year.
3. How did you like the TV show we saw this week?
C. Probe (objectives 5, 6)
1. What is your opinion about ______? (news event)
2. How do you feel about ______? (another news event)
D. Wind-down: comments and reassurance
You are now ready to draft other test items. To provide a sense of authenticity and interest, you have decided to conform your items to the context of a recent TV sitcom that you used in class to illustrate certain discourse and form-focused factors.
DESIGNING MULTIPLE-CHOICE TEST ITEMS
In the sample achievement test above, two of the five components (both of the listening sections) specified a multiple-choice format for items. This was a bold step to take. Multiple-choice items, which may appear to be the simplest kind of item to construct, are extremely difficult to design correctly. Hughes (2003, pp. 76-78) cautions against a number of weaknesses of multiple-choice items:
• The technique tests only recognition knowledge.
• Guessing may have a considerable effect on test scores.
• The technique severely restricts what can be tested.
• It is very difficult to write successful items.
• Washback may be harmful.
SCORING, GRADING, AND GIVING FEEDBACK
SCORING
As you design a classroom test, you must consider how the test will be scored and graded. Your scoring plan reflects the relative weight that you place on each section and on the items in each section. The integrated-skills class that we have been using as an example focuses on listening and speaking skills with some attention to reading and writing. Three of your nine objectives target reading and writing skills. How do you assign scoring to the various components of this test? Because oral production is a driving force in your overall objectives, you decide to place more weight on the speaking (oral interview) section than on the other three sections. Five minutes is actually a long time to spend in a one-on-one situation with a student, and some significant information can be extracted from such a session. You therefore designate 40 percent of the grade to the oral interview. You consider the listening and reading sections to be equally important, but each of them, especially in this multiple-choice format, is of less consequence than the oral interview. So you give each of them a 20 percent weight.
That leaves 20 percent for the writing section, which seems about right to you given the time and focus on writing in this unit of the course. Your next task is to assign scoring for each item. This may take a little numerical common sense, but it doesn't require a degree in math. To make matters simple, you decide to have a 100-point test in which the listening and reading items are each worth 2 points. The oral interview will yield four scores ranging from 5 to 1, reflecting fluency, prosodic features, accuracy of the target grammatical objectives, and discourse appropriateness. To weight the test scores appropriately, you will double each individual score and then add them together for a possible total score of 40. (Chapters 4 and 7 will deal more extensively with scoring and assessing oral production performance.) The writing sample has two scores: one for grammar/mechanics (including the correct use of so and because) and one for overall effectiveness of the message, each ranging from 5 to 1. Again, to achieve the correct weight for writing, you will double each score and add them, so the possible total is 20 points.
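To make the arithmetic concrete, the short sketch below tallies the four sections under this weighting scheme. It is a minimal illustration, assuming ten 2-point items each in the listening and reading sections so that each multiple-choice section reaches its 20-point weight; the function name and item counts are illustrative rather than part of the test described above.

```python
# A minimal sketch of the scoring arithmetic described above.
# Assumption: ten 2-point items per multiple-choice section, so each
# section's maximum matches its intended weight (20 points).

def score_unit_test(listening_correct, reading_correct, oral_ratings, writing_ratings):
    """Return section subtotals and the 100-point total for the unit test.

    listening_correct, reading_correct: number of correct 2-point items (0-10 each)
    oral_ratings: four 1-5 ratings (fluency, prosody, grammar, discourse)
    writing_ratings: two 1-5 ratings (grammar/mechanics, overall message)
    """
    listening = listening_correct * 2               # max 20 (20% weight)
    reading = reading_correct * 2                   # max 20 (20% weight)
    oral = sum(r * 2 for r in oral_ratings)         # ratings doubled -> max 40 (40% weight)
    writing = sum(r * 2 for r in writing_ratings)   # ratings doubled -> max 20 (20% weight)
    return {"listening": listening, "reading": reading, "oral": oral,
            "writing": writing, "total": listening + reading + oral + writing}

# Example: 8/10 listening, 9/10 reading, oral ratings 4, 4, 3, 4, writing ratings 4, 5
# gives 16 + 18 + 30 + 18 = 82 out of 100.
print(score_unit_test(8, 9, [4, 4, 3, 4], [4, 5]))
```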
GRADING
Your first thought might be that assigning grades to student performance on this test would be easy: just give an "A" for 90-100 percent, a "B" for 80-89 percent, and so on. Not so fast! Grading is such a thorny issue that all of Chapter 11 is devoted to the topic. How you assign letter grades to this test is a product of the country, culture, and context of this English classroom, institutional expectations (most of them unwritten), explicit and implicit definitions of grades that you have set forth, the relationship you have established with this class, and student expectations that have been engendered in previous tests and quizzes in this class. For the time being, then, we will set aside issues that deal with grading this test in particular, in favor of that comprehensive treatment of grading.
GIVING FEEDBACK
A section on scoring and grading would not be complete without some consideration of the forms in which you will offer feedback to your students, feedback that you want to become beneficial washback. In the example test that we have been referring to here, which is not unusual in the universe of possible formats for periodic
EXERCISES
Note: (I) Individual work; (G) Group or pair work; (C) Whole-class discussion.
1. (I) Consult the MLAT website address on page 44 and obtain as much information as you can about the MLAT. Aptitude tests propose to predict one's performance in a language course. Review the rationale supporting such testing, and then summarize the controversy surrounding aptitude tests. What can you say about the validity and the ethics of aptitude testing?
2. (G) In pairs, each assigned to one type of test (aptitude, proficiency, placement, diagnostic, or achievement), create a list of broad specifications for the test type you have been assigned: What are the test criteria? What kinds of items should be used? How would you sample among a number of possible objectives?
3. (G) Look again at the discussion of objectives (page 49). In a small group, discuss the following scenario: In the case that a teacher is faced with more objectives than are possible to sample in a test, draw up a set of guidelines for choosing which objectives to include on the test and which ones to exclude. You might start by considering the issue of the relative importance of all the objectives in the context of the course in question. How does one adequately sample objectives?
4. (I) Figure 3.1 depicts various modes of elicitation and response. Are there other modes of elicitation that could be included in such a chart? Justify your additions with an example of each.
5. (G) Select a language class in your immediate environment for the following project: In small groups, design an achievement test for a segment of the course (preferably a unit for which there is no current test or for which the present test is inadequate). Follow the guidelines in this chapter.
6. (G) Find an existing, recently used standardized multiple-choice test for which there is accessible data on student performance. Calculate the item facility (IF) and item discrimination (ID) index for selected items (one common formulation of these indices is sketched in the code after the further-reading note below). If there are no data for an existing test, select some items on the test and analyze the structure of those items in a distractor analysis to determine if they have (a) any bad distractors, (b) any bad stems, or (c) more than one potentially correct answer.
7. (I) On page 63, nine different options are listed for giving feedback to students on assessments. Review the practicality of each and determine the extent to which practicality (principally, more time expended) is justifiably sacrificed in order to offer better washback to learners.
FOR YOUR FURTHER READING
Carroll, John B. (1990). Cognitive abilities in foreign language aptitude: Then and now. In Thomas S. Parry & Charles W. Stansfield (Eds.), Language aptitude reconsidered. Englewood Cliffs, NJ: Prentice Hall Regents.
Carroll, the original developer of the MLAT, updates arguments for and against some of the original cognitive hypotheses underlying the MLAT. In the same volume, note articles by Oxford and by Ehrman contending that styles, strategies, and personality may be further factors in the construct of language aptitude.
Brown, James Dean.
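Relating to exercise 6 above: the chapter excerpt here does not spell out how item facility (IF) and item discrimination (ID) are computed, so the sketch below uses one conventional formulation (IF as the proportion of test-takers answering an item correctly; ID as the difference in that proportion between upper and lower scorers). Treat the function names and the simple upper-half/lower-half grouping as illustrative assumptions, not the book's prescribed procedure.

```python
# A hedged sketch of conventional item-facility (IF) and item-discrimination (ID)
# calculations for one multiple-choice item. The upper/lower groups are the top
# and bottom halves of test-takers ranked by total test score (an assumption).

def item_facility(item_correct):
    """IF = proportion of test-takers answering the item correctly (1 = correct, 0 = incorrect)."""
    return sum(item_correct) / len(item_correct)

def item_discrimination(item_correct, total_scores):
    """ID = proportion correct in the upper group minus proportion correct in the lower group."""
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    p_upper = sum(item_correct[i] for i in upper) / half
    p_lower = sum(item_correct[i] for i in lower) / half
    return p_upper - p_lower

# Example: six test-takers; the item is answered correctly mainly by high scorers,
# so it discriminates positively.
item = [1, 1, 1, 0, 1, 0]          # correctness on this item, one entry per test-taker
totals = [95, 88, 82, 70, 66, 50]  # total test scores, same order
print(item_facility(item))               # 4/6, about 0.67
print(item_discrimination(item, totals)) # 3/3 - 1/3, about 0.67
```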

Source:
Brown, H. Douglas. 2003. Language Assessment: Principles and Classroom Practices. San Francisco, California.


Thursday, 19 March 2020

Assignment 4


1.      Practicality
A test that is prohibitively expensive is impractical. A test of language proficiency that takes a student five hours to complete is impractical; it consumes more time (and money) than necessary to accomplish its objective. A test that requires individual one-on-one proctoring is impractical for a group of several hundred test-takers and only a handful of examiners. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations. A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer. The value and quality of a test sometimes hinge on such nitty-gritty, practical considerations. Here's a little horror story about practicality gone awry. An administrator of a six-week summertime short course needed to place the 50 or so students who had enrolled in the program. A quick search yielded a copy of an old English Placement Test from the University of Michigan. It had 20 listening items based on an audio tape and 80 items on grammar, vocabulary, and reading comprehension, all in multiple-choice format. A scoring grid accompanied the test. On the day of the test, the required number of test booklets had been secured, a proctor had been assigned to monitor the process, and the administrator and proctor had planned to have the scoring completed by later that afternoon so students could begin classes the next day. Sounds simple, right? Wrong. The students arrived, test booklets were distributed, directions were given, and the proctor started the tape. Soon students began to look puzzled. By the time the tenth item played, everyone looked bewildered. Finally, the proctor checked a test booklet and was horrified to discover that the wrong tape was playing; it was a tape for another form of the same test! Now what? She decided to randomly select a short passage from a textbook that was in the room and give the students a dictation. The students responded reasonably well. The next 80 non-tape-based items proceeded without incident, and the students handed in their score sheets and dictation papers.
2.      Reliability
A reliable test is consistent and dependable. If you give the same test to the same student or matched students on two different occasions, the test should yield similar results. The issue of reliability of a test may best be addressed by considering a number of factors that may contribute to the unreliability of a test. Consider the following possibilities (adapted from Mousavi, 2002, p. 804): fluctuations in the student, in scoring, in test administration, and in the test itself.
a.       Student-Related Reliability
The most common learner-related issue in reliability is caused by temporary illness, fatigue, a "bad day," anxiety, and other physical or psychological factors, which may make an "observed" score deviate from one's "true" score. Also included in this category are such factors as a test-taker's "test-wiseness" or strategies for efficient test taking (Mousavi, 2002, p. 804).
b.      Rater Reliability
Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same test, possibly because of lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases. In the story above about the placement test, the initial scoring plan for the dictations was found to be unreliable; that is, the two scorers were not applying the same standards.
c.       Test Administration Reliability
Unreliability may also result from the conditions in which the test is administered. I once witnessed the administration of a test of aural comprehension in which a tape recorder played items for comprehension, but because of street noise outside the building, students sitting next to windows could not hear the tape accurately.
d.      Test Reliability
Sometimes the nature of the test itself can cause measurement errors. If a test is too long, test-takers may become fatigued by the time they reach the later items and hastily respond incorrectly. Timed tests may discriminate against students who do not perform well on a test with a time limit.
3.      Validity
By far the most complex criterion of an effective test, and arguably the most important principle, is validity, "the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment" (Gronlund, 1998, p. 226). A valid test of reading ability actually measures reading ability, not 20/20 vision, nor previous knowledge in a subject, nor some other variable of questionable relevance. To measure writing ability, one might ask students to write as many words as they can in 15 minutes, then simply count the words for the final score. Such a test would be easy to administer (practical), and the scoring quite dependable (reliable). But it would not constitute a valid test of writing ability without some consideration of comprehensibility, rhetorical discourse elements, and the organization of ideas, among other factors.
a.       Content-Related Evidence
If a test actually samples the subject matter about which conclusions are to be drawn, and if it requires the test-taker to perform the behavior that is being measured, it can claim content-related evidence of validity, often popularly referred to as content validity (e.g., Mousavi, 2002; Hughes, 2003). You can usually identify content-related evidence observationally if you can clearly define the achievement that you are measuring. A test of tennis competency that asks someone to run a 100-yard dash obviously lacks content validity. If you are trying to assess a person's ability to speak a second language in a conversational setting, asking the learner to answer paper-and-pencil multiple-choice questions requiring grammatical judgments does not achieve content validity.
b.      Criterion-Related Evidence
A second form of evidence of the validity of a test may be found in what is called criterion-related evidence, also referred to as criterion-related validity, or the extent to which the "criterion" of the test has actually been reached. You will recall that in Chapter 1 it was noted that most classroom-based assessment with teacher-designed tests fits the concept of criterion-referenced assessment. In such tests, specified classroom objectives are measured, and implied predetermined levels of performance are expected to be reached (80 percent is considered a minimal passing grade). In the case of teacher-made classroom assessments, criterion-related evidence is best demonstrated through a comparison of the results of an assessment with the results of some other measure of the same criterion.
c.       Construct-Related Evidence
A third kind of evidence that can support validity, but one that does not play as large a role for classroom teachers, is construct-related evidence, commonly referred to as construct validity. A construct is any theory, hypothesis, or model that attempts to explain observed phenomena in our universe of perceptions. Constructs may or may not be directly or empirically measured; their verification often requires inferential data. "Proficiency" and "communicative competence" are linguistic constructs; "self-esteem" and "motivation" are psychological constructs. Virtually every issue in language learning and teaching involves theoretical constructs.
d.      Consequential Validity
In addition to the above three widely accepted forms of evidence that may be introduced to support the validity of an assessment, two other categories may be of some interest and utility in your own quest for validating classroom tests. Messick (1989), Gronlund (1998), McNamara (2000), and Brindley (2001), among others, underscore the potential importance of the consequences of using an assessment. Consequential validity encompasses all the consequences of a test, including such considerations as its accuracy in measuring intended criteria, its impact on the preparation of test-takers, its effect on the learner, and the intended (and unintended) social consequences of a test's interpretation and use.
e.       Face Validity
An important facet of consequential validity is "the extent to which students view the assessment as fair, relevant, and useful for improving learning" (Gronlund, 1998, p. 210), or what is popularly known as face validity. "Face validity refers to the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers" (Mousavi, 2002, p. 244). Sometimes students don't know what is being tested when they tackle a test.
4.      Authenticity
A fourth major principle of language testing is authenticity, a concept that is a little slippery to define, especially within the art and science of evaluating and designing tests. Bachman and Palmer (1996, p. 23) define authenticity as "the degree of correspondence of the characteristics of a given language test task to the features of a target language task," and then suggest an agenda for identifying those target language tasks and for transforming them into valid test items. Essentially, when you make a claim for authenticity in a test task, you are saying that this task is likely to be enacted in the real world. Many test item types fail to simulate real-world tasks. They may be contrived or artificial in their attempt to target a grammatical form or a lexical item. The sequencing of items that bear no relationship to one another lacks authenticity. One does not have to look very long to find reading comprehension passages in proficiency tests that do not reflect a real-world passage. In a test, authenticity may be present in the following ways:
• The language in the test is as natural as possible.
• Items are contextualized rather than isolated.
• Topics are meaningful (relevant, interesting) for the learner.
• Some thematic organization to items is provided, such as through a story line or episode.
• Tasks represent, or closely approximate, real-world tasks.
5.      Washback
A facet of consequential validity, discussed above, is the effect of testing on teaching and learning (Hughes, 2003, p. 1), otherwise known among language-testing specialists as washback. In large-scale assessment, washback generally refers to the effects the tests have on instruction in terms of how students prepare for the test. "Cram" courses and "teaching to the test" are examples of such washback. Another form of washback that occurs more in classroom assessment is the information that "washes back" to students in the form of useful diagnoses of strengths and weaknesses. Washback also includes the effects of an assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment. Informal performance assessment is by nature more likely to have built-in washback effects because the teacher is usually providing interactive feedback. Formal tests can also have positive washback, but they provide no washback if the students receive a simple letter grade or a single overall numerical score.
A little bit of washback may also help students through a specification of the numerical scores on the various subsections of the test. A subsection on verb tenses, for example, that yields a relatively low score may serve the diagnostic purpose of showing the student an area of challenge. Another viewpoint on washback is achieved by a quick consideration of the differences between formative and summative tests. Formative tests, by definition, provide washback in the form of information to the learner on progress toward goals. But teachers might be tempted to feel that summative tests, which provide assessment at the end of a course or program, do not need to offer much in the way of washback. Such an attitude is unfortunate because the end of every language course or program is always the beginning of further pursuits, more learning, more goals, and more challenges to face.
Even a final examination in a course should carry with it some means for giving washback to students. In my courses I never give a final examination during the last scheduled classroom session. I always administer a final exam during the penultimate session, then complete the evaluation of the exams in order to return them to students during the last class. At this time, the students receive scores, grades, and comments on their work, and I spend some of the class session addressing material on which the students were not completely clear. My summative assessment is thereby enhanced by some beneficial washback that is usually not expected of final examinations. Finally, washback also implies that students have ready access to you to discuss the feedback and evaluation you have given. While you have almost certainly known teachers with whom you wouldn't dare argue about a grade, an interactive, cooperative, collaborative classroom can nevertheless promote an atmosphere of dialogue between students and teachers regarding evaluative judgments. For learning to continue, students need to have a chance to feed back on your feedback, to seek clarification of any issues that are fuzzy, and to set new and appropriate goals for themselves for the days and weeks ahead.

Source:
Brown, H. Douglas. 2003. Language Assessment: Principles and Classroom Practices. San Francisco, California.








Assignment 3



1.      Validity
Based on the 35 high-school-level National Examination items from 2017 discussed above, the national exam questions meet the validity criterion because they are on target and show what needs to be measured. The national exam questions are also valid because they are constructed by a special team of specialists in the field.
2.      Reliability
In terms of reliability, the national exam questions do not yet meet the reliability criterion because facilities in the cities and in the villages are different, yet students are given questions with the same standard. Because the facilities in the cities are more complete and adequate, those students earn higher grades. Therefore, due to the differences in these facilities, the national exam is not reliable.
3.      Practicality
In the national exam questions observed above, there are several practicality considerations, including cost, paper quality, and printing. A test that takes several minutes for a student to take and several hours for an examiner to evaluate is not practical for most classroom situations. A test that can be scored only by computer is not practical if the test is administered thousands of miles away from the nearest computer. The value and quality of exams sometimes depend on such complex practical considerations.