DESIGNING CLASSROOM LANGUAGE TESTS

A. TEST TYPES
The first task you will face in designing a test for your students is to determine the purpose for the test. Defining your purpose will help you choose the right kind of test, and it will also help you to focus on the specific objectives of the test. We will look first at two test types that you will probably not have many opportunities to create as a classroom teacher (language aptitude tests and language proficiency tests) and three types that you will almost certainly need to create: placement tests, diagnostic tests, and achievement tests.

1. Language Aptitude Tests
A language aptitude test is designed to measure capacity or general ability to learn a foreign language and ultimate success in that undertaking. Language aptitude tests are ostensibly designed to apply to the classroom learning of any language, but in practice they are seldom used to predict success. Instead, attempts to measure language aptitude more often provide learners with information about their preferred styles and their potential strengths and weaknesses, with follow-up strategies for capitalizing on the strengths and overcoming the weaknesses.
2. Proficiency Tests
A proficiency test is not limited to any one course, curriculum, or single skill in the language; rather, it tests overall ability. Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, and aural comprehension.
Proficiency tests are almost always summative and norm-referenced. They provide results in the form of a single score (or at best two or three subscores, one for each section of a test), which is a sufficient result for the gatekeeping role they play of accepting or denying someone passage into the next stage of a journey.

3. Placement Tests
Certain proficiency tests can act in the role of placement tests, the purpose of which is to place a student into a particular level or section of a language curriculum or school. A placement test usually, but not always, includes a sampling of the material to be covered in the various courses in a curriculum; a student's performance on the test should indicate the point at which the student will find material neither too easy nor too difficult but appropriately challenging.
Placement tests come in many varieties: assessing comprehension and production, responding through written and oral performance, open-ended and limited responses, selection (e.g., multiple-choice) and gap-filling formats, depending on the nature of a program and its needs.

4. Diagnostic Tests
A diagnostic test is designed to diagnose specified aspects of a language. A test in pronunciation, for example, might diagnose the phonological features of English that are difficult for learners and should therefore become part of a curriculum. Usually, such tests offer a checklist of features for the administrator (often the teacher) to use in pinpointing difficulties. A writing diagnostic would elicit a writing sample from students that would allow the teacher to identify those rhetorical and linguistic features on which the course needed to focus special attention.
A typical diagnostic test of oral production was created by Clifford Prator (1972) to accompany a manual of English pronunciation. Test-takers are directed to read a 150-word passage while they are tape-recorded. The test administrator then refers to an inventory of phonological items for analyzing a learner's production.

5. Achievement Tests
An achievement test is related directly to classroom lessons, units, or even a total curriculum. Achievement tests are (or should be) limited to particular material addressed in a curriculum within a particular time frame and are offered after a course has focused on the objectives in question. Achievement tests can also serve the diagnostic role of indicating what a student needs to continue to work on in the future, but the primary role of an achievement test is to determine whether course objectives have been met, and appropriate knowledge and skills acquired, by the end of a period of instruction.
Achievement tests are often summative because they are administered at the end of a unit or term of study. They also play an important formative role. An effective achievement test will offer washback about the quality of a learner's performance in subsets of the unit or course.

B. SOME PRACTICAL STEPS TO TEST CONSTRUCTION
In the remainder of this chapter, we will explore the four remaining questions posed at the outset, and the focus will be on equipping you with the tools you need to create such classroom-oriented tests.
1) Assessing Clear, Unambiguous Objectives
In addition to knowing the purpose of the test you're creating, you need to know as specifically as possible what it is you want to test. Sometimes teachers give tests simply because it's Friday of the third week of the course, and after hasty glances at the chapter(s) covered during those three weeks, they dash off some test items so the students will have something to do during the class. This is no way to approach a test. Instead, begin by taking a careful look at everything that you think your students should "know" or be able to "do," based on the material that the students are responsible for. In other words, examine the objectives for the unit you are testing.

2) Drawing Up Test Specifications
Test specifications for classroom use can be a simple and practical outline of your test. (For large-scale standardized tests that are intended to be widely distributed and therefore are broadly generalized, test specifications are much more formal and detailed.) In the unit discussed above, your specifications will simply comprise (a) a broad outline of the test, (b) what skills you will test, and (c) what the items will look like. Let's look at the first two in relation to the midterm unit assessment already referred to above.
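To make this concrete, a classroom specification can be as simple as a short structured note. Below is a minimal sketch written as a Python dictionary purely for illustration; the unit label, item counts, and time limit are assumptions rather than figures given in this chapter, although the four skill sections echo the midterm example discussed in the rest of this section.

# Hypothetical specification for a midterm unit assessment.
# The unit label, item counts, and time limit are invented for illustration.
test_spec = {
    "purpose": "midterm achievement test for one unit of the course",
    "sections": [
        {"skill": "speaking",  "format": "five-minute oral interview"},
        {"skill": "listening", "format": "multiple-choice items on a taped conversation"},
        {"skill": "reading",   "format": "multiple-choice items on short passages"},
        {"skill": "writing",   "format": "short guided writing sample"},
    ],
    "time_allowed_minutes": 45,
}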
3) Devising Test Tasks
Your oral interview comes first, and so you draft questions to conform to the accepted pattern of oral interviews (see Chapter 7 for information on constructing oral interviews). You begin and end with nonscored items (warm-up and wind-down) designed to set students at ease, and then sandwich between them items intended to test the objective (level check) and a little beyond (probe).
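To see that structure at a glance, the four phases can be laid out as data, marking which are scored. This is only a sketch; the sample prompts below are invented placeholders, not items from this chapter.

# Oral interview outline: only the level check and probe are scored.
# The prompts are hypothetical examples for illustration.
interview_phases = [
    ("warm-up",     False, "How has your week been so far?"),
    ("level check", True,  "Tell me about your weekend, using 'so' or 'because'."),
    ("probe",       True,  "What would you do if you had a whole month free to travel?"),
    ("wind-down",   False, "Thanks - any questions for me before we finish?"),
]
for phase, scored, prompt in interview_phases:
    print(f"{phase:12} scored={scored}  sample prompt: {prompt}")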
You are now ready to draft other test items. To provide a sense of authenticity and interest, you have decided to conform your items to the context of a recent TV sitcom that you used in class to illustrate certain discourse and form-focused factors. The sitcom depicted a loud, noisy party with lots of small talk. As you devise your test items, consider such factors as how students will perceive them (face validity), the extent to which authentic language and contexts are present, potential difficulty caused by cultural schemata, the length of the listening stimuli, how well a story line comes across, how things like the cloze testing format will work, and other practicalities.

4) Designing Multiple-Choice Test Items
Multiple-choice items, which may appear to be the simplest kind of item to construct, are extremely difficult to design correctly. Hughes (2003, pp. 76-78) cautions against a number of weaknesses of multiple-choice items:
• The technique tests only recognition knowledge.
• Guessing may have a considerable effect on test scores.
• The technique severely restricts what can be tested.

The two principles that stand out in support of multiple-choice formats are, of course, practicality and reliability. With their predetermined correct responses and time-saving scoring procedures, multiple-choice items offer overworked teachers the tempting possibility of an easy and consistent process of scoring and grading.
Since there will be occasions when multiple-choice items are appropriate, consider the following indices, which are commonly used when designing and evaluating multiple-choice items for both classroom-based and large-scale situations (adapted from Gronlund, 1998, pp. 60-75, and J. D. Brown, 1996, pp. 54-57).

1. Item facility (or IF) is the extent to which an item is easy or difficult for the proposed group of test-takers. You may wonder why that is important if, in your estimation, the item achieves validity. The answer is that an item that is too easy (say, 99 percent of respondents get it right) or too difficult (99 percent get it wrong) really does nothing to separate high-ability and low-ability test-takers; it is not really performing much "work" for you on a test. (A small computational sketch of all three indices follows this list.)
2. Item discrimination (ID) is the extent to which an item differentiates between high- and low-ability test-takers. An item on which high-ability students (who did well on the test) and low-ability students (who did not) score equally well would have poor ID because it did not discriminate between the two groups. Conversely, an item that garners correct responses from most of the high-ability group and incorrect responses from most of the low-ability group has good discrimination power.
3. Distractor efficiency is one more important measure of a multiple-choice item's value in a test, and one that is related to item discrimination. The efficiency of distractors is the extent to which (a) the distractors "lure" a sufficient number of test-takers, especially lower-ability ones, and (b) those responses are somewhat evenly distributed across all distractors. Those of you who have a fear of mathematical formulas will be happy to read that there is no formula for calculating distractor efficiency and that an inspection of a distribution of responses will usually yield the information you need.
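Although, as noted above, no formula is needed for distractor efficiency, all three indices are easy to compute from a table of responses. The sketch below is illustrative only: the toy data are invented, the top/bottom grouping used for ID is one common choice rather than a fixed rule, and the function names are mine rather than standard terms from this chapter.

# Item analysis sketch: item facility (IF), item discrimination (ID),
# and a simple tally of how often each option was chosen.
from collections import Counter

answer_key = {"q1": "B", "q2": "D"}
responses = {                       # test-taker -> option chosen per item (invented data)
    "s1": {"q1": "B", "q2": "D"}, "s2": {"q1": "B", "q2": "A"},
    "s3": {"q1": "C", "q2": "D"}, "s4": {"q1": "B", "q2": "C"},
    "s5": {"q1": "A", "q2": "A"}, "s6": {"q1": "B", "q2": "D"},
}

def item_facility(item):
    # Proportion of all test-takers who answered the item correctly.
    marks = [r[item] == answer_key[item] for r in responses.values()]
    return sum(marks) / len(marks)

def item_discrimination(item, group_size=2):
    # Facility in the high-scoring group minus facility in the low-scoring group.
    totals = {s: sum(r[q] == answer_key[q] for q in answer_key)
              for s, r in responses.items()}
    ranked = sorted(totals, key=totals.get, reverse=True)
    high, low = ranked[:group_size], ranked[-group_size:]
    p_high = sum(responses[s][item] == answer_key[item] for s in high) / group_size
    p_low = sum(responses[s][item] == answer_key[item] for s in low) / group_size
    return p_high - p_low

def distractor_tally(item):
    # Distribution of chosen options; a distractor nobody chooses adds nothing.
    return Counter(r[item] for r in responses.values())

print(item_facility("q1"))        # 4 of 6 correct -> about 0.67
print(item_discrimination("q1"))  # 0.5 with this toy data
print(distractor_tally("q2"))     # Counter({'D': 3, 'A': 2, 'C': 1})

Read against the descriptions above, an IF near 0 or 1 means the item is doing little work, and an ID at or near zero suggests an item that probably needs revision.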

C. SCORING, GRADING, AND GIVING FEEDBACK

I. Scoring
Because oral production is a driving force in your overall objectives, you decide to place more weight on the speaking (oral interview) section than on the other three sections. Five minutes is actually a long time to spend in a one-on-one situation with a student, and some significant information can be extracted from such a session. You therefore designate 40 percent of the grade to the oral interview. You consider the listening and reading sections to be equally important, but each of them, especially in this multiple-choice format, is of less consequence than the oral interview, so you give each of them a 20 percent weight. This may take a little numerical common sense, but it doesn't require a degree in math. To make matters simple, you decide to have a 100-point test in which
• the listening and reading items are each worth 2 points,
• the oral interview will yield four scores ranging from 5 to 1, reflecting fluency, prosodic features, accuracy of the target grammatical objectives, and discourse appropriateness. To weight these scores appropriately, you will double each individual score and then add them together for a possible total score of 40, and
• the writing sample has two scores: one for grammar/mechanics (including the correct use of so and because) and one for overall effectiveness of the message, each ranging from 5 to 1. Again, to achieve the correct weight for writing, you will double each score and add them, so the possible total is 20 points.
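As a quick arithmetic check of that weighting, the numbers can be added up directly. The sketch below assumes, purely for illustration, ten listening items and ten reading items at 2 points each, which is one way of arriving at the stated 20 percent weights; the oral interview and writing figures follow the scheme described above.

# Worked check of the 100-point weighting (40% oral, 20% each for the other sections).
# The counts of 10 listening and 10 reading items are assumptions for illustration.
listening = 10 * 2                                    # 10 items x 2 points = 20
reading   = 10 * 2                                    # 10 items x 2 points = 20
oral      = sum(2 * score for score in [5, 5, 5, 5])  # four 1-5 ratings, doubled -> 40 max
writing   = sum(2 * score for score in [5, 5])        # two 1-5 ratings, doubled -> 20 max
total = listening + reading + oral + writing
print(total)  # 100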

II. Grading
Your first thought might be that assigning grades to student performance on this test would be easy: just give an "A" for 90-100 percent, a "B" for 80-89 percent, and so on. Not so fast! Grading is such a thorny issue that all of Chapter 11 is devoted to the topic. How you assign letter grades to this test is a product of
• the country, culture, and context of this English classroom,
• institutional expectations (most of them unwritten),
• explicit and implicit definitions of grades that you have set forth,
• the relationship you have established with this class, and
• student expectations that have been engendered in previous tests and quizzes in this class.

III. Giving Feedback
A section on scoring and grading would not be complete without some consideration of the forms in which you will offer feedback to your students, feedback that you want to become beneficial washback. In the example test that we have been referring to here, which is not unusual in the universe of possible formats for periodic classroom tests, consider the multitude of options. You might choose to return the test to the student with one of, or a combination of, any of the possibilities below:
1. a letter grade
2. a total score
3. four subscores (speaking, listening, reading, writing)
4. item-by-item feedback on the listening and reading sections
5. scores and comments on each element rated in the oral interview
6. comments and suggestions on the essay
7. on all or selected parts of the test, peer checking of results

IV. EXERCISES
1. (I/C) Consult the MLAT website and obtain as much information as you can about the MLAT. Aptitude tests propose to predict one's performance in a language course. Review the rationale supporting such testing, and then summarize the controversy surrounding aptitude tests. What can you say about the validity and the ethics of aptitude testing?
2. (G) In pairs, each assigned to one type of test (aptitude, proficiency, placement, diagnostic, or achievement), create a list of broad specifications for the test type you have been assigned: What are the test criteria? What kinds of items should be used? How would you sample among a number of possible objectives?
3. (G) Look again at the discussion of objectives. In a small group, discuss the following scenario: a teacher is faced with more objectives than it is possible to sample in a test. Draw up a set of guidelines for choosing which objectives to include on the test and which ones to exclude. You might start by considering the relative importance of all the objectives in the context of the course in question. How does one adequately sample objectives?
4. (I/C) Figure 3.1 depicts various modes of elicitation and response. Are there other modes of elicitation that could be included in such a chart? Justify your additions with an example of each.
5. (G) Select a language class in your immediate environment for the following project: In small groups, design an achievement test for a segment of the course (preferably a unit for which there is no current test or for which the present test is inadequate). Follow the guidelines in this chapter for developing an assessment procedure. When it is completed, present your assessment project to the rest of the class.
6. (G) Find an existing, recently used standardized multiple-choice test for which there is accessible data on student performance. Calculate the item facility (IF) and item discrimination (ID) index for selected items. If there are no data for an existing test, select some items on the test and analyze the structure of those items in a distractor analysis to determine if they have (a) any bad distractors, (b) any bad stems, or (c) more than one potentially correct answer.
7. (I/C) In the section on giving feedback above, a number of options are listed for giving feedback to students on assessments. Review the practicality of each and determine the extent to which practicality (principally, more time expended) is justifiably sacrificed in order to offer better washback to learners.


Reference
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Longman.
