Standardized English Test analysis: selected-response questions
Standardized English tests are administered to evaluate the proficiency of English language teachers and their suitability for teaching the subject. According to Gardner (2000), a standardized test is designed so that its main objective is to assess the teachers' proficiency in the English language. A number of elements are considered to tell whether the test itself performs the function it is supposed to. As such, in order to have a universally standardized test, there has to be rigorous reliability and validity testing. When analysing the reliability and validity of a proficiency test for English language teachers, a number of aspects are examined. These aspects evaluate the various structures of the test, namely the grammatical structure, vocabulary and reading comprehension. This analysis comprises a literature review of the topic, an analysis of the test, a discussion of the findings and a conclusion. Thus, the discourse analyses a standardized test as the most predominant and comprehensive technique for assessing the language proficiency of learners.
There are a number of approaches that have been used to test the validity of a proficiency test. Chapelle et al. (2011) discuss an approach to construct validation of tests. The initial approach evaluates the test's correspondence with the theory that is taught in class. This means that the test is supposed to conform adequately to the theory that has been approved by the education board. Young (2013) notes that the theory itself is excluded from the questions. Secondly, the test has to draw the learner's attention to the questions they are supposed to be answering. As such, the learner has to be in a position to correlate the questions in the test. All parts that constitute a section must be adequately correlated, and any inconsistencies in the tested subject must be avoided. This helps the evaluator to collect evidence from these sub-sections in a distinct manner. According to Young (2013), to assure that a test has construct validity, the total test score must be associated with the smaller parts that are evaluated. Young (2013) adds that problems may arise in doing this, especially when correlating the sub-sections with the total test. This problem is solved by applying the Multitrait-Multimethod Matrix (MTMM) approach.
The MTMM approach is a tool that was designed for the purpose of evaluating the construct validity of any test (Llosa, 2007). Campbell and Fiske developed it in the 1950s, mainly to evaluate language tests. The tool was a practical attempt that researchers could use as a substitute for the nomological network idea (meaning that it relates to certain principles taken as true). The nomological network idea was initially developed as an evaluation tool; however, its main shortcoming was that it lacked a clear methodology (Young, 2013). The MTMM tool was designed to contain a convergent and discriminant validity methodology. Convergent validity is the degree to which concepts that are intended to be theoretically related are associated in reality. On the other hand, discriminant validity is the measure to which concepts that are not supposed to be theoretically related are not associated in reality (Gardner, 2000). Validity evaluators agree that both convergent and discriminant validity can be evaluated using MTMM. At the same time, Llosa (2007) asserts that before it can be claimed that the measures have construct validity, one has to demonstrate both convergence and discrimination.
Young (2013) says more about convergent and discriminant validity. He says that if a trait is tested by two methods, the correlation between the results is expected to be high, given that the trait is the same in each method. This means that if a number of learners are tested for grammar, say, in the form of multiple choices and in another format, the correlation is expected to be high, because the element being tested is grammar; any differences that are observed are attributed to the methods' effectiveness. Gardner (2000) also says that either validity is supposed to be logically related to the total scores obtained by the learners. Gardner demonstrates the difference between the two validity tests by considering two sections of a proficiency test: vocabulary and grammar. Intrinsically, both are supposed to evaluate different constructs in the learner's proficiency. Therefore, Gardner (2000) says that the extent to which a low correlation is produced between these two is an indication of the discriminant validity of the said test.
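The convergent/discriminant logic described above can be illustrated numerically. The sketch below is a minimal illustration rather than part of any cited study: the score lists are invented, and a plain Pearson correlation stands in for a full MTMM analysis.

```python
def pearson(x, y):
    # Pearson product-moment correlation between two score lists
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Invented scores for five learners: the same trait (grammar) measured
# by two different methods, plus a different trait (vocabulary).
grammar_mc    = [70, 85, 60, 90, 75]   # grammar via multiple choice
grammar_cloze = [68, 88, 58, 92, 74]   # grammar via gap-filling
vocab_mc      = [80, 78, 65, 72, 90]   # vocabulary via multiple choice

convergent   = pearson(grammar_mc, grammar_cloze)  # same trait: expect high
discriminant = pearson(grammar_mc, vocab_mc)       # different traits: expect low
```

With these made-up scores, the same-trait correlation comes out near 1 while the cross-trait correlation stays low, which is exactly the pattern MTMM looks for as evidence of construct validity.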
Factors to be assessed
There are a number of considerations for assessing language proficiency. According to Young (2013), a test has to recognize diversity. An effective assessment is one which recognizes the tested individuals' different skills and learning capabilities. In order to accurately assess a learner's language, Lowenberg (2002) asserts that the student's variations in English skills, growth and development have to be constantly kept in check. As such, a proficiency test has to be investigated for elements of background information to ensure that all the important factors that have an impact on the learner are not a source of bias. The background information includes proficiency and achievement in the first language, trauma due to events such as war and accidents, and family and cultural values.
Young (2013) speaks about developmentally appropriate assessment. In this regard, Young says that this calls for the use of an array of assessment strategies, mainly because English language learners need a number of ways to demonstrate their understanding. If the language proficiency is at a lower level, it is important for the teacher to use techniques that go beyond just the use of stationery.
According to Goldenberg (2008), a test has to demonstrate a clear difference between receptive and productive language skills. It is common for some learners to show discrepancies between their oral and literacy skills, depending on their backgrounds, especially their educational culture. At the same time, Hughes (2003) notes that some students may demonstrate better understanding when they are reading and writing, and vice versa. As such, a test has to be analysed for any unjustified biases. This is also helpful for educators and decision makers when designing tests for various groups of students.
Another important factor to consider while analysing an English proficiency test touches on the differences between social and academic language. According to Young (2013), when English learners are being assessed, the tester has to take into consideration the type of language that the student uses to pass on their message. It is common for some learners to use high-frequency vocabulary, while others prefer to use long, simple sentences to demonstrate language proficiency. This is especially useful when analysing the grammar and language comprehension sections of proficiency tests. At the same time, Geva (2006) says that academic and specialized vocabulary may be necessary requirements for testing language proficiency. This is what motivates some testers to test the students' actual language levels when assessing academic language proficiency.
The methodology applied utilizes reliability and validity tests on the attached sample proficiency test. As for the reliability test, the internal consistency of the test and replication are measured. According to Orlich, Harder and Callahan (2012), internal consistency is the type of reliability that estimates the coefficient of the test's scores. If no uniform pattern is found in the students' responses, it means that the tester has not considered the students' learning level and that the learners are simply choosing answers at random without considering concept and content. On the other hand, replication is used to measure the test's convenience, procedures and inference.
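As a concrete illustration of the internal-consistency coefficient mentioned above, the snippet below sketches Cronbach's alpha, a standard statistic for this purpose. The formula and the toy 0/1 item scores are supplied here purely for illustration and are not drawn from the analysed test.

```python
def variance(xs):
    # Population variance of a list of numbers
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(score_matrix):
    # Internal-consistency coefficient for dichotomously scored items.
    # score_matrix: one row per test-taker, one 0/1 column per item.
    k = len(score_matrix[0])                       # number of items
    item_vars = sum(variance([row[i] for row in score_matrix])
                    for i in range(k))
    total_var = variance([sum(row) for row in score_matrix])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Four test-takers answering three items: consistent response patterns
# push alpha towards 1, random patterns push it towards 0 or below.
scores = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [1, 0, 1]]
alpha = cronbach_alpha(scores)
```

A low alpha on real data would match the situation the paragraph describes, where no uniform pattern is found and the learners appear to be choosing answers at random.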
As for validity, face validity is an effective method for evaluating a test. According to Orlich, Harder and Callahan (2012), this involves peer review of the test's questions to point out any areas that need modification. The main reason this form of validity is used is that the internal structure of the test may not concur with how it appears on paper, meaning that a third party needs to go through it. Once the test has been validated by this method, recommendations for future tests can be made and a catalogue of the suggestions kept by the relevant authorities.
The grammatical structure section tests the student's ability to form a sentence by filling in the blank spaces with an appropriate answer from four given choices. This is a form of constituency test, which is used to identify the various constituent structures of a sentence (Hawkey, 2006). Out of the seven recognized constituency tests that are used to test the learner's grammatical structuring ability, this test can be classified under the answer ellipsis group (or the question test). Many testers use these as rough-and-ready tools for evaluating the learner's grammatical ability. Given that a number of contradictory results may come about, it is important for the tester to give a word of caution. This word of caution plays the role of eliminating bias and treating all the learners on fair and equal grounds. It would make little or no sense to discredit one learner's answer without having adequately advised on the general directions to take whilst attempting the questions. As such, it is advised that the designers of this particular test arrange the items on a scale of reliability. This is because, even though less reliable tests can be considered useful, they are not self-sufficient on their own. The greatest shortcoming of such a test is that it may mislead the evaluator into thinking that the learner has less proficiency than required.
Some of the questions in this particular test cover topicalization. In English testing, topicalization involves moving the position of the components of the sentence, bringing the front part to the back, and vice versa (Yuan & Dugarova, 2012). This simple movement operation tests the learner's ability to reconstruct a sentence. Given that topicalization of arguments in English is quite rare, it is not advisable for a proficiency test to put too much focus on it. At the same time, there has to be a consideration of diversity in culture: different learners' first languages have different levels of topicalization. In fact, English is one of the languages with the least incidence of topicalization in real-world language use. Having one or two questions assessing topicalization in a 15-question set can be considered enough.
The grammatical section also tests passives and actives. Of significance is the learner's ability to distinguish ill-formed passives from well-formed passives. In theory, it is normal for double object sentences to produce two types of passives, depending on whether either of the two objects can be put into the position previously occupied by the subject. However, this is where the issue of culture and background comes in. Speakers of English as a first language and speakers of English as an additional language will perceive the sentences quite differently. While the latter are most likely not to detect any significant difference, the former are likely to indicate that one of them is indeed ungrammatical. For instance, in question 97, many students will most likely think that their answers are correct and all others are wrong. As mentioned earlier, this is where the issue of culture and background factors in. The biggest concern regarding this observation is how the evaluators are supposed to account for the discrepancy in the answers from learners with different cultural backgrounds. The best suggestion for dealing with this is to use Lexical Mapping Theory (Dalrymple, 2001), which relates arguments to grammatical functions: subj, obj, obj2 and obl. There are a number of other rules that can be applied to solve this issue.
This test evaluates language that is more general. It does not focus on testing academic language but rather general spoken language. The learners have been prompted to finish the sentences they have been given by filling in missing words or phrases from a group of four choices. The answers given are real English words, albeit of different forms. The syntax is not complex, and the vocabulary is made up of everyday words used by English language speakers. The discourse is therefore less demanding to process. However, some aspects of the answers demonstrate a content area with academically more demanding language, which includes academic vocabulary and varied syntax structures. The grammatical evaluation section of the test has been designed with the general rule that language proficiency assessment needs to be developed so as to take into consideration the various levels of complexity in language syntax. At the same time, the second section of the test provides the learners with a choice of four words that they can use to replace the word in bold, choosing the one with the closest meaning to it. This technique is especially useful for testing the learners' ability to grasp concepts, rather than merely cramming what they have been taught in class.
The test also has a reading comprehension section. In this section, the learners are asked to read a paragraph carefully and thereafter answer questions based on what has been stated or implied in the text. This particular mode of testing is useful for evaluating the learner's ability to understand the language as well as to interpret what has been written in it. The essay-form questions do not test the learner's grammar ability, but rather understanding and comprehension. However, some of the questions have been designed to test whether the learner can stick to the basic rules of the language even as they respond to the prompts. The correctness of the answer given in this section has little connection to the learner's background and culture, nor does it have a connection to their personal evaluation of the instructions.
This test has used selected-response questions to evaluate the learner's abilities in a number of areas. This type of questioning is widely used in language assessment for two major reasons. First, Richards and Renandya (2002) argue that it is used because it limits the number of responses the learner can provide, for the purpose of quick scoring and objectivity. These types of questions are scored dichotomously, meaning that the answer provided by the test-taker can be either wrong or right, not both. As such, there are a number of concepts that have to be applied when designing these tests. The biggest focus is laid on the frequency with which a type of question appears in the tests over time, as a matter of preventing learners from cramming the answers and neglecting to grasp the fundamental concepts of the language. In regard to the last section of the test in this particular case, there are some recommendations to be made to aid in the creation of passages.
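Dichotomous scoring as described above can be sketched in a few lines. The answer key and responses below are invented for illustration, and the facility index (the proportion of test-takers answering an item correctly) is a common follow-up statistic rather than one named in the source.

```python
def score_dichotomous(responses, key):
    # Each selected option is marked 1 (right) or 0 (wrong), never both.
    return [1 if chosen == correct else 0
            for chosen, correct in zip(responses, key)]

def item_facility(score_matrix):
    # Proportion of test-takers who answered each item correctly.
    n = len(score_matrix)
    k = len(score_matrix[0])
    return [sum(row[i] for row in score_matrix) / n for i in range(k)]

key = "BADC"                          # invented 4-item answer key
responses = ["BADC", "BACC", "CADC"]  # three test-takers' choices
matrix = [score_dichotomous(r, key) for r in responses]
facility = item_facility(matrix)
```

Tracking facility values over successive administrations is one simple way to notice when an item has become so familiar that learners may be cramming its answer rather than engaging with the underlying concept.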
Firstly, the test designer has to take into consideration the characteristics of the input used for testing the learners. This is because a number of factors can influence the comprehension difficulty and cognitive load of the texts used to test the learner. As such, the tester has to be fully aware of the grammatical complexity and structure of the discourse as they set out to test the learner's proficiency using passages. The rate and rhythm of speech, accent and pitch are among the factors most likely to affect the learner's ability to read and understand the passage. At the same time, there has to be an evaluation, to a certain degree, of the construct to be measured. This means that a passage has to contain appropriate features for a particular context.
According to Bailey (2007), the tester has to ensure that skills and knowledge unrelated to the target construct do not affect the outcome. At the same time, they have to avoid a situation where extraneous skills and knowledge in multiple-choice questions influence the learner's performance. As such, the developers of English proficiency tests have to pay attention to what they wish to test and be able to compare the results to the objective of the tests. It would be of no importance to have a test that evaluates the learner's grammar competency by using a test that is beyond their level. This means that the options the learner has been given to choose from have to be written at the proficiency level the learner is currently in. On the same issue, Leaver and Shekhtman (2002) say that testers have to consider the test-taker's reading and comprehension abilities, as these are the basic requirements for answering such tests. In the test analysed in this discourse, it is advisable that the tester provides adequate instructions which are not complex. It has to be kept in mind that the main challenge should be in the questions, not the prompts. This is a way of minimizing the probability of the learner giving wrong answers just because they could not understand what was required of them.
While standardized English proficiency tests are administered to evaluate proficiency in the English language, it has to be considered that the effectiveness of a test relies on a number of things. That is why it is important to test the validity and reliability of the tests, to ensure that they deliver on their objectives. A number of researchers and scholars have come up with ways of evaluating the tests and, at the same time, developed tools for doing so. While using the tools and other guidelines, various factors are assessed to note any shortcomings and to come up with recommendations for improving the structure and content of the tests. The evaluators have to keep in mind that fairness and objectivity are the core elements of administering language tests and, as such, have to revise their designs and improve the tests over time.
Bailey, K. M. (2007). Practical English language teaching: Speaking. Higher Education Press.
Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (Eds.). (2011). Building a validity argument for the Test of English as a Foreign Language. Routledge.
Dalrymple, M. (2001). Lexical-Functional Grammar. John Wiley & Sons, Ltd.
Gardner, H. (2000). The disciplined mind: Beyond facts and standardized tests, the K-12 education that every child deserves. New York: Penguin Books.
Geva, E. (2006). Second-language oral proficiency and second-language literacy. In Developing literacy in second-language learners: Report of the National Literacy Panel on Language-Minority Children and Youth (pp. 123-140).
Goldenberg, C. (2008). Teaching English language learners. American Educator.
Hawkey, R. (2006). Impact theory and practice. Studies in Language Testing, 24.
Hughes, A. (2003). Testing for language teachers. Cambridge University Press.
Leaver, B. L., & Shekhtman, B. (Eds.). (2002). Developing professional-level language proficiency. Cambridge University Press.
Llosa, L. (2007). Validating a standards-based classroom assessment of English proficiency: A multitrait-multimethod approach. Language Testing, 24(4), 489-515.
Lowenberg, P. H. (2002). Assessing English proficiency in the expanding circle. World Englishes, 21(3), 431-435.
Orlich, D., Harder, R., Callahan, R., Trevisan, M., & Brown, A. (2012). Teaching strategies: A guide to effective instruction. Cengage Learning.
Richards, J. C., & Renandya, W. A. (Eds.). (2002). Methodology in language teaching: An anthology of current practice. Cambridge University Press.
Young, J. W. (2013). Guidelines for best test development practices to ensure validity and fairness for international English language assessments. Educational Testing Service.
Yuan, B., & Dugarova, E. (2012). Wh-topicalization at the syntax-discourse interface in English speakers' L2 Chinese grammars. Studies in Second Language Acquisition, 34(4), 533-560.