Pınar Gündüz & Berna Akpınar Arslan

The 9th Forum on Assessment Issues (FOAI) was hosted by the Social Sciences University of Ankara on 3-4 November 2017. The event brought together over 30 institutions to discuss “Designing Speaking Rubrics”. We would like to thank our hosts for making possible fruitful discussions and an exchange of ideas that will help us improve practice in our institutions.

The one-and-a-half-day programme opened with a keynote talk on speaking rubrics by Dr. Keith Hoodless. He emphasized that speaking is a very demanding skill. To illustrate this, he noted that native speakers speak at a rate of 2.5 words per second, which can rise to 5 words per second when the speaker is anxious. He added that an educated speaker might have a vocabulary of 30,000 words. These figures show how challenging it is for a learner of English to reach native-like competence.

Dr. Hoodless stated that the main difficulty in assessing speaking is its subjectivity: if rubrics are not well designed, every assessor may interpret them differently. He emphasized that speaking is the most challenging skill to test because of its multifaceted nature, drawing on fields such as anthropology, communication and linguistics. Other challenges include interference from listening comprehension, the design of the task that elicits the student output, and the scoring procedures.

He argued that the challenges of assessing speaking can only be overcome with a clear set of criteria. The criteria should be designed to be free of bias related to students’ personal characteristics, knowledge of the world or affective schemata. Context is crucial to writing effective rubrics, since both the content of the rubric and the design of the test depend on it.

Clear rubrics are crucial for effective scoring of oral performance for many reasons, including:

•providing students with clear expectations

•giving concrete details about how to obtain a particular score

•defining what quality entails

•being quick, objective and efficient

•making justification of scores less tedious for teachers

•providing students with detailed and timely feedback

•becoming more objective

•helping refine teaching skills/learning activities

Depending on institutional needs, criteria can be holistic or analytic, and each type has its advantages and drawbacks. Analytic criteria break down what educators want their students to achieve into discrete competencies. Their main advantage over holistic criteria is that each dimension of performance is evaluated separately, which allows students to be assessed more fairly. Analytic criteria also serve a diagnostic purpose, because both the teacher and the student can see which areas need further work. For the same reason, they make it easier to connect assessment with instruction, producing a positive washback effect. However, they take longer to score than holistic criteria, and inter-rater reliability is harder to achieve. Holistic criteria, on the other hand, allow all dimensions of student performance to be evaluated at once, which makes them especially useful for summative assessment. Scoring is faster with holistic criteria, and because there is a narrower range of bands, inter-rater reliability is usually higher.

Dr. Hoodless also discussed why some teachers or institutions may still opt for ready-made rubrics rather than developing their own. Ideally, however, they should design their own set of criteria, because creating their own rubric allows teachers to:

•make adjustments to cater for the needs of a specific group of learners

•benchmark and monitor student progress

•be clear about the aims of the course and what is expected of the students

•eliminate certain kinds of bias/subjectivity

Dr. Hoodless concluded his talk by observing that, in his experience, rubrics are often poorly handled. Another problem he commonly encounters is that many criteria are list-like and focus on trivial details.

The second session of the day was a workshop given by Dr. Reza Vahdani from the Social Sciences University of Ankara. Dr. Vahdani’s workshop focused on key terminology associated with testing speaking and designing rubrics. Participants were asked to define key concepts and exchanged examples of how these concepts might affect the evaluation of spoken performance.

Dr. Vahdani then turned to rubrics themselves and why they should be used when evaluating performance. He summarised the main reasons as:

•improving consistency and efficiency

•providing feedback and supporting teaching and learning

•generating a record of student achievement

•communicating with others outside the classroom

Following the discussion on the use of rubrics, the second activity for participants focused on the rubric design process. Dr. Vahdani emphasized that, for this, we should focus on the operational meaning of speaking in our own contexts, in other words, on what we expect our students to be able to do. Because the design process is challenging, he recommended involving more people in it.

The participants were asked to identify the main stages of rubric design and implementation and the steps to be followed at each stage. Dr. Vahdani summarised the main stages as: planning; designing the rubric; planning the assessment procedures and using the rubric; evaluating the reliability or fairness of the rubric; evaluating the quality of the rubric; and planning feedback and revision to obtain pedagogically useful ratings.

Dr. Vahdani then focused on evaluating validity. Key concerns in this evaluation include the appropriacy of the rubric for the purpose of assessment, specificity, the clarity of the descriptors, the link between what is assessed and that purpose, and whether the rubric assesses the knowledge and skills targeted by the assessment.

To evaluate the reliability of the assessment, various statistics can help track rater performance, such as Cronbach’s alpha, the Spearman-Brown prophecy formula and the standard error of measurement (SEM).
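The three statistics named above can be illustrated with a short Python sketch. It assumes a simple design in which each rater is treated as an "item" when computing Cronbach’s alpha, and that the reported score is the mean of the raters’ scores. The scores are invented for illustration, and the helper names (`cronbach_alpha`, `spearman_brown`, `sem`) are hypothetical, not taken from any particular package or from the workshop itself.

```python
import math
from statistics import stdev, variance


def cronbach_alpha(scores_by_rater: list[list[float]]) -> float:
    """Cronbach's alpha, treating each rater as an 'item'.

    scores_by_rater[r][c] is the score rater r gave candidate c.
    """
    k = len(scores_by_rater)
    if k < 2:
        raise ValueError("need at least two raters")
    # Sum of each rater's score variance across candidates
    sum_rater_variance = sum(variance(scores) for scores in scores_by_rater)
    # Variance of each candidate's total score summed over all raters
    totals = [sum(candidate) for candidate in zip(*scores_by_rater)]
    return (k / (k - 1)) * (1 - sum_rater_variance / variance(totals))


def spearman_brown(reliability: float, factor: float) -> float:
    """Predicted reliability if the number of raters is multiplied by `factor`."""
    return factor * reliability / (1 + (factor - 1) * reliability)


def sem(scores: list[float], reliability: float) -> float:
    """Standard error of measurement: SD of the reported scores * sqrt(1 - reliability)."""
    return stdev(scores) * math.sqrt(1 - reliability)


if __name__ == "__main__":
    # Invented scores: two raters, five candidates, 0-20 scale
    rater_a = [12, 15, 9, 18, 14]
    rater_b = [13, 14, 10, 17, 15]
    alpha = cronbach_alpha([rater_a, rater_b])
    # Reported score = mean of the two ratings
    reported = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
    print(f"alpha = {alpha:.3f}")                                            # 0.966
    print(f"predicted with three raters = {spearman_brown(alpha, 1.5):.3f}")  # 0.977
    print(f"SEM of reported score = {sem(reported, alpha):.2f}")              # 0.55
```

A `factor` below 1 works the same way: `spearman_brown(alpha, 0.5)` estimates the reliability of a single rater from a two-rater alpha. The SEM is on the scale of the reported score, so in this invented example a candidate’s averaged score of 14.5 is best read as roughly 14.5 ± 0.55 (one SEM).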

Next, the evaluation of usability was discussed. Dr. Vahdani touched on a few usability concerns, mainly the number of criteria, the number of levels, the language of the descriptors and the layout.

After Dr. Vahdani’s workshop, the participants took part in focus group discussions centred on the questions below:

GROUP A: Design of Rubrics

·Does CEFR play an important role in your speaking rubric design? If yes, how?

·Should rubrics for external exams be used as a model?

·Who should be involved in rubric design?

·Which type of rubric has more advantages: holistic or analytic?

·There is a myth that analytic rubrics are much stricter than holistic ones. Is this correct, or is it the other way round?

·What discrete elements/bands should be included in speaking rubrics?

·How detailed should the descriptors be?


GROUP B: Levels and Assessment Types

·What are the advantages of having different rubrics for each level?

·What are the advantages of having one type of rubric catering for all levels?

·Should the rubrics used for achievement (level exams) and proficiency exams be different?

·Should the rubrics be based on task type (e.g. pair or individual task, presentation, debate)?

·How do task types affect the content of rubrics? What descriptors would be required for specific task types?

·Should rubrics be independent of task types? If yes, what should our rubrics be based on?


GROUP C: Implementation of Rubrics

·What are some ways to monitor the effectiveness of speaking rubrics and ensure that speaking objectives are assessed in a reliable way?

·How can we ensure the success of norming sessions held as part of instructor training?

·Should students be familiarized with the rubrics used by assessors before the assessment? If yes, how?

·How should scoring be done to ensure reliability and validity of assessment?

·Should each band have the same weighting? Should points awarded for each band be equal?

On the second day, each focus group presented its discussion to the whole group. The highlights can be summarised as follows:

•CEFR is taken as a basis for developing syllabi and rubrics in almost all participant institutions.

•External exam rubrics can be used as models and resources because of their reliability, validity and practicality; however, they should not simply be replicated.

•Although various stakeholders, including teachers, students and even coursebook writers, could be involved in rubric design, the parties most commonly involved are experienced teachers, assessors and coordinators or managers. Students do not seem to be involved in rubric design, but student feedback, gathered directly or indirectly, is used to evaluate the effectiveness of rubrics.

•The choice between analytic and holistic rubrics depends on several variables, including the nature of the test, the task types and the objectives.

•Analytic rubrics may be preferred for any exam where feedback to students is required, whereas holistic rubrics may make assessors’ lives easier.

•Any rubric could be strict depending on how it is used. 

•Institutions use different bands, but the most common include the following keywords: task fulfilment, coherence, fluency, accuracy, range, grammar, vocabulary, interaction.

•Descriptors should be both detailed and concise, and they could be performance-driven or theory-driven, reflecting “delivery, organization, content, use of language, pronunciation and discourse, textual knowledge, functional knowledge, pragmatics, fluency, lexis, interaction, consistency, developing ideas, justifying opinions, task achievement”.

•Weighting needs to be determined considering institutional/program/level expectations.

•Having one rubric for all levels is preferable for practicality. However, separate rubrics are considered better for formative evaluation, give more meaningful feedback to both students and teachers, and are important for validity.

•Training both students and teachers is essential for a more reliable speaking assessment.

•Standardisation needs to be facilitated and carried out as effectively as possible to ensure higher standards in speaking assessment. 

•Assessors need to be carefully chosen and paired, taking into account their experience and familiarity with the assessment.

•Assessors and interlocutors could swap roles and/or assess together to increase reliability.

•Scoring needs to be monitored by assigning more than one assessor and randomly checking grades.

•Exam administration needs careful planning that takes into account the possible need to re-evaluate performances, especially if the institution has a petition policy; any recording of performances therefore needs to be planned well.
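The weighting question raised for Group C, and the point above that weighting depends on institutional expectations, can be made concrete with a small Python sketch. The five criteria are taken from the common band keywords listed above, but the weights, the 0-5 band scale, the candidate’s band scores and the helper name `weighted_score` are hypothetical assumptions, not any participating institution’s scheme. The sketch only shows that the same band scores can produce different totals under different weightings.

```python
# Hypothetical analytic rubric: criterion names follow the common band keywords,
# but the weights and the 0-5 band scale are invented for illustration.
MAX_BAND = 5

UNEQUAL_WEIGHTS = {
    "task fulfilment": 0.30,
    "fluency": 0.20,
    "accuracy": 0.20,
    "range": 0.15,
    "interaction": 0.15,
}
# Same criteria, each given an equal share of the total
EQUAL_WEIGHTS = {criterion: 1 / len(UNEQUAL_WEIGHTS) for criterion in UNEQUAL_WEIGHTS}


def weighted_score(bands: dict[str, int], weights: dict[str, float], max_band: int = MAX_BAND) -> float:
    """Convert per-criterion band scores into a percentage using weights that sum to 1."""
    if set(bands) != set(weights):
        raise ValueError("band scores and weights must cover the same criteria")
    if abs(sum(weights.values()) - 1) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Each criterion contributes its weight times the proportion of the maximum band achieved
    return 100 * sum(weights[c] * bands[c] / max_band for c in weights)


if __name__ == "__main__":
    # One hypothetical candidate's band scores
    candidate = {"task fulfilment": 4, "fluency": 3, "accuracy": 3, "range": 4, "interaction": 5}
    print(f"unequal weights: {weighted_score(candidate, UNEQUAL_WEIGHTS):.1f}")  # 75.0
    print(f"equal weights:   {weighted_score(candidate, EQUAL_WEIGHTS):.1f}")    # 76.0
```

Here the candidate’s strongest criterion (interaction) carries less weight under the unequal scheme, so the total drops slightly; with a different profile of band scores the effect could go the other way.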

Given the nature of speaking and the many difficulties involved in assessing it, the assessment process should be organised as meticulously as possible, and the rubrics, the backbone of the whole process, need to be designed and implemented with care. FOAI-9 gave all participants common ground to share what they know, what they do and what they overlook in relation to speaking rubrics, and everyone left with a warm smile, reflecting the satisfaction of this shared experience.