Welcome to the professional learning module introducing assessment. We hope you find it helpful and informative, and send us any feedback you might have. For the full list of the modules and explanation of how they work, please see here.
This course will outline some basic assessment theory and is the first of a number of modules looking at assessment. During the course we will ask questions to guide and frame your thinking. At the end of the course we will ask you to complete a short written task, so you may want to take notes as you progress. You can do this using our virtual notepad here, which also gives you the option of receiving feedback on your notes.
1. What are some of the purposes of assessment?
There are many reasons we assess. Throughout the many threads in the #cogscisci email group, the range of these reasons has become apparent. Some teachers assess in order to identify students for specific interventions; some assess in order to (try to) predict the spread of attainment when their students finish GCSEs. These are all valuable reasons, but unless teachers are clear on their reasoning, the assessment they design might not be suited to its purpose. Before we start, here are some questions to get you thinking:
- Think back to last week. Where in your practice did you assess? Try to be as specific as you can.
- For each of the examples you remember, what was your purpose behind the assessment activity?
Read this blog by Evidence Based Education. Here are some questions to frame your thinking about the blog:
- How far do you agree with the statement that “no assessment is 100% accurate”?
- Does it make sense to say the answer to the assessment “What is your favourite colour?” has some inaccuracy?
- Focus on one set of assessments done at your school (you might want to focus on January mocks, for example): what purpose do these assessments serve for people in various roles in the school? The MAT lead? Headteacher? Assistant Head for Data? Head of Department? Fellow colleagues? Yourself?
- What does the word “construct” mean in the psychological literature?
- Think ahead to a point in time where you will formally assess. What is the construct you’d like to assess? Do you think this construct is useful to you? Is it useful to your fellow colleagues?
Read the paper by Archer. Here are some questions to frame your thinking about the paper:
- Do you agree with Archer’s three basic purposes for assessment? Why? Why not?
- Archer (along with Paul Newton at the top of page 2) argues that classifying assessment along the lines of summative/formative “leaves something to be desired”. How far do you agree with Archer? How useful (or not) has this distinction been so far in your career?
- Archer then lists Newton’s and Schildkamp and Kuiper’s interrogation of assessment purposes. Is this framework more useful to you?
- In the section “Assessment to Support Learning”, what dangers does the triangle metaphor imply?
2. What is meant by “validity”?
Validity is often referred to as the central concept of assessment. The history of validity thought stretches back to the birth of large-scale testing 120 years ago. Because validity is such a large and often misunderstood topic, each reading is paired with an accessible introduction. Read section 2 of this blog for a history of validity thought, as well as this blog by Evidence Based Education to clarify some of the ideas.
Whilst reading, here are some questions to frame your thinking:
- Why doesn’t it make sense to talk about validity as a property of the test?
- What are some construct-irrelevant factors when your students sit a science mock?
- What does construct under-representation mean?
- Think back to the last formal assessment that you administered. Was there any construct under-representation?
- Think back to the last formal assessment that you administered. Was there anything that introduced some construct-irrelevant variance?
- Summarise, from the two blogs, a definition for validity that you could give to a trainee.
Sometimes a concept is made clearer when you’re told what not to do! This article introduces a paper by Crooks, Kane and Cohen that explores some of the threats to validity claims. After you have read the article, read the eight-page section titled “Threats Associated with Each Link”, from the bottom of page 269 to the top of page 279. The questions that follow will help guide your reading of the paper.
- Why is a chain used as a metaphor to hold the approach of Crooks, Kane and Cohen together?
- Read the paragraph that starts off with “The importance of the eight links…” on page 269. Could you come up with another threat for any of the links before you read the rest of the paper?
- Do you agree with Crooks, Kane and Cohen that “for a classroom-based assessment intended solely for diagnostic and formative purposes, the aggregation, generalization and extrapolation links may be somewhat less important than other links” (page 269)?
- Think back to the last formal assessment that you administered. Was there a weak link in the chain? What stage was the weakest?
3. What is meant by “reliability”?
Reliability is probably best thought about as “freedom from uncertainty”. If we have highly reliable assessments, we are likely to get the same results if we apply that assessment to the same students under similar conditions. Whilst validity can be quite discursive, reliability can get quite technical fairly fast.
One of the biggest mistakes newcomers make is confusing “reliability” with “dependability”. Reliability is a technical term with the restricted meaning above; “dependability” is a broader judgement about the quality of an assessment as well as the certainty it gives us.
This blog by Evidence Based Education is a good introduction to reliability. Here are some questions to guide your thinking:
- Reading the “sources of error” section of the blog, can you think back to your own practice and identify real situations where some of these sources of error were present?
- What is the difference between inter-rater and intra-rater reliability?
- What assessment scenarios have you been part of in school that involved an attention to inter-rater reliability? Intra-rater reliability?
- How often have you implemented the suggestions in the section “improving rater reliability”?
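Inter-rater agreement is often quantified with Cohen’s kappa, which corrects raw percentage agreement for the agreement two markers would reach by chance alone. As a purely illustrative sketch (the markers and grades below are invented, and the blog itself does not require any computation):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(ratings_a)
    # Observed agreement: proportion of items the raters grade identically.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: expected overlap given each rater's grade frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical markers grading the same ten scripts A-C.
marker_1 = ["A", "A", "B", "B", "B", "C", "C", "A", "B", "C"]
marker_2 = ["A", "B", "B", "B", "C", "C", "C", "A", "B", "C"]

print(f"kappa = {cohens_kappa(marker_1, marker_2):.2f}")  # about 0.70
```

Here the markers agree on 8 of 10 scripts (80%), but because some of that agreement would happen by chance, kappa comes out lower, at about 0.70.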
The Institute of Education ran a study on how reliably teachers assess in classrooms. Read pages 4 to 7, starting with “In-depth review” two-thirds of the way down page 4. Here are some questions to guide your thinking:
- What could the main conclusions from the first six bullet points be?
- How do these conclusions differ from your own personal experience?
- From the next section reporting the validity of teacher judgement, what could be the main conclusions from the report?
- What were the main findings from the conditions that affected the reliability and validity of teacher summative judgement?
- How can teacher judgement be made more reliable using the findings from this report?
- Did any of the findings surprise you? Which ones and why?
4. Bringing it together
We don’t think this is the be-all and end-all of assessment. Assessment is much more than what we have presented: it is the systematic enterprise of finding out what our students know, and so there is much we haven’t covered. We haven’t covered motivation, ethics, comparability, the estimation of reliability, modern test theories … the list goes on.
In future modules, we will look at how to apply some of this thinking to the classroom. Adam Boxer has brought together these principles in his blog “What to do after a mock?”. Here are some questions to guide your thinking:
- How has Adam used the concept of validity to guide his practice?
- To what extent have your question-level analyses run into the problems Adam describes?
- What are your top 3 takeaways from this blog?
Our last activity is a YouTube video of Dylan Wiliam’s webinar “There is no such thing as a valid test”. We hope that, by re-covering some of the ground from sessions 1 to 7, you come away with a clearer picture of what good assessment is.
5. End of module task
As with all modules, there is a task at the end. The task serves not only to consolidate the work you have done throughout the module, but also as a way for other teachers to see how you have applied what you have learnt.
Please choose a task from the list below and send it to firstname.lastname@example.org:
- Write a short article about what you have learned throughout this module about assessment. You may want to focus on a few key ideas.
- Prepare a keyword glossary of key assessment terms for your colleagues based on this module.
- Write a reflective article about your previous practice, where you may have changed your mind in light of what you have learned, and what you intend to do in the future.
We would love to publish anything you produce, but will obviously not do so without your permission. If you are happy for us to publish it on CogSciSci please let us know in the email.
If you have any feedback for us, you can submit it anonymously here.