Weighing the pig AND making it fatter


Lesson observations don’t work.  So, how do we judge the quality of teaching?

Blue sky thinking and other clichés

In a previous life, when I travelled up and down the country delivering training courses to teachers, I’d often start the day with a spot of ‘blue sky thinking’. After I’d apologised for using the term ‘blue sky thinking’, I’d ask colleagues to imagine a world in which there was no Secretary of State for Education, no Ofsted, no SLT, and no performance management. I liked this activity because, frankly, it didn’t matter what else I said or did: I knew my colleagues would skip home smiling from ear to ear, imagining a post-Gove world of possibility.

The purpose of the exercise, though, was to try to ascertain what, beneath this clear blue sky, colleagues would do differently. And, surprisingly, colleagues didn’t say they’d go to the pub whilst their students sat comatose in front of a video. Instead, they said they’d teach lessons redolent of Dead Poets Society.

O captain, my captain.

Amid these excited discussions, I’d pose the seemingly facetious question: What’s stopping you?

My question was only ‘seemingly’ facetious because I genuinely wanted to understand why colleagues didn’t teach in the way they wanted to every day. What was holding them back? What was sapping their passion and drive?

Fear, that’s what.

Fear that Ofsted wouldn’t like it. Fear that school leaders wouldn’t like it. Fear that it wouldn’t tick all the boxes on their school’s lesson planning pro-forma.

In short, colleagues didn’t teach the way they wanted to for fear their lessons wouldn’t conform to a prescribed structure and would therefore be judged inadequate.


Let me make clear I’m not averse to accountability. Far from it, in fact. I believe teachers should be held very firmly to account for their performance because teaching matters. If you under-perform in teaching you are guilty of misusing public funds. Moreover, if you under-perform in teaching you can inflict lasting damage on young people’s lives.

Though it may not always feel like it, teaching is a privilege and an honour. There are far easier ways of making money and striking a work-life balance. If you teach, it should be because you care deeply about improving people’s life chances. So, yes, teaching matters and teachers should be held to account in order to ensure they do it well.

When it comes to managing people, my philosophy is simple: I don’t believe anyone wakes up in the morning intent on doing the worst job they possibly can; no one opens their eyes, stretches and yawns, then looks themselves in the mirror and vows to fail as many young people as possible that day.

It is no one’s vocation to fail.

But, despite the best of intentions, some people sometimes don’t perform as well as we’d like.

When people under-perform, they need to be given time and support – and this includes training – in order to improve. Many will. But those who don’t improve need to leave the profession, either willingly or otherwise. Retaining people who cannot perform the duties for which they are paid serves no one well, least of all our learners.


So accountability – when managed fairly and accurately, honestly and transparently – is a good thing. It ensures the best people do the best jobs; it ensures the teaching profession – and our next generation – is kept safe.

But our current system of accountability is broken. It doesn’t accurately and reliably measure either the quality of teaching or the effectiveness of our teachers. What it does – as described above – is create a climate of fear; it straitjackets teachers.

And this fear is killing creative teaching.

I hear it first-hand whenever I engage colleagues in a spot of ‘blue sky thinking’. When I ask them to imagine a brave new world in which our current systems of accountability are erased, they grow visibly taller, happier and more excited about teaching. And they speak passionately about how they’ll challenge and engage their learners, how they’ll ignite sparks, foster curiosity, and develop – oh no, here’s another cliché – ‘the whole child’.

The fear that surrounds our current system of accountability is preventing teachers from taking risks, from trying out new approaches in the classroom, and from providing learners with a varied diet of activities. Fear is preventing teachers from – oh no, here’s the hat-trick of cringe-worthy clichés – ‘thinking outside the box’.  Ultimately, fear is preventing teachers from being the best they can be.

And this is not a criticism of senior leaders or even Ofsted. I believe many of us work with the systems we have and do the best job we can. We try to be fair. And the system does have its advantages: it allows inspectors to verify the judgments made by school and college leaders; it enables the collection of data which may be used as part of a school’s self-evaluation. It can – when done well (as I attempt to describe here) – contribute towards performance management. But the fact remains that, if our purpose is to improve the quality of teaching, the current system just isn’t fit for purpose.

The current system of accountability is also forcing us to do the exact opposite of what we preach is right for our students. We say that we promote a culture of risk-taking in our classrooms. We say that we promote a growth mindset (see here and here) in which making mistakes is not only accepted but is positively encouraged because doing so leads to learning. We say that formative assessment – whereby we focus on what students need to do in order to improve – is more helpful than summative assessment – whereby we tell students what they’ve already achieved.

And yet we have a system of accountability that closes down risk-taking, discourages growth mindset thinking, and summatively assesses performance without explicitly pointing the way towards improvements.

And when it comes to accountability, teachers’ greatest fear of all is failing the big test: the high stakes, formal, graded lesson observation. And it’s this I wish to take the sword to now…


Lesson observations are dead…

As a result of the traditional lesson observation model, teachers tend to do one of two things:

  1. They over-plan, over-teach and proffer a showcase lesson which bears no relation to their everyday practice.  Sometimes, this ‘showcase’ lesson is better than their ‘normal’ lessons because an incredible amount of time and effort has been invested in its conception.  Sometimes, however, this ‘showcase’ lesson is not as good as their ‘normal’ lessons because, having dedicated so much time to the planning and preparation of it, teachers are less willing to deviate from their intricate plans and so do not take account of students’ ‘live’ learning (what is happening right in front of them).  Incidentally, writing in the commentary to Ofsted’s annual report in December 2013, Sir Michael Wilshaw once again attacked some schools for insisting that their staff produce detailed lesson plans.  He said that “overly detailed lesson plans… stifle initiative and flair” and “mistake the process of education with the purpose”. He’s often said that Ofsted does not want to see showcase lessons, performances put on for inspectors’ benefit.
  2. They become stressed by the experience of being watched and so under-perform.  They are nervous and stilted; pressured and pained. They tune out of the classroom dynamics – that sixth sense which tells them when students need the pace to slow or quicken and when the work is inappropriately pitched.  They try to spin too many plates all at once and, far from providing an entertaining circus act, it all starts to resemble a Greek wedding. The stress squeezes all the fun out of their teaching and all the fun out of students’ learning so that there is no rapport between student and teacher and learning begins to feel like a chore.

In short, formal, graded observations do not allow observers to observe the teacher as they would normally teach.  But even if the teacher is brave enough to teach a ‘normal’ lesson and does not succumb to the natural stress of observation, the very presence of an observer in the room – particularly an inspector with a clipboard – inevitably alters the classroom dynamics. It’s what’s called the Hawthorne Effect.

I see it every day when I walk into lessons, labs and workshops. I can sense the focus of the lesson shift as the circus rolls into town and the clown wheels across the room on a unicycle, juggling fire. That’s not a criticism of colleagues; it’s a criticism of a system which breeds fear.

Colleagues assume I want to see them entertaining their students whereas I actually want to see students working, whatever that may look like. If students are working in silence reading or writing, so be it. If students are in focused discussions whilst the teacher is marking, so be it. I really don’t care so long as everyone is safe and focused. I will quickly ascertain from speaking to people whether or not students are challenged, engaged and making progress over time. I will quickly work out how the current lesson fits into the wider context and whether or not it is providing opportunities for students to fill gaps in their learning and move closer towards their targets.

Sometimes that might look entertaining; other times it will not. No matter.


And that’s not all. Formal, graded observations are also ineffective because

(a) it is not technically possible to grade a single lesson, and

(b) it is not possible to observe real learning.

Let me explain…

Ofsted makes clear that the grade descriptors it uses to judge the quality of teaching describe the quality of teaching that occurs in a school or college as a whole and not that delivered by an individual teacher. It is explicit that its grade descriptors are not designed to judge individual lessons and should only be used alongside a range of other evidence including data which shows students’ progress over time. Applying the grade descriptors in order to judge a lesson as ‘good’ or ‘outstanding’, therefore, is to misinterpret and misapply the criteria. I’m not suggesting it doesn’t happen – including by inspectors – but that it technically shouldn’t. I know some inspectors give teachers grades because teachers are conditioned to expect it and want to be put out of their misery! I don’t blame them. I’m guilty of doing it, too. Colleagues nearly always ask me what grade I’d give their lesson. I usually prevaricate and explain it’s not technically possible. I explain that what I saw was only a snapshot and needs to be used alongside a range of other data. But, in the end, having offered formative feedback, I give in. I grade it and I feel dirty as a result. But that’s the system.

Professor Robert Coe of the University of Durham argues that we can’t observe real learning anyway. Learning is not always visible and so we mistake poor proxies for learning. We see students engaged in discussions or listening attentively to the teacher and we assume that this means they’re learning. But how do we really know? Learning is a complicated process (as Dylan Wiliam says, “learning isn’t rocket science, it’s much more complex than that”) which takes place over time and is the result of a series of, well, complex cognitive processes, as I attempt to explain here. And what is learning anyway? Surely it is – at least in part – the ability to retain and recall information at a later date? How can we possibly observe this in twenty minutes to an hour? By observing a lesson, we can see the information as it goes in but we’d need to see it as it comes out, too, in order to be sure it has been learnt. (I’m sure there’s a metaphor there somewhere.)


Quite apart from the misapplication of grade descriptors and the observer’s tendency to mistake poor proxies for learning, observations are unreliable because there is always going to be an element of human error. No matter how explicit Sir Michael Wilshaw and his official inspection documentation are about the fact that Ofsted does not have a preferred style of teaching, inspectors and indeed all other observers will project their preferences on to what they see because it is human nature to do so – and it is often done subconsciously. Put crudely, if you’re a didactic teacher, you’re likely to favour didactic teaching in others. If the teacher isn’t teaching in the way you would, you are less likely to think favourably of that teaching. This is particularly true when observing your own subject specialism.

Coe argues that observers have strong emotional responses to particular behaviours and styles, and that these responses are hard to overrule. This is partly why observers rarely concur with each other’s judgments. Every observer is looking for something different and is responding emotionally in a different way to what they see. Coe quotes statistics which show that, when one observer judges a lesson to be outstanding, there is between a 51% and a 78% chance that a second observer will disagree. If the lesson is judged to be inadequate then there’s a 90% chance a second observer will disagree.

What’s more, there’s little evidence to suggest that the outcomes of observations correlate with the other sources of data which may be used to judge the quality of teaching. For example, and again to quote Coe, there’s a 96% probability that a lesson judged outstanding will not have matching value added data. In other words, the lesson may be outstanding but the learning most certainly is not because learners are not making better than expected progress relative to their starting points. For a lesson judged inadequate, the probability that the value added data will contradict it is greater than 99%.

Dylan Wiliam says that, in order for a lesson observation judgment to achieve a reliability of 0.9, a teacher would have to be observed teaching at least 6 different classes by at least 5 independent observers.
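To get a feel for what a figure like that implies, here is a rough sketch. It rests on two assumptions of mine, not Wiliam’s: that the 5 observers and 6 classes can be treated as 30 interchangeable observations, and that reliability grows with the number of observations according to the standard Spearman-Brown formula. Wiliam’s own figure is unlikely to have been derived this simply, and the function names below are purely illustrative.

```python
def spearman_brown(single_reliability: float, n_observations: int) -> float:
    """Reliability of the average of n interchangeable observations
    (Spearman-Brown formula; an assumed model, not Wiliam's method)."""
    r = single_reliability
    return n_observations * r / (1 + (n_observations - 1) * r)


def implied_single_reliability(target: float, n_observations: int) -> float:
    """Invert Spearman-Brown: the single-observation reliability that
    would reach `target` after n observations."""
    return target / (n_observations - target * (n_observations - 1))


# Assumption: 5 observers x 6 classes counts as 30 equivalent observations.
n_total = 5 * 6
r_single = implied_single_reliability(0.9, n_total)
print(f"Implied reliability of one observation: {r_single:.2f}")

# How reliability would build up as observations accumulate.
for k in (1, 5, 10, 30):
    print(f"{k:>2} observations -> reliability {spearman_brown(r_single, k):.2f}")
```

Under those assumptions, a single observation would have a reliability of only about 0.23, which is one way of seeing why a one-off graded observation is such a shaky basis for judging a teacher.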

I hope I’ve made a convincing case against using only formal, graded observations as a means of judging the quality of individual teachers and indeed the quality of teaching across an organisation… but…


Long live lesson observations

I’m not suggesting we should stop observing lessons altogether. In fact, I think walking into lessons to see what’s happening is important. By observing the classroom environment, for example, we can make judgments about the rapport the teacher has established with students, we can make judgments about how well the teacher manages behaviour and uses resources, and we can make judgments about the ways in which students are grouped. Lesson observations also allow us to see the ways in which transitions are handled and learning is organised.

Yes, observations have their place. Informal, ungraded learning walks allow us to observe some important aspects of teaching but observations alone do not enable us to accurately judge the extent to which learning is taking place. For that we need to triangulate what we see and hear in classrooms, labs and workshops with other sources of information, not least our – much maligned but absolutely vital – professional judgment.

In other words, we should measure the quality of teaching in a holistic not isolated way.


Winners are losers

I think most of us now accept that grading students’ work can be harmful in that it leads to what Carol Dweck calls the ‘comparison effect’ and doesn’t necessarily show the progress they’ve made. Grading work can lead high attainers to grow complacent and low attainers to grow despondent. Moreover, grading work encourages students to focus on the mark they’ve just got not on what they need to do in order to make further progress. You can read more about this here.

We now accept that formative assessment is what works best for our students because it provides them with clear direction and focus; it concentrates learners’ minds on what they need to do next in order to improve.

Surely, what’s good for the goose is also good for the gander… surely, we teachers must practise what we preach. We too must start assessing the quality of our teaching in a formative not summative way. And that means abandoning graded observations and only engaging in developmental observations aimed at helping colleagues to improve and refine their teaching. After all, in order to win you have to embrace failure. Or, as Michael Jordan famously said, ‘I’ve failed over and over and over again in my life. And that is why I succeed.’

If we are to practise what we preach to our learners, then we too must embrace the growth mindset. We too must be willing to take risks, to make mistakes, and to learn from failure. We too must be willing to work with others, to seek feedback and challenge, and to strive to get better and better.



In conclusion, if we are to improve the quality of teaching in our schools and colleges then our evaluative system must be:

– formative: in order to improve the quality of teaching, and

– holistic: in order to accurately and fairly judge the quality of teaching.

Here’s a quick summary…

Formative: observations of teaching, like the best forms of student assessment, should be formative rather than summative.  They should identify what a teacher needs to do in order to improve rather than simply report their level of current practice against arbitrary criteria.  In order to be formative, observations should be conducted without fear or favour.  They should be led by the observed teacher rather than the observer and be focused on a particular aspect of their teaching at any one time. Peer observations and learning walks are a great way to do this. Lesson study is also a fantastic way for teachers to work together to improve their teaching in a focused, formative way.

Holistic: observations of teaching such as those described above should form only a part of the overall judgment of a practitioner’s effectiveness and, indeed, of a school’s or college’s effectiveness. Observations – a combination of learning walks, peer observations and lesson studies, not formal graded observations – should be triangulated with other sources of data such as work scrutiny (an evaluation of students’ work, assessment records, and planning information), and attendance, retention, success, and achievement data. In fact, as many sources of information as exist should be used to help form a fair, accurate picture of a teacher’s effectiveness. That way, as an organisation we’ll know we have reliable data on which to act; as a practitioner you’ll know you are being judged fairly and will not be penalised for taking a risk in a lesson observation which didn’t pay off, or be branded inadequate simply for having a bad day.


Next steps

We’re piloting this new formative, holistic approach to monitoring teaching with one of our college faculties in the New Year and have already begun moving in this direction in our secondary academy school. I will share our experiences with you throughout the spring and summer terms of 2014 and will share the resources we develop along the way.

I’d like to hear from anyone who is trying something similar – particularly if you’re further along the journey than we are. But I’d also like to hear the counter-argument. I’d like to hear from anyone who firmly believes that graded observations should stay and why.


  1. Really enjoyed reading this. My MEd thesis is on a similar theme: I am researching what types of feedback boost teachers’ professional growth & sense of self-efficacy. I have long advocated that Wiliam’s “comments only” approach to learning could equally apply to teachers’ learning. I’m also interested in the acceptance – or rejection – of feedback, depending on the observer/observed relationship. I am keen to hear how things progress with your project. Good luck!

