UK Registered Charity number 1155018

Positive Reinforcement Training

The psychology of learning, clicker training and bitless riding.

Bringing horsemanship and science together

Is positive reinforcement training the best training approach for bitless riders?  The answer is yes, but to discover why this is the case, delving a little deeper is necessary.

Over the last 100 years, psychologists, and more recently neuroscientists, have advanced our understanding of how all animals learn.  Horses and humans are no exception to this relatively new science.  Horse trainers have discovered aspects of this through trial and error over the centuries since horses became an integral part of human culture, but sometimes things happened for reasons that weren’t clearly understood, or the reasons given for why horses behaved in certain ways didn’t quite stand up to scientific study.  Meanwhile, a scientific approach to studying learning started to yield a language and a set of rules that can be used and understood in the same way by everybody working on learning.  These discoveries apply to every single species from the tiny sea slug right up to the complexities of the human brain.  This language and set of rules allowed scientists to conduct research on topics as diverse as treating human phobias, training rats to detect and report the presence of land mines (Poling et al, 2013), and teaching dogs to detect and report the distinctive odours of human illnesses.

At the same time that this progress in research on animal learning was taking place, the horse’s role in human culture began to change.  In many cultures, the horse was no longer an essential part of work and subsistence and started to become a leisure interest instead.  Horse training began to change from the often very refined, but ultimately coercion based approach to an almost philosophical reflection on human-animal relationships. The 21st century approach to horse training now aims for a high welfare approach, conducting training in the best and most humane way possible.  This means that we understand that horses shouldn’t just be kept healthy in order to fulfil human purposes and economics, but should thrive and have a healthy and fulfilled life in our care.  The scientific discoveries of the 20th century psychologists can help us achieve this.

Since this is the approach we want to take with our ridden horses, we need to work out how to use the information we’ve gained about how (all) animals learn to give our horses a voice in their training. This isn’t just a soft hearted compromise: we know now that animals treated and trained in this way are healthier, safer to be around, live longer and, most crucially for us, actually want to cooperate with their human trainers and carers (Bassett & Buchanan-Smith, 2007). What this tells us is that, given a choice of the different ways currently available to train a horse to be ridden, we should be opting for the one that is most attractive and least stressful to the horse, not the one that is the quickest and most straightforward for us as trainers. If we can train ourselves to use this approach, the two can be combined. Currently, the traditional approach to horse training is the quickest and most straightforward for us to use purely because it is the one best known to most horse trainers, not because it is the one that causes least stress and most enjoyment for the horse as the learner.

What does the science tell us about how horses learn?

Suppose we choose to look in more detail at the positive reinforcement training approach to see how we can use it with our horse: where should we begin?

Rather than purchasing a clicker, getting a few slices of carrots and walking into the field with our horse, we need to step back and find out a bit about why this approach is going to help us.  To do that, we need to know exactly what and how our horse is going to learn, when we approach with the clicker and carrot.  The science of learning tells us that horses (and humans) learn about their world through two simultaneously occurring processes: associative and non-associative learning.  Clicker training harnesses elements of associative learning, but while we’re doing it, non-associative learning is happening too, and we need to understand how that can affect the quality of the end result.

Let’s start to unpack the scientific language!

Non-associative learning means that the animal changes their behaviour in response to a change in their environment.  Associative learning means that the animal learns that two things or two events are linked, and the link either changes how the animal feels, or changes how the animal behaves (or both!).

Non-associative learning can be divided into habituation and sensitization (Blumstein, 2016). These are simply ways in which our and our horse’s nervous systems and brain learn what’s worth paying attention to and what’s better ignored. There’s no point wasting energy being repeatedly startled by a car passing your field if your field is next to the main road… but equally, when you’ve been chased by a vehicle that came into your field, it’s worth being alert to signs that it’s about to happen again. The first example is a process called habituation, and through habituation, the horse learns that something new that’s just happened is not worthy of energy or attention. A horse in a stable hears a car engine start outside – the first time, they may look to see what it is, but after it happens several times a day over a period of time, the horse will stop responding. In the same way, if a familiar human drapes a rope over the horse’s neck, the horse may want to sniff and investigate the rope, but if it happens quite a few times over a few hours, the horse will stop paying attention and tune out the sensation of the rope touching their coat and skin. This is the process by which horses learn to wear halters, headcollars and bridles – and the process by which humans learn to wear clothes, shoes and even dental braces. It’s a passive process – the horse (and human) doesn’t have to do anything, their nervous system simply takes a reading of the situation and then decides whether to ignore or respond to what’s present. It’s not that we’re unaware, it’s just that the brain turns the volume down. You’re probably sitting somewhere with faint traffic noise (or, if you’re lucky, birdsong instead), but weren’t aware of it until you read this sentence!

In contrast, sensitization is the process where the horse (and human) learns that something is worth attending to or has meaning. For example, a horse may have habituated to the feel of vibrating clippers running over its skin thanks to a careful process undertaken by their human trainer. However, one day the clippers accidentally pinch some skin and the horse flinches. Their nervous system is now on alert, gathering information about whether clippers are in fact worthy of a fear response and immediate flight. This means the horse is now collecting information about everything that’s happening when the clippers are present. Maybe next time, the human trainer is careful that the clippers don’t pinch but a broom is knocked over outside the stable while the clippers are being used, making a sudden loud noise. The horse startles fearfully. Now they are even more tuned to seek out information that clippers are to be feared… and even small, seemingly inconsequential events the next time, and the time after, will build up until the horse can no longer be clipped. With sensitization, the animal’s response gets a little worse each time the situation is repeated as they add information about threat. In this example, the horse can no longer be habituated to clippers because the clippers being present automatically prime the horse’s brain and nervous system to respond fearfully.

Humans go through the same process.  Imagine feeling a slight tickle on the back of your hand.  Mostly, it’s just part of your clothing brushing the skin and you ignore it.  Then, one day, you look down to see a wasp has settled on the back of your hand and seconds later, it stings you.  After that, every tiny brush on your skin leads you to check fearfully for another wasp. You have now become sensitized to that slight tickle on the skin, and you will react to it even if it’s only something as innocuous as a thread from your sleeve. With repeated tickles, you stop responding (desensitization) but your nervous system is still primed to check quickly for wasps where before you weren’t bothered.

In recent years, the term “desensitization” has been used in horse training to refer to a wide range of practices designed to stop horses responding fearfully to all kinds of things, from the sensation of moving while wearing a saddle through to the sudden appearance of a plastic bag in the hedgerow. However, if we’re looking at taking the approach most pleasant for the horse and with the highest welfare outcome, it’s clear we should be using the process of habituation, where the horse learns – without a fear response ever being induced – that many of the aspects of domesticated life, including ridden work, are unthreatening and inconsequential. Desensitization should be reserved for when things go wrong despite all our best intentions, because the link to a fear response can never be removed – all we can do is temporarily reduce the level of responding.

At the same time, we need to be aware that we want the horse to be sensitive and responsive to our cues when we ride, but if we’re aiming for high welfare training, we don’t want the horse to be sensitized to our cues because they are linked to a potential threat.

Introducing clicker training

This leads on to the topic of associative learning: where an animal learns that two things or two events are linked. Both kinds of associative learning are critical if we want to be able to use a clicker to train our horse. The first is called classical conditioning, and the most familiar example for most of us is the scientist Pavlov and the dogs he trained to salivate in response to the sound of a bell. In humans, the sound of crinkling foil is often a good signal to us that someone nearby is unwrapping a bar of chocolate, and the feelings we get when we hear that sound are similar to the feelings we have when we actually eat chocolate. For horses, we know that the sound of the feed room door being opened leads to the horses acting in a way that shows us they’re anticipating food. This is despite the fact that the opening of the door has nothing to do with the food itself at all, it’s just a sound that has become linked with food appearing. The same process can link a sight, a sound or a smell with something unpleasant: for example, the smell of singeing hoof horn can trigger a response in a horse who has had a bad experience with shoeing, even if they can’t see or hear the farrier at work nearby.

All of this comes together when we move on to the final type of learning: operant conditioning.  This is what has led to the development of the technique of clicker training.  The horse (and human) both learn that something they do – called “an operation on the environment” – leads to a change in their world.

There are two types of change in behaviour. One is that a behaviour – let’s say lifting a hoof – is strengthened or happens faster or more often. This change is called “reinforcement”. The second change is when a behaviour – let’s choose standing still – is weakened, or happens less quickly or less often. When that happens, we say the behaviour has been “punished”. In training a horse, we have lots of situations where we want more of a behaviour or for a behaviour to happen faster. An example would be change of gait: we want the horse to respond promptly to our cue to trot, rather than maintaining a walk or only launching into trot after a few more strides of walk. At the same time, we want the horse to remain still when we cue them to stand, and not to walk off, fidget or walk backwards.

There are two ways to reinforce a behaviour.  The first way is to introduce something that the horse wants and likes, timing it to be available immediately after the behaviour we want to change.  So we add “an appetitive” (something pleasant and desirable).  Addition is the same as in maths and is shown as a + sign – hence the term “positive reinforcement”.

We can also reinforce behaviour by taking something away or subtracting it.  This is called “negative reinforcement”. For this to work, we need to introduce something the horse dislikes enough to want it to stop (or, if they have learned when it may happen, for them to do something  to avoid or escape it).  What we add in this case is called “an aversive”, and it can be as small as the tickle we felt with the wasp on the back of our hand, or as significant as the pain of a metal bar pressing on the horse’s gums over the bony bars of their mouth.  The horse will repeat the behaviour they learned caused the thing they disliked to stop, or they will do something to make sure it doesn’t happen again.

Using the add and subtract principles, we can also make a behaviour less likely to happen.  This process is termed “punishment”, but what it means is that we or the horse become less likely to keep doing a behaviour that is immediately followed by an unpleasant consequence.  So if stepping forward out of a square halt before being cued is always immediately followed by pressure from a bit on the bars of the mouth or on the tongue, the horse will be less likely to move out of a halt.  This is “positive punishment” – something (an aversive) is added that makes a behaviour less likely to happen again.  We can also reduce behaviour by subtraction: if allowing a headcollar to be put on immediately results in a horse being removed from grass, the behaviour of standing still to have a headcollar put on will become less likely in the future. This is termed “negative punishment”. One key feature of punishment is that, for a short while after it happens, it suppresses all behaviour to a greater or lesser extent – the horse is less likely to try doing anything. So a rider who positively punishes moving forward out of a halt with a pressure on the horse’s mouth will find the horse is less prompt at responding to their next cue (for example, a rein back cue).
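For readers who like to see the add and subtract rules laid out mechanically, the four combinations described above can be sketched as a small lookup table. This is only an illustrative sketch: the names `Stimulus`, `Change`, `quadrant` and `effect_on_behaviour` are made up for this example and do not come from any training software or from the research cited here.

```python
from enum import Enum


class Stimulus(Enum):
    """What kind of thing follows the behaviour (hypothetical labels)."""
    APPETITIVE = "appetitive"  # something the horse wants and likes
    AVERSIVE = "aversive"      # something the horse wants to stop or avoid


class Change(Enum):
    """Whether that thing is added or taken away after the behaviour."""
    ADDED = "added"      # the "+" or "positive" in the terminology
    REMOVED = "removed"  # the "-" or "negative" in the terminology


def quadrant(stimulus: Stimulus, change: Change) -> str:
    """Name the operant conditioning term for one combination."""
    table = {
        (Stimulus.APPETITIVE, Change.ADDED): "positive reinforcement",
        (Stimulus.AVERSIVE, Change.REMOVED): "negative reinforcement",
        (Stimulus.AVERSIVE, Change.ADDED): "positive punishment",
        (Stimulus.APPETITIVE, Change.REMOVED): "negative punishment",
    }
    return table[(stimulus, change)]


def effect_on_behaviour(term: str) -> str:
    """Reinforcement makes a behaviour more likely, punishment less likely."""
    return "more likely" if term.endswith("reinforcement") else "less likely"


# The examples used in the text, expressed with the sketch above:
# food arrives after a hoof lift
print(quadrant(Stimulus.APPETITIVE, Change.ADDED))
# bit pressure follows stepping out of a halt
print(quadrant(Stimulus.AVERSIVE, Change.ADDED))
# grazing ends after the headcollar goes on
print(quadrant(Stimulus.APPETITIVE, Change.REMOVED))
```

Notice that “positive” and “negative” describe only whether something is added or removed; whether the behaviour becomes more or less likely depends on the combination of what is changed and how.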

With clicker training, we join together operant and classical conditioning.  As an example, it’s difficult to deliver a food reward (an appetitive) the very instant the horse performs a flying change in response to your cue, so you need to find a way to reinforce the flying change at the instant it happens. The sound of a clicker has no meaning to most horses, but using the principles of classical conditioning described above, we can associate the sound with the arrival of an appetitive, something pleasant that the horse likes and enjoys.  This is usually a food reward but can in some horses and at some times be scratching or gentle stroking (Ellis & Greening, 2016).  Since horses are rather less motivated to seek out scratches and social contact during active movement based training, food is what’s most often used as a positive reinforcer.  The horse learns that when they hear the sound of a clicker, it will be quickly followed by food, so the sound takes on meaning and the horse has the same emotional response that they do to the appearance of food.

This means that we can use the sound of the click as a marker that tells the horse “what happened in the instant before you heard that sound is a behaviour that’s worth repeating in the same situation, and it will be followed by food”. At the same time, by the process of association, the sound of the clicker actually slightly reinforces the behaviour.

We can use this simple approach to train any behaviour, and with bitless riding it’s particularly useful since the horse doesn’t have to try to eat around an obstruction in their mouth.

The relevance of clicker training to riders choosing bitless bridles

Just as with traditional horse training, using a clicker and positive reinforcement to train is something that starts on the ground: we teach the horse from the beginning of training that certain behaviours are desirable and will be rewarded. We can gradually build complexity and add recognizable cues (traditionally called “aids”) so that the horse will understand and respond to them when we’re in the saddle as well as on the ground. One of the first things most trainers do is to teach the horse how to behave around food, since it is such a powerful motivator (Hockenhull & Creighton, 2010).

This approach moves away from training that uses the addition and/or removal of an aversive to motivate and reinforce behaviour.  One key difference is again due to classical conditioning.  With positive reinforcement, the horse wants to start a training session and will work to get us to give them the cues that they have learned will lead to good things.  The cues we trained, the places we did the training, the equipment we used – and our own presence – all become associated with the reinforcers we used (Sankey et al, 2010).  So we become an appetitive for our horse: when they see us, the emotions they experience are similar to the positive emotions they experience when they find and eat tasty food.

The flip side is also true: if we use the application and removal of aversives (no matter whether they’re as small as the wasp tickle or as painful as sharp spurs), the horse has the same emotional response to the cues, the place the training happens, the equipment used and our presence as they do to the stimulus they dislike. We become an aversive (Innes & McBride, 2008).

For most of us, the horse comes to us already at least partly trained, and this training has usually involved the skillful application and removal of aversives. If we’re lucky it has been the skillful application and rapid removal of very small aversives, and we won’t have fear and trauma interfering with our riding and training. However, suppose, along with using a bitless bridle, we want to change over to using positive reinforcement with our horse because we’ve read the literature and we see that it’s a high welfare approach and that horses really enjoy learning this way. This is where it’s really helpful to know about all kinds of learning – habituation, sensitization, classical and operant conditioning. If we simply add a click followed by a food reward when the horse responds correctly to a cue they learned in their past life through negative reinforcement, it can have an unpredictable effect on behaviour. Let’s go back to the tickling wasp example… Suppose you feel a tickle, you react fearfully expecting a wasp sting, but you’re offered a piece of chocolate! Next time you feel the tickle, do you respond with unmitigated joy, or with mild suspicion and slight concern? Adding an appetitive to something that’s previously been associated with something unwanted or unpleasant creates a mental conflict: “will this signal bring something bad or something good?”. Conflict like this leads to humans and horses responding cautiously, rather than joyfully, to trained cues. So you can give a trot cue and find that the horse hesitates and then moves into a slightly bumpy trot rather than instantly and confidently responding to your cue. Or that the horse allows their hoof to be lifted but quickly wants to pull it back. In these situations, we’re diluting the potential power of positive reinforcement.

However we can use the power of classical conditioning, together with our knowledge of sensitization, to create a better emotional response leading to a quicker, cleaner and more joyful physical response to our cues: this is a process called “systematic desensitization and counter conditioning”. We can use classical conditioning to create a new and positive emotional response to the cues that the horse learned might lead to unpleasant things by spending time pairing them with good things.

There’s another excellent reason to spend time training (or retraining) our horses using positive reinforcement. We’ve already seen that it leads to the horse forming positive associations with us, the place we train and the equipment we use. There’s another reason it can contribute to safety and welfare: when we train using even the mildest negative (or subtraction) reinforcement, we cannot ever afford for the horse to refuse to respond to our cue, because if they do, we gradually desensitize them to the presence of the cue. The only way forward is to resensitize – to make the cue sufficiently different and sufficiently unpleasant that the horse will respond.

If, however, a horse doesn’t respond to a cue that has previously led to them getting something good, we immediately have information about the horse’s mental state.  Why would you choose not to gain something you like?  If a horse trained using positive reinforcement doesn’t respond to a cue to come to the mounting block, we know it’s not because they are trying to avoid or escape an aversive we used in training, so by process of elimination we can often spot health or pain issues before they cause damage to the horse or a dangerous riding situation for us. In contrast, if a horse who has been trained to come to the mounting block using pressure on the rope and tapping on the hindquarters fails to approach the mounting block, we don’t know if it’s because our pressure and tapping wasn’t aversive enough, because the horse had found a way to prevent us applying the pressure, or because something else was wrong.

Bitless riders are typically very interested in finding ways of training that don’t depend on having a strong aversive held in reserve for emergencies. Many riders who use bits believe they have a failsafe where a strong aversive can be applied using a metal bit to suppress an unwanted behaviour. The extent to which this is a well founded belief is debatable, given that horses will run through even the strongest aversive if they believe their life is in danger. As bitless riders, however, we want to choose the least aversive tack possible in all situations. Welfare concerns mean we should never choose a more aversive piece of tack than we need. Because of this, we need to be able to have confidence that the horse actually wants us to give them the cues we’ve trained, not that the horse is looking to find ways to escape or avoid the cues we use. We also want the cue we give in an emergency situation to have strong positive (appetitive) associations, rather than being either a direct or a conditioned aversive.

It’s also worth comparing the motivation of a horse trained to respond to cues using negative (subtraction) reinforcement and positive (addition) reinforcement. In the first case, the horse’s primary motivation is always to escape or avoid the reinforcer, so we may find that unless we are very careful with our application of the reinforcer, the horse will try a range of behaviours to stop us giving the cue. In traditional training, these behaviours are labelled “evasions” – the horse will duck their head down or raise their head up to avoid us applying pressure on a bit (or a noseband), or will speed up or run sideways to avoid us using our leg to touch them with our heel or a spur. In competition, a rider whose horse lashes their tail in response to a leg aid will lose marks, as will a rider whose horse gapes their mouth to avoid a bit cue or puts their tongue over the bit.

A horse whose cues have been trained using positive reinforcement actually wants the rider to give the cues, and they will offer increased responsiveness and offer more and larger behaviours without prompting, because they know the chain of behaviours will ultimately lead to a reward. A horse trained using negative reinforcement has no motivation to amplify behaviours – if the behaviour resulted in the aversive being removed, there’s no reason to do anything more than what worked last time. Only a stronger aversive will lead to a bigger behaviour.

Many riders enjoy working towards various forms of liberty training with their horse, including tackless riding.   In conventional training, a lot of work has to be done in advance to train the horse using equipment including ropes, sticks or whips and bits/other tack. The horse must learn that cues cannot be escaped, and that moving away from the trainer will result in aversives, while staying close to the trainer will lead to the aversives being removed.  The horse has to be responding instantly and consistently to all these cues before any equipment can be removed.  Liberty training using positive reinforcement doesn’t need any equipment at all – the horse wants to stay close to the trainer, and wants the trainer to give the cues the horse has learned will lead to rewards, so the trainer can start liberty work from the very outset of training with a willing and responsive partner. 

In the same way, many riders complain that their horses are barn or buddy sour, inclined to nap or rush home. Remember that the equipment used, the cues trained and the rider themselves have all become associated with the application and manipulation of aversives. There is no positive association in the horse’s mind to help overcome the worry and fear of separation from their equine companions. Training from the outset using positive reinforcement means that the horse is more likely to want to stay with the trainer or rider, so gradually building the horse’s time away from home and companions is much more likely to progress quickly.

Riders are also familiar with horses who spook, start to buck or run home “out of the blue”. We know that there is a cumulative effect of stressors on a horse (often referred to as “trigger stacking” – a huge topic, for another article!) and that something that seems small and innocuous to us can be the final straw in terms of the horse being able to maintain calm behaviour. What we forget is that our training has used the application and removal of aversives to train behaviour – so what seems to us just a simple nudge with our heel or lift of the rein is actually a threat from the horse’s point of view (Hockenhull & Creighton, 2013). In a crisis situation, our equipment and our cues are not actually safety gear but the final straw that means the horse can no longer cope with a stressful event.

A 21st century horse training revolution

Putting everything together, since our 21st century horses are partners and companions, we want their experience of working with us to be both enjoyable and rewarding.  The next steps, for most of us, are simply finding out how to apply the huge body of knowledge about how positive reinforcement works, in a way that allows the horse to learn the skills to interact with us safely and effectively.  The horse is an excellent and focused learner, but we’re still at the very beginning of developing our skills in using this science.  The full potential has yet to be tapped, but as more and more people choose to start out using this approach, or to convert their existing skills to working this way, the advances will quickly overtake old fashioned coercion-based methods.

Bassett, L. & Buchanan-Smith, H.M. (2007). Effects of predictability on the welfare of captive animals. Applied Animal Behaviour Science, 102 (3–4), 223-245.

Blumstein, D.T. (2016). Habituation and sensitization: new thoughts about old ideas. Animal Behaviour, 120, 255-262

Ellis, S. & Greening, L. (2016).  Positively reinforcing an operant task using tactile stimulation and food – a comparison in horses using clicker training. Journal of Veterinary Behavior, 15 (September–October), 78.

Hockenhull, J. & Creighton, E. (2010).  Unwanted oral investigative behaviour in horses: A note on the relationship between mugging behaviour, hand-feeding titbits and clicker training.  Applied Animal Behaviour Science, 127 (3–4), 104-107

Hockenhull, J. & Creighton, E. (2013). Training horses: Positive reinforcement, positive punishment, and ridden behavior problems. Journal of Veterinary Behavior, 8 (4), 245-252

Innes, L. & McBride, S. (2008). Negative versus positive reinforcement: An evaluation of training strategies for rehabilitated horses. Applied Animal Behaviour Science, 112 (3–4), 357-368

Poling, A., Weetjens, B., Cox, C., Beyene, N.W., Bach, H. & Sully, A. (2013). Using trained pouched rats to detect land mines: another victory for operant conditioning. Journal of Applied Behavior Analysis, 44 (2), 351-355.

Sankey, C., Richard-Yris, M-A., Leroy, H., Henry, S. & Hausberger, M. (2010). Positive interactions lead to lasting positive memories in horses, Equus caballus. Animal Behaviour, 79 (4), 869-875

Credit   
Dorothy Heffernan, Ph.D., C.Psychol.  is the author of a series of blogs about equine behaviour, ethology and psychology written to help horse owners and trainers gain a greater insight into why their horses might behave the way they do. She is a member of the World Bitless Association and the Pet Professional Guild and uses a force-free approach to training. She enjoys working with horses and their people to find evidence based ways to develop training and observation skills, with the aim of creating a more harmonious and enjoyable relationship.  She’s especially interested in making the science of “learning theory” more accessible so that it can be applied in day to day life with horses in an enjoyable and ethical way. She has ridden bitless for over 15 years, and loves helping people transition to this approach. Her four equine companions provide inspiration, information and insight. Dorothy is based in Scotland, just outside Glasgow, and runs a local positive reinforcement and equine ethology group who meet regularly to learn, study, train… and share coffee and cake! If you’d like to come to some of the regular meet-ups, seminars or training sessions, please do get in touch via dorothy.heffernan@gmail.com. If you’d like coaching, feedback on equine behavioural issues or to organise a local seminar on equine behaviour or optimal management of horses at traditional livery yards, it would be great to hear from you. To learn more, visit Horses Under Our Skin https://horsesunderourskin.wordpress.com/about/  