Learning can be defined as a relatively permanent change in behaviour that occurs as a result of experience. It is an ongoing process that starts at birth and continues throughout our lifespan, helping us adapt and cope in an ever-changing world. From an evolutionary point of view, learning is critical for our survival: it enables us to distinguish between edible and inedible foods and to tell friends apart from enemies. The range of possible foods and threats is too great to be prewired into the brain, so instead we have the ability to learn from trial and error and from observing others, and to remember this learned information for future use.
Note that in this post I will be using the word organism a lot when describing learning processes that apply to humans as well as animals and insects. I’m not trying to be a cold and calculating science type, but for the purpose of this type of guide it is somewhat necessary… and besides, you are an organism!
The simplest form of learning is called habituation. This is when we gradually become used to a stimulus over a short period of time and learn to ignore it. For example, the ticking of a clock might grab your attention at first, but it will slowly filter out of your awareness. The same goes for birds chirping outside, cars on the road or even strange smells. This type of basic sensory learning is important because it helps screen out sensory information that doesn’t require our immediate attention.
Learning theory in psychology is the foundation of the behaviourist perspective, and the bulk of this post will cover the behavioural concepts of classical and operant conditioning, known together as associative learning. More than 2,300 years ago the philosopher Aristotle proposed a set of laws of association to account for memory and learning. The two most important laws were:
- the law of contiguity, which stated that two events will connect together in the mind if they are experienced close together in time, such as thunder and lightning, or marriage and divorce (heh).
- the law of similarity, which suggests that objects that resemble each other, such as two people who look alike, are likely to become associated.
Table of Contents
- Classical Conditioning
- Operant Conditioning
- Social Cognitive Theory
- Social Learning
Classical conditioning was the first type of learning to be studied systematically. In the late nineteenth century, the Russian physiologist Ivan Pavlov (1849-1936) was studying the digestive systems of dogs when he came upon a curious discovery. Like humans and other animals, dogs salivate when presented with food, which is a natural reflexive response. But Pavlov noticed that if he rang a bell every time the dog was presented with food, it would eventually salivate at the sound of the bell alone, without any food being present. The dog had learnt to associate the sound of the bell with food, and the reflexive response of salivating had transferred to the sound of the bell. This phenomenon is called classical conditioning. It is not restricted to dogs and saliva, but applies to any situation where learning takes place because two stimuli become associated and come to produce the same response, such as becoming afraid of any body of water after nearly drowning in the ocean.
An innate reflex such as salivation to the sight of food is an unconditioned reflex. Conditioning is a form of learning, therefore an unconditioned reflex is a reflex that occurs naturally, without any prior learning; it is instinctual, such as withdrawing a hand that touches something extremely hot. The stimulus that produces the response in an unconditioned reflex is called an unconditioned stimulus (UCS). In the case of Pavlov’s dog, the UCS was the food. An unconditioned stimulus activates a reflexive response without any learning having taken place, which is why the reflex is considered to be unlearned, or unconditioned. An unconditioned response (UCR) is a response that does not have to be learned. In Pavlov’s experiment the UCR was salivation.
Shortly before presenting the UCS (food), Pavlov presented a neutral stimulus (the bell) that normally doesn’t elicit the reflex in question (salivating). After the neutral stimulus (bell) had been paired with the UCS (food) several times, the sound of the bell alone came to evoke a conditioned response (salivation). A conditioned response (CR) is a response that has been learned. Through being paired with the UCS (food), the bell became a conditioned stimulus (CS) – a stimulus that, through learning, has come to evoke a conditioned response. This initial stage of learning, in which the conditioned response becomes associated with the conditioned stimulus, is known as acquisition.
This discovery of classical conditioning was a breakthrough in psychology, as it explains how learning takes place in a huge number of situations outside the laboratory. If you’ve ever had a dog, you’ll know that the act of picking up a leash prompts the dog to salivate and jump around, or maybe even sit patiently in front of you as it waits for the leash to be attached to its collar. This is because the dog has associated the leash with being taken outside for a walk, and classical conditioning has occurred to reach this stage. Dogs aren’t born knowing that leashes mean going for walks; they learn this information.
Cats also react to the sound of tins being opened and will end up circling your feet until you feed them. Humans too are affected by classical conditioning, as many behaviours and phobias are learnt through conditioning. We aren’t born afraid of spiders or snakes; an encounter with these creatures usually has to occur (whether in real life or on television) to condition us to be afraid of them. Likewise, to be afraid of needles, one would need to have associated an injection with distress at some point, often in childhood. Through their knowledge of classical conditioning, psychologists can help a person unlearn these conditioned responses and eventually break the associations behind certain phobias.
Stimulus Generalisation and Discrimination
Once an organism has learned to associate a conditioned stimulus (CS) with an unconditioned stimulus (UCS), it may respond to stimuli that resemble the CS with a similar response. This phenomenon, called stimulus generalisation, is related to Aristotle’s law of similarity. It explains how a traumatic experience with a common house spider can generalise to all spiders, including huntsmen, black widows, trapdoor spiders and tarantulas. It may even transfer to other creatures that look similar to spiders, such as scorpions or crabs.
Stimulus discrimination is the opposite of stimulus generalisation. Discrimination occurs when an organism differentiates between two similar stimuli because these stimuli are not consistently associated with the same UCS. For example, if you think back to your high school days, the sound of the lunch bell would evoke feelings of relief that class was over, perhaps followed by feelings of hunger, while the alarm clock bell that rings every morning wouldn’t evoke the same conditioned response. This is the result of our being able to discriminate between similar, but different, stimuli.
Extinction in classical conditioning does not refer to the wiping out of a species, but instead to the process by which a conditioned response (CR) is weakened by presentation of the conditioned stimulus (CS) without the unconditioned stimulus (UCS). Sound confusing? Basically, extinction is when the link between a conditioned stimulus and a conditioned response is unlearned. In the case of Pavlov’s dog, this would be the dog no longer salivating at the sound of a bell. This will inevitably occur if the bell is rung repeatedly without any food being presented – after a while, the dog knows what’s up.
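One way to picture acquisition and extinction is as a single number, the strength of the bell-food association, rising while the bell is paired with food and falling when the bell is rung alone. The sketch below is a toy model, not anything Pavlov wrote down: the update rule (move a fixed fraction of the way toward a target on each trial) is a simplified form of the Rescorla-Wagner rule, and the function name `run_trials`, the learning rate of 0.3 and the trial counts are all assumptions chosen for illustration.

```python
def run_trials(strength, food_present, n_trials, learning_rate=0.3):
    """Return the associative strength recorded after each trial.

    strength: current bell-food association, between 0 and 1 (hypothetical scale).
    food_present: True for bell + food trials, False for bell-alone trials.
    """
    # Pairing pulls the association toward 1; the bell alone pulls it toward 0.
    target = 1.0 if food_present else 0.0
    history = []
    for _ in range(n_trials):
        # Move a fixed fraction of the remaining distance toward the target.
        strength += learning_rate * (target - strength)
        history.append(strength)
    return history


# Acquisition: ten bell + food pairings, starting from no association.
acquisition = run_trials(0.0, food_present=True, n_trials=10)

# Extinction: ten bell-alone trials, starting where acquisition left off.
extinction = run_trials(acquisition[-1], food_present=False, n_trials=10)

print("after acquisition:", round(acquisition[-1], 3))
print("after extinction: ", round(extinction[-1], 3))
```

In this sketch the association climbs quickly at first and then levels off, and it fades in the same way once the food stops arriving.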
Factors Affecting Classical Conditioning
Several factors influence the extent to which classical conditioning will occur in an organism: the interstimulus interval, the individual’s learning history and the individual’s preparedness to learn.
Interstimulus interval is the time between presentation of the CS and the UCS. If too much time passes between the presentation of these two stimuli, then the organism is unlikely to associate them and conditioning is less likely to occur.
An individual’s learning history also plays a part in classical conditioning. An extinguished response is usually easier to recondition a second time around, because the stimulus was already associated with the response in the past. On the flip side, prior learning can sometimes hinder conditioning. For example, suppose I had already conditioned you to salivate at the sound of a bell, and then wanted to condition the same response to the flashing of a light. If I rang the bell alongside the light during our light-flashing sessions, you would largely ignore the light and salivate at the sound of the bell. This is known as blocking, which is the failure of a stimulus to elicit a CR when it is combined with another stimulus that already elicits the response.
A similar phenomenon occurs in latent inhibition, in which initial exposure to a neutral stimulus without a UCS slows the process of later learning the CS-UCS association and developing a CR. For example if a bell is rung repeatedly without any presentation of food prior to the commencement of the food/bell association sessions, the dog might take longer to make the association between the sound of the bell and food.
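Blocking, described above, can be reproduced in a few lines if we assume that all the cues present on a trial share a single prediction error. That assumption comes from the Rescorla-Wagner rule, a textbook model this post does not otherwise cover, so treat the code as a hedged sketch: the function name `train`, the cue names, the learning rate and the trial counts are all made up for the example.

```python
def train(weights, present, food, n_trials, learning_rate=0.3):
    """Update cue strengths in place and return them.

    weights: dict mapping cue name to associative strength.
    present: list of cues shown together on every trial.
    food: True if the UCS (food) follows the cues.
    """
    target = 1.0 if food else 0.0
    for _ in range(n_trials):
        # The animal's prediction is the summed strength of all cues present.
        prediction = sum(weights[cue] for cue in present)
        error = target - prediction
        # Every cue present learns from the same shared error.
        for cue in present:
            weights[cue] += learning_rate * error
    return weights


# Blocking group: bell alone first, then bell + light together.
blocked = train({"bell": 0.0, "light": 0.0}, ["bell"], food=True, n_trials=20)
train(blocked, ["bell", "light"], food=True, n_trials=20)

# Control group: bell + light together from the start.
control = train({"bell": 0.0, "light": 0.0}, ["bell", "light"], food=True, n_trials=20)

print("light strength, blocking group:", round(blocked["light"], 3))
print("light strength, control group: ", round(control["light"], 3))
```

Because the bell already predicts the food almost perfectly in the blocking group, there is very little error left for the light to learn from, while in the control group the two cues split the association between them.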
Prepared learning refers to a biologically wired readiness to learn some associations more easily than others. This can best be explained from an evolutionary point of view, as associations that encourage survival are more easily learned. For example, animals that have survived eating something poisonous are much more likely to avoid that particular food in future. Given this knowledge, it’s no coincidence that humans are more likely to develop fears or phobias of spiders and snakes than of flowers and ants. This is because we are biologically prepared to avoid these creatures due to their potentially life-threatening nature.
In 1898, a man named Edward Thorndike put a hungry meowing cat into a box with a mechanical latch and then deviously placed some food just outside the box – out of the cat’s reach. The cat meowed, paced back and forth and rubbed against the walls of the box, just as cats do. In doing so, it happened to trip the latch. Immediately, the door to the box opened and the cat gained access to the food. Thorndike stroked his moustache with pride and repeated the experiment, and with continued repetitions, the cat became more adept at tripping the latch. Eventually, it was able to escape from its boxy confines as soon as food appeared outside of the box.
Thorndike proposed a law (these people enjoyed proposing laws) of learning to account for this phenomenon, which he called the law of effect: an animal’s tendency to reproduce a behaviour depends on that behaviour’s effect on the environment and the consequent effect on the animal (humans are animals too you know). More simply put, the law of effect states that behaviour is controlled by its consequences… does this remind you of your school days? It should.
Thorndike’s cat began a second form of conditioning, known as instrumental or operant conditioning. Thorndike used the term instrumental conditioning because the behaviour is instrumental to achieving a more satisfying state of affairs. B.F. Skinner, who spent years experimenting with the ways in which behaviour is controlled by the environment, called it operant conditioning, which means learning to operate on the environment to produce a consequence.
In classical conditioning, an environmental stimulus produces a response, while in operant conditioning, a behaviour (or operant) produces an environmental response. Operants are behaviours that are spontaneously produced rather than elicited by the environment. Thorndike’s cat spontaneously emitted the behaviour of brushing up against the latch, which resulted in an effect that conditioned future behaviour. Just like his cat, had Thorndike failed, he probably wouldn’t have persisted in these experiments, which others no doubt looked upon as silly (‘so you’re putting cats in boxes now?’). But because he succeeded and gained respect from his colleagues, he continued his experiments, made a name for himself and placed it permanently in the psychology books.
In this post we will explore two types of environmental consequence that produce operant conditioning: reinforcement, which increases the probability that a response will occur, and punishment, which decreases its likelihood of occurring.
Reinforcement means exactly what the name implies: something in the environment that fortifies, or reinforces, a behaviour. A reinforcer is an environmental consequence that occurs after an organism has produced a response and makes the response more likely to occur again. There are two types of reinforcement: positive reinforcement and negative reinforcement. Are you still reading? Good boy!
Positive reinforcement is the process where presentation of a stimulus (usually a reward) after a behaviour makes the behaviour more likely to occur again. The psychological term for reward in this case is a positive reinforcer. A couple of examples of positive reinforcement would be students showing more effort in class when their behaviour is praised by teachers, and adults going to work every day even though they don’t want to, because they get a paycheck at the end of every week.
Negative reinforcement is when the removal of an aversive stimulus makes a behaviour more likely to occur. Don’t confuse this with punishment because of the word negative; in this context, negative refers to the taking away, or subtraction, of something. An example of negative reinforcement would be using an umbrella in the rain: because the umbrella removes the unwanted stimulus of rain making you wet, the behaviour of using an umbrella in the rain is strengthened. Other examples would be using an air conditioner more often in the summer to get rid of the heat, wearing shorts in the summer to avoid being too hot, or putting on sunscreen to avoid getting sunburnt in the hot summer sun. Yeah I know I just pulled a whole lot of weather examples out of my hat, but they all reinforce the point I am making…
Negative reinforcers are unpleasant stimuli that strengthen a behaviour by their removal; in the examples above they are rain, heat and sunburn. Another example: hitting the snooze button on an alarm clock is negatively reinforced by the disappearance of the alarm sound. Sometimes I even manage to hit the snooze button in my sleep; that is how conditioned I am to hate my alarm clock.
Negative reinforcement occurs in both escape learning and avoidance learning. In escape learning, a behaviour is reinforced by the elimination of an aversive state of affairs that already exists; that is, the organism escapes an unpleasant situation, like a cat escaping the rain or a person escaping a really bad party. Avoidance learning occurs as an organism learns to prevent an expected unpleasant event from happening. In this case, avoidance of an unpleasant situation reinforces the behaviour of avoidance. Both escaping and avoiding an unpleasant situation negatively reinforce the behaviours of escaping and avoiding. It’s a cycle.
While reinforcement increases the likelihood of a behaviour occurring, punishment does the opposite by decreasing the likelihood of a behaviour occurring. The criminal justice system is grounded in punishment: it operates on the idea that people are less likely to commit crime if punishment is a consequence, and that those who do commit crime are less likely to commit it a second time after serving time in prison.
Like reinforcement, punishment can either be positive or negative, and no, this does not have anything to do with the person’s feelings when receiving the punishment. Rather, positive simply means something is presented, while negative means something is taken away. So an example of positive punishment would be spanking a child for doing something naughty, while negative punishment would be taking the same child’s toys away. An example of punishment that is both negative and positive at the same time is being put in time out (or prison), as the child is being given a punishment, and is also being removed from where they originally were. Another example would be spanking a child, while at the same time taking away all their toys – an impressive feat, bound to confuse the hell out of the one being punished.
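The positive/negative and reinforcement/punishment labels are easy to mix up, so it can help to treat them as two independent questions: was a stimulus presented or removed, and did the behaviour become more or less likely? The small lookup below is only a memory aid built from the definitions in this post; the function name `classify_consequence` and the string labels are hypothetical.

```python
def classify_consequence(stimulus_change, behaviour_change):
    """Name the operant process from two observations.

    stimulus_change: "presented" or "removed" (what followed the behaviour).
    behaviour_change: "increases" or "decreases" (how likely the behaviour becomes).
    """
    # Positive means something is presented; negative means something is taken away.
    sign = {"presented": "positive", "removed": "negative"}[stimulus_change]
    # Reinforcement strengthens behaviour; punishment weakens it.
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behaviour_change]
    return f"{sign} {kind}"


# Examples taken from the text above.
examples = [
    ("teacher praises effort in class", "presented", "increases"),
    ("umbrella stops the rain making you wet", "removed", "increases"),
    ("child is spanked", "presented", "decreases"),
    ("child's toys are taken away", "removed", "decreases"),
]

for description, stimulus_change, behaviour_change in examples:
    label = classify_consequence(stimulus_change, behaviour_change)
    print(f"{description}: {label}")
```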
A problem with using punishment with children and especially animals is that they sometimes don’t know which behaviour is being punished. I once had the displeasure of attending a dog obedience class with my brother’s wife. The room was full of small and yippy dogs that wouldn’t stop yelping and running in circles, and yuppie owners who were either on the phone or desperately trying to contain their animal, much to their own embarrassment. It was a zoo. I did learn a lot about training dogs though, and one thing I learned was that dogs don’t know that pooing on your carpet is a bad behaviour. How should they know? It’s not like they can understand our language and strange customs and taboos. Sticking a dog’s nose in its own mess is only abusing your dog, as the dog cannot make the association between what it did and what you are doing to it. The dog is confused, and wants to know why it is being so savagely punished for doing who knows what.
Another problem with punishment is that the learner may come to fear the person dishing out the punishment (via classical conditioning) rather than the action (via operant conditioning). A child who is harshly punished by his father, for example, may become afraid of his father instead of changing his behaviour.
Operant conditioning occurs in everyday social interaction without us even being aware of it; people constantly use reinforcement and punishment to mould others’ behaviour. Consider if your friend got a job, for example: you would positively reinforce them by giving them words of encouragement, as you would see this as a positive behaviour. If your friend did something you considered to be bad, you would punish them (not physically I hope!) by telling them that they didn’t do the right thing. This is how social interaction works. If no one got told off for doing something bad, or commended for doing something good, then people would grow up without any direction.
It’s often a two-way street as well. Take, for example, parents who punish their child for doing something wrong. If the punishment results in the child not doing the behaviour anymore, then the child is negatively reinforcing their parents’ punishing behaviour. The parents condition the child to associate the bad behaviour with being punished, so the child stops doing it. But at the same time the parents are being conditioned to punish the child every time it does something bad, as they have been negatively reinforced by the removal of an unpleasant situation (the child’s naughty behaviour)!
Schedules of Reinforcement
Yes, reinforcement has a schedule, a very busy one. There are two main types of reinforcement schedule:
- continuous reinforcement schedule – when a subject is reinforced after every behaviour they produce.
- partial reinforcement schedule – when a subject is reinforced only some of the time.
Continuous reinforcement is most effective during the acquisition phase of operant conditioning, while partial (or intermittent) reinforcement is more effective in maintaining learned behaviours. For example, initially it is necessary to give a dog a treat every time it performs a trick, but after a while it is necessary to give it a treat only some of the time, so that it doesn’t know when to expect a future treat, therefore strengthening the behaviour in anticipation of a treat.
Another example would be praising a student’s work. Initially it is helpful to praise all of their good work, but after a while the student would recognise the pattern and place less value on the praise. Once the conditioning has been set, the student’s good behaviour will be better maintained if they only get words of encouragement some of the time, so that the praise holds more value and the student will strive for more of it. The same applies to grades: if a student got A’s for every assignment they handed in, they wouldn’t place any value in getting A’s and their quality of work would drop.
Partial reinforcement schedules are further divided into ratio schedules and interval schedules. In ratio schedules, rewards (or punishments) are tied to the number of responses produced; only a fraction of ‘correct’ behaviours receive reinforcement. In interval schedules, by contrast, rewards (or punishments) are delivered only after some interval of time, no matter how many responses the organism produces.
Fixed-ratio (FR) schedules are when an organism (heh) receives reinforcement for a fixed proportion of the responses it emits. For example, a telemarketer receives reinforcement (a bonus) for every sale they make (FR-1 schedule), while a girl scout receives reinforcement (a badge) for every 10 boxes of cookies she sells (FR-10 schedule). In regards to punishment, a child could be punished on an FR-1 schedule (punished after every example of bad behaviour) or on, say, an FR-3 schedule (punished after every 3rd example of bad behaviour).
Variable-ratio (VR) schedules are when an organism receives a reward for some percentage of responses, but the number of responses required before reinforcement is unpredictable (that is, variable). For example, when fishing you will not be rewarded with a fish every time you cast your line into the water, but on some occasions you will be. Gambling also works on a variable-ratio schedule, especially slot machines, which pay out after an unpredictable number of plays: you don’t know when it might be, but the next coin you put in might be a winner. In regards to punishment, a variable-ratio schedule might be seen with an unpredictable parent with mood swings, where the child is uncertain when his or her behaviours will be punished.
Fixed-interval (FI) schedules are when an organism receives reinforcement for its responses only after a fixed amount of time. For example, you can work as many or as few hours as you like in a week, but you will only get paid on a certain day. Fixed-interval schedules also operate with lunch hours at school, as you only get a break at a certain time every day. An example of a fixed-interval schedule in punishment would be getting spanked every day at 9 o’clock.
Variable-interval (VI) schedules differ from fixed-interval schedules in that the organism cannot predict how long the interval will be before it receives reinforcement. A VI-10 schedule, for example, could mean reinforcement occurring at intervals of roughly 10 minutes, but not exactly 10 minutes. An example of a variable-interval schedule would be a safety inspection: it is known that the safety inspector will come during the afternoon, but at which time is uncertain, so staff are on high alert the whole afternoon in anticipation. This is why a variable-interval schedule is more effective than a fixed-interval schedule at keeping behaviour steady. Another example would be a cable guy or plumber or anyone who comes to fix something in your house; they usually say they’ll come during a window of hours, for example any time between 12 and 3. An example of this schedule in punishment might be an abusive alcoholic husband coming home from the pub to yell at his wife. The wife is unsure of when exactly he will be home to yell at her, but she still expects his return at a variable interval of hours – maybe after 2 hours of drinking, or 3 or 4…
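To see how the four partial schedules differ, it can help to write each one as a rule that decides, response by response, whether a reward is delivered. The sketch below is a simplified simulation under several assumptions of mine: time is counted in abstract whole-number ticks, the variable-ratio schedule is approximated by rewarding each response with probability 1/n, and the variable-interval schedule redraws its waiting time uniformly around the mean. All function names are hypothetical.

```python
import random


def fixed_ratio(n):
    """Reward every n-th response."""
    count = 0

    def respond(now):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reward after an unpredictable number of responses, mean_n on average."""

    def respond(now):
        # Assumption: each response pays off independently with chance 1/mean_n.
        return rng.random() < 1.0 / mean_n

    return respond


def fixed_interval(ticks):
    """Reward the first response made once `ticks` have passed since the last reward."""
    last_reward = 0

    def respond(now):
        nonlocal last_reward
        if now - last_reward >= ticks:
            last_reward = now
            return True
        return False

    return respond


def variable_interval(mean_ticks, rng):
    """Like fixed_interval, but the wait is redrawn at random after each reward."""
    low, high = mean_ticks // 2, mean_ticks + mean_ticks // 2
    last_reward = 0
    wait = rng.randint(low, high)

    def respond(now):
        nonlocal last_reward, wait
        if now - last_reward >= wait:
            last_reward = now
            wait = rng.randint(low, high)
            return True
        return False

    return respond


def count_rewards(schedule, n_responses, ticks_between_responses=1):
    """Feed evenly spaced responses to a schedule and count the rewards earned."""
    return sum(schedule((i + 1) * ticks_between_responses) for i in range(n_responses))


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
makers = [
    ("FR-10", lambda: fixed_ratio(10)),
    ("VR-10", lambda: variable_ratio(10, rng)),
    ("FI-10", lambda: fixed_interval(10)),
    ("VI-10", lambda: variable_interval(10, rng)),
]
for name, make in makers:
    # Both responders work for 100 ticks; the fast one responds five times as often.
    fast = count_rewards(make(), n_responses=100, ticks_between_responses=1)
    slow = count_rewards(make(), n_responses=20, ticks_between_responses=5)
    print(f"{name}: fast responder {fast} rewards, slow responder {slow} rewards")
```

In this sketch, responding faster earns more under the ratio schedules, while under the fixed-interval schedule the fast and slow responders earn the same number of rewards, which matches the point that interval rewards arrive after some time has passed no matter how many responses are produced.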
In some situations, a connection exists between a behaviour and a consequence. Psychologists call this a response contingency, as the consequence is dependent, or contingent, on the behaviour. In other situations, however, the contingencies might be different, so the organism needs to be able to discriminate the circumstances under which different contingencies apply. For example, it might be ok to smoke inside the house while your parents are away, but when they are home that behaviour will definitely result in punishment. A stimulus that signals the presence of particular contingencies of reinforcement is called a discriminative stimulus. In other words, the organism learns to produce certain actions only in the presence of the discriminative stimulus. For the child smoking at home, the discriminative stimulus would be the presence of parents.
Stimulus discrimination is one of the keys to the complexity and flexibility of human and animal behaviour. Behaviour therapists, who apply behaviourist principles to maladaptive behaviours, use the concept of stimulus discrimination to help people recognise and alter some very subtle triggers for maladaptive responses, particularly in relationships. For example, one couple was on the verge of divorce because the husband complained that his wife was too passive, while the wife complained that her husband was too controlling. A careful behavioural analysis of their interactions suggested some complex contingencies controlling their behaviour.
At times, the woman would detect a particular tone in her husband’s voice that she had associated with his getting angry; upon hearing that tone, she would shut down and become more passive and quiet. Her husband found this passivity enraging and would then begin to demand answers and decisions from her, which only intensified her passivity and his controlling behaviour. The woman was not always passive, and the man was not always controlling, but the discriminative stimulus of the husband’s tone caused the wife to withdraw because she had associated it with anger, through operant conditioning. Easing the tension in the marriage therefore required isolating the discriminative stimuli that controlled each of their unwanted responses.
Social Cognitive Theory
By the 1960s, many researchers and theorists had begun to wonder whether a psychological science could be built strictly on observable behaviours without reference to thoughts. Most agreed that learning is the basis of much of human behaviour, but were not convinced that classical and operant conditioning could explain everything people do. From behaviourist learning principles emerged social cognitive theory, which incorporated concepts of conditioning but added two new features: a focus on cognition (thinking) and a focus on social learning.
Cognitive maps are mental representations that we create to reflect our environment. A cognitive map is something we refer to in order to navigate any environment we have been in, or have expectations of. For example, our ability to conjure up an image in our mind’s eye of a house we used to live in, and to navigate this house without physically being there, relies on a cognitive map.
The concept was introduced by the behaviourist Edward Tolman (1948), who found that rats left to navigate mazes without any reinforcement (no food pellets leading to the end) would run around aimlessly and take much longer to reach the end of the maze than rats that had been trained with food. However, after a few sessions of directionless running around, Tolman put food at the end of the maze and the rats were able to find the exit surprisingly quickly. This, Tolman decided, was because the rats had drawn their own cognitive map of the maze and could therefore navigate it efficiently when it became necessary.
Once the rats were reinforced, their learning became observable. Tolman called learning that has occurred but is not currently manifest in behaviour latent learning. To social cognitive theorists, latent learning is evidence that knowledge or beliefs about the environment are crucial to the way humans and animals behave.
Social cognitive theory proposes that an individual’s expectations, or expectancies, about the consequences of a behaviour are what render the behaviour more or less likely to occur. If a person expects a behaviour to produce a reinforcing consequence, she is likely to perform it as long as she has the competence or skill to do so. Likewise, a person is less likely to do something, if he expects a negative outcome to result. For example, a worker is less likely to ask for a raise if he assumes his boss will not be happy with him for asking. Or a boy might not start a conversation with an attractive girl at a bar, because he believes she would turn him down and embarrass him.
Julian Rotter (1954), one of the earliest social cognitive theorists, distinguished between specific expectancies such as ‘if I ask this lecturer for an extension, he will refuse’ and more generalised expectancies such as ‘you can’t ask people for anything in life – they’ll always turn you down’. The latter are called generalised expectancies because they influence a broad spectrum of behaviour. Rotter used the term locus of control to refer to the generalised expectancies people hold about whether or not their own behaviour can bring about the outcomes they seek. Individuals with an internal locus of control believe they are the masters of their own fate, while people with an external locus of control believe their lives are determined by forces outside themselves.
Learned helplessness consists of the expectancy that one cannot escape aversive events and the motivational and learning deficits that result from this belief; it is a form of expectancy that is central to human depression. Researchers have found that the explanatory style of an individual (how they make sense of bad situations) plays a crucial role in whether or not they become and remain depressed. Individuals with a pessimistic explanatory style blame themselves for the bad things that happen to them. For example, they might say that they failed an exam because they were dumb, and not because the test was hard or because they didn’t study enough.
Several studies suggest that pessimistic people are actually more accurate than optimists in recognising when they lack control over outcomes. According to this view, people who maintain positive illusions about themselves and their ability to control their environment are less accurate but tend to be happier and report fewer psychological symptoms such as depression and anxiety. However, optimistic people benefit from the self-fulfilling prophecy: if they believe they are confident, for example, they will actually appear more confident in front of others. Pessimistic people, on the other hand, suffer from the self-fulfilling prophecy: if they believe they are deficient in certain areas, they will most likely reinforce that image and others will see it as well.
Social cognitive theory proposes that individuals learn many things from the people around them, without reinforcement, through social learning mechanisms other than classical and operant conditioning. A major form of social learning is observational learning – learning by observing the behaviour of others. The impact of observational learning is huge – from learning how to feel and act when someone tells an inappropriate joke, to what kinds of foods, clothes and drugs are fashionable. Modelling (no, not that modelling) is when a person learns to reproduce a behaviour that they have observed in someone they see as a model (role model). An example of this would be little girls playing with baby dolls to imitate their mothers nursing a new baby. Whether an individual actually performs a modelled behaviour also depends on the behaviour’s likely outcome. This outcome expectancy is itself often learned through an observational learning mechanism known as vicarious conditioning. In vicarious conditioning, a person learns the consequences of an action by observing its consequences for someone else. For example, a boy would know not to touch a tray of food in the oven if he had just seen his brother touch it and receive a burn.
Well that about wraps it up, I hope you learnt something today!