The third and final learning mechanism that we'll discuss is instrumental conditioning, also known as operant conditioning. This is learning the relationship between actions and rewards and punishments: learning what works and what doesn't, which of your actions lead to positive results and which don't. This is very different from classical conditioning. Classical conditioning is passive: you just sit there, you observe how stimuli interact, you respond to them, and as a result of the co-occurrence of, say, the bell and the food, you come to learn things. Instrumental or operant conditioning is based on your own actions. You act on the world, and then the way the world treats your actions shapes the nature of your future actions. Skinner, although he did not devise the idea of instrumental conditioning, built on it; his major theoretical and experimental research program focused on the extent to which operant conditioning could shape the behavior of humans and of other animals.

The theoretical foundations for operant conditioning were established by the psychologist Edward Thorndike. Thorndike noticed in his studies of animals that they don't seem to learn through sudden insight, but rather through a series of random activities that they get better and better at. His example was putting a cat in a puzzle box. This video explains what Thorndike did:

But how is a new skill learned? That was a question which began to fascinate Thorndike. To answer it, he built some ingenious puzzle boxes from which cats could only escape by operating latches. And in you go. The cat appears to be very clever in engineering its escape, solving the problem with a deftly placed paw and a push of its nose. But Thorndike didn't believe that an animal, even a clever cat, understands the consequences of its behavior.
When he placed a cat in the puzzle box for the first time, Thorndike was unable to see any evidence of flashes of insight. The successful actions appeared first by chance. He showed that the apparent cleverness arose by trial and error, and used graphs to measure the rate of learning. A well-practiced cat quickly recalls the actions that help it escape to its reward of food. If an action brings a reward, Thorndike believed, that action becomes stamped into the mind. In his thesis, he explained further his ideas about learning: that behavior changes because of its consequences. He called this his Law of Effect, which explained how even wild creatures develop new habits.

The cat eventually comes to escape from the puzzle box, but not through figuring it out, Thorndike argued. Rather, the cat does all sorts of different activities, and in the end just one of them, in this case pulling the lever, is reinforced: it gives the cat the payoff that it wants. Thorndike summarized what was going on here as the law of effect: the tendency to perform an action is strengthened if rewarded, weakened if not. The law of effect leads animals gradually to the correct behavior in certain situations.

Now, as I said, Skinner famously extended and developed the principles of operant conditioning. So let me work through some examples. Suppose you have to train a pig. How would you train a pig? What you would do is reinforce it. There are two types of reinforcement: positive reinforcement, giving the animal something it wants, and negative reinforcement, releasing it from something aversive. So, for instance, if the pig had a heavy object on it that was causing it pain, you could reward it by taking that pain away, by removing the object.
Negative reinforcement and positive reinforcement are two sorts of reinforcement: things you do to an animal to increase the likelihood of a behavior in the future. Punishment, by contrast, is a way to decrease the animal's likelihood of doing that behavior in the future. So, how can you use this to train the pig? Well, if the pig is doing something you want it to do, you reinforce it, you reward it, and if it's doing something you don't want it to do, you punish it. But that's really limited, because suppose you wanted to make the pig do something it's never done before, like dance. You can't just wait for it to dance and then reinforce it; it will never do that. So what you do is what Skinner described as shaping. For instance, when the pig moves in a certain way that approximates dancing, you reward it, and now it'll start to do that. Then when it moves in that way and another way, looking even more like dancing, you reward it again. In other words, as the pig gradually approximates the behavior you want it to do, you reward it along the way.

Skinner called this shaping, and he often described it as analogous to natural selection. It's not as if animals all of a sudden evolve an eye, and then that leads to increased survival and reproduction; that would be magic. The way eyes evolve is that animals evolve something that very roughly approximates an eye, which leads to increased survival and reproduction, and then step by step by step they evolve complex structures. Skinner argued that the same thing happens with complex behaviors, and he demonstrated that you could use the techniques of shaping to train animals to do all sorts of things. You could train pigeons to play ping-pong, for example. Or take children.
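The logic of shaping can be sketched as a small simulation. This is an illustrative toy model of my own, not anything specified in the lecture: an agent emits behaviors that vary randomly around its current habit, and whenever a behavior crosses the current criterion, it is reinforced, so the habit shifts toward that behavior and the criterion is raised a notch toward the target.

```python
import random

def shape(target=1.0, step=0.2, trials=2000, seed=0):
    """Toy model of shaping by successive approximation.

    All numbers here (target, step size, variability) are arbitrary
    illustrative choices. Behavior is a single number; 'dancing' is
    reaching the target value.
    """
    rng = random.Random(seed)
    baseline = 0.0          # the animal's current habitual behavior
    criterion = step        # how close to the target earns a reward
    for t in range(trials):
        behavior = baseline + rng.uniform(-0.3, 0.3)  # random variation
        if behavior >= criterion:
            baseline = behavior                        # reinforced behavior recurs
            criterion = min(target, criterion + step)  # raise the bar
        if baseline >= target:
            return t + 1    # trials it took to reach the full behavior
    return None             # never got there within the trial budget

print(shape())  # number of trials before the 'pig' is 'dancing'
```

The point of the sketch is the same as the lecture's: the full behavior is never waited for; each reward criterion is only a small step beyond what the animal already does.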
Suppose you want your child, your young toddler, to go to the fridge and bring you back a beer. Well, you're not going to wait for the kid to do it spontaneously and then say, "Good job." Rather, when the child walks to the kitchen, you say, "That's great. You did great walking into the kitchen." Then the child ultimately gets good at that and goes to the fridge: "That's terrific." Gradually, by rewarding each and every step, you can get the child to perform the behavior you wanted. Certainly, people who train animals use the techniques of shaping all the time: they reward approximations of the behavior until, ultimately, the behavior is what they're looking for.

Now, in the examples so far, the reinforcements and punishments were intrinsically reinforcing or punishing: things like food or shock. But in reality, you don't need to restrict yourself to those. You can reinforce and punish in all sorts of ways, even for animals, with things that are not built-in reinforcers or punishments. A dog can be rewarded by a pat on the head or by saying, "Good dog." Humans, of course, will work for money, strips of paper that themselves have no reinforcing properties, and you can train animals to work for things like poker chips. How could this happen? The way this works is that you combine operant conditioning and classical conditioning. That is, you use classical conditioning to take something neutral, like a poker chip, and associate it with a positive unconditioned stimulus. Pretty soon, the poker chip, through classical conditioning, will come to have rewarding qualities. So, for instance, if every time you give your dog a delicious treat you pat him on the head, pretty soon the pat on the head will become rewarding to the dog. You use classical conditioning to make a pat on the head rewarding; then, when you do your operant conditioning, you can use the pat on the head as a reward.
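One way to sketch how a neutral cue acquires reward value through pairing is with a simple learning-rate update. The update rule below is my own choice of toy model (a Rescorla-Wagner-style rule, commonly used for classical conditioning), not something specified in the lecture; the parameter values are arbitrary.

```python
def condition_secondary_reinforcer(pairings=50, alpha=0.2):
    """Sketch: a neutral cue (pat on the head, poker chip) gains value
    by repeated pairing with a primary reward (a treat).

    On each pairing, the cue's learned value moves a fraction `alpha`
    toward the primary reward's value, a simplified Rescorla-Wagner
    update. `alpha` and the reward value of 1.0 are illustrative.
    """
    primary_value = 1.0   # e.g. a delicious treat
    cue_value = 0.0       # e.g. a pat on the head, initially neutral
    for _ in range(pairings):
        cue_value += alpha * (primary_value - cue_value)
    return cue_value

print(round(condition_secondary_reinforcer(), 3))  # approaches 1.0
```

After enough pairings the cue's value approaches the primary reward's value, at which point it can itself serve as the reward in operant conditioning.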
Another way in which we can elaborate on this scenario of training a pig is to think about schedules of reinforcement. The simplest case is that every time the pig does something right, you reward it. But real life doesn't work that way, and real learning doesn't work that way. There's all sorts of partial reinforcement in the world, in which we get reinforced some of the time and not all of the time. You can think of this in terms of fixed versus variable, and ratio versus interval, schedules of reinforcement.

A fixed ratio schedule gives a reward after every nth response. Imagine piecework: for every 100 objects you put together, you get a reward. In fact, you can train pigeons to do different activities where you don't reward them every time, just every nth time, where n, even for a pigeon, can be in the hundreds. A variable ratio schedule rewards on average once every n responses. This is how a slot machine works: it doesn't pay out exactly every 100th play, because then players would know exactly when a payout was due, but it pays out roughly once every 100 plays on average. Sometimes it's right away, sometimes it's after 500 plays, but roughly once every 100. Then there are interval schedules, based on time rather than number of responses. A fixed interval schedule rewards the first response after a set amount of time has passed. I check my email, and every once in a while I get a positive email; that could be fixed in some strange world where a positive email arrives every hour, but more likely it's a variable interval: every once in a while, at unpredictable times, I get a positive email.

So, who cares? Why would you want schedules of reinforcement that were partial, that didn't reward every time? Well, the answer is the partial reinforcement effect. The idea is that if you stop reinforcing something, the behavior goes away. If you reward your dog with a pat on the head or a treat every time it does something, and then you stop doing so, it'll stop producing the behavior you're looking for.
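The two ratio schedules described above can be made concrete with a short sketch. The function names and numbers are my own; the point is just the contrast between a reward at exactly every nth response and a reward with probability 1/n per response.

```python
import random

def fixed_ratio(n):
    """Fixed ratio: reinforce exactly every nth response (e.g. piecework)."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        return count % n == 0       # True means this response is rewarded
    return respond

def variable_ratio(n, rng):
    """Variable ratio: reinforce on average once every n responses
    (e.g. a slot machine)."""
    def respond():
        return rng.random() < 1.0 / n
    return respond

rng = random.Random(0)
fr = fixed_ratio(100)
vr = variable_ratio(100, rng)
trials = 10_000
print(sum(fr() for _ in range(trials)))  # exactly trials // 100 rewards
print(sum(vr() for _ in range(trials)))  # roughly 100, varies by chance
```

Both schedules deliver about the same total reward; what differs is predictability, which is exactly what the slot machine exploits.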
But here's the key point: if you train it by reinforcing it every time, then when you stop reinforcing it, its behavior will go away pretty quickly. But if you reinforce it partially, its behavior will stick around for a long time. So, imagine a kid who has tantrums. Suppose you reward the kid every time he has a tantrum; you give in: "Okay, I'll give you a treat, I'll give you a present, whatever." Then you stop doing it. Well, the kid will come to realize the tantrums aren't working anymore, and gradually will stop. But suppose instead, as parents often do, you reward the kid not every time he has a tantrum, but roughly every tenth time: "Oh fine, I can't take it anymore, here's something." Well, now if you stop rewarding the kid, the tantrums will last a very long time.

It's as if (and this isn't the way behaviorists would like to put it, but it's as if) when you reinforce somebody every time, they think, "Okay, I can tell this gets an immediate response," and then when you stop reinforcing, they say, "It's over. This isn't working anymore." But if you reinforced them only occasionally, one out of every 100 times, or roughly once every couple of hours, then when you stop reinforcing them, they say, "Well, I'm not getting reinforced anymore, but maybe this is more of that partial reinforcement. Maybe my reward is just around the corner." In fact, the kid having tantrums might say to you, "I'm going to keep having tantrums, because sure, you're not rewarding me now, but eventually you will." Maybe it's not so surprising that animals like chimpanzees and dogs show this, and that kids show this, but interestingly, even rats and pigeons seem to show it. If you want to make a behavior last, don't reinforce it every time; reinforce it intermittently.

So, we've reviewed three general learning mechanisms. We talked about habituation, classical conditioning, and operant conditioning.
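The intuition in the tantrum example can be turned into a deliberately simple toy model; this is my own illustration, not a standard model from the learning literature. The idea: during training, the learner implicitly tracks the longest dry spell between rewards it has ever experienced, and during extinction it keeps responding until the current dry spell clearly exceeds anything it saw in training. Continuous reinforcement makes every dry spell short, so extinction is fast; sparse reinforcement makes long dry spells normal, so the behavior persists.

```python
import random

def extinction_persistence(reinforce_prob, training=500, seed=1):
    """Toy model of the partial reinforcement effect.

    During training, record the longest run of unrewarded responses.
    In extinction (no rewards at all), assume the learner responds
    until the dry spell is double the worst it saw in training.
    All modeling choices here are illustrative.
    """
    rng = random.Random(seed)
    longest_gap, gap = 0, 0
    for _ in range(training):
        if rng.random() < reinforce_prob:   # this response was rewarded
            gap = 0
        else:                               # unrewarded response
            gap += 1
            longest_gap = max(longest_gap, gap)
    return 2 * (longest_gap + 1)            # responses before giving up

print(extinction_persistence(1.0))   # continuous reinforcement: quits fast
print(extinction_persistence(0.1))   # partial reinforcement: persists longer
```

Under continuous reinforcement the longest gap is zero, so extinction is immediate; under sparse reinforcement the learner has learned that long droughts are normal, so it keeps going.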
These have been argued to constitute a theory of how animals, including humans, come to behave the way they do, as well as a set of techniques for training animals and training humans. Skinner famously argued that a good society would use the techniques of behaviorism to make people better. So, for instance, instead of thinking about abstract moral principles regarding justice and retribution and so on, if we want to stop crime and encourage generosity and kindness, our societies should figure out ways to reinforce good behaviors and punish bad ones. Behaviorism became an extremely popular and promising idea for shaping the world to make it a better place.