We solved the scaling problem with Amazon EC2 Auto Scaling. But now we've got a bit of a traffic problem, don't we? Let's take a look at the situation. When customers come into the coffee shop, right now they have three cashiers they can go to in order to place an order, and oddly enough, most of them are lining up in one line, causing an uneven distribution of customers per line, even though we have other cashiers standing around doing nothing, waiting to take orders. Customers come in and aren't sure exactly where to route their order, so it would help a lot if we added a host to the situation. A host stands at the door, and when customers come into the coffee shop, the host tells them which line to proceed to for placing their order. The host keeps an eye on the cashiers taking orders and counts the number of people in line each cashier is serving. The host then directs new customers to the cashier with the shortest line, as that cashier is the least bogged down, allowing the lines to stay even across cashiers and helping customers be served as efficiently as possible.

The same idea applies to your AWS environment. When you have multiple EC2 instances all running the same program to serve the same purpose, and a request comes in, how does that request know which EC2 instance to go to? How can you ensure there's an even distribution of workload across EC2 instances, so that not just one is backed up while the others sit idle? You need a way to route requests to the instances that will process them. What you need to solve this is called load balancing. A load balancer is an application that takes in requests and routes them to the instances to be processed. Now, there are many off-the-shelf load balancers that work great on AWS. If you have a favorite flavor that already does exactly what you want, then feel free to keep using it. In that case, it will be up to your operations team to install, manage, update, and scale it, as well as handle failover and availability.
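The "host" in the analogy is following a least-connections routing strategy. Here's a minimal sketch of that idea in Python, under stated assumptions: the class and the instance names are hypothetical, and a real load balancer tracks connections across a distributed fleet, not in a single dictionary.

```python
class LeastConnectionsBalancer:
    """Toy model of the coffee-shop host: route each new request
    to the instance with the shortest 'line'."""

    def __init__(self, instances):
        # Track how many in-flight requests each instance is serving.
        self.outstanding = {instance: 0 for instance in instances}

    def route(self):
        # Direct the new request to the instance with the fewest
        # outstanding requests (the shortest line).
        target = min(self.outstanding, key=self.outstanding.get)
        self.outstanding[target] += 1
        return target

    def complete(self, instance):
        # An instance finished a request; its line gets shorter.
        self.outstanding[instance] -= 1


lb = LeastConnectionsBalancer(["i-0aaa", "i-0bbb", "i-0ccc"])
first = lb.route()   # an idle instance
second = lb.route()  # a different, still-idle instance
```

Because the balancer always picks the least-loaded instance, requests spread evenly instead of piling up on one instance while the others sit idle.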
It's doable, but odds are what you really need is just to properly distribute traffic in a high-performance, cost-efficient, highly available, automatically scalable system that you can just set and forget. Introducing Elastic Load Balancing. Elastic Load Balancing, or ELB, is one of the first major managed services we're going to talk about in this course. It's engineered to address the undifferentiated heavy lifting of load balancing.

To illustrate this point, I need to zoom out a bit here. To begin with, Elastic Load Balancing is a regional construct, and we'll explain more about what that means in later videos. But the key value for you is that because it runs at the Region level rather than on individual EC2 instances, the service is automatically highly available with no additional effort on your part. ELB is also automatically scalable. As your traffic grows, ELB is designed to handle the additional throughput with no change to the hourly cost. When your EC2 fleet auto-scales out, as each instance comes online, the auto-scaling service lets the Elastic Load Balancing service know that it's ready to handle the traffic, and off it goes. When the fleet scales in, ELB first stops all new traffic to the instances being removed and waits for the existing requests to complete, or drain out. Once they do, the auto-scaling engine can terminate the instances without disruption to existing customers.

ELB is not only used for external traffic. Let's look at the ordering tier and how it communicates with the production tier. Right now, each front-end instance is aware of each backend instance. If a new backend instance comes online in this current architecture, it would have to tell every front-end instance that it can now accept traffic. This is complicated enough with just half a dozen instances. Now imagine you have potentially hundreds of instances on both tiers, each tier shifting constantly based on demand. Just keeping them networked efficiently becomes nearly impossible.
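The scale-in sequence described above, stop new traffic, drain in-flight requests, then terminate, can be sketched like this. All names here are hypothetical stand-ins, and the drain loop is a simplification of what the managed service does for you.

```python
class Balancer:
    """Toy balancer that separates 'eligible for new traffic'
    from 'still has requests in flight'."""

    def __init__(self):
        self.in_service = []   # instances eligible for NEW requests
        self.outstanding = {}  # in-flight request counts per instance

    def register(self, instance):
        self.in_service.append(instance)
        self.outstanding.setdefault(instance, 0)

    def deregister(self, instance):
        # Stop routing new requests; existing ones keep running.
        self.in_service.remove(instance)

    def route(self):
        target = min(self.in_service, key=lambda i: self.outstanding[i])
        self.outstanding[target] += 1
        return target

    def complete(self, instance):
        self.outstanding[instance] -= 1


def drain_and_terminate(balancer, instance, terminate):
    balancer.deregister(instance)          # 1. no new traffic
    while balancer.outstanding[instance]:  # 2. wait for in-flight requests
        balancer.complete(instance)        #    (stand-in for real completions)
    terminate(instance)                    # 3. now safe to terminate
```

The point of the ordering is step 3: the instance is only terminated after its outstanding request count reaches zero, so no customer's in-progress request is cut off.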
Well, we solve the backend traffic chaos with an ELB as well. Because ELB is regional, it's a single URL that each front-end instance uses. The ELB then directs traffic to the backend instance that has the least outstanding requests. Now, if the backend scales, once the new instance is ready, it simply tells the ELB that it can take traffic, and it gets to work. The front end doesn't know, and doesn't care, how many backend instances are running. This is a truly decoupled architecture. There's even more that ELB can do that we'll learn about later, but this is good enough for now. The key is choosing the right tool for the right job, which is one of the reasons AWS offers so many different services. Take backend communication, for example: there are many ways to handle it, and ELB is just one method. Next, we'll talk about some other services that might work even better for some architectures.
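The decoupling can be sketched as follows. This is a conceptual model with hypothetical names, not the ELB API: the front end holds a single reference (standing in for the one regional ELB URL), so backends can register and scale behind it without the front-end code ever changing.

```python
class Elb:
    """Stands in for the single regional ELB URL the front end uses."""

    def __init__(self):
        self.backends = []

    def register(self, backend):
        # A new backend announces it is ready to take traffic.
        self.backends.append(backend)

    def handle(self, request):
        # Route to the backend with the fewest outstanding requests.
        backend = min(self.backends, key=lambda b: b.outstanding)
        backend.outstanding += 1
        return backend.process(request)


class Backend:
    def __init__(self, name):
        self.name = name
        self.outstanding = 0

    def process(self, request):
        return f"{self.name} handled {request}"


elb = Elb()
elb.register(Backend("order-1"))

def front_end(request):
    # The front end only ever talks to the ELB, never to a backend directly.
    return elb.handle(request)

r1 = front_end("latte")
elb.register(Backend("order-2"))  # backend scales out; front_end is unchanged
r2 = front_end("mocha")
```

Notice that `front_end` never changed when the second backend came online; that is the decoupling the single-URL design buys you.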