Going beyond these simple regressions. So just revisiting that term: if someone says "I've done a simple regression," what they mean is they've run a regression model with a single input. Going beyond simple regression, we get to multiple regression. Now the fact is that the world can be a pretty complicated place. So let's go back and think about our car example. In that one we were interested in looking at the association between the weight of a car and its fuel economy. But the weight of the car isn't the only driver, so to speak, of the fuel economy of a car. There are other features of the car that are important as well. One of those other features would be the size of the engine: the bigger the engine, the less fuel efficient the vehicle is, and the more gallons need to go into the tank. When we start thinking about additional predictor variables beyond a single one, then what we're talking about is a multiple regression. So the idea is that the world's a pretty complicated place, and we might need a somewhat complicated model to adequately capture how the world or the business process is working; multiple regression gives us a chance to do that. And so in the fuel economy data set, we might be interested in adding horsepower to the model as an additional predictor variable. Talking about the diamonds data set, we could use the weight of the diamond as a predictor of price. But people who are into diamonds know there are four C's associated with diamonds. We have talked about the weight, which is carats, but there is also color, cut, and clarity. And those other features, if we incorporated them, might enhance the quality of our regression model. And so in the diamonds data set, as a second step, I might be interested in introducing color to the model. When I add in additional predictor variables, what I'm doing is running a multiple regression.
So formulaically, to show you what a multiple regression would look like if I had two variables, let's call them X1 and X2 in general. Then, with two variables, our equation for the regression is the expected value of Y, so I'm discussing the formula at the bottom of the slide. The expected value of Y, that means the average of Y, the mean of Y, is now a function of two variables, X1 and X2. The vertical bar there between the Y and the X1, we articulate that as "given." The expected value of Y given X1 and X2 is equal to: we start off with our straight-line formulation, b-naught plus b1 times X1, but now with an additional variable we just throw in plus b2 times X2. So that's our formulation of the multiple regression: E(Y | X1, X2) = b0 + b1 X1 + b2 X2. In our spreadsheet or data set we would have one column that contained Y, a second column that contained X1, and a third column that contained X2. Our data would be in that format, and then we would run our multiple regression. And what the method of least squares would produce for us are the estimates b0, b1, and b2. So that will give us our multiple regression equation. So let's now have a look at that multiple regression model in a little bit more detail. I've added weight and horsepower as predictors of fuel economy. When I do that, I end up with a new model as compared to the simple regression. I've got the expected value of fuel economy, as measured by gallons per thousand miles in the city. So the regression, again, gives us a model for the average of Y; I've got the expected value of fuel economy now, given the weight and horsepower of the vehicle. And using the method of least squares, we get estimates for the coefficients: 11.68 for the intercept, plus 0.0089 times weight, plus 0.0884 times horsepower. You can use this equation to do prediction. You come to me with a vehicle that weighs 2,000 pounds and has 300 horsepower; I can take those values, plug them into the regression equation, and predict the fuel economy of the vehicle.
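As a quick sketch, that prediction step can be written out in code. The coefficient values are the ones quoted above; the function name and units convention are just for illustration.

```python
def predict_gp1000m(weight_lb, horsepower):
    """Predicted city fuel consumption in gallons per 1,000 miles,
    using the coefficients quoted in the lecture:
    E[fuel | weight, hp] = 11.68 + 0.0089*weight + 0.0884*hp."""
    b0, b1, b2 = 11.68, 0.0089, 0.0884
    return b0 + b1 * weight_lb + b2 * horsepower

# The example vehicle from the lecture: 2,000 lb, 300 horsepower.
print(round(predict_gp1000m(2000, 300), 2))  # → 56.0
```

So that hypothetical car is predicted to use about 56 gallons to travel 1,000 city miles.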
Note, though, that once we increase the dimension of the problem, going from simple regression to multiple regression, if we want to visualize what's going on, which can be very useful, we're going to need three dimensions to do that, to look at the raw data in all its glory, so to speak. We have three variables to deal with: weight, horsepower, and fuel economy. That means we need three dimensions to have a look at the data. Hence the picture that you can see at the bottom right of the slide is a three-dimensional picture. Each point represents a car, and it has coordinates for weight, horsepower, and fuel economy. And what the multiple regression model is doing is no longer fitting just a line through the data; that doesn't make sense once we're in higher dimensions. It puts the best-fitting plane through the data. So the analog of the line in the simple regression is a plane in the multiple regression. It's still fit through the method of least squares: this is the plane that best fits the data in the sense that it minimizes the sum of the squares of the vertical distances from the points to the plane. So there's our least squares plane. Now, I said that we can use this plane for doing forecasting, but we still have our one-number summaries around. Those one-number summaries are R2 and RMSE. If we calculate R2 for this multiple regression, it comes out to be 84%. In the simple regression model it was 76%, so our R2 has increased: we've explained more variation by adding in this additional variable. And we've also reduced the value of root-mean-square error. Root-mean-square error is now only 3.45, so if we wanted to, we could create an approximate 95% prediction interval for the fuel economy of a vehicle, as long as its weight and horsepower are in the range described within this data set; that is, as long as we're interpolating rather than extrapolating outside the range.
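To make the least-squares plane, R2, and RMSE concrete, here is a minimal sketch using NumPy. The toy data below is made up for illustration; it is not the lecture's actual cars data set, so the fitted numbers will differ from the 84% and 3.45 quoted above.

```python
import numpy as np

# Hypothetical toy data (NOT the lecture's data set):
# weight (lb), horsepower, and city fuel use (gallons per 1,000 miles).
weight = np.array([2300, 2800, 3400, 3900, 4500], dtype=float)
hp     = np.array([ 110,  150,  200,  240,  310], dtype=float)
y      = np.array([  33,   38,   45,   50,   58], dtype=float)

# Design matrix with an intercept column: E[y | x1, x2] = b0 + b1*x1 + b2*x2.
X = np.column_stack([np.ones_like(weight), weight, hp])

# Least squares finds the plane minimizing the sum of squared
# vertical distances from the points to the plane.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ b
resid = y - fitted
r2 = 1 - resid.var() / y.var()      # share of variation explained
rmse = np.sqrt(np.mean(resid**2))   # typical size of a residual

print("coefficients b0, b1, b2:", b)
print("R2:", round(r2, 3), " RMSE:", round(rmse, 3))
```

The same one-number summaries from the simple regression carry over unchanged: R2 is still the fraction of variation in y explained, and RMSE is still the typical vertical miss, only now measured against a plane instead of a line.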
So as long as we are interpolating, we can use our 95% prediction interval rule of thumb again and go up to the plane plus or minus twice the root-mean-square error. So this regression model will give us 95% prediction intervals with a width of about plus or minus seven, twice 3.45. That's the precision with which we can predict based on the current model. So, through the prediction interval we get a sense of the uncertainty in our forecast. Root-mean-square error really is a critical summary of these regression models. So just summarizing this slide: multiple regression allows us to estimate this least squares plane. Once we've got this multiple regression equation, we can use it for prediction. So long as we have a root-mean-square error estimate, which we do, and we're working within the range of the data, we can put the two things together, the forecast and the root-mean-square error, to come up with a 95% prediction interval. So, there's a brief look at multiple regression. It's the technique that one would use to make more realistic quantitative models of business processes.
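The plus-or-minus-twice-RMSE rule of thumb is simple enough to sketch directly. The RMSE of 3.45 is the value quoted above; the forecast of 56.0 gallons per 1,000 miles is the hypothetical 2,000 lb, 300 hp vehicle worked through earlier.

```python
def prediction_interval_95(forecast, rmse):
    """Rule-of-thumb approximate 95% prediction interval:
    forecast plus or minus twice the root-mean-square error."""
    return forecast - 2 * rmse, forecast + 2 * rmse

lo, hi = prediction_interval_95(56.0, 3.45)
print(f"approx. 95% interval: ({lo:.1f}, {hi:.1f})")  # → (49.1, 62.9)
```

Remember the caveat from above: this rule of thumb is only trustworthy when the inputs sit inside the range of the data used to fit the plane, i.e. when we are interpolating.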