The three examples that we've seen so far have all exhibited approximately linear relationships, but the fact is that sometimes when you collect data and plot it, a line might not be an appropriate summary of the underlying relationship. I'm now looking at some data that comes from a product that we have sold at various prices. Price is on the horizontal axis, and on the vertical axis we've got the sales, the volume, the quantity sold of that product. What we can see here is that as price goes up, quantity sold goes down, and vice versa: if the price is cheap, we sell a lot. But when you look at those data points and ask whether a line fits them well, there's a sense that there's some curvature; we'd describe the data as exhibiting curvature. I have fit a linear model through the data, and that's what you can see in the bottom right-hand picture. I can fit a line through the data, but just because I can fit a line through the data doesn't mean that it makes a lot of sense. One of the most important activities you can do when you're running a regression is to look at the data, and I say always, always plot the data. By plotting the data here, I can see that a regression line, a linear model, isn't a particularly good description of the underlying relationship. It misses a lot of the points; in particular, it provides a very lousy forecast for the times when this product was selling at a low price. Notice how those points at the beginning of the graph are way above the line. And if we go all the way to the other end of the plot, when the price was high, all the points are above the line there as well. So there's a systematic lack of fit of the points to the line, and that tells me that this straight-line model doesn't look appropriate in this situation. This is going to happen a lot in practice: when you look at data, it's not necessarily going to be linear.
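That systematic pattern, points above the line at both ends and below it in the middle, can be checked numerically as well as visually. Here is a minimal sketch in Python using hypothetical price and quantity numbers (the lecture's actual pet-food data isn't reproduced here), fitting a straight line by least squares and inspecting the residuals:

```python
import numpy as np

# Hypothetical pet-food data: price per case, and cases sold at that price.
# (Illustrative numbers only, not the data from the lecture.)
price = np.array([0.75, 0.80, 0.85, 0.90, 1.00, 1.10, 1.20, 1.35, 1.50])
sales = np.array([8178, 7188, 6367, 5679, 4600, 3802, 3194, 2524, 2044])

# Fit a straight line sales = a + b * price by least squares.
b, a = np.polyfit(price, sales, 1)  # polyfit returns highest degree first
fitted = a + b * price
residuals = sales - fitted

# With curved (convex) data, the residuals show a systematic pattern:
# positive at both ends (points above the line), negative in the middle.
print(residuals[0] > 0, residuals[-1] > 0, residuals[4] < 0)
```

A plot is still the primary diagnostic; the residual signs are just a quick numerical confirmation of what the picture shows.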
So that begs the question: what do we do in this situation, when we observe curvature in the data? Well, the good news is that all is not lost; there's something we can do, and that is to consider transformations of the data. Now, when I say transformation, I mean a mathematical function applied to the data. It could be applied to x, it could be applied to y, or it could be applied to both. There are an infinite number of mathematical transformations out there, so which one should you use? That's where a basic knowledge of the key math functions that we discussed in another module really comes into play, and those basic functions are the linear, the power, the exponential, and the log function. Those are the ones we most frequently use when we're thinking about transforming data, and given that the relationship isn't a straight line, the one of those functions that we find used most in practice is the log function. That doesn't mean it's always going to work for you, but it certainly can provide some flexible models. What I've done is taken the data in this particular example (by the way, the product is a pet food, and what we're looking at is the price that the pet food is sold for and the quantity sold, in cases) and applied the log transform to both the price and the quantity sold. In this case, I'm using the natural logarithm. When we look at the data on the log scale, you can see that the relationship appears much more linear than it did on the original scale of the data. So one of our basic approaches on seeing curvature in data is to consider transforming the data, and if you ask me which transformation you should do, my answer, generically, is going to be: do a transformation that achieves linearity on the transformed scale. How do I know I've achieved linearity?
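The log transform itself is a one-liner. A sketch, again with hypothetical numbers standing in for the pet-food data: take natural logs of both variables, and compare how close each scale is to a straight line (the correlation coefficient is a quick numerical companion to the plot):

```python
import numpy as np

# Hypothetical pet-food data (illustrative, not the lecture's actual numbers).
price = np.array([0.75, 0.80, 0.85, 0.90, 1.00, 1.10, 1.20, 1.35, 1.50])
sales = np.array([8178, 7188, 6367, 5679, 4600, 3802, 3194, 2524, 2044])

# Apply the natural log to both x and y.
log_price = np.log(price)
log_sales = np.log(sales)

# On the log-log scale the relationship is much closer to a straight line.
r_raw = np.corrcoef(price, sales)[0, 1]
r_log = np.corrcoef(log_price, log_sales)[0, 1]
print(round(r_raw, 3), round(r_log, 3))
```

The correlation is stronger (closer to -1) on the log-log scale, which is the numerical face of "the relationship appears much more linear" — though the answer to "how do I know?" remains: plot it.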
Well, the answer is, always, always plot your data. Have a look at it on the transformed scale. When I look at this data on the transformed scale, and that's the plot at the bottom left-hand side here, you see it's approximately linear, and putting a line through the data on the transformed scale seems to make much more sense. So that's how we will typically proceed. Now, the only downside of this so far is that if I go and give a presentation and show people a plot of the data on the log-log scale, with my line on the log-log scale, a lot of people don't like that; they don't understand logarithms. So you often do everyone a favor by taking that model on the transformed scale and back-transforming it to the original scale of the data. When you back-transform the log-log model from the transformed scale to the original scale of the data, you get the graph that you see on the right-hand side. The two graphs are presenting the same data and the same model, but they're doing it on different scales. We fit the straight-line model on the log-log scale, and for presentation purposes we would typically back-transform to the original scale of the data, where my best-fitting line becomes a best-fitting curve. What that says to you is that so long as you're willing to transform your data, you are, with a regression methodology, going to be able to capture all sorts of interesting relationships between variables with your quantitative model. So the log-log model is a pretty good fit to this demand data, the demand data being how the quantity sold depends on the price of the product. Now, I want to show you in a formula what the model looks like that we've just fit. It's a regression model, so it's a model for the mean. I'm looking at the first equation now, and we write that as an expected value. But now we're not working with the sales, we're working with the log of sales.
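The back-transform is just algebra: exponentiating both sides of log(sales) = b0 + b1 * log(price) turns the fitted line into the power curve sales = exp(b0) * price^b1. A sketch with the same hypothetical data (the lecture's own fitted coefficients would play exactly the same role):

```python
import numpy as np

# Hypothetical pet-food data (illustrative, not the lecture's actual numbers).
price = np.array([0.75, 0.80, 0.85, 0.90, 1.00, 1.10, 1.20, 1.35, 1.50])
sales = np.array([8178, 7188, 6367, 5679, 4600, 3802, 3194, 2524, 2044])

# Fit the straight line on the log-log scale.
b1, b0 = np.polyfit(np.log(price), np.log(sales), 1)  # slope, intercept

def predicted_sales(p):
    """Back-transformed model: the line on the log-log scale becomes
    a power curve, sales = exp(b0) * p**b1, on the original scale."""
    return np.exp(b0) * p ** b1

print(round(b0, 3), round(b1, 3))
```

Plotting `predicted_sales` over a grid of prices on the original axes gives the curved fit shown on the right-hand side of the slide; the same coefficients drawn on the log-log axes give the straight line.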
So this regression model is for the expected value of the log of sales, given the price, where we are also doing a log transform on the price. We have: the expected value of the log of sales equals some constant b0 plus b1 times the log of price. I would term this a log-log model: we've got the log of y and the log of x. In this particular instance, when we use the method of least squares to fit the model, we get an intercept of 11.015 and a slope of -2.442. And because it's a log-log model, that slope has an interpretation, as what we call an elasticity; we talked about that in one of the other modules. An elasticity tells you how a percent change in x is associated with a percent change in y, and the -2.442 is telling me that, based on this analysis, as price goes up by 1%, I anticipate sales to fall by 2.442%. So a percent change in x gives a percent change in y; that's the interpretation of the slope in a log-log model. One caveat of that interpretation is that you should only use it for small percent changes. The model that I've just presented shows you how you might go about creating a model for a subsequent optimization. One of the things I've been talking about is how models are frequently inputs to an optimization process. Here the optimization would be: I wonder what the best price would be for this product in order to maximize my profit. You can approach such a question through calculus, but we need some inputs; we need a function to optimize, and the regression model is giving us such a function. It's reverse-engineering the underlying process, or at least an approximation of the process, one that we think adequately captures the association between the outcome, quantity sold, and the input, the price of the product. So you can see how we can create the setting for a subsequent optimization by fitting one of these regression models.
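To make that optimization step concrete, here is a sketch using the fitted coefficients from the lecture (intercept 11.015, slope -2.442) with a hypothetical unit cost per case, which is not from the lecture. For a constant-elasticity demand curve with elasticity b1 < -1, setting the derivative of profit (p - cost) * demand(p) to zero gives a closed-form optimal price, p* = cost * b1 / (1 + b1); a grid search serves as a sanity check:

```python
import numpy as np

# Fitted log-log demand model from the lecture:
# E[log(sales)] = b0 + b1 * log(price)
b0, b1 = 11.015, -2.442

def demand(p):
    """Back-transformed demand curve: expected cases sold at price p."""
    return np.exp(b0) * p ** b1

def profit(p, unit_cost):
    """Profit at price p: margin per case times cases sold."""
    return (p - unit_cost) * demand(p)

unit_cost = 0.60  # hypothetical cost per case (an assumption, not lecture data)

# Calculus: d/dp [(p - c) * exp(b0) * p**b1] = 0  =>  p* = c * b1 / (1 + b1),
# valid when b1 < -1 so that a finite maximizer exists.
p_star = unit_cost * b1 / (1 + b1)

# Numerical sanity check: p_star should beat every price on a fine grid.
grid = np.linspace(0.5 * p_star, 1.5 * p_star, 2001)
best_on_grid = grid[np.argmax(profit(grid, unit_cost))]
print(round(p_star, 3), round(best_on_grid, 3))
```

Note the elasticity condition: with b1 = -2.442, demand is elastic enough that the optimal markup over cost is finite (here roughly 69% over cost); if the fitted slope were between -1 and 0, this model would say to raise price without bound, a signal that the model is being pushed outside the range where it is a sensible approximation.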