There are two ways to summarize data: with numerical summaries or with pictures. For communicating information, a graphical summary is usually best, because people prefer to look at pictures rather than at numbers. There are many ways to visualize data, and which one to choose depends on the nature of the data and the goal of the visualization. We will look at some examples now.
First, let's look at a case where the data are qualitative. That is, the data are not numbers but categories, such as colors. In that case, we would use a pie chart or a dot plot. On the left, there's a pie chart that shows the geographic origin of students at a university on the West Coast. The idea is that each slice of the pie is sized according to the corresponding percentage. The right side shows the dot plot of the same data. Here each horizontal line corresponds to one category, and the position of the dot corresponds to the percentage. For example, if we look at Oregon, this dot corresponds to roughly 20 percent.
Each of the two displays has advantages and disadvantages. For example, if we want to compare two categories, it's easier to do that in the dot plot. If we look at the proportion of international students and the proportion of students from other US states, then in the pie chart it's very difficult to see which one is bigger, whereas it's very easy to see in the dot plot, because it's much easier to judge a position along a horizontal line. On the other hand, the pie chart makes it easier to see what fraction of the total a category represents. For example, if we look at Oregon, we see that its slice is a little less than a quarter of the whole pie. That would be harder to see in the dot plot, because we would first have to look down at the horizontal axis to read off the value.
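To make the two displays concrete, here is a minimal Python sketch of what each one encodes. The counts are invented for illustration and are not the data behind the lecture's figures; they are only chosen so that Oregon comes out at 20 percent, as in the example. The function names are likewise hypothetical.

```python
# Hypothetical counts, invented for illustration; not the lecture's data.
counts = {
    "California": 450,
    "Oregon": 200,
    "Washington": 150,
    "Other US states": 110,
    "International": 90,
}


def to_percentages(counts):
    """Convert category counts to percentages of the total."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


def pie_slice_angles(percentages):
    """Pie chart: each slice gets a share of the 360 degrees equal to its percentage."""
    return {category: 360 * pct / 100 for category, pct in percentages.items()}


def text_dot_plot(percentages, width=50, max_percent=50):
    """Dot plot: one horizontal line per category, dot position set by the percentage."""
    label_width = max(len(category) for category in percentages)
    lines = []
    for category, pct in percentages.items():
        # Clamp so that a percentage above max_percent still fits on the line.
        position = min(width, round(pct / max_percent * width))
        lines.append(f"{category:<{label_width}} |{'.' * position}o  {pct:.0f}%")
    return lines


percentages = to_percentages(counts)
angles = pie_slice_angles(percentages)

for line in text_dot_plot(percentages):
    print(line)
```

In the printed dot plot, the lines for the two smallest categories differ visibly in length, which is the comparison that is hard to make from slice angles alone.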
Now, when the data are quantitative, that is, they are numbers, the convention is to put the data on the number line. The reason is that the ordering of the numbers and the distance between them convey important information. The standard graph for displaying such data is the bar graph. It's essentially a dot plot turned on its side. In this example, it shows the number of assignments completed by 22 students in a class.
The histogram is very similar to a bar graph, but it allows blocks of different widths. For example, this histogram shows the ages of a number of people. The key point is that the areas of the blocks are proportional to frequency, which means that the total area corresponds to 100 percent. For example, if we're interested in what percentage of people fall in the age group from 60 to 80, then we are interested in the area of this block. Just looking at the picture, we see that this area is probably around one-seventh of the total area. So we can conclude that roughly 14 percent of the people fall into the range from 60 to 80. The percentage falling into a block can thus be figured out without a vertical scale, just from the fact that the total area has to be 100 percent.
Sometimes, though, it is useful to have a vertical scale. It's called the density scale, and its unit is percent per year. The reason is simply that the area of a block corresponds to a percentage, and the area is computed as width times height. Since the width is given in years, the height has to be in percent per year for the units to work out.
There are two kinds of information one can get out of a histogram. The first is density, which essentially means crowding. The height of a block tells us what percentage of the subjects there are per unit on the horizontal scale. For example, the highest point in the histogram, which is around here, corresponds to people of age 19. Looking at the vertical axis, we see that about four percent of all people are of that age. In contrast, if we look at the block between 60 and 80, we see that its height is maybe 0.007, that is, 0.7 percent per year. That means about 0.7 percent of the subjects fall into each one-year age group in that range.
The second piece of information one can get from the histogram is percentages. Remember the rule: area corresponds to percentage, and the area of a block is height times width. So, for example, to figure out what percentage of people fall in the age range between 60 and 80, we take the width of the block, which is 20 years, and multiply it by the height, which is 0.7 percent per year. We find that 14 percent of the subjects fall in that range. As we've seen before, there is another way to figure this out. We can eyeball that this area makes up roughly one-seventh of the total area, so roughly one-seventh of the subjects, about 14 percent, fall in that range.
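The area rule can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the bin edges and counts in the second half are invented for illustration (chosen so that the 60 to 80 block holds 14 percent, as in the example). Only the width of 20 years, the height of 0.7 percent per year, and the one-seventh estimate come from the lecture.

```python
def block_percentage(left, right, height_pct_per_unit):
    """Area of a histogram block: width times height, giving a percentage."""
    return (right - left) * height_pct_per_unit


def density_heights(edges, counts):
    """Heights on the density scale: percent in each block divided by its width."""
    total = sum(counts)
    heights = []
    for left, right, count in zip(edges[:-1], edges[1:], counts):
        percent_in_block = 100 * count / total
        heights.append(percent_in_block / (right - left))
    return heights


# Values from the lecture: block from 60 to 80, height 0.7 percent per year.
pct_60_80 = block_percentage(60, 80, 0.7)
eyeball_estimate = 100 / 7  # "roughly one-seventh of the total area"
print(f"width times height: {pct_60_80:.1f} percent")
print(f"one-seventh of the total: {eyeball_estimate:.1f} percent")

# Hypothetical age bins of unequal width and invented counts.
edges = [0, 10, 20, 40, 60, 80]
counts = [12, 18, 36, 20, 14]
heights = density_heights(edges, counts)

# Summing width times height over all blocks recovers the total area.
total_area = sum(
    block_percentage(left, right, height)
    for left, right, height in zip(edges[:-1], edges[1:], heights)
)
print(f"total area: {total_area:.1f} percent")
```

Dividing by the width is what lets blocks of different widths share one vertical scale: a wide block with many subjects can still be low if those subjects are spread over many years.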