On Friday, we’ll talk about Tufte’s ideas for new graphical methods based on the “data-ink” principle.
Here’s an illustration of one of these new methods. Some years ago, I collected the lengths (in seconds) of Beatles songs on different albums.
Here is a basic boxplot display.
Here is Tufte’s improved display (both graphs were created using ggplot2).
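In one version of Tufte’s redesign, the box disappears entirely. As a rough sketch of what such a display encodes, the Python function below computes its pieces: a line from the minimum to the lower quartile, a gap between the quartiles with a dot at the median, and a line from the upper quartile to the maximum. `tufte_boxplot_parts` is a hypothetical helper, not how the graphs above were made. It assumes the whiskers run to the extremes (rather than to 1.5 IQR, as in a standard boxplot) and uses `statistics.quantiles` with `method="inclusive"`, which interpolates the same way as R’s default `quantile()`.

```python
from statistics import quantiles


def tufte_boxplot_parts(values):
    """Return the pieces of a Tufte-style "midgap" box plot.

    Assumption (not taken from the post): the whiskers run to the minimum
    and maximum, and the box is replaced by a gap with a dot at the median.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    # method="inclusive" interpolates like R's default quantile() (type 7)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "lower_whisker": (data[0], q1),   # drawn as a line segment
        "median_dot": q2,                 # drawn as a single point
        "upper_whisker": (q3, data[-1]),  # drawn as a line segment
    }
```

Calling it once per album on the song lengths would give the segments and dots to draw for each album.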
There is a neat way to learn the graphics package ggplot2: visit the ggplot2 web interface at http://yeroon.net/ggplot2/
Here you can …
- load data from either your computer or a Google spreadsheet
- map aesthetics to variables
- add different layers (geoms and stats)
- display the ggplot2 graph and the corresponding R code
- save the completed graph as either a PDF or an SVG file
Here is a link to a graph that I created using this interface:
I wouldn’t recommend using this to create all ggplot2 graphs, but it seems like a useful learning tool.
To illustrate the beautiful graphics one can create with ggplot2, here is a comparison of two famous, recently retired baseball players, Barry Bonds and Ken Griffey Jr. I’ve plotted each player’s OPS (on-base plus slugging, a good measure of hitting) against season. Lowess smoothers show the basic patterns in the hitting trajectories, and a vertical line marks the year 1998.
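A lowess smoother fits a weighted straight line in a moving window around each point, so nearby seasons count more than distant ones. Here is a simplified Python sketch of the idea. `lowess_fit` is a hypothetical helper, and unlike R’s `lowess()` it skips the robustness iterations that downweight outlying points. The `frac` argument plays the role of R’s span `f` (default 2/3): the fraction of points used in each local fit.

```python
def lowess_fit(x, y, frac=2 / 3):
    """Simplified lowess: a weighted local straight-line fit at each x.

    Sketch only. R's lowess() also performs robustness iterations
    (iter = 3 by default); those are omitted here.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    k = max(2, min(n, round(frac * n)))  # points in each local window
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest point (x0 included)
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        weights = []
        for xi in x:
            if h > 0:
                u = abs(xi - x0) / h
            else:
                u = 0.0 if xi == x0 else 1.0
            # Tricube weight: 1 at x0, falling to 0 at the window edge
            weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
        sw = sum(weights)  # always > 0 because x0 itself has weight 1
        xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
        ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
        sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - xbar) * (yi - ybar)
                  for w, xi, yi in zip(weights, x, y))
        # Weighted least-squares line; fall back to a weighted mean if flat
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted
```

With a player’s seasons as `x` and his OPS values as `y`, plotting the fitted values against season traces a smoothed trajectory of the kind shown in the graph.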
What do we see?
- In baseball, a player’s hitting performance generally increases until mid-career (around age 30) and then decreases until retirement. Griffey shows this general pattern: he seemed to decline steadily after 1996.
- Bonds showed an unusual trajectory. He started to decline after 1994, increased from 1998 to 2002, and then declined until his final season in 2007.
- 1998 was a special year in baseball: two home run sluggers, Mark McGwire and Sammy Sosa, both broke Roger Maris’s single-season record of 61 home runs.
This graph raises the obvious question: why did Bonds exhibit such an unusual hitting trajectory? (The answer is well known to most people who have some interest in baseball.)
Evidently, there are problems uploading images to our blogs. Several students emailed me about this, and I confirmed that something is wrong (a strange “missing folder” error message appears).
Here is a workaround:
1. Upload your image to Google Docs and make it public so anyone can view it.
2. Copy the address (URL) of your image in Google Docs.
3. In your blog, put a link to that particular image, like this:
a graph from USToday
This is not an ideal solution, but it works and I’ll be able to view your images.
This year our department is assessing our introductory statistics course. We gave a multiple-choice test on statistical concepts to our MATH 1150 students at the beginning of the semester (the PRETEST) and administered the same test at the end of the semester (the POSTTEST). We haven’t received many of the posttest results yet, but here are the proportions of correct answers on each question for the pretest and posttest.
How can we effectively graph these data?
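One option is a dot plot with one row per question and the pretest and posttest proportions marked on a common 0 to 1 scale. The sketch below draws such a plot as text. `text_dot_plot` and its input format (a dict mapping a question label to a (pretest, posttest) pair) are assumptions for illustration, and the actual proportions from the table are not reproduced here. Rows are ordered by pretest proportion so that questions of similar difficulty sit together.

```python
def text_dot_plot(results, width=40):
    """Draw a text dot plot of pretest and posttest proportions correct.

    results: dict mapping a question label to a (pretest, posttest) pair of
    proportions in [0, 1]. 'o' marks the pretest, '*' the posttest, and '@'
    a question whose two proportions round to the same position.
    """
    if not results:
        return ""
    label_width = max(len(str(q)) for q in results)
    lines = []
    # Order rows by pretest proportion (lowest first)
    for q, (pre, post) in sorted(results.items(), key=lambda item: item[1][0]):
        for p in (pre, post):
            if not 0 <= p <= 1:
                raise ValueError(f"proportion out of range for {q}: {p}")
        row = [" "] * (width + 1)  # positions 0..width cover 0.0..1.0
        i_pre, i_post = round(pre * width), round(post * width)
        row[i_pre] = "o"
        row[i_post] = "@" if i_post == i_pre else "*"
        lines.append(f"{str(q):>{label_width}} |{''.join(row)}|")
    return "\n".join(lines)
```

Putting both tests on one shared scale turns each question’s pretest-to-posttest change into a horizontal distance, which is easy to judge by eye.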
We’ll shortly look at dot plots from a graphical perception viewpoint. There is an interesting graph in the book that shows popular classical composers. Looking for a related dataset, I found an interesting site that references a dataset ranking classical composers on two criteria: their popularity, as measured by the number of CDs sold on amazon.com, and their prominence among classical music scholars.
Here’s one graph of the amazon sales: