In one of my previous posts I created a 3D visualization of California pollution data. I promised to explain how I made it, so here goes.

You can download the data from the bottom of this page at California’s Office of Environmental Health Hazard Assessment website. From there you’d save the first sheet as a csv file to make playing with the data easier in R. If you don’t know R, google the R statistical programming language.

For the newbs, first set your working directory to wherever that is on your computer and then read that data in:

setwd("~/Documents/Chicago/Perro")

data <- read.csv("CalEnv.csv")

I wanted the data color-coded by region so, after a bit of googling, I determined that I had to look that up for myself. I found this website that identifies 3-digit zip code regions. That required some manual work, but with only 60 unique 3-digit zip codes regions it wasn’t that hard. I’m from Chicago, so I guesstimated to the best of my ability.

TahoeNV <- c("890", "894")

LA <- c("900", "902", "903", "904", "905", "906", "907", "908", "910",

"911", "912", "913", "914", "915", "916", "917", "918")

SD <- c("919", "920", "921")

SouthEast <- c("922", "923", "924", "925")

Anaheim <- c("926", "927", "928")

MidCal <- c("930", "931", "932", "933", "934", "935", "936", "937", "939")

Bay <- c("940", "941", "943", "944", "945", "946",

"947", "948", "949", "950", "951")

NoCal <- c("952", "953", "954", "955", "956", "957",

"958", "959", "960", "961", "976")

```
```areas <- c("TahoeNV", "LA", "SD", "SouthEast", "Anaheim", "MidCal", "Bay", "NoCal")

zip3digit <- substring(as.character(unique(data[, 1])), 1, 3)

col <- rep(0, length(zip3digit))

`for(i in 1:length(areas)){`

for(j in 1:length(get(areas[i]))){

col[which(zip3digit == get(areas[i])[j])] = i

}

}

We only wanted to see the interplay between race, poverty, the overall pollution score so we selected only those columns:

n.data <- data[, c(37, 39, 25)]

bad.ind <- unique(which(is.na(n.data), arr.ind = T)[, 1])

n.data <- n.data[-bad.ind, ]

col = col[-bad.ind]

Next is the hero of the day, the *rgl* package! Check it out!

library(rgl)

The *plot3d* function is pretty straight-forward, but 1700 tiny dots wasn’t terribly informative so I wanted to turn the dots into spheres whose radii varied depending on the size of the neighborhood. It’s a kind of manifold learning hack that I made up and liked. Below is the function, where *rad* determines the size of the neighborhood to search for neighbors and *scl* determines the size of the spheres:

new.plot3d <- function(data, rad, scl, apply.rad = T,

col, main, cex = 1){

plot3d(data[, 1], data[, 2], data[, 3],

xlab = "Poverty (% below 2x federal poverty level)",

ylab = "Race (% non-White)",

zlab = "Pollution burden (aggregated score)",

col = col, main = main, cex = cex)

```
```

` if(apply.rad == T){`

num.neighbors <- apply(as.matrix(dist(data, upper = T, diag = T)) < rad,

1, sum)

num.neighbors <- num.neighbors / max(num.neighbors) * scl

spheres3d(data[, 1], data[, 2], data[, 3],

radius = num.neighbors,

col = col)

}

}

Here we use the newly defined function. It took a little playing around with the rad and scl to get something that was informative, which was the reason for the new function in the first place.

new.plot3d(data = n.data, rad = 10, scl = 7,

apply.rad = T, col = col,

main = "California Pollution, Poverty, and Race (by zip code)",

cex = 10)

Finally, we add in a legend:

text3d(x = 100, y = 0, z = c(0, 10, 20, 30, 40, 50, 60, 70),

areas, col = "black")

spheres3d(x = 90, y = 0, z = c(0, 10, 20, 30, 40, 50, 60, 70),

col = 1:8, size=5)

Turning this R graphic into the web-friendly WebGL format was just one function! You do lose some resolution (click on the *index.html* file in the the new folder to see how it looks in a browser), but given the ease of use it’s a good trade-off.

writeWebGL()

For a more rigorous rgl tutorial, I’d check out Ross Ihaka’s page here.