Data Analysis and R Resources

Lately I have been working at home to pick up new research skills. I am working on statistical methods for data analysis, such as regression and multilevel models. The main tool I am learning is the R programming language. I am using Microsoft's distribution of R because it can use all four cores in my computer's processor, and I run it inside RStudio as my development environment. I am also learning Tableau and CartoDB for data visualization and visual analysis.

I am working my way through DataCamp's online R course and using several books as guides. I link to the books below.

Along the way, I am practicing these new skills with real data on wildfires in my community. You can find my dashboard for exploring the fire data here. Tonight I am going to my community's town hall meeting to meet the fire chief, discuss whether this kind of dashboard would be useful in our community, and hear how he thinks I could add to or improve it. So, there is much to come as I make progress!

For example, I am now adding precipitation data to the fire data and will explore any correlations that exist. If I find anything interesting I'll add it to the dashboard.
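As a rough sketch of what such a correlation check could look like (in Python here, not the R used for the actual analysis): the yearly fire counts and precipitation totals below are made-up placeholder values, and `pearson_r` is a hypothetical helper name, so nothing here reflects the real fire or weather data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values for illustration only (NOT real fire or weather data).
fires_per_year = [12, 30, 18, 45, 9]
precip_inches = [16.0, 11.0, 14.5, 9.0, 18.0]

r = pearson_r(precip_inches, fires_per_year)
print(f"Pearson r = {r:.2f}")
```

A value of r near -1 or +1 would suggest a strong linear relationship and a value near 0 a weak one. With only a few dozen years of data, any such number should be read cautiously.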

  • R Graphics Cookbook, by Winston Chang
  • R Cookbook (O'Reilly Cookbooks), by Paul Teetor

Interactive Data - HWY 285 Corridor Fires 1980-2014

UPDATE 4/5/2016: I have updated the dashboard with all the relevant fire data and added new interactive charts. Click the link to go to the dashboard and see the improvements!

 

In my previous post I played around with R and CartoDB to map fire data around the Highway 285 corridor. Then I discovered the online interactive data visualization suite Tableau. Using the same U.S. Forest Service data as before, I used Tableau to build a dashboard of interactive visualizations. Clicking any element filters all the other elements. There is even a timeline slider at the top that adjusts all the visualizations.

Keep in mind that I built this as a proof-of-concept, just to see if I could do it, and to see if others are interested in it. If this dashboard were to be used by the county or any other professional entity, I would first need to verify that I have all the relevant data built in. Presently, this only uses U.S. Forest Service data.

The dashboard is too large for this blog post, so I put it on its own page.

Click here to visit the dashboard and explore the fire data!

Here is a screenshot of it just to tease you:

Hwy 285 corridor fire data 1980-2014

UPDATE 4/4/2016: I have updated all the fire charts with the relevant data from USGS. The charts are now in an interactive dashboard on my 285 Fire Data page. You can access it via the research portfolio menu, or you can just click here to go straight to the dashboard.

 

The first map is an interactive map of human-caused and natural fires in Colorado's Front Range between 1980 and 2014. The data are a combination of U.S. Forest Service and Bureau of Land Management reports. Hover your mouse over each point to see information specific to that fire. The two colors distinguish human-caused fires from natural ones. Summit County appears to have a higher ratio of human-caused fires, Jefferson County a higher ratio of natural fires, and Park County a more even balance. In general, the northwest part of the Highway 285 corridor shows a higher ratio of human-caused fires while the southeast part tends toward natural fires.

Fire data were downloaded from the USGS here. County boundaries are from the Census Counties 2010 data at Colorado Information Market Place here.

I used the statistical programming language R to create this stacked bar plot, which shows how many fires of each cause occurred each year between 1980 and 2014. This bar plot uses only the U.S. Forest Service data.

Fires by cause per year in the highway 285 corridor between 1980-2014. Data from the U.S. Forest Service.
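The counting behind a stacked bar plot like this one can be sketched as follows. This is a Python illustration under stated assumptions, not the R code that produced the chart: the `(year, cause)` record layout, the function name `count_by_year_and_cause`, and the sample records are all hypothetical.

```python
from collections import Counter


def count_by_year_and_cause(fires):
    """Count fires for each (year, cause) pair.

    Each count corresponds to one colored segment of one stacked bar.
    """
    return Counter((year, cause) for year, cause in fires)


# Placeholder records for illustration only (NOT the Forest Service data).
sample = [
    (2000, "human"),
    (2000, "natural"),
    (2000, "human"),
    (2002, "natural"),
]

counts = count_by_year_and_cause(sample)
for year in sorted({y for y, _ in counts}):
    # Collect the segments that make up this year's bar.
    segments = {cause: n for (y, cause), n in sorted(counts.items()) if y == year}
    print(year, segments)
```

Each year becomes one bar, and the per-cause counts for that year are stacked on top of each other to give the bar its total height.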

The second map uses only the U.S. Forest Service data, and all circles are the same color. This time, circle size carries the information: all fires were clustered into five groups by total acres burned, and each group gets one of five circle sizes, from the smallest circle for the smallest fires to the biggest circle for the largest. So, although each circle is a distinct fire, its size shows only which cluster the fire falls in, not its exact acreage. Just based on the visuals, I don't see any interesting patterns. Do you?

Because the map above uses Jenks optimization to group fires into only five size clusters, it cannot show how rare the really large fires are: each circle size has to cover fires of vastly different sizes. The next image instead uses the USFS fire size classes to show the number of fires per year by class.
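To make the clustering idea concrete, here is a small Python sketch of the natural-breaks criterion that Jenks optimization is based on: sort the values, then choose the split into k contiguous classes that minimizes the total within-class sum of squared deviations. It is a plain dynamic-programming illustration, not the implementation the mapping tool uses, and the acreage values are placeholders rather than real fires.

```python
def jenks_classes(values, k):
    """Split values into k contiguous classes that minimize the total
    within-class sum of squared deviations (natural-breaks criterion)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums give the squared deviation of data[i:j] in constant time.
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(data):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total SSD when data[:j] is split into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk the split table backwards to recover the classes.
    classes = []
    j = n
    for c in range(k, 0, -1):
        i = split[c][j]
        classes.append(data[i:j])
        j = i
    return classes[::-1]


# Placeholder acreages for illustration only (NOT real fire sizes).
acres = [0.1, 0.5, 2, 3, 40, 55, 900, 1200, 138000]
for group in jenks_classes(acres, 3):
    print(group)
```

With skewed data like this, a single very large value tends to claim a class of its own while small fires of quite different sizes end up sharing one, which is the limitation described above.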

Fires by class size per year between 1980-2014 in the highway 285 corridor. U.S. Forest Service data. 2002 is the only year when a fire of each class size occurred. Class G fires have only occurred four times.

Here are the sizes for each class:

  • Class G - 5,000 acres or more
  • Class F - 1,000-4,999 acres
  • Class E - 300-999 acres
  • Class D - 100-299 acres
  • Class C - 10-99 acres
  • Class B - 0.26-9.9 acres
  • Class A - 0-0.25 acres
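The class boundaries listed above translate directly into a small lookup. The following Python sketch uses a hypothetical function name, `fire_size_class`, and assumes that fractional acreages falling between two listed ranges (for example 9.5 acres) belong to the lower class.

```python
def fire_size_class(acres):
    """Return the size class letter (A-G) for a fire of the given acreage."""
    if acres < 0:
        raise ValueError("acres cannot be negative")
    if acres <= 0.25:
        return "A"
    # (exclusive upper bound in acres, class letter), smallest class first.
    # Assumption: values between listed ranges fall into the lower class.
    bounds = [(10, "B"), (100, "C"), (300, "D"), (1000, "E"), (5000, "F")]
    for upper, letter in bounds:
        if acres < upper:
            return letter
    return "G"


for size in (0.1, 5, 42, 250, 640, 4500, 12000):
    print(size, fire_size_class(size))
```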

The third map uses an SQL query to isolate only the fires that burned more than 4,000 acres. I chose the 4,000-acre cutoff because it represents a natural break in the data set. Out of 2,830 Forest Service records, six fires are this large, and five of those six were human-caused. Hover your mouse over each one to get the details. (In the whole data set, there were two fires between 1,000 and 3,999 acres, and all the rest, 2,821, burned under 1,000 acres.) All of the fires over 4,000 acres were south of Highway 285 (remember that I am not plotting fires north of I-70, west of Highway 9, east of I-25, or south of Highway 24).
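A query of this kind can be sketched with Python's built-in `sqlite3` module. This is only a stand-in for whatever database the map itself runs on: the table name `fires`, its columns (`name`, `acres`, `cause`), the helper `large_fires`, and the sample rows are all hypothetical.

```python
import sqlite3


def large_fires(conn, min_acres=4000):
    """Return (name, acres, cause) rows for fires larger than min_acres."""
    query = (
        "SELECT name, acres, cause FROM fires "
        "WHERE acres > ? ORDER BY acres DESC"
    )
    return conn.execute(query, (min_acres,)).fetchall()


# In-memory database with placeholder rows (NOT the Forest Service records).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fires (name TEXT, acres REAL, cause TEXT)")
conn.executemany(
    "INSERT INTO fires VALUES (?, ?, ?)",
    [
        ("Fire A", 12.5, "natural"),
        ("Fire B", 4500.0, "human"),
        ("Fire C", 800.0, "human"),
        ("Fire D", 20000.0, "human"),
    ],
)

for row in large_fires(conn):
    print(row)
```

Passing the cutoff as a query parameter makes it easy to try other thresholds and see where the natural break in the data falls.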

The fourth map is an animated time-series map that adds fires to the map in chronological order, month by month. The animation takes 30 seconds, and the play/pause button at the bottom lets you start and stop it as you like. This map also uses just the U.S. Forest Service data. Remember, I am only plotting fires in the area bounded by Highway 9 on the west, I-25 on the east, I-70 on the north, and Highway 24 on the south.
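The month-by-month build-up that such an animation shows can be sketched as a running total. This Python example is an illustration only, not how the mapping tool works internally: the function name `cumulative_by_month` and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def cumulative_by_month(fire_dates):
    """Return [(year, month, fires shown so far)] in chronological order.

    Each tuple is one animation frame: the month being added and the
    running total of fires on the map once that month is drawn.
    """
    per_month = Counter((d.year, d.month) for d in fire_dates)
    frames = []
    shown = 0
    for year, month in sorted(per_month):
        shown += per_month[(year, month)]
        frames.append((year, month, shown))
    return frames


# Placeholder dates for illustration only (NOT real fire start dates).
sample_dates = [
    date(1980, 7, 4),
    date(1980, 6, 1),
    date(1980, 7, 20),
    date(1981, 8, 15),
]

for frame in cumulative_by_month(sample_dates):
    print(frame)
```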

Hwy 285 Corridor Tornado Data 1950-2015

We may have fires, but we do not have tornadoes! Having lived in Louisiana, Texas, Arkansas, and Florida, and having lived through one killer tornado (Arkadelphia, AR, March 1997), I find this refreshing to see.

Here is an interactive map I built with the online tool CartoDB, showing all tornado paths between 1950 and 2015. You can scroll across the entire United States on this map.

Data come from NOAA's National Weather Service Storm Prediction Center (NWS SPC).