One of the first steps in data analysis, after cleaning and tidying your data, is visualizing it. This is my first strip plot. It shows each incident that includes a fatality, and the number of fatalities in an incident determines how high that incident is plotted. There are four countries in the Lake Chad Basin. Incidents are plotted according to the state they occurred in, and states are grouped by country.
In R, this meant faceting my graph by country. I also had to assign a "sum of fatalities" aggregate value to each state and reorder the states in decreasing order within each facet (country). Otherwise ggplot2 would fall back to its default ordering (alphabetical for a character column, level order for a factor), when the order I really want is by sum of fatalities. I used the width and height arguments of the jitter geom to slightly separate points that would otherwise sit directly on top of one another. So this graph is not about the precision of the points. It is more about comparing overall levels of fatal violence between states in the region.
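Here is a minimal sketch of that reordering and jittering. It is not my original script: the data frame is a made-up toy stand-in for the ACLED extract, the column names (`country`, `state`, `fatalities`) are assumptions, and it relies on state names being unique across countries.

```r
library(ggplot2)

# Toy stand-in for the ACLED extract (values invented for illustration;
# column names are assumptions, not ACLED's own).
incidents <- data.frame(
  country    = c("Nigeria", "Nigeria", "Nigeria", "Niger", "Niger", "Niger"),
  state      = c("Borno", "Borno", "Yobe", "Diffa", "Zinder", "Zinder"),
  fatalities = c(200, 12, 30, 45, 3, 1)
)

# Attach each state's total fatalities to every one of its rows.
incidents$state_total <- ave(incidents$fatalities, incidents$state, FUN = sum)

# Turn `state` into a factor whose levels run from highest to lowest total.
# reorder() sorts ascending, so the totals are negated.
incidents$state <- reorder(incidents$state, -incidents$state_total)

p <- ggplot(incidents, aes(x = state, y = fatalities)) +
  # Small random offsets so overlapping incidents stay visible.
  geom_jitter(width = 0.2, height = 0.5, alpha = 0.6) +
  # Free x scales: each country panel shows only its own states,
  # in the factor order set above.
  facet_grid(. ~ country, scales = "free_x", space = "free_x")
```

Because each panel keeps only its own states, a single global ordering by total is enough to get decreasing order inside every facet.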
Unfortunately, it is difficult to add text annotations (like my title) to a faceted graph in ggplot2. A text annotation gets repeated in every facet, but I only want my title in one place. There is a way around this: you can pass a separate data frame to a text layer in order to plot a label at a specific coordinate in a single facet. But I could not make it work for my graph. So I simply exported my graph as a JPG from RStudio, then opened the JPG in Photoshop to add my title.
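For reference, the single-facet workaround can be sketched roughly like this. The toy data, the column names, the label text, and the label position are all assumptions for illustration. The key idea is that the label's data frame carries the facet variable, so the text is drawn only in the matching panel.

```r
library(ggplot2)

# Toy data (invented values; column names are assumptions).
incidents <- data.frame(
  country    = c("Nigeria", "Nigeria", "Niger", "Niger"),
  state      = c("Borno", "Yobe", "Diffa", "Zinder"),
  fatalities = c(200, 30, 45, 3)
)

# One-row data frame for the label. Its `country` value picks the panel;
# `state` and `fatalities` give the x and y position inside that panel.
label_df <- data.frame(
  country    = "Niger",
  state      = "Diffa",
  fatalities = 150,
  label      = "Fatal incidents by state"
)

p <- ggplot(incidents, aes(x = state, y = fatalities)) +
  geom_point() +
  # This layer uses label_df instead of the main data, so it draws once.
  geom_text(data = label_df, aes(label = label), hjust = 0) +
  facet_grid(. ~ country, scales = "free_x")
```

If the title can sit above the panels rather than inside one, `labs(title = ...)` places a single title over the whole faceted plot.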
From the visualization it is obvious that more people are dying in Nigeria than elsewhere (according to this ACLED data). Also, Cameroon and Niger each have a single state that is a major outlier in terms of number of fatalities compared to their other states. These are the kinds of simple observations that elicit more specific questions that will guide further analysis of the data. Another observation is that incidents are clustered around the 200 and 400 levels in Nigeria. When I turn off the jitter adjustments, these incidents fall exactly on the 200 and 400 marks. This suggests to me that the reports from the field are rounded or that the reports are general to begin with, which is something to be noted about the data set when trying to interpret any further findings.
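The rounding suspicion can also be checked without a plot. This is a rough sketch on invented numbers, with the 100-fatality cutoff chosen arbitrarily for the example: it tabulates the high-fatality incidents and measures what share of them are exact multiples of 100.

```r
# Toy fatality counts (invented for illustration).
incidents <- data.frame(fatalities = c(200, 200, 400, 187, 12, 400, 200, 3))

# Keep only the high-fatality incidents (the cutoff is an assumption).
high <- incidents$fatalities[incidents$fatalities >= 100]

# How often does each exact value occur?
counts <- table(high)

# Share of high-fatality incidents reported as an exact multiple of 100.
round_share <- mean(high %% 100 == 0)
```

A large `round_share` would support the idea that the largest death tolls are estimates rather than counts.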
The histogram below shows how many incidents occurred at each number of fatalities per incident.
From here down I am putting fatalities back on the y axis. If we look at fatalities by country by year we see a huge change after 2013. And we see how many more fatalities occur in Nigeria than in the other basin countries.
Next we will look at how each country's average number of fatalities moves per year.
Even though Nigeria experiences many more fatalities than the other three countries, when it comes to the average number of fatalities, Niger and Cameroon outpaced Nigeria in 2015, though the differences are relatively small. Here is the same graph with the trend lines plotted. When looking at these trends of the averages, Cameroon comes out on top in 2015.
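A sketch of how yearly averages and their trend lines can be computed and drawn, again on invented numbers with assumed column names. The toy values are picked so that one country has more total fatalities while the other has the higher 2015 average, which is the kind of contrast described above.

```r
library(ggplot2)

# Toy incidents (invented values; column names are assumptions).
incidents <- data.frame(
  country    = rep(c("Nigeria", "Cameroon"), each = 6),
  year       = rep(rep(2013:2015, each = 2), times = 2),
  fatalities = c(4, 6, 10, 20, 6, 10,   # Nigeria
                 1, 3, 2, 4, 12, 8)     # Cameroon
)

# Mean fatalities per incident for each country and year.
yearly_mean <- aggregate(fatalities ~ country + year, data = incidents, FUN = mean)

p <- ggplot(yearly_mean, aes(x = year, y = fatalities, colour = country)) +
  geom_line() +
  geom_point() +
  # Linear trend line per country, drawn without a confidence band.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dashed")
```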
Now I'll put fatalities back on the x axis so we can look at just those fatal incidents involving Boko Haram between 2011 and 2015.
After 2012, the proportion of Boko Haram incidents that caused 6 or more fatalities began to increase. These densities show that the proportion of low-fatality incidents (1-5) was high in 2011 and 2012. Starting in 2013, the proportion of low-fatality incidents decreased as the proportion of higher-fatality (6+) incidents increased.
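The same shift can be put into numbers by splitting incidents into the two bands and computing each band's share per year. This is a sketch on invented values. The `bh` data frame and its columns are assumptions, and the density layer only mirrors the kind of plot described here.

```r
library(ggplot2)

# Toy Boko Haram incidents (invented values; column names are assumptions).
bh <- data.frame(
  year       = c(2011, 2011, 2011, 2011, 2014, 2014, 2014, 2014),
  fatalities = c(1, 2, 5, 7, 3, 8, 15, 40)
)

# Band each incident as low-fatality (1-5) or higher-fatality (6+).
bh$band <- ifelse(bh$fatalities >= 6, "6+", "1-5")

# Row-wise proportions: each year's shares sum to 1.
shares <- prop.table(table(bh$year, bh$band), margin = 1)

# Overlaid densities of fatalities per incident, one per year.
p <- ggplot(bh, aes(x = fatalities, fill = factor(year))) +
  geom_density(alpha = 0.4)
```

Reading `shares` row by row shows whether the 6+ column grows from one year to the next.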
Now let's see these Boko Haram densities by country instead of by year.
If we look at the densities for fatalities caused by government forces, we see a pattern similar to the Boko Haram densities. By 2015, fewer incidents killed fewer than 6 people and more incidents killed 6 or more people.