Blog #7 – Beans!!
Uhmm, so I wanna make a map about beans. I have the data I need, and I think I can do it. The question I want to answer is: where in Alberta are the most beans grown? Specifically, in 2006, because that’s the easiest recent census to deal with (believe me, I don’t want to go near the 2011 and 2016 censuses after dealing with them during my thesis). I’ve never worked with Alberta census data because my thesis focused on Ontario, so I’m excited to see how it differs.
My approach to answering this question is simple: how many hectares of beans are grown in each census division of Alberta? This data is readily available and can even be broken down to the census subdivision level, but we’re going to keep it simple today. Subdivisions are the smallest census areas; they fit within divisions, which fit within agricultural regions, which in turn fit within the province.
To keep this blog short, today is just going to focus on what I do with the census data and how I can bring it into a GIS platform.
You can download the agricultural census yourself (if you really want to?) from the Odesi website. Expand the agriculture tab all the way and you will find agricultural census data going back to 1871. There’s all kinds of information: how many farms grow different crops, how many chickens were raised, how many tractors they have, all kinds of stuff. But the categories change over the years, and so do the census boundaries, so it can get kind of messy if you go back before the 1980s.
What I do is go into the Excel sheet for the year I’m interested in and copy and paste all the columns for my province into a new sheet. Then I make another new sheet with just the specific data columns I want, plus the identifying information, like this:
You can see in the last column on the right that I took the white bean data (in hectares) for all the rows, which include the provincial total, the total for each agricultural region, the totals for the divisions (see Division No. 1 in bold), and the subdivisions (un-bolded, underneath each division). If you’re confused, welcome to the club. It took me about a year to get used to this dataset.
It looks like there are 7 agricultural regions in Alberta. You can tell which agricultural region each division is in by looking at the second column (e.g. Division 1 is in agricultural region 1, because they both have the number 10). You can tell which division each subdivision is in by looking at the third column (e.g. Cypress County is in Division 1, because they both have the number 1).
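If you ever want to rebuild this hierarchy with code instead of by eye, the matching-numbers trick can be sketched in Python. This is a minimal sketch, not my actual workflow: it assumes each row has been exported as (name, region code, division number), that division rows are the ones named "Division No. N", and that region and province total rows have no division number. The row layout and the "Example Subdivision" name are hypothetical, and no census values are included.

```python
from collections import defaultdict

# Hypothetical rows mirroring the spreadsheet layout described above:
# (name, region code from column 2, division number from column 3).
# Province and region total rows have no division number.
ROWS = [
    ("Alberta", "", ""),
    ("Agricultural Region 1", "10", ""),
    ("Division No. 1", "10", "1"),
    ("Cypress County", "10", "1"),
    ("Example Subdivision", "10", "1"),  # hypothetical placeholder name
]


def build_hierarchy(rows):
    """Group subdivisions under their division via the shared division number,
    and record the region code each division belongs to."""
    divisions = {}               # division number -> (division name, region code)
    members = defaultdict(list)  # division number -> subdivision names
    for name, region_code, div_num in rows:
        if not div_num:
            continue  # province or region total row
        if name.startswith("Division No."):
            divisions[div_num] = (name, region_code)
        else:
            members[div_num].append(name)
    return {
        div_name: {"region": region, "subdivisions": members[div_num]}
        for div_num, (div_name, region) in divisions.items()
    }
```

Calling `build_hierarchy(ROWS)` groups Cypress County and the placeholder subdivision under Division No. 1, tagged with region code 10.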
I could go on explaining how the census works forever, but it’s going to get boring really fast (it might already be too late). I’m going to extract the data for each division, ignore everything else, and plug it into GIS. Before I do that, I’m going to pull out a few extra columns of data just in case. Let’s grab hectare data for “other” beans, peas, chickpeas, lentils, soybeans, and corn for good measure.
When I copied over the Alberta data, I was so excited to see that there were only 19 divisions and barely any data to sift through. That made it much easier to hand-select the divisions instead of writing a whole MATLAB script to do it (Ontario has 48 divisions, and hundreds and hundreds of subdivisions).
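For a province with more divisions, the hand-selecting step could be scripted. Here is a rough sketch in Python rather than MATLAB, assuming the sheet has been saved as CSV with a hypothetical "Geography" name column and hypothetical crop headers (the real Odesi labels will differ). It keeps only the "Division No. N" rows and only the columns you list. The sample values are dummy placeholders, not real census figures.

```python
import csv
import io

# Hypothetical header names; the real Odesi sheet uses its own labels.
KEEP = ["Geography", "White beans (ha)", "Other beans (ha)", "Soybeans (ha)"]


def division_rows(csv_text, keep=KEEP):
    """Return only the 'Division No. N' rows, trimmed to the columns in `keep`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {col: row[col] for col in keep}
        for row in reader
        if row["Geography"].strip().startswith("Division No.")
    ]


# Dummy placeholder values for illustration only, not real census figures.
SAMPLE = """Geography,White beans (ha),Other beans (ha),Soybeans (ha),Tractors
Alberta,111,222,333,444
Agricultural Region 1,11,22,33,44
Division No. 1,x,5,x,6
Cypress County,x,x,x,x
Division No. 2,7,x,8,9
"""
```

`division_rows(SAMPLE)` returns two dictionaries, one per division, with the province total, the region total, the subdivision row, and the unwanted "Tractors" column all dropped. The x’s are left as text for now.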
You might have noticed a bunch of x’s in the first figure. An x means the data is suppressed for one reason or another. Maybe only a couple of farms grow beans in that division and they don’t want to be singled out, who knows? All I know is that it gives me a headache. When we look at the division data alone, we see that there are lots of x’s:
What do we do with these x’s? Nothing today. I have a method of estimating these values that I used in my thesis, but who has time for that? Instead, I’ll make a special category for them in the map.
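Giving the x’s their own map category is easy to express in code. This is a minimal sketch, assuming the only suppression marker in the cells is "x" and that numbers may contain thousands separators; the function names and the class break values are hypothetical, not something taken from the real data.

```python
SUPPRESSED = "suppressed"


def parse_hectares(cell):
    """Turn a census cell into a float, or None when it is suppressed ('x')."""
    cell = cell.strip()
    if cell.lower() == "x":
        return None
    return float(cell.replace(",", ""))  # tolerate separators like "1,250"


def map_class(hectares, breaks=(100, 1000, 5000)):
    """Assign a map legend class; suppressed cells get their own category.
    The break values are hypothetical, not chosen from the real data."""
    if hectares is None:
        return SUPPRESSED
    for upper in breaks:
        if hectares < upper:
            return f"< {upper} ha"
    return f">= {breaks[-1]} ha"
```

Keeping suppressed cells as `None` rather than 0 matters: a suppressed division may well grow beans, so lumping it in with the true zeros would misrepresent it on the map.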
I’ve also added a new column at the beginning: a 4-digit number that combines the first, third, and fourth numbers from the first table. This number is important because it is the numeric identifier for each division that we will use to bring the data into GIS.
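Statistics Canada builds a division’s 4-digit identifier from the 2-digit province code (48 for Alberta) followed by the 2-digit division number, so a division’s ID can also be generated from its name. This is a hedged sketch: I’m not assuming which spreadsheet columns hold those pieces, so it parses the number out of a "Division No. N" label instead, and `make_cduid` is a hypothetical helper name.

```python
import re

ALBERTA_PR = "48"  # Statistics Canada province code for Alberta


def make_cduid(division_name, province_code=ALBERTA_PR):
    """Build a 4-digit ID from a name like 'Division No. 1':
    2-digit province code + division number zero-padded to 2 digits."""
    match = re.search(r"Division No\.\s*(\d+)", division_name)
    if match is None:
        raise ValueError(f"not a division row: {division_name!r}")
    return f"{province_code}{int(match.group(1)):02d}"
```

For example, "Division No. 1" becomes "4801". The result is kept as text, which avoids a type mismatch if the GIS layer stores the identifier as a string field.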
You might have also noticed that Division 16 is missing. I don’t know why, but I figure it’s because it’s way up in the northeast of Alberta (red star), where there probably isn’t any agriculture (see the figure of Alberta outlined in blue). Alberta looks a bit squished here because I haven’t changed the projection yet… woop.
In this image of Alberta, each polygon is assigned a 4-digit value called CDUID (census division unique identifier). This matches the 4-digit number I made in Excel. I checked that the values matched before going any further, and they do!
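The "do the IDs match?" check can also be done with a quick set comparison instead of by eye. This is a small sketch with hypothetical ID lists that mirror the situation described above: the boundary layer is assumed to hold all of Alberta’s divisions (1 to 19), while the spreadsheet has no row for Division 16. `compare_ids` is a hypothetical helper name.

```python
def compare_ids(table_ids, gis_ids):
    """Report which IDs appear in both sources and which appear in only one."""
    table_ids, gis_ids = set(table_ids), set(gis_ids)
    return {
        "matched": sorted(table_ids & gis_ids),
        "only_in_table": sorted(table_ids - gis_ids),
        "only_in_gis": sorted(gis_ids - table_ids),
    }


# Hypothetical ID lists: the boundary layer has Divisions 1-19,
# while the spreadsheet has no row for Division 16.
GIS_IDS = [f"48{n:02d}" for n in range(1, 20)]
TABLE_IDS = [cd for cd in GIS_IDS if cd != "4816"]
```

Here `compare_ids(TABLE_IDS, GIS_IDS)["only_in_gis"]` flags "4816" as the one polygon with no data, and an empty "only_in_table" list means every spreadsheet row will find a polygon to join to.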
Next time we will use this 4-digit number to bring the Excel table we made into GIS, and then we can make a map out of it!