Spatial Joins in R

Saturday, Feb 23, 2019| Tags: spatial joins, sp, rgdal, r

Goal

The goal of this post is join spatial point and area data. I demonstrate how to identify what community each Divvy station resides in. In other words, I find the community in which the latitude and longitude of the Divvy station is and append a column to indicate the community that each Divvy station resides in.

See my other post about overlaying point data onto area data with ggplot.

The goal of this post is to demonstrate how to overlay geographic points onto geographic polygons. Specifically, I’ll demonstrate how map the location of Chicago Divvy bike stations (points) onto the Chicago community map (polygons from a shapefile).

Data

The Chicago community boundaries came in a shapefile.

The Divvy station contained the latitidue and longitude of all the Divvy bike stations in the Chicago area, but did not contain the community they were in.

To do this, I use the sp and rgdal packages. The

library(sp)
library(rgdal)
library(readr)

First, read in the shapefile data. Note that I extracted the .dbf, .prj, and .shx files to the data directory.

comm <- readOGR("./data/.")

readOGR creates a SpatialPolygonsDataFrame, which is a fairly complicated object, so I’ll refrain from giving you a glimpse.

Now, load in the Divvy station data:

divvy <- read_csv("./data/Divvy_Bicycle_Stations.csv")

Now it is necessary to define the columns in divvy that represents the coordinates and then ensure that both datasets share the same coordinate projection.

coordinates(divvy) <- c("Longitude", "Latitude")
proj4string(divvy) <- proj4string(comm)

So, now I’m ready to “overlay” Divvy station locations on top of the community boundaries (from the shapefile) and identify the community in which each station resides. I’ll actually create two new columns in the divvy data – one for the numeric community id and one for the community name.

To do this, we use the over function from sp package.

divvy$community <- over(divvy, comm)$community
divvy$comm_id <- over(divvy, comm)$area_num_1
head(divvy)
##             coordinates  ID                   Station.Name
## 1 (-87.57645, 41.76664)  11         Jeffery Blvd & 71st St
## 2 (-87.62872, 41.89745) 106          State St & Pearson St
## 3 (-87.68655, 42.05704) 661          Evanston Civic Center
## 4  (-87.6967, 41.90036) 622     California Ave & Cortez St
## 5 (-87.58005, 41.74736) 665 South Chicago Ave & Elliot Ave
## 6 (-87.62034, 41.87647)   2            Buckingham Fountain
##                          Address Total.Docks Docks.in.Service     Status
## 1         Jeffery Blvd & 71st St          11               11 In Service
## 2          State St & Pearson St          27               27 In Service
## 3          Evanston Civic Center          15               15 In Service
## 4     California Ave & Cortez St          15               15 In Service
## 5 South Chicago Ave & Elliot Ave           7                7 In Service
## 6            Buckingham Fountain          39               39 In Service
##                           Location       community comm_id
## 1 (41.76663823695, -87.5764501141)     SOUTH SHORE      43
## 2          (41.897448, -87.628722) NEAR NORTH SIDE       8
## 3          (42.057044, -87.686554)            <NA>    <NA>
## 4          (41.900363, -87.696704)       WEST TOWN      24
## 5          (41.747363, -87.580046)     AVALON PARK      45
## 6            (41.87647, -87.62034)            LOOP      32

And…voila! Now the divvy data can be aggregated (e.g., station count, total docks in service) up to the community level.

QUESTIONS?

I'd be happy to sit down with you for a free consultation.

Contact me