Data visualisation is key to understanding your dataset. It is also a powerful tool to convey the message you have extracted from your data. In this post, I will show you how to easily plot your data on a map using Python.

Motivations and results

I recently worked on market research for a startup idea. For this research we ran an online survey to better understand specific consumer habits. To be able to analyse global and local trends, we asked respondents to provide (among many other things) their postal code.

I decided to write some code to visualise the geographic distribution of the answers we got. In other words, I wanted to plot my data on a map. Good news: as is often the case in Python, it turned out to be much easier than I expected. The code I ended up writing is available on my GitHub account. It can easily be modified to display different types of data over different geographic areas. Before we dive into the code, you can have a look at the resulting map below (I have used a dummy dataset to produce this map). Not too bad, right?


Let’s dive in

So let's see how this is done. For this code we need two Python packages:

import pandas as pd
import folium

Pandas provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. We will use it to load and store our data.
Folium is the package we will use to create the map and overplot our data. As we will see, it is really easy to use.

Creating the map

The first step is to create the initial world map. This is done in a single line with folium:

m = folium.Map(location=[46.5,2], zoom_start=5.49)

As you can see, we need to specify a location, that is, the latitude and longitude on which we want to centre the map (in our case it will be centred on France), and an initial zoom level.

As always, you need data

Then we want to plot a number of counts over different departments (the French equivalent of UK counties). For this we need two files:

  • A file containing the number of counts per department.
  • A file containing the boundaries of the French departments.

The first one should not be an issue because it is our data. Luckily for us, the second one is no problem either. There is a standard format for such files called GeoJSON (a standard JSON file that follows some additional conventions), and it is relatively easy to find one for administrative boundaries. I have found mine here. I am sure you will find the one for your favorite country with a clever use of your usual search engine.
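Before going further, it can help to peek inside the boundaries file to see which properties each area carries. Here is a minimal sketch using only the standard library. The tiny GeoJSON document, its property names (code and nom) and its coordinates are made up for illustration; a real file may well use different property names.

```python
import json

# A tiny hand-written GeoJSON document standing in for a real boundaries file.
# Property names ("code", "nom") and coordinates are illustrative assumptions.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"code": "01", "nom": "Ain"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[5.0, 46.0], [5.5, 46.0], [5.5, 46.5], [5.0, 46.5], [5.0, 46.0]]]
      }
    },
    {
      "type": "Feature",
      "properties": {"code": "02", "nom": "Aisne"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[3.0, 49.0], [3.5, 49.0], [3.5, 49.5], [3.0, 49.5], [3.0, 49.0]]]
      }
    }
  ]
}
"""

geo = json.loads(geojson_text)


def property_values(geo, key):
    """Return the value stored under `key` in the properties of every feature."""
    return [feature["properties"][key] for feature in geo["features"]]


# Which properties does each feature carry, and what do the codes look like?
property_names = sorted(geo["features"][0]["properties"])
codes = property_values(geo, "code")

print(property_names)
print(codes)
```

With a real file you would replace the inline text with the contents of your GeoJSON file (for instance via json.load on an open file). The point is simply to check the property names and the type of their values before trying to match them with your own data.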

The file containing the number of counts per department is a two-column CSV file. The first column identifies the department (French departments are numbered). The second column gives the number of counts for each department. Here are a few lines of the file:

department,counts
1,16
2,44
3,97
4,29
5,14
6,11

I use pandas to read the CSV file:

df_counts = pd.read_csv('data/counts_per_department.csv', dtype={'department':object})

As you can see I cast the department column as a string (object). That might seem odd for now, but this is because we will use this column to match the departments with the ones from the geojson file, and the numbers are given as strings in this file.
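One thing to keep in mind is that this matching is an exact string comparison, so '1' and '01' are different keys. The sketch below shows one way to check for mismatches before plotting. It assumes, hypothetically, that the boundaries file uses zero-padded two-character codes; the inline CSV text and the set of codes are made up for illustration.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file (same two-column layout as the sample above).
csv_text = "department,counts\n1,16\n2,44\n3,97\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"department": object})

# Hypothetical codes as they might appear in a boundaries file.
geojson_codes = {"01", "02", "03"}

# The codes are kept as strings, exactly as written in the CSV.
unmatched_before = set(df["department"]) - geojson_codes

# Left-pad with zeros to two characters: "1" becomes "01",
# while a code that is already two characters long (e.g. "2A") is unchanged.
df["department"] = df["department"].str.zfill(2)
unmatched_after = set(df["department"]) - geojson_codes

print(sorted(unmatched_before))
print(sorted(unmatched_after))
```

If the second set is empty, every row of the data frame has a counterpart in the boundaries. If your file does not pad its codes, you can skip the zfill step and only keep the check.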

Plotting your data on the map

We now get to the method that will do the magic, a.k.a. the choropleth. Let’s look at the code first:

m.choropleth(
    geo_data     = 'data/departements.geojson',
    name         = 'French departments',
    data         = df_counts,
    columns      = ['department','counts'],
    key_on       = 'feature.properties.code',
    fill_color   = 'YlGn',
    fill_opacity = 0.7,
    line_opacity = 0.2,
    legend_name  = 'Number of answers per department'
)

The method has several arguments:

  • geo_data specifies the geojson file we want to use (containing the boundaries of the departments).
  • name is the name of the layer we are adding.
  • data is the pandas data frame containing our data.
  • columns lists the two columns of our data frame to use: the first one holds the key (here the department) and the second one the values to plot (here the counts).
  • key_on is probably the most important argument. It tells folium which key of the GeoJSON file our data should be linked to. In the current case, the key ‘feature.properties.code’ contains the number of the department, which is the same number as the one in the department column of the data frame (and if you followed, you know it is also a string). Note that the key_on argument must always start with ‘feature’.
  • The following arguments specify different plotting settings and the legend of the plot.

Finally, we save the plot as an HTML page to keep it interactive:

m.save('departments.html')

And that is it! We are done! Nothing more! You can literally write the full thing in 6 lines of code! Open the resulting file with your favorite browser and enjoy the result.

Full code

Here is the full code, also available on GitHub with everything you need to run it:

import pandas as pd
import folium

# Loading the data with the number of counts per department.
# We load the department column as a string because it is a str
# in the geojson file.
df_counts = pd.read_csv('data/counts_per_department.csv', dtype={'department':object})

# Creating the initial map
m = folium.Map(location=[46.5,2], zoom_start=5.49)

# Creating the choropleth to add the coloured departments
# on top of the map
m.choropleth(
    geo_data = 'data/departements.geojson',
    name = 'French departments',
    data = df_counts,
    columns = ['department','counts'],
    key_on = 'feature.properties.code',
    fill_color = 'YlGn',
    fill_opacity = 0.7,
    line_opacity = 0.2,
    legend_name = 'Number of answers per department'
)

# Save to html and open it in your favorite browser
m.save('departments.html')

Did you enjoy this post? Do you have a question? Feel free to leave a comment or contact me!

