This notebook provides some basic helper functions to work with the EUBUCCO database.
Author: Felix Wagner (wagner@mcc-berlin.net), Nikola Milojevic-Dupont
Date: 08.05.2022 (updated: 19.10.2022)
import urllib.request
import pandas as pd
import geopandas as gpd
from shapely import wkt
import matplotlib.pyplot as plt
Download data¶
Visit https://eubucco.com/data/ and manually download building data for the city of Vichy, France or use our API:
url = 'https://api.eubucco.com/v0.1/files/46c25dfc-aa93-49a5-bab8-df69d2fae81d/download'
path = 'Vichy.gpkg.zip'
urllib.request.urlretrieve(url, path)
('Vichy.gpkg.zip', <http.client.HTTPMessage at 0x7fc41a8854b0>)
Load data¶
Use GeoPandas, it can even read zipped files without unzipping them!
df = gpd.read_file(path) # if you have unzipped the file removed .zip
df.head()
id | height | age | type | id_source | type_source | geometry | |
---|---|---|---|---|---|---|---|
0 | v0.1-FRA.1.2.3.11_1-0 | 1.8 | NaN | None | BATIMENT0000002201545772 | Indifférencié | POLYGON ((3813315.435 2578264.986, 3813320.501... |
1 | v0.1-FRA.1.2.3.11_1-1 | 2.5 | NaN | None | BATIMENT0000002201545773 | Indifférencié | POLYGON ((3813314.600 2578266.873, 3813312.160... |
2 | v0.1-FRA.1.2.3.11_1-2 | 2.2 | NaN | None | BATIMENT0000002201545774 | Indifférencié | POLYGON ((3813365.262 2578291.039, 3813360.612... |
3 | v0.1-FRA.1.2.3.11_1-3 | 2.2 | NaN | None | BATIMENT0000002201545775 | Indifférencié | POLYGON ((3813377.901 2578285.128, 3813362.431... |
4 | v0.1-FRA.1.2.3.11_1-4 | 3.1 | NaN | None | BATIMENT0000002201545776 | Indifférencié | POLYGON ((3813308.103 2578261.758, 3813306.462... |
Load data in small chunks¶
If limited computational resources are available, the data can be read in chunks (of any size).
df = gpd.read_file(path, rows=1000)
df.tail()
id | height | age | type | id_source | type_source | geometry | |
---|---|---|---|---|---|---|---|
995 | v0.1-FRA.1.2.3.11_1-995 | 2.5 | NaN | None | BATIMENT0000002201689056 | Indifférencié | POLYGON ((3813076.434 2578803.380, 3813077.836... |
996 | v0.1-FRA.1.2.3.11_1-996 | 9.0 | 1900.0 | non-residential | BATIMENT0000002201689058 | Commercial et services | POLYGON ((3813088.148 2578891.694, 3813097.014... |
997 | v0.1-FRA.1.2.3.11_1-997 | 20.0 | NaN | non-residential | BATIMENT0000002201688763 | Commercial et services | POLYGON ((3812548.377 2578945.916, 3812560.541... |
998 | v0.1-FRA.1.2.3.11_1-998 | 21.7 | 1900.0 | residential | BATIMENT0000002201688769 | Résidentiel | POLYGON ((3812621.572 2578861.264, 3812614.153... |
999 | v0.1-FRA.1.2.3.11_1-999 | 13.8 | NaN | None | BATIMENT0000002201688779 | Indifférencié | POLYGON ((3812648.147 2578905.983, 3812645.797... |
Plot data¶
df.plot() # plot all building footprints from the file
<AxesSubplot: >
df.plot(column='height') # plot building footprints and color-code building heights
<AxesSubplot: >
# plot individual building footprint
df.iloc[[0]].plot()
<AxesSubplot:>
# plot distribution of building heights
df['height'].hist(bins=20)
<AxesSubplot: >
Matching regional boundaries¶
From GADM to individual buildings and vice versa: Adding country, region and city level information to the building level data.¶
If you want to find buildings only from a specific country, region or city, we recommend to use the table admin-codes-matches-v0.1.csv to filter for the country or region name and then choose the relevant dataset to download based on the id.
In case you want to add the country, region and city names to the building level data, feel free to use the following function to match the admin-codes-matches-v0.1.csv table with the ids of your data.
def match_gadm_info(df_temp,df_overview):
""" function to match country, region and city info from overview table with building level data
df_temp (dataframe):= building level dataframe
df_overview:= overview table
"""
# remove numbering at end of id str
df_temp['id_temp'] = df_temp['id'].str.rsplit('-',1).apply(lambda x: x[0])
# merge with overview file
df_out = df.merge(df_overview, left_on='id_temp',right_on='id')
# keep only relevant columns
df_out = df_out[['id_x','id_source','country','region','city','height','age','type','type_source','geometry']]
# rename back to 'id' and return
return df_out.rename(columns={'idx_x':'id'})
# define path to overview file
path_overview_file = 'admin-codes-matches-v0.1.zip'
# read in overview file
df_overview = pd.read_csv(path_overview_file)
# check overview file
df_overview.head()
id | country | region | city | source | |
---|---|---|---|---|---|
0 | v0.1-AUT.4.16.24_1 | austria | Oberösterreich | Pilsbach | austria-osm |
1 | v0.1-AUT.3.15.42_1 | austria | Niederösterreich | Wimpassing im Schwarzatale | austria-osm |
2 | v0.1-AUT.3.15.43_1 | austria | Niederösterreich | Würflach | austria-osm |
3 | v0.1-AUT.3.15.44_1 | austria | Niederösterreich | Zöbern | austria-osm |
4 | v0.1-AUT.3.17.1_1 | austria | Niederösterreich | Sankt Pölten | austria-osm |
# match gadm info to bldg lvl data and assign to df
df = match_gadm_info(df,df_overview)
# check df: we have some additional columns added now
df.head()
id_x | id_source | country | region | city | height | age | type | type_source | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | v0.1-FRA.1.2.3.11_1-0 | BATIMENT0000002201545772 | france | Auvergne-Rhône-Alpes | Vichy | 1.8 | NaN | None | Indifférencié | POLYGON ((3813315.435 2578264.986, 3813320.501... |
1 | v0.1-FRA.1.2.3.11_1-1 | BATIMENT0000002201545773 | france | Auvergne-Rhône-Alpes | Vichy | 2.5 | NaN | None | Indifférencié | POLYGON ((3813314.600 2578266.873, 3813312.160... |
2 | v0.1-FRA.1.2.3.11_1-2 | BATIMENT0000002201545774 | france | Auvergne-Rhône-Alpes | Vichy | 2.2 | NaN | None | Indifférencié | POLYGON ((3813365.262 2578291.039, 3813360.612... |
3 | v0.1-FRA.1.2.3.11_1-3 | BATIMENT0000002201545775 | france | Auvergne-Rhône-Alpes | Vichy | 2.2 | NaN | None | Indifférencié | POLYGON ((3813377.901 2578285.128, 3813362.431... |
4 | v0.1-FRA.1.2.3.11_1-4 | BATIMENT0000002201545776 | france | Auvergne-Rhône-Alpes | Vichy | 3.1 | NaN | None | Indifférencié | POLYGON ((3813308.103 2578261.758, 3813306.462... |