Complete notebook available at: https://github.com/ai4up/ufo-prediction/blob/main/demo/demo.ipynb
Motivation¶
Attributes such as building height, type, and construction year are not available for all buildings in EUBUCCO. Many prospective uses of the dataset, such as energy modeling, depend on these attributes. This notebook shows how the available building footprints can be used to estimate missing building attributes with supervised machine learning. For more details on the conceptualization and feature engineering see:
Data¶
The demo samples contain ~20k buildings for Spain, ~50k for France, and ~170k for the Netherlands. Each sample includes all 117 urban form features, latitude and longitude, and auxiliary attributes such as city name, neighborhood, and building type.
The demo samples are stored using Git Large File Storage (LFS). To download them explicitly, use:
!git clone git@github.com:ai4up/ufo-prediction.git
!git lfs pull
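import os

import pandas as pd
from xgboost import XGBClassifier, XGBRegressor

# AgePredictor, AgeClassifier, AgePredictorComparison and the pp preprocessing
# module are provided by the ufo-prediction repository.

DATA_DIR = '.'
path_data_NLD = os.path.join(DATA_DIR, 'df-NLD-exp.pkl')
path_data_FRA = os.path.join(DATA_DIR, 'df-FRA-exp.pkl')
path_data_ESP = os.path.join(DATA_DIR, 'df-ESP-exp.pkl')

# Load the Dutch sample; use path_data_FRA or path_data_ESP for the other countries
df = pd.read_pickle(path_data_NLD)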
Prediction¶
xgb_model_params = {'tree_method': 'hist'}
xgb_hyperparams = {
    'max_depth': 5,
    'learning_rate': 0.1,
    'n_estimators': 500,
    'colsample_bytree': 0.5,
    'subsample': 1.0,
}
Regression¶
predictor = AgePredictor(
    model=XGBRegressor(**xgb_model_params),
    df=df,
    test_training_split=pp.split_80_20,
    # cross_validation_split=pp.cross_validation,
    early_stopping=True,
    hyperparameters=xgb_hyperparams,
    preprocessing_stages=[pp.remove_outliers]
)
predictor.evaluate()
MAE: 10.73 y
RMSE: 16.85 y
R2: 0.5483
R2: nan
MAPE: nan
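The scores printed above are conventionally defined as mean absolute error, root mean squared error, and the coefficient of determination. The following is a minimal sketch of those standard definitions in plain Python. The helper name `regression_metrics` and the toy construction years are illustrative assumptions, not part of the ufo-prediction code, which may compute its metrics differently.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R2) for two equally long sequences of years."""
    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]

    # Mean absolute error: average size of the error, in years
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalizes large errors more strongly
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R2: share of the variance in y_true explained by the predictions
    mean_true = sum(y_true) / n
    ss_tot = sum((true - mean_true) ** 2 for true in y_true)
    r2 = 1 - ss_res / ss_tot

    return mae, rmse, r2


# Toy construction years, made up for illustration (not EUBUCCO data)
y_true = [1950, 1970, 1990, 2010]
y_pred = [1960, 1965, 1985, 2000]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE: {mae:.2f} y  RMSE: {rmse:.2f} y  R2: {r2:.4f}")
```

MAE and RMSE are expressed in years, which is why the notebook output carries the `y` suffix, while R2 is unitless.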
Classification¶
tabula_nl_bins = [1900, 1965, 1975, 1992, 2006, 2015, 2022]
equally_sized_bins = (1900, 2020, 10)
classifier = AgeClassifier(
    model=XGBClassifier(**xgb_model_params),
    df=df,
    test_training_split=pp.split_80_20,
    # cross_validation_split=pp.cross_validation,
    preprocessing_stages=[pp.remove_outliers],
    hyperparameters=xgb_hyperparams,
    mitigate_class_imbalance=True,
    # bin_config=equally_sized_bins,
    bins=tabula_nl_bins,
)
classifier.evaluate()
Classification report:

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 1900-1964 | 0.751537 | 0.842825 | 0.794567 | 8850 |
| 1965-1974 | 0.875129 | 0.834151 | 0.854149 | 7133 |
| 1975-1991 | 0.904658 | 0.799159 | 0.848642 | 8798 |
| 1992-2005 | 0.852081 | 0.774682 | 0.811540 | 6209 |
| 2006-2014 | 0.595462 | 0.695315 | 0.641526 | 3095 |
| 2015-2021 | 0.496798 | 0.711664 | 0.585129 | 763 |
| accuracy | 0.801911 | 0.801911 | 0.801911 | 0 |
| macro avg | 0.745944 | 0.776299 | 0.755926 | 34848 |
| weighted avg | 0.813968 | 0.801911 | 0.805261 | 34848 |

Cohen’s kappa: 0.7501
Matthews correlation coefficient (MCC): 0.7513
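The class labels in the report (1900-1964, 1965-1974, and so on) suggest that each pair of consecutive values in `tabula_nl_bins` forms a left-closed, right-open interval of construction years. The sketch below illustrates that reading with the standard library only. The function `age_class` is a hypothetical helper written for this illustration; how `AgeClassifier` bins years internally is not shown in this notebook.

```python
from bisect import bisect_right

tabula_nl_bins = [1900, 1965, 1975, 1992, 2006, 2015, 2022]


def age_class(year, bins):
    """Map a construction year to a 'start-end' label, or None if out of range.

    Assumes intervals of the form [bins[i], bins[i + 1]), inferred from the
    labels in the classification report.
    """
    if year < bins[0] or year >= bins[-1]:
        return None
    # Index of the last bin edge that is <= year
    i = bisect_right(bins, year) - 1
    return f"{bins[i]}-{bins[i + 1] - 1}"


for year in (1899, 1964, 1965, 2021, 2022):
    print(year, age_class(year, tabula_nl_bins))
```

Under this reading, a building from 1964 falls into the first class and one from 1965 into the second, while years outside 1900 to 2021 receive no class.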
Country and generalization comparison¶
The AgePredictorComparison class facilitates comparisons between differently configured training runs, for example to compare prediction performance across countries, cross-validation strategies, oversampling strategies, or other preprocessing steps.
comparison_config = {
    'Spain': {'df': path_data_ESP},
    'France': {'df': path_data_FRA},
    'Netherlands': {'df': path_data_NLD},
}
grid_comparison_config = {
    'random-cv': {'cross_validation_split': pp.cross_validation},
    'neighborhood-cv': {'cross_validation_split': pp.neighborhood_cross_validation},
    'city-cv': {'cross_validation_split': pp.city_cross_validation},
}
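The run names in the results table of this section (for example `Spain_random-cv`) indicate that every entry of `comparison_config` is combined with every entry of `grid_comparison_config`, giving 3 x 3 = 9 runs. The sketch below shows one way such a grid could be expanded. The function `expand_runs` is hypothetical and the string values stand in for the real DataFrame paths and `pp` split functions; the actual logic inside `AgePredictorComparison` may differ.

```python
from itertools import product

# Placeholder values: strings stand in for the real paths and pp functions
comparison_config = {
    'Spain': {'df': 'df-ESP-exp.pkl'},
    'France': {'df': 'df-FRA-exp.pkl'},
    'Netherlands': {'df': 'df-NLD-exp.pkl'},
}
grid_comparison_config = {
    'random-cv': {'cross_validation_split': 'cross_validation'},
    'neighborhood-cv': {'cross_validation_split': 'neighborhood_cross_validation'},
    'city-cv': {'cross_validation_split': 'city_cross_validation'},
}


def expand_runs(base, grid):
    """Combine every base entry with every grid entry into named run configs."""
    runs = {}
    for (base_name, base_kwargs), (grid_name, grid_kwargs) in product(
        base.items(), grid.items()
    ):
        # Keyword arguments of both entries are merged; grid values win on clashes
        runs[f"{base_name}_{grid_name}"] = {**base_kwargs, **grid_kwargs}
    return runs


runs = expand_runs(comparison_config, grid_comparison_config)
for name, kwargs in runs.items():
    print(name, kwargs)
```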
comparison = AgePredictorComparison(
    exp_name='demo',
    model=XGBRegressor(**xgb_model_params),
    df=None,
    frac=0.5,
    cross_validation_split=None,
    preprocessing_stages=[pp.remove_outliers],
    hyperparameters=xgb_hyperparams,
    compare_feature_importance=False,
    compare_classification_error=False,
    include_baseline=False,
    save_results=False,
    garbage_collect_after_training=True,
    comparison_config=comparison_config,
    grid_comparison_config=grid_comparison_config,
)
results = comparison.evaluate()
results
| | name | R2 | R2_std | MAE | MAE_std | RMSE | RMSE_std | within_5_years | within_10_years | within_20_years | R2_seed_0 |
---|---|---|---|---|---|---|---|---|---|---|---|
8 | Netherlands_city-cv | 0.135401 | 0.0 | 18.030643 | 0.0 | 23.598668 | 0.0 | 0.221385 | 0.392977 | 0.638903 | 0.135401 |
7 | France_city-cv | 0.187767 | 0.0 | 18.645831 | 0.0 | 23.772030 | 0.0 | 0.176875 | 0.345911 | 0.615315 | 0.187767 |
6 | Spain_city-cv | 0.197072 | 0.0 | 23.840955 | 0.0 | 29.563272 | 0.0 | 0.126411 | 0.247178 | 0.494357 | 0.197072 |
3 | Spain_neighborhood-cv | 0.198503 | 0.0 | 23.779078 | 0.0 | 29.536916 | 0.0 | 0.129797 | 0.247178 | 0.506772 | 0.198503 |
5 | Netherlands_neighborhood-cv | 0.304538 | 0.0 | 15.884060 | 0.0 | 21.164937 | 0.0 | 0.241489 | 0.444702 | 0.699700 | 0.304538 |
4 | France_neighborhood-cv | 0.330228 | 0.0 | 16.306574 | 0.0 | 21.586864 | 0.0 | 0.211348 | 0.408337 | 0.705209 | 0.330228 |
0 | Spain_random-cv | 0.363164 | 0.0 | 20.108252 | 0.0 | 26.328608 | 0.0 | 0.180587 | 0.355530 | 0.592551 | 0.363164 |
1 | France_random-cv | 0.511105 | 0.0 | 12.372172 | 0.0 | 18.443089 | 0.0 | 0.369564 | 0.593466 | 0.806340 | 0.511105 |
2 | Netherlands_random-cv | 0.575725 | 0.0 | 10.203823 | 0.0 | 16.531180 | 0.0 | 0.525335 | 0.695626 | 0.827052 | 0.575725 |
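In the table above, scores drop from random to neighborhood to city cross-validation for every country. A plausible reading is that the neighborhood and city strategies hold out entire spatial groups, so the model is tested on areas it has never seen. The sketch below shows a generic group-wise k-fold split under that assumption. The function `group_k_fold` and the toy city list are hypothetical, and the internals of `pp.neighborhood_cross_validation` and `pp.city_cross_validation` are not shown in this notebook.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs so that no group is on both sides."""
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)

    # Assign groups to folds round-robin, largest groups first,
    # so that fold sizes stay roughly balanced
    ordered = sorted(members, key=lambda g: (-len(members[g]), str(g)))
    folds = [[] for _ in range(n_splits)]
    for i, group in enumerate(ordered):
        folds[i % n_splits].extend(members[group])

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != k for idx in fold
        )
        yield train_idx, test_idx


# Toy example: one city label per building (illustrative, not EUBUCCO data)
cities = ['Delft', 'Delft', 'Leiden', 'Leiden', 'Leiden', 'Gouda', 'Gouda', 'Breda']
splits = list(group_k_fold(cities, n_splits=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

With a random split, buildings from the same street can land in both training and test sets, which tends to produce more optimistic scores than a split along city boundaries.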