InfDB Buildings Processor

Motivation

Openly available infrastructure data provides a reliable foundation for developing tools in urban analysis and planning. Integrating this data supports a more detailed understanding of the built environment.

To manage and utilize complex 3D city models, InfDB incorporates 3DCityDBv5. It supports the import of LoD2 datasets in the standardized CityGML format, enabling structured storage and retrieval of 3D geospatial data. The models can be enriched with statistical data from census sources, and spatial context is further improved through integration with the Basemap Project. All three components are built directly into InfDB, and together they provide a flexible and scalable foundation for geospatial exploration, simulation, and infrastructure planning.

To make this data usable in Pylovo, it must first be combined into a single table, pylovo_input.buildings. Building this table is the purpose of the buildings processor in InfDB.

Data Sources

The processor uses three InfDB data sources:

  • 3DCityDBv5: Provides detailed 3D building models in Level of Detail 2 (LoD2), including roofs, walls, and building functions. Useful for visualization, simulation, and spatial analysis tasks.

  • Census: Statistical data on a 100 m x 100 m grid, containing demographic and housing statistics from Zensus 2022, such as population density, household types, and age structure.

  • Basemap: Includes streets, parcels, land cover, and administrative features. Supports background mapping and street-level geometry extraction.

3DCityDBv5 resides in the citydb schema. Census and Basemap are located in the opendata schema.

Column Overview

An overview of the column sources and roles for pylovo_input.buildings is shown below:

| Column Name | Source | Role |
|---|---|---|
| id | citydb.feature.id | Building ID |
| objectid | citydb.feature.objectid | Alternate unique ID (from citydb) |
| geom | citydb.geometry_data.geometry | 2D geometry in EPSG:3035 |
| building_use_id | citydb.property.val_* | Internal ID for building use |
| building_use | citydb.property.val_* | One of Residential, Industrial, Commercial, Public |
| floor_area | citydb.property.val_* | Ground floor area |
| height | citydb.property.val_* | Building height |
| floor_number | citydb.property.val_* or estimated from height | Number of floors |
| construction_year | opendata.cns22_100m_baujahr_jz | Ranges: '-1919', '1919-1948', '1949-1978', '1979-1990', '1991-2000', '2001-2010', '2011-2019' or '2020-' |
| building_type | Derived from opendata.cns22_100m_wohnung_gbtyp_groesse, plus height and floor_area | AB, MFH, TH, or SFH |
| occupants_per_building | opendata.cns22_100m_bevoelkerungszahl | Estimated number of residents |
| households_per_building | opendata.cns22_100m_durchschn_haushaltsgroesse | Estimated number of households |
| postcode | opendata.plz_plz-5stellig.plz | Postcode based on building centroid |
| address_street_id | Derived from pylovo_input.ways and citydb.address | Street ID corresponding to the building's address |

Processing Steps

This section outlines the complete flow for filling and processing building data, combining building geometries with census data.

1. Create the Buildings Table

We begin by creating the pylovo_input.buildings table, which will hold all relevant attributes for each structure. Using CityDB, we populate id, objectid, building_use_id, and building_use (mapped from building_use_id). Only buildings with function codes starting with 31001_ are loaded. Garages (31001_2463) and water tanks (31001_2513) are excluded.
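The function-code filter described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the processor's actual implementation; the helper name `is_loaded` and the sample codes other than the two excluded ones are assumptions.

```python
# Function codes excluded by the rule above: garages and water tanks.
EXCLUDED_FUNCTIONS = {"31001_2463", "31001_2513"}


def is_loaded(function_code: str) -> bool:
    """Return True if a building with this function code would be loaded.

    Hypothetical helper: keeps codes starting with "31001_" except the
    two excluded ones.
    """
    return function_code.startswith("31001_") and function_code not in EXCLUDED_FUNCTIONS


# Sample codes for illustration only (assumed values).
codes = ["31001_1000", "31001_2463", "31001_2513", "51009_1610"]
loaded = [code for code in codes if is_loaded(code)]
print(loaded)
```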

2. Import and Filter Geometry

Geometry is linked to ground surfaces via the objectid and converted into EPSG:3035. The floor_area is also extracted from the ground surface features. Any building with a floor area below 12 m² is filtered out, as such structures are assumed to be non-habitable. After height is imported from citydb, buildings shorter than 3.5 m are filtered out as well.
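As a minimal sketch of the two size filters, assuming each building is available as a record with `floor_area` (m²) and `height` (m); the function name and sample records are hypothetical:

```python
MIN_FLOOR_AREA_M2 = 12.0  # buildings smaller than this are dropped
MIN_HEIGHT_M = 3.5        # buildings shorter than this are dropped


def keep_building(floor_area: float, height: float) -> bool:
    """Return True if the building passes both size filters."""
    return floor_area >= MIN_FLOOR_AREA_M2 and height >= MIN_HEIGHT_M


# Sample records for illustration only (assumed values).
buildings = [
    {"id": 1, "floor_area": 95.0, "height": 7.2},
    {"id": 2, "floor_area": 10.5, "height": 4.0},  # too small
    {"id": 3, "floor_area": 40.0, "height": 2.8},  # too low
]
kept_ids = [b["id"] for b in buildings if keep_building(b["floor_area"], b["height"])]
print(kept_ids)
```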

3. Fill and Estimate Floor Numbers

The floor_number is taken from the storeysAboveGround attribute when available. If it is missing, we estimate it by dividing the total building height by the median height per floor. These medians are calculated from all buildings in InfDB whose floor numbers are known, grouped by type.
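The estimation can be illustrated as follows. This sketch makes several assumptions that the text does not specify: the grouping key is called `group`, the result is rounded to the nearest integer, and every building gets at least one floor.

```python
from statistics import median


def median_height_per_floor(buildings):
    """Median of height / floor_number per group, using known floor numbers only."""
    ratios = {}
    for b in buildings:
        if b["floor_number"] is not None:
            ratios.setdefault(b["group"], []).append(b["height"] / b["floor_number"])
    return {group: median(values) for group, values in ratios.items()}


def estimate_floor_number(height, height_per_floor):
    """Estimate floors from height; rounding and the minimum of 1 are assumptions."""
    return max(1, round(height / height_per_floor))


# Sample buildings with known floor numbers (assumed values).
known = [
    {"group": "Residential", "height": 9.0, "floor_number": 3},
    {"group": "Residential", "height": 6.2, "floor_number": 2},
    {"group": "Residential", "height": 12.8, "floor_number": 4},
]
medians = median_height_per_floor(known)
estimated = estimate_floor_number(10.0, medians["Residential"])
print(medians, estimated)
```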

4. Estimate Occupancy and Households

Only residential buildings proceed to this step. We estimate the number of occupants by distributing the census population of each grid cell across its buildings in proportion to each building's volume (floor_area * height). Households are then estimated using the average household size from the census. Every residential building is assigned at least one occupant and one household. If census data is missing, the nearest census grid cell with data is used to fill in the values.
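A minimal sketch of the volume-weighted distribution for a single grid cell is shown below. The function name, the record layout, and the use of rounding are assumptions; because of rounding and the minimum of one, the per-building values need not add up exactly to the cell total.

```python
def distribute_occupants(buildings, cell_population, avg_household_size):
    """Split a cell's population across its residential buildings by volume.

    Assumes every building has a positive volume, which the earlier
    size filters guarantee.
    """
    volumes = [b["floor_area"] * b["height"] for b in buildings]
    total_volume = sum(volumes)
    result = []
    for b, volume in zip(buildings, volumes):
        occupants = max(1, round(cell_population * volume / total_volume))
        households = max(1, round(occupants / avg_household_size))
        result.append({"id": b["id"], "occupants": occupants, "households": households})
    return result


# Sample grid cell with two residential buildings (assumed values).
cell_buildings = [
    {"id": 1, "floor_area": 100.0, "height": 6.0},  # volume 600
    {"id": 2, "floor_area": 100.0, "height": 9.0},  # volume 900
]
estimates = distribute_occupants(cell_buildings, cell_population=30, avg_household_size=2.0)
print(estimates)
```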

5. Assign Construction Year

Construction years are assigned randomly, weighted by the distribution in the corresponding census grid cell. As in the previous step, if no construction year data is available for a building, we find the nearest grid cell with available data and assign the year based on its distribution.
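Weighted random assignment can be sketched with the standard library's `random.choices`. The function name and the shape of `cell_counts` (a mapping from year range to the count reported for the grid cell) are assumptions made for illustration.

```python
import random


def assign_construction_year(cell_counts, rng):
    """Draw one year range, weighted by the counts of the census grid cell."""
    ranges = list(cell_counts)
    weights = [cell_counts[r] for r in ranges]
    return rng.choices(ranges, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Sample cell where only one range has buildings (assumed values).
single_range_cell = {"1949-1978": 12, "1979-1990": 0, "2001-2010": 0}
years = [assign_construction_year(single_range_cell, rng) for _ in range(5)]

# Sample cell with a mixed distribution: roughly 3 out of 4 draws
# are expected to fall into the first range.
mixed_cell = {"1949-1978": 30, "1979-1990": 10}
mixed_year = assign_construction_year(mixed_cell, rng)
print(years, mixed_year)
```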

6. Count Touching Neighbors

To support building type classification, we identify neighboring buildings. Structures within 0.01 meters of each other are considered "touching". Each building is assigned its number of touching neighbors, which may be zero.
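The neighbor count can be illustrated with a deliberately simplified sketch. It assumes footprints are axis-aligned boxes given as (minx, miny, maxx, maxy) in meters; real footprints are polygons, and the processor presumably relies on spatial functions of the database rather than on code like this.

```python
from math import hypot

TOUCH_TOLERANCE_M = 0.01  # buildings within this distance count as touching


def box_distance(a, b):
    """Distance between two boxes given as (minx, miny, maxx, maxy)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


def count_touching_neighbors(footprints):
    """Return the number of touching neighbors per building ID."""
    counts = {building_id: 0 for building_id in footprints}
    ids = list(footprints)
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if box_distance(footprints[first], footprints[second]) <= TOUCH_TOLERANCE_M:
                counts[first] += 1
                counts[second] += 1
    return counts


# Sample footprints (assumed values): A and B are 5 mm apart, C is detached.
footprints = {
    "A": (0.0, 0.0, 10.0, 10.0),
    "B": (10.005, 0.0, 20.0, 10.0),
    "C": (30.0, 0.0, 40.0, 10.0),
}
neighbor_counts = count_touching_neighbors(footprints)
print(neighbor_counts)
```

The pairwise loop is quadratic in the number of buildings, which is fine for a sketch but would call for a spatial index on a full dataset.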

7. Classify Building Type

Buildings are initially classified based on structural characteristics and neighborhood context. The following initial classification rules are applied:

  • Apartment Blocks (AB) are defined as buildings with 4 or more floors, or those with 3+ floors and 3+ touching neighbors, or those with a floor area above 1500 m².

  • Single-Family Homes (SFH) are buildings with a floor area under 350 m², up to 3 floors, and no touching neighbors. Smaller homes (under 200 m²) with ≤2 floors and fewer than 2 touching neighbors also qualify.

  • Townhouses (TH) have a floor area of 80-150 m², 2-3 floors, 1-2 touching neighbors, and a similar size (±20%) to neighboring buildings.

  • Multi-Family Homes (MFH) have 2-3 floors or are buildings over 150 m² with 1-3 touching neighbors.
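The initial rules can be written down as a small function. Several points in this sketch are assumptions, since the rules overlap and the text does not fix an evaluation order: rules are checked in the order listed (AB, SFH, TH, MFH) and the first match wins, "similar size" means every touching neighbor is within ±20% of the building's own floor area, and the MFH rule is read as "2-3 floors, or over 150 m² with 1-3 touching neighbors".

```python
def classify_initial(floor_area, floors, neighbor_areas):
    """Return "AB", "SFH", "TH", "MFH", or None if no rule matches.

    neighbor_areas holds the floor areas of all touching neighbors.
    """
    neighbors = len(neighbor_areas)

    if floors >= 4 or (floors >= 3 and neighbors >= 3) or floor_area > 1500:
        return "AB"

    if (floor_area < 350 and floors <= 3 and neighbors == 0) or (
        floor_area < 200 and floors <= 2 and neighbors < 2
    ):
        return "SFH"

    similar = all(abs(area - floor_area) <= 0.2 * floor_area for area in neighbor_areas)
    if 80 <= floor_area <= 150 and 2 <= floors <= 3 and 1 <= neighbors <= 2 and similar:
        return "TH"

    if 2 <= floors <= 3 or (floor_area > 150 and 1 <= neighbors <= 3):
        return "MFH"

    return None


# Sample buildings for illustration only (assumed values).
print(classify_initial(600.0, 5, []))              # tall building
print(classify_initial(120.0, 2, []))              # detached small house
print(classify_initial(100.0, 3, [95.0, 110.0]))   # row of similar houses
print(classify_initial(400.0, 2, [380.0]))         # larger attached building
```

Buildings for which this returns None would be handled by the later propagation and fallback rules.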

Types are then propagated recursively to touching neighbors when the following patterns match:

  • Buildings that touch an AB become AB.

  • Buildings that touch an SFH and have a floor area below 100 m² and ≤2 floors become SFH.

  • Buildings touching at least 2 THs with a floor area discrepancy within 25% become TH.

  • Buildings touching 1 TH with a floor area discrepancy within 20% become TH.

  • Buildings with 2-3 floors that touch an MFH are likely MFH as well.

If classification fails but the building is residential, it defaults to AB. The classification is then refined using household counts: one household implies SFH or TH, two to four households point to MFH, and five or more households suggest AB.
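The household-based refinement can be expressed as a lookup from household count to the building types it supports. The function names are hypothetical, and how the processor resolves a mismatch (for example, choosing between SFH and TH) is not specified in the text, so this sketch only reports which types are plausible.

```python
def plausible_types(households):
    """Return the building types suggested by a household count."""
    if households <= 1:
        return {"SFH", "TH"}
    if households <= 4:
        return {"MFH"}
    return {"AB"}


def is_consistent(building_type, households):
    """Check whether an initial type agrees with the household count."""
    return building_type in plausible_types(households)


print(is_consistent("TH", 1))   # a townhouse with one household
print(is_consistent("SFH", 6))  # a single-family home with six households
```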

8. Rebalance to Match Census

Once initial classification is complete, building type distributions are adjusted per census grid to match target proportions. Rebalancing follows a set of conversion rules:

  • To increase AB, convert the largest MFH or TH.

  • To raise MFH numbers, convert the largest TH or smallest AB.

  • To increase TH, convert smaller MFHs.

  • To boost SFH, convert smaller THs or MFHs.

Household counts are kept consistent during these adjustments.
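A single conversion step under these rules might look like the sketch below. It assumes that "largest" and "smallest" refer to floor_area, that THs are preferred over ABs when raising the MFH count, and that one building is converted at a time; none of this is confirmed by the text, and the adjustment of household counts is left out.

```python
def pick_conversion(buildings, target):
    """Pick the building to convert into the target type, or None if there is none."""
    def of_type(*types):
        return [b for b in buildings if b["building_type"] in types]

    def area(b):
        return b["floor_area"]

    if target == "AB":
        return max(of_type("MFH", "TH"), key=area, default=None)
    if target == "MFH":
        townhouses = of_type("TH")
        if townhouses:
            return max(townhouses, key=area)
        return min(of_type("AB"), key=area, default=None)
    if target == "TH":
        return min(of_type("MFH"), key=area, default=None)
    if target == "SFH":
        return min(of_type("TH", "MFH"), key=area, default=None)
    return None


# Sample grid cell for illustration only (assumed values).
cell = [
    {"id": 1, "building_type": "MFH", "floor_area": 300.0},
    {"id": 2, "building_type": "TH", "floor_area": 120.0},
    {"id": 3, "building_type": "AB", "floor_area": 900.0},
    {"id": 4, "building_type": "MFH", "floor_area": 180.0},
]
picks = {target: pick_conversion(cell, target)["id"] for target in ("AB", "MFH", "TH", "SFH")}
print(picks)
```

In a full rebalancing loop, a step like this would be repeated per grid cell until the type distribution is close enough to the census target.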

9. Final Attribute Assignments

Postcodes are assigned using PLZ geometry overlays. Finally, address_street_id is linked using matching address data from buildings and ways. This field can be null if no address match is found.