A quick way to load the complete Geocoded National Address File of Australia (GNAF) and Australian Administrative Boundaries into Postgres, simplified and ready to use as reference data for geocoding, analysis, visualisation and aggregation.
Have a look at these intro slides (PDF), as well as the data.gov.au page.
Running the Python script takes 30-120 minutes on a Postgres server configured to take advantage of the RAM available.
You can process the GDA94 or GDA2020 version of the data - just ensure you download the same version for both GNAF and the Administrative Boundaries. If you don't know what GDA94 or GDA2020 is, download the GDA94 versions (FYI: they're different coordinate systems).
To get a good load time you'll need to configure your Postgres server for performance. There's a good guide here, noting it's a few years old and some of the memory parameters can be beefed up if you have the RAM.
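As a starting point, the guide's memory advice can be sketched as a simple heuristic. The ratios below are an assumption for a bulk-load workload, not official Postgres guidance - tune them to your own hardware and workload.

```python
# Rough heuristic (an assumption, not official Postgres guidance): derive
# memory-related postgresql.conf settings from total RAM for a bulk load.
def pg_memory_settings(ram_gb: int) -> dict:
    return {
        "shared_buffers": f"{ram_gb // 4}GB",                 # ~25% of RAM
        "maintenance_work_mem": f"{max(1, ram_gb // 16)}GB",  # index builds
        "work_mem": f"{max(64, ram_gb * 16)}MB",              # big sorts/joins
    }

print(pg_memory_settings(16))
```

A 16 GB server would get `shared_buffers = 4GB`, `maintenance_work_mem = 1GB` and `work_mem = 256MB` under this sketch.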
The target database needs PostGIS installed:

```sql
CREATE EXTENSION postgis;
```

Supported arguments can be viewed by running the script with the `-h` argument (see command line examples below). The behaviour of gnaf-loader can be controlled by specifying various command line options to the script. Supported arguments are:
* `--gnaf-tables-path` specifies the path to the extracted GNAF PSV files. This directory must be accessible by the Postgres server, and the corresponding local path for the server to this directory may need to be set via the `--local-server-dir` argument.
* `--local-server-dir` specifies the local path on the Postgres server corresponding to `--gnaf-tables-path`. If the server is running locally this argument can be omitted.
* `--admin-bdys-path` specifies the path to the extracted Shapefile admin boundary files. Unlike `--gnaf-tables-path`, this path does not necessarily have to be accessible to the remote Postgres server.
* `--pghost` the host name for the Postgres server. Defaults to the `PGHOST` environment variable if set, otherwise `localhost`.
* `--pgport` the port number for the Postgres server. Defaults to the `PGPORT` environment variable if set, otherwise `5432`.
* `--pgdb` the database name for the Postgres server. Defaults to the `PGDATABASE` environment variable if set, otherwise `geoscape`.
* `--pguser` the username for accessing the Postgres server. Defaults to the `PGUSER` environment variable if set, otherwise `postgres`.
* `--pgpassword` the password for accessing the Postgres server. Defaults to the `PGPASSWORD` environment variable if set, otherwise `password`.
* `--srid` sets the coordinate system of the input data. Valid values are `4283` (the default: GDA94 lat/long) and `7844` (GDA2020 lat/long).
* `--geoscape-version` Geoscape version number in YYYYMM format. Defaults to the current year and most recent release month, e.g. `202408`.
* `--previous-geoscape-version` previous Geoscape release version number in YYYYMM format; used for QA comparison, e.g. `202405`.
* `--raw-gnaf-schema` schema name to store the raw GNAF tables in. Defaults to `raw_gnaf_`.
* `--raw-admin-schema` schema name to store the raw admin boundary tables in. Defaults to `raw_admin_bdys_`.
* `--gnaf-schema` destination schema name to store the final GNAF tables in. Defaults to `gnaf_`.
* `--admin-schema` destination schema name to store the final admin boundary tables in. Defaults to `admin_bdys_`.
* `--previous-gnaf-schema` schema containing the previous version of the GNAF tables. Defaults to `gnaf_`.
* `--previous-admin-schema` schema containing the previous version of the admin boundary tables. Defaults to `admin_bdys_`.
* `--states` space-separated list of states to load, e.g. `--states VIC TAS`. Defaults to loading all states.
* `--prevacuum` forces the database to be vacuumed after dropping tables. Defaults to off; specifying this option will slow the import process.
* `--raw-fk` creates both primary and foreign keys for the raw GNAF tables. Defaults to off, and will slow the import process if specified. Use this option if you intend to use the raw GNAF tables as anything more than a temporary import step. Note that the final processed tables will always have appropriate primary and foreign keys set.
* `--raw-unlogged` creates unlogged raw GNAF tables, speeding up the import. Defaults to off. Only specify this option if you don't care about the raw data tables after the import - they will be lost if the server crashes!
* `--max-processes` specifies the maximum number of parallel processes to use for the data load. Set this to the number of cores on the Postgres server minus 2, but limit it to 12 even on 16+ core machines - there is minimal benefit beyond 12. Defaults to 4.
* `--no-boundary-tag` DO NOT tag all addresses with some of the key admin boundary IDs for creating aggregates and choropleth maps.

```
python load-gnaf.py --gnaf-tables-path="C:\temp\geoscape_202408\G-NAF" --admin-bdys-path="C:\temp\geoscape_202408\Administrative Boundaries"
```
Loads the GNAF tables to a Postgres server running locally. GNAF archives have been extracted to the folder `C:\temp\geoscape_202408\G-NAF`, and admin boundaries have been extracted to the `C:\temp\geoscape_202408\Administrative Boundaries` folder.

```
python load-gnaf.py --gnaf-tables-path="\\svr\shared\gnaf" --local-server-dir="f:\shared\gnaf" --admin-bdys-path="c:\temp\unzipped\AdminBounds_ESRI"
```
Loads the GNAF tables which have been extracted to the shared folder `\\svr\shared\gnaf`. This shared folder corresponds to the local `f:\shared\gnaf` folder on the Postgres server. Admin boundaries have been extracted to the `c:\temp\unzipped\AdminBounds_ESRI` folder.

```
python load-gnaf.py --states VIC TAS NT ...
```

Loads only the data for Victoria, Tasmania and the Northern Territory.

You can load the Admin Boundaries without GNAF. To do this: comment out steps 1, 3 and 4 in `def main`.
Note: you can't load GNAF without the Admin Bdys due to dependencies required to split Melbourne and to fix non-boundary locality_pids on addresses.
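If you script the load, the command lines in the examples above can be assembled programmatically. The helper below is hypothetical (it is not part of gnaf-loader); it only builds the argument list using the options documented above.

```python
# Hypothetical helper (not part of gnaf-loader): assemble a load-gnaf.py
# command line from the documented options.
def build_command(gnaf_path, admin_path, states=None, max_processes=4):
    cmd = [
        "python", "load-gnaf.py",
        f"--gnaf-tables-path={gnaf_path}",
        f"--admin-bdys-path={admin_path}",
        f"--max-processes={max_processes}",
    ]
    if states:
        # --states takes a space-separated list, e.g. --states VIC TAS
        cmd += ["--states", *states]
    return cmd

cmd = build_command(r"\\svr\shared\gnaf", r"c:\temp\unzipped\AdminBounds_ESRI",
                    states=["VIC", "TAS"])
```

The resulting list can be passed to `subprocess.run(cmd)` once the paths point at your extracted data.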
When using the resulting data from this process, you will need to adhere to the attribution requirements on the data.gov.au pages for GNAF and the Admin Bdys, as part of the open data licensing requirements.
GNAF and the Admin Boundaries are ready to use in Postgres in an image on Docker Hub.
```
docker pull minus34/gnafloader:latest
docker run --publish=5433:5432 minus34/gnafloader:latest
```

The database is available on port `5433`. Default login is - user: `postgres`, password: `password`.
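For example, connecting to the container from Python (psycopg2, and the schema name `gnaf_202408`, are assumptions here - adjust to your client library and Geoscape version):

```python
# Connection settings for the Docker container published on host port 5433,
# using the default credentials above. The database name matches the
# --pgdb default documented earlier.
dsn = "host=localhost port=5433 dbname=geoscape user=postgres password=password"

# With psycopg2 installed and the container running (schema name is an
# assumption based on the 202408 release):
# import psycopg2
# with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
#     cur.execute("SELECT count(*) FROM gnaf_202408.address_principals")
#     print(cur.fetchone()[0])
```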
Note: the compressed Docker image is 8 GB; uncompressed, it is 25 GB.

WARNING: the default postgres superuser password is insecure and should be changed using:

```sql
ALTER USER postgres PASSWORD '<your-new-password>';
```
Download the Postgres dump files and restore them in your database; this should take 15-60 minutes.
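A restore can be scripted along these lines; the dump file name below is illustrative only - use whichever files you downloaded.

```python
# Hedged sketch: restore a downloaded dump with pg_restore via subprocess.
# The file name "gnaf.dmp" is a placeholder, not the real download name.
import subprocess

cmd = [
    "pg_restore",
    "--host=localhost", "--port=5432",
    "--username=postgres", "--dbname=geoscape",
    "--jobs=4",            # parallel restore jobs to speed things up
    "gnaf.dmp",
]
# subprocess.run(cmd, check=True)  # uncomment once the server and file exist
```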
Geoparquet versions of the spatial tables, as well as parquet versions of the non-spatial tables, are in a public S3 bucket for use directly in an application or service. They can also be downloaded using the AWS CLI.
Geometries have WGS84 lat/long coordinates (SRID/EPSG:4326). A sample query for analysing the data using Apache Sedona, the spatial extension to Apache Spark, is in the `spark` folder.
The files are here: `s3://minus34.com/opendata/geoscape-202408/geoparquet/`

```
aws s3 ls s3://minus34.com/opendata/geoscape-202408/geoparquet/
aws s3 sync s3://minus34.com/opendata/geoscape-202408/geoparquet/ <local-folder>
```
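The parquet files can also be read directly from S3. A minimal sketch, assuming the table name `address_principals`, anonymous S3 access and the `ap-southeast-2` region (none of which are confirmed by this document):

```python
# Build the S3 URI for a table in the public bucket; the table name used
# below is an assumption for illustration.
BASE = "s3://minus34.com/opendata/geoscape-202408/geoparquet/"

def table_uri(table_name: str) -> str:
    return BASE + table_name + "/"

uri = table_uri("address_principals")

# With pyarrow installed (network access required):
# import pyarrow.dataset as ds
# from pyarrow import fs
# s3 = fs.S3FileSystem(anonymous=True, region="ap-southeast-2")
# dataset = ds.dataset(uri.removeprefix("s3://"), filesystem=s3)
```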
Incorporates or developed using G-NAF © Geoscape Australia licensed by the Commonwealth of Australia under the Open Geo-coded National Address File (G-NAF) End User Licence Agreement.
Incorporates or developed using Administrative Boundaries © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International licence (CC BY 4.0).
GNAF and the Admin Bdys have been customised to remove some of the known, minor limitations with the data. The most notable are:
* The Melbourne locality is split in two (the new locality_pids are `loc9901d119afda_1` & `loc9901d119afda_2`). The split occurs at the Yarra River (based on the postcodes in the Melbourne addresses).