Mimirsbrunn is a geocoding service built upon Elasticsearch.
It is an independent service, but Navitia uses it as its global geocoding service.
Mimirsbrunn is composed of several parts: some manage the data import into Elasticsearch, and a web service wraps Elasticsearch responses to return formatted responses (we use geocodejson as the response format).
To build, you must first install rust:

```shell
curl https://sh.rustup.rs -sSf | sh
```

and then build Mimirsbrunn:

```shell
cargo build --release
```

To use the Mimirsbrunn components you will need an Elasticsearch database.
The Elasticsearch version needs to be >= 2.0.
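If you don't already have an instance available, one simple way to get a throwaway one for local use is shown below (assuming Docker is installed; the image name and tag are only a suggestion, any Elasticsearch >= 2.0 setup works just as well):

```shell
# Run a disposable Elasticsearch 2.x instance on the default port 9200
docker run --name mimir-es -d -p 9200:9200 elasticsearch:2
```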
To test simply launch:

```shell
cargo test
```

Integration tests spawn one Elasticsearch Docker container, so you'll need a recent Docker version. Only one container is spawned, so the ES base has to be cleaned before each test.
To write a new test:
- write your test in a separate file in tests/
- add a call to your test in tests/tests.rs::test_all()
- pass a new ElasticSearchWrapper to your test method to get the right connection string for the ES base
- the creation of this ElasticSearchWrapper automatically cleans the ES base (you can also refresh the ES base, clean up during tests, etc.); see the sketch below
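A minimal sketch of such a test follows. The exact ElasticSearchWrapper API is an assumption here (the host() and refresh() method names are hypothetical); mirror whatever the existing tests in tests/ actually use.

```rust
// tests/my_new_test.rs -- hypothetical skeleton of an integration test.
pub fn my_new_test(es_wrapper: ::ElasticSearchWrapper) {
    // The wrapper was created with a freshly cleaned ES base and knows the
    // connection string of the dockerized Elasticsearch (assumed accessor).
    let es_url = es_wrapper.host();

    // ... run an import (e.g. bano2mimir) against `es_url` ...

    // Make the freshly indexed documents searchable before asserting on them
    // (assumed helper).
    es_wrapper.refresh();

    // ... query Elasticsearch / bragi and assert on the responses ...
}
```

It then only needs to be called from tests/tests.rs::test_all().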
Data are imported in multiple indexes with this structure:
```
munin -> munin_addr -> munin_addr_dataset1 -> munin_addr_dataset1_20160101T123200
      |            |-> munin_addr_dataset2 -> munin_addr_dataset2_20160101T123200
      |-> munin_admin -> munin_admin_dataset1 -> munin_admin_dataset1_20160101T123200
      |-> munin_street -> munin_street_dataset1 -> munin_street_dataset1_20160101T123200
```
Munin is the root index: it's an alias used by the frontend (bragi), pointing to an index for each dataset/document type. So if we have address data for France and Belgium we will have two indexes: "addr_fr" and "addr_be". These are also aliases: they point to a dated index, so we can import data into another index without impacting anyone, then switch the alias to point to the new data.
This gives us the ability to update only a part of the world without any downtime.
During an update the indexes will be as follows (for the previous example, say we update addr_dataset1).
During the data update:
```
munin -> munin_addr -> munin_addr_dataset1 -> munin_addr_dataset1_20160101T123200
      |            |-> munin_addr_dataset2 -> munin_addr_dataset2_20160101T123200
      |-> munin_admin -> munin_admin_dataset1 -> munin_admin_dataset1_20160101T123200
      |-> munin_street -> munin_street_dataset1 -> munin_street_dataset1_20160101T123200
      |-> munin_stop -> munin_stop_dataset1 -> munin_stop_dataset1_20160101T123200

munin_addr_dataset1_20160201T123200        (new dated index, not yet aliased)
```
and when the loading is finished:

```
munin -> munin_addr -> munin_addr_dataset1 -> munin_addr_dataset1_20160201T123200
      |            |-> munin_addr_dataset2 -> munin_addr_dataset2_20160101T123200
      |-> munin_admin -> munin_admin_dataset1 -> munin_admin_dataset1_20160101T123200
      |-> munin_street -> munin_street_dataset1 -> munin_street_dataset1_20160101T123200
      |-> munin_stop -> munin_stop_dataset1 -> munin_stop_dataset1_20160101T123200
```
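The switch from the old dated index to the new one comes down to a single atomic call to the Elasticsearch aliases API. The sketch below is an illustration only: it assumes Elasticsearch is reachable on http://localhost:9200 and reuses the dated index names from the example above.

```shell
# Atomically repoint the munin_addr_dataset1 alias to the freshly loaded dated index
curl -XPOST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d '{
  "actions": [
    { "remove": { "index": "munin_addr_dataset1_20160101T123200", "alias": "munin_addr_dataset1" } },
    { "add":    { "index": "munin_addr_dataset1_20160201T123200", "alias": "munin_addr_dataset1" } }
  ]
}'
```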
There is one major drawback: datasets aren't hermetic. Since we import multiple OSM files, the area near a border will be present in multiple datasets; for now we accept these duplicates. We will be able to filter with a shape at import time and/or remove them in bragi.
All Mimirsbrunn components implement the --help (or -h) argument to explain their use.
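For example:

```shell
./target/release/osm2mimir --help
```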
There are several components in Mimirsbrunn:
The osm2mimir component imports OpenStreetMap data into Mimir.
You can get OpenStreetMap data from http://download.geofabrik.de/
e.g.:

```shell
curl -O http://download.geofabrik.de/europe/france-latest.osm.pbf
```

To import all those data into Mimir, you only have to do:

```shell
./target/release/osm2mimir --input=france-latest.osm.pbf --level=8 --level=9 --import-way --import-admin --import-poi --dataset=france --connection-string=http://localhost:9200
```

The --level arguments refer to administrative levels in OpenStreetMap.
The bano2mimir component imports BANO data into Mimir. It is recommended to run the BANO integration after the OSM integration so that addresses are attached to admins.
You can get BANO data from http://bano.openstreetmap.fr/data/
e.g.:

```shell
curl -O http://bano.openstreetmap.fr/data/full.csv.gz
gunzip full.csv.gz
```

To import all those data into Mimir, you only have to do:

```shell
./target/release/bano2mimir -i full.csv --dataset=france --connection-string=http://localhost:9200/
```

The --connection-string argument refers to the Elasticsearch URL.
The stops2mimir component imports stops into Mimir. It is recommended to run the stops integration after the OSM integration so that stops are attached to admins.
To import all those data into Mimir, you only have to do:
```shell
./target/release/stops2mimir -i stops.txt --dataset=idf --connection-string=http://localhost:9200/
```

The --connection-string argument refers to the Elasticsearch URL.
The stops input file needs to match either the GTFS specification (https://developers.google.com/transit/gtfs/reference/) or the NTFS specification (https://github.com/CanalTP/navitia/blob/dev/documentation/ntfs/ntfs_0.6.md).
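For reference, a minimal GTFS-style stops.txt with the required columns might look like the following sketch (values are made up and purely illustrative; real files usually carry more columns):

```
stop_id,stop_name,stop_lat,stop_lon
stop:01,Gare de Lyon,48.8443,2.3743
stop:02,Nation,48.8484,2.3959
```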
Bragi is the webservice built around Elasticsearch. It was designed to hide Elasticsearch's complexity and to return consistently formatted responses.
Its response format follows the geocodejson-spec, a format used by other geocoding APIs (https://github.com/addok/addok or https://github.com/komoot/photon).
To run Bragi:

```shell
./target/release/bragi --connection-string=http://localhost:9200/munin
```

You can then call the API (Bragi's default listening port is 4000):

```shell
curl "http://localhost:4000/autocomplete?q=rue+hector+malot"
```