FactFinder API

An express.js api that provides search and selection data for NYC Population FactFinder.

Requirements

You will need the following installed on your computer:

  • Git
  • Node.js (the version listed in .nvmrc)
  • Yarn

Depending on your setup, you may also need Docker, jq, and Postgres (see below).

Setting Up a Local Development Environment

  1. Clone this repo: https://github.com/NYCPlanning/labs-factfinder-api.git
  2. Ensure you're running the Node version listed in the .nvmrc and have Yarn installed
  3. Install dependencies: yarn
  4. Create a file called .env in the root directory of this repository. This file must contain an environment variable called DATABASE_URL with a Postgresql connection string for your database. Copy the contents of sample.env into your newly created .env. Be sure to name this file correctly: it may need to contain sensitive information and is therefore included in this repo's .gitignore file. If you add this file and see it show up in your Git diffs, you named it incorrectly and should not commit it.
  5. This API relies on a Postgresql database for its data. At this point in the setup, decide whether to use a local instance of the database or to connect to the remote database used by our development environment. If you're only making changes to the API code, you will likely be fine using the development environment database. If you're making changes to the scripts in /migrations or performing database updates, start with a local database so you can work on your changes without affecting remote environments shared across the team. If you're unsure which approach to take, ask someone on the team for help.
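
For reference, a minimal .env might look like the following. The connection string here is a hypothetical local default; sample.env in this repo contains the canonical values.

```
# Hypothetical local connection string; copy the real values from sample.env
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/postgres
```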

Setting up a local database instance with docker

You can set up a local instance of the database that this API relies on using Docker. This is helpful for testing changes to the API locally and for making updates to the "migration" scripts themselves. To set up a local instance of the database:

  • Make sure you have Docker installed. If you don't, consider installing it via Homebrew.
  • Once Docker is installed, run docker compose up from the root directory of this repository. The file compose.yaml should spin up a container running the default Postgresql image. If this step is successful, you should see logs in your terminal indicating that the database is running and ready to accept connections. Keep this container running while running the project.
  • Make sure you have jq and Postgres installed locally. Tip: brew install jq and brew install postgresql for Mac users using Homebrew.
  • Double-check that you created your .env file and copied over the contents of sample.env. The contents of sample.env are pre-configured to connect to the database created by compose.yaml.
  • Run yarn migrate. The migrate command kicks off the ./bin/migrate Node executable, which in turn runs the .sh shell scripts under ./migrations. These scripts download datasets and load them into your database. You should see logs in your terminal such as Running ETL command: ./migrations/cli.sh etl --datasource=acs --year=2019 --geography=2010_to_2020 --download --load.
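
The compose.yaml in this repo is authoritative, but a minimal Compose file for a default Postgres container looks roughly like this sketch (service name and credentials here are hypothetical):

```yaml
services:
  db:
    image: postgres                   # the default Postgresql image
    environment:
      POSTGRES_PASSWORD: postgres     # hypothetical credentials
    ports:
      - "5432:5432"                   # expose the default Postgres port locally
```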

You should see 8 check marks on completion. By the end of the migration, the target Postgres database should be set up with these schemas/tables:

  • support_geoids (table)
  • selection (table)
  • acs (schema)
    • 2010 (table)
    • 2019 (table)
    • metadata (table)
  • decennial (schema)
    • 2010 (table)
    • 2020 (table)
    • metadata (table)

Note - these scripts appear to occasionally time out and fail "silently". If you get all Done! logs but are missing tables in your database, try re-running the script for the missing tables individually. To update just the ACS or decennial data, run yarn migrate acs and yarn migrate decennial respectively.

Connecting your local environment to the Development environment database

This option is only available to internal DCP developers, as it requires access to sensitive information.

  • Log into our 1PW vault and look for a document called "Factfinder API Development Env"
  • Download that file and replace the contents of your .env file with its contents.

Once you're set up to connect to either your own local database or the development environment database, start the API server by running yarn run dev.

Architecture

Database/Table Definitions

There are three core sets of tables that support the Factfinder API:

  1. Survey tables - The set of tables that hold ACS and Census data by geography and variable.
  2. Selection table - Holds user-defined selections of geographies. See the section Selections, Selection geotype and Selection hash Ids below.
  3. Support Geoids table - Holds a mapping of geoid to human-readable label for each geography.

API Endpoints

/search returns search results for addresses, neighborhoods, etc. Uses Geosearch and Carto tables to return autocomplete suggestions.

/selection sets and gets collections of census geometries. Creates or looks up an array of geoids in the support_geoids table.

/survey returns decennial/acs data for a given array of geoids.
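
As a sketch of how a client might construct requests against these three endpoints (the base URL and parameter values here are hypothetical placeholders, not part of the API):

```javascript
// Build request URLs for the three endpoints described above.
// BASE_URL is a placeholder; point it at your local or deployed instance.
const BASE_URL = 'http://localhost:3000';

function searchUrl(q) {
  return `${BASE_URL}/search?q=${encodeURIComponent(q)}`;
}

function selectionUrl(geotype, geoid) {
  return `${BASE_URL}/selection/${geotype}/${geoid}`;
}

function surveyUrl(survey, geotype, geoid, compareTo) {
  // compareTo is technically a query param but effectively required
  // (see the Routes section)
  return `${BASE_URL}/survey/${survey}/${geotype}/${geoid}?compareTo=${compareTo}`;
}
```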

Routes

  • /search - gets results that match a string passed in as query parameter q. Returns various search result types:

    • geosearch - autocomplete API results
    • nta - neighborhoods that match the input query
    • tract - census tracts that match the input query
    • block - census blocks that match the input query
    • borough - boroughs that match the input query
    • cdta - community district tabulation areas that match the input query
    • cd - community districts that match the input query
  • /selection

    • POST /selection

      • Request payload should be an object containing the combination of geoids to select.
      • Checks whether the incoming combination of geoids already exists in the database. If so, returns the hash ID of the existing selection; if not, adds it to the database and returns the new hash ID.
    • GET /selection/:geotype/:geoid Retrieves geography data for the given geoid and geotype combination

      • Request:
      • Response: an object with the following properties:
        • status - 'success' if found, 'not found' if not found
        • id - the hash ID of the selection
        • type - geoid type of the selection, one of 'blocks', 'tracts', 'ntas', 'pumas'
        • features - a geojson FeatureCollection containing a Feature for each selected geometry
  • /survey

      • GET /survey/:survey/:geotype/:geoid?compareTo={:compareTo} Retrieves survey data for the given geoid, geotype, and compareTo combination.

        • Request:
          • :survey - (Required) Either acs or decennial.
          • :geotype - (Required) See geotype section for possible values
          • :geoid - (Required) See geoids
          • ?compareTo - (Effectively required) The geoid of the geography to compare against. Although this is a query param, it is crucial for this endpoint (not an ideal architecture, admittedly). If not passed, it defaults to NYC, but you should pass an intended geoid. The endpoint retrieves survey data for the comparison geography specified by this parameter.
        • Response: An array of objects, each representing the data for a different survey variable for the user's selected geometry. Each object contains the variable data for the current year, the previous year, and change over time. For each year, it further slices data by selection (geoid), the comparison, and the difference between selection and comparison. "Slicing" is done via key prefixes (previous_, difference_, etc.). Example:
          [
            {
              "id": "QUlBTk5I",
              "sum": 0,
              "m": 0,
              "cv": null,
              "percent": 0,
              "variable": "aiannh",
              "variablename": "AIANNH",
              "base": "Pop_2",
              "category": "mutually_exclusive_race_hispanic_origin",
              "survey": "acs",
              "previous_sum": 0,
              "previous_m": 0,
              ...
              "difference_sum": -15017,
              "percent_is_reliable": false
            },
            {
              "id": "QXNuMVJj",
              "sum": 472,
              "m": 157.429349233235,
              "cv": 20.2757906899742,
              "variable": "asn1rc",
              ....
            },
            ...
          ]
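
The per-year slicing by prefix can be illustrated with a small helper that groups a flat row like the one above by its previous_ and difference_ key prefixes. The field names come from the sample response; the helper itself is illustrative, not part of the API:

```javascript
// Group a flat survey row into current / previous / difference slices
// based on the previous_ and difference_ key prefixes.
function sliceByPrefix(row) {
  const slices = { current: {}, previous: {}, difference: {} };
  for (const [key, value] of Object.entries(row)) {
    if (key.startsWith('previous_')) {
      slices.previous[key.slice('previous_'.length)] = value;
    } else if (key.startsWith('difference_')) {
      slices.difference[key.slice('difference_'.length)] = value;
    } else {
      slices.current[key] = value;
    }
  }
  return slices;
}

// Example with a trimmed-down row from the sample response above:
const row = { sum: 0, m: 0, previous_sum: 0, previous_m: 0, difference_sum: -15017 };
const sliced = sliceByPrefix(row);
// sliced.difference.sum === -15017
```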
        

Geoids

A geoid in the context of endpoint parameters (see above section) can either be a Selection table id (i.e. of geotype selection), or an id that maps to the geoid column in the Survey tables (e.g. of geotype ntas, boroughs, cdtas, etc).

Geotypes

The frontend and this API have their own programmatic abstraction for the geotype variable, different from the one in the Postgres database. Primarily, geotype in this application can also take on the value "selection", alongside other geotypes like "ntas", "boroughs", etc. The "selection" value tells the API to look in the Selection table in the Postgres database for aggregate geographies.

See the table below for possible programmatic Geotype values, and a mapping of programmatic geotype values to Survey table values.

The programmatic column values are the acceptable arguments into the endpoints above.

| Geotype (Programmatic) | Geotype (Survey table) | Source (which survey tables) |
| --- | --- | --- |
| selection* | N/A | ACS, Census |
| ntas | NTA2020 | ACS, Census |
| tracts | CT2020 | ACS, Census |
| cdtas | CDTA2020 | ACS |
| ccds | CCD2023 | Census |
| ??? | | Census |
| blocks | CB2020 | Census |
| boroughs | Boro2020 | ACS, Census |
| city | City2020 | ACS, Census |

Each row within the survey tables has a unique (geoid, geotype) pair of values.
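
In code, the mapping in the table above might be represented as a simple lookup. This is a sketch; the actual constants live in the application source:

```javascript
// Map programmatic geotype values (accepted by the endpoints) to the
// geotype values stored in the survey tables. 'selection' is special:
// it resolves through the Selection table rather than the survey tables.
const GEOTYPE_TO_SURVEY_GEOTYPE = {
  ntas: 'NTA2020',
  tracts: 'CT2020',
  cdtas: 'CDTA2020',
  ccds: 'CCD2023',
  blocks: 'CB2020',
  boroughs: 'Boro2020',
  city: 'City2020',
};

function toSurveyGeotype(geotype) {
  if (geotype === 'selection') {
    throw new Error('selection geotypes resolve through the Selection table');
  }
  return GEOTYPE_TO_SURVEY_GEOTYPE[geotype];
}
```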

Selections, Selection geotype and Selection hash Ids

Geoids of geotype selection are ids for a custom, user-defined geographical area held by the Selection table in the Factfinder stack. Each Selection geoid points to one row in the Selection table. Each row holds an array of geoids of other geotypes (like ntas, tracts, cdtas, etc).

Where the data for this app comes from and how to update it

The project has a set of "migration" scripts to set up a Postgres database with the required tables and load data. The scripts transfer data from EDM's CSV datasets into the target Postgres database, whether that's a local database for development or the database for the development, staging, or production environment.

Data Sources

The following describes where the raw data loaded into this app is stored. These sources are only accessible by internal DCP developers.

The scripts in the /migrations folder of this repo read data from a set of CSVs built by EDM. Those data are stored in the edm-publishing Space in Digital Ocean under a folder called db-factfinder. Inside that folder, the data are further organized by ACS or Decennial, as well as publication year. Additional metadata is sourced from the db-factfinder repo.

Data Updates

Most of the source data is organized into folders based on whether the data is ACS or Decennial and the year of publication, so the migration scripts and the API code rely on a set of constants located in labs-factfinder-api/special-calculations/data/constants.js. To update your local database with new data releases:

  1. Update the years in those constants

  2. Re-run the migrations against the local database: yarn migrate

  3. Commit those changes to the develop, staging, or production branches, then run the migrate Github Action for the appropriate environment (develop, staging, or production). Note that you should only need to change these constants when you are doing the work of updating the data itself. If you're just trying to populate a local instance of the database so that you can work on application code, you should be able to leave the constants as-is.
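
As a sketch, the year constants might look something like this. The names and values below are assumptions for illustration; special-calculations/data/constants.js is authoritative:

```javascript
// Hypothetical shape of the year constants used by the migrations and API.
// Check special-calculations/data/constants.js for the real names and values.
const ACS_YEARS = ['2010', '2019'];        // matches the acs schema tables above
const DECENNIAL_YEARS = ['2010', '2020'];  // matches the decennial schema tables
```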

The following process only applies to internal DCP developers. To update the develop, staging, or production databases with new data releases:

  1. Update the years in those constants
  2. Commit those changes and open a PR for review against the branch associated with the target database (i.e. develop, staging or main)
  3. Run the migrate Github Action with the target database environment under "The database to update"

Backend services

  • Carto - Carto instance with MapPLUTO and other Zoning-related datasets
  • NYC GeoSearch API - Autocomplete search results

Testing and checks

  • ESLint - We use ESLint with Airbnb's rules for JavaScript projects
    • Add an ESLint plugin to your text editor to highlight broken rules while you code
    • You can also run eslint at the command line with the --fix flag to automatically fix some errors.
