Download a website locally, without any configuration, right from your terminal.
Note: The script is based entirely on node-website-scraper, an awesome website scraper library :)
- Node.js version >= 8

    npm install -g node-site-downloader

    node-site-downloader download DOMAIN START_POINT OUTPUT_FOLDER [VERBOSE] [OUTPUT_FOLDER_SUFFIX] [INCLUDE_IMAGES]

    # Download all of the English Jest documentation
    node-site-downloader download -s https://jestjs.io/docs/en/getting-started -d https://jestjs.io/docs/en/ -o jest-docs -v --include-images

For more information please run

    node-site-downloader --help
    node-site-downloader download --help

Now you can run the downloader straight from a Docker container. This way there is no need to install Node.js and node-site-downloader.
Instead, please pull the image from Docker Hub

    docker pull gnird/node-site-downloader

And then run the container with all of the relevant options passed to the script (please check the options section), except for --output-folder.
--output-folder isn't passed to the container because the script saves the site inside the container.
Instead, configure a volume that maps a folder on your computer to /data in the container.

    docker run -v /some/path:/data ...

    docker run -v /tmp/mysite:/data gnird/node-site-downloader download -d https://jestjs.io/docs/en/ -s https://jestjs.io/docs/en/getting-started -v

NOTICE: The first -v configures the volume for the container, and the second -v (at the end of the command) is passed to the script in order to make it verbose.
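
Because the two -v flags are easy to mix up, here is a minimal sketch of a wrapper that composes the Docker command from its parts. docker_download_cmd is a hypothetical helper written for this example and is not part of node-site-downloader; it only prints the command, so you can review it before running anything.

```shell
#!/bin/sh
# Sketch only: docker_download_cmd is a hypothetical helper, not part of the tool.
docker_download_cmd() {
  host_dir=$1   # absolute folder on your computer, mounted at /data
  domain=$2     # value passed to the script's -d option
  start=$3      # value passed to the script's -s option
  # The -v before the image name is Docker's volume flag;
  # the -v after "download" is the script's verbose flag.
  printf 'docker run -v %s:/data gnird/node-site-downloader download -d %s -s %s -v\n' \
    "$host_dir" "$domain" "$start"
}

# Print the command for the Jest documentation example.
docker_download_cmd /tmp/mysite https://jestjs.io/docs/en/ https://jestjs.io/docs/en/getting-started
```

The printed line should match the example command from this section, with the host folder, domain, and start point substituted in.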
- domain (-d) - The script will download all of the URLs under the specified URL.
- start point (-s) - The page from which the script should start scraping.
- include-images (--include-images) - Should the script download relevant images as well?
- output folder (--output-folder) - The folder in which the script should save the downloaded assets. Note: The folder should not exist!
- verbose (-v) - If the flag is present, the script will print every URL that was downloaded.
- output folder suffix (--output-folder-suffix) - The suffix that will be added to OUTPUT_FOLDER, defaults to: .site
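
Since the output folder must not exist yet, a small pre-flight check can save a failed run. The sketch below is an illustration only: target_folder and check_target are hypothetical helpers, and it assumes the suffix is appended directly to OUTPUT_FOLDER (for example jest-docs plus .site gives jest-docs.site), which the options list suggests but does not spell out.

```shell
#!/bin/sh
# Sketch only: target_folder and check_target are hypothetical helpers.

# Assumption: the final folder name is OUTPUT_FOLDER followed by the suffix.
# The suffix falls back to .site, the default named in the options list.
target_folder() {
  printf '%s%s\n' "$1" "${2:-.site}"
}

# Succeeds only when the computed target folder does not exist yet.
check_target() {
  target=$(target_folder "$1" "$2")
  if [ -e "$target" ]; then
    echo "refusing to run: $target already exists" >&2
    return 1
  fi
  echo "ok: $target"
}

# Check the folder used in the Jest documentation example.
check_target jest-docs
```

If the check passes, the download command can be run with the same output folder and suffix.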