Search app built around the Vespa search engine that indexes some popular programming documentation and blogs.
KodeSearch is built with the help of `docker-compose`. The project consists of the following components:
- Scrapper: Uses `scrapy` to crawl and scrape content from web pages.
- PostgreSQL as DB: Stores and keeps track of scraped files.
- Vespa Engine: Main component that indexes and stores the content for user queries.
- Feeder: Feeds the data that `scrapper` downloads to Vespa.
- Web - RoR app: Simple web app that provides an interface for users to query Vespa.
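
To make the role of the web component more concrete, the sketch below builds the kind of HTTP request a client could send to Vespa's query API. It is an illustration only, not code from KodeSearch (whose web app is written in Rails): the `build_query_url` helper, the host, the port (8080 is Vespa's default query port), and the hit count are assumptions, and `userQuery()` only returns hits if the deployed schema defines a `default` fieldset.

```python
from urllib.parse import urlencode


def build_query_url(user_input, host="localhost", port=8080, hits=10):
    """Build a Vespa query API URL for a free-text search.

    host and port are assumptions; adjust them to match the ports
    published in your docker-compose setup.
    """
    params = {
        # userQuery() tells Vespa to search using the `query` parameter.
        "yql": "select * from sources * where userQuery()",
        "query": user_input,
        "hits": hits,
    }
    return f"http://{host}:{port}/search/?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("python decorators"))
```

Keeping the user's text in the separate `query` parameter, rather than pasting it into the YQL string, avoids having to escape it by hand.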
- After `scrapper` scrapes web pages, it stores them on the file system.
- `scrapper` stores the file info in the DB, which `feeder` uses as a queue.
- `feeder` reads the info of files that are ready to be fed to `vespa`.
- `feeder` reads the actual file content.
- `feeder` feeds the data to `vespa`.
- The user queries `vespa` via `web`.
- `vespa` returns the results for the user's query back to `web`.
- `web` requests the list of scraped domains from `postgresql-db`.
As mentioned above, this project uses `docker-compose`, so installation is quite easy.
- Clone the project.
- `cp .env.example .env`
- Set the variables inside `.env`.
- `docker-compose build`
- We will start the containers one by one:
#1
docker-compose up -d scrapper
#2
docker-compose up -d vespa
# Vespa will take some time to start.
# Run the script that deploys the Vespa application package to the container
./vespa/deploy-and-start.sh
#3 Once Vespa is ready to accept data, run the feeder
docker-compose up -d feeder
#4
docker-compose up -d web
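
Since Vespa takes some time to start, it can help to wait for it before running the deploy script or the feeder. The sketch below polls a health URL until it answers with HTTP 200. It assumes the Vespa config server's health endpoint is reachable at `http://localhost:19071/state/v1/health`, which depends on the ports published in your docker-compose file; `is_up`, `wait_until_ready`, and their parameters are illustrative names, not part of this project.

```python
import time
import urllib.request


def is_up(url, timeout=2.0):
    """Return True if `url` answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def wait_until_ready(url, attempts=60, delay=5.0, check=is_up, sleep=time.sleep):
    """Poll `url` until `check` succeeds or `attempts` run out."""
    for attempt in range(attempts):
        if check(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example usage (assumed URL, see the note above):
# if wait_until_ready("http://localhost:19071/state/v1/health"):
#     print("Vespa is ready")
```

The `check` and `sleep` parameters exist only so that the polling loop can be exercised without a running Vespa instance.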