Releases · gustavz/DataChad
DataChad V2
This is the cut-off point for DataChad V2
How does it work?
- Upload any `file(s)` or enter any `path` or `url`
- The data source is detected and loaded into text documents
- The text documents are embedded using OpenAI embeddings
- The embeddings are stored as a vector dataset on Activeloop's database hub
- A LangChain chain is created consisting of an LLM (`gpt-3.5-turbo` by default) and the vector store as retriever, as shown in the sketch after this list
- When asking questions in the app, the chain embeds the input prompt, does a similarity search in the vector store, and uses the best results as context for the LLM to generate an appropriate response
- Finally, the chat history is cached locally to enable a ChatGPT-like Q&A conversation
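A minimal sketch of this pipeline with the classic LangChain API. This is illustrative, not DataChad's actual code: the input file, chunking parameters, and dataset path are placeholders.

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import DeepLake

# Load the data source and split it into text documents
docs = TextLoader("README.md").load()
docs = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed the documents and store them as a vector dataset on Activeloop's hub
vector_store = DeepLake.from_documents(
    docs,
    OpenAIEmbeddings(),
    dataset_path="hub://<org>/<dataset>",  # illustrative placeholder
)

# Chain an LLM (gpt-3.5-turbo by default) with the vector store as retriever
chain = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(model_name="gpt-3.5-turbo"),
    retriever=vector_store.as_retriever(),
)

# Each question is embedded, the best-matching chunks become the LLM's context,
# and the accumulated chat history enables a ChatGPT-like conversation
result = chain({"question": "How does DataChad work?", "chat_history": []})
print(result["answer"])
```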
Good to know
- The app only runs on `py>=3.10`!
- By default this git repository is used as context, so you can directly start asking questions about its functionality without choosing your own data source.
- To run locally or deploy somewhere, execute `cp .env.template .env` and set credentials in the newly created `.env` file (see the example after this list). Other options are setting system environment variables manually, or storing them in `.streamlit/secrets.toml` when hosted via Streamlit.
- If you have credentials set as explained above, you can just hit `submit` in the authentication without re-entering your credentials in the app.
- Your data won't load? Feel free to open an Issue or PR and contribute!
- Yes, Chad in `DataChad` refers to the well-known meme
- DataChad V2 does not support local mode, but many features are coming soon. Stay tuned!
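For reference, a hypothetical `.env` after the copy step. The authoritative variable names are whatever `.env.template` contains; the two keys below (OpenAI and Activeloop credentials) are assumptions, and the values are placeholders.

```
OPENAI_API_KEY=sk-...
ACTIVELOOP_TOKEN=...
```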
DataChad V1
This is the cut-off point for DataChad V1
How does it work?
- Upload any `file(s)` or enter any `path` or `url`
- The data source is detected and loaded into text documents
- The text documents are embedded using OpenAI embeddings
- The embeddings are stored as a vector dataset on Activeloop's database hub
- A LangChain chain is created consisting of an LLM (`gpt-3.5-turbo` by default) and the vector store as retriever
- When asking questions in the app, the chain embeds the input prompt, does a similarity search in the vector store (sketched after this list), and uses the best results as context for the LLM to generate an appropriate response
- Finally, the chat history is cached locally to enable a ChatGPT-like Q&A conversation
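A rough sketch of that retrieval step with the classic LangChain API, assuming an already populated vector dataset. The dataset path and `k` are illustrative, not DataChad's actual configuration.

```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake

# Reconnect to an existing vector dataset on Activeloop's hub
store = DeepLake(
    dataset_path="hub://<org>/<dataset>",  # illustrative placeholder
    embedding_function=OpenAIEmbeddings(),
    read_only=True,
)

# The prompt is embedded and the best-matching chunks are retrieved;
# their text becomes the context passed to the LLM
matches = store.similarity_search("How does DataChad work?", k=4)
context = "\n\n".join(doc.page_content for doc in matches)
```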
Good to know
- The app only runs on `py>=3.10`!
- By default this git repository is used as context, so you can directly start asking questions about its functionality without choosing your own data source.
- To run locally or deploy somewhere, execute `cp .env.template .env` and set credentials in the newly created `.env` file. Other options are setting system environment variables manually, or storing them in `.streamlit/secrets.toml` when hosted via Streamlit.
- If you have credentials set as explained above, you can just hit `submit` in the authentication without re-entering your credentials in the app.
- To enable `Local Mode` (disabled for the demo), set `ENABLE_LOCAL_MODE` to `True` in `datachad/constants.py`. You need to have the model binaries downloaded and stored inside `./models/` (see the sketch after this list)
- The currently supported `Local Mode` OSS model is GPT4All. To add more models, update `datachad/models.py`
- If you are running `Local Mode`, all your data stays locally on your machine. No API calls are made. The same holds for the embeddings database, which stores its data to `./data/`
- Your data won't load? Feel free to open an Issue or PR and contribute!
- Yes, Chad in `DataChad` refers to the well-known meme
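A minimal `Local Mode` sketch, again with the classic LangChain API. The model file name is a typical GPT4All binary, and swapping OpenAI embeddings for a local sentence-transformers model is an assumption here, not necessarily DataChad's exact setup.

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import GPT4All
from langchain.vectorstores import DeepLake

# Local LLM binary stored under ./models/ (file name is illustrative)
llm = GPT4All(model="./models/ggml-gpt4all-j-v1.3-groovy.bin")

# Assumption: a local embedding model replaces OpenAI embeddings,
# so no API calls are made
embeddings = HuggingFaceEmbeddings()

# The embeddings database keeps its data on disk under ./data/
store = DeepLake(dataset_path="./data/my-dataset", embedding_function=embeddings)
```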