5 changes: 2 additions & 3 deletions README.md
@@ -20,7 +20,7 @@
- 🚀 Extended Llama-2 with a **new Chinese vocabulary**; open-sourced the Chinese LLaMA-2 and Alpaca-2 large models
- 🚀 Open-sourced the pre-training and instruction fine-tuning scripts, so users can further train the models as needed
- 🚀 Quickly quantize and deploy the large models locally on the CPU/GPU of a personal computer
- 🚀 Supports the LLaMA ecosystem, including [🤗transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), [LangChain](https://github.com/hwchase17/langchain), [vLLM](https://github.com/vllm-project/vllm)
- 🚀 Supports the LLaMA ecosystem, including [🤗transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), [LangChain](https://github.com/hwchase17/langchain), [privateGPT](https://github.com/imartinez/privateGPT), [vLLM](https://github.com/vllm-project/vllm)
- Models currently open-sourced: Chinese-LLaMA-2-7B and Chinese-Alpaca-2-7B (for larger models, see the [first-generation project](https://github.com/ymcui/Chinese-LLaMA-Alpaca))

![](./pics/screencast.gif)
@@ -133,11 +133,10 @@
| [**OpenAI API Calls**](https://platform.openai.com/docs/api-reference) | A server demo that mimics the OpenAI API | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/api_calls_zh) |
| [**text-generation-webui**](https://github.com/oobabooga/text-generation-webui) | Deployment with a front-end web UI | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/text-generation-webui_zh) |
| [**LangChain**](https://github.com/hwchase17/langchain) | Open-source LLM application framework, suitable for secondary development | ✅<sup>†</sup> | ✅ | ✅<sup>†</sup> | ❌ | ❌ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/langchain_zh) |
| [**privateGPT**](https://github.com/imartinez/privateGPT) | LangChain-based local multi-document QA framework | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/privategpt_zh) |

<sup>†</sup>: Supported by the LangChain framework but not implemented in the tutorial; see the official LangChain documentation for details.

⚠️ Inference and deployment support for the first-generation models will be gradually migrated to this project, and the related tutorials will be updated accordingly.


## System Performance

5 changes: 2 additions & 3 deletions README_EN.md
@@ -20,7 +20,7 @@ This project is based on the Llama-2, released by Meta, and it is the second gen
- 🚀 New extended Chinese vocabulary beyond Llama-2, open-sourcing the Chinese LLaMA-2 and Alpaca-2 LLMs.
- 🚀 Open-sourced the pre-training and instruction finetuning (SFT) scripts for further tuning on users' own data
- 🚀 Quickly deploy and experience the quantized LLMs on the CPU/GPU of a personal PC
- 🚀 Support for LLaMA ecosystems like [🤗transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), [LangChain](https://github.com/hwchase17/langchain), [vLLM](https://github.com/vllm-project/vllm) etc.
- 🚀 Support for LLaMA ecosystems like [🤗transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), [LangChain](https://github.com/hwchase17/langchain), [privateGPT](https://github.com/imartinez/privateGPT), [vLLM](https://github.com/vllm-project/vllm) etc.
- The currently open-source models are Chinese-LLaMA-2-7B and Chinese-Alpaca-2-7B (check our [first-gen project](https://github.com/ymcui/Chinese-LLaMA-Alpaca) for more models).

![](./pics/screencast.gif)
@@ -127,11 +127,10 @@ The models in this project mainly support the following quantization, inference,
| [**OpenAI API Calls**](https://platform.openai.com/docs/api-reference) | A server that implements the OpenAI API | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/api_calls_en) |
| [**text-generation-webui**](https://github.com/oobabooga/text-generation-webui) | A tool for deploying the model as a web UI | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/text-generation-webui_en) |
| [**LangChain**](https://github.com/hwchase17/langchain) | LLM application development framework, suitable for secondary development | ✅<sup>†</sup> | ✅ | ✅<sup>†</sup> | ❌ | ❌ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/langchain_en) |
| [**privateGPT**](https://github.com/imartinez/privateGPT) | LangChain-based multi-document QA framework | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/privategpt_en) |

<sup>†</sup>: Supported by LangChain, but not implemented in the tutorial; refer to the official LangChain documentation for details.

⚠️ Inference and deployment support related to the first-generation model will be gradually migrated to this project, and relevant tutorials will be updated later.

## System Performance

### Generation Performance Evaluation
19 changes: 19 additions & 0 deletions scripts/privategpt/README.md
@@ -0,0 +1,19 @@
## privateGPT example scripts

Detailed usage (Chinese): https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/privategpt_zh

Detailed usage (English): https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/privategpt_en

The following code is adapted from https://github.com/imartinez/privateGPT/blob/main/privateGPT.py

### privateGPT.py

Example entry point that wraps queries in the Alpaca-2 instruction template. Because the third-party libraries it depends on change frequently, do not use this script as-is; adapt it by following our wiki tutorial.

### privateGPT_refine.py

Example entry point that uses the `refine` chain strategy.
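
Both scripts read their settings from environment variables, typically supplied via a `.env` file loaded by `python-dotenv`, and they import `CHROMA_SETTINGS` from privateGPT's `constants.py`, so they are meant to be run from inside a privateGPT checkout. Below is a minimal sketch of such a `.env` file; the variable names are the ones the scripts actually read, while all values are placeholders to adapt to your own setup:

```
# All values below are placeholders; adjust them to your local setup
MODEL_TYPE=LlamaCpp                                # or GPT4All
MODEL_PATH=models/your-chinese-alpaca-2-model.bin  # placeholder model file
EMBEDDINGS_MODEL_NAME=your-embedding-model-name    # placeholder embedding model
PERSIST_DIRECTORY=db
MODEL_N_CTX=4096
MODEL_N_BATCH=8
TARGET_SOURCE_CHUNKS=4
```

With these set, build the vector store first as described in the wiki, then run `python privateGPT.py` or `python privateGPT_refine.py` from the privateGPT project root.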
99 changes: 99 additions & 0 deletions scripts/privategpt/privateGPT.py
@@ -0,0 +1,99 @@
#!/usr/bin/env python3
from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.vectorstores import Chroma
from langchain.llms import GPT4All, LlamaCpp
import os
import argparse
import time

load_dotenv()

embeddings_model_name = os.environ.get("EMBEDDINGS_MODEL_NAME")
persist_directory = os.environ.get('PERSIST_DIRECTORY')

model_type = os.environ.get('MODEL_TYPE')
model_path = os.environ.get('MODEL_PATH')
model_n_ctx = os.environ.get('MODEL_N_CTX')
model_n_batch = int(os.environ.get('MODEL_N_BATCH', 8))
target_source_chunks = int(os.environ.get('TARGET_SOURCE_CHUNKS', 4))

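# constants.py is part of the upstream privateGPT checkout; it builds
# CHROMA_SETTINGS from PERSIST_DIRECTORY, hence the import after load_dotenv()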
from constants import CHROMA_SETTINGS

def main():
    # Parse the command line arguments
    args = parse_arguments()
    embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
    db = Chroma(persist_directory=persist_directory, embedding_function=embeddings, client_settings=CHROMA_SETTINGS)
    retriever = db.as_retriever(search_kwargs={"k": target_source_chunks})
    # activate/deactivate the streaming StdOut callback for LLMs
    callbacks = [] if args.mute_stream else [StreamingStdOutCallbackHandler()]
    # Prepare the LLM
    match model_type:
        case "LlamaCpp":
            llm = LlamaCpp(model_path=model_path, max_tokens=model_n_ctx, n_ctx=model_n_ctx,
                           n_gpu_layers=1, n_batch=model_n_batch, callbacks=callbacks, n_threads=8, verbose=False)
        case "GPT4All":
            llm = GPT4All(model=model_path, max_tokens=model_n_ctx, backend='gptj', n_batch=model_n_batch, callbacks=callbacks, verbose=False)
        case _:
            # raise exception if model_type is not supported
            raise Exception(f"Model type {model_type} is not supported. Please choose one of the following: LlamaCpp, GPT4All")

    # The following prompt template is specifically designed for Chinese-Alpaca-2
    # For detailed usage: https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/privategpt_en
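    # Chinese-Alpaca-2 follows the Llama-2 chat format: the system prompt sits
    # between <<SYS>> and <</SYS>>, and the user turn is wrapped in [INST] ... [/INST]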
    alpaca2_prompt_template = (
        "[INST] <<SYS>>\n"
        "You are a helpful assistant. 你是一个乐于助人的助手。\n"
        "<</SYS>>\n\n"
        "{context}\n\n{question} [/INST]"
    )
    from langchain import PromptTemplate
    input_with_prompt = PromptTemplate(template=alpaca2_prompt_template, input_variables=["context", "question"])

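    # With chain_type="stuff", all retrieved chunks are concatenated into the single
    # {context} slot of the prompt, so the filled prompt must fit within n_ctx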
    qa = RetrievalQA.from_chain_type(
        llm=llm, chain_type="stuff", retriever=retriever,
        return_source_documents=not args.hide_source,
        chain_type_kwargs={"prompt": input_with_prompt})

    # Interactive questions and answers
    while True:
        query = input("\nEnter a query: ")
        if query == "exit":
            break
        if query.strip() == "":
            continue

        # Get the answer from the chain
        start = time.time()
        res = qa(query)
        answer, docs = res['result'], [] if args.hide_source else res['source_documents']
        end = time.time()

        # Print the result
        print("\n\n> Question:")
        print(query)
        print(f"\n> Answer (took {round(end - start, 2)} s.):")
        print(answer)

        # Print the relevant sources used for the answer
        for document in docs:
            print("\n> " + document.metadata["source"] + ":")
            print(document.page_content)

def parse_arguments():
    parser = argparse.ArgumentParser(description='privateGPT: Ask questions to your documents without an internet connection, '
                                                 'using the power of LLMs.')
    parser.add_argument("--hide-source", "-S", action='store_true',
                        help='Use this flag to disable printing of source documents used for answers.')

    parser.add_argument("--mute-stream", "-M",
                        action='store_true',
                        help='Use this flag to disable the streaming StdOut callback for LLMs.')

    return parser.parse_args()


if __name__ == "__main__":
    main()
119 changes: 119 additions & 0 deletions scripts/privategpt/privateGPT_refine.py
@@ -0,0 +1,119 @@
#!/usr/bin/env python3
from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.vectorstores import Chroma
from langchain.llms import GPT4All, LlamaCpp
import os
import argparse
import time

load_dotenv()

embeddings_model_name = os.environ.get("EMBEDDINGS_MODEL_NAME")
persist_directory = os.environ.get('PERSIST_DIRECTORY')

model_type = os.environ.get('MODEL_TYPE')
model_path = os.environ.get('MODEL_PATH')
model_n_ctx = os.environ.get('MODEL_N_CTX')
model_n_batch = int(os.environ.get('MODEL_N_BATCH', 8))
target_source_chunks = int(os.environ.get('TARGET_SOURCE_CHUNKS', 4))

from constants import CHROMA_SETTINGS

def main():
    # Parse the command line arguments
    args = parse_arguments()
    embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
    db = Chroma(persist_directory=persist_directory, embedding_function=embeddings, client_settings=CHROMA_SETTINGS)
    retriever = db.as_retriever(search_kwargs={"k": target_source_chunks})
    # activate/deactivate the streaming StdOut callback for LLMs
    callbacks = [] if args.mute_stream else [StreamingStdOutCallbackHandler()]
    # Prepare the LLM
    match model_type:
        case "LlamaCpp":
            llm = LlamaCpp(model_path=model_path, max_tokens=model_n_ctx, n_ctx=model_n_ctx,
                           n_gpu_layers=1, n_batch=model_n_batch, callbacks=callbacks, n_threads=8, verbose=False)
        case "GPT4All":
            llm = GPT4All(model=model_path, max_tokens=model_n_ctx, backend='gptj', n_batch=model_n_batch, callbacks=callbacks, verbose=False)
        case _:
            # raise exception if model_type is not supported
            raise Exception(f"Model type {model_type} is not supported. Please choose one of the following: LlamaCpp, GPT4All")

    # The following prompt templates are specifically designed for Chinese-Alpaca-2
    # For detailed usage: https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/privategpt_en
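    # (Rough gloss of the Chinese prompts below: the refine template shows the model
    # the original question, its current answer, and a new passage, and asks it to
    # improve the answer; the initial template asks it to answer the question from
    # the given background knowledge)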
    alpaca2_refine_prompt_template = (
        "[INST] <<SYS>>\n"
        "You are a helpful assistant. 你是一个乐于助人的助手。\n"
        "<</SYS>>\n\n"
        "这是原始问题:{question}\n"
        "已有的回答: {existing_answer}\n"
        "现在还有一些文字,(如果有需要)你可以根据它们完善现有的回答。"
        "\n\n{context_str}\n\n"
        "请根据新的文段,进一步完善你的回答。 [/INST]"
    )

    alpaca2_initial_prompt_template = (
        "[INST] <<SYS>>\n"
        "You are a helpful assistant. 你是一个乐于助人的助手。\n"
        "<</SYS>>\n\n"
        "以下为背景知识:\n{context_str}\n"
        "请根据以上背景知识,回答这个问题:{question} [/INST]"
    )

    from langchain import PromptTemplate
    refine_prompt = PromptTemplate(
        input_variables=["question", "existing_answer", "context_str"],
        template=alpaca2_refine_prompt_template,
    )
    initial_qa_prompt = PromptTemplate(
        input_variables=["context_str", "question"],
        template=alpaca2_initial_prompt_template,
    )
    chain_type_kwargs = {"question_prompt": initial_qa_prompt, "refine_prompt": refine_prompt}
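    # With chain_type="refine", the first retrieved chunk is answered using
    # initial_qa_prompt; each remaining chunk is then fed through refine_prompt
    # together with the running answer, letting the model revise it chunk by chunk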
    qa = RetrievalQA.from_chain_type(
        llm=llm, chain_type="refine",
        retriever=retriever, return_source_documents=not args.hide_source,
        chain_type_kwargs=chain_type_kwargs)

    # Interactive questions and answers
    while True:
        query = input("\nEnter a query: ")
        if query == "exit":
            break
        if query.strip() == "":
            continue

        # Get the answer from the chain
        start = time.time()
        res = qa(query)
        answer, docs = res['result'], [] if args.hide_source else res['source_documents']
        end = time.time()

        # Print the result
        print("\n\n> Question:")
        print(query)
        print(f"\n> Answer (took {round(end - start, 2)} s.):")
        print(answer)

        # Print the relevant sources used for the answer
        for document in docs:
            print("\n> " + document.metadata["source"] + ":")
            print(document.page_content)

def parse_arguments():
    parser = argparse.ArgumentParser(description='privateGPT: Ask questions to your documents without an internet connection, '
                                                 'using the power of LLMs.')
    parser.add_argument("--hide-source", "-S", action='store_true',
                        help='Use this flag to disable printing of source documents used for answers.')

    parser.add_argument("--mute-stream", "-M",
                        action='store_true',
                        help='Use this flag to disable the streaming StdOut callback for LLMs.')

    return parser.parse_args()


if __name__ == "__main__":
    main()