Merged (changes from 6 commits)
5 changes: 4 additions & 1 deletion .github/_typos.toml
@@ -1,2 +1,5 @@
[default.extend-identifiers]
arange = "arange" # np.arange
arange = "arange" # np.arange

[files]
extend-exclude = ["*.ipynb"]
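If you want to verify this change locally, the typos checker can be pointed at this file with `typos --config .github/_typos.toml` from the repository root (assuming the `typos` CLI from typos-cli is installed); with the new `[files]` section it should now skip `*.ipynb` files while still allowing the `arange` identifier.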
2 changes: 1 addition & 1 deletion .github/workflows/python.yml
@@ -34,7 +34,7 @@ jobs:
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
OPENAI_ORG_ID: ${{ secrets.OPENAI_ORG_ID }}
SKIP_TESTS_NAAI: "tests/llm tests/local_llm tests/data"
SKIP_TESTS_NAAI: "tests/llm tests/data"
run: poetry run nox -s test-${{ matrix.python-version }}
quality:
runs-on: ubuntu-22.04
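The workflow above only sets `SKIP_TESTS_NAAI`; the noxfile that consumes it is not part of this diff. A minimal, hypothetical sketch of how such a variable could be turned into pytest ignore flags (the session body below is an assumption, not the project's actual noxfile):

```python
# Hypothetical sketch only: the project's real noxfile is not shown in this PR.
import os

import nox


@nox.session
def test(session: nox.Session) -> None:
    session.install(".")
    # e.g. SKIP_TESTS_NAAI="tests/llm tests/data" -> ["--ignore=tests/llm", "--ignore=tests/data"]
    skipped = os.environ.get("SKIP_TESTS_NAAI", "").split()
    session.run("pytest", *[f"--ignore={path}" for path in skipped], "tests")
```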
2 changes: 1 addition & 1 deletion .vscode/settings.json
@@ -17,6 +17,6 @@
},
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"python.testing.pytestArgs": [],
"python.testing.pytestArgs": ["-s"],
"markdown.extension.orderedList.marker": "one",
}
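For context on this settings change: `-s` is pytest's shorthand for `--capture=no`, so tests launched from the VS Code test explorer now stream `print` output live instead of capturing it; the terminal equivalent is `pytest -s`.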
16 changes: 2 additions & 14 deletions README.md
@@ -24,11 +24,9 @@ Requires: Python 3.11 or 3.12
Install the entire package from [PyPI](https://pypi.org/project/not-again-ai/) with:

```bash
$ pip install not_again_ai[llm,local_llm,statistics,viz]
$ pip install not_again_ai[data,llm,statistics,viz]
```

Note that local LLM requires separate installations and will not work out of the box due to how hardware dependent it is. Be sure to check the [notebooks](notebooks/local_llm/) for more details.

The package is split into subpackages, so you can install only the parts you need.

### Base
@@ -49,16 +49,7 @@ The package is split into subpackages, so you can install only the parts you need.
1. Using AOAI requires Entra ID authentication. See https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/managed-identity for how to set this up for your AOAI deployment.
* Requires the correct role assigned to your user account and being signed into the Azure CLI.
1. (Optional) Set the `AZURE_OPENAI_ENDPOINT` environment variable.
1. Setup GitHub Models
1. Get a Personal Access Token from https://github.com/settings/tokens and set the `GITHUB_TOKEN` environment variable. The token does not need any permissions.
1. Check the [GitHub Marketplace](https://github.com/marketplace/models) to see which models are available.


### Local LLM
1. `pip install not_again_ai[llm,local_llm]`
1. Some HuggingFace transformers tokenizers are gated behind access requests. If you wish to use these, you will need to request access from HuggingFace on the model card.
* Then set the `HF_TOKEN` environment variable to your HuggingFace API token which can be found here: https://huggingface.co/settings/tokens
1. If you wish to use Ollama:
1. If you wish to use Ollama:
1. Follow the instructions at https://github.com/ollama/ollama to install Ollama for your system.
1. (Optional) [Add Ollama as a startup service (recommended)](https://github.com/ollama/ollama/blob/main/docs/linux.md#adding-ollama-as-a-startup-service-recommended)
1. (Optional) To make the Ollama service accessible on your local network from a Linux server, add the following to the `/etc/systemd/system/ollama.service` file which will make Ollama available at `http://<local_address>:11434`:
@@ -68,7 +57,6 @@
Environment="OLLAMA_HOST=0.0.0.0"
```
1. It is recommended to always have the latest version of Ollama. To update Ollama check the [docs](https://github.com/ollama/ollama/blob/main/docs/). The command for Linux is: `curl -fsSL https://ollama.com/install.sh | sh`
1. HuggingFace transformers and other requirements are hardware dependent so for providers other than Ollama, this only installs some generic dependencies. Check the [notebooks](notebooks/local_llm/) for more details on what is available and how to install it.
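After setting `OLLAMA_HOST=0.0.0.0`, reload and restart the service (`systemctl daemon-reload && systemctl restart ollama`); you can then sanity-check reachability from another machine with `curl http://<local_address>:11434/api/tags`, which should list the locally pulled models.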


### Statistics
119 changes: 54 additions & 65 deletions notebooks/llm/01_openai_chat_completion.ipynb
@@ -6,7 +6,7 @@
"source": [
"# Using OpenAI Chat Completions\n",
"\n",
"This notebook covers how to use the Chat Completions API and other features such as creating prompts and function calling."
"This notebook covers how to use the Chat Completions API and other features such as creating prompts and function calling.\n"
]
},
{
@@ -22,11 +22,11 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from not_again_ai.llm.openai_api.openai_client import openai_client\n",
"from not_again_ai.llm.chat_completion.providers.openai_api import openai_client\n",
"\n",
"client = openai_client()"
]
@@ -37,14 +37,14 @@
"source": [
"## Basic Chat Completion\n",
"\n",
"The `chat_completion` function is an easy way to get responses from OpenAI models. \n",
"It requires the prompt to the model to be formatted in the chat completion format, \n",
"see the [API reference](https://platform.openai.com/docs/api-reference/chat/create) for more details."
"The `chat_completion` function is an easy way to get responses from OpenAI models.\n",
"It requires the prompt to the model to be formatted in the chat completion format,\n",
"see the [API reference](https://platform.openai.com/docs/api-reference/chat/create) for more details.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 2,
"metadata": {},
"outputs": [
{
@@ -53,18 +53,27 @@
"'Hello! How can I assist you today?'"
]
},
"execution_count": 7,
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from not_again_ai.llm.openai_api.chat_completion import chat_completion\n",
"from not_again_ai.llm.chat_completion import chat_completion\n",
"from not_again_ai.llm.chat_completion.types import ChatCompletionRequest, SystemMessage, UserMessage\n",
"\n",
"messages = [{\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"Hello!\"}]\n",
"response = chat_completion(messages=messages, model=\"gpt-4o-mini-2024-07-18\", max_tokens=100, client=client)\n",
"messages = [\n",
" SystemMessage(content=\"You are a helpful assistant.\"),\n",
" UserMessage(content=\"Hello!\"),\n",
"]\n",
"request = ChatCompletionRequest(\n",
" messages=messages,\n",
" model=\"gpt-4o-mini-2024-07-18\",\n",
" max_completion_tokens=100,\n",
")\n",
"response = chat_completion(request, \"openai\", client)\n",
"\n",
"response[\"message\"]"
"response.choices[0].message.content"
]
},
{
@@ -75,50 +84,46 @@
"\n",
"Injecting variables into prompts is a common task and we provide the `chat_prompt` which uses [Liquid templating](https://jg-rp.github.io/liquid/).\n",
"\n",
"In the `messages_unformatted` argument, the \"content\" field can be a [Python Liquid](https://jg-rp.github.io/liquid/introduction/getting-started) template string to allow for more dynamic prompts which not only supports variable injection, but also conditional logic, loops, and comments.\n"
"In the `messages` argument, the \"content\" field can be a [Python Liquid](https://jg-rp.github.io/liquid/introduction/getting-started) template string to allow for more dynamic prompts which not only supports variable injection, but also conditional logic, loops, and comments.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'role': 'system',\n",
" 'content': '- You are a helpful assistant trying to extract places that occur in a given text.\\n- You must identify all the places in the text and return them in a list like this: [\"place1\", \"place2\", \"place3\"].'},\n",
" {'role': 'user',\n",
" 'content': 'Here is the text I want you to extract places from:\\nI went to Paris and Berlin.'}]"
"[SystemMessage(content='- You are a helpful assistant trying to extract places that occur in a given text.\\n- You must identify all the places in the text and return them in a list like this: [\"place1\", \"place2\", \"place3\"].', role=<Role.SYSTEM: 'system'>, name=None),\n",
" UserMessage(content='Here is the text I want you to extract places from:\\nI went to Paris and Berlin.', role=<Role.USER: 'user'>, name=None)]"
]
},
"execution_count": 8,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from not_again_ai.llm.openai_api.prompts import chat_prompt\n",
"from not_again_ai.llm.prompting.compile_messages import compile_messages\n",
"\n",
"place_extraction_prompt = [\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"\"\"- You are a helpful assistant trying to extract places that occur in a given text.\n",
"- You must identify all the places in the text and return them in a list like this: [\"place1\", \"place2\", \"place3\"].\"\"\",\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": \"\"\"Here is the text I want you to extract places from:\n",
" SystemMessage(\n",
" content=\"\"\"- You are a helpful assistant trying to extract places that occur in a given text.\n",
"- You must identify all the places in the text and return them in a list like this: [\"place1\", \"place2\", \"place3\"].\"\"\"\n",
" ),\n",
" UserMessage(\n",
" content=\"\"\"Here is the text I want you to extract places from:\n",
"{%- # The user's input text goes below %}\n",
"{{text}}\"\"\",\n",
" },\n",
" ),\n",
"]\n",
"\n",
"variables = {\n",
" \"text\": \"I went to Paris and Berlin.\",\n",
"}\n",
"\n",
"messages = chat_prompt(messages_unformatted=place_extraction_prompt, variables=variables)\n",
"messages = compile_messages(messages=place_extraction_prompt, variables=variables)\n",
"messages"
]
},
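The cell above only exercises a Liquid comment and variable injection. Since the prose also mentions conditionals and loops, here is a small sketch of that syntax in the same `compile_messages` flow (the template content is illustrative, not from the package):

```python
from not_again_ai.llm.chat_completion.types import UserMessage
from not_again_ai.llm.prompting.compile_messages import compile_messages

# Illustrative template: a Liquid conditional plus a loop over few-shot examples.
prompt = [
    UserMessage(
        content="""Extract places from the text below.
{%- if max_places %}
Return at most {{ max_places }} places.
{%- endif %}
{%- for example in examples %}
Example output: {{ example }}
{%- endfor %}
{{ text }}"""
    )
]

messages = compile_messages(
    messages=prompt,
    variables={
        "max_places": 3,
        "examples": ['["Paris", "Berlin"]'],
        "text": "I went to Paris and Berlin.",
    },
)
print(messages[0].content)
```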
@@ -132,12 +137,12 @@
"\n",
"We explicitly require a tokenizer since loading it has some overhead, so we want to avoid doing so many times for certain use cases.\n",
"\n",
"NOTE: This function not support counting tokens used by function calling."
"NOTE: This function not support counting tokens used by function calling.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 4,
"metadata": {},
"outputs": [
{
@@ -149,10 +154,10 @@
}
],
"source": [
"from not_again_ai.llm.openai_api.tokens import load_tokenizer, num_tokens_from_messages\n",
"from not_again_ai.llm.prompting.providers.openai_tiktoken import TokenizerOpenAI\n",
"\n",
"tokenizer = load_tokenizer(model=\"gpt-4o-2024-05-13\")\n",
"num_tokens = num_tokens_from_messages(messages=messages, tokenizer=tokenizer, model=\"gpt-4o-mini-2024-07-18\")\n",
"tokenizer = TokenizerOpenAI(model=\"gpt-4o-mini-2024-07-18\")\n",
"num_tokens = tokenizer.num_tokens_in_messages(messages=messages)\n",
"print(num_tokens)"
]
},
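A common follow-up to counting prompt tokens is budgeting the completion size. A minimal sketch, assuming a 128k-token context window for gpt-4o-mini (the window size is an assumption here, not something this notebook states):

```python
from not_again_ai.llm.chat_completion.types import SystemMessage, UserMessage
from not_again_ai.llm.prompting.providers.openai_tiktoken import TokenizerOpenAI

messages = [
    SystemMessage(content="You are a helpful assistant."),
    UserMessage(content="Hello!"),
]

tokenizer = TokenizerOpenAI(model="gpt-4o-mini-2024-07-18")
prompt_tokens = tokenizer.num_tokens_in_messages(messages=messages)

CONTEXT_WINDOW = 128_000  # Assumed for gpt-4o-mini; verify against the model card.
print(f"{prompt_tokens} prompt tokens leave roughly {CONTEXT_WINDOW - prompt_tokens} tokens for the completion.")
```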
@@ -161,34 +166,24 @@
"metadata": {},
"source": [
"## Chat Completion with Function Calling and other Parameters\n",
"\n",
"The `chat_completion` function can also be used to call functions in the prompt and a myriad of other commonly used parameters like temperature, max_tokens, and logprobs. See the docstring for more details.\n",
"\n",
"See the [gpt-4-v.ipynb](gpt-4-v.ipynb) for full details on how to use the vision features of `chat_completion` and `chat_prompt`."
"See the [gpt-4-v.ipynb](gpt-4-v.ipynb) for full details on how to use the vision features of `chat_completion` and `chat_prompt`.\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'choices': [{'finish_reason': 'stop',\n",
" 'message': None,\n",
" 'tool_names': ['get_current_weather'],\n",
" 'tool_args_list': [{'location': 'Boston, MA', 'format': 'fahrenheit'}]},\n",
" {'finish_reason': 'stop',\n",
" 'message': None,\n",
" 'tool_names': ['get_current_weather'],\n",
" 'tool_args_list': [{'location': 'Boston, MA', 'format': 'fahrenheit'}]}],\n",
" 'completion_tokens': 40,\n",
" 'prompt_tokens': 101,\n",
" 'system_fingerprint': 'fp_611b667b19',\n",
" 'response_duration': 0.786}"
"ChatCompletionResponse(choices=[ChatCompletionChoice(message=AssistantMessage(content='', role=<Role.ASSISTANT: 'assistant'>, name=None, refusal=None, tool_calls=[ToolCall(id='call_2moHfKov3UxMINi9umf6zod1', function=Function(name='get_current_weather', arguments={'location': 'Boston, MA', 'format': 'fahrenheit'}), type='function')]), finish_reason='tool_calls', json_message=None, logprobs=None, extras={}), ChatCompletionChoice(message=AssistantMessage(content='', role=<Role.ASSISTANT: 'assistant'>, name=None, refusal=None, tool_calls=[ToolCall(id='call_LZA4NljGxgSZ6dGc4hdO9hBA', function=Function(name='get_current_weather', arguments={'location': 'Boston, MA', 'format': 'fahrenheit'}), type='function')]), finish_reason='tool_calls', json_message=None, logprobs=None, extras={})], errors='', completion_tokens=46, prompt_tokens=99, completion_detailed_tokens=None, prompt_detailed_tokens=None, response_duration=4.4759, system_fingerprint='fp_bd83329f63', extras={'prompt_filter_results': None})"
]
},
"execution_count": 10,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -221,37 +216,31 @@
"]\n",
"# Ask the model to call the function\n",
"messages = [\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": \"What's the current weather like in {{ city_state }} today? Call the get_current_weather function.\",\n",
" }\n",
" UserMessage(\n",
" content=\"What's the current weather like in {{ city_state }} today? Call the get_current_weather function.\",\n",
" )\n",
"]\n",
"\n",
"messages = chat_prompt(messages_unformatted=messages, variables={\"city_state\": \"Boston, MA\"})\n",
"messages = compile_messages(messages=messages, variables={\"city_state\": \"Boston, MA\"})\n",
"\n",
"client = openai_client()\n",
"\n",
"response = chat_completion(\n",
"request = ChatCompletionRequest(\n",
" messages=messages,\n",
" model=\"gpt-4o-mini-2024-07-18\",\n",
" client=client,\n",
" tools=tools,\n",
" tool_choice=\"required\", # Force the model to use the tool\n",
" max_tokens=300,\n",
" max_completion_tokens=300,\n",
" temperature=0,\n",
" logprobs=(True, 2), # logprobs=(True, 2) returns the log probabilities of the top 2 tokens\n",
" log_probs=True,\n",
" top_log_probs=2, # returns the log probabilities of the top 2 tokens\n",
" seed=42, # Set the seed for reproducibility. The API will also return a `system_fingerprint` field to monitor changes in the backend.\n",
" n=2, # Generate 2 completions at once\n",
")\n",
"response = chat_completion(request, \"openai\", client)\n",
"response"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
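The `ChatCompletionResponse` repr in the output above is dense, so one closing note: based purely on the field names visible in that repr, the tool calls can be pulled out of the typed response roughly like this (a sketch inferred from the printed structure, assuming `response` from the final cell):

```python
# Sketch inferred from the ChatCompletionResponse repr shown in the notebook output.
for choice in response.choices:
    for tool_call in choice.message.tool_calls or []:
        print(tool_call.function.name)       # e.g. "get_current_weather"
        print(tool_call.function.arguments)  # e.g. {'location': 'Boston, MA', 'format': 'fahrenheit'}
```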