sladinji commented on Sep 5, 2025

Introduce a centralized Jinja2 prompt system with multi-language support, including English and French templates. Extend the configuration options for the prompts directory and language selection, and update the documentation accordingly. Refactor existing prompt handling to use the new template-based approach.
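A minimal sketch of how such a loader might resolve templates, assuming (as the review comments below suggest) that an overrides directory is searched before the built-in one and that files are named `<module>.<lang>.j2` with `<lang>.j2` and finally the default-language file as fallbacks. The function names and the exact search order are illustrative assumptions, not the PR's API.

```python
import os
import tempfile
from typing import List, Optional, Tuple

# Assumption: mirrors the DEFAULT_LANGUAGE constant referenced in the review.
DEFAULT_LANGUAGE = "en"


def candidate_templates(module: Optional[str], language: str) -> List[str]:
    """Candidate file names, most specific first (hypothetical helper)."""
    names = [f"{module}.{language}.j2"] if module else []
    names.append(f"{language}.j2")
    return names


def resolve_template(
    module: Optional[str], language: str, overrides_dir: str, internal_dir: str
) -> Tuple[str, str]:
    """Return (base_dir, file_name) for the first candidate that exists.

    For each candidate name the overrides directory is checked before the
    built-in directory, so a deployment can shadow a shipped template.
    """
    for name in candidate_templates(module, language):
        for base in (overrides_dir, internal_dir):
            if base and os.path.isdir(base) and os.path.isfile(os.path.join(base, name)):
                return base, name
    # Last resort: the default-language template shipped with the application.
    return internal_dir, f"{DEFAULT_LANGUAGE}.j2"


def demo() -> List[Tuple[str, str]]:
    """Build two throwaway directories and resolve a few lookups."""
    with tempfile.TemporaryDirectory() as overrides, tempfile.TemporaryDirectory() as internal:
        for base, name in [(internal, "en.j2"), (internal, "fr.j2"), (overrides, "search.fr.j2")]:
            with open(os.path.join(base, name), "w", encoding="utf-8") as fh:
                fh.write("{{ question }}\n")
        lookups = [("search", "fr"), ("chat", "fr"), ("chat", "de")]
        # Report which directory won instead of printing temporary paths.
        return [
            ("overrides" if base == overrides else "internal", name)
            for base, name in (resolve_template(m, lang, overrides, internal) for m, lang in lookups)
        ]


print(demo())
```

The override-first order is what lets operators customize prompts without editing the shipped files, which is also why the review below cares so much about what ends up in those file names.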

sladinji self-assigned this on Sep 5, 2025
leoguillaume linked an issue on Sep 5, 2025 that may be closed by this pull request (4 tasks)
sladinji force-pushed the Add-Prompt-Management-System-with-Jinja2-Templates-#408 branch from a438fda to 542f54c on September 8, 2025 15:47
else:
logger.debug("Prompt overrides directory does not exist: %s", self.overrides_dir)
search_paths.append(self.internal_dir)
self.env = Environment(loader=FileSystemLoader(search_paths), autoescape=False, trim_blocks=True, lstrip_blocks=True)

Check warning

Code scanning / CodeQL

Jinja2 templating with autoescape=False Medium

Using jinja2 templates with autoescape=False can potentially allow XSS attacks.

Copilot Autofix

AI 6 days ago

To fix this vulnerability, enable autoescaping when constructing the Jinja2 Environment. The recommended approach is select_autoescape, which enables escaping for templates commonly rendered as HTML or XML (by default, files ending in .html, .htm or .xml) and leaves others, such as plain-text templates, unescaped. If you know that all of your templates are HTML or XML, you can use autoescape=True instead. Replacing autoescape=False with autoescape=select_autoescape(['j2', 'html', 'xml']) ensures that every template with the .j2 extension (or whichever extensions you list) is escaped; since these templates use .j2, including 'j2' in the list is what makes the change take effect. This does not change any application logic, but it does change rendered output: values interpolated into .j2 templates will be HTML-escaped.

The only changes needed are:

  • Import the select_autoescape method from jinja2 (to allow its use).
  • Replace the autoescape=False argument in the Environment instantiation with autoescape=select_autoescape(['j2', 'html', 'xml']).

These changes need to be made in api/utils/prompt_loader.py—specifically, to the import statement and the environment construction.
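These templates render LLM prompts rather than HTML, so it is worth checking what the suggested setting does to prompt text. The sketch below uses an in-memory DictLoader with a made-up template name and question; it shows that with 'j2' in the select_autoescape list, characters such as <, > and & in interpolated values are HTML-escaped before they reach the prompt.

```python
from jinja2 import DictLoader, Environment, select_autoescape

# Hypothetical prompt template; real templates live in the prompts directories.
TEMPLATES = {"search.en.j2": "Answer the question: {{ question }}"}


def render(autoescape) -> str:
    """Render the template with the same options as the PR's Environment."""
    env = Environment(
        loader=DictLoader(TEMPLATES),
        autoescape=autoescape,
        trim_blocks=True,
        lstrip_blocks=True,
    )
    return env.get_template("search.en.j2").render(question="Is 1 < 2 && 3 > 2?")


before = render(False)
after = render(select_autoescape(["j2", "html", "xml"]))
print(before)  # Answer the question: Is 1 < 2 && 3 > 2?
print(after)   # Answer the question: Is 1 &lt; 2 &amp;&amp; 3 &gt; 2?
```

If prompts must reach the model verbatim, keeping autoescaping off for .j2 files and dismissing the alert as not applicable may be the more appropriate resolution; that is a judgment call for the maintainers.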


Suggested changeset 1
api/utils/prompt_loader.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/api/utils/prompt_loader.py b/api/utils/prompt_loader.py
--- a/api/utils/prompt_loader.py
+++ b/api/utils/prompt_loader.py
@@ -3,7 +3,7 @@
 from functools import lru_cache
 from typing import Any, Iterator, List, Optional
 
-from jinja2 import Environment, FileSystemLoader, TemplateNotFound
+from jinja2 import Environment, FileSystemLoader, TemplateNotFound, select_autoescape
 
 from api.utils.configuration import get_configuration
 
@@ -53,7 +53,12 @@
         else:
             logger.debug("Prompt overrides directory does not exist: %s", self.overrides_dir)
         search_paths.append(self.internal_dir)
-        self.env = Environment(loader=FileSystemLoader(search_paths), autoescape=False, trim_blocks=True, lstrip_blocks=True)
+        self.env = Environment(
+            loader=FileSystemLoader(search_paths),
+            autoescape=select_autoescape(['j2', 'html', 'xml']),
+            trim_blocks=True,
+            lstrip_blocks=True,
+        )
 
         # cache of resolved template name lists per module key (ordered)
         self._module_template_cache: dict[Optional[str], List[str]] = {}
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
if not base or not os.path.isdir(base):
continue
path = os.path.join(base, cand)
if os.path.isfile(path) and (base, cand) not in found:

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.

Copilot Autofix

AI 6 days ago

To mitigate path traversal, the user-controlled language value (as well as related inputs derived from it) must be validated before constructing candidate filenames. The best fix, given filenames are supposed to follow the format <lang>.j2 or <module>.<lang>.j2, is to allow only valid language codes (e.g. two-letter alphabetical lowercase strings or an explicitly allow-listed set) for the filename position.

The recommended approach:

  • Validate language strictly in the PromptRenderer initializer. Accept only language codes matching a regex such as ^[a-z]{2}$ or a set of known codes.
  • If the value does not match, either default to the fallback DEFAULT_LANGUAGE or raise an exception.
  • Additionally, in _candidate_templates, ensure that only valid candidate filenames get constructed (derived from a safe language string).
  • This validation can be encapsulated in a helper function, and performed immediately on initialization.

Files to edit: api/utils/prompt_loader.py (in the constructor and associated candidate template logic).
Dependencies: For regex, Python's builtin re is sufficient.
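A standalone version of the validation described above might look like the following. `validate_language` and its optional allow-list parameter are illustrative names; the autofix itself inlines the regex check in the constructor.

```python
import re
from typing import Iterable, Optional

# Assumption: mirrors the DEFAULT_LANGUAGE constant used in the patch.
DEFAULT_LANGUAGE = "en"
# Exactly two lowercase ASCII letters, the same rule as the patch's regex.
_LANG_RE = re.compile(r"[a-z]{2}")


def validate_language(value: object, allowed: Optional[Iterable[str]] = None) -> str:
    """Return a safe language code, falling back to DEFAULT_LANGUAGE.

    Because the result can only contain a-z, it cannot introduce path
    separators, '..' or newlines when interpolated into '<module>.<lang>.j2'.
    """
    lang = str(value)
    if _LANG_RE.fullmatch(lang) is None:
        return DEFAULT_LANGUAGE
    # Optional stricter check: an explicit allow-list of shipped languages.
    if allowed is not None and lang not in set(allowed):
        return DEFAULT_LANGUAGE
    return lang


for raw in ["fr", "../../etc/passwd", "fr\n", "FR"]:
    print(repr(raw), "->", validate_language(raw))
print(validate_language("de", allowed={"en", "fr"}))
```

Note that fullmatch rejects a trailing newline, unlike re.match with a `$` anchor, which is why "fr\n" falls back to the default.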


Suggested changeset 1
api/utils/prompt_loader.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/api/utils/prompt_loader.py b/api/utils/prompt_loader.py
--- a/api/utils/prompt_loader.py
+++ b/api/utils/prompt_loader.py
@@ -1,5 +1,6 @@
 import logging
 import os
+import re
 from functools import lru_cache
 from typing import Any, Iterator, List, Optional
 
@@ -42,7 +43,11 @@
         # read dynamic configuration if available
         self.overrides_dir = overrides_dir or getattr(settings, "prompts_dir", "/prompts")
         # prompts_lang is stored as a language code (eg 'en', 'fr') without the .j2 extension
-        self.language = language or getattr(settings, "prompts_lang", DEFAULT_LANGUAGE)
+        selected_lang = language or getattr(settings, "prompts_lang", DEFAULT_LANGUAGE)
+        if not re.fullmatch(r"^[a-z]{2}$", str(selected_lang)):
+            logger.warning("Invalid language code %r provided, falling back to default: %s", selected_lang, DEFAULT_LANGUAGE)
+            selected_lang = DEFAULT_LANGUAGE
+        self.language = selected_lang
 
         self.internal_dir = DEFAULT_PROMPTS_RELATIVE_DIR
 
EOF
# if no exact candidate matched, try to use any internal templates matching the language
logger.debug(
"No exact candidate template found; searching internal dir for any *.%s.j2 files",
self.language,

Check failure

Code scanning / CodeQL

Log Injection High

This log entry depends on a user-provided value.

Copilot Autofix

AI 6 days ago

To mitigate log injection, user-provided values that are included in log messages should be sanitized. For plain-text logs, the most important step is to strip or replace newline and carriage return characters from the language value before logging. The simplest fix is to wrap the self.language reference in a helper that removes problematic characters (e.g., replacing \n and \r with an empty string) in all logging calls where user-provided language codes could end up in log output. Only the logging on line 106 needs to be changed, as subsequent uses do not directly log the value.

The fix should focus only on the code shown in api/utils/prompt_loader.py. A good approach is to use a local variable (e.g., safe_language) constructed by sanitizing self.language just before logging, and using that variable in place of self.language in the log statement.

No new imports are required, as only native string replacement (str.replace) is needed.
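The effect of the sanitization can be checked with the standard library alone. This sketch logs the same debug message through an in-memory handler, once with a crafted language value and once sanitized; the logger name and the malicious value are made up for the demonstration.

```python
import io
import logging


def sanitize_for_log(value: object) -> str:
    """Drop CR and LF so a value cannot start a forged log line."""
    return str(value).replace("\r", "").replace("\n", "")


def capture(language: object, sanitize: bool) -> str:
    """Log the debug message from the review and return what the handler wrote."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger = logging.getLogger("prompt_loader_demo")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        logger.debug(
            "No exact candidate template found; searching internal dir for any *.%s.j2 files",
            sanitize_for_log(language) if sanitize else language,
        )
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


malicious = "fr\nERROR forged entry"
print(capture(malicious, sanitize=False).count("\n"))  # 2: the record spans two lines
print(capture(malicious, sanitize=True).count("\n"))   # 1: a single record line
```

If the constructor validation from the path-traversal autofix is also applied, self.language can no longer contain CR or LF at all, so this sanitization becomes defense in depth.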


Suggested changeset 1
api/utils/prompt_loader.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/api/utils/prompt_loader.py b/api/utils/prompt_loader.py
--- a/api/utils/prompt_loader.py
+++ b/api/utils/prompt_loader.py
@@ -101,9 +101,10 @@
 
         if not found:
             # if no exact candidate matched, try to use any internal templates matching the language
+            safe_language = str(self.language).replace("\n", "").replace("\r", "")
             logger.debug(
                 "No exact candidate template found; searching internal dir for any *.%s.j2 files",
-                self.language,
+                safe_language,
             )
             try:
                 internal_files = [f for f in os.listdir(self.internal_dir) if f.endswith(f".{self.language}.j2")]
EOF
found = [(self.internal_dir, f) for f in internal_files]
else:
# final fallback to default language file name (will be looked up in env paths)
logger.error("No prompt template found; tried: %s", ", ".join(candidates))

Check failure

Code scanning / CodeQL

Log Injection High

This log entry depends on a user-provided value.

Copilot Autofix

AI 6 days ago

The best way to fix this problem is to sanitize any user-controlled input that is written to log files. In this specific case, before logging the candidate filenames joined together, each candidate should be sanitized to remove potentially malicious characters such as carriage returns and line feeds (\r, \n). This can be done by mapping a sanitize function over the list of candidate filenames before joining them for logging.

The only file that requires editing is api/utils/prompt_loader.py, specifically in the _resolve_template_for_module method where the log uses ", ".join(candidates). To implement this:

  • Define a helper function (locally or as a static/class method) to sanitize user-supplied or user-influenced strings before logging.
  • Use this function to clean all candidate strings before joining and logging them.

No external dependencies need to be added; built-in Python string operations are sufficient.
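The autofix sanitizes at each call site. An alternative, which is not part of the suggestion, is a logging.Filter attached to the handler that strips CR/LF from every formatted message, so individual log calls no longer need to remember to sanitize. This sketch uses only the standard library; the filter class and logger name are hypothetical.

```python
import io
import logging


class StripNewlinesFilter(logging.Filter):
    """Remove CR/LF from the fully formatted message of every record.

    Note: this mutates the record, so handlers that run later see the
    sanitized message too. Tracebacks (exc_info) are not touched.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # applies msg % args
        record.msg = message.replace("\r", "").replace("\n", "")
        record.args = None  # the message is already formatted
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(StripNewlinesFilter())
logger = logging.getLogger("prompt_loader_filter_demo")
logger.propagate = False
logger.addHandler(handler)

candidates = ["search.fr.j2", "fr.j2\nCRITICAL forged entry"]
logger.error("No prompt template found; tried: %s", ", ".join(candidates))
print(stream.getvalue())
# No prompt template found; tried: search.fr.j2, fr.j2CRITICAL forged entry
```

A handler-level filter is harder to forget than per-call helpers, but it also hides newlines that a message might legitimately contain, so the per-call approach in the autofix is the more targeted choice.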


Suggested changeset 1
api/utils/prompt_loader.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/api/utils/prompt_loader.py b/api/utils/prompt_loader.py
--- a/api/utils/prompt_loader.py
+++ b/api/utils/prompt_loader.py
@@ -85,6 +85,12 @@
             yield from self._module_template_pairs_cache[module]
             return
 
+        def _sanitize_for_log(s: str) -> str:
+            # Strip CR and LF so the value cannot forge additional log lines
+            sanitized = s.replace('\r', '').replace('\n', '')
+            return sanitized
+
+
         search_paths = [self.overrides_dir, self.internal_dir]
         candidates = self._candidate_templates(module)
         found: list[tuple[str, str]] = []
@@ -114,7 +120,7 @@
                 found = [(self.internal_dir, f) for f in internal_files]
             else:
                 # final fallback to default language file name (will be looked up in env paths)
-                logger.error("No prompt template found; tried: %s", ", ".join(candidates))
+                logger.error("No prompt template found; tried: %s", ", ".join(_sanitize_for_log(c) for c in candidates))
                 found = [(self.internal_dir, f"{DEFAULT_LANGUAGE}.j2")]
 
         # cache relative template names for introspection but keep pairs for loading
EOF
# Load template from a specific base directory to allow loading internal
# templates even when the same filename exists in an overrides dir.
try:
env = Environment(loader=FileSystemLoader([base]), autoescape=False, trim_blocks=True, lstrip_blocks=True)

Check warning

Code scanning / CodeQL

Jinja2 templating with autoescape=False Medium

Using jinja2 templates with autoescape=False can potentially allow XSS attacks.

Copilot Autofix

AI 6 days ago

To fix this issue, instantiate the Jinja2 Environment with autoescaping configured correctly for what is being templated. If the templates might generate HTML or XML, use select_autoescape(['html', 'xml']), which is Jinja2's recommended practice: it enables autoescaping for HTML/XML templates but not for others (such as plain .txt files). If all templates are .j2 files that render only non-HTML text, it is more precise to enable or disable autoescaping explicitly based on downstream needs. The safest general fix is to replace autoescape=False with autoescape=select_autoescape(['html', 'xml']).

Therefore, in api/utils/prompt_loader.py, replace the instantiation:

env = Environment(loader=FileSystemLoader([base]), autoescape=False, trim_blocks=True, lstrip_blocks=True)

with

env = Environment(
    loader=FileSystemLoader([base]),
    autoescape=select_autoescape(['html', 'xml']),
    trim_blocks=True,
    lstrip_blocks=True,
)

and add an import for select_autoescape from jinja2 at the top of the file.
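select_autoescape returns a callable that Jinja2 invokes with each template name, so its effect depends entirely on the file extension. Evaluating the two extension lists proposed in these autofixes against some illustrative template names shows that ['html', 'xml'] leaves .j2 prompt templates unescaped, while the earlier suggestion that includes 'j2' escapes them.

```python
from jinja2 import select_autoescape

# The two autofixes for prompt_loader.py pass different extension lists.
html_xml_only = select_autoescape(["html", "xml"])
with_j2 = select_autoescape(["j2", "html", "xml"])

for name in ["fr.j2", "search.fr.j2", "page.html"]:
    print(name, html_xml_only(name), with_j2(name))
# fr.j2 False True
# search.fr.j2 False True
# page.html True True
```

Read this way, the suggestion in this alert would likely silence the CodeQL warning without changing rendered prompts, whereas the earlier suggestion for the shared Environment would change them. Whichever option is chosen, keeping both Environment constructions consistent seems advisable.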

Suggested changeset 1
api/utils/prompt_loader.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/api/utils/prompt_loader.py b/api/utils/prompt_loader.py
--- a/api/utils/prompt_loader.py
+++ b/api/utils/prompt_loader.py
@@ -3,7 +3,7 @@
 from functools import lru_cache
 from typing import Any, Iterator, List, Optional
 
-from jinja2 import Environment, FileSystemLoader, TemplateNotFound
+from jinja2 import Environment, FileSystemLoader, TemplateNotFound, select_autoescape
 
 from api.utils.configuration import get_configuration
 
@@ -128,7 +128,12 @@
         # Load template from a specific base directory to allow loading internal
         # templates even when the same filename exists in an overrides dir.
         try:
-            env = Environment(loader=FileSystemLoader([base]), autoescape=False, trim_blocks=True, lstrip_blocks=True)
+            env = Environment(
+                loader=FileSystemLoader([base]),
+                autoescape=select_autoescape(['html', 'xml']),
+                trim_blocks=True,
+                lstrip_blocks=True
+            )
             return env.get_template(template_name)
         except TemplateNotFound as e:
             raise FileNotFoundError(f"Prompt template file '{template_name}' not found in base '{base}'.") from e
EOF


# revision identifiers, used by Alembic.
revision: str = "095feb42bc54"

Check notice

Code scanning / CodeQL

Unused global variable Note

The global variable 'revision' is not used.

Copilot Autofix

AI 5 days ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.


# revision identifiers, used by Alembic.
revision: str = "095feb42bc54"
down_revision: Union[str, None] = "479aeeae940b"

Check notice

Code scanning / CodeQL

Unused global variable Note

The global variable 'down_revision' is not used.

Copilot Autofix

AI 5 days ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

# revision identifiers, used by Alembic.
revision: str = "095feb42bc54"
down_revision: Union[str, None] = "479aeeae940b"
branch_labels: Union[str, Sequence[str], None] = None

Check notice

Code scanning / CodeQL

Unused global variable Note

The global variable 'branch_labels' is not used.

Copilot Autofix

AI 5 days ago

To fix the issue, delete the assignment to the unused global variable branch_labels in api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py. Removing an assignment is only risky when its right-hand side has side effects; since this one is simply None, the line can be removed entirely. No other changes are necessary because Alembic does not require this variable to be present unless the migration is part of a branch.
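These "unused global variable" notes are expected for Alembic migrations: Alembic loads each migration file as a module and reads these names as attributes, which static analysis does not see. Alembic appears to read branch_labels and depends_on with a None default (worth confirming against the Alembic source), which is why removing them is harmless, whereas revision and down_revision are required. The sketch below imitates that lookup pattern with the standard library only; it is a simplified illustration, not Alembic's code.

```python
import types

# Header of the migration under review (type annotations omitted for brevity);
# branch_labels is deliberately missing, as after the suggested fix.
SOURCE = '''
revision = "095feb42bc54"
down_revision = "479aeeae940b"
depends_on = None
'''

module = types.ModuleType("migration_demo")
exec(SOURCE, module.__dict__)

# Required identifiers are read directly; a missing one would raise AttributeError.
print(module.revision, module.down_revision)
# Optional identifiers read with a default: a missing branch_labels
# behaves exactly like branch_labels = None.
print(getattr(module, "branch_labels", None))
```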


Suggested changeset 1
api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py b/api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py
--- a/api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py
+++ b/api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py
@@ -15,7 +15,6 @@
 # revision identifiers, used by Alembic.
 revision: str = "095feb42bc54"
 down_revision: Union[str, None] = "479aeeae940b"
-branch_labels: Union[str, Sequence[str], None] = None
 depends_on: Union[str, Sequence[str], None] = None
 
 
EOF
Unable to commit as this autofix suggestion is now outdated
revision: str = "095feb42bc54"
down_revision: Union[str, None] = "479aeeae940b"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None

Check notice

Code scanning / CodeQL

Unused global variable Note

The global variable 'depends_on' is not used.

Copilot Autofix

AI 5 days ago

To fix the problem, we should remove the definition of the unused global variable depends_on on line 19 of api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py. As its value is simply None and there are no side effects associated with its assignment, it's safe to delete this line without impacting any functionality or documentation. No imports or other changes are necessary.

Suggested changeset 1
api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py b/api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py
--- a/api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py
+++ b/api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py
@@ -16,9 +16,9 @@
 revision: str = "095feb42bc54"
 down_revision: Union[str, None] = "479aeeae940b"
 branch_labels: Union[str, Sequence[str], None] = None
-depends_on: Union[str, Sequence[str], None] = None
 
 
+
 def upgrade() -> None:
     """Upgrade schema."""
     op.alter_column("user", "name", existing_type=sa.VARCHAR(), nullable=True)
EOF
Unable to commit as this autofix suggestion is now outdated
Development

Successfully merging this pull request may close these issues.

Add Prompt Management System with Jinja2 Templates
2 participants