Add multi-language support for prompt templates #414

sladinji · 2025-09-05T14:38:22Z

Introduce a centralized Jinja2 prompt system with multi-language capabilities, including English and French templates. Enhance configuration options for prompts directory and language selection, and update documentation to reflect these changes. Refactor existing prompt handling to utilize the new template-based approach.

app/utils/prompt_loader.py


api/utils/prompt_loader.py

+        else:
+            logger.debug("Prompt overrides directory does not exist: %s", self.overrides_dir)
+        search_paths.append(self.internal_dir)
+        self.env = Environment(loader=FileSystemLoader(search_paths), autoescape=False, trim_blocks=True, lstrip_blocks=True)


To fix this vulnerability, enable autoescaping when constructing the Jinja2 Environment. The recommended and most robust solution is select_autoescape, which enables escaping for templates whose names end in the listed extensions (by default .html, .htm, and .xml) and leaves others, such as plain-text templates, unescaped. If you know that all of your templates are HTML or XML, you can use autoescape=True instead. Since your templates use the .j2 extension, replacing autoescape=False with autoescape=select_autoescape(['j2', 'html', 'xml']) ensures that anything ending in .j2 (or the other extensions you list) is escaped. This does not change any application logic, but rendered output will differ wherever a template variable contains HTML-special characters such as <, >, or &, because those are now escaped.

The only changes needed are:

Import the select_autoescape method from jinja2 (to allow its use).

Replace the autoescape=False argument in the Environment instantiation with autoescape=select_autoescape(['j2', 'html', 'xml']).

These changes need to be made in api/utils/prompt_loader.py—specifically, to the import statement and the environment construction.
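A minimal sketch of the change described above. It assumes an in-memory `DictLoader` in place of the PR's `FileSystemLoader`, and the template names `en.j2` and `en.txt` are made up for illustration. It only shows which template names `select_autoescape` ends up escaping.

```python
from jinja2 import DictLoader, Environment, select_autoescape

# Hypothetical in-memory templates; the PR itself loads files from disk.
TEMPLATES = {
    "en.j2": "Question: {{ question }}",
    "en.txt": "Question: {{ question }}",
}

env = Environment(
    loader=DictLoader(TEMPLATES),
    # Escape templates whose names end in .j2, .html, or .xml.
    autoescape=select_autoescape(["j2", "html", "xml"]),
    trim_blocks=True,
    lstrip_blocks=True,
)

# The .j2 template is escaped, the .txt template is not.
escaped = env.get_template("en.j2").render(question="is a < b?")
unescaped = env.get_template("en.txt").render(question="is a < b?")

print(escaped)    # Question: is a &lt; b?
print(unescaped)  # Question: is a < b?
```

Because these templates produce prompt text rather than HTML, it may be worth checking whether HTML-encoded characters in rendered prompts are acceptable before adopting the `j2` entry.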

api/utils/prompt_loader.py

+                if not base or not os.path.isdir(base):
+                    continue
+                path = os.path.join(base, cand)
+                if os.path.isfile(path) and (base, cand) not in found:


To mitigate path traversal, the user-controlled language value (as well as related inputs derived from it) must be validated before constructing candidate filenames. The best fix, given filenames are supposed to follow the format <lang>.j2 or <module>.<lang>.j2, is to allow only valid language codes (e.g. two-letter alphabetical lowercase strings or an explicitly allow-listed set) for the filename position.

The recommended approach:

Validate language strictly in the PromptRenderer initializer. Accept only language codes matching a regex such as ^[a-z]{2}$ or a set of known codes.

If the value does not match, either default to the fallback DEFAULT_LANGUAGE or raise an exception.

Additionally, in _candidate_templates, ensure that only valid candidate filenames get constructed (derived from a safe language string).

This validation can be encapsulated in a helper function, and performed immediately on initialization.

Files to edit: api/utils/prompt_loader.py (in the constructor and associated candidate template logic).
Dependencies: For regex, Python's builtin re is sufficient.
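The validation steps above could look roughly like the following sketch. The helper names `normalize_language` and `candidate_templates` are hypothetical, as is the `"en"` value for `DEFAULT_LANGUAGE`; the real `PromptRenderer` may be organised differently, and the `module` argument is not validated here.

```python
import re

DEFAULT_LANGUAGE = "en"  # assumed fallback value, for illustration only

# Exactly two lowercase ASCII letters; fullmatch rejects trailing newlines too.
_LANGUAGE_RE = re.compile(r"[a-z]{2}")


def normalize_language(value, default=DEFAULT_LANGUAGE):
    """Return value if it is a two-letter code, otherwise the default."""
    candidate = str(value)
    if _LANGUAGE_RE.fullmatch(candidate):
        return candidate
    return default


def candidate_templates(module, language):
    """Build candidate filenames from a validated language code only."""
    lang = normalize_language(language)
    names = []
    if module:
        names.append(f"{module}.{lang}.j2")
    names.append(f"{lang}.j2")
    return names


print(normalize_language("fr"))                  # fr
print(normalize_language("../../etc/passwd"))    # en
print(candidate_templates("deepsearch", "../x"))  # ['deepsearch.en.j2', 'en.j2']
```

Falling back to the default keeps the service running on bad configuration; raising an exception instead would surface the misconfiguration earlier.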

api/utils/prompt_loader.py

+            # if no exact candidate matched, try to use any internal templates matching the language
+            logger.debug(
+                "No exact candidate template found; searching internal dir for any *.%s.j2 files",
+                self.language,


To mitigate log injection, user-provided values that are included in log messages should be sanitized. For plain-text logs, the most important step is to strip or replace newline and carriage return characters from the language value before logging. The simplest fix is to wrap the self.language reference in a helper that removes problematic characters (e.g., replacing \n and \r with an empty string) in all logging calls where user-provided language codes could end up in log output. Only the logging on line 106 needs to be changed, as subsequent uses do not directly log the value.

The fix should focus only on the code shown in api/utils/prompt_loader.py. A good approach is to use a local variable (e.g., safe_language) constructed by sanitizing self.language just before logging, and using that variable in place of self.language in the log statement.

No new imports are required, as only native string replacement (str.replace) is needed.
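As an illustration of the `safe_language` approach, the sketch below logs a hostile value through a small list-collecting handler (`ListHandler`, a hypothetical helper used only to make the result observable) and shows that the record stays on a single line.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that stores formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("prompt_loader_sketch")
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# A value that tries to forge a second log entry.
language = "fr\nERROR forged entry"

# Strip CR and LF just before logging, as described above.
safe_language = str(language).replace("\n", "").replace("\r", "")
logger.debug(
    "No exact candidate template found; searching internal dir for any *.%s.j2 files",
    safe_language,
)
```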

api/utils/prompt_loader.py

+                found = [(self.internal_dir, f) for f in internal_files]
+            else:
+                # final fallback to default language file name (will be looked up in env paths)
+                logger.error("No prompt template found; tried: %s", ", ".join(candidates))


The best way to fix this problem is to sanitize any user-controlled input that is written to log files. In this specific case, before logging the candidate filenames joined together, each candidate should be sanitized to remove potentially malicious characters such as carriage returns and line feeds (\r, \n). This can be done by mapping a sanitize function over the list of candidate filenames before joining them for logging.

The only file that requires editing is api/utils/prompt_loader.py, specifically in the _resolve_template_for_module method where the log uses ", ".join(candidates). To implement this:

Define a helper function (locally or as a static/class method) to sanitize user-supplied or user-influenced strings before logging.

Use this function to clean all candidate strings before joining and logging them.

No external dependencies need to be added; built-in Python string operations are sufficient.
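A short sketch of mapping a sanitizer over the candidate list before joining. The function name `sanitize_for_log` and the sample candidates are illustrative assumptions, not the PR's actual values.

```python
def sanitize_for_log(value: str) -> str:
    """Drop CR and LF so a value cannot start a new log line."""
    return value.replace("\r", "").replace("\n", "")


# Hypothetical candidate filenames, one of them carrying a CR/LF pair.
candidates = ["deepsearch.fr.j2", "fr\r\n.j2"]

# Sanitize each candidate, then join for the log message.
tried = ", ".join(sanitize_for_log(c) for c in candidates)
message = "No prompt template found; tried: %s" % tried

print(message)  # No prompt template found; tried: deepsearch.fr.j2, fr.j2
```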

api/utils/prompt_loader.py

+        # Load template from a specific base directory to allow loading internal
+        # templates even when the same filename exists in an overrides dir.
+        try:
+            env = Environment(loader=FileSystemLoader([base]), autoescape=False, trim_blocks=True, lstrip_blocks=True)


To fix this issue, instantiate the Jinja2 Environment with autoescaping enabled in a way that matches what is being templated. If the templates might generate HTML or XML, use select_autoescape(['html', 'xml']), which is Jinja2's recommended practice. This enables autoescaping for templates whose names end in .html or .xml but not for others (such as plain .txt files). Note that this list also leaves .j2 files unescaped: if the .j2 templates exclusively render non-HTML output, that is the intended result; otherwise 'j2' must be added to the list. The safest general fix is to replace autoescape=False with autoescape=select_autoescape(['html', 'xml']).

Therefore, in api/utils/prompt_loader.py, replace the instantiation:

env = Environment(loader=FileSystemLoader([base]), autoescape=False, trim_blocks=True, lstrip_blocks=True)

with

env = Environment(
    loader=FileSystemLoader([base]),
    autoescape=select_autoescape(['html', 'xml']),
    trim_blocks=True,
    lstrip_blocks=True,
)

and add an import for select_autoescape from jinja2 at the top of the file.
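To see which template names a given extension list covers, the function returned by `select_autoescape` can be called directly with a template name. The names below (`fr.j2`, `page.html`, `notes.txt`) are made up for illustration.

```python
from jinja2 import select_autoescape

# Each call returns a function that maps a template name to True/False.
html_xml_only = select_autoescape(["html", "xml"])
with_j2 = select_autoescape(["j2", "html", "xml"])

# (escaped with ['html', 'xml'], escaped with ['j2', 'html', 'xml'])
decisions = {
    name: (html_xml_only(name), with_j2(name))
    for name in ("fr.j2", "page.html", "notes.txt")
}

for name, (narrow, wide) in decisions.items():
    print(f"{name}: html/xml only={narrow}, with j2={wide}")
```

With the `['html', 'xml']` list, a template named `fr.j2` is not escaped, so this Environment behaves like `autoescape=False` for `.j2` prompt files.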


api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py

+
+
+# revision identifiers, used by Alembic.
+revision: str = "095feb42bc54"


api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py

+
+# revision identifiers, used by Alembic.
+revision: str = "095feb42bc54"
+down_revision: Union[str, None] = "479aeeae940b"


api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py

+# revision identifiers, used by Alembic.
+revision: str = "095feb42bc54"
+down_revision: Union[str, None] = "479aeeae940b"
+branch_labels: Union[str, Sequence[str], None] = None


To fix the issue, delete the assignment to the unused global variable branch_labels in api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py. Take care not to remove any right-hand-side code that has side effects; here the assigned value is simply None, so it is safe to remove the whole line. No other changes are necessary, because Alembic does not require this variable to be present unless the migration is part of a branch.

api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py

+revision: str = "095feb42bc54"
+down_revision: Union[str, None] = "479aeeae940b"
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None


To fix the problem, we should remove the definition of the unused global variable depends_on on line 19 of api/alembic/versions/2025_09_10_1143-095feb42bc54_user_name_optional.py. As its value is simply None and there are no side effects associated with its assignment, it's safe to delete this line without impacting any functionality or documentation. No imports or other changes are necessary.

sladinji requested a review from leoguillaume September 5, 2025 14:38

sladinji self-assigned this Sep 5, 2025

github-advanced-security bot found potential problems Sep 5, 2025

app/utils/prompt_loader.py: 5 alerts, all marked Fixed

leoguillaume linked an issue Sep 5, 2025 that may be closed by this pull request

Add Prompt Management System with Jinja2 Templates #408

Open

4 tasks

Julien Almarcha and others added 10 commits September 8, 2025 10:45

feat: add multi-language prompt templates

839cd6b

feat: add prompts configuration options for directory and language

1dca291

update py

36dcb13

feat: add template-based prompts feature to README

e138fbc

feat: integrate prompt rendering for multi-agent and web search managers; update OCR default prompt handling

e4c24df

feat: enhance prompt rendering with language support and add French/English prompt templates for deepsearch

c626ccc

feat: refactor summarization prompts to use template rendering for English and French

66ab27c

Update configuration documentation

2446c2f

fix test config

36b076b

fix import

542f54c

sladinji force-pushed the Add-Prompt-Management-System-with-Jinja2-Templates-#408 branch from a438fda to 542f54c Compare September 8, 2025 15:47

github-advanced-security bot found potential problems Sep 8, 2025


actions-user and others added 12 commits September 8, 2025 15:50

Update coverage badge

dd1feea

fix : replace name by email in playground

2f0d092

remove summarize templates

6e19516

remove prints

6191103

feat: make user name optional in database schema

256735e

feat: add password field to UserUpdateRequest and make name optional in User model

4589543

fix: replace user name with email in DocumentManager queries

884e461

feat: update create_user method to use email instead of name and adjust parameters accordingly

9258ef4

feat: make user name optional in User model

8d2864b

fix: update user creation tests to use email instead of name

9c5f9b1

fix: update user creation tests to use email instead of name

6a56ff5

fix: update create_user tests to use email instead of name

b359953

github-advanced-security bot found potential problems Sep 10, 2025


Update coverage badge

9dea87e

@@ -3,7 +3,7 @@
             from functools import lru_cache
             from typing import Any, Iterator, List, Optional
-            from jinja2 import Environment, FileSystemLoader, TemplateNotFound
+            from jinja2 import Environment, FileSystemLoader, TemplateNotFound, select_autoescape
             from api.utils.configuration import get_configuration
@@ -53,7 +53,12 @@
                     else:
                         logger.debug("Prompt overrides directory does not exist: %s", self.overrides_dir)
                     search_paths.append(self.internal_dir)
-                    self.env = Environment(loader=FileSystemLoader(search_paths), autoescape=False, trim_blocks=True, lstrip_blocks=True)
+                    self.env = Environment(
+                        loader=FileSystemLoader(search_paths),
+                        autoescape=select_autoescape(['j2', 'html', 'xml']),
+                        trim_blocks=True,
+                        lstrip_blocks=True,
+                    )
                     # cache of resolved template name lists per module key (ordered)
                     self._module_template_cache: dict[Optional[str], List[str]] = {}

@@ -1,5 +1,6 @@
             import logging
             import os
+            import re
             from functools import lru_cache
             from typing import Any, Iterator, List, Optional
@@ -42,7 +43,11 @@
                     # read dynamic configuration if available
                     self.overrides_dir = overrides_dir or getattr(settings, "prompts_dir", "/prompts")
                     # prompts_lang is stored as a language code (eg 'en', 'fr') without the .j2 extension
-                    self.language = language or getattr(settings, "prompts_lang", DEFAULT_LANGUAGE)
+                    selected_lang = language or getattr(settings, "prompts_lang", DEFAULT_LANGUAGE)
+                    if not re.fullmatch(r"^[a-z]{2}$", str(selected_lang)):
+                        logger.warning("Invalid language code %r provided, falling back to default: %s", selected_lang, DEFAULT_LANGUAGE)
+                        selected_lang = DEFAULT_LANGUAGE
+                    self.language = selected_lang
                     self.internal_dir = DEFAULT_PROMPTS_RELATIVE_DIR

@@ -85,6 +85,12 @@
                         yield from self._module_template_pairs_cache[module]
                         return
+                    def _sanitize_for_log(s: str) -> str:
+                        # Remove CR and LF characters so a logged value cannot forge extra log lines
+                        sanitized = s.replace('\r', '').replace('\n', '')
+                        return sanitized
                     search_paths = [self.overrides_dir, self.internal_dir]
                     candidates = self._candidate_templates(module)
                     found: list[tuple[str, str]] = []
@@ -114,7 +120,7 @@
                             found = [(self.internal_dir, f) for f in internal_files]
                         else:
                             # final fallback to default language file name (will be looked up in env paths)
-                            logger.error("No prompt template found; tried: %s", ", ".join(candidates))
+                            logger.error("No prompt template found; tried: %s", ", ".join(_sanitize_for_log(c) for c in candidates))
                             found = [(self.internal_dir, f"{DEFAULT_LANGUAGE}.j2")]
                     # cache relative template names for introspection but keep pairs for loading

@@ -3,7 +3,7 @@
             from functools import lru_cache
             from typing import Any, Iterator, List, Optional
-            from jinja2 import Environment, FileSystemLoader, TemplateNotFound
+            from jinja2 import Environment, FileSystemLoader, TemplateNotFound, select_autoescape
             from api.utils.configuration import get_configuration
@@ -128,7 +128,12 @@
                     # Load template from a specific base directory to allow loading internal
                     # templates even when the same filename exists in an overrides dir.
                     try:
-                        env = Environment(loader=FileSystemLoader([base]), autoescape=False, trim_blocks=True, lstrip_blocks=True)
+                        env = Environment(
+                            loader=FileSystemLoader([base]),
+                            autoescape=select_autoescape(['html', 'xml']),
+                            trim_blocks=True,
+                            lstrip_blocks=True
+                        )
                         return env.get_template(template_name)
                     except TemplateNotFound as e:
                         raise FileNotFoundError(f"Prompt template file '{template_name}' not found in base '{base}'.") from e

Check warning

Copilot Autofix

Check failure

Copilot Autofix

Check failure

Copilot Autofix

Check failure

Copilot Autofix

Check warning

Copilot Autofix

Check notice

Copilot Autofix

Check notice

Copilot Autofix

Check notice

Copilot Autofix

Check notice

Copilot Autofix

@@ -101,9 +101,10 @@
                     if not found:
                         # if no exact candidate matched, try to use any internal templates matching the language
+                        safe_language = str(self.language).replace("\n", "").replace("\r", "")
                         logger.debug(
                             "No exact candidate template found; searching internal dir for any *.%s.j2 files",
-                            self.language,
+                            safe_language,
                         )
                         try:
                             internal_files = [f for f in os.listdir(self.internal_dir) if f.endswith(f".{self.language}.j2")]


