
Conversation

@sbalandi commented Sep 12, 2025

usage example:
python ../tools/llm_bench/benchmark.py -m ./models/ms-marco-MiniLM-L6-v2-text-class/ -n 1 --rerank --texts "side profile centered painted portrait, Gandhi rolling a blunt, Gloomhaven, matte painting concept art, art nouveau, 8K HD Resolution, beautifully background" --reranking_max_length 512 --reranking_top_n 3

example output:

./tools/llm_bench/prompts/texts_for_rerank.jsonl
Multiple distributions found for package optimum. Picked distribution: optimum
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
/home/labuser/work/notebook/sampler_env/lib/python3.10/site-packages/torch/onnx/_internal/registration.py:162: OnnxExporterWarning: Symbolic function 'aten::scaled_dot_product_attention' already registered for opset 14. Replacing the existing function with new function. This is unexpected. Please report it on https://github.com/pytorch/pytorch/issues.
  warnings.warn(
[ INFO ] ==SUCCESS FOUND==: use_case: rag, model_type: bert
[ INFO ] OV Config={'CACHE_DIR': ''}
[ WARNING ] It is recommended to set the environment variable OMP_WAIT_POLICY to PASSIVE, so that OpenVINO inference can use all CPU resources without waiting.
[ INFO ] The num_beams is 1, update Torch thread num from 18 to 9, avoid to use the CPU cores for OpenVINO inference.
[ INFO ] Model path=/home/labuser/work/notebook/openvino.genai/models/ms-marco-MiniLM-L6-v2-text-class, openvino runtime version: 2025.4.0-19834-71b56616e5d, genai version: 2025.4.0.0-2521-7e8d586a5aa-reranker
[ INFO ] Selected OpenVINO GenAI for benchmarking
[ INFO ] Pipeline initialization time: 0.81s
[ INFO ] Read texts from /home/labuser/work/notebook/openvino.genai/tools/llm_bench/prompts/texts_for_rerank.jsonl
[ INFO ] [warm-up][P0] Input query: What are the main features of Intel Core Ultra processors?
 Input texts: ['The commercial PC market is propelled by premium computing solutions that drive user productivity and help service organizations protect and maintain devices. Corporations must empower mobile and hybrid workers while extracting value from artificial intelligence (AI) to improve business outcomes. Moreover, both public and private sectors must address sustainability initiatives pertaining to the full life cycle of computing fleets. An inflection point in computing architecture is needed to stay ahead of evolving requirements. Introducing Intel® Core™ Ultra Processors Intel® Core™ Ultra processors shape the future of commercial computing in four major ways: Power Efficiency The new product line features a holistic approach to power efficiency that benefits mobile work. Substantial changes to the microarchitecture, manufacturing process, packaging technology, and power management software result in up to 40% lower processor power consumption for modern tasks such as video conferencing with a virtual camera. Artificial Intelligence Intel Core Ultra processors incorporate an AI-optimized architecture that supports new user experiences and the next wave of commercial applications. The CPU, GPU, and the new neural processing unit (NPU) are all capable of executing AI tasks as directed by application developers. For example, elevated mobile collaboration is possible with support for AI assisted background blur, noise suppression, eye tracking, and picture framing. Intel Core Ultra processors are capable of up to 2.5x the AI inference performance per watt as compared to Intel’s previous mobile processor offering.', 'Intel Core Ultra processors incorporate an AI-optimized architecture that supports new user experiences and the next wave of commercial applications.']
[ INFO ] [warm-up][P0] Document 1, score: 7.2574: Intel Core Ultra processors incorporate an AI-optimized architecture that supports new user experiences and the next wave of commercial applications.
[ INFO ] [warm-up][P0] Document 0, score: 5.1098: The commercial PC market is propelled by premium computing solutions that drive user productivity and help service organizations protect and maintain devices. Corporations must empower mobile and hybrid workers while extracting value from artificial intelligence (AI) to improve business outcomes. Moreover, both public and private sectors must address sustainability initiatives pertaining to the full life cycle of computing fleets. An inflection point in computing architecture is needed to stay ahead of evolving requirements. Introducing Intel® Core™ Ultra Processors Intel® Core™ Ultra processors shape the future of commercial computing in four major ways: Power Efficiency The new product line features a holistic approach to power efficiency that benefits mobile work. Substantial changes to the microarchitecture, manufacturing process, packaging technology, and power management software result in up to 40% lower processor power consumption for modern tasks such as video conferencing with a virtual camera. Artificial Intelligence Intel Core Ultra processors incorporate an AI-optimized architecture that supports new user experiences and the next wave of commercial applications. The CPU, GPU, and the new neural processing unit (NPU) are all capable of executing AI tasks as directed by application developers. For example, elevated mobile collaboration is possible with support for AI assisted background blur, noise suppression, eye tracking, and picture framing. Intel Core Ultra processors are capable of up to 2.5x the AI inference performance per watt as compared to Intel’s previous mobile processor offering.
[ INFO ] [warm-up][P0] Input token size: 598, Infer count: 1, Tokenization Time: 3.86ms, Total Time: 31.2880s, Latency: 31.2880 ms/prompt
[ INFO ] [warm-up][P0] First iteration latency: 31.29 ms, other iteration latency: NA, len of input tokens: 598, texts number: 2
[ INFO ] [warm-up][P0] First infer latency: 31.29 ms, other infers latency: NA, inference count: 1
[ INFO ] [warm-up][P0] start: 2025-09-15T15:23:52.131644, end: 2025-09-15T15:23:52.167208
[ INFO ] [1][P0] Document 1, score: 7.2574
[ INFO ] [1][P0] Document 0, score: 5.1098
[ INFO ] [1][P0] Input token size: 598, Infer count: 1, Tokenization Time: 1.48ms, Total Time: 15.3794s, Latency: 15.3794 ms/prompt
[ INFO ] [1][P0] First iteration latency: 15.38 ms, other iteration latency: NA, len of input tokens: 598, texts number: 2
[ INFO ] [1][P0] First infer latency: 15.38 ms, other infers latency: NA, inference count: 1
[ INFO ] [1][P0] start: 2025-09-15T15:23:52.167254, end: 2025-09-15T15:23:52.184472
[ INFO ] <<< Warm-up iteration is excluded. >>>
[ INFO ] [Total] Iterations: 1
[ INFO ] [Average] P[0] Input token size: 598, 1st iteration latency: 15.38 ms, 2nd iteration latency: NA, 2nd iteration throughput: NA
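
For context on where scores like 7.2574 and 5.1098 come from: ms-marco-MiniLM is a cross-encoder, which scores each (query, text) pair jointly rather than embedding query and texts separately. Below is a minimal sketch with plain transformers, independent of this PR's code; the Hugging Face id cross-encoder/ms-marco-MiniLM-L6-v2 is assumed to be the upstream of the locally exported model.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed upstream of the locally exported model used in the example above.
model_id = "cross-encoder/ms-marco-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

query = "What are the main features of Intel Core Ultra processors?"
texts = [
    "The commercial PC market is propelled by premium computing solutions ...",  # document 0, truncated
    "Intel Core Ultra processors incorporate an AI-optimized architecture that supports new user experiences and the next wave of commercial applications.",  # document 1
]

# Each (query, text) pair is tokenized together; the model emits one relevance logit per pair.
batch = tokenizer([query] * len(texts), texts, padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    scores = model(**batch).logits.squeeze(-1)

# Higher score means more relevant, matching the Document 1 > Document 0 ordering in the log.
for idx in scores.argsort(descending=True):
    print(f"Document {idx.item()}, score: {scores[idx].item():.4f}")

The max_length=512 in this sketch mirrors the --reranking_max_length 512 flag in the usage example.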

@github-actions bot added the labels category: llm_bench (Label for tool/llm_bench folder), category: GGUF (GGUF file reader), and category: RAG samples on Sep 12, 2025
@github-actions bot removed the category: RAG samples label on Sep 12, 2025
@Wovchena requested a review from Copilot on September 12, 2025

Copilot AI left a comment

Pull Request Overview

This PR adds reranking pipeline functionality to the llm_bench tool, enabling benchmarking of text reranking models. The implementation supports both PyTorch and OpenVINO frameworks with Optimum Intel and GenAI backends.

  • Introduces new TextRerankerOptimum and TextRerankerGenAI pipeline classes for text reranking
  • Adds configuration options for reranking parameters (max_length, top_n, input texts)
  • Integrates reranking functionality into the existing benchmark infrastructure (a rough usage sketch of the GenAI path follows this list)
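
To make the GenAI backend path concrete, here is a rough, hypothetical sketch of how such a pipeline is typically driven. The TextRerankPipeline name, its constructor arguments, and the rerank() signature below are assumptions about the OpenVINO GenAI rerank API, not code from this PR.

import openvino_genai  # import name assumed

# Hypothetical usage; consult the openvino.genai documentation for the exact API.
pipeline = openvino_genai.TextRerankPipeline(
    "./models/ms-marco-MiniLM-L6-v2-text-class/",  # model path from the usage example
    "CPU",
)

query = "What are the main features of Intel Core Ultra processors?"
texts = ["document 0 ...", "document 1 ..."]  # placeholders for the texts shown above

# Assumed to return (index, score) pairs sorted by descending relevance,
# which is the ordering the benchmark log prints.
for index, score in pipeline.rerank(query, texts):
    print(f"Document {index}, score: {score:.4f}")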

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

Summary per file:

  • tools/llm_bench/task/text_reranker.py: Core implementation of text reranking pipelines and the benchmark runner
  • tools/llm_bench/task/pipeline_utils.py: Common pipeline base class with timing decorators
  • tools/llm_bench/llm_bench_utils/pt_utils.py: PyTorch model creation utilities for reranking
  • tools/llm_bench/llm_bench_utils/ov_utils.py: OpenVINO model creation utilities for reranking
  • tools/llm_bench/llm_bench_utils/model_utils.py: Model configuration and argument processing updates
  • tools/llm_bench/llm_bench_utils/metrics_print.py: Metrics printing enhancements for reranking
  • tools/llm_bench/llm_bench_utils/hook_forward.py: Renamed hook class from EmbedForwardHook to RAGForwardHook
  • tools/llm_bench/llm_bench_utils/hook_common.py: Updated hook creation for RAG/reranking use cases
  • tools/llm_bench/llm_bench_utils/config_class.py: Configuration mappings for reranking model classes
  • tools/llm_bench/benchmark.py: Main benchmark script integration with new reranking arguments
  • tests/python_tests/samples/test_tools_llm_benchmark.py: Test cases for reranking functionality
Comments suppressed due to low confidence (1)

tools/llm_bench/task/text_reranker.py:1

  • Missing import statement for json module. The json.loads() function is used on line 388 but json is not imported.
# -*- coding: utf-8 -*-
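
The fix being suggested would presumably be a one-line stdlib import at the top of the module (illustrative placement, not the actual diff):

# At the top of tools/llm_bench/task/text_reranker.py, after the
# existing "# -*- coding: utf-8 -*-" line:
import json  # needed for the json.loads() call Copilot flagged on line 388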


@sbalandi force-pushed the reranker branch 5 times, most recently from 2efebe7 to f08e2e7 on September 15, 2025
@sbalandi force-pushed the reranker branch 3 times, most recently from e54086a to f9a9ddd on September 15, 2025
@as-suvorov

Data from the example: Time: 31.2880s, Latency: 31.2880 ms/prompt. The s vs ms units look odd. Could you please clarify?

@as-suvorov requested a review from Copilot on September 16, 2025
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.



@sbalandi

> Data from the example: Time: 31.2880s, Latency: 31.2880 ms/prompt. The s vs ms units look odd. Could you please clarify?

To be honest I don't have an answer; it has just been this way historically. It's possible to align the units, but this part of the code is used in many places, so I would do that in a separate PR.

@as-suvorov

> > Data from the example: Time: 31.2880s, Latency: 31.2880 ms/prompt. The s vs ms units look odd. Could you please clarify?
>
> To be honest I don't have an answer; it has just been this way historically. It's possible to align the units, but this part of the code is used in many places, so I would do that in a separate PR.

How many prompts were processed during the generation time of 31.2880s?

@sbalandi commented Sep 16, 2025

> > > Data from the example: Time: 31.2880s, Latency: 31.2880 ms/prompt. The s vs ms units look odd. Could you please clarify?
> >
> > To be honest I don't have an answer; it has just been this way historically. It's possible to align the units, but this part of the code is used in many places, so I would do that in a separate PR.
>
> How many prompts were processed during the generation time of 31.2880s?

One query and two texts. The Total is actually in ms here; I thought you were asking about the mixing of ms and s. Thank you! I will fix it.
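
To spell out the arithmetic behind the fix: assuming the reported Total covers the single rerank request (one query scored against two texts in one inference), latency = 31.2880 ms / 1 prompt = 31.2880 ms/prompt, which is why the two numbers match. The stray s suffix on Total Time was the unit typo.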

@sbalandi force-pushed the reranker branch 2 times, most recently from 77d26d4 to 175ed77 on September 25, 2025
@sbalandi enabled auto-merge on September 25, 2025
@sbalandi added this pull request to the merge queue on Sep 25, 2025
Merged via the queue into openvinotoolkit:master with commit 1564b2e Sep 25, 2025
91 checks passed