
Conversation

@sbalandi commented Sep 12, 2025

usage example:
python ../tools/llm_bench/benchmark.py -m ./models/ms-marco-MiniLM-L6-v2-text-class/ -n 1 --rerank --texts "side profile centered painted portrait, Gandhi rolling a blunt, Gloomhaven, matte painting concept art, art nouveau, 8K HD Resolution, beautifully background" --reranking_max_length 512 --reranking_top_n 3

example output:

./tools/llm_bench/prompts/texts_for_rerank.jsonl
Multiple distributions found for package optimum. Picked distribution: optimum
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
/home/labuser/work/notebook/sampler_env/lib/python3.10/site-packages/torch/onnx/_internal/registration.py:162: OnnxExporterWarning: Symbolic function 'aten::scaled_dot_product_attention' already registered for opset 14. Replacing the existing function with new function. This is unexpected. Please report it on https://github.com/pytorch/pytorch/issues.
  warnings.warn(
[ INFO ] ==SUCCESS FOUND==: use_case: rag, model_type: bert
[ INFO ] OV Config={'CACHE_DIR': ''}
[ WARNING ] It is recommended to set the environment variable OMP_WAIT_POLICY to PASSIVE, so that OpenVINO inference can use all CPU resources without waiting.
[ INFO ] The num_beams is 1, update Torch thread num from 18 to 9, avoid to use the CPU cores for OpenVINO inference.
[ INFO ] Model path=/home/labuser/work/notebook/openvino.genai/models/ms-marco-MiniLM-L6-v2-text-class, openvino runtime version: 2025.4.0-19834-71b56616e5d, genai version: 2025.4.0.0-2521-7e8d586a5aa-reranker
[ INFO ] Selected OpenVINO GenAI for benchmarking
[ INFO ] Pipeline initialization time: 0.81s
[ INFO ] Read texts from /home/labuser/work/notebook/openvino.genai/tools/llm_bench/prompts/texts_for_rerank.jsonl
[ INFO ] [warm-up][P0] Input query: What are the main features of Intel Core Ultra processors?
 Input texts: ['The commercial PC market is propelled by premium computing solutions that drive user productivity and help service organizations protect and maintain devices. Corporations must empower mobile and hybrid workers while extracting value from artificial intelligence (AI) to improve business outcomes. Moreover, both public and private sectors must address sustainability initiatives pertaining to the full life cycle of computing fleets. An inflection point in computing architecture is needed to stay ahead of evolving requirements. Introducing Intel® Core™ Ultra Processors Intel® Core™ Ultra processors shape the future of commercial computing in four major ways: Power Efficiency The new product line features a holistic approach to power efficiency that benefits mobile work. Substantial changes to the microarchitecture, manufacturing process, packaging technology, and power management software result in up to 40% lower processor power consumption for modern tasks such as video conferencing with a virtual camera. Artificial Intelligence Intel Core Ultra processors incorporate an AI-optimized architecture that supports new user experiences and the next wave of commercial applications. The CPU, GPU, and the new neural processing unit (NPU) are all capable of executing AI tasks as directed by application developers. For example, elevated mobile collaboration is possible with support for AI assisted background blur, noise suppression, eye tracking, and picture framing. Intel Core Ultra processors are capable of up to 2.5x the AI inference performance per watt as compared to Intel’s previous mobile processor offering.', 'Intel Core Ultra processors incorporate an AI-optimized architecture that supports new user experiences and the next wave of commercial applications.']
[ INFO ] [warm-up][P0] Document 1, score: 7.2574: Intel Core Ultra processors incorporate an AI-optimized architecture that supports new user experiences and the next wave of commercial applications.
[ INFO ] [warm-up][P0] Document 0, score: 5.1098: The commercial PC market is propelled by premium computing solutions that drive user productivity and help service organizations protect and maintain devices. Corporations must empower mobile and hybrid workers while extracting value from artificial intelligence (AI) to improve business outcomes. Moreover, both public and private sectors must address sustainability initiatives pertaining to the full life cycle of computing fleets. An inflection point in computing architecture is needed to stay ahead of evolving requirements. Introducing Intel® Core™ Ultra Processors Intel® Core™ Ultra processors shape the future of commercial computing in four major ways: Power Efficiency The new product line features a holistic approach to power efficiency that benefits mobile work. Substantial changes to the microarchitecture, manufacturing process, packaging technology, and power management software result in up to 40% lower processor power consumption for modern tasks such as video conferencing with a virtual camera. Artificial Intelligence Intel Core Ultra processors incorporate an AI-optimized architecture that supports new user experiences and the next wave of commercial applications. The CPU, GPU, and the new neural processing unit (NPU) are all capable of executing AI tasks as directed by application developers. For example, elevated mobile collaboration is possible with support for AI assisted background blur, noise suppression, eye tracking, and picture framing. Intel Core Ultra processors are capable of up to 2.5x the AI inference performance per watt as compared to Intel’s previous mobile processor offering.
[ INFO ] [warm-up][P0] Input token size: 598, Infer count: 1, Tokenization Time: 3.86ms, Total Time: 31.2880s, Latency: 31.2880 ms/prompt
[ INFO ] [warm-up][P0] First iteration latency: 31.29 ms, other iteration latency: NA, len of input tokens: 598, texts number: 2
[ INFO ] [warm-up][P0] First infer latency: 31.29 ms, other infers latency: NA, inference count: 1
[ INFO ] [warm-up][P0] start: 2025-09-15T15:23:52.131644, end: 2025-09-15T15:23:52.167208
[ INFO ] [1][P0] Document 1, score: 7.2574
[ INFO ] [1][P0] Document 0, score: 5.1098
[ INFO ] [1][P0] Input token size: 598, Infer count: 1, Tokenization Time: 1.48ms, Total Time: 15.3794s, Latency: 15.3794 ms/prompt
[ INFO ] [1][P0] First iteration latency: 15.38 ms, other iteration latency: NA, len of input tokens: 598, texts number: 2
[ INFO ] [1][P0] First infer latency: 15.38 ms, other infers latency: NA, inference count: 1
[ INFO ] [1][P0] start: 2025-09-15T15:23:52.167254, end: 2025-09-15T15:23:52.184472
[ INFO ] <<< Warm-up iteration is excluded. >>>
[ INFO ] [Total] Iterations: 1
[ INFO ] [Average] P[0] Input token size: 598, 1st iteration latency: 15.38 ms, 2nd iteration latency: NA, 2nd iteration throughput: NA
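
For context on where scores like 7.2574 and 5.1098 come from: ms-marco-MiniLM is a cross-encoder, which scores each (query, text) pair jointly rather than embedding query and texts separately. Below is a minimal sketch with plain transformers, independent of this PR's code; the Hugging Face id cross-encoder/ms-marco-MiniLM-L6-v2 is assumed to be the upstream of the locally exported model.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed upstream of the locally exported model used in the example above.
model_id = "cross-encoder/ms-marco-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

query = "What are the main features of Intel Core Ultra processors?"
texts = [
    "The commercial PC market is propelled by premium computing solutions ...",  # document 0, truncated
    "Intel Core Ultra processors incorporate an AI-optimized architecture that supports new user experiences and the next wave of commercial applications.",  # document 1
]

# Each (query, text) pair is tokenized together; the model emits one relevance logit per pair.
batch = tokenizer([query] * len(texts), texts, padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    scores = model(**batch).logits.squeeze(-1)

# Higher score means more relevant, matching the Document 1 > Document 0 ordering in the log.
for idx in scores.argsort(descending=True):
    print(f"Document {idx.item()}, score: {scores[idx].item():.4f}")

The max_length=512 in this sketch mirrors the --reranking_max_length 512 flag in the usage example.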

@github-actions bot added the labels category: llm_bench (Label for tool/llm_bench folder), category: GGUF (GGUF file reader), and category: RAG samples on Sep 12, 2025
@github-actions bot removed the category: RAG samples label on Sep 12, 2025
@Wovchena requested a review from Copilot on September 12, 2025

Copilot AI left a comment

Pull Request Overview

This PR adds reranking pipeline functionality to the llm_bench tool, enabling benchmarking of text reranking models. The implementation supports both PyTorch and OpenVINO frameworks with Optimum Intel and GenAI backends.

  • Introduces new TextRerankerOptimum and TextRerankerGenAI pipeline classes for text reranking
  • Adds configuration options for reranking parameters (max_length, top_n, input texts)
  • Integrates reranking functionality into the existing benchmark infrastructure (a rough usage sketch of the GenAI path follows this list)
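
To make the GenAI backend path concrete, here is a rough, hypothetical sketch of how such a pipeline is typically driven. The TextRerankPipeline name, its constructor arguments, and the rerank() signature below are assumptions about the OpenVINO GenAI rerank API, not code from this PR.

import openvino_genai  # import name assumed

# Hypothetical usage; consult the openvino.genai documentation for the exact API.
pipeline = openvino_genai.TextRerankPipeline(
    "./models/ms-marco-MiniLM-L6-v2-text-class/",  # model path from the usage example
    "CPU",
)

query = "What are the main features of Intel Core Ultra processors?"
texts = ["document 0 ...", "document 1 ..."]  # placeholders for the texts shown above

# Assumed to return (index, score) pairs sorted by descending relevance,
# which is the ordering the benchmark log prints.
for index, score in pipeline.rerank(query, texts):
    print(f"Document {index}, score: {score:.4f}")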

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

Summary per file:

  • tools/llm_bench/task/text_reranker.py: Core implementation of text reranking pipelines and the benchmark runner
  • tools/llm_bench/task/pipeline_utils.py: Common pipeline base class with timing decorators
  • tools/llm_bench/llm_bench_utils/pt_utils.py: PyTorch model creation utilities for reranking
  • tools/llm_bench/llm_bench_utils/ov_utils.py: OpenVINO model creation utilities for reranking
  • tools/llm_bench/llm_bench_utils/model_utils.py: Model configuration and argument processing updates
  • tools/llm_bench/llm_bench_utils/metrics_print.py: Metrics printing enhancements for reranking
  • tools/llm_bench/llm_bench_utils/hook_forward.py: Renamed hook class from EmbedForwardHook to RAGForwardHook
  • tools/llm_bench/llm_bench_utils/hook_common.py: Updated hook creation for RAG/reranking use cases
  • tools/llm_bench/llm_bench_utils/config_class.py: Configuration mappings for reranking model classes
  • tools/llm_bench/benchmark.py: Main benchmark script integration with new reranking arguments
  • tests/python_tests/samples/test_tools_llm_benchmark.py: Test cases for reranking functionality
Comments suppressed due to low confidence (1)

tools/llm_bench/task/text_reranker.py:1

  • Missing import statement for json module. The json.loads() function is used on line 388 but json is not imported.
# -*- coding: utf-8 -*-
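
The fix being suggested would presumably be a one-line stdlib import at the top of the module (illustrative placement, not the actual diff):

# At the top of tools/llm_bench/task/text_reranker.py, after the
# existing "# -*- coding: utf-8 -*-" line:
import json  # needed for the json.loads() call Copilot flagged on line 388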


@sbalandi force-pushed the reranker branch 5 times, most recently from 2efebe7 to f08e2e7 on September 15, 2025
@sbalandi force-pushed the reranker branch 3 times, most recently from e54086a to f9a9ddd on September 15, 2025
@as-suvorov

Data from the example: Time: 31.2880s, Latency: 31.2880 ms/prompt. The s vs ms units look odd. Could you please clarify?

@as-suvorov requested a review from Copilot on September 16, 2025
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.



@sbalandi

> Data from the example: Time: 31.2880s, Latency: 31.2880 ms/prompt. The s vs ms units look odd. Could you please clarify?

To be honest I don't have an answer; it has just been this way historically. It's possible to align the units, but this part of the code is used in many places, so I would do that in a separate PR.

@as-suvorov

> > Data from the example: Time: 31.2880s, Latency: 31.2880 ms/prompt. The s vs ms units look odd. Could you please clarify?
>
> To be honest I don't have an answer; it has just been this way historically. It's possible to align the units, but this part of the code is used in many places, so I would do that in a separate PR.

How many prompts were processed during the generation time of 31.2880s?

@sbalandi commented Sep 16, 2025

> > > Data from the example: Time: 31.2880s, Latency: 31.2880 ms/prompt. The s vs ms units look odd. Could you please clarify?
> >
> > To be honest I don't have an answer; it has just been this way historically. It's possible to align the units, but this part of the code is used in many places, so I would do that in a separate PR.
>
> How many prompts were processed during the generation time of 31.2880s?

One query and two texts. The Total is actually in ms here; I thought you were asking about the mixing of ms and s. Thank you! I will fix it.
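
To spell out the arithmetic behind the fix: assuming the reported Total covers the single rerank request (one query scored against two texts in one inference), latency = 31.2880 ms / 1 prompt = 31.2880 ms/prompt, which is why the two numbers match. The stray s suffix on Total Time was the unit typo.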

@sbalandi force-pushed the reranker branch 2 times, most recently from 77d26d4 to 175ed77 on September 25, 2025
@sbalandi enabled auto-merge on September 25, 2025
@sbalandi added this pull request to the merge queue on Sep 25, 2025
Merged via the queue into openvinotoolkit:master with commit 1564b2e Sep 25, 2025
91 checks passed