[#6187][feat] add LayerNorm module #6625


Open
Funatiq wants to merge 5 commits into main from dev/feat/layer_norm_module

Conversation

@Funatiq Funatiq (Collaborator) commented Aug 5, 2025

Summary by CodeRabbit

  • New Features

    • Introduced a customizable Layer Normalization module with support for optional residual connections and configurable parameters.
  • Style

    • Improved code readability and formatting in the RMSNorm module.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option is always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. This ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
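For example, a PR comment invoking the bot with several of the options above might look like one of the following (the pipeline id 12345 is only a placeholder):

/bot run --disable-fail-fast --gpu-type "A30, H100_PCIe"
/bot run --reuse-test 12345 --stage-list "A10-PyTorch-1"
/bot run --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1"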

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

coderabbitai bot (Contributor) commented Aug 5, 2025

📝 Walkthrough

Walkthrough

A new LayerNorm module was introduced, implementing configurable layer normalization with optional weight, bias, and residual support. Minor stylistic changes were made to the RMSNorm module for readability, without affecting functionality or public interfaces.

Changes

Cohort / File(s) Change Summary
New LayerNorm Module
tensorrt_llm/_torch/modules/layer_norm.py
Added a custom LayerNorm class with configurable parameters, supporting optional weights, bias, and residuals.
RMSNorm Style Adjustments
tensorrt_llm/_torch/modules/rms_norm.py
Reformatted the constructor and added a blank line in forward for readability; no functional changes made.

Sequence Diagram(s)

sequenceDiagram
    participant Input
    participant LayerNorm
    participant Output

    Input->>LayerNorm: forward(hidden_states, residual)
    alt residual provided
        LayerNorm->>LayerNorm: Add residual to hidden_states
    end
    LayerNorm->>LayerNorm: Compute mean and variance
    LayerNorm->>LayerNorm: Normalize and apply weight/bias
    LayerNorm->>Output: Return normalized output (and residual if provided)
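A rough, unfused sketch of the forward flow summarized by the diagram (for illustration only; the parameter names follow the walkthrough and the quoted diff later in this thread, not necessarily the PR's exact code):

import torch

def layer_norm_forward(hidden_states, weight, bias, eps=1e-5, residual=None):
    # Work in float32 for numerical stability, then cast back to the input dtype.
    input_dtype = hidden_states.dtype
    hidden_states = hidden_states.to(torch.float32)
    if residual is not None:
        # Optional residual add before normalization; the updated residual is returned.
        hidden_states = hidden_states + residual.to(torch.float32)
        residual = hidden_states.to(input_dtype)
    mean = hidden_states.mean(-1, keepdim=True)
    variance = hidden_states.var(-1, keepdim=True, unbiased=False)
    hidden_states = (hidden_states - mean) * torch.rsqrt(variance + eps)
    hidden_states = (weight * hidden_states + bias).to(input_dtype)
    return hidden_states if residual is None else (hidden_states, residual)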

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • liji-nv
  • Wanli-Jiang

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between baab83b and 16dcaa7.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/modules/layer_norm.py (1 hunks)
  • tensorrt_llm/_torch/modules/rms_norm.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tensorrt_llm/_torch/modules/rms_norm.py
  • tensorrt_llm/_torch/modules/layer_norm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai bot changed the title from "@coderabbitai title" to "feat add LayerNorm module and update RMSNorm styling for readability" on Aug 5, 2025
@coderabbitai coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (1)
tensorrt_llm/_torch/modules/layer_norm.py (1)

78-78: Add missing docstrings for the LayerNorm class and methods.

According to the coding guidelines, interfaces that may be used outside a file should prefer docstrings over comments. The LayerNorm class and its methods should have Google-style docstrings.

Add docstrings for the class and methods:

 class LayerNorm(nn.Module):
+    """Layer normalization module with configurable weight and bias parameters.
+    
+    This implementation provides standard layer normalization with optional
+    learnable parameters and residual connection support.
+    
+    Args:
+        hidden_size: The size of the hidden dimension to normalize.
+        eps: Small constant for numerical stability.
+        dtype: Optional data type for parameters.
+        device: Optional device for parameters.
+        has_weights: Whether to include learnable weight parameters.
+        has_bias: Whether to include learnable bias parameters.
+    """

     def __init__(
     def forward(
         self,
         hidden_states: torch.Tensor,
         residual: Optional[torch.Tensor] = ...,
     ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
+        """Apply layer normalization to input tensor.
+        
+        Args:
+            hidden_states: Input tensor to normalize.
+            residual: Optional residual tensor to add before normalization.
+            
+        Returns:
+            Normalized tensor, or tuple of (normalized_tensor, residual) if residual provided.
+        """
     def skip_forward(
         self,
         hidden_states: torch.Tensor,
         residual: Optional[torch.Tensor] = ...,
     ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
+        """Skip normalization and return inputs unchanged.
+        
+        Args:
+            hidden_states: Input tensor to pass through.
+            residual: Optional residual tensor to pass through.
+            
+        Returns:
+            Input tensors unchanged, maintaining same signature as forward.
+        """
🧹 Nitpick comments (1)
tensorrt_llm/_torch/modules/layer_norm.py (1)

47-49: Consider implementing Flashinfer support for LayerNorm.

While raising NotImplementedError is appropriate for now, consider adding Flashinfer support for LayerNorm in the future to maintain performance parity with RMSNorm.
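A minimal sketch of the guard this nitpick refers to, assuming a module-level availability flag like the one patched in the generated unit tests below (the flag location and helper name are illustrative):

IS_FLASHINFER_AVAILABLE = False  # assumed availability flag, normally set at import time

def maybe_use_flashinfer_layernorm():
    # Flashinfer currently ships a fused RMSNorm kernel but no LayerNorm kernel,
    # so the module raises here instead of dispatching to a fused path.
    if IS_FLASHINFER_AVAILABLE:
        raise NotImplementedError("Flashinfer is not supported for LayerNorm")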

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08ed9d7 and d0d63c1.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/modules/layer_norm.py (1 hunks)
  • tensorrt_llm/_torch/modules/rms_norm.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without reflection.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
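A compact, hypothetical illustration of several of these naming and docstring conventions:

import math

MAX_SAMPLES = 1024  # constant: upper snake_case


class LatencyTracker:  # class name: PascalCase
    """Tracks request latencies in milliseconds.

    Attributes:
        latencies_ms: Recorded latency samples.
    """

    def __init__(self):
        self.latencies_ms = []  # externally visible member initialized in the constructor

    def record_latency(self, latency_ms: float) -> None:  # method name: snake_case
        """Records one latency sample, keeping at most MAX_SAMPLES entries."""
        if len(self.latencies_ms) < MAX_SAMPLES:
            self.latencies_ms.append(latency_ms)

    def percentile_99(self) -> float:
        """Returns the 99th-percentile latency, or 0.0 if no samples were recorded."""
        if not self.latencies_ms:
            return 0.0
        k_99th_percentile = 0.99  # local that would start with a digit gets the k prefix
        ordered = sorted(self.latencies_ms)
        index = min(len(ordered) - 1, math.floor(k_99th_percentile * len(ordered)))
        return ordered[index]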

Files:

  • tensorrt_llm/_torch/modules/rms_norm.py
  • tensorrt_llm/_torch/modules/layer_norm.py
**/*.{cpp,h,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tensorrt_llm/_torch/modules/rms_norm.py
  • tensorrt_llm/_torch/modules/layer_norm.py
🔇 Additional comments (5)
tensorrt_llm/_torch/modules/rms_norm.py (2)

12-20: LGTM! Good formatting improvement.

The constructor formatting with proper indentation and trailing commas enhances readability while maintaining the same functionality.


53-53: LGTM! Minor formatting improvement.

The additional blank line improves code readability by visually separating the residual handling logic from the normalization computation.

tensorrt_llm/_torch/modules/layer_norm.py (3)

11-40: LGTM! Well-structured constructor with proper parameter handling.

The constructor correctly implements configurable weight and bias parameters with appropriate fallbacks to buffers when disabled. The implementation follows the same pattern as RMSNorm for consistency.
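A minimal sketch of the parameter/buffer fallback pattern described here, consistent with what the generated unit tests later in this thread assert (illustrative only, not the PR's constructor verbatim):

import torch
from torch import nn


class LayerNormSketch(nn.Module):
    def __init__(self, *, hidden_size, eps=1e-5, dtype=None, device=None,
                 has_weights=True, has_bias=True):
        super().__init__()
        self.variance_epsilon = eps
        weight = torch.ones(hidden_size, dtype=dtype, device=device)
        bias = torch.zeros(hidden_size, dtype=dtype, device=device)
        if has_weights:
            self.weight = nn.Parameter(weight)
        else:
            # Non-learnable fallback; persistent=False keeps it out of the state_dict.
            self.register_buffer("weight", weight, persistent=False)
        if has_bias:
            self.bias = nn.Parameter(bias)
        else:
            self.register_buffer("bias", bias, persistent=False)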


57-62: LGTM! Correct LayerNorm implementation.

The implementation correctly computes layer normalization using mean and variance along the last dimension, with proper numerical stability through epsilon. The dtype handling and residual support follow established patterns.


69-77: LGTM! Consistent skip_forward implementation.

The skip_forward method correctly maintains the same signature and behavior as the RMSNorm implementation, providing consistency across normalization modules.
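For reference, the pass-through contract described here can be sketched as a standalone helper (hypothetical, mirroring the behaviour the unit tests below check):

import torch

def skip_forward(hidden_states: torch.Tensor, residual=...):
    # Return the inputs unchanged, keeping the same return shape as forward():
    # a single tensor when no residual is given, a (tensor, residual) pair otherwise.
    if residual is ...:
        return hidden_states
    return hidden_states, residual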

@Funatiq Funatiq changed the title from "feat add LayerNorm module and update RMSNorm styling for readability" to "[feat] add LayerNorm module" on Aug 5, 2025
coderabbitai bot (Contributor) commented Aug 5, 2025

Note

Unit test generation is an Early Access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

coderabbitai bot (Contributor) commented Aug 5, 2025

Here are the copyable unit test edits:

Copyable Edits
## tests/unittest/_torch/test_layer_norm.py

This is a new file.

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pytest
import torch
import torch.nn as nn
from unittest.mock import patch

# Import the LayerNorm class from the source file
from tensorrt_llm._torch.modules.layer_norm import LayerNorm


class TestLayerNorm:
    """Comprehensive test suite for LayerNorm class.

    Testing framework: pytest
    This test suite covers initialization, forward pass, skip_forward, 
    parameter configurations, dtype handling, device consistency, 
    and edge cases for the LayerNorm implementation.
    """

    @pytest.fixture
    def default_params(self):
        """Default parameters for LayerNorm initialization."""
        return {
            'hidden_size': 128,
            'eps': 1e-5,
            'dtype': torch.float32,
            'device': None,
            'has_weights': True,
            'has_bias': True
        }

    @pytest.fixture
    def sample_tensor(self):
        """Sample input tensor for testing."""
        return torch.randn(2, 10, 128)

    def test_init_default_parameters(self, default_params):
        """Test LayerNorm initialization with default parameters."""
        layer_norm = LayerNorm(**default_params)

        assert layer_norm.variance_epsilon == default_params['eps']
        assert hasattr(layer_norm, 'weight')
        assert hasattr(layer_norm, 'bias')
        assert isinstance(layer_norm.weight, nn.Parameter)
        assert isinstance(layer_norm.bias, nn.Parameter)
        assert layer_norm.weight.shape == (default_params['hidden_size'],)
        assert layer_norm.bias.shape == (default_params['hidden_size'],)

    def test_init_with_weights_disabled(self):
        """Test LayerNorm initialization with has_weights=False."""
        layer_norm = LayerNorm(
            hidden_size=64,
            eps=1e-6,
            has_weights=False,
            has_bias=True
        )

        assert hasattr(layer_norm, 'weight')
        assert hasattr(layer_norm, 'bias')
        assert not isinstance(layer_norm.weight, nn.Parameter)
        assert isinstance(layer_norm.bias, nn.Parameter)
        assert torch.allclose(layer_norm.weight, torch.ones(64))

    def test_init_with_bias_disabled(self):
        """Test LayerNorm initialization with has_bias=False."""
        layer_norm = LayerNorm(
            hidden_size=64,
            eps=1e-6,
            has_weights=True,
            has_bias=False
        )

        assert hasattr(layer_norm, 'weight')
        assert hasattr(layer_norm, 'bias')
        assert isinstance(layer_norm.weight, nn.Parameter)
        assert not isinstance(layer_norm.bias, nn.Parameter)
        assert torch.allclose(layer_norm.bias, torch.zeros(64))

    def test_init_with_both_disabled(self):
        """Test LayerNorm initialization with both weights and bias disabled."""
        layer_norm = LayerNorm(
            hidden_size=32,
            eps=1e-4,
            has_weights=False,
            has_bias=False
        )

        assert hasattr(layer_norm, 'weight')
        assert hasattr(layer_norm, 'bias')
        assert not isinstance(layer_norm.weight, nn.Parameter)
        assert not isinstance(layer_norm.bias, nn.Parameter)
        assert torch.allclose(layer_norm.weight, torch.ones(32))
        assert torch.allclose(layer_norm.bias, torch.zeros(32))

    def test_init_with_custom_dtype_and_device(self):
        """Test LayerNorm initialization with custom dtype and device."""
        if torch.cuda.is_available():
            device = torch.device('cuda:0')
        else:
            device = torch.device('cpu')

        layer_norm = LayerNorm(
            hidden_size=256,
            eps=1e-3,
            dtype=torch.float16,
            device=device
        )

        assert layer_norm.weight.dtype == torch.float16
        assert layer_norm.bias.dtype == torch.float16
        assert layer_norm.weight.device == device
        assert layer_norm.bias.device == device

    def test_init_edge_cases(self):
        """Test LayerNorm initialization with edge case values."""
        # Very small hidden size
        layer_norm_small = LayerNorm(hidden_size=1, eps=1e-8)
        assert layer_norm_small.weight.shape == (1,)
        assert layer_norm_small.bias.shape == (1,)

        # Large epsilon
        layer_norm_large_eps = LayerNorm(hidden_size=64, eps=1.0)
        assert layer_norm_large_eps.variance_epsilon == 1.0

        # Very small epsilon
        layer_norm_small_eps = LayerNorm(hidden_size=64, eps=1e-12)
        assert layer_norm_small_eps.variance_epsilon == 1e-12

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', True)
    def test_forward_flashinfer_not_implemented(self, default_params, sample_tensor):
        """Test that forward raises NotImplementedError when FlashInfer is available."""
        layer_norm = LayerNorm(**default_params)

        with pytest.raises(NotImplementedError, match="Flashinfer is not supported for LayerNorm"):
            layer_norm.forward(sample_tensor)

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_forward_basic_functionality(self, default_params):
        """Test basic forward pass functionality."""
        layer_norm = LayerNorm(**default_params)
        input_tensor = torch.randn(2, 10, 128)

        output = layer_norm.forward(input_tensor)

        assert isinstance(output, torch.Tensor)
        assert output.shape == input_tensor.shape
        assert output.dtype == input_tensor.dtype

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_forward_with_residual_tensor(self, default_params):
        """Test forward pass with residual tensor."""
        layer_norm = LayerNorm(**default_params)
        input_tensor = torch.randn(2, 10, 128)
        residual_tensor = torch.randn(2, 10, 128)

        output, residual_out = layer_norm.forward(input_tensor, residual_tensor)

        assert isinstance(output, torch.Tensor)
        assert isinstance(residual_out, torch.Tensor)
        assert output.shape == input_tensor.shape
        assert residual_out.shape == residual_tensor.shape

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_forward_with_ellipsis_residual(self, default_params):
        """Test forward pass with ellipsis residual (default case)."""
        layer_norm = LayerNorm(**default_params)
        input_tensor = torch.randn(2, 10, 128)

        output = layer_norm.forward(input_tensor, ...)

        assert isinstance(output, torch.Tensor)
        assert output.shape == input_tensor.shape

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_forward_dtype_conversion(self, default_params):
        """Test that forward properly handles dtype conversion."""
        layer_norm = LayerNorm(**default_params)

        # Test with different input dtypes
        for dtype in [torch.float16, torch.float64]:
            input_tensor = torch.randn(2, 5, 128, dtype=dtype)
            output = layer_norm.forward(input_tensor)
            assert output.dtype == dtype

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_forward_bfloat16_dtype(self, default_params):
        """Test forward pass with bfloat16 dtype if supported."""
        if torch.cuda.is_available():
            layer_norm = LayerNorm(**default_params)
            input_tensor = torch.randn(2, 5, 128, dtype=torch.bfloat16, device='cuda')
            output = layer_norm.forward(input_tensor)
            assert output.dtype == torch.bfloat16

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_forward_normalization_correctness(self, default_params):
        """Test that forward pass produces correctly normalized output."""
        layer_norm = LayerNorm(**default_params)

        # Create input with known statistics
        input_tensor = torch.randn(1, 1, 128) * 10 + 5  # Large variance and mean
        output = layer_norm.forward(input_tensor)

        # Check that output is finite
        assert torch.isfinite(output).all()

        # Test with identity weights and zero bias for exact normalization check
        layer_norm_identity = LayerNorm(hidden_size=128, eps=1e-5)
        with torch.no_grad():
            layer_norm_identity.weight.fill_(1.0)
            layer_norm_identity.bias.fill_(0.0)

        normalized_output = layer_norm_identity.forward(input_tensor)
        output_mean = normalized_output.mean(dim=-1)
        output_var = normalized_output.var(dim=-1, unbiased=False)

        # Should have approximately zero mean and unit variance
        assert torch.allclose(output_mean, torch.zeros_like(output_mean), atol=1e-5)
        assert torch.allclose(output_var, torch.ones_like(output_var), atol=1e-5)

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_forward_with_different_tensor_shapes(self, default_params):
        """Test forward pass with various tensor shapes."""
        layer_norm = LayerNorm(**default_params)

        # Test different shapes
        shapes = [
            (128,),          # 1D
            (10, 128),       # 2D
            (2, 10, 128),    # 3D
            (1, 2, 10, 128), # 4D
            (2, 3, 4, 10, 128)  # 5D
        ]

        for shape in shapes:
            input_tensor = torch.randn(shape)
            output = layer_norm.forward(input_tensor)
            assert output.shape == shape

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_forward_gradient_flow(self, default_params):
        """Test that gradients flow properly through the layer."""
        layer_norm = LayerNorm(**default_params)
        input_tensor = torch.randn(2, 10, 128, requires_grad=True)

        output = layer_norm.forward(input_tensor)
        loss = output.sum()
        loss.backward()

        assert input_tensor.grad is not None
        assert layer_norm.weight.grad is not None
        assert layer_norm.bias.grad is not None

    def test_skip_forward_basic_functionality(self, default_params, sample_tensor):
        """Test skip_forward basic functionality."""
        layer_norm = LayerNorm(**default_params)

        output = layer_norm.skip_forward(sample_tensor)

        assert torch.equal(output, sample_tensor)

    def test_skip_forward_with_residual_tensor(self, default_params):
        """Test skip_forward with residual tensor."""
        layer_norm = LayerNorm(**default_params)
        input_tensor = torch.randn(2, 10, 128)
        residual_tensor = torch.randn(2, 10, 128)

        output, residual_out = layer_norm.skip_forward(input_tensor, residual_tensor)

        assert torch.equal(output, input_tensor)
        assert torch.equal(residual_out, residual_tensor)

    def test_skip_forward_with_ellipsis_residual(self, default_params, sample_tensor):
        """Test skip_forward with ellipsis residual."""
        layer_norm = LayerNorm(**default_params)

        output = layer_norm.skip_forward(sample_tensor, ...)

        assert torch.equal(output, sample_tensor)

    def test_skip_forward_preserves_tensor_properties(self, default_params):
        """Test that skip_forward preserves tensor properties."""
        layer_norm = LayerNorm(**default_params)

        for dtype in [torch.float16, torch.float32, torch.float64]:
            input_tensor = torch.randn(2, 5, 128, dtype=dtype, requires_grad=True)
            output = layer_norm.skip_forward(input_tensor)

            assert output.dtype == dtype
            assert output.requires_grad == input_tensor.requires_grad
            assert output.shape == input_tensor.shape

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_forward_vs_skip_forward_behavior(self, default_params):
        """Test behavioral differences between forward and skip_forward."""
        layer_norm = LayerNorm(**default_params)
        input_tensor = torch.randn(2, 10, 128)

        forward_output = layer_norm.forward(input_tensor)
        skip_output = layer_norm.skip_forward(input_tensor)

        # They should not be equal (forward normalizes, skip doesn't)
        assert not torch.equal(forward_output, skip_output)
        assert torch.equal(skip_output, input_tensor)

    def test_parameter_initialization_values(self):
        """Test that parameters are initialized to correct values."""
        layer_norm = LayerNorm(hidden_size=64, eps=1e-5)

        # Weight should be initialized to ones
        assert torch.allclose(layer_norm.weight, torch.ones(64))

        # Bias should be initialized to zeros
        assert torch.allclose(layer_norm.bias, torch.zeros(64))

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_forward_numerical_stability(self):
        """Test forward pass numerical stability with extreme values."""
        layer_norm = LayerNorm(hidden_size=64, eps=1e-5)

        # Test with very small values
        small_tensor = torch.full((1, 1, 64), 1e-8)
        output_small = layer_norm.forward(small_tensor)
        assert torch.isfinite(output_small).all()

        # Test with very large values
        large_tensor = torch.full((1, 1, 64), 1e8)
        output_large = layer_norm.forward(large_tensor)
        assert torch.isfinite(output_large).all()

        # Test with mixed extreme values
        mixed_tensor = torch.tensor([[[1e-8, 1e8] + [1.0] * 62]], dtype=torch.float32)
        output_mixed = layer_norm.forward(mixed_tensor)
        assert torch.isfinite(output_mixed).all()

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_forward_with_zero_variance(self):
        """Test forward pass when input has zero variance."""
        layer_norm = LayerNorm(hidden_size=64, eps=1e-5)

        # All values are the same (zero variance)
        constant_tensor = torch.full((2, 3, 64), 5.0)
        output = layer_norm.forward(constant_tensor)

        assert torch.isfinite(output).all()
        assert output.shape == constant_tensor.shape

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_forward_batch_independence(self):
        """Test that forward pass treats each sample in batch independently."""
        layer_norm = LayerNorm(hidden_size=64, eps=1e-5)

        # Create batch with different statistics per sample
        batch_tensor = torch.stack([
            torch.randn(10, 64) * 2 + 1,   # Different mean and std
            torch.randn(10, 64) * 5 - 2,   # Different mean and std
        ])

        output = layer_norm.forward(batch_tensor)

        # Each sample should be normalized along the last dimension
        assert torch.isfinite(output).all()
        assert output.shape == batch_tensor.shape

    def test_state_dict_compatibility(self, default_params):
        """Test that state_dict works correctly with different parameter configurations."""
        # Test with weights and bias
        ln_with_params = LayerNorm(**default_params)
        state_dict = ln_with_params.state_dict()
        assert 'weight' in state_dict
        assert 'bias' in state_dict

        # Test without weights (buffer should not appear in state_dict due to persistent=False)
        ln_no_weights = LayerNorm(hidden_size=128, eps=1e-5, has_weights=False, has_bias=True)
        state_dict_no_weights = ln_no_weights.state_dict()
        assert 'bias' in state_dict_no_weights
        assert 'weight' not in state_dict_no_weights  # Non-persistent buffer

        # Test without bias (buffer should not appear in state_dict due to persistent=False)
        ln_no_bias = LayerNorm(hidden_size=128, eps=1e-5, has_weights=True, has_bias=False)
        state_dict_no_bias = ln_no_bias.state_dict()
        assert 'weight' in state_dict_no_bias
        assert 'bias' not in state_dict_no_bias  # Non-persistent buffer

    @pytest.mark.parametrize("hidden_size", [1, 16, 128, 512, 1024])
    def test_different_hidden_sizes(self, hidden_size):
        """Test LayerNorm with various hidden sizes."""
        layer_norm = LayerNorm(hidden_size=hidden_size, eps=1e-5)
        input_tensor = torch.randn(2, 10, hidden_size)

        with patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False):
            output = layer_norm.forward(input_tensor)

        assert output.shape == input_tensor.shape
        assert output.dtype == input_tensor.dtype

    @pytest.mark.parametrize("eps", [1e-12, 1e-8, 1e-5, 1e-3, 0.1])
    def test_different_epsilon_values(self, eps):
        """Test LayerNorm with various epsilon values."""
        layer_norm = LayerNorm(hidden_size=64, eps=eps)
        assert layer_norm.variance_epsilon == eps

        input_tensor = torch.randn(2, 5, 64)
        with patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False):
            output = layer_norm.forward(input_tensor)

        assert torch.isfinite(output).all()

    def test_memory_efficiency(self, default_params):
        """Test memory usage patterns."""
        layer_norm = LayerNorm(**default_params)

        # Test that forward pass doesn't create unnecessary copies
        input_tensor = torch.randn(100, 128, requires_grad=True)

        with patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False):
            output = layer_norm.forward(input_tensor)

        # Memory should be reasonable (not testing exact values due to variability)
        assert torch.isfinite(output).all()

    def test_module_mode_compatibility(self, default_params):
        """Test compatibility with PyTorch module modes."""
        layer_norm = LayerNorm(**default_params)
        input_tensor = torch.randn(2, 10, 128)

        # Test in training mode
        layer_norm.train()
        with patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False):
            train_output = layer_norm.forward(input_tensor)

        # Test in eval mode
        layer_norm.eval()
        with patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False):
            eval_output = layer_norm.forward(input_tensor)

        # LayerNorm should behave the same in train and eval mode
        assert torch.allclose(train_output, eval_output)

    @pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA not available")
    def test_device_consistency(self):
        """Test that all operations maintain device consistency."""
        device = torch.device('cuda:0')
        layer_norm = LayerNorm(hidden_size=64, eps=1e-5, device=device)
        input_tensor = torch.randn(2, 5, 64, device=device)

        with patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False):
            output = layer_norm.forward(input_tensor)

        assert output.device == device
        assert layer_norm.weight.device == device
        assert layer_norm.bias.device == device

    def test_docstring_examples(self):
        """Test examples from class and method docstrings."""
        # Test basic usage as described in docstring
        layer_norm = LayerNorm(hidden_size=256, eps=1e-6)
        input_tensor = torch.randn(32, 128, 256)

        with patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False):
            # Test without residual
            output1 = layer_norm.forward(input_tensor)
            assert output1.shape == input_tensor.shape

            # Test with residual
            residual = torch.randn(32, 128, 256)
            output2, residual_out = layer_norm.forward(input_tensor, residual)
            assert output2.shape == input_tensor.shape
            assert residual_out.shape == residual.shape

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_residual_addition_correctness(self):
        """Test that residual connections work correctly."""
        layer_norm = LayerNorm(hidden_size=64, eps=1e-5)
        input_tensor = torch.randn(2, 5, 64)
        residual_tensor = torch.randn(2, 5, 64)

        # Test that residual is added to input before normalization
        output, residual_output = layer_norm.forward(input_tensor, residual_tensor)

        # The residual output should be the sum of input and residual (in original dtype)
        expected_residual = input_tensor + residual_tensor
        assert torch.allclose(residual_output, expected_residual, atol=1e-6)

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_epsilon_effect_on_normalization(self):
        """Test that epsilon parameter affects numerical stability."""
        # Create input with very small variance
        input_tensor = torch.randn(1, 1, 64) * 1e-10

        # Test with small epsilon
        layer_norm_small_eps = LayerNorm(hidden_size=64, eps=1e-12)
        output_small_eps = layer_norm_small_eps.forward(input_tensor)

        # Test with large epsilon
        layer_norm_large_eps = LayerNorm(hidden_size=64, eps=1e-3)
        output_large_eps = layer_norm_large_eps.forward(input_tensor)

        # Both should produce finite outputs
        assert torch.isfinite(output_small_eps).all()
        assert torch.isfinite(output_large_eps).all()

        # The outputs should be different due to different epsilon values
        assert not torch.allclose(output_small_eps, output_large_eps)

    def test_keyword_only_arguments(self):
        """Test that all arguments are keyword-only as specified in __init__."""
        # This should work (keyword arguments)
        layer_norm = LayerNorm(hidden_size=64, eps=1e-5)
        assert layer_norm.weight.shape == (64,)

        # Verify the signature requires keyword-only args
        assert hasattr(layer_norm, 'variance_epsilon')

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_internal_dtype_conversion_preservation(self):
        """Test that internal float32 conversion preserves original dtype in output."""
        for input_dtype in [torch.float16, torch.float32, torch.float64]:
            layer_norm = LayerNorm(hidden_size=32, eps=1e-5, dtype=input_dtype)
            input_tensor = torch.randn(2, 3, 32, dtype=input_dtype)

            output = layer_norm.forward(input_tensor)

            # Output should have the same dtype as input
            assert output.dtype == input_dtype

            # Parameters should have the specified dtype
            assert layer_norm.weight.dtype == input_dtype
            assert layer_norm.bias.dtype == input_dtype

    def test_non_persistent_buffers(self):
        """Test that non-parameter weights/bias are registered as non-persistent buffers."""
        layer_norm = LayerNorm(hidden_size=64, eps=1e-5, has_weights=False, has_bias=False)

        # Check that buffers exist
        assert 'weight' in layer_norm._buffers
        assert 'bias' in layer_norm._buffers

        # Check that they are not parameters
        assert 'weight' not in layer_norm._parameters
        assert 'bias' not in layer_norm._parameters

        # Buffers should have correct values
        assert torch.allclose(layer_norm.weight, torch.ones(64))
        assert torch.allclose(layer_norm.bias, torch.zeros(64))

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_edge_case_single_element_tensors(self):
        """Test forward pass with single element tensors."""
        layer_norm = LayerNorm(hidden_size=1, eps=1e-5)

        # Single element tensor
        input_tensor = torch.tensor([[[2.5]]])
        output = layer_norm.forward(input_tensor)

        assert output.shape == (1, 1, 1)
        assert torch.isfinite(output).all()

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_backward_compatibility_with_standard_layernorm(self):
        """Test that behavior is similar to PyTorch's LayerNorm for basic cases."""
        hidden_size = 64
        eps = 1e-5

        # Our implementation
        our_layer_norm = LayerNorm(hidden_size=hidden_size, eps=eps)

        # PyTorch standard LayerNorm
        torch_layer_norm = nn.LayerNorm(hidden_size, eps=eps)

        # Copy weights to make them identical
        with torch.no_grad():
            torch_layer_norm.weight.copy_(our_layer_norm.weight)
            torch_layer_norm.bias.copy_(our_layer_norm.bias)

        # Test with same input
        input_tensor = torch.randn(2, 5, hidden_size)

        our_output = our_layer_norm.forward(input_tensor)
        torch_output = torch_layer_norm(input_tensor)

        # Should be approximately equal (small numerical differences expected)
        assert torch.allclose(our_output, torch_output, atol=1e-6, rtol=1e-5)

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_variance_calculation_correctness(self):
        """Test that variance calculation is correct and uses unbiased=False."""
        layer_norm = LayerNorm(hidden_size=64, eps=1e-5)

        # Create a tensor with known variance
        input_tensor = torch.tensor([[[1.0, 2.0, 3.0, 4.0] + [0.0] * 60]], dtype=torch.float32)

        # Manual calculation of variance (unbiased=False)
        manual_mean = input_tensor.mean(dim=-1, keepdim=True)

        # The layer should compute the same variance
        output = layer_norm.forward(input_tensor)
        assert torch.isfinite(output).all()

    def test_ellipsis_parameter_handling(self):
        """Test that ellipsis (...) parameter is handled correctly in both methods."""
        layer_norm = LayerNorm(hidden_size=32, eps=1e-5)
        input_tensor = torch.randn(2, 3, 32)

        # Test forward with ellipsis
        with patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False):
            output_forward = layer_norm.forward(input_tensor, ...)

        # Test skip_forward with ellipsis
        output_skip = layer_norm.skip_forward(input_tensor, ...)

        # Both should return single tensor (not tuple)
        assert isinstance(output_forward, torch.Tensor)
        assert isinstance(output_skip, torch.Tensor)

        # skip_forward should return unchanged input
        assert torch.equal(output_skip, input_tensor)

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_mixed_precision_training_compatibility(self):
        """Test compatibility with mixed precision training patterns."""
        layer_norm = LayerNorm(hidden_size=64, eps=1e-5, dtype=torch.float32)

        # Simulate mixed precision: fp16 input, fp32 layer
        input_tensor = torch.randn(2, 5, 64, dtype=torch.float16, requires_grad=True)

        output = layer_norm.forward(input_tensor)

        # Output should match input dtype
        assert output.dtype == torch.float16
        assert output.requires_grad

        # Test gradient flow
        loss = output.sum()
        loss.backward()
        assert input_tensor.grad is not None

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_large_tensor_shapes(self):
        """Test with realistic large tensor shapes used in language models."""
        layer_norm = LayerNorm(hidden_size=4096, eps=1e-5)  # Large embedding size

        # Simulate batch processing
        input_tensor = torch.randn(8, 512, 4096)  # batch_size=8, seq_len=512, hidden=4096

        output = layer_norm.forward(input_tensor)

        assert output.shape == input_tensor.shape
        assert torch.isfinite(output).all()

    def test_thread_safety_simulation(self, default_params):
        """Test that the layer can handle concurrent-like access patterns."""
        layer_norm = LayerNorm(**default_params)

        # Simulate multiple "concurrent" forward passes
        tensors = [torch.randn(2, 10, 128) for _ in range(5)]

        with patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False):
            outputs = [layer_norm.forward(tensor) for tensor in tensors]

        # All outputs should be valid
        for i, output in enumerate(outputs):
            assert output.shape == tensors[i].shape
            assert torch.isfinite(output).all()

    @patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False)
    def test_inf_nan_input_handling(self):
        """Test behavior with inf/nan inputs."""
        layer_norm = LayerNorm(hidden_size=64, eps=1e-5)

        # Test with inf values
        inf_tensor = torch.full((1, 1, 64), float('inf'))
        output_inf = layer_norm.forward(inf_tensor)
        # Output should handle inf gracefully (may contain inf/nan but shouldn't crash)
        assert output_inf.shape == inf_tensor.shape

        # Test with nan values
        nan_tensor = torch.full((1, 1, 64), float('nan'))
        output_nan = layer_norm.forward(nan_tensor)
        # Output should handle nan gracefully (may contain nan but shouldn't crash)
        assert output_nan.shape == nan_tensor.shape

    @pytest.mark.parametrize("batch_size,seq_len,hidden_size", [
        (1, 1, 128),
        (2, 512, 768),
        (4, 1024, 1024),
        (8, 2048, 2048),
    ])
    def test_various_realistic_shapes(self, batch_size, seq_len, hidden_size):
        """Test with various realistic tensor shapes from transformer models."""
        layer_norm = LayerNorm(hidden_size=hidden_size, eps=1e-5)
        input_tensor = torch.randn(batch_size, seq_len, hidden_size)

        with patch('tensorrt_llm._torch.custom_ops.IS_FLASHINFER_AVAILABLE', False):
            output = layer_norm.forward(input_tensor)

        assert output.shape == (batch_size, seq_len, hidden_size)
        assert torch.isfinite(output).all()

@Funatiq Funatiq requested a review from yuxianq August 5, 2025 13:14
@Funatiq Funatiq marked this pull request as ready for review August 5, 2025 13:14
@Funatiq Funatiq requested a review from a team as a code owner August 5, 2025 13:14
@Funatiq Funatiq requested a review from lucaslie August 5, 2025 13:14
@coderabbitai coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0d63c1 and baab83b.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/modules/layer_norm.py (1 hunks)
  • tensorrt_llm/_torch/modules/rms_norm.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tensorrt_llm/_torch/modules/rms_norm.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without reflection.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tensorrt_llm/_torch/modules/layer_norm.py
**/*.{cpp,h,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tensorrt_llm/_torch/modules/layer_norm.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: applies to **/*.{cpp,h,cu,py} : all tensorrt-llm open source software code should contain an nvidia ...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-05T07:49:09.875Z
Learning: Applies to **/*.{cpp,h,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Applied to files:

  • tensorrt_llm/_torch/modules/layer_norm.py
📚 Learning: applies to **/*.py : the code developed for tensorrt-llm should conform to python 3.8+....
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-05T07:49:09.875Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.

Applied to files:

  • tensorrt_llm/_torch/modules/layer_norm.py
📚 Learning: applies to **/*.{h,hpp} : use a preprocessor guard in header files. the guard name must have prefix ...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-05T07:49:09.875Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Applied to files:

  • tensorrt_llm/_torch/modules/layer_norm.py
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tensorrt_llm/_torch/modules/layer_norm.py
📚 Learning: in tensorrt_llm/executor/worker.py, the lora adapter cache optimization logic that checks `is_adapte...
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

Applied to files:

  • tensorrt_llm/_torch/modules/layer_norm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tensorrt_llm/_torch/modules/layer_norm.py (4)

1-20: LGTM! Proper copyright header and clean imports.

The NVIDIA copyright header is correctly formatted with the current year, and imports follow the namespace convention as required by coding guidelines.


22-35: Excellent class definition and documentation.

The class follows PascalCase naming convention and includes a comprehensive Google-style docstring that clearly documents the purpose and all parameters.


37-66: Well-designed constructor with proper parameter handling.

The implementation correctly uses keyword-only arguments and handles both learnable parameters and non-learnable buffers appropriately. The use of persistent=False for buffers when weights/bias are disabled is a good design choice.


100-118: Clean skip_forward implementation.

The method correctly maintains the same interface as forward while bypassing normalization computation. This provides a useful pattern for conditional normalization.

@Funatiq Funatiq changed the title from "[feat] add LayerNorm module" to "[#6187][feat] add LayerNorm module" on Aug 6, 2025
@Funatiq Funatiq (Collaborator, Author) commented Aug 6, 2025

/bot run

Funatiq added 5 commits August 6, 2025 10:57
…sting parameter formatting

Signed-off-by: Robin Kobus <[email protected]>
- Introduce a new LayerNorm class, implementing layer normalization with optional weights and biases.
- The forward method handles input tensors and supports residual connections, while the skip_forward method allows bypassing normalization.
- The implementation includes checks for Flashinfer availability.

Signed-off-by: Robin Kobus <[email protected]>
Signed-off-by: Robin Kobus <[email protected]>
@Funatiq Funatiq force-pushed the dev/feat/layer_norm_module branch from baab83b to 16dcaa7 Compare August 6, 2025 08:57
@Funatiq Funatiq requested a review from a team as a code owner August 6, 2025 08:57
@Funatiq Funatiq requested a review from HuiGao-NV August 6, 2025 08:57
@tensorrt-cicd (Collaborator)

PR_Github #14282 [ run ] triggered by Bot

@Funatiq Funatiq (Collaborator, Author) commented Aug 6, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #14286 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #14282 [ run ] completed with state ABORTED

hidden_states = hidden_states + residual.to(torch.float32)
residual = hidden_states.to(input_dtype)

mean = hidden_states.mean(-1, keepdim=True)
A Collaborator commented on the lines quoted above:

This is an unfused vanilla implementation. Unfortunately flashinfer only provides RMSNorm, no LayerNorm.
Just curious, why not directly use torch.nn.LayerNorm? I am not sure whether there is an optimized kernel underneath the module though.
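For comparison, the unfused body quoted above matches what a single call to PyTorch's functional layer_norm computes; a small standalone check (not part of the PR):

import torch
import torch.nn.functional as F

hidden_size = 64
x = torch.randn(2, 5, hidden_size)
weight = torch.ones(hidden_size)
bias = torch.zeros(hidden_size)
eps = 1e-5

# Unfused form, as in the quoted diff: explicit mean/variance over the last dimension.
mean = x.mean(-1, keepdim=True)
variance = x.var(-1, keepdim=True, unbiased=False)
manual = (x - mean) * torch.rsqrt(variance + eps) * weight + bias

# Equivalent call into torch.nn.functional.layer_norm (backed by a native kernel).
fused = F.layer_norm(x, (hidden_size,), weight, bias, eps)

assert torch.allclose(manual, fused, atol=1e-6)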

@tensorrt-cicd (Collaborator)

PR_Github #14286 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10791 completed with status: 'SUCCESS'
