
Conversation

@lukamac
Contributor

@lukamac lukamac commented May 27, 2025

Fix the input offset calculation for PULPOpen tiled convolution kernels.

Fixed

  • input offset calculation for PULPOpen tiled convolution kernels

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and points to devel.
  2. Your PR has been reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker was modified, change its link back after review.

@Victor-Jung Victor-Jung added the Bug Something isn't working label May 27, 2025
@Victor-Jung Victor-Jung added this to the Release 0.2.0 milestone May 27, 2025
@Victor-Jung
Member

Why wasn't this bug breaking the integer convolution tests?

@Victor-Jung Victor-Jung moved this to In review in Deeploy May 27, 2025
@lukamac
Contributor Author

lukamac commented May 27, 2025

@Victor-Jung I honestly don't understand it either, and apparently this fix breaks the current status quo, so I need to investigate this further.

@lukamac
Contributor Author

lukamac commented May 27, 2025

@Victor-Jung I was missing the padding. It's passing now with commit b03fd98.

I will add a test that fails with the previous version, rewrite this piece of logic a bit while I'm here, and check whether other places have the same mistake.

@Victor-Jung
Member

> @Victor-Jung I was missing the padding. It's passing now with commit b03fd98.
>
> I will add a test that fails with the previous version, rewrite this piece of logic a bit while I'm here, and check whether other places have the same mistake.

Thanks a lot, that's very useful!

@lukamac
Contributor Author

lukamac commented May 27, 2025

> Why wasn't this bug breaking the integer convolution tests?

I suspect it's because we only test 3x3 convolutions with padding = (1, 1, 1, 1) or 1x1 convolutions with padding = (0, 0, 0, 0).
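To make the suspicion concrete, here is a minimal, purely illustrative sketch (the formulas below are hypothetical, not the actual Deeploy tile-constraint code) of how an input-offset formula that ignores the stride coincides with the correct one whenever stride == 1, which would let the existing 3x3/padding-1 and 1x1/padding-0 tests pass either way:

```python
# Illustrative only: a plausible correct vs. buggy input-offset formula for a
# tiled convolution. The real Deeploy code differs in detail.

def input_offset_correct(out_offset: int, stride: int, pad_before: int) -> int:
    # First input row/column needed by an output tile starting at out_offset.
    return out_offset * stride - pad_before

def input_offset_no_stride(out_offset: int, stride: int, pad_before: int) -> int:
    # Hypothetical buggy variant that forgets to scale by the stride.
    return out_offset - pad_before

for out_offset in range(4):
    stride1 = (input_offset_correct(out_offset, 1, 1), input_offset_no_stride(out_offset, 1, 1))
    stride2 = (input_offset_correct(out_offset, 2, 1), input_offset_no_stride(out_offset, 2, 1))
    print(out_offset, stride1, stride2)
# With stride == 1 the two formulas agree for every tile offset, so the tested
# configurations pass either way; with stride == 2 they diverge as soon as a
# tile does not start at offset 0.
```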

@lukamac lukamac force-pushed the fix-input-offset-when-strided branch from 7998128 to af66c05 on May 28, 2025 at 15:57
@lukamac
Contributor Author

lukamac commented May 28, 2025

@Victor-Jung test added, and some small rewrites are done. I think it's ready for review. The test I added is a very small requantized convolution kernel (In: 1x3x5x5, Out: 1x2x2x2) with stride = 2 and "same" padding (all 1), which is why I had to reduce the L1 size to 600 to force tiling.
In the end I didn't create a test with a manual tiling solution (too tired).
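For reference, a small sketch of the output-shape arithmetic behind this test. The kernel size is not stated in the comment, so the loop below only shows which kernel sizes are consistent with the stated shapes; treat any specific value as an assumption.

```python
def conv_out_size(in_size: int, kernel: int, stride: int, pad: int) -> int:
    # Standard convolution output-size formula with symmetric padding.
    return (in_size + 2 * pad - kernel) // stride + 1

# From the comment: 5x5 input, stride = 2, padding = 1, expected 2x2 output.
in_size, stride, pad = 5, 2, 1
for kernel in range(1, 6):
    print(kernel, conv_out_size(in_size, kernel, stride, pad))
# Only kernel sizes 4 and 5 yield the stated spatial output of 2, so the test
# plausibly uses an even-sized kernel, which ties into the follow-up fix below.
```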

@lukamac
Contributor Author

lukamac commented May 29, 2025

@Victor-Jung ok, now it's done. The last commit (c268ff0) addresses the issue with even-sized kernels by deleting the computeMargins function and replacing it with a very simple calculation from the kernel shape. The extra kernel-shape margin (also called the halo) is always equal to kernel size - 1, regardless of whether the kernel size is even or odd.
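A minimal sketch of the reasoning (illustrative helper functions, not the Deeploy API): for an output tile of `out_tile` elements along one dimension, the required input span is `(out_tile - 1) * stride + kernel`, so the extra margin beyond the strided footprint of the output positions is always `kernel - 1`, independent of the stride and of whether the kernel size is even or odd.

```python
def input_span(out_tile: int, stride: int, kernel: int) -> int:
    # Input elements covered by an output tile of size out_tile along one dimension.
    return (out_tile - 1) * stride + kernel

def halo(out_tile: int, stride: int, kernel: int) -> int:
    # Extra input elements beyond the strided footprint of the output positions.
    strided_footprint = (out_tile - 1) * stride + 1
    return input_span(out_tile, stride, kernel) - strided_footprint

# The halo depends only on the kernel size, never on its evenness or the stride.
for kernel in range(1, 8):
    for stride in (1, 2, 3):
        assert halo(out_tile=4, stride=stride, kernel=kernel) == kernel - 1
```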

@Victor-Jung Victor-Jung left a comment
Member

Elegant solution, LGTM!

@Victor-Jung
Member

As soon as you rebase and CI passes I will merge :)

lukamac added 6 commits June 12, 2025 09:00
The condition that makes the previous implementation fail is stride > 1 combined with tiling in the spatial dimensions. To exercise this bug, the Siracusa platform has to be forced to tile the layer, so I set the L1 size to 600.
@lukamac lukamac force-pushed the fix-input-offset-when-strided branch from c268ff0 to c5b8510 on June 12, 2025 at 07:01
@Victor-Jung Victor-Jung merged commit 8688f41 into pulp-platform:devel Jun 12, 2025
117 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in Deeploy Jun 12, 2025
@lukamac lukamac deleted the fix-input-offset-when-strided branch June 12, 2025 07:27
@Xeratec Xeratec mentioned this pull request Jul 8, 2025
Xeratec added a commit that referenced this pull request Jul 8, 2025
This release contains major architectural changes, new platform support,
enhanced simulation workflows, floating-point kernel support, training
infrastructure for CCT models, memory allocation strategies, and
documentation improvements.

After merging this into `main`, the release process will proceed with:
- Pushing a Git tag for the release
- Creating a GitHub release with the prepared tag.

Note: Since the release tag references the Docker container tagged with
the release tag (`ghcr.io/pulp-platform/deeploy:v0.2.0`), the CI will
initially fail. The Deeploy Docker image must be built after the release
PR is merged and the CI restarted.

### List of Pull Requests
- Prepare v0.2.0 release
[#102](#102)
- Add Luka as Code Owner
[#101](#101)
- Fix CI, Docker Files, and Documentation Workflow
[#100](#100)
- Chimera Platform Integration
[#96](#96)
- Add Tutorial and Refactor README
[#97](#97)
- Reduce Mean Float Template
[#92](#92)
- Reshape Memory Freeing and Generic Float GEMM Fixes
[#91](#91)
- Prepare for Release and Separate Dependencies
[#90](#90)
- Fix input offsets calculation
[#89](#89)
- Move PULP SDK to main branch/fork
[#88](#88)
- Finite Lifetime for IO Tensors
[#51](#51)
- Improved Memory Visualization and Multi-Layer Tiling Profiling
[#56](#56)
- Fix Linting in CI and Reformat C Files
[#86](#86)
- Fix Broken CMake Flow For pulp-sdk
[#87](#87)
- Refactor Changelog For Release
[#85](#85)
- ARM Docker Container and Minor Bug Fix
[#84](#84)
- Added Kernel for Generic Float DW Conv2D
[#63](#63)
- Autoselect Self-Hosted Runners if the Action is on Upstream
[#81](#81)
- TEST_RECENT linking on MacOS
[#78](#78)
- Add RV32IMF Picolibc support for Siracusa platform
[#66](#66)
- Improve Documentation and VSCode Support
[#76](#76)
- Debug Print Topology Pass and Code Transformation
[#75](#75)
- Find all subdirectories of Deeploy when installing with pip install
[#70](#70)
- Add milestone issue template
[#71](#71)
- Bunch of fixes and changes
[#58](#58)
- Add SoftHier platform
[#65](#65)
- rv32imf_xpulpv2 ISA support for Siracusa platform
[#64](#64)
- One LLVM To Compile Them All
[#60](#60)
- One GVSoC to Simulate Them All
[#59](#59)
- Add Support for CCT Last Layer Training with Embedding Dim 8-128
[#55](#55)
- Add CCT Classifier Training Support
[#53](#53)
- L3 Bugs: DMA Struct Datatype and Maxpool Margin Error
[#45](#45)
- DeepQuant Quantized Linear Support
[#54](#54)
- Implemented Dequant Layer for Generic and Siracusa
[#52](#52)
- Infinite Lifetime Buffers Considered in Tiling & Memory Allocation (+
Visualization) [#44](#44)
- Implemented Quant Layer for Generic and Siracusa
[#49](#49)
- Increase maximal Mchan DMA transfer sizes from 64KiB to 128KiB
[#47](#47)
- Add MiniMalloc and Decouple Memory Allocation and Tiling
[#40](#40)
- Float CCT Bugs on L3
[#37](#37)
- Memory Allocation Strategies and Visualization
[#36](#36)
- Add CODEOWNERS [#42](#42)
- Add Tiling Support to All CCT Kernels and Fix CCT Operators on
Siracusa Platform for L2
[#35](#35)
- Add Fp gemm and Softmax for Snitch platform
[#31](#31)
- Add Float Kernels for CCT
[#29](#29)
- documentation deployment
[#34](#34)
- main.c Float Cast Bugs
[#28](#28)
- Add Float GEMM on PULP with Tiling
[#26](#26)
- Add Float Support & Float GEMM for Generic
[#25](#25)
- GVSOC support for the Snitch Cluster platform
[#23](#23)
- Snitch Cluster Tiling Support
[#22](#22)
- Snitch support integration
[#14](#14)
- Update bibtex citation
[#20](#20)
- the PR template location, bump min python to 3.10, change install
command [#17](#17)
- Add pre-commit for python formatting
[#15](#15)
- FP integration (v2)
[#12](#12)
- shell for sequential tests of Generic, Cortex, and Mempool platforms
[#11](#11)
- Add issue templates
[#10](#10)
- Minor CI and Readme Improvements
[#8](#8)
- Fix GHCR Link for Docker Build
[#7](#7)
- neureka's ccache id
[#6](#6)
- GitHub-based CI/CD Flow
[#4](#4)
- Generic Softmax Kernel
[#2](#2)
- Port GitLab CI [#1](#1)