Improved Memory Visualization and Multi-Layer Tiling Profiling #56

Victor-Jung · 2025-03-21T09:50:33Z

This PR improved the memory allocation visualization and fixed the tiling profiling.

Added

The memory allocation visualization now displays the allocation for each level used.

Changed

Tiling profiling is now an ON/OFF switch: when enabled, you get the I/O DMA time for each DMA call (from L2 to L3 and from L2 to L1, for instance).
The profiling strings are now const static, so that they are stored in .rodata.

Fixed

Previously, profiling tiling for L3 would report the L2 DMA calls in the "kernel" time. This was not intuitive, and developers most likely prefer knowing the time of each DMA transfer.
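To make the intended accounting concrete, here is a minimal Python sketch of per-transfer profiling with an ON/OFF switch. It is only a toy model: the class `TilingProfiler`, its methods, and the cycle counts are hypothetical and are not Deeploy's actual API or measured results. The point is that every DMA call is recorded under its own (source, destination) pair, so L2 transfers no longer end up in the "kernel" bucket.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TilingProfiler:
    """Toy model of per-transfer profiling (hypothetical, not Deeploy's API)."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled  # the ON/OFF switch
        self.dma_cycles: Dict[Tuple[str, str], List[int]] = defaultdict(list)
        self.kernel_cycles: List[int] = []

    def record_dma(self, src: str, dst: str, cycles: int) -> None:
        # One entry per DMA call, keyed by the (source, destination) levels.
        if self.enabled:
            self.dma_cycles[(src, dst)].append(cycles)

    def record_kernel(self, cycles: int) -> None:
        # Kernel time is kept apart from any DMA time.
        if self.enabled:
            self.kernel_cycles.append(cycles)

    def summary(self) -> Dict[str, int]:
        # Total cycles per transfer direction, plus the pure kernel time.
        totals = {f"{s}->{d}": sum(c) for (s, d), c in self.dma_cycles.items()}
        totals["kernel"] = sum(self.kernel_cycles)
        return totals


# Arbitrary illustrative cycle counts, not measurements.
profiler = TilingProfiler(enabled=True)
profiler.record_dma("L3", "L2", 120)
profiler.record_dma("L2", "L1", 40)
profiler.record_kernel(300)
profiler.record_dma("L1", "L2", 35)
print(profiler.summary())
```

With `enabled=False` the same calls record nothing, which mirrors the ON/OFF behaviour described above.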

PR Merge Checklist

The PR is rebased on the latest devel commit and pointing to devel.
Your PR is reviewed and approved.
All checks are passing.
The CHANGELOG.md file has been updated.
If the docker was modified, change back its link after review.

runwangdl · 2025-04-24T22:42:52Z

Regarding the profiling issue in this PR: when the default memoryLevel is set to L3, we observed unexpected dual-loop logging on L2. This can be resolved by adjusting the loop to target the corresponding index in the numTiles array. My proposed fix is available here:
Fix profiling dual loop issue

Additionally, the mainStackSize issue was due to the use of non-static strings, which caused a stack overflow. A fix for this can be found here:
Profilling string change to const static

After applying both changes, the new profiling results for dim128 show no issues.
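As a rough illustration of the string change, a code generator can emit each profiling label as a `static const` array, so the label has static storage duration instead of being built on the stack at run time. The helper `render_profiling_string` and the label names below are hypothetical and are not taken from Deeploy's templates.

```python
def render_profiling_string(name: str, text: str, static: bool = True) -> str:
    """Render a C declaration for one profiling label (hypothetical helper)."""
    # Escape backslashes and double quotes so the text is a valid C literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    # 'static const' gives the array static storage duration; without it the
    # array is an automatic variable that occupies stack space.
    qualifier = "static const char" if static else "char"
    return f'{qualifier} {name}[] = "{escaped}";'


# Emit the declaration in both styles for comparison.
print(render_profiling_string("kernelLabel", "kernel cycles"))
print(render_profiling_string("kernelLabel", "kernel cycles", static=False))
```

The first call prints the static form, the second the automatic form that consumes stack space for every label.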

runwangdl · 2025-05-11T22:10:07Z

Ready for review

Victor-Jung · 2025-05-12T14:07:07Z

> Ready for review

I think this branch is not properly rebased on devel. Could you also add a short summary of the Added/Changed/Modified?

runwangdl · 2025-05-12T14:44:42Z

In this PR, I only added two commits: 81c3460 and 4af69de. The remaining changes are just from rebasing onto the devel branch.

runwangdl · 2025-05-12T14:48:57Z

> > Ready for review
>
> I think this branch is not properly rebased on devel. Could you also add a short summary of the Added/Changed/Modified?

Hi Victor, I don’t have permission to edit the PR description. The changes I contributed are:

Fixing the profiling dual-loop issue when the default memoryLevel is set to L3.
Preventing a profiling-related stack overflow by converting strings to const static.

Victor-Jung · 2025-05-12T15:00:19Z

> In this PR, I only added two commits: 81c3460 and 4af69de. The remaining changes are just from rebasing onto the devel branch.

Rebasing means that all changes in the PR sit on top of devel. You should pick the commits from this PR and rebase them onto devel, not interleave the PR's commits with regular devel commits.

runwangdl · 2025-05-12T15:57:09Z

I see, I’ll make the changes shortly.

Xeratec

Looks good. I only have one bigger comment regarding code duplication. It looks like ProfilingSingleBufferingTilingMixIn and ProfilingDoubleBufferingTilingMixIn share quite a bit of code that could be extracted into a common structure to simplify maintenance.

Deeploy/DeeployTypes.py

Deeploy/TilingExtension/CodeTransformationPasses/TilingPrototypes.py

Deeploy/TilingExtension/TilerExtension.py

Xeratec

Looks good! Feel free to merge

This release contains major architectural changes, new platform support, enhanced simulation workflows, floating-point kernel support, training infrastructure for CCT models, memory allocation strategies, and documentation improvements.

After merging this into `main`, the release process will proceed with:

- Pushing a Git tag for the release after merging this PR
- Creating a GitHub release with the prepared tag.

Note: Since the release tag references the Docker container tagged with the release tag (`ghcr.io/pulp-platform/deeploy:v0.2.0`), the CI will initially fail. The Deeploy Docker image must be built after the release PR is merged and the CI restarted.

### List of Pull Requests

- Prepare v0.2.0 release [#102](#102)
- Add Luka as Code Owner [#101](#101)
- Fix CI, Docker Files, and Documentation Workflow [#100](#100)
- Chimera Platform Integration [#96](#96)
- Add Tutorial and Refactor README [#97](#97)
- Reduce Mean Float Template [#92](#92)
- Reshape Memory Freeing and Generic Float GEMM Fixes [#91](#91)
- Prepare for Release and Separate Dependencies [#90](#90)
- Fix input offsets calculation [#89](#89)
- Move PULP SDK to main branch/fork [#88](#88)
- Finite Lifetime for IO Tensors [#51](#51)
- Improved Memory Visualization and Multi-Layer Tiling Profiling [#56](#56)
- Fix Linting in CI and Reformat C Files [#86](#86)
- Fix Broken CMake Flow For pulp-sdk [#87](#87)
- Refactor Changelog For Release [#85](#85)
- ARM Docker Container and Minor Bug Fix [#84](#84)
- Added Kernel for Generic Float DW Conv2D [#63](#63)
- Autoselect Self-Hosted Runners if the Action is on Upstream [#81](#81)
- TEST_RECENT linking on MacOS [#78](#78)
- Add RV32IMF Picolibc support for Siracusa platform [#66](#66)
- Improve Documentation and VSCode Support [#76](#76)
- Debug Print Topology Pass and Code Transformation [#75](#75)
- Find all subdirectories of Deeploy when installing with pip install [#70](#70)
- Add milestone issue template [#71](#71)
- Bunch of fixes and changes [#58](#58)
- Add SoftHier platform [#65](#65)
- rv32imf_xpulpv2 ISA support for Siracusa platform [#64](#64)
- One LLVM To Compile Them All [#60](#60)
- One GVSoC to Simulate Them All [#59](#59)
- Add Support for CCT Last Layer Training with Embedding Dim 8-128 [#55](#55)
- Add CCT Classifier Training Support [#53](#53)
- L3 Bugs: DMA Struct Datatype and Maxpool Margin Error [#45](#45)
- DeepQuant Quantized Linear Support [#54](#54)
- Implemented Dequant Layer for Generic and Siracusa [#52](#52)
- Infinite Lifetime Buffers Considered in Tiling & Memory Allocation (+ Visualization) [#44](#44)
- Implemented Quant Layer for Generic and Siracusa [#49](#49)
- Increase maximal Mchan DMA transfer sizes from 64KiB to 128KiB [#47](#47)
- Add MiniMalloc and Decouple Memory Allocation and Tiling [#40](#40)
- Float CCT Bugs on L3 [#37](#37)
- Memory Allocation Strategies and Visualization [#36](#36)
- Add CODEOWNERS [#42](#42)
- Add Tiling Support to All CCT Kernels and Fix CCT Operators on Siracusa Platform for L2 [#35](#35)
- Add Fp gemm and Softmax for Snitch platform [#31](#31)
- Add Float Kernels for CCT [#29](#29)
- documentation deployment [#34](#34)
- main.c Float Cast Bugs [#28](#28)
- Add Float GEMM on PULP with Tiling [#26](#26)
- Add Float Support & Float GEMM for Generic [#25](#25)
- GVSOC support for the Snitch Cluster platform [#23](#23)
- Snitch Cluster Tiling Support [#22](#22)
- Snitch support integration [#14](#14)
- Update bibtex citation [#20](#20)
- the PR template location, bump min python to 3.10, change install command [#17](#17)
- Add pre-commit for python formatting [#15](#15)
- FP integration (v2) [#12](#12)
- shell for sequential tests of Generic, Cortex, and Mempool platforms [#11](#11)
- Add issue templates [#10](#10)
- Minor CI and Readme Improvements [#8](#8)
- Fix GHCR Link for Docker Build [#7](#7)
- neureka's ccache id [#6](#6)
- GitHub-based CI/CD Flow [#4](#4)
- Generic Softmax Kernel [#2](#2)
- Port GitLab CI [#1](#1)

Victor-Jung self-assigned this Mar 21, 2025

Victor-Jung requested a review from Xeratec as a code owner March 21, 2025 09:50

Victor-Jung marked this pull request as draft March 21, 2025 09:54

Victor-Jung assigned runwangdl May 12, 2025

Victor-Jung changed the title ~~DRAFT: Improved Memory Visualization and Multi-Layer Tiling Profiling~~ Improved Memory Visualization and Multi-Layer Tiling Profiling May 12, 2025

Victor-Jung marked this pull request as ready for review May 12, 2025 14:17

runwangdl force-pushed the exp/heterogeneous-memory-placement branch from d8fc141 to d7346a5 Compare May 12, 2025 16:11

Victor-Jung added the Feature (Addition of new features) label May 18, 2025

Victor-Jung force-pushed the exp/heterogeneous-memory-placement branch from d7346a5 to 586f815 Compare May 20, 2025 07:57

Xeratec added this to the Release 0.2.0 milestone May 22, 2025

Xeratec added this to Deeploy May 22, 2025

Xeratec moved this to In progress in Deeploy May 22, 2025

Victor-Jung moved this from In progress to In review in Deeploy May 22, 2025

Victor-Jung force-pushed the exp/heterogeneous-memory-placement branch from 586f815 to bbb31c2 Compare May 26, 2025 08:56

Victor-Jung and others added 6 commits May 26, 2025 11:07

Improve memory alloc visualization

d43cb56

Multi-level profiling + Linting

daca9ad

profilling string change to const static

ab02262

Fix profiling dual loop issue

b185484

Fix README Status Badges

693c30c

Update CHANGELOG

b717f1d

Victor-Jung force-pushed the exp/heterogeneous-memory-placement branch from bbb31c2 to b717f1d Compare May 26, 2025 09:13

Xeratec reviewed May 26, 2025

Deeploy/DeeployTypes.py (outdated, resolved)

Deeploy/TilingExtension/CodeTransformationPasses/TilingPrototypes.py (outdated, resolved)

Deeploy/TilingExtension/TilerExtension.py (resolved)

Align comment and type hint

9aa5efd

Victor-Jung added 2 commits May 27, 2025 10:00

Refactor profiling methods in TilingPrototype

63fa0e5

Linting

9058732

Xeratec approved these changes May 29, 2025

Xeratec moved this from In review to Done in Deeploy May 29, 2025

Victor-Jung moved this from Done to In review in Deeploy Jun 2, 2025

Victor-Jung moved this from In review to Ready for Merge in Deeploy Jun 2, 2025

Victor-Jung merged commit e536374 into pulp-platform:devel Jun 2, 2025
122 checks passed

github-project-automation bot moved this from Ready for Merge to Done in Deeploy Jun 2, 2025

Xeratec mentioned this pull request Jul 8, 2025

Release v0.2.0 #103

Merged

Victor-Jung commented Mar 21, 2025 (edited by Xeratec)

3 participants
