PR types
New features
PR changes
Others
Describe
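Add CPU memory statistics.
The host side and the device side are exposed through separate interfaces:
Host-side memory statistics interface:
Device-side memory statistics interface: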
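Currently, the stat types (stat_type) supported on both the host side and the device side are Allocated and Reserved. Allocated is the memory the Allocator has handed out to Tensors; Reserved is the memory the Allocator has actually requested from the hardware, which covers both the part allocated to Tensors and the part cached and managed by the framework. The device side supports multiple kinds of heterogeneous devices and is not limited to GPUs.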
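The relationship between the two stat types can be illustrated with a minimal sketch. `ToyCachingAllocator` and its chunk-based growth policy are hypothetical and are not Paddle's allocator; the sketch only assumes that freed memory returns to the framework's cache instead of going back to the hardware.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model, not Paddle's allocator. It only tracks the two
// counters for a caching allocator that requests memory in fixed-size chunks.
class ToyCachingAllocator {
 public:
  explicit ToyCachingAllocator(int64_t chunk_size) : chunk_size_(chunk_size) {
    assert(chunk_size > 0);
  }

  // Hand `size` bytes to a tensor, growing the cache when it is too small.
  void Alloc(int64_t size) {
    assert(size >= 0);
    while (reserved_ - allocated_ < size) {
      reserved_ += chunk_size_;  // memory actually requested from the hardware
    }
    allocated_ += size;  // memory now held by tensors
  }

  // A tensor releases memory: it goes back to the cache, so only Allocated
  // shrinks while Reserved stays the same.
  void Free(int64_t size) {
    assert(size >= 0 && size <= allocated_);
    allocated_ -= size;
  }

  int64_t Allocated() const { return allocated_; }
  int64_t Reserved() const { return reserved_; }

 private:
  int64_t chunk_size_;
  int64_t allocated_ = 0;
  int64_t reserved_ = 0;
};
```

In this model Reserved is always at least as large as Allocated, and the difference is the part cached by the framework.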
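Why distinguish the two sides by interface name instead of by stat_type directly, for example stat_type="HostAllocated" for host-side information and stat_type="DeviceAllocated" for device-side information?
Because the host-side and device-side stat types need different implementations.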
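The main driver at present is compile-time performance. The device side tracks memory separately for each card, with at most 16 cards supported, so 16 stat objects are defined at compile time for every device-side stat type. This causes heavy macro and template expansion in stats.h, and because stats.h is included by a large number of files, many symbols are introduced at compile time. That slows compilation and also makes the build directory grow considerably (PR #38657, which added GPU memory statistics, grew the build directory on Coverage CI from 136G to 140G).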
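For these reasons, if this PR reused the existing GPU statistics interface and simply added stat_type values that identify the host side, it would cause the same excessive growth of the build directory. The PR therefore distinguishes host and device by interface name rather than by stat_type. In the host-side interface each stat type only needs to support one device, so the number of introduced symbols drops to 1/16.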
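The symbol-count argument can be sketched as follows. This is only an illustration under stated assumptions: `Stat`, `StatInstance`, the tag types, `DeviceStat`, and `HostStat` are hypothetical names, and the real stats.h uses its own macros and templates. The point shown is that a per-device stat type instantiates one object per device ID, while a host stat type instantiates exactly one.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical counter holding a current value and its peak.
struct Stat {
  int64_t current = 0;
  int64_t peak = 0;
  void Update(int64_t delta) {
    current += delta;
    if (current > peak) peak = current;
  }
};

// Each (Tag, DevId) pair is a distinct instantiation with its own static
// object, so every pair contributes its own symbols.
template <typename Tag, int DevId>
struct StatInstance {
  static Stat& Get() {
    static Stat stat;
    return stat;
  }
};

struct DeviceAllocatedTag {};  // hypothetical tag for a device-side stat type
struct HostAllocatedTag {};    // hypothetical tag for a host-side stat type

constexpr int kMaxDeviceNum = 16;  // maximum number of cards, per the description

// Builds a table with one getter per device ID, forcing all 16 instantiations.
template <typename Tag, int... Ids>
Stat& DispatchImpl(int dev_id, std::integer_sequence<int, Ids...>) {
  using Getter = Stat& (*)();
  static const Getter table[] = {&StatInstance<Tag, Ids>::Get...};
  return table[dev_id]();
}

// Device side: 16 stat objects for this single stat type.
Stat& DeviceStat(int dev_id) {
  assert(dev_id >= 0 && dev_id < kMaxDeviceNum);
  return DispatchImpl<DeviceAllocatedTag>(
      dev_id, std::make_integer_sequence<int, kMaxDeviceNum>{});
}

// Host side: a single stat object for this stat type.
Stat& HostStat() { return StatInstance<HostAllocatedTag, 0>::Get(); }
```

Had the host stat been added as one more tag behind the device-side dispatch, it would also be instantiated 16 times even though only one host exists; a separate host entry point avoids that.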