wa008/Triton-fused-kernel

Triton-fused-kernel

A collection of fused kernels for transformers, written in Triton.

Attention: I have only tested correctness and speed on the core kernels, not on the whole classes, because there are some unsolved issues:

  • Why is the error significantly larger in default mode than in INTERPRET mode? (issue)

Fast cross entropy loss

Full detail

Performance: 7% improvement over the torch kernel

The difference between the black line and the red line is a change in the block size of the GPU kernel.
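To show why a block size appears in a cross entropy kernel at all, here is a minimal pure-Python sketch of a block-wise ("online") log-sum-exp. It is not the kernel from this repository: the function name `cross_entropy_blockwise`, its parameters, and the default block size are hypothetical, and it handles a single row of logits on the CPU. It only illustrates that the loss can be computed one block at a time without ever storing the full softmax.

```python
import math


def cross_entropy_blockwise(logits, target, block_size=4):
    """Cross entropy loss for one row of logits, scanned in fixed-size blocks.

    Hypothetical illustration only. A running max and a running sum of
    exponentials are kept, so the full softmax is never materialized.
    """
    running_max = -math.inf
    running_sum = 0.0
    for start in range(0, len(logits), block_size):
        block = logits[start:start + block_size]
        new_max = max(running_max, max(block))
        # Rescale the sum accumulated so far to the new max, then add this block.
        running_sum *= math.exp(running_max - new_max)
        running_sum += sum(math.exp(x - new_max) for x in block)
        running_max = new_max
    log_sum_exp = running_max + math.log(running_sum)
    # loss = -log(softmax(logits)[target]) = logsumexp(logits) - logits[target]
    return log_sum_exp - logits[target]


def cross_entropy_naive(logits, target):
    """Direct reference: softmax first, then negative log of the target entry."""
    total = sum(math.exp(x) for x in logits)
    return -math.log(math.exp(logits[target]) / total)
```

Changing `block_size` in this sketch changes only how the row is split, not the mathematical result. On a GPU the same choice affects how much work each program instance does, which is presumably why it changes the measured speed.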

Fused two layer feed forward network

like this part in attention
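As a rough picture of what fusing the two layers means, the sketch below compares an unfused and a fused two-layer feed-forward computation in plain Python. It is not the Triton code from this repository. The ReLU activation, the absence of biases, and the names `ffn_reference`, `ffn_fused`, and `block_size` are all assumptions made for illustration.

```python
def ffn_reference(x, w1, w2):
    """Unfused: build the whole hidden vector first, then project it."""
    hidden = [max(0.0, sum(x[i] * w1[i][j] for i in range(len(x))))
              for j in range(len(w1[0]))]
    return [sum(hidden[j] * w2[j][k] for j in range(len(hidden)))
            for k in range(len(w2[0]))]


def ffn_fused(x, w1, w2, block_size=2):
    """Fused: walk the hidden dimension tile by tile and accumulate the output.

    Each hidden activation is consumed by the second layer as soon as it is
    computed, so the full hidden vector is never stored.
    """
    d_hidden, d_out = len(w1[0]), len(w2[0])
    out = [0.0] * d_out
    for start in range(0, d_hidden, block_size):
        for j in range(start, min(start + block_size, d_hidden)):
            # First layer plus activation (ReLU is assumed here).
            h = max(0.0, sum(x[i] * w1[i][j] for i in range(len(x))))
            # Second layer: add this hidden unit's contribution right away.
            for k in range(d_out):
                out[k] += h * w2[j][k]
    return out
```

Both functions should return the same values for any tile size. The point of the fused form is that the intermediate activations stay local to the loop instead of being written out and read back between two separate kernels.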

TODO

  • ffn2: working
  • ffn2 + residual + norm
  • linear + softmax
