[LLM INFER] Append attn #9244
Merged
Commits (51):
b072465 append_attention 0914 (yuanlehome)
b915f95 paddle::empty to phi::allocator (yuanlehome)
9b1e1d8 Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (yuanlehome)
140a509 append_attn 0919 (yuanlehome)
5272b6f 0920 fix split_kv_block (yuanlehome)
a42157d my change for merge 4 to 1 (yuanlehome)
bec8eef fix prev (yuanlehome)
8dab056 merge zhenyun 0923 (yuanlehome)
d5047b5 fix prev (yuanlehome)
006a467 fix var name (yuanlehome)
73e2c06 update (yuanlehome)
a8acb2b fix config (yuanlehome)
ec46a89 fix (yuanlehome)
cb02ee5 fix append_attn (lizhenyun01)
83a19a6 Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (yuanlehome)
37fc7da fix --use_fake_parameter (yuanlehome)
a3b265b refine paddle::empty(), fix memory error, support multi_stream for at… (yuanlehome)
68a09b6 fix and rename attention as append_attention (yuanlehome)
2bcd939 rename file (yuanlehome)
74941a0 fix (yuanlehome)
19a0bdb encoder GQANEOX rope support (lizhenyun01)
a9078cb decoder a8w8c8 GQANEOX rope support (lizhenyun01)
f64f962 merge get_block_shape and split_kv_block (yuanlehome)
7ba73f8 bf16 neox rope support (lizhenyun01)
6837c23 fix diff (lizhenyun01)
0a5ae96 separate compilation (lizhenyun01)
e9cfc55 manual destroy stream (lizhenyun01)
478c517 fix multi stream (yuanlehome)
aa1e96a Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (yuanlehome)
e8ddfe8 qwen/llama support weightonly (yuanlehome)
8798938 fix multi stream (yuanlehome)
f6a64d0 qwen-moe and mixtral support append_attn (yuanlehome)
2292780 refine code (yuanlehome)
036fb73 Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (yuanlehome)
b85782d decoder neox_rope_c4 support (lizhenyun01)
9814578 instantiation of append_attn with float16 (lizhenyun01)
7a1f591 Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (yuanlehome)
5c126ad optimize cpu performance (yuanlehome)
2ef7c11 format code (yuanlehome)
4a4a4b4 compile c16/c8/c4 separately to speed up compilation (yuanlehome)
0e35a1e fix bug (yuanlehome)
c5b4633 gqa_group_size -> kv_num_heads (yuanlehome)
ea8c07e support speculate_attn (lizhenyun01)
3789175 adjust network (yuanlehome)
6eacbca cache_int4 -> cache_int4_zp (yuanlehome)
358115d fix use_fake_parameter multi cards (yuanlehome)
30ac44c fix speculate_decoder (lizhenyun01)
4011d89 delete comment (lizhenyun01)
7efff99 Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (yuanlehome)
c30c112 Merge branch 'append_attn' of https://github.com/yuanlehome/PaddleNLP… (yuanlehome)
84a6864 fix ci (yuanlehome)
Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP into append_attn
commit 83a19a6465bfb774ff3c3c57febc5f53b2281462
Why can append_attn do without fp32 here?
Because the kernel implementation requires half precision, and its memory access is different.
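The reply ties the half-precision requirement to memory access. As a rough illustration only, assuming the difference is dominated by how many bytes the kernel reads from the KV cache, the sketch below compares one layer's K/V read volume in fp32 versus fp16. The helper `kv_read_bytes` and the shape values are hypothetical and not part of PaddleNLP.

```python
# Hypothetical helper, not a PaddleNLP API: estimates bytes read from the
# K and V caches of one attention layer, assuming a dense [tokens, heads, dim] layout.
BYTES_PER_ELEM = {"float32": 4, "float16": 2, "bfloat16": 2}


def kv_read_bytes(num_tokens: int, kv_num_heads: int, head_dim: int, dtype: str) -> int:
    # K and V each hold num_tokens * kv_num_heads * head_dim elements.
    elems = 2 * num_tokens * kv_num_heads * head_dim
    return elems * BYTES_PER_ELEM[dtype]


if __name__ == "__main__":
    shape = dict(num_tokens=4096, kv_num_heads=8, head_dim=128)  # example values only
    fp32 = kv_read_bytes(dtype="float32", **shape)
    fp16 = kv_read_bytes(dtype="float16", **shape)
    # Under this assumption, the half-precision path moves half as many bytes.
    print(fp32, fp16, fp32 / fp16)
```

Real kernels also differ in vectorized load width and register use, so this byte count is a lower bound on the effect, not a model of the append_attn kernel.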