Conversation

@yuanlehome (Collaborator) commented Oct 11, 2024

PR types

New features

PR changes

Others

Description

This PR refactors the attention network construction for large-model inference. The new append_attn scheme delivers a 10% to 90% performance improvement over the previous scheme.

Inference is currently supported for llama/qwen/qwen-moe/mixtral.

Usage: in the original inference script, replace the --block_attn option with --append_attn.
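
A minimal usage sketch of the flag switch. Only --append_attn (and the old --block_attn it replaces) come from this PR; the entry-point script path, model name, and other flags below are illustrative placeholders, not part of this change.

```python
import subprocess

# Launch the inference script with the new attention scheme.
# Everything except --append_attn / --block_attn is a hypothetical example value.
cmd = [
    "python", "./predict/predictor.py",                      # hypothetical entry point
    "--model_name_or_path", "meta-llama/Llama-2-7b-chat",    # hypothetical model
    "--dtype", "float16",                                    # hypothetical precision flag
    # "--block_attn",                                        # old scheme (before this PR)
    "--append_attn",                                         # new scheme introduced by this PR
]
subprocess.run(cmd, check=True)
```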

TODO:

  • FP8 inference support
  • Add performance data; numbers to follow in the llm docs

@paddle-bot bot commented Oct 11, 2024

Thanks for your contribution!

@codecov bot commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 0% with 60 lines in your changes missing coverage. Please review.

Project coverage is 52.74%. Comparing base (fe8b527) to head (84a6864).
Report is 264 commits behind head on develop.

Files with missing lines                                    Patch %   Lines
...erimental/transformers/fused_transformer_layers.py      0.00%     38 Missing ⚠️
...dlenlp/experimental/transformers/qwen2/modeling.py      0.00%     8 Missing ⚠️
...dlenlp/experimental/transformers/llama/modeling.py      0.00%     7 Missing ⚠️
...enlp/experimental/transformers/mixtral/modeling.py      0.00%     5 Missing ⚠️
...lp/experimental/transformers/qwen2_moe/modeling.py      0.00%     1 Missing ⚠️
paddlenlp/experimental/transformers/utils.py               0.00%     1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #9244   +/-   ##
========================================
  Coverage    52.73%   52.74%           
========================================
  Files          661      661           
  Lines       107422   107371   -51     
========================================
- Hits         56653    56630   -23     
+ Misses       50769    50741   -28     

☔ View full report in Codecov by Sentry.