[NPU] adaptation for LLaMA #7262
base: develop
Conversation
Export the NPU model
Thanks for your contribution!
yuanwei66 does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Already signed the CLA but the status is still pending? Let us recheck it.
fix accuracy bugs
Adjust the model directory and enable the benchmark branch
Adapt the static multi-batch attention mask
Adjust non-benchmark inputs
Optimize the maximum batch count; change the incremental attention mask to a vector
add hccl_buffsize control
Optimize memory by avoiding large-batch attention mask operations
Optimize device memory: avoid frequent operations on the attention mask tensor, which cause memory fragmentation; use accelerator-library operators to avoid slow NumPy computation
Enable the weight transpose feature; accuracy verified OK, works together with PR MyAngelAyase/PaddleCustomDevice#66
export NZ
Integrate Pad/UnPad adaptation (the model needs to be re-exported)
Update modeling.py for embedding
Update for the master branch, and speed up position embedding
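Several of the commits above revolve around one idea: keep the decode-phase (incremental) attention mask as a preallocated vector that is updated in place, instead of rebuilding a full mask tensor with NumPy at every generation step, which fragments device memory. A minimal sketch of that idea in plain NumPy (all class and method names here are illustrative, not the PR's actual code):

```python
import numpy as np

class IncrementalMask:
    """Decode-phase attention mask kept as one preallocated 1-D vector.

    Rebuilding a full mask tensor each step allocates fresh memory and
    fragments the device heap; flipping one slot per generated token
    reuses the same buffer for the whole generation loop.
    """

    def __init__(self, max_len: int, pad_value: float = -1e4):
        # Start fully masked; positions are opened as tokens are generated.
        self.mask = np.full((max_len,), pad_value, dtype=np.float32)
        self.cur = 0

    def step(self) -> np.ndarray:
        # Unmask the next position in place -- no new allocation.
        self.mask[self.cur] = 0.0
        self.cur += 1
        # Return a view (not a copy) over the valid prefix.
        return self.mask[: self.cur]

# Usage: one in-place update per decode step.
m = IncrementalMask(max_len=8)
first = m.step()   # valid prefix of length 1
second = m.step()  # valid prefix of length 2
```

The broadcasting of this vector to a per-batch mask would then be left to a fused accelerator-library operator rather than host-side NumPy, which is the second half of the optimization the commits describe.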
This Pull Request is stale because it has been open for 60 days with no activity.
PR types
PR changes
Description