Add Gradient Cache&Recompute into Neural Search #3697
Conversation
* `corpus_file`: the corpus file used to build the recall (candidate) library
* `use_recompute`: enable the Recompute strategy to save GPU memory; it trades time for space
* `use_gradient_cache`: enable the Gradient Cache strategy to save GPU memory; it also trades time for space
* `chunk_numbers`: Gradient Cache parameter specifying how many chunks a single batch is split into for execution (see the sketch below)
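A minimal sketch of how these flags might be exposed in the training script's argument parser. Only the flag names come from the documentation above; the parser wiring, types, and defaults here are assumptions.

```python
import argparse

parser = argparse.ArgumentParser()
# Path to the corpus used to build the recall (candidate) library.
parser.add_argument("--corpus_file", type=str, required=True)
# Recompute trades extra forward computation for lower GPU memory.
parser.add_argument("--use_recompute", action="store_true")
# Gradient Cache also trades time for memory by splitting each batch.
parser.add_argument("--use_gradient_cache", action="store_true")
# Number of chunks a single batch is split into under Gradient Cache
# (the default here is illustrative only).
parser.add_argument("--chunk_numbers", type=int, default=4)
args = parser.parse_args()
```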
I suggest consolidating the recompute and gradient_cache distributed capabilities into the Trainer; @ZHUI will add that later.
Already forwarded to @ZHUI.
    title_cls_embedding,
    transpose_y=True)

# substract margin from all positive samples cosine_sim()
Please keep the comments here consistent and capitalize the first letter.
Fixed.
# substract margin from all positive samples cosine_sim()
margin_diag = paddle.full(shape=[query_cls_embedding.shape[0]],
                          fill_value=self.margin,
                          dtype=paddle.get_default_dtype())
The logic here is a bit odd: why not take the dtype from an existing variable? You can use the dtype of cosine_sim directly.
I tried it and the result is the same; this has been changed to use the dtype of cosine_sim.
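For illustration, a minimal sketch of the revised margin subtraction, assuming the change simply reads the dtype from cosine_sim; the function name and the margin default are placeholders, not the actual code in this PR.

```python
import paddle

def subtract_margin(query_cls_embedding, title_cls_embedding, margin=0.3):
    # Pairwise in-batch similarities between query and title embeddings.
    cosine_sim = paddle.matmul(query_cls_embedding,
                               title_cls_embedding,
                               transpose_y=True)
    # Subtract the margin from the positive (diagonal) pairs only, taking the
    # dtype from cosine_sim instead of paddle.get_default_dtype().
    margin_diag = paddle.full(shape=[query_cls_embedding.shape[0]],
                              fill_value=margin,
                              dtype=cosine_sim.dtype)
    return cosine_sim - paddle.diag(margin_diag)
```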
                  all_grads):

    sub_query_input_ids, sub_query_token_type_ids, sub_title_input_ids, sub_title_token_type_ids = sub_batch
    paddle.framework.random.set_cuda_rng_state(CUDA_state)
Is the CUDA random state restored here in order to reproduce the dropout?
Since Gradient Cache requires two forward passes, the random state is set so that dropout and other random operations behave identically, keeping the model's intermediate states the same and making the two forward passes produce consistent results.
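A minimal sketch of this save/restore pattern, assuming paddle.framework.random.get_cuda_rng_state() is the getter paired with the set_cuda_rng_state() call shown in the diff; the function and variable names are placeholders, and the gradient computation between the two passes is elided.

```python
import paddle

def gradient_cache_forwards(model, sub_batch):
    # Save the CUDA RNG state before the first, no-grad representation pass.
    cuda_state = paddle.framework.random.get_cuda_rng_state()

    with paddle.no_grad():
        cached_embeddings = model(*sub_batch)  # first pass: cache representations only

    # ... compute the loss and the gradients w.r.t. the cached embeddings here ...

    # Restore the saved state so dropout masks are identical in the second pass,
    # which rebuilds the graph for backpropagation with the same intermediate states.
    paddle.framework.random.set_cuda_rng_state(cuda_state)
    embeddings = model(*sub_batch)
    return cached_embeddings, embeddings
```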
LGTM
PR types
PR changes
Description