
Conversation

w5688414 (Contributor) commented Nov 7, 2022

PR types

  • New features

PR changes

  • APIs

Description

  • Add Gradient Cache & Recompute into Neural Search

w5688414 requested a review from wawltor on November 8, 2022 02:21
w5688414 self-assigned this on Nov 8, 2022
* `corpus_file`: the recall corpus data file
* `use_recompute`: enable the Recompute strategy to save GPU memory; it trades extra compute time for lower memory use
* `use_gradient_cache`: enable the Gradient Cache strategy to save GPU memory; it likewise trades time for space
* `chunk_numbers`: a Gradient Cache parameter that sets how many chunks a single batch is split into for execution (see the sketch after this list)
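A minimal sketch of what `chunk_numbers` controls, assuming a batch is a tuple of `(query_input_ids, query_token_type_ids, title_input_ids, title_token_type_ids)` tensors whose batch size is divisible by `chunk_numbers`; `split_batch` is a hypothetical helper for illustration, not the PR's actual code:

```python
import paddle

def split_batch(batch, chunk_numbers):
    # Split every tensor in the batch along the batch dimension into
    # `chunk_numbers` pieces, then regroup them so each element of the
    # returned list is one full sub-batch that can be forwarded on its own.
    chunked_fields = [paddle.split(field, chunk_numbers, axis=0) for field in batch]
    return list(zip(*chunked_fields))
```

With Gradient Cache, each sub-batch is forwarded separately, so only one chunk's activations need to be held in GPU memory at a time.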
Contributor:
Suggestion: distributed memory-saving capabilities like recompute and gradient_cache should be consolidated into the Trainer; @ZHUI will add that later.

Contributor Author (w5688414):
Already forwarded to @ZHUI

cosine_sim = paddle.matmul(query_cls_embedding,
                           title_cls_embedding,
                           transpose_y=True)

# substract margin from all positive samples cosine_sim()
Contributor:
Please keep the comments here consistent; capitalize the first letter.

Contributor Author (w5688414):
Fixed.

# substract margin from all positive samples cosine_sim()
margin_diag = paddle.full(shape=[query_cls_embedding.shape[0]],
                          fill_value=self.margin,
                          dtype=paddle.get_default_dtype())
Contributor:
The logic here is a bit odd. Why not take the dtype from an existing variable? You can use the dtype of cosine_sim directly.

Contributor Author (w5688414):
I tried it and the result is the same; it has been changed to use cosine_sim's dtype.
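For reference, a minimal sketch of what the revised lines might look like after this change, assuming `cosine_sim` is the query-title similarity matrix computed above; this is an illustration, not the exact diff:

```python
# Reuse cosine_sim's dtype instead of paddle.get_default_dtype().
margin_diag = paddle.full(shape=[query_cls_embedding.shape[0]],
                          fill_value=self.margin,
                          dtype=cosine_sim.dtype)
# Subtract the margin from the diagonal entries, i.e. the positive pairs.
cosine_sim = cosine_sim - paddle.diag(margin_diag)
```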

all_grads):

sub_query_input_ids, sub_query_token_type_ids, sub_title_input_ids, sub_title_token_type_ids = sub_batch
paddle.framework.random.set_cuda_rng_state(CUDA_state)
Contributor:
Is the CUDA random state restored here in order to reproduce the dropout behavior?

Contributor Author (w5688414) replied on Nov 16, 2022:
Because Gradient Cache needs to run the forward pass twice, the random state is set so that the random seeds for dropout and similar operations stay consistent. This keeps the model's intermediate states identical and makes the two forward passes produce the same results.
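A minimal sketch of the capture/restore pattern described above, assuming a hypothetical `model` call that embeds a sub-batch; the exact call signature in the PR may differ:

```python
import paddle

# Capture the CUDA RNG state before the first, gradient-free forward pass.
cuda_state = paddle.framework.random.get_cuda_rng_state()
with paddle.no_grad():
    cached_embedding = model(sub_query_input_ids, sub_query_token_type_ids)

# Restore the same state before the second, gradient-enabled forward pass so
# dropout draws the same masks and both passes yield identical activations.
paddle.framework.random.set_cuda_rng_state(cuda_state)
recomputed_embedding = model(sub_query_input_ids, sub_query_token_type_ids)
```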

wawltor (Contributor) left a comment:
LGTM

w5688414 merged commit 39c4f76 into PaddlePaddle:develop on Nov 16, 2022
w5688414 deleted the pip38 branch on October 13, 2023 07:32