Update demo #72

Steffy-zxf · 2021-03-05T09:40:47Z

PR types

New features

PR changes

APIs

Describe

add ChnSentiCorp & LCQMC datasets to paddlenlp.experimental.datasets
adopt text_cls & text_matching example

smallv0221

I don't seem to see the newly added dataset.

examples/text_classification/pretrained_models/train.py

smallv0221 · 2021-03-05T09:50:53Z

examples/text_matching/sentence_transformers/predict.py

        example(obj:`list[str]`): List of input data, containing the query, the title, and the label if it has one.
        tokenizer(obj:`PretrainedTokenizer`): This tokenizer inherits from :class:`~paddlenlp.transformers.PretrainedTokenizer` 
            which contains most of the methods. Users should refer to the superclass for more information regarding methods.
        label_list(obj:`list[str]`): All the labels that the data has.


Is this no longer needed?

examples/text_classification/pretrained_models/predict.py

smallv0221 · 2021-03-08T07:32:19Z

examples/text_classification/pretrained_models/train.py

+    encoded_inputs = tokenizer(
+        text=example["text"],
+        max_seq_len=max_seq_length,
+        pad_to_max_seq_len=True)


Same reply as above.

examples/text_matching/sentence_transformers/predict.py

paddlenlp/datasets/experimental/chnsenticorp.py

smallv0221 · 2021-03-08T07:43:50Z

examples/text_matching/simnet/utils.py


-    query, title = example[0], example[1]
+    query, title = example["query"], example["title"]
    query_ids = np.array(tokenizer.encode(query), dtype="int64")


Let's switch these uniformly to the __call__() method.

This is a JiebaTokenizer, not a PretrainedTokenizer, so no change is needed here.

paddlenlp/datasets/experimental/lcqmc.py

ZeyuChen

We need to discuss when to use the data.Pad API and when to use the tokenizer's pad_to_max_seq_len option.
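
For context on the trade-off raised here, the following is a minimal pure-Python sketch of the two padding strategies: padding every example to a fixed maximum length at tokenization time versus padding each batch only to its own longest sequence at collation time. The helper names `pad_to_fixed_length` and `pad_batch` are hypothetical and written only for illustration; they are not the PaddleNLP tokenizer or data.Pad implementations.

```python
from typing import List


def pad_to_fixed_length(ids: List[int], max_seq_len: int, pad_id: int = 0) -> List[int]:
    """Truncate or right-pad one sequence to exactly max_seq_len tokens.

    Hypothetical stand-in for padding done per example at tokenization time.
    """
    ids = ids[:max_seq_len]
    return ids + [pad_id] * (max_seq_len - len(ids))


def pad_batch(batch: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Right-pad every sequence only up to the longest one in this batch.

    Hypothetical stand-in for padding done per batch at collation time.
    """
    longest = max(len(ids) for ids in batch)
    return [ids + [pad_id] * (longest - len(ids)) for ids in batch]


batch = [[5, 6, 7], [8, 9], [10]]

# Fixed-length padding: every row has max_seq_len entries, however short the batch is.
fixed = [pad_to_fixed_length(ids, max_seq_len=8) for ids in batch]

# Batch-level padding: rows are only as long as the longest sequence in the batch.
dynamic = pad_batch(batch)

print(len(fixed[0]), len(dynamic[0]))
```

With short inputs, the fixed-length variant spends most of each row on padding, while the batch-level variant keeps the tensors small at the cost of batch shapes that vary from step to step.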

examples/text_classification/pretrained_models/predict.py

ZeyuChen · 2021-03-08T22:54:18Z

examples/text_classification/rnn/train.py

-    train_ds, dev_ds, test_ds = ChnSentiCorp.get_datasets(
-        ['train', 'dev', 'test'])
+    train_ds, dev_ds, test_ds = load_dataset(
+        "chnsenticorp", splits=["train", "dev", "test"], lazy=False)


Is lazy=False the default? @smallv0221 Do we require lazy=True only for the Iterable scenario?

Yes, it is the default. lazy=True is needed only in the Iterable scenario.

The lazy parameter of load_dataset() defaults to None.
https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/datasets/experimental/dataset.py#L59
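
To illustrate the distinction this thread is about, here is a small self-contained sketch of an eager (map-style) dataset versus a lazy (iterable-style) one. Every name in it (`RAW`, `read_examples`, `MapStyleDataset`, `IterStyleDataset`, `load_toy_dataset`) is hypothetical; it does not reproduce how load_dataset() resolves its lazy argument, and the toy loader deliberately takes `lazy` as a required argument so it makes no claim about the default.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Union

# Toy records standing in for a sentiment corpus; purely illustrative data.
RAW = [("good movie", 1), ("bad service", 0), ("great hotel", 1)]

Example = Dict[str, Union[str, int]]


def read_examples() -> Iterator[Example]:
    """Yield one example dict at a time, as a file reader might."""
    for text, label in RAW:
        yield {"text": text, "label": label}


class MapStyleDataset:
    """Eager dataset: all examples are materialized, so len() and indexing work."""

    def __init__(self, examples: Iterable[Example]) -> None:
        self._data: List[Example] = list(examples)

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, idx: int) -> Example:
        return self._data[idx]


class IterStyleDataset:
    """Lazy dataset: examples are produced on demand; no len() or random access."""

    def __init__(self, reader: Callable[[], Iterator[Example]]) -> None:
        self._reader = reader

    def __iter__(self) -> Iterator[Example]:
        # Calling the reader again restarts iteration from the beginning.
        return self._reader()


def load_toy_dataset(lazy: bool) -> Union[MapStyleDataset, IterStyleDataset]:
    """Hypothetical loader: pick the dataset flavor from the lazy flag."""
    return IterStyleDataset(read_examples) if lazy else MapStyleDataset(read_examples())


eager_ds = load_toy_dataset(lazy=False)
lazy_ds = load_toy_dataset(lazy=True)

print(len(eager_ds), eager_ds[1]["label"])
print([ex["text"] for ex in lazy_ds])
```

The eager variant supports shuffling by index and reporting its size, which is why a map-style dataset suits corpora that fit in memory; the lazy variant only supports sequential passes, which suits streaming or very large inputs.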

paddlenlp/datasets/experimental/chnsenticorp.py

smallv0221

LGTM @ZeyuChen

ZeyuChen

LGTM

Steffy-zxf added 20 commits March 2, 2021 16:05

add LIC 2021 Information Extraction Baseline

8aaeb46

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…

233166c

…nto develop

update Copy License

f675ab7

add sentence-level event extraction script

c36a7fe

add README

68cb694

update event extraction readme

8f58ee0

add event extraction images

ed1cd54

upadte docs

354bdaa

add event extraction image

da1aa69

fix image setting

2238d06

update docs

fe0c05e

rename knowledge extraction to relation extraction

29bfd65

remove the redundant license

d0671a0

fix license and remove unuseless flags setting

2c2eb17

fix sentence transformer train.py warmup arguement error

56ade3b

update docs

5ff52d1

add aistudio tutorail

238095b

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…

8e8f7cd

…nto develop

1. adopt paddlenlp.experimental.dataset & tokenizer 2. add ChnSentiCo…

a588da4

…rp & LCQMC datasets

remove useless changes

e2522b5

smallv0221 reviewed Mar 5, 2021

add paddlenlp.datasets.experimental.chnsenticorp/lcqmc

3e729f0

ZeyuChen assigned smallv0221 Mar 6, 2021

update doc_string

fa93cc1

smallv0221 reviewed Mar 8, 2021

ZeyuChen reviewed Mar 8, 2021

Steffy-zxf added 3 commits March 9, 2021 11:00

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…

a868e15

…nto update-demo

1. update text classifion & text matching demo docs

4a0b4e7

2. remove dataset field_indices, num_discard_samples

remove tokenizer pad_to_max_seq_len parameter

f751a94

smallv0221 approved these changes Mar 9, 2021

Merge branch 'develop' into update-demo

0c9681e

ZeyuChen approved these changes Mar 9, 2021

ZeyuChen merged commit 102ddf3 into PaddlePaddle:develop Mar 9, 2021

3 participants