Merged
Changes from 1 commit
28 commits
64571f9
Add Cloud natural language pipeline colab notebook
Narwhalprime Dec 15, 2022
917d513
Add ready-to-go text classification pipeline colab notebook
Narwhalprime Dec 15, 2022
782e685
Ran reformatting scripts on text classification pipeline colab notebooks
Narwhalprime Dec 15, 2022
ebc7b62
Update CODEOWNERS files
Narwhalprime Dec 15, 2022
aa3f917
Fix order of cells in cloud_natural_language_pipeline.ipynb
Narwhalprime Dec 16, 2022
d596206
Remove unused variables via linter for text classification colabs; fi…
Narwhalprime Dec 16, 2022
a125a31
Minor fix: remove GCPC version requirement
Narwhalprime Dec 19, 2022
93eaf80
Minor fix: remove outputs
Narwhalprime Dec 19, 2022
ca14981
fix formatting with nbfmt
Narwhalprime Dec 19, 2022
e4bbe5b
move ready-to-go pipeline to notebooks/community
Narwhalprime Dec 22, 2022
a84d590
fix link
Narwhalprime Dec 22, 2022
a983fd6
update CODEOWNERS
Narwhalprime Dec 22, 2022
0b26ee3
move text classification colabs to notebooks/community/pipelines
Narwhalprime Dec 22, 2022
909144a
Address initial comments on NL notebook
Narwhalprime Jan 4, 2023
0cdb0f2
Remove commented lines in NL notebook
Narwhalprime Jan 4, 2023
05fba85
minor cell formatting
Narwhalprime Jan 4, 2023
ca99062
clear outputs
Narwhalprime Jan 4, 2023
9a18784
minor changes to NL notebook
Narwhalprime Jan 4, 2023
fe686a3
address comments for ready-to-go pipeline
Narwhalprime Jan 4, 2023
855e960
run linter locally
Narwhalprime Jan 4, 2023
fab4080
add pipeline description to NL pipeline
Narwhalprime Jan 4, 2023
74d8ce0
run linter locally (PR check could not lint)
Narwhalprime Jan 5, 2023
dae0c93
Add cell to examine metrics, update kernel restart cell from official…
Narwhalprime Jan 9, 2023
fc804d4
lint
Narwhalprime Jan 9, 2023
ede48b6
Update default fields and URLs in NL notebook
Narwhalprime Jan 23, 2023
be3ae6a
Fix URLs in ready to go notebook
Narwhalprime Jan 23, 2023
fa4632f
run linter
Narwhalprime Jan 23, 2023
47978b3
merge
Narwhalprime Jan 23, 2023
Address initial comments on NL notebook
Narwhalprime committed Jan 4, 2023
commit 909144a7a8b5d9cf9a48c2a21a8601c622d93993
@@ -3,10 +3,11 @@
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github"
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/Narwhalprime/vertex-ai-samples/blob/main/notebooks/community/natural_language/cloud_natural_language_pipeline.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
"<a href=\"https://colab.research.google.com/github/Narwhalprime/vertex-ai-samples/blob/main/notebooks/community/pipelines/google_cloud_pipeline_components_cloud_natural_language_pipeline.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
@@ -38,15 +39,7 @@
"id": "BwO30Ag12YcB"
},
"source": [
"# Vertex Pipelines: Cloud Natural Language model training pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3Th4B98ze9ik"
},
"source": [
"# Vertex Pipelines: Cloud Natural Language model training pipeline\n",
"<table align=\"left\">\n",
"\n",
" <td>\n",
@@ -72,38 +65,55 @@
{
"cell_type": "markdown",
"metadata": {
"id": "31a0c126"
"id": "tvgnzT1CKxrO"
},
"source": [
"## Overview\n",
"This notebook shows how to use [Google Cloud Pipeline Components SDK](https://cloud.google.com/vertex-ai/docs/pipelines/components-introduction) and additional components in this directory to run a machine learning pipeline in [Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction) to train a TensorFlow text classification model.\n",
"\n",
"In this pipeline, the model training Docker image utilizes [TFHub](https://tfhub.dev/) models to perform state-of-the-art text classification training. The image is pre-built and ready to use, so no additional Docker setup is required.\n",
"In this pipeline, the model training Docker image utilizes [TFHub](https://tfhub.dev/) models to perform state-of-the-art text classification training. The image is pre-built and ready to use, so no additional Docker setup is required."
]
},
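A minimal sketch of the kind of TFHub-based classifier such a training image builds; the module handle, layer sizes, and class count below are illustrative assumptions, not the pre-built image's actual configuration:

```python
# Illustrative TFHub text classifier; the module handle, layer sizes, and
# NUM_CLASSES are assumptions, not the pre-built image's configuration.
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 3  # assumption: set to your dataset's number of labels

# Small English sentence encoder from TFHub; raw strings in, vectors out.
encoder = hub.KerasLayer(
    "https://tfhub.dev/google/nnlm-en-dim50/2",
    input_shape=[],
    dtype=tf.string,
    trainable=True,
)

model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```

Swapping the handle for a larger encoder (for example, a BERT module) changes only the `encoder` definition; the rest of the model stays the same.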
{
"cell_type": "markdown",
"metadata": {
"id": "d975e698c9a4"
},
"source": [
"### Objective\n",
"\n",
"## Dataset\n",
"In this tutorial, you learn how to construct an end-to-end training pipeine within Vertex AI pipelines that ingests a dataset, trains a text classification model on it, and outputs evaluation metrics.\n",
"\n",
"This notebook requires that the user has two datasets exported from Vertex AI [managed datasets](https://cloud.google.com/vertex-ai/docs/training/using-managed-datasets): one with train and validation data splits, and the other with test data used for evaluation. Please ensure no data is shared between the two datasets (in particular, no evaluation data should be part of the train or validation splits). To export a Vertex dataset, please follow the following public docs:\n",
"* [Preparing data](https://cloud.google.com/vertex-ai/docs/text-data/classification/prepare-data)\n",
"* [Creating a Vertex dataset](https://cloud.google.com/vertex-ai/docs/text-data/classification/create-dataset) from the above data\n",
"* [Exporting dataset and its annotations](https://cloud.google.com/vertex-ai/docs/datasets/export-metadata-annotations); ensure the resulting export is located in a Google Cloud Storage (GCS) bucket you own. You may need to manually separate the test split data into its own file.\n",
"This tutorial uses the following Google Cloud ML services and resources:\n",
"\n",
"## Components\n",
"- Vertex AI Pipelines\n",
"- Vertex AI Datasets\n",
"\n",
"This pipeline is composed from the following components:\n",
"The steps performed include:\n",
"\n",
"- **train-tfhub-model** - Trains a new Tensorflow model using TFHub layers from pre-built Docker image\n",
"- **upload-tensorflow-model-to-google-cloud-vertex-ai** - Uploads resulting model to Vertex model registry\n",
"- **get-vertex-model** - Gets model that has just been uploaded as an artifact in pipeline\n",
"- **convert-dataset-export-for-batch-predict** - Preprocessing component that takes the test dataset exported from Vertex datasets and converts it to a simpler compatible one that is readable from the batch predict component\n",
"- **target-field-data-remover** - Removes the target field (i.e., label) in the test dataset for the downstream batch predict component\n",
"- **model-batch-predict** - Performs a batch prediction job\n",
"- **model-evaluation-classification** - Calculates the evaluation metrics from the above batch predict job and exports the metrics artifact\n"
"- Define Kubeflow pipeline components\n",
"- Setup Kubeflow pipeline\n",
"- Run pipeline on Vertex AI"
]
},
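As a rough sketch of those three steps in code (the component, pipeline name, project, region, and bucket below are placeholders, not the notebook's actual values):

```python
# Sketch of defining, compiling, and running a pipeline on Vertex AI
# Pipelines. The component body, names, project, region, and bucket are
# placeholders, not this notebook's actual values.
from google.cloud import aiplatform
from kfp.v2 import compiler, dsl


@dsl.component
def say_hello(text: str) -> str:
    # Stand-in for the real training/evaluation components.
    return text


@dsl.pipeline(name="text-classification-demo")
def pipeline(message: str = "hello"):
    # The real pipeline chains train, upload, batch-predict, and
    # evaluation steps; one stub task keeps this sketch compilable.
    say_hello(text=message)


compiler.Compiler().compile(pipeline_func=pipeline,
                            package_path="pipeline.json")

aiplatform.init(project="YOUR_PROJECT", location="us-central1",
                staging_bucket="gs://YOUR_BUCKET")
job = aiplatform.PipelineJob(
    display_name="text-classification-demo",
    template_path="pipeline.json",
    parameter_values={"message": "hello"},
)
job.run()
```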
{
"cell_type": "markdown",
"metadata": {
"id": "costs"
"id": "08d289fa873f"
},
"source": [
"## Dataset\n",
"\n",
"This notebook requires that the user has two datasets exported from Vertex AI [managed datasets](https://cloud.google.com/vertex-ai/docs/training/using-managed-datasets): one with train and validation data splits, and the other with test data used for evaluation. Please ensure no data is shared between the two datasets (in particular, no evaluation data should be part of the train or validation splits). To export a Vertex AI dataset, please follow the following public docs:\n",
"* [Preparing data](https://cloud.google.com/vertex-ai/docs/text-data/classification/prepare-data)\n",
"* [Creating a Vertex AI dataset](https://cloud.google.com/vertex-ai/docs/text-data/classification/create-dataset) from the above data\n",
"* [Exporting dataset and its annotations](https://cloud.google.com/vertex-ai/docs/datasets/export-metadata-annotations); ensure the resulting export is located in a Google Cloud Storage (GCS) bucket you own. You may need to manually separate the test split data into its own file."
]
},
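If the export combines splits in a single JSONL file, a hypothetical helper like the one below can separate the test split; the `dataItemResourceLabels` field and `ml_use` key are assumptions about the export schema, so verify them against your own export:

```python
# Hypothetical helper to pull the test split out of a Vertex AI dataset
# export. The field names ("dataItemResourceLabels", the ml_use key) are
# assumptions about the export schema; verify against your own export.
import json

ML_USE_KEY = "aiplatform.googleapis.com/ml_use"  # assumption

with open("export.jsonl") as src, \
        open("train_valid.jsonl", "w") as train_out, \
        open("test.jsonl", "w") as test_out:
    for line in src:
        item = json.loads(line)
        split = item.get("dataItemResourceLabels", {}).get(ML_USE_KEY, "")
        out = test_out if split == "test" else train_out
        out.write(line)
```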
{
"cell_type": "markdown",
"metadata": {
"id": "aed92deeb4a0"
},
"source": [
"## Costs\n",
@@ -120,6 +130,13 @@
"to generate a cost estimate based on your projected usage."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "31a0c126"
},
"source": []
},
{
"cell_type": "markdown",
"metadata": {
@@ -578,7 +595,17 @@
"id": "zAaMJKrhAe5L"
},
"source": [
"### Define components"
"## Define components\n",
"\n",
"This pipeline is composed from the following components:\n",
"\n",
"- **train-tfhub-model** - Trains a new Tensorflow model using TFHub layers from pre-built Docker image\n",
"- **upload-tensorflow-model-to-google-cloud-vertex-ai** - Uploads resulting model to Vertex AI model registry\n",
"- **get-vertex-model** - Gets model that has just been uploaded as an artifact in pipeline\n",
"- **convert-dataset-export-for-batch-predict** - Preprocessing component that takes the test dataset exported from Vertex datasets and converts it to a simpler compatible one that is readable from the batch predict component\n",
"- **target-field-data-remover** - Removes the target field (i.e., label) in the test dataset for the downstream batch predict component\n",
"- **model-batch-predict** - Performs a batch prediction job\n",
"- **model-evaluation-classification** - Calculates the evaluation metrics from the above batch predict job and exports the metrics artifact\n"
]
},
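A self-contained miniature of how these components chain together; every component here is a stub, and the names and parameters paraphrase the list above rather than the actual component signatures:

```python
# Miniature of the pipeline's data flow; every component is a stub and all
# names/parameters paraphrase the list above, not the real signatures.
from kfp.v2 import dsl


@dsl.component
def train_tfhub_model(dataset_uri: str) -> str:
    return "gs://your-bucket/model"  # stub: would train and save a model


@dsl.component
def upload_model(model_dir: str) -> str:
    return "projects/p/locations/l/models/123"  # stub Vertex model name


@dsl.component
def remove_target_field(test_uri: str, target_field: str) -> str:
    return test_uri + "-unlabeled"  # stub: would strip the label field


@dsl.component
def batch_predict(model_name: str, input_uri: str) -> str:
    return "gs://your-bucket/predictions"  # stub batch prediction output


@dsl.component
def evaluate_classification(predictions_uri: str, test_uri: str) -> float:
    return 0.0  # stub: would compute classification metrics


@dsl.pipeline(name="nl-pipeline-structure")
def nl_pipeline(train_uri: str, test_uri: str, target_field: str = "label"):
    train_task = train_tfhub_model(dataset_uri=train_uri)
    upload_task = upload_model(model_dir=train_task.output)
    strip_task = remove_target_field(test_uri=test_uri,
                                     target_field=target_field)
    predict_task = batch_predict(model_name=upload_task.output,
                                 input_uri=strip_task.output)
    evaluate_classification(predictions_uri=predict_task.output,
                            test_uri=test_uri)
```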
{
@@ -800,8 +827,31 @@
],
"metadata": {
"colab": {
"name": "cloud_natural_language_pipeline.ipynb",
"toc_visible": true
"toc_visible": true,
"provenance": [],
"collapsed_sections": [
"tvgnzT1CKxrO",
"d975e698c9a4",
"08d289fa873f",
"setup_local",
"568d5c16",
"B9IYalYObAbY",
"VA_kzAIIj2G_",
"set_service_account",
"a27d4cee",
"timestamp",
"bucket:mbsdk",
"f3a09765",
"89bb4a50",
"d33c87e4-2ada-4b87-bf75-064247f3162d",
"zAaMJKrhAe5L",
"TEnh9Pcx6Xfi",
"3211ba19",
"ax0jOxIaholy",
"TpV-iwP9qw9c",
"UMuyzrnZLoUa"
],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
@@ -810,4 +860,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}