Merged
Changes from 1 commit
28 commits
64571f9
Add Cloud natural language pipeline colab notebook
Narwhalprime Dec 15, 2022
917d513
Add ready-to-go text classification pipeline colab notebook
Narwhalprime Dec 15, 2022
782e685
Ran reformatting scripts on text classification pipeline colab notebooks
Narwhalprime Dec 15, 2022
ebc7b62
Update CODEOWNERS files
Narwhalprime Dec 15, 2022
aa3f917
Fix order of cells in cloud_natural_language_pipeline.ipynb
Narwhalprime Dec 16, 2022
d596206
Remove unused variables via linter for text classification colabs; fi…
Narwhalprime Dec 16, 2022
a125a31
Minor fix: remove GCPC version requirement
Narwhalprime Dec 19, 2022
93eaf80
Minor fix: remove outputs
Narwhalprime Dec 19, 2022
ca14981
fix formatting with nbfmt
Narwhalprime Dec 19, 2022
e4bbe5b
move ready-to-go pipeline to notebooks/community
Narwhalprime Dec 22, 2022
a84d590
fix link
Narwhalprime Dec 22, 2022
a983fd6
update CODEOWNERS
Narwhalprime Dec 22, 2022
0b26ee3
move text classification colabs to notebooks/community/pipelines
Narwhalprime Dec 22, 2022
909144a
Address initial comments on NL notebook
Narwhalprime Jan 4, 2023
0cdb0f2
Remove commented lines in NL notebook
Narwhalprime Jan 4, 2023
05fba85
minor cell formatting
Narwhalprime Jan 4, 2023
ca99062
clear outputs
Narwhalprime Jan 4, 2023
9a18784
minor changes to NL notebook
Narwhalprime Jan 4, 2023
fe686a3
address comments for ready-to-go pipeline
Narwhalprime Jan 4, 2023
855e960
run linter locally
Narwhalprime Jan 4, 2023
fab4080
add pipeline description to NL pipeline
Narwhalprime Jan 4, 2023
74d8ce0
run linter locally (PR check could not lint)
Narwhalprime Jan 5, 2023
dae0c93
Add cell to examine metrics, update kernel restart cell from official…
Narwhalprime Jan 9, 2023
fc804d4
lint
Narwhalprime Jan 9, 2023
ede48b6
Update default fields and URLs in NL notebook
Narwhalprime Jan 23, 2023
be3ae6a
Fix URLs in ready to go notebook
Narwhalprime Jan 23, 2023
fa4632f
run linter
Narwhalprime Jan 23, 2023
47978b3
merge
Narwhalprime Jan 23, 2023
Address initial comments on NL notebook
Narwhalprime committed Jan 4, 2023
commit 909144a7a8b5d9cf9a48c2a21a8601c622d93993
@@ -3,10 +3,11 @@
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github"
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/Narwhalprime/vertex-ai-samples/blob/main/notebooks/community/natural_language/cloud_natural_language_pipeline.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
"<a href=\"https://colab.research.google.com/github/Narwhalprime/vertex-ai-samples/blob/main/notebooks/community/pipelines/google_cloud_pipeline_components_cloud_natural_language_pipeline.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
@@ -38,15 +39,7 @@
"id": "BwO30Ag12YcB"
},
"source": [
"# Vertex Pipelines: Cloud Natural Language model training pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3Th4B98ze9ik"
},
"source": [
"# Vertex Pipelines: Cloud Natural Language model training pipeline\n",
"<table align=\"left\">\n",
"\n",
" <td>\n",
@@ -72,38 +65,55 @@
{
"cell_type": "markdown",
"metadata": {
"id": "31a0c126"
"id": "tvgnzT1CKxrO"
},
"source": [
"## Overview\n",
"This notebook shows how to use [Google Cloud Pipeline Components SDK](https://cloud.google.com/vertex-ai/docs/pipelines/components-introduction) and additional components in this directory to run a machine learning pipeline in [Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction) to train a TensorFlow text classification model.\n",
"\n",
"In this pipeline, the model training Docker image utilizes [TFHub](https://tfhub.dev/) models to perform state-of-the-art text classification training. The image is pre-built and ready to use, so no additional Docker setup is required.\n",
"In this pipeline, the model training Docker image utilizes [TFHub](https://tfhub.dev/) models to perform state-of-the-art text classification training. The image is pre-built and ready to use, so no additional Docker setup is required."
]
},
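A minimal sketch of the kind of TFHub-based classifier such a training image builds; the module handle, layer sizes, and class count below are illustrative assumptions, not the pre-built image's actual configuration:

```python
# Illustrative TFHub text classifier; the module handle, layer sizes, and
# NUM_CLASSES are assumptions, not the pre-built image's configuration.
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 3  # assumption: set to your dataset's number of labels

# Small English sentence encoder from TFHub; raw strings in, vectors out.
encoder = hub.KerasLayer(
    "https://tfhub.dev/google/nnlm-en-dim50/2",
    input_shape=[],
    dtype=tf.string,
    trainable=True,
)

model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```

Swapping the handle for a larger encoder (for example, a BERT module) changes only the `encoder` definition; the rest of the model stays the same.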
{
"cell_type": "markdown",
"metadata": {
"id": "d975e698c9a4"
},
"source": [
"### Objective\n",
"\n",
"## Dataset\n",
"In this tutorial, you learn how to construct an end-to-end training pipeine within Vertex AI pipelines that ingests a dataset, trains a text classification model on it, and outputs evaluation metrics.\n",
"\n",
"This notebook requires that the user has two datasets exported from Vertex AI [managed datasets](https://cloud.google.com/vertex-ai/docs/training/using-managed-datasets): one with train and validation data splits, and the other with test data used for evaluation. Please ensure no data is shared between the two datasets (in particular, no evaluation data should be part of the train or validation splits). To export a Vertex dataset, please follow the following public docs:\n",
"* [Preparing data](https://cloud.google.com/vertex-ai/docs/text-data/classification/prepare-data)\n",
"* [Creating a Vertex dataset](https://cloud.google.com/vertex-ai/docs/text-data/classification/create-dataset) from the above data\n",
"* [Exporting dataset and its annotations](https://cloud.google.com/vertex-ai/docs/datasets/export-metadata-annotations); ensure the resulting export is located in a Google Cloud Storage (GCS) bucket you own. You may need to manually separate the test split data into its own file.\n",
"This tutorial uses the following Google Cloud ML services and resources:\n",
"\n",
"## Components\n",
"- Vertex AI Pipelines\n",
"- Vertex AI Datasets\n",
"\n",
"This pipeline is composed from the following components:\n",
"The steps performed include:\n",
"\n",
"- **train-tfhub-model** - Trains a new Tensorflow model using TFHub layers from pre-built Docker image\n",
"- **upload-tensorflow-model-to-google-cloud-vertex-ai** - Uploads resulting model to Vertex model registry\n",
"- **get-vertex-model** - Gets model that has just been uploaded as an artifact in pipeline\n",
"- **convert-dataset-export-for-batch-predict** - Preprocessing component that takes the test dataset exported from Vertex datasets and converts it to a simpler compatible one that is readable from the batch predict component\n",
"- **target-field-data-remover** - Removes the target field (i.e., label) in the test dataset for the downstream batch predict component\n",
"- **model-batch-predict** - Performs a batch prediction job\n",
"- **model-evaluation-classification** - Calculates the evaluation metrics from the above batch predict job and exports the metrics artifact\n"
"- Define Kubeflow pipeline components\n",
"- Setup Kubeflow pipeline\n",
"- Run pipeline on Vertex AI"
]
},
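As a rough sketch of those three steps in code (the component, pipeline name, project, region, and bucket below are placeholders, not the notebook's actual values):

```python
# Sketch of defining, compiling, and running a pipeline on Vertex AI
# Pipelines. The component body, names, project, region, and bucket are
# placeholders, not this notebook's actual values.
from google.cloud import aiplatform
from kfp.v2 import compiler, dsl


@dsl.component
def say_hello(text: str) -> str:
    # Stand-in for the real training/evaluation components.
    return text


@dsl.pipeline(name="text-classification-demo")
def pipeline(message: str = "hello"):
    # The real pipeline chains train, upload, batch-predict, and
    # evaluation steps; one stub task keeps this sketch compilable.
    say_hello(text=message)


compiler.Compiler().compile(pipeline_func=pipeline,
                            package_path="pipeline.json")

aiplatform.init(project="YOUR_PROJECT", location="us-central1",
                staging_bucket="gs://YOUR_BUCKET")
job = aiplatform.PipelineJob(
    display_name="text-classification-demo",
    template_path="pipeline.json",
    parameter_values={"message": "hello"},
)
job.run()
```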
{
"cell_type": "markdown",
"metadata": {
"id": "costs"
"id": "08d289fa873f"
},
"source": [
"## Dataset\n",
"\n",
"This notebook requires that the user has two datasets exported from Vertex AI [managed datasets](https://cloud.google.com/vertex-ai/docs/training/using-managed-datasets): one with train and validation data splits, and the other with test data used for evaluation. Please ensure no data is shared between the two datasets (in particular, no evaluation data should be part of the train or validation splits). To export a Vertex AI dataset, please follow the following public docs:\n",
"* [Preparing data](https://cloud.google.com/vertex-ai/docs/text-data/classification/prepare-data)\n",
"* [Creating a Vertex AI dataset](https://cloud.google.com/vertex-ai/docs/text-data/classification/create-dataset) from the above data\n",
"* [Exporting dataset and its annotations](https://cloud.google.com/vertex-ai/docs/datasets/export-metadata-annotations); ensure the resulting export is located in a Google Cloud Storage (GCS) bucket you own. You may need to manually separate the test split data into its own file."
]
},
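If the export combines splits in a single JSONL file, a hypothetical helper like the one below can separate the test split; the `dataItemResourceLabels` field and `ml_use` key are assumptions about the export schema, so verify them against your own export:

```python
# Hypothetical helper to pull the test split out of a Vertex AI dataset
# export. The field names ("dataItemResourceLabels", the ml_use key) are
# assumptions about the export schema; verify against your own export.
import json

ML_USE_KEY = "aiplatform.googleapis.com/ml_use"  # assumption

with open("export.jsonl") as src, \
        open("train_valid.jsonl", "w") as train_out, \
        open("test.jsonl", "w") as test_out:
    for line in src:
        item = json.loads(line)
        split = item.get("dataItemResourceLabels", {}).get(ML_USE_KEY, "")
        out = test_out if split == "test" else train_out
        out.write(line)
```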
{
"cell_type": "markdown",
"metadata": {
"id": "aed92deeb4a0"
},
"source": [
"## Costs\n",
@@ -120,6 +130,13 @@
"to generate a cost estimate based on your projected usage."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "31a0c126"
},
"source": []
},
{
"cell_type": "markdown",
"metadata": {
@@ -578,7 +595,17 @@
"id": "zAaMJKrhAe5L"
},
"source": [
"### Define components"
"## Define components\n",
"\n",
"This pipeline is composed from the following components:\n",
"\n",
"- **train-tfhub-model** - Trains a new Tensorflow model using TFHub layers from pre-built Docker image\n",
"- **upload-tensorflow-model-to-google-cloud-vertex-ai** - Uploads resulting model to Vertex AI model registry\n",
"- **get-vertex-model** - Gets model that has just been uploaded as an artifact in pipeline\n",
"- **convert-dataset-export-for-batch-predict** - Preprocessing component that takes the test dataset exported from Vertex datasets and converts it to a simpler compatible one that is readable from the batch predict component\n",
"- **target-field-data-remover** - Removes the target field (i.e., label) in the test dataset for the downstream batch predict component\n",
"- **model-batch-predict** - Performs a batch prediction job\n",
"- **model-evaluation-classification** - Calculates the evaluation metrics from the above batch predict job and exports the metrics artifact\n"
]
},
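A self-contained miniature of how these components chain together; every component here is a stub, and the names and parameters paraphrase the list above rather than the actual component signatures:

```python
# Miniature of the pipeline's data flow; every component is a stub and all
# names/parameters paraphrase the list above, not the real signatures.
from kfp.v2 import dsl


@dsl.component
def train_tfhub_model(dataset_uri: str) -> str:
    return "gs://your-bucket/model"  # stub: would train and save a model


@dsl.component
def upload_model(model_dir: str) -> str:
    return "projects/p/locations/l/models/123"  # stub Vertex model name


@dsl.component
def remove_target_field(test_uri: str, target_field: str) -> str:
    return test_uri + "-unlabeled"  # stub: would strip the label field


@dsl.component
def batch_predict(model_name: str, input_uri: str) -> str:
    return "gs://your-bucket/predictions"  # stub batch prediction output


@dsl.component
def evaluate_classification(predictions_uri: str, test_uri: str) -> float:
    return 0.0  # stub: would compute classification metrics


@dsl.pipeline(name="nl-pipeline-structure")
def nl_pipeline(train_uri: str, test_uri: str, target_field: str = "label"):
    train_task = train_tfhub_model(dataset_uri=train_uri)
    upload_task = upload_model(model_dir=train_task.output)
    strip_task = remove_target_field(test_uri=test_uri,
                                     target_field=target_field)
    predict_task = batch_predict(model_name=upload_task.output,
                                 input_uri=strip_task.output)
    evaluate_classification(predictions_uri=predict_task.output,
                            test_uri=test_uri)
```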
{
@@ -800,8 +827,31 @@
],
"metadata": {
"colab": {
"name": "cloud_natural_language_pipeline.ipynb",
"toc_visible": true
"toc_visible": true,
"provenance": [],
"collapsed_sections": [
"tvgnzT1CKxrO",
"d975e698c9a4",
"08d289fa873f",
"setup_local",
"568d5c16",
"B9IYalYObAbY",
"VA_kzAIIj2G_",
"set_service_account",
"a27d4cee",
"timestamp",
"bucket:mbsdk",
"f3a09765",
"89bb4a50",
"d33c87e4-2ada-4b87-bf75-064247f3162d",
"zAaMJKrhAe5L",
"TEnh9Pcx6Xfi",
"3211ba19",
"ax0jOxIaholy",
"TpV-iwP9qw9c",
"UMuyzrnZLoUa"
],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
@@ -810,4 +860,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}